LUQ API Documentation
Methods
BaseUQModel
Source code in luq/methods/base_uq_model.py
compute_sequence_probability(logprobs, seq_prob_mode=SeqProbMode.PROD)
Computes the probability of a response sequence from log-probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logprobs | Tensor | A tensor containing log-probabilities of each token in the sequence. | required |
seq_prob_mode | SeqProbMode | The method to compute the sequence probability. Options are SeqProbMode.PROD for product and SeqProbMode.AVG for average. Defaults to SeqProbMode.PROD. | PROD |
Returns:
Name | Type | Description |
---|---|---|
float | float | The computed sequence probability. |
Raises:
Type | Description |
---|---|
ValueError | If an unknown seq_prob_mode is provided. |
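As a rough illustration of the two modes, the sketch below collapses per-token log-probabilities into a single sequence probability using only the standard library. `Mode` and `sequence_probability` are hypothetical stand-ins for `SeqProbMode` and `compute_sequence_probability`, plain lists replace tensors, and reading AVG as the arithmetic mean of token probabilities is an assumption based on the enum description, not confirmed LUQ behavior.

```python
import math
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for SeqProbMode."""
    PROD = "prod"
    AVG = "avg"


def sequence_probability(logprobs, mode=Mode.PROD):
    """Collapse per-token log-probabilities into one sequence probability."""
    if mode is Mode.PROD:
        # Product of token probabilities, accumulated in log space for stability.
        return math.exp(sum(logprobs))
    if mode is Mode.AVG:
        # Arithmetic mean of token probabilities (assumed meaning of "average").
        return sum(math.exp(lp) for lp in logprobs) / len(logprobs)
    raise ValueError(f"Unknown mode: {mode}")


token_logprobs = [math.log(0.5), math.log(0.25)]
prod = sequence_probability(token_logprobs, Mode.PROD)  # 0.5 * 0.25
avg = sequence_probability(token_logprobs, Mode.AVG)    # (0.5 + 0.25) / 2
```

Under these assumptions PROD shrinks as a sequence grows, because each extra token multiplies in a factor of at most 1, while AVG stays on the scale of a single token probability.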
Source code in luq/methods/base_uq_model.py
estimate_uncertainty(prompt, *args, **kwargs)
Estimates the uncertainty for a given prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompt | str | The input prompt to estimate uncertainty for. | required |
*args | | Additional positional arguments. | () |
**kwargs | | Additional keyword arguments. | {} |
Returns:
Name | Type | Description |
---|---|---|
float | float | The estimated uncertainty value. |
Raises:
Type | Description |
---|---|
NotImplementedError | This method must be implemented in a subclass. |
Source code in luq/methods/base_uq_model.py
normalize_sequence_probs(probs, tolerance=1e-09)
Normalizes a list of sequence probabilities so they sum to 1.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
probs | List[float] | A list of raw sequence probabilities. | required |
tolerance | float | A small threshold below which the sum is considered zero to avoid division by zero. Defaults to 1e-9. | 1e-09 |
Returns:
Type | Description |
---|---|
List[float] | A list of normalized probabilities summing to 1. |
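A minimal sketch of this kind of normalization, written against plain Python lists. The function name `normalize_probs` is hypothetical, and the uniform fallback for a near-zero sum is an assumption: the documentation only says the tolerance guards against division by zero, not what is returned in that case.

```python
def normalize_probs(probs, tolerance=1e-9):
    """Rescale raw sequence probabilities so that they sum to 1."""
    total = sum(probs)
    if total < tolerance:
        # Assumed fallback: treat every sequence as equally likely
        # instead of dividing by a (near-)zero sum.
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in probs]


normalized = normalize_probs([0.02, 0.06, 0.02])  # roughly [0.2, 0.6, 0.2]
degenerate = normalize_probs([0.0, 0.0])          # falls back to uniform
```

Normalization matters because product-mode sequence probabilities are usually tiny and do not sum to 1 across a handful of samples.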
Source code in luq/methods/base_uq_model.py
KernelLanguageEntropyEstimator
Bases: BaseUQModel
Source code in luq/methods/kernel_language_entropy.py
__init__()
compute_entropy(kernel, normalize=False)
Computes the von Neumann entropy of a given unit-trace kernel matrix (semantic kernel matrix).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
kernel | Tensor | The kernel matrix. | required |
normalize | bool | If True, normalize the kernel before computing entropy. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
float | float | The computed Kernel Language Entropy. |
Source code in luq/methods/kernel_language_entropy.py
estimate_uncertainty(samples, seq_prob_mode=SeqProbMode.PROD, kernel_type=KernelType.HEAT, nli_model=None, nli_table=None, construct_kernel=None, **kwargs)
Estimates uncertainty by computing the von Neumann entropy of a semantic similarity kernel.
One of nli_model or nli_table must be provided to compute the semantic similarity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | LLMSamples | The language model samples to analyze. | required |
seq_prob_mode | SeqProbMode | Mode for sequence probability aggregation. Defaults to SeqProbMode.PROD. | PROD |
kernel_type | KernelType | The predefined kernel type to use if construct_kernel is not provided. | HEAT |
nli_model | NLIWrapper \| None | A model for natural language inference. Defaults to None. | None |
nli_table | NLITable \| None | A precomputed NLI similarity table. Defaults to None. | None |
construct_kernel | Callable \| None | A custom kernel construction function. Defaults to None. | None |
**kwargs | | Additional keyword arguments. | {} |
Returns:
Name | Type | Description |
---|---|---|
float | float | The estimated uncertainty value. |
Raises:
Type | Description |
---|---|
ValueError | If neither or both of nli_model and nli_table are provided. |
Source code in luq/methods/kernel_language_entropy.py
get_kernel(samples, kernel_type=None, construct_kernel=None, nli_model=None, nli_table=None)
Constructs a kernel matrix from language model samples.
Either kernel_type or construct_kernel must be provided, but not both.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | LLMSamples | The language model samples. | required |
kernel_type | KernelType \| None | The predefined kernel type to use. Defaults to None. | None |
construct_kernel | Callable \| None | A custom kernel construction function. Defaults to None. | None |
nli_model | NLIWrapper \| None | A model for natural language inference. Defaults to None. | None |
nli_table | NLITable \| None | A precomputed NLI similarity table. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tensor | The normalized kernel matrix. |
Raises:
Type | Description |
---|---|
ValueError | If both or neither of kernel_type and construct_kernel are provided. |
ValueError | If an unknown kernel type is specified. |
Source code in luq/methods/kernel_language_entropy.py
KernelType
Bases: Enum
Enumeration of supported kernel types.
Attributes:
Name | Type | Description |
---|---|---|
HEAT | str | Heat kernel type. |
MATERN | str | Matern kernel type. |
Source code in luq/methods/kernel_utils.py
LLMOutput
dataclass
Represents the output of a language model.
Attributes:
Name | Type | Description |
---|---|---|
answer | str | The generated text answer from the language model. |
logprobs | Tensor \| None | Optional tensor containing the log probabilities associated with the generated tokens. |
Source code in luq/models/llm.py
LLMSamples
dataclass
Contains multiple samples generated by a language model along with metadata.
Attributes:
Name | Type | Description |
---|---|---|
samples | List[LLMOutput] | A list of multiple LLMOutput samples. |
answer | LLMOutput | The selected or final answer output. |
params | Dict[str, Any] | Parameters used to generate the samples. |
Source code in luq/models/llm.py
__len__()
LLMWrapper
Source code in luq/models/llm.py
__call__(*args, **kwargs)
Abstract base wrapper for language model interfaces.
This class is meant to be subclassed to implement specific LLM calls.
Source code in luq/models/llm.py
MaxProbabilityEstimator
Bases: BaseUQModel
Uncertainty estimator that uses the probability of the most likely sequence.
This class estimates uncertainty by computing the probability of each sequence in a set of samples, and returning one minus the maximum probability, which serves as a measure of uncertainty.
Source code in luq/methods/max_probability.py
estimate_uncertainty(samples, seq_prob_mode=SeqProbMode.PROD, **kwargs)
Estimate uncertainty from a list of LLM output samples.
This method calculates the sequence probability for each sample using the specified sequence probability mode and returns an uncertainty score equal to 1 - max(sequence_probs).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | List[LLMOutput] | A list of language model outputs with associated log probabilities. | required |
seq_prob_mode | SeqProbMode | Mode for aggregating token probabilities into sequence probabilities (e.g., product or average). Defaults to SeqProbMode.PROD. | PROD |
**kwargs | | Additional keyword arguments (unused here but kept for compatibility). | {} |
Returns:
Name | Type | Description |
---|---|---|
float | float | Uncertainty score, where higher values indicate more uncertainty. |
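The score 1 - max(sequence_probs) can be sketched in a few lines. This is an illustration only: `max_probability_uncertainty` is a hypothetical name, each sample is represented as a plain list of token log-probabilities rather than an `LLMOutput`, product mode is assumed, and whether LUQ normalizes the sequence probabilities first is not stated in this reference.

```python
import math


def max_probability_uncertainty(sample_logprobs):
    """Return 1 minus the probability of the most likely sampled sequence."""
    # Product-mode sequence probability for each sample.
    seq_probs = [math.exp(sum(logprobs)) for logprobs in sample_logprobs]
    return 1.0 - max(seq_probs)


sample_logprobs = [
    [math.log(0.9), math.log(0.8)],  # sequence probability 0.72
    [math.log(0.5), math.log(0.4)],  # sequence probability 0.20
]
score = max_probability_uncertainty(sample_logprobs)  # 1 - 0.72
```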
Source code in luq/methods/max_probability.py
NLIWrapper
Abstract wrapper class for Natural Language Inference (NLI) models.
Source code in luq/models/nli.py
__call__(*args, **kwargs)
Runs the NLI model on input arguments.
Returns:
Type | Description |
---|---|
List[NLIOutput] | A list of NLI model outputs. |
Raises:
Type | Description |
---|---|
NotImplementedError | If not implemented in a subclass. |
Source code in luq/models/nli.py
PredictiveEntropyEstimator
Bases: BaseUQModel
Source code in luq/methods/predictive_entropy.py
compute_entropy(sequence_probs)
Computes the entropy over a list of sequence probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequence_probs | list or Tensor | List or tensor of sequence probabilities. | required |
Returns:
Name | Type | Description |
---|---|---|
float | float | The entropy value computed from the normalized probability distribution. |
Source code in luq/methods/predictive_entropy.py
estimate_uncertainty(samples, seq_prob_mode=SeqProbMode.PROD, **kwargs)
Uncertainty is estimated by computing the entropy of probabilities obtained from sampled sequences.
:param prompt: The input prompt for LLM.
:param seq_prob_mode: Describes how token probabilities are translated into sequence probabilities.
:return: Entropy score.
Source code in luq/methods/predictive_entropy.py
generate_logits(prompt, num_samples=10)
Generates multiple responses from the language model and extracts their logits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompt | str | The input prompt for the language model. | required |
num_samples | int | Number of samples to generate. Defaults to 10. | 10 |
Returns:
Name | Type | Description |
---|---|---|
List | List | A list of logit sequences corresponding to the generated samples. |
Raises:
Type | Description |
---|---|
ValueError | If the internal language model is not an instance of LLMWrapper. |
Source code in luq/methods/predictive_entropy.py
SemanticEntropyEstimator
Bases: BaseUQModel
Source code in luq/methods/semantic_entropy.py
__init__()
compute_entropy(cluster_assignments, sequence_probs)
Computes entropy over semantic clusters.
Entropy is calculated using either:
- Cluster sizes (discrete entropy), or
- Weighted sequence probabilities assigned to clusters (continuous entropy).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cluster_assignments | List[int] | List mapping each response to a cluster ID. | required |
sequence_probs | List[float] \| None | List of sequence probabilities. If None, discrete entropy is computed based on cluster sizes. | required |
Returns:
Name | Type | Description |
---|---|---|
float | float | Entropy value representing semantic uncertainty. |
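The two variants differ only in how much mass each cluster receives. The sketch below is an approximation under stated assumptions: `cluster_entropy` is a hypothetical name, the natural logarithm is assumed, and cluster masses are renormalized before the entropy is taken, which may not match LUQ's exact handling.

```python
import math
from collections import defaultdict


def cluster_entropy(cluster_assignments, sequence_probs=None):
    """Entropy over semantic clusters, by size or by probability mass."""
    mass = defaultdict(float)
    if sequence_probs is None:
        # Discrete variant: every response contributes one unit of mass.
        for cluster_id in cluster_assignments:
            mass[cluster_id] += 1.0
    else:
        # Weighted variant: every response contributes its sequence probability.
        for cluster_id, prob in zip(cluster_assignments, sequence_probs):
            mass[cluster_id] += prob
    total = sum(mass.values())
    shares = [m / total for m in mass.values() if m > 0]
    return -sum(share * math.log(share) for share in shares)


discrete = cluster_entropy([0, 0, 1, 1])               # two equal clusters
weighted = cluster_entropy([0, 0, 1], [0.4, 0.4, 0.2])  # cluster 0 dominates
```

With two equally sized clusters the discrete entropy is log 2; the weighted example is lower because most of the probability mass falls in one cluster.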
Source code in luq/methods/semantic_entropy.py
estimate_uncertainty(samples, seq_prob_mode=SeqProbMode.PROD, nli_model=None, nli_table=None, **kwargs)
Estimates uncertainty based on the semantic diversity of LLM responses.
Semantic uncertainty is computed by clustering responses into meaning-based groups using an NLI model or precomputed NLI table, and then calculating entropy across these clusters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | LLMSamples | List of LLM responses containing text and log-probabilities. | required |
seq_prob_mode | SeqProbMode | Defines how to compute sequence probabilities from token log-probabilities. Defaults to SeqProbMode.PROD. | PROD |
nli_model | NLIWrapper \| None | NLI model used to compute entailment-based similarity. | None |
nli_table | NLITable \| None | Precomputed NLI similarity table to avoid recomputation. | None |
**kwargs | | Additional arguments for future extensibility. | {} |
Returns:
Name | Type | Description |
---|---|---|
float | float | Estimated entropy based on semantic clustering. |
Raises:
Type | Description |
---|---|
ValueError | If neither or both of nli_model and nli_table are provided. |
Source code in luq/methods/semantic_entropy.py
SeqProbMode
Bases: Enum
Enumeration for modes of combining token probabilities in a sequence.
Attributes:
Name | Type | Description |
---|---|---|
PROD | | Use the product of probabilities. |
AVG | | Use the average of probabilities. |
Source code in luq/utils/utils.py
TopKGapEstimator
Bases: BaseUQModel
Source code in luq/methods/top_k_gap.py
estimate_uncertainty(samples, seq_prob_mode=SeqProbMode.PROD, k=2, **kwargs)
Estimates uncertainty using the gap between the top-k sequence probabilities.
The method computes sequence-level probabilities from the sampled responses, identifies the k highest probabilities, and returns a normalized uncertainty score as 1 - (gap between top-1 and top-k probabilities).
A smaller gap between top-k and top-1 implies higher uncertainty (less confident top choice), while a large gap suggests stronger model confidence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | List[LLMOutput] | A list of LLM outputs containing log-probabilities. | required |
seq_prob_mode | SeqProbMode | Method for combining token log-probabilities into a single sequence probability. Defaults to SeqProbMode.PROD. | PROD |
k | int | The number of top probabilities to compare. Must be >= 2. Defaults to 2. | 2 |
**kwargs | | Additional arguments for extensibility (unused). | {} |
Returns:
Name | Type | Description |
---|---|---|
float | float | A normalized uncertainty score based on the gap between top-1 and top-k probabilities. |
Raises:
Type | Description |
---|---|
ValueError | If |
AssertionError | If any sample does not contain log-probabilities. |
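A small sketch of the gap score described above, working directly on sequence probabilities. `top_k_gap_uncertainty` is a hypothetical name, and both validation checks are assumptions made for the example; the exact conditions LUQ checks are not spelled out in this reference.

```python
def top_k_gap_uncertainty(seq_probs, k=2):
    """Return 1 minus the gap between the top-1 and top-k probabilities."""
    if k < 2:
        raise ValueError("k must be at least 2")  # assumed validation
    if len(seq_probs) < k:
        raise ValueError("need at least k sequence probabilities")  # assumed
    ranked = sorted(seq_probs, reverse=True)
    gap = ranked[0] - ranked[k - 1]
    return 1.0 - gap


close_call = top_k_gap_uncertainty([0.45, 0.40, 0.15])    # small gap
clear_winner = top_k_gap_uncertainty([0.90, 0.05, 0.05])  # large gap
```

The first call scores about 0.95 and the second about 0.15, matching the reading that a small gap signals high uncertainty.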
Source code in luq/methods/top_k_gap.py
construct_nli_table(samples, nli_model)
Constructs a table of NLI results for all pairs of generated samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | LLMSamples | The generated language model outputs. | required |
nli_model | NLIWrapper | An NLI model wrapper used to evaluate relationships between outputs. | required |
Returns:
Name | Type | Description |
---|---|---|
NLITable | NLITable | A dictionary mapping (answer1, answer2) pairs to NLIOutput results. |
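To make the shape of such a table concrete, here is a sketch that fills a dictionary keyed by answer pairs. Everything in it is illustrative: `build_nli_table` and `toy_nli` are hypothetical, string labels stand in for `NLIOutput`, and including self-pairs is an assumption about what "all pairs" covers.

```python
from itertools import product


def build_nli_table(answers, nli):
    """Map every ordered (premise, hypothesis) pair to an NLI result."""
    return {(a, b): nli(a, b) for a, b in product(answers, repeat=2)}


def toy_nli(premise, hypothesis):
    """Toy classifier: case-insensitive equality counts as entailment."""
    return "entailment" if premise.lower() == hypothesis.lower() else "neutral"


table = build_nli_table(["Paris", "paris", "Lyon"], toy_nli)
```

A table built this way needs one NLI call per ordered pair, so the cost grows quadratically with the number of samples; computing it once and reusing it across estimators avoids repeating that work.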
Source code in luq/models/nli.py
entropy(probabilities)
Computes the entropy of a probability distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
probabilities | Union[List[float], Tensor] | A list or tensor of probabilities. The probabilities should sum to 1 and represent a valid distribution. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed entropy as a scalar tensor value. |
Notes
Adds a small epsilon (1e-9) to probabilities to avoid log(0).
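The effect of the epsilon can be seen in a plain-Python sketch. `shannon_entropy` is a hypothetical name, lists replace tensors, and the natural logarithm is an assumption since the base is not stated here.

```python
import math


def shannon_entropy(probabilities, eps=1e-9):
    """Shannon entropy with a small epsilon inside the log to avoid log(0)."""
    return -sum(p * math.log(p + eps) for p in probabilities)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # close to log(4)
certain = shannon_entropy([1.0, 0.0])                # close to 0
```

Because of the epsilon, results are approximate: a fully certain distribution evaluates to a value within about 1e-9 of zero rather than exactly zero.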
Source code in luq/utils/utils.py
hard_nli_clustering(samples, nli_table)
Performs hard clustering of samples based on mutual entailment using NLI results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | LLMSamples | The list of LLM-generated samples. | required |
nli_table | NLITable | A dictionary of NLI outputs between sample pairs. | required |
Returns:
Type | Description |
---|---|
List[int] | A list of cluster assignments (by index) for each sample. |
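One common way to realize mutual-entailment clustering is a greedy pass that compares each answer with the first member of every existing cluster. The sketch below follows that approach as an assumption; it is not taken from the LUQ source. `mutual_entailment_clusters` is a hypothetical name, answers are plain strings, and the table stores string labels instead of `NLIOutput` objects.

```python
def mutual_entailment_clusters(answers, nli_table):
    """Assign each answer to the first cluster whose representative it
    entails and is entailed by; otherwise open a new cluster."""
    cluster_ids = []
    representatives = []  # first answer seen in each cluster
    for answer in answers:
        for cluster_id, rep in enumerate(representatives):
            forward = nli_table.get((answer, rep)) == "entailment"
            backward = nli_table.get((rep, answer)) == "entailment"
            if answer == rep or (forward and backward):
                cluster_ids.append(cluster_id)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            representatives.append(answer)
            cluster_ids.append(len(representatives) - 1)
    return cluster_ids


answers = ["Paris", "It is Paris", "Lyon"]
table = {
    ("It is Paris", "Paris"): "entailment",
    ("Paris", "It is Paris"): "entailment",
    ("Lyon", "Paris"): "contradiction",
    ("Paris", "Lyon"): "contradiction",
}
clusters = mutual_entailment_clusters(answers, table)
```

Here the two Paris answers share cluster 0 and "Lyon" opens cluster 1. Requiring entailment in both directions keeps a more specific answer from being merged with a vaguer one that it merely implies.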
Source code in luq/models/nli.py
von_neumann_entropy(rho)
Compute the von Neumann entropy of a density matrix.
The von Neumann entropy is defined as
S(ρ) = -Tr(ρ log(ρ))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rho | Tensor | A Hermitian, positive semi-definite matrix (density matrix). | required |
Returns:
Type | Description |
---|---|
Tensor | Scalar tensor representing the entropy. |
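Since ρ is Hermitian, the trace formula reduces to a sum over its eigenvalues, S(ρ) = -Σ λ log λ. The sketch below illustrates this for the 2×2 real symmetric case only, where eigenvalues have a closed form and no linear algebra library is needed. `von_neumann_entropy_2x2` is a hypothetical helper, the natural logarithm is assumed, and skipping near-zero eigenvalues (the 0 · log 0 = 0 convention) is a choice made for the example.

```python
import math


def von_neumann_entropy_2x2(rho):
    """Von Neumann entropy of a 2x2 real symmetric unit-trace matrix."""
    (a, b), (_, d) = rho
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    eigenvalues = [mean + radius, mean - radius]
    # Skip (near-)zero eigenvalues: 0 * log(0) is taken to be 0.
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-12)


maximally_mixed = von_neumann_entropy_2x2([[0.5, 0.0], [0.0, 0.5]])  # log(2)
rank_one = von_neumann_entropy_2x2([[0.5, 0.5], [0.5, 0.5]])         # 0
```

Read as semantic kernels, the first matrix describes two unrelated answers (maximal entropy) and the second two answers that are fully similar (zero entropy).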
Source code in luq/methods/kernel_utils.py
Models
AzureCustomGPT4Wrapper
Wrapper for Azure-hosted GPT-4 model using OpenAI-compatible API.
Source code in luq/models/llm.py
__call__(input)
Generates a response using the Azure-hosted GPT-4 model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | str | User input prompt. | required |
Returns:
Name | Type | Description |
---|---|---|
LLMOutput | LLMOutput | Generated answer and optional log probabilities. |
Source code in luq/models/llm.py
__init__(openai_endpoint_url, api_key)
Initializes the Azure GPT-4 wrapper.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
openai_endpoint_url | str | The base URL of the Azure OpenAI endpoint. | required |
api_key | str | Azure API key for authentication. | required |
Source code in luq/models/llm.py
BatchLLMWrapper
Abstract class for batch LLM interfaces that return multiple LLM outputs.
Source code in luq/models/llm.py
__call__(*args, **kwargs)
Generate multiple responses from an LLM.
Returns:
Type | Description |
---|---|
List[LLMOutput] | List of outputs from the model. |
Raises:
Type | Description |
---|---|
NotImplementedError | If not implemented in a subclass. |
Source code in luq/models/llm.py
ClaudeWrapper
Wrapper for Anthropic's Claude models.
Source code in luq/models/llm.py
__call__(prompt, model='claude-3-opus-20240229', temperature=1.0, max_tokens=1024)
Generates a response from Claude with optional parameters.
Note
Claude's API currently does not support returning log probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompt | str | Input prompt for Claude. | required |
model | str | Claude model to use. | 'claude-3-opus-20240229' |
temperature | float | Sampling temperature. | 1.0 |
max_tokens | int | Maximum number of tokens to generate. | 1024 |
Returns:
Name | Type | Description |
---|---|---|
LLMOutput | LLMOutput | Generated answer text. |
Source code in luq/models/llm.py
__init__(api_key)
Initializes the ClaudeWrapper.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api_key | str | Anthropic API key. | required |
HFLLMWrapper
Bases: LLMWrapper
Hugging Face LLM wrapper using a tokenizer and model from the transformers library.
Source code in luq/models/llm.py
__call__(prompt, temperature=1.0, max_new_tokens=1024, *args, **kwargs)
Generates a response from the Hugging Face model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompt | str | The prompt to send to the model. | required |
temperature | float | Sampling temperature. | 1.0 |
max_new_tokens | int | Maximum number of tokens to generate. | 1024 |
*args | | Additional positional arguments. | () |
**kwargs | | Additional keyword arguments. | {} |
Returns:
Name | Type | Description |
---|---|---|
LLMOutput | LLMOutput | The generated text and associated log probabilities. |
Source code in luq/models/llm.py
__init__(tokenizer, model)
Initializes the HFLLMWrapper with a tokenizer and model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tokenizer | AutoTokenizer | A Hugging Face tokenizer instance. | required |
model | PreTrainedModel | A Hugging Face model instance. | required |
Raises:
Type | Description |
---|---|
ValueError | If the tokenizer or model is not a valid Hugging Face object. |
Source code in luq/models/llm.py
LLMOutput
dataclass
Represents the output of a language model.
Attributes:
Name | Type | Description |
---|---|---|
answer | str | The generated text answer from the language model. |
logprobs | Tensor \| None | Optional tensor containing the log probabilities associated with the generated tokens. |
Source code in luq/models/llm.py
LLMSamples
dataclass
Contains multiple samples generated by a language model along with metadata.
Attributes:
Name | Type | Description |
---|---|---|
samples | List[LLMOutput] | A list of multiple LLMOutput samples. |
answer | LLMOutput | The selected or final answer output. |
params | Dict[str, Any] | Parameters used to generate the samples. |
Source code in luq/models/llm.py
__len__()
LLMWrapper
Source code in luq/models/llm.py
__call__(*args, **kwargs)
Abstract base wrapper for language model interfaces.
This class is meant to be subclassed to implement specific LLM calls.
Source code in luq/models/llm.py
all_logits_present(samples)
Checks whether all LLM outputs contain log probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples | List[LLMOutput] | A list of LLM output samples. | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if all samples include logprobs, False otherwise. |
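A check like this is useful before calling estimators that rely on log-probabilities, since some wrappers (for example the Claude wrapper) return none. The sketch below mirrors the idea with stand-in types: `Output` is a hypothetical replacement for `LLMOutput` that uses a list of floats instead of a tensor, and `all_have_logprobs` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Output:
    """Hypothetical stand-in for LLMOutput."""
    answer: str
    logprobs: Optional[List[float]] = None


def all_have_logprobs(samples):
    """True only if every sample carries log-probabilities."""
    return all(sample.logprobs is not None for sample in samples)


complete = all_have_logprobs([Output("yes", [-0.1]), Output("no", [-2.3])])
partial = all_have_logprobs([Output("yes", [-0.1]), Output("no")])
```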
Source code in luq/models/llm.py
generate_n_samples_and_answer(llm, prompt, temp_gen=1.0, temp_answer=0.1, top_p_gen=0.9, top_k_gen=16, top_p_ans=0.7, top_k_ans=4, n_samples=10)
Generates multiple LLM samples and a single final answer using specified parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
llm | LLMWrapper | The language model wrapper to use. | required |
prompt | str | The prompt to pass to the LLM. | required |
temp_gen | float | Temperature for generating samples. | 1.0 |
temp_answer | float | Temperature for the final answer. | 0.1 |
top_p_gen | float | Nucleus sampling parameter for generation. | 0.9 |
top_k_gen | int | Top-k sampling parameter for generation. | 16 |
top_p_ans | float | Nucleus sampling parameter for answer. | 0.7 |
top_k_ans | int | Top-k sampling parameter for answer. | 4 |
n_samples | int | Number of samples to generate. | 10 |
Returns:
Name | Type | Description |
---|---|---|
LLMSamples | LLMSamples | A collection of generated samples, the final answer, and parameters used. |
Raises:
Type | Description |
---|---|
NotImplementedError | If |
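The overall pattern, several high-temperature samples plus one low-temperature answer, can be sketched with stand-in types. Nothing below calls LUQ: `Output`, `Samples`, `sample_and_answer`, and `fake_llm` are hypothetical, the top-p and top-k parameters are omitted for brevity, and the way a real wrapper receives the temperature is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Output:
    """Hypothetical stand-in for LLMOutput."""
    answer: str


@dataclass
class Samples:
    """Hypothetical stand-in for LLMSamples."""
    samples: List[Output]
    answer: Output
    params: Dict[str, Any] = field(default_factory=dict)


def sample_and_answer(llm, prompt, temp_gen=1.0, temp_answer=0.1, n_samples=10):
    """Draw n diverse samples, then one low-temperature final answer."""
    samples = [llm(prompt, temperature=temp_gen) for _ in range(n_samples)]
    answer = llm(prompt, temperature=temp_answer)
    params = {"temp_gen": temp_gen, "temp_answer": temp_answer,
              "n_samples": n_samples}
    return Samples(samples=samples, answer=answer, params=params)


def fake_llm(prompt, temperature):
    """Deterministic toy model that echoes its inputs."""
    return Output(answer=f"{prompt} @ T={temperature}")


result = sample_and_answer(fake_llm, "2+2?", n_samples=3)
```

The higher sampling temperature is what gives the uncertainty estimators a spread of responses to measure, while the low-temperature call yields the answer that is actually reported.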
Source code in luq/models/llm.py
update_base_url(request, openai_endpoint_url)
Modifies the base URL of an OpenAI request to route to a custom Azure endpoint.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
request | Request | The HTTPX request object to be modified. | required |
openai_endpoint_url | str | The Azure OpenAI endpoint path to use. | required |
Source code in luq/models/llm.py
Datasets
GenerationDataset
Bases: DatasetDict
Source code in luq/datasets/dataset.py
__init__(data_path=None, arrow_table=None)
Initializes the dataset object.
:param data_path: Path to the JSON file containing the dataset.
Source code in luq/datasets/dataset.py
from_dataset(dataset)
classmethod
load_from_json(data_path)
staticmethod
Loads the dataset from a JSON file and converts it into a Hugging Face Dataset object.
Source code in luq/datasets/dataset.py
split_dataset(train_size=0.8)
Splits the dataset into train and test sets.
:param train_size: Proportion of the dataset to include in the train split.
:return: Dictionary containing train and test GenerationDataset objects.
Source code in luq/datasets/dataset.py
Utility Functions
SeqProbMode
Bases: Enum
Enumeration for modes of combining token probabilities in a sequence.
Attributes:
Name | Type | Description |
---|---|---|
PROD | | Use the product of probabilities. |
AVG | | Use the average of probabilities. |
Source code in luq/utils/utils.py
entropy(probabilities)
Computes the entropy of a probability distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
probabilities | Union[List[float], Tensor] | A list or tensor of probabilities. The probabilities should sum to 1 and represent a valid distribution. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed entropy as a scalar tensor value. |
Notes
Adds a small epsilon (1e-9) to probabilities to avoid log(0).