consensus
Usage
Consensus mode: query multiple models, compare outputs, flag disagreements.
Classes
| Name | Description |
|---|---|
| ConsensusResult | Result of running consensus across multiple model responses. |
| ConsensusStrategy | Strategy for resolving consensus across multiple model responses. |
| Disagreement | A detected disagreement between model responses. |
| ModelResponse | A single model’s response to a prompt. |
ConsensusResult
Result of running consensus across multiple model responses.
Usage
ConsensusResult(
winner,
winner_model,
agreement_score,
strategy,
responses=list(),
disagreements=list(),
consensus_reached=True
)
Parameters
winner: str-
The winning/selected response text.
winner_model: str-
The model that produced the winning response.
agreement_score: float-
Overall agreement score from 0.0 (complete disagreement) to 1.0 (unanimous).
strategy: ConsensusStrategy-
The consensus strategy that was used.
responses: list[ModelResponse] = list()-
All individual model responses that were compared.
disagreements: list[Disagreement] = list()-
Detected disagreements between responses.
consensus_reached: bool = True-
Whether consensus was successfully reached (relevant for UNANIMOUS strategy).
ConsensusStrategy
Strategy for resolving consensus across multiple model responses.
Usage
ConsensusStrategy()
Attributes
MAJORITY-
The response most similar to the majority wins.
UNANIMOUS-
All responses must substantially agree; otherwise consensus fails.
MOST_COMMON-
Select the most frequently occurring response (exact or near-match).
WEIGHTED-
Responses are weighted by model quality/cost tier (higher tier = more weight).
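To make the MOST_COMMON description more concrete, here is a minimal sketch of frequency-based selection. It does not use talk_box and is not the library's implementation: `normalize` and `most_common_answer` are hypothetical helpers, and the sketch only handles exact matches after normalization (it does not attempt the near-match behaviour mentioned above).

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(text.lower().split())


def most_common_answer(texts: list[str]) -> tuple[str, float]:
    """Return the most frequent answer and the share of responses that gave it.

    Hypothetical helper: exact match after normalization only. Ties are
    broken by first occurrence, which is how Counter.most_common orders
    elements with equal counts.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    counts = Counter(normalize(t) for t in texts)
    top_key, top_count = counts.most_common(1)[0]
    # Return the first original text whose normalized form is the top key.
    winner = next(t for t in texts if normalize(t) == top_key)
    return winner, top_count / len(texts)


answers = ["Paris", "paris ", "Lyon"]
winner, share = most_common_answer(answers)
print(winner, round(share, 2))
```

Here two of the three answers normalize to the same string, so that answer is selected and the share works out to two thirds.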
Disagreement
A detected disagreement between model responses.
Usage
Disagreement(description, models, severity="moderate")
Parameters
description: str-
Human-readable description of the disagreement.
models: tuple[str, ...]-
The models involved in the disagreement.
severity: str = "moderate"-
Severity level: "minor" (stylistic), "moderate" (factual nuance), or "major" (contradictory claims).
ModelResponse
A single model’s response to a prompt.
Usage
ModelResponse(model, text, latency_ms=None, token_count=None, weight=1.0)
Parameters
model: str-
Model identifier (e.g., "anthropic:claude-sonnet-4-6").
text: str-
The response text from the model.
latency_ms: float | None = None-
Response latency in milliseconds, if measured.
token_count: int | None = None-
Number of tokens in the response, if known.
weight: float = 1.0-
Optional weight for weighted consensus (default 1.0).
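One way the `weight` field could feed into a weighted vote is sketched below. This is an illustration under assumptions, not the library's algorithm: `Response` is a stand-in dataclass that mirrors only three ModelResponse fields, and `weighted_winner` is a hypothetical helper that groups responses by normalized text and sums their weights.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    """Stand-in for ModelResponse with only the fields this sketch needs."""

    model: str
    text: str
    weight: float = 1.0


def _key(text: str) -> str:
    # Assumed grouping rule: identical text after lowercasing and
    # whitespace collapsing counts as the same answer.
    return " ".join(text.lower().split())


def weighted_winner(responses: list[Response]) -> Response:
    """Pick the response whose answer has the highest total weight."""
    if not responses:
        raise ValueError("responses must not be empty")
    totals: dict[str, float] = defaultdict(float)
    for r in responses:
        totals[_key(r.text)] += r.weight
    best_key = max(totals, key=totals.get)
    # Among responses sharing the winning answer, prefer the heaviest one.
    candidates = [r for r in responses if _key(r.text) == best_key]
    return max(candidates, key=lambda r: r.weight)


responses = [
    Response("model_a", "42", weight=1.0),
    Response("model_b", "41", weight=2.5),
    Response("model_c", "42", weight=1.0),
]
best = weighted_winner(responses)
print(best.model, best.text)
```

In this example the single heavily weighted response (total 2.5) outvotes the two default-weight responses that agree with each other (total 2.0), which is the trade-off a weighted scheme makes compared with a plain majority.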
Functions
| Name | Description |
|---|---|
| consensus() | Determine consensus across multiple model responses. |
| find_disagreements() | Detect disagreements between model responses. |
consensus()
Determine consensus across multiple model responses.
Usage
consensus(
responses, *, strategy=ConsensusStrategy.MAJORITY, unanimous_threshold=0.7
)
Compares the provided responses and selects a winner based on the chosen strategy. Also detects and reports disagreements.
Parameters
responses: list[ModelResponse]-
List of model responses to compare. Must contain at least one response.
strategy: ConsensusStrategy = ConsensusStrategy.MAJORITY-
The consensus strategy to use.
unanimous_threshold: float = 0.7-
For the UNANIMOUS strategy, the minimum pairwise similarity required for consensus to be reached (default 0.7).
Returns
ConsensusResult-
The consensus outcome including winner, agreement score, and disagreements.
Raises
ValueError-
If responses is empty.
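The `unanimous_threshold` parameter is described as a minimum pairwise similarity. The sketch below shows what such a check could look like. The similarity metric is an assumption (Jaccard overlap of lowercase word sets); the library's actual metric is not specified here, and `word_similarity`, `min_pairwise_similarity`, and `is_unanimous` are hypothetical names.

```python
from itertools import combinations


def word_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed metric)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def min_pairwise_similarity(texts: list[str]) -> float:
    """Lowest similarity over all unordered pairs of texts."""
    if len(texts) < 2:
        return 1.0  # A single response trivially agrees with itself.
    return min(word_similarity(a, b) for a, b in combinations(texts, 2))


def is_unanimous(texts: list[str], threshold: float = 0.7) -> bool:
    """True when even the least similar pair meets the threshold."""
    return min_pairwise_similarity(texts) >= threshold


texts = [
    "Python is a programming language.",
    "Python is a high-level programming language.",
    "Python is an interpreted programming language.",
]
lowest = min_pairwise_similarity(texts)
reached = is_unanimous(texts, threshold=0.7)
print(round(lowest, 2), reached)
```

Because the check uses the minimum over all pairs, a single outlying response is enough to fail it. Under this particular metric the three sentences above do not meet a 0.7 threshold; a different similarity measure could score them differently.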
Examples
import talk_box as tb
responses = [
tb.ModelResponse(model="anthropic:claude-sonnet-4-6", text="Python is a programming language."),
tb.ModelResponse(model="openai:gpt-4o", text="Python is a high-level programming language."),
tb.ModelResponse(model="google:gemini-2.5-flash", text="Python is an interpreted programming language."),
]
result = tb.consensus(responses, strategy=tb.ConsensusStrategy.MAJORITY)
result.winner # "Python is a high-level programming language."
result.agreement_score # ~0.75
result.consensus_reached # True
find_disagreements()
Detect disagreements between model responses.
Usage
find_disagreements(responses)
Compares each pair of responses and flags significant differences. Uses word-level similarity to classify severity.
Parameters
responses: list[ModelResponse]-
List of model responses to compare.
Returns
list[Disagreement]-
Detected disagreements, sorted by severity (major first).
Examples
import talk_box as tb
responses = [
tb.ModelResponse(model="model_a", text="Python was created in 1991."),
tb.ModelResponse(model="model_b", text="Python was created in 1989."),
]
disagreements = tb.find_disagreements(responses)
disagreements[0].severity # "major"
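The pairwise comparison and severity ordering described for find_disagreements() can be sketched without the library as follows. Everything here is an assumption made for illustration: `Flag` is a stand-in for Disagreement, the metric is Jaccard overlap of word sets, and the cutoffs in `classify` are arbitrary. The sketch will not necessarily reproduce the severities the library assigns.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

# Lower rank sorts first, so "major" disagreements lead the list.
SEVERITY_ORDER = {"major": 0, "moderate": 1, "minor": 2}


@dataclass(frozen=True)
class Flag:
    """Stand-in for Disagreement (hypothetical, for this sketch only)."""

    description: str
    models: tuple[str, ...]
    severity: str = "moderate"


def jaccard(a: str, b: str) -> float:
    """Word-level similarity: shared words divided by all distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa or wb) else 1.0


def classify(similarity: float) -> Optional[str]:
    """Map a similarity to a severity using assumed, illustrative cutoffs."""
    if similarity >= 0.9:
        return None  # Close enough that nothing is flagged.
    if similarity >= 0.7:
        return "minor"
    if similarity >= 0.4:
        return "moderate"
    return "major"


def flag_pairs(responses: list[tuple[str, str]]) -> list[Flag]:
    """Compare every pair of (model, text) entries and sort flags by severity."""
    flags = []
    for (model_a, text_a), (model_b, text_b) in combinations(responses, 2):
        similarity = jaccard(text_a, text_b)
        severity = classify(similarity)
        if severity is not None:
            description = f"{model_a} and {model_b} differ (similarity {similarity:.2f})"
            flags.append(Flag(description, (model_a, model_b), severity))
    return sorted(flags, key=lambda f: SEVERITY_ORDER[f.severity])


flags = flag_pairs([
    ("model_a", "The capital of France is Paris"),
    ("model_b", "The capital of France is Lyon"),
    ("model_c", "Bananas are yellow"),
])
for flag in flags:
    print(flag.severity, flag.models)
```

A purely lexical metric like this one treats a one-word change as a small difference, even when that word reverses the meaning. Any real classifier for "contradictory claims" would need more than word overlap, which is one reason the cutoffs above should be read as placeholders.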