ContextWindow
Manages fitting content into a model’s token budget.
Usage
ContextWindow()

Combines token estimation with truncation strategies to ensure prompts and conversation messages stay within a model’s context window.
Parameters
max_tokens: int | None = None -
Explicit token budget. If provided, overrides any model profile lookup.

model: str | ModelProfile | None = None -
A model key (e.g., "ollama:llama3.2:latest") or a ~talk_box.models.ModelProfile instance. The profile’s context_window is used as the budget.

reserve_output: int = 1024 -
Tokens to reserve for the model’s response. Subtracted from the budget before fitting input content. Defaults to 1024.

strategy: str | FitStrategy = FitStrategy.TRUNCATE_OLDEST -
The default strategy for fitting content. Can be overridden per call.

token_counter: Callable[[str], int] | None = None -
Optional custom token counting function. Defaults to estimate_tokens.
Examples
```python
import talk_box as tb

# From a model profile
ctx = tb.ContextWindow(model="ollama:llama3.2:latest")

# With explicit budget
ctx = tb.ContextWindow(max_tokens=8192)

# With custom settings
ctx = tb.ContextWindow(
    max_tokens=32_768,
    reserve_output=4096,
    strategy="truncate_middle",
)
```

Attributes
| Name | Description |
|---|---|
| input_budget | Tokens available for input (max_tokens - reserve_output). |
| max_tokens | Total context window budget in tokens. |
| reserve_output | Tokens reserved for the model’s response. |
input_budget
Tokens available for input (max_tokens - reserve_output).
input_budget: int
max_tokens
Total context window budget in tokens.
max_tokens: int
reserve_output
Tokens reserved for the model’s response.
reserve_output: int
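As a quick check of the attribute arithmetic above, input_budget is simply max_tokens minus reserve_output. A minimal standalone sketch (plain Python, not talk_box internals):

```python
# Budget arithmetic described in the Attributes table:
# input_budget = max_tokens - reserve_output
max_tokens = 32_768      # total context window
reserve_output = 4_096   # tokens held back for the model's response

input_budget = max_tokens - reserve_output
print(input_budget)  # 28672
```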
Methods
| Name | Description |
|---|---|
| count_tokens() | Count tokens in text using the configured counter. |
| fit_messages() | Fit a list of conversation messages into the token budget. |
| fit_prompt() | Fit a PromptBuilder’s output into the token budget. |
| fits() | Check whether text fits within the input budget. |
| overflow() | Calculate how many tokens over budget the text is. |
count_tokens()
Count tokens in text using the configured counter.
Usage
count_tokens(text)

Parameters
text: str - The text to count tokens for.

Returns
int - Token count.
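The default counter, estimate_tokens, is not specified on this page. A common rough heuristic (an assumption for illustration, not talk_box’s actual implementation) approximates one token per four characters:

```python
def rough_token_estimate(text: str) -> int:
    """Hypothetical stand-in for a default token counter:
    the common heuristic of roughly 4 characters per token."""
    if not text:
        return 0
    return max(1, len(text) // 4)

print(rough_token_estimate("Tell me about Python."))  # 21 chars -> 5
```

Any callable with this str -> int signature can be passed as token_counter to override the default.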
fit_messages()
Fit a list of conversation messages into the token budget.
Usage
fit_messages(messages, *, system_prompt="", strategy=None)

The system prompt is always preserved. Messages are dropped according to the selected strategy until they fit within the remaining budget.
Parameters
messages: list[dict[str, str] | Message] -
List of message dicts (with "role" and "content" keys) or ~talk_box.conversation.Message objects.

system_prompt: str = "" -
The system prompt text (always preserved in full).

strategy: str | FitStrategy | None = None -
Override the default strategy for this call.
Returns
FitResult - The fitted messages plus metadata about what was dropped.
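The truncate-oldest behavior described above can be sketched in plain Python (a simplified illustration, not talk_box’s implementation; token counts here use a hypothetical one-token-per-word counter):

```python
def count(text: str) -> int:
    # Hypothetical counter: one token per whitespace-separated word.
    return len(text.split())

def fit_oldest(messages: list[dict[str, str]], budget: int) -> tuple[list[dict[str, str]], int]:
    """Drop the oldest messages until the rest fit; returns (kept, dropped)."""
    kept = list(messages)
    dropped = 0
    while kept and sum(count(m["content"]) for m in kept) > budget:
        kept.pop(0)   # TRUNCATE_OLDEST: discard from the front
        dropped += 1
    return kept, dropped

msgs = [
    {"role": "user", "content": "one two three four"},   # 4 tokens
    {"role": "assistant", "content": "five six"},        # 2 tokens
    {"role": "user", "content": "seven eight nine"},     # 3 tokens
]
kept, dropped = fit_oldest(msgs, budget=6)
print(dropped)  # 1 -> the oldest 4-token message goes; 2 + 3 = 5 fits
```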
Examples
```python
import talk_box as tb

ctx = tb.ContextWindow(max_tokens=4096, reserve_output=512)
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
    {"role": "user", "content": "Tell me about Python."},
]
result = ctx.fit_messages(messages, system_prompt="You are helpful.")
print(f"Using {result.tokens_used}/{result.token_budget} tokens")
print(f"Dropped {result.messages_dropped} messages")
```

fit_prompt()
Fit a PromptBuilder’s output into the token budget.
Usage
fit_prompt(builder)

If the full prompt fits, returns it unchanged. Otherwise, drops lowest-priority sections until it fits.
Parameters
builder: PromptBuilder -
A ~talk_box.prompt_builder.PromptBuilder instance.
Returns
PromptFitResult - The fitted prompt text and metadata.
Examples
```python
import talk_box as tb

builder = (
    tb.PromptBuilder()
    .persona("analyst", "data science")
    .task_context("Analyze sales data")
    .constraint("Be concise")
    .example("Q: Revenue?", "A: $1.2M")
)
ctx = tb.ContextWindow(max_tokens=2048)
result = ctx.fit_prompt(builder)
print(f"Prompt uses {result.tokens_used} tokens")
```

fits()
Check whether text fits within the input budget.
Usage
fits(text)

Parameters
text: str - Text to check.

Returns
bool - True if token count is within budget.
overflow()
Calculate how many tokens over budget the text is.
Usage
overflow(text)

Parameters
text: str - Text to check.

Returns
int - Number of tokens over budget (0 if within budget).
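The relationship between fits() and overflow() can be illustrated with a standalone sketch (hypothetical one-token-per-word counter and an assumed budget; not talk_box’s internals):

```python
def count(text: str) -> int:
    # Hypothetical counter: one token per whitespace-separated word.
    return len(text.split())

INPUT_BUDGET = 5  # assumed input budget for illustration

def overflow(text: str) -> int:
    # Tokens over budget, clamped at zero when within budget.
    return max(0, count(text) - INPUT_BUDGET)

def fits(text: str) -> bool:
    # Text fits exactly when there is no overflow.
    return overflow(text) == 0

print(overflow("a b c d e f g"))  # 7 tokens -> 2 over budget
print(fits("a b c"))              # True
```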