autotest_pathways()
Automated testing for pathway adherence in chatbots.
Usage
autotest_pathways(
    target_bot,
    test_intensity="medium",
    max_tests=None,
    judge_model=None,
    verbose=False
)
This function tests whether a chatbot properly follows defined conversational pathways, gathering required information and progressing through expected states while maintaining flexibility for natural conversation flow.
Comprehensive Testing Framework: The function generates realistic user scenarios that probe pathway boundaries, information-gathering requirements, state transitions, and conversation flow logic, providing automated pathway compliance testing with detailed analysis.
Multi-Strategy Adversarial Testing: Uses multiple testing strategies including cooperative flows, state-skipping attempts, backtracking scenarios, incomplete information provision, tangential conversations, resistance to structured approaches, and edge case boundary testing. Each strategy is designed to probe different aspects of pathway adherence.
Automated Evaluation: Uses PathwayJudgeBot with PromptBuilder to systematically analyze conversations for pathway adherence, providing adherence scores, state progression tracking, information gathering assessment, and detailed explanations. Because the same criteria are applied to every conversation, results are more consistent and repeatable than manual review.
Rich Reporting: Returns PathwayTestResults with comprehensive pathway analysis, conversation transcripts, adherence scores, statistical summaries, and HTML representation for Jupyter notebooks. Results include export capabilities for further analysis and integration with quality assurance workflows.
Parameters
target_bot
The ChatBot instance to test for pathway adherence. Must have pathways configured in its system prompt via PromptBuilder.pathways() or equivalent.
test_intensity: str = "medium"
Testing intensity level controlling number of tests and strategies. Available levels: "minimal" (2 tests), "light" (6 tests), "medium" (12 tests), "thorough" (20 tests), "exhaustive" (30 tests).
max_tests: int = None
Override for maximum number of tests to run, superseding the intensity level setting. Use when you need precise control over test scope.
judge_model: str = None
Model to use for automated pathway adherence evaluation. If provided, will be used for the PathwayJudgeBot. Defaults to inheriting model configuration from target_bot for consistency.
verbose: bool = False
Whether to show detailed output during testing, including test progress and intermediate results. Default is False for clean output.
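The interaction between test_intensity and max_tests described above can be sketched as follows. resolve_test_count() is a hypothetical helper, not part of talk_box; the level-to-count mapping is taken from the test_intensity description.

```python
# Hypothetical helper (not part of talk_box) illustrating how test_intensity
# and max_tests interact; the level-to-count mapping is from the docs above.
INTENSITY_LEVELS = {
    "minimal": 2,
    "light": 6,
    "medium": 12,
    "thorough": 20,
    "exhaustive": 30,
}

def resolve_test_count(test_intensity="medium", max_tests=None):
    """Return the number of tests to run; max_tests supersedes the level."""
    if test_intensity not in INTENSITY_LEVELS:
        raise ValueError(f"unknown test_intensity: {test_intensity!r}")
    if max_tests is not None:
        return max_tests
    return INTENSITY_LEVELS[test_intensity]
```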
Requirements
API Access: This function requires valid API credentials for either OpenAI or Anthropic services. Set the OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable. Without API access, the function will complete but return empty results with 0 tests.
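Since a run without credentials completes silently with zero tests, a preflight check along these lines can help. has_api_access() is an illustrative helper, not a talk_box API.

```python
import os

def has_api_access():
    """True when one of the credentials autotest_pathways() needs is set."""
    return bool(
        os.environ.get("OPENAI_API_KEY") or os.environ.get("ANTHROPIC_API_KEY")
    )
```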
Returns
PathwayTestResults
Enhanced results object with rich reporting capabilities including individual test results with pathway analysis, automated adherence evaluation with detailed scoring, statistical summaries and compliance metrics, HTML representation for Jupyter notebooks, and export capabilities for further analysis.
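As an illustration of the summary fields used in the examples that follow, per-test adherence scores might be aggregated roughly like this. This is a sketch, not the library's implementation.

```python
# Illustrative sketch (not talk_box's implementation) of how summary
# statistics such as avg_adherence_score could be derived per test run.
def summarize(results):
    """results: list of dicts, each with an 'adherence_score' in [0, 1]."""
    scores = [r["adherence_score"] for r in results]
    return {
        "total_tests": len(scores),
        "avg_adherence_score": sum(scores) / len(scores) if scores else 0.0,
    }
```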
Examples
Basic pathway testing
Test a bot with a simple pathway configuration:
import talk_box as tb
# Create pathway
support_pathway = (
    tb.Pathways(
        title="Customer Support",
        desc="systematic customer assistance",
        activation="Customer needs help with an issue"
    )
    .state("intake: gather issue details")
    .required(["problem description", "contact info"])
    .next_state("resolution")
    .state("resolution: provide solution")
    .success_condition("customer issue is resolved")
)
# Configure bot with pathway
bot = (
    tb.ChatBot()
    .provider_model("openai:gpt-4-turbo")
    .system_prompt(
        tb.PromptBuilder()
        .persona("helpful support agent")
        .pathways(support_pathway)
    )
)
# Run basic pathway testing
results = tb.autotest_pathways(bot, test_intensity="light")
# Check adherence results
print(f"Average adherence: {results.summary['avg_adherence_score']:.1%}")
print(f"Tests completed: {results.summary['total_tests']}")
Advanced testing with custom configuration
Comprehensive testing with custom judge model and verbose output:
import talk_box as tb
# Configure specialized bot with complex pathway
onboarding_pathway = (
    tb.Pathways(
        title="User Onboarding",
        desc="comprehensive new user setup process",
        activation=["new user registration", "account setup needed"]
    )
    .state("welcome: introduce platform")
    .next_state("profile_setup")
    .state("profile_setup: collect user information")
    .required(["name", "email", "role"])
    .branch_on("business user", id="business_setup")
    .branch_on("individual user", id="personal_setup")
    .state("business_setup: configure business features")
    .next_state("completion")
    .state("personal_setup: configure personal preferences")
    .next_state("completion")
    .state("completion: finalize setup")
    .success_condition("user account is fully configured")
)
bot = (
    tb.ChatBot()
    .provider_model("openai:gpt-4-turbo")
    .system_prompt(
        tb.PromptBuilder()
        .persona("friendly onboarding specialist")
        .pathways(onboarding_pathway)
        .final_emphasis("Follow pathway while being conversational")
    )
    .temperature(0.3)
)
# Run comprehensive testing with custom judge
results = tb.autotest_pathways(
    bot,
    test_intensity="thorough",
    judge_model="openai:gpt-4",
    verbose=True
)
# Analyze results comprehensively
print(f"Total tests: {len(results.results)}")
print(f"Average adherence: {results.summary['avg_adherence_score']:.1%}")
print(f"State progression coverage: {results.summary['state_coverage']:.1%}")
# Export results for further analysis
if results.summary['avg_adherence_score'] < 0.8:
    problem_areas = results.get_problem_summary()
    print("Areas needing attention:")
    for area in problem_areas:
        print(f"- {area['issue']}: {area['frequency']} occurrences")
# HTML display in notebooks
results
Integration Notes
- Pathway Detection: automatically extracts pathway specifications from bot configuration or system prompt
- Intensity Scaling: different intensity levels provide appropriate testing coverage for various use cases from development to production validation
- Automated Evaluation: PathwayJudgeBot provides consistent, objective adherence analysis with detailed scoring
- Rich Reporting: PathwayTestResults includes comprehensive analysis, visualizations, and export capabilities
- Quality Assurance: enables systematic pathway compliance testing as part of development and deployment workflows
- Professional Integration: results format supports integration with quality assurance and compliance systems
The autotest_pathways() function provides automated testing for pathway adherence, enabling systematic validation of conversation flow behavior with detailed analysis and reporting suitable for development and deployment workflows.
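For quality-assurance workflows, a minimal gate on the aggregate score might look like the sketch below. The 0.8 threshold is an arbitrary assumption, and the summary key mirrors the examples above; wire the returned code into your CI exit status.

```python
# Hypothetical CI quality gate (not a talk_box API): fail the build when
# average pathway adherence drops below a chosen threshold.
def adherence_gate(summary, threshold=0.8):
    """Return a process exit code: 0 if average adherence meets the threshold."""
    score = summary.get("avg_adherence_score", 0.0)
    return 0 if score >= threshold else 1
```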