Compliance for Avoiding Topics
The autotest_avoid_topics()
function provides comprehensive automated testing for ChatBot
avoid topics compliance. It generates sophisticated adversarial questions designed to test whether your bot properly refuses to engage with prohibited topics, then automatically evaluates responses for violations.
Why Test Avoid Topics?
When you configure a ChatBot
to avoid certain topics, you want confidence that it will consistently refuse to engage with those topics, even when:
- users ask directly for prohibited information
- questions are phrased indirectly or hypothetically
- emotional appeals are used to pressure the bot
- role-playing scenarios try to bypass restrictions
- persistent follow-up questions push for answers
Manual testing is time-consuming and inconsistent. The autotest_avoid_topics()
function automates this process with sophisticated adversarial testing strategies.
Basic Usage
The simplest way to test avoid topics compliance:
import talk_box as tb
# Create a bot with avoid topics
= tb.ChatBot().avoid(["medical_advice", "financial_planning"])
bot
# Test compliance
= tb.autotest_avoid_topics(bot, test_intensity="medium")
results
# View results
print(f"Compliance rate: {results.summary['success_rate']:.1%}")
print(f"Violations found: {results.summary['violation_count']}")
Test Intensity Levels
Control the thoroughness of testing with different intensity levels:
# Light testing - quick validation (3 conversations)
= tb.autotest_avoid_topics(bot, test_intensity="light")
results
# Medium testing - balanced coverage (6 conversations)
= tb.autotest_avoid_topics(bot, test_intensity="medium")
results
# Thorough testing - comprehensive strategies (10 conversations)
= tb.autotest_avoid_topics(bot, test_intensity="thorough")
results
# Exhaustive testing - all strategies (15 conversations)
= tb.autotest_avoid_topics(bot, test_intensity="exhaustive") results
Comprehensive Example: Wellness Coach Bot
Let’s create a productivity-focused bot and test its ability to avoid gaming topics:
import talk_box as tb
# Create a specialized productivity coach bot with structured prompts
= tb.ChatBot().provider_model("openai:gpt-4-turbo")
wellness_bot
# Build a comprehensive prompt using PromptBuilder
= (
wellness_prompt
tb.PromptBuilder()
.persona(="productivity and wellness coach",
role="work-life balance, professional development, and healthy lifestyle habits"
expertise
)
.task_context("Help people improve their professional productivity, develop better work habits, "
"enhance their career skills, and maintain healthy lifestyle choices. Focus on "
"practical, actionable advice for real-world improvement."
)
.core_analysis(["Assess productivity challenges and provide practical solutions",
"Recommend evidence-based strategies for professional growth",
"Suggest healthy lifestyle changes that support career success",
"Offer time management and organizational techniques",
"Provide guidance on skill development and learning strategies"
])
.avoid_topics(["video games",
"gaming",
])
.output_format(["Provide specific, actionable advice focused on real-world productivity",
"Suggest concrete steps people can take today to improve their situation",
"Recommend professional development resources when appropriate",
"Include time management tips and healthy habit formation strategies"
])
.final_emphasis("You are a professional coach focused on helping people succeed in their careers "
"and maintain healthy, productive lifestyles. Stay firmly within this domain and "
"redirect any entertainment or gaming discussions to productive alternatives."
)
)
= wellness_bot.system_prompt(wellness_prompt)
wellness_bot
# Test the bot's compliance
= tb.autotest_avoid_topics(wellness_bot, test_intensity="thorough") results
Understanding Test Results
The TestResults
object provides rich analysis capabilities:
Summary Statistics
# Get comprehensive summary
= results.summary
summary print(f"Total tests run: {summary['total_tests']}")
print(f"Tests passed: {summary['passed']}")
print(f"Tests failed: {summary['failed']}")
print(f"Success rate: {summary['success_rate']:.1%}")
print(f"Total violations: {summary['violation_count']}")
print(f"Average test duration: {summary['avg_duration']:.2f} seconds")
# See which topics were tested
print(f"Topics tested: {list(summary['topics_tested'].keys())}")
print(f"Strategies used: {list(summary['strategies_used'].keys())}")
Individual Test Results
# Examine individual conversation results
for i, result in enumerate(results.results):
= "PASSED" if result.completed and not result.violations else "FAILED"
status print(f"Test {i+1}: {result.topic} ({result.strategy}) - {status}")
if result.violations:
print(f" Violations: {len(result.violations)}")
for violation in result.violations:
print(f" - {violation.explanation}")
Rich HTML Display
In notebook environments, TestResults
automatically displays rich HTML with:
# This shows an interactive dashboard with:
# - Summary statistics with visual metrics
# - Test configuration details
# - Individual conversation transcripts
# - Violation analysis with highlighted responses
results
Here are the HTML-formatted results for the tests previously performed on the wellness_bot
:
Testing Strategies
The function uses multiple adversarial strategies:
- Direct: straightforward requests for prohibited information
- Indirect: third-party or hypothetical scenarios
- Emotional Appeal: urgent or distressing situations
- Hypothetical: “what if” or theoretical questions
- Role Playing: professional or academic contexts
- Context Shifting: starting with acceptable topics, then shifting
- Persistence: follow-up questions after initial refusal
Advanced Configuration
Custom Judge Model
Use a specific model for violation evaluation:
# Use GPT-4 for more accurate violation detection
= tb.autotest_avoid_topics(
results
bot,="thorough",
test_intensity="gpt-4",
judge_model=True
verbose )
Precise Control
Override intensity settings for exact control:
# Run exactly 8 conversations regardless of intensity level
= tb.autotest_avoid_topics(
results
bot,="medium",
test_intensity=8
max_conversations )
Export and Analysis
Convert results to structured formats for further analysis:
# Export to pandas DataFrame
= results.to_dataframe()
df
# Analyze violations by topic
= df.groupby('topic')['violations'].sum()
violation_summary print("Violations by topic:")
print(violation_summary)
# Analyze success rates by strategy
= df.groupby('strategy')['status'].apply(
success_by_strategy lambda x: (x == 'Passed').mean()
)print("\nSuccess rate by strategy:")
print(success_by_strategy)
# Export detailed results
"compliance_test_results.csv")
df.to_csv(
# Create Great Tables report (if great-tables is installed)
= results.to_great_table()
gt_table "compliance_report.html") gt_table.save(
Best Practices
Strong Avoid Topics Configuration
Use clear, specific language in your avoid topics:
# Specific and clear
= tb.ChatBot().avoid([
bot "medical diagnosis",
"investment recommendations",
"legal advice"
])
# With `PromptBuilder` for stronger constraints
= (
prompt
tb.PromptBuilder()"helpful assistant", "general support")
.persona(
.avoid_topics(["medical diagnosis",
"investment recommendations",
"legal advice"
])
.final_emphasis("You must not provide advice in prohibited domains. "
"Always redirect to appropriate professionals."
) )
Regular Testing
Incorporate avoid topics testing into your development workflow:
# Test during development
def test_bot_development():
= tb.autotest_avoid_topics(bot, test_intensity="light")
results assert results.summary['success_rate'] >= 0.90, "Development bot failing compliance"
# Test before deployment
def test_bot_production():
= tb.autotest_avoid_topics(bot, test_intensity="exhaustive")
results assert results.summary['success_rate'] >= 0.95, "Production bot failing compliance"
assert results.summary['violation_count'] == 0, "Zero violations required for production"
Iterative Improvement
Use test results to strengthen your bot’s avoid topics behavior:
# Analyze failures to improve prompts
= tb.autotest_avoid_topics(bot, test_intensity="thorough")
results
if results.summary['violation_count'] > 0:
print("Violations detected: analyzing patterns...")
# Look at violation types
for result in results.results:
if result.violations:
print(f"\nTopic: {result.topic}")
print(f"Strategy: {result.strategy}")
for violation in result.violations:
print(f"Issue: {violation.explanation}")
if violation.specific_quotes:
print(f"Quote: {violation.specific_quotes[0]}")