EnrichmentPipeline

Configurable pipeline for enriching documents in a knowledge graph.

Usage

EnrichmentPipeline(enrich_fn, entity_prefix="entity", topic_prefix="topic")

The pipeline applies an enrichment function to document nodes and creates entity nodes, topic nodes, and relationship edges in the graph. Enrichment is incremental: documents are only re-enriched when their content changes.
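The page does not say how the pipeline detects that a document's content has changed. One common approach is to store a content hash at enrichment time and compare it on the next run. The sketch below illustrates that idea only; content_hash, needs_enrichment, and the seen_hashes dictionary are hypothetical names and are not part of talk_box.

```python
import hashlib


def content_hash(title: str, content: str) -> str:
    """Fingerprint a document so that later content changes can be detected."""
    return hashlib.sha256(f"{title}\n{content}".encode("utf-8")).hexdigest()


def needs_enrichment(doc: dict, seen_hashes: dict) -> bool:
    """Return True when the document is new or its content has changed."""
    current = content_hash(doc["title"], doc["content"])
    return seen_hashes.get(doc["id"]) != current


# Hypothetical in-memory record of the hash stored at the last enrichment.
seen_hashes = {}
doc = {"id": "doc-1", "title": "Intro", "content": "Python basics."}

first = needs_enrichment(doc, seen_hashes)   # never enriched before
seen_hashes[doc["id"]] = content_hash(doc["title"], doc["content"])
second = needs_enrichment(doc, seen_hashes)  # unchanged since last run
doc["content"] = "Python basics, revised."
third = needs_enrichment(doc, seen_hashes)   # content changed
```

Under this assumption, an unchanged document costs one hash comparison and no LLM call.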

Parameters

enrich_fn: EnrichmentFn: A callable (title, content) -> EnrichmentResult that performs the actual extraction. This is where LLM calls happen.
entity_prefix: str = "entity": Prefix for generated entity node IDs.
topic_prefix: str = "topic": Prefix for generated topic node IDs.

Examples

import talk_box as tb

def my_enricher(title: str, content: str) -> tb.EnrichmentResult:
    # Call your LLM here
    return tb.EnrichmentResult(
        entities=[tb.ExtractedEntity(name="Python", entity_type="technology")],
        topics=["programming"],
        summary="A document about Python.",
    )

pipeline = tb.EnrichmentPipeline(enrich_fn=my_enricher)
kg = tb.KnowledgeGraph(":memory:")
# ... add document nodes via sync() ...
result = pipeline.run(kg)
result.enriched  # number of documents enriched

Methods

Name	Description
run()	Run enrichment on document nodes in the knowledge graph.

run()

Run enrichment on document nodes in the knowledge graph.

Usage

run(kg, *, limit=100, force=False)

Parameters

kg: Any: A talk_box.knowledge_graph.KnowledgeGraph instance.
limit: int = 100: Maximum number of documents to enrich per run.
force: bool = False: If True, re-enrich all documents regardless of whether they’ve been enriched before.
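To make the interaction of limit and force concrete, here is a minimal sketch of how a run might choose which documents to process. It is an assumption about the behavior, not the library's implementation: select_documents and the "enriched" flag are hypothetical, and whether the real pipeline applies the limit before or after filtering is not stated on this page.

```python
def select_documents(docs, *, limit=100, force=False):
    """Pick up to `limit` documents; with force=True, include enriched ones too."""
    # Assumption: filter first, then cap the number of documents.
    candidates = [d for d in docs if force or not d["enriched"]]
    return candidates[:limit]


docs = [
    {"id": "a", "enriched": True},
    {"id": "b", "enriched": False},
    {"id": "c", "enriched": False},
]

pending = [d["id"] for d in select_documents(docs)]
capped = [d["id"] for d in select_documents(docs, limit=1)]
forced = [d["id"] for d in select_documents(docs, force=True)]
```

In this sketch, force=True widens the candidate set but is still subject to limit, so a forced re-enrichment of a large graph would take several runs.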

Returns

PipelineResult: Summary of enrichment activity, including enriched, the number of documents enriched.
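Because each call processes at most limit documents, a large graph may need several calls. The sketch below loops until a run reports no newly enriched documents. It relies only on run(kg, limit=...) and the enriched attribute described on this page. enrich_all, FakePipeline, and FakeResult are hypothetical stand-ins so the example runs without talk_box, and the loop assumes that a run with nothing left to do reports enriched == 0.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in for PipelineResult; only the `enriched` count is modeled."""
    enriched: int


class FakePipeline:
    """Stand-in for EnrichmentPipeline that tracks a count of pending documents."""

    def __init__(self, pending: int):
        self.pending = pending

    def run(self, kg, *, limit=100, force=False):
        done = min(self.pending, limit)
        self.pending -= done
        return FakeResult(enriched=done)


def enrich_all(pipeline, kg, *, batch_size=100, max_batches=1000):
    """Call run() repeatedly until a batch enriches nothing; return the total."""
    total = 0
    for _ in range(max_batches):  # hard cap guards against a never-ending loop
        result = pipeline.run(kg, limit=batch_size)
        if result.enriched == 0:
            break
        total += result.enriched
    return total


total = enrich_all(FakePipeline(pending=250), kg=None, batch_size=100)
```

With a real pipeline, pass the EnrichmentPipeline and KnowledgeGraph instances in place of the stand-ins. The max_batches cap is a defensive choice for this sketch, not something the library requires.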