EnrichmentPipeline

Configurable pipeline for enriching documents in a knowledge graph.

Usage

EnrichmentPipeline()

The pipeline applies an enrichment function to document nodes and creates entity nodes, topic nodes, and relationship edges in the graph. Enrichment is incremental: a document is re-enriched only when its content changes, unless run() is called with force=True.
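This page does not say how content changes are detected. One common approach is to store a hash of each document's text when it is enriched and compare it on the next run. The sketch below illustrates that idea only. The fingerprint store and the helper names (fingerprint, needs_enrichment, mark_enriched) are hypothetical and are not part of talk_box.

```python
import hashlib

# Hypothetical fingerprint store: document node ID -> hash of the text that was
# last enriched. talk_box's real bookkeeping is not documented on this page.
enriched_fingerprints: dict[str, str] = {}


def fingerprint(title: str, content: str) -> str:
    """Return a SHA-256 hex digest of the (title, content) pair passed to enrich_fn."""
    digest = hashlib.sha256()
    digest.update(title.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    digest.update(content.encode("utf-8"))
    return digest.hexdigest()


def needs_enrichment(doc_id: str, title: str, content: str) -> bool:
    """True if the document was never enriched or its text changed since then."""
    return enriched_fingerprints.get(doc_id) != fingerprint(title, content)


def mark_enriched(doc_id: str, title: str, content: str) -> None:
    """Record the fingerprint of the text that was just enriched."""
    enriched_fingerprints[doc_id] = fingerprint(title, content)


print(needs_enrichment("doc-1", "Intro", "Python is a language."))    # True: never enriched
mark_enriched("doc-1", "Intro", "Python is a language.")
print(needs_enrichment("doc-1", "Intro", "Python is a language."))    # False: unchanged
print(needs_enrichment("doc-1", "Intro", "Python 3 is a language."))  # True: content changed
```

Hashing the exact text passed to enrich_fn means any edit to the title or body triggers re-enrichment, while untouched documents cost only a hash comparison.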

Parameters

enrich_fn: EnrichmentFn

A callable with the signature (title, content) -> EnrichmentResult that performs the actual extraction. This is where any LLM calls happen.

entity_prefix: str = "entity"

Prefix for generated entity node IDs.

topic_prefix: str = "topic"

Prefix for generated topic node IDs.
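The exact ID format talk_box builds from these prefixes is not given here. As an illustration only, the sketch below assumes a hypothetical "<prefix>:<slug>" scheme to show one plausible reason the prefixes matter: an entity and a topic that share a name still receive distinct node IDs. The make_node_id helper is hypothetical.

```python
import re


def make_node_id(prefix: str, name: str) -> str:
    """Hypothetical ID scheme: '<prefix>:<slug>' with a lowercase, hyphenated slug."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{prefix}:{slug}"


entity_prefix = "entity"  # documented default for EnrichmentPipeline
topic_prefix = "topic"    # documented default for EnrichmentPipeline

print(make_node_id(entity_prefix, "Python"))            # entity:python
print(make_node_id(topic_prefix, "Machine Learning"))   # topic:machine-learning
print(make_node_id("kb-entity", "Python"))              # kb-entity:python (custom prefix)
```

A custom prefix can be useful, for example, when two pipelines write into the same graph and their generated nodes should stay distinguishable.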

Examples

import talk_box as tb

def my_enricher(title: str, content: str) -> tb.EnrichmentResult:
    # Call your LLM here
    return tb.EnrichmentResult(
        entities=[tb.ExtractedEntity(name="Python", entity_type="technology")],
        topics=["programming"],
        summary="A document about Python.",
    )

pipeline = tb.EnrichmentPipeline(enrich_fn=my_enricher)
kg = tb.KnowledgeGraph(":memory:")
# ... add document nodes via sync() ...
result = pipeline.run(kg)
result.enriched  # number of documents enriched
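The example above leaves the LLM call as a placeholder. For tests or offline runs, a deterministic enricher with the same (title, content) -> EnrichmentResult shape can be handy. The sketch below defines stand-in ExtractedEntity and EnrichmentResult dataclasses that mirror only the fields used in the example (the real tb classes may define more), plus a hypothetical KNOWN_TECH keyword table in place of the LLM.

```python
from dataclasses import dataclass, field


# Stand-ins mirroring the fields used in the example on this page;
# the real tb.ExtractedEntity / tb.EnrichmentResult may define more.
@dataclass
class ExtractedEntity:
    name: str
    entity_type: str


@dataclass
class EnrichmentResult:
    entities: list[ExtractedEntity] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    summary: str = ""


# Hypothetical keyword table standing in for an LLM call.
KNOWN_TECH = {"python": "Python", "rust": "Rust", "sqlite": "SQLite"}


def keyword_enricher(title: str, content: str) -> EnrichmentResult:
    """Deterministic enrich_fn: tag known technologies and use the first sentence as summary."""
    # Normalize words by stripping surrounding punctuation and lowercasing.
    words = {w.strip(".,;:!?()").lower() for w in f"{title} {content}".split()}
    entities = [
        ExtractedEntity(name=canonical, entity_type="technology")
        for key, canonical in KNOWN_TECH.items()
        if key in words
    ]
    topics = ["programming"] if entities else []
    summary = content.split(".")[0].strip()
    return EnrichmentResult(entities=entities, topics=topics, summary=summary)


result = keyword_enricher("Intro", "Python and SQLite work well. More text.")
print([e.name for e in result.entities])  # ['Python', 'SQLite']
print(result.summary)                     # Python and SQLite work well
```

To pass such a function to tb.EnrichmentPipeline, it should return tb.EnrichmentResult and tb.ExtractedEntity rather than these stand-ins.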

Methods

Name     Description
run()    Run enrichment on document nodes in the knowledge graph.

run()

Run enrichment on document nodes in the knowledge graph.

Usage

run(kg, *, limit=100, force=False)

Parameters

kg: Any

A KnowledgeGraph instance (talk_box.knowledge_graph.KnowledgeGraph).

limit: int = 100

Maximum number of documents to enrich per run.

force: bool = False

If True, re-enrich all documents regardless of whether they've been enriched before.
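One reading of how limit and force interact when run() picks documents is sketched below. It assumes that limit still caps the batch when force=True and that "needs enrichment" means "never enriched or changed since"; neither detail is confirmed on this page. DocNode and select_for_enrichment are hypothetical names, not talk_box types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocNode:
    """Hypothetical document record; not talk_box's node type."""
    node_id: str
    content_hash: str
    enriched_hash: Optional[str] = None  # hash at last enrichment, None if never enriched


def select_for_enrichment(docs: list[DocNode], limit: int = 100, force: bool = False) -> list[DocNode]:
    """Pick at most `limit` documents: all of them if force, else only new or changed ones."""
    if force:
        candidates = docs  # re-enrich everything, whether enriched before or not
    else:
        candidates = [d for d in docs if d.enriched_hash != d.content_hash]
    return candidates[:limit]


docs = [
    DocNode("a", "h1", enriched_hash="h1"),   # up to date
    DocNode("b", "h2"),                       # never enriched
    DocNode("c", "h3", enriched_hash="old"),  # content changed
]
print([d.node_id for d in select_for_enrichment(docs)])              # ['b', 'c']
print([d.node_id for d in select_for_enrichment(docs, force=True)])  # ['a', 'b', 'c']
```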

Returns

PipelineResult

Summary of enrichment activity. For example, result.enriched is the number of documents enriched in this run.
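Because limit caps each call, a large backlog may need several runs. The sketch below calls run() repeatedly until a batch enriches nothing, relying only on the documented call shape run(kg, *, limit=..., force=...) and the result.enriched count. It assumes that, with incremental enrichment, each call picks up documents not yet handled by earlier calls. The enrich_all helper, its max_batches safety cap, and the SupportsRun protocol are hypothetical.

```python
from typing import Any, Protocol


class SupportsRun(Protocol):
    """Anything with the documented run(kg, *, limit=..., force=...) call shape."""

    def run(self, kg: Any, *, limit: int = 100, force: bool = False) -> Any: ...


def enrich_all(pipeline: SupportsRun, kg: Any, *, batch_size: int = 100, max_batches: int = 50) -> int:
    """Call run() until a batch enriches nothing; return the total number enriched."""
    total = 0
    for _ in range(max_batches):  # safety cap in case documents keep changing
        result = pipeline.run(kg, limit=batch_size)
        if result.enriched == 0:
            break  # nothing left that needs enrichment
        total += result.enriched
    return total
```

With a real pipeline this would be called as enrich_all(pipeline, kg, batch_size=100). The max_batches cap keeps the loop bounded if content keeps changing while it runs.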