DirectoryConnector
Ingest text-based files from a directory by extension.
Usage
DirectoryConnector()
Scans a directory recursively and yields files matching the given extensions as Document objects. Binary files (e.g. PDF) are skipped with a note in the metadata; full PDF extraction is left to enrichment plugins.
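The scanning behavior described above can be approximated in plain Python. This is a minimal sketch, not talk_box's implementation. SketchDocument, scan_directory, and BINARY_EXTENSIONS are hypothetical names. Two details are assumptions: binary files are recognized by extension, and "skipped with a note in the metadata" is read as yielding an empty document whose metadata carries the note.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, List, Optional


# Hypothetical stand-in for talk_box's Document; the real field set may differ.
@dataclass
class SketchDocument:
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


# Assumption: binary formats are recognized by extension only.
BINARY_EXTENSIONS = {".pdf"}


def scan_directory(path, extensions: Optional[List[str]] = None) -> Iterator[SketchDocument]:
    """Recursively yield files under `path` whose suffix is in `extensions`."""
    exts = extensions if extensions is not None else [".txt", ".md"]  # documented default
    root = Path(path).expanduser()  # assumption: "~" is expanded
    for file in sorted(root.rglob("*")):  # rglob("*") walks every subdirectory
        if not file.is_file() or file.suffix not in exts:
            continue
        meta = {"path": str(file)}
        if file.suffix in BINARY_EXTENSIONS:
            # Do not extract binary content; leave a note for enrichment plugins.
            meta["note"] = "binary file; content not extracted"
            yield SketchDocument(file.name, "", meta)
            continue
        yield SketchDocument(file.name, file.read_text(encoding="utf-8"), meta)
```

For example, scan_directory("notes", [".md"]) would yield only Markdown files, including those in nested subdirectories.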
Parameters
path: str | Path
Path to the directory to scan.
extensions: list[str] | None = None
File extensions to include (e.g. [".txt", ".md", ".py"]). Defaults to [".txt", ".md"].
name: str = "directory"
Connector name (defaults to "directory").
Examples
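```python
import talk_box as tb

# Recursively collect text, Markdown, and Python files under the work folder
connector = tb.DirectoryConnector(
    "~/Documents/work/",
    extensions=[".txt", ".md", ".py"],
)

# Each yielded Document exposes its title and text content
for doc in connector.scan():
    print(doc.title, len(doc.content))
```
Methods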
| Name | Description |
|---|---|
| scan() | Yield documents from the directory tree. |
scan()
Yield documents from the directory tree.
Usage
scan()
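Because scan() yields documents instead of returning a list, a caller can stop early. Assuming it is a true generator, as "Yield" suggests, the rest of the tree is then never read. The sketch below shows that behavior with itertools.islice. fake_scan is a hypothetical stand-in for connector.scan(), so the snippet runs without talk_box, and it yields plain title strings rather than Document objects.

```python
from itertools import islice
from typing import Iterator

produced = []  # records which items the generator actually produced


def fake_scan() -> Iterator[str]:
    """Hypothetical stand-in for connector.scan(): yields titles one at a time."""
    for title in ["a.txt", "b.md", "c.md", "d.txt"]:
        produced.append(title)
        yield title


# Take only the first two items; the generator is never asked for a third
first_two = list(islice(fake_scan(), 2))
```

With the real connector, the same pattern would be list(islice(connector.scan(), 2)).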