Uploading a document
Navigate to Brain → Sources and use the Upload Data interface (or simply drop a file into the Corgtex Slack bot, if configured). Supported formats include:
- DOCX
- TXT
- MD / MDX
- CSV
The ingestion and chunking process
When you upload a file, Corgtex doesn’t just store the blob. The worker processes the file through our background pipeline:
- Extraction: Specialized parsers (like pdf-parse or mammoth) extract the raw text content.
- Chunking & Storage: The text is sliced into manageable semantic chunks (KnowledgeChunk), embedded, and saved to the vector database.
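To make the chunking step concrete, here is a minimal sketch in TypeScript. It is not the actual Corgtex worker code: the KnowledgeChunk fields, the chunkText helper, and the maxChars limit are all assumptions for illustration, and it splits on paragraph boundaries as a simple stand-in for semantic chunking. Extraction and embedding are left out.

```typescript
// Hypothetical shape; the real KnowledgeChunk model may carry other fields.
interface KnowledgeChunk {
  index: number;
  text: string;
}

// Greedily pack paragraphs into chunks of at most maxChars characters.
// A paragraph longer than maxChars is hard-split into fixed-size parts.
function chunkText(text: string, maxChars = 1000): KnowledgeChunk[] {
  if (maxChars <= 0) {
    throw new Error("maxChars must be positive");
  }

  // Paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const pieces: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    for (let i = 0; i < para.length; i += maxChars) {
      const part = para.slice(i, i + maxChars);
      const candidate = current ? `${current}\n\n${part}` : part;
      if (candidate.length <= maxChars) {
        // Still fits: keep growing the current chunk.
        current = candidate;
      } else {
        // Would overflow: close the current chunk and start a new one.
        if (current) pieces.push(current);
        current = part;
      }
    }
  }
  if (current) pieces.push(current);

  return pieces.map((text, index) => ({ index, text }));
}
```

In a pipeline like the one described, each returned chunk would then be embedded and written to the vector database. Paragraph packing keeps related sentences together, which is usually a reasonable default when no better boundary information is available.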
PII Classification
Before any text is embedded, it runs through the Zero-LLM PII Regex Classifier. This is a crucial safety step that happens entirely locally (without sending your data to external APIs):
- The system scans the chunks using deterministic regex patterns.
- If it detects Personally Identifiable Information (PII) or confidential markers, it tags the chunk accordingly.
- Tagged chunks are strictly isolated and are only returned to workflows explicitly authorized to handle them. They will never be exposed in standard conversational agent responses.
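The classification and isolation steps above can be sketched roughly as follows. This is an illustration only: the patterns, the tag names, and the classifyChunk and visibleChunks helpers are assumptions, not the classifier's real rule set or API. It shows the general idea of a local, deterministic regex pass followed by a filter on tagged chunks.

```typescript
// Hypothetical tag names for illustration.
type PiiTag = "email" | "us_ssn" | "confidential_marker";

// Illustrative patterns only; a production rule set would be far broader.
// No global flag is used, so RegExp.test() stays stateless between calls.
const PATTERNS: Array<[PiiTag, RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/],
  ["us_ssn", /\b\d{3}-\d{2}-\d{4}\b/],
  ["confidential_marker", /\b(confidential|internal only)\b/i],
];

// Return every tag whose pattern matches the chunk text.
function classifyChunk(text: string): PiiTag[] {
  return PATTERNS.filter(([, re]) => re.test(text)).map(([tag]) => tag);
}

interface TaggedChunk {
  text: string;
  tags: PiiTag[];
}

// Tagged chunks are returned only to callers authorized to handle them.
function visibleChunks(
  chunks: TaggedChunk[],
  authorizedForPii: boolean
): TaggedChunk[] {
  return chunks.filter((c) => authorizedForPii || c.tags.length === 0);
}

// Usage: tag two chunks, then view them as an unauthorized caller.
const tagged: TaggedChunk[] = [
  "Quarterly roadmap notes",
  "Contact jane@example.com for access",
].map((text) => ({ text, tags: classifyChunk(text) }));

console.log(visibleChunks(tagged, false).map((c) => c.text));
```

Because the patterns are plain regular expressions, the same input always produces the same tags and nothing leaves the machine during classification.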