Format Handlers Overview
Format handlers live in the src/formats/ module. Each handler is responsible for:
- Parsing the source document structure (cells, paragraphs, PDF content streams, etc.).
- Extracting text units to translate.
- Delegating translation to TranslationService.
- Reconstructing the output document with translated text, preserving all non-text structure.
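
The four responsibilities above could be sketched as a trait with one method per step. This is only an illustration: the `FormatHandler` and `Translate` traits, their method names, and the toy `LineHandler` and `Upper` types are assumptions made for this example and are not taken from the project's code.

```rust
// Hypothetical sketch: every name below is an illustrative assumption,
// not the project's actual API.

/// Stand-in for the project's TranslationService.
pub trait Translate {
    fn translate(&self, text: &str) -> String;
}

/// The steps a format handler performs, expressed as a trait.
pub trait FormatHandler {
    /// Parsed representation of the source document.
    type Document;

    /// Parse the raw input into a structured document.
    fn parse(&self, input: &str) -> Self::Document;

    /// Collect the text units to translate, in document order.
    fn extract_units(&self, doc: &Self::Document) -> Vec<String>;

    /// Rebuild the output from the parsed document and the translated units.
    fn reconstruct(&self, doc: &Self::Document, translated: &[String]) -> String;

    /// Provided method that chains parse, extract, translate, and reconstruct.
    fn translate_document(&self, input: &str, service: &dyn Translate) -> String {
        let doc = self.parse(input);
        let units = self.extract_units(&doc);
        let translated: Vec<String> = units.iter().map(|u| service.translate(u)).collect();
        self.reconstruct(&doc, &translated)
    }
}

/// Toy handler in which every line is one text unit.
pub struct LineHandler;

impl FormatHandler for LineHandler {
    type Document = Vec<String>;

    fn parse(&self, input: &str) -> Vec<String> {
        input.lines().map(str::to_owned).collect()
    }

    fn extract_units(&self, doc: &Vec<String>) -> Vec<String> {
        doc.clone()
    }

    fn reconstruct(&self, _doc: &Vec<String>, translated: &[String]) -> String {
        translated.join("\n")
    }
}

/// Stub translator that upper-cases its input.
pub struct Upper;

impl Translate for Upper {
    fn translate(&self, text: &str) -> String {
        text.to_uppercase()
    }
}
```

With these toy types, `LineHandler.translate_document("hello\nworld", &Upper)` runs all four steps in order. A real handler would keep far more structure in its `Document` type (styles, cell positions, content streams) so that `reconstruct` can restore it.
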
The central router is formats::translate_file, which reads the file extension from RuntimeConfig and dispatches to the correct handler. An unrecognised extension returns AppError::UnsupportedFormat.
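
A minimal sketch of extension-based dispatch follows. The `input_path` field, the `Handler` enum, the `String` payload on `UnsupportedFormat`, and the case-insensitive matching are all assumptions made for illustration; the real `formats::translate_file`, `RuntimeConfig`, and `AppError` may be shaped differently.

```rust
use std::path::Path;

// Hypothetical stand-ins for the project's own types.
#[derive(Debug, PartialEq)]
pub enum AppError {
    /// Carries the offending extension (assumed payload).
    UnsupportedFormat(String),
}

#[derive(Debug, PartialEq)]
pub enum Handler {
    CsvTsv,
    Docx,
    Pdf,
}

pub struct RuntimeConfig {
    /// Assumed field: path of the file to translate.
    pub input_path: String,
}

/// Pick a handler from the file extension, or report an unsupported format.
pub fn select_handler(config: &RuntimeConfig) -> Result<Handler, AppError> {
    // Lower-case the extension so "report.DOCX" and "report.docx" match alike.
    let ext = Path::new(&config.input_path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();

    match ext.as_str() {
        "csv" | "tsv" => Ok(Handler::CsvTsv),
        "docx" => Ok(Handler::Docx),
        "pdf" => Ok(Handler::Pdf),
        other => Err(AppError::UnsupportedFormat(other.to_string())),
    }
}
```

Returning an enum variant instead of calling the handler directly keeps the sketch short; an actual router would go on to invoke the selected handler module.
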
Handlers at a Glance
| Format | Handler module | Key challenge |
|---|---|---|
| CSV / TSV | formats::csv_tsv | Preserve column structure across all rows |
| DOCX | formats::docx | Translate paragraph runs while keeping styles, numbering, and tables intact |
| PDF | formats::pdf | Parse content streams, rasterise background, render translated text at correct positions |