DOCX Translation

The DOCX handler (src/formats/docx.rs) translates Microsoft Word documents at the paragraph level while leaving all formatting, styles, numbering, images, and tables structurally intact.

How It Works

A .docx file is a ZIP archive containing XML files. The handler:

  1. Unpacks the archive into memory.
  2. Parses word/document.xml to locate paragraph (<w:p>) elements.
  3. Extracts the concatenated text of each paragraph’s runs (<w:r><w:t>).
  4. Translates each paragraph’s text via TranslationService::translate_text.
  5. Reconstructs the paragraph by replacing the text content of its runs with the translated text while keeping all run properties (<w:rPr>, bold, italic, font, etc.) unchanged.
  6. Repacks the modified XML back into a ZIP archive and writes it to the output path.
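
Steps 2 and 3 can be sketched with nothing but the standard library. This is an illustration, not the code in src/formats/docx.rs: the function names (next_open, inner_texts, paragraph_texts) are hypothetical, and it scans the XML as a plain string, so it assumes well-formed input, paragraphs that are not nested (as they are inside text boxes), and text that needs no entity decoding. A real handler would more likely use a proper XML parser.

```rust
/// Finds the next opening tag `<name ...>` in `s`.
/// Returns (tag start, index just past '>', whether the tag is self-closing).
fn next_open(s: &str, name: &str) -> Option<(usize, usize, bool)> {
    let needle = format!("<{name}");
    let mut from = 0;
    while let Some(pos) = s[from..].find(&needle) {
        let start = from + pos;
        let after_name = start + needle.len();
        // Reject longer names such as <w:pPr> or <w:tab/> when looking for <w:p> or <w:t>.
        if matches!(s[after_name..].chars().next(), Some('>' | ' ' | '/')) {
            let close = after_name + s[after_name..].find('>')?;
            let self_closing = s[..close].ends_with('/');
            return Some((start, close + 1, self_closing));
        }
        from = after_name;
    }
    None
}

/// Collects the raw inner content of every `<name>` element, in document order.
fn inner_texts<'a>(mut s: &'a str, name: &str) -> Vec<&'a str> {
    let close_tag = format!("</{name}>");
    let mut out = Vec::new();
    while let Some((_, content, self_closing)) = next_open(s, name) {
        if self_closing {
            // An element like <w:p/> has no content at all.
            out.push("");
            s = &s[content..];
            continue;
        }
        let Some(len) = s[content..].find(&close_tag) else { break };
        out.push(&s[content..content + len]);
        s = &s[content + len + close_tag.len()..];
    }
    out
}

/// Returns one string per <w:p>, built by concatenating its <w:t> contents.
fn paragraph_texts(document_xml: &str) -> Vec<String> {
    inner_texts(document_xml, "w:p")
        .into_iter()
        .map(|para| inner_texts(para, "w:t").concat())
        .collect()
}
```

Given a body with two paragraphs, the first split across a bold run and a plain run, paragraph_texts returns one string per paragraph with the run texts joined. Those strings are the units that would be handed to the translation step.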

Formatting Preservation

Run-level properties (font, size, bold, colour) are preserved because the handler only modifies <w:t> text nodes, not the surrounding <w:rPr> elements. Paragraph-level properties (<w:pPr>) such as alignment, spacing, and list numbering are also untouched.
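
A minimal sketch of why touching only <w:t> content leaves formatting alone. It assumes one possible strategy, which may differ from what the handler actually does: the translated text goes into the first <w:t> of the paragraph and every later <w:t> is emptied, while all other bytes, including <w:pPr> and <w:rPr>, are copied through unchanged. The function names are hypothetical.

```rust
/// Escapes the characters that are not allowed verbatim in XML text content.
fn escape_xml(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Finds the next `<w:t ...>` opening tag, skipping tags such as <w:tab/> or <w:tbl>.
fn find_text_open(s: &str) -> Option<usize> {
    let mut from = 0;
    while let Some(pos) = s[from..].find("<w:t") {
        let after = from + pos + "<w:t".len();
        if matches!(s[after..].chars().next(), Some('>' | ' ' | '/')) {
            return Some(from + pos);
        }
        from = after;
    }
    None
}

/// Rewrites only the content of <w:t> elements in one paragraph's XML.
/// Assumed strategy: the first <w:t> receives the whole translation,
/// later <w:t> elements are left empty, everything else is copied as-is.
fn replace_paragraph_text(para_xml: &str, translated: &str) -> String {
    let mut out = String::with_capacity(para_xml.len() + translated.len());
    let mut rest = para_xml;
    let mut pending = Some(escape_xml(translated));
    while let Some(open) = find_text_open(rest) {
        let Some(gt) = rest[open..].find('>') else { break };
        let content_start = open + gt + 1;
        if rest[..content_start].ends_with("/>") {
            // Self-closing <w:t/>: nothing to replace, copy it through.
            out.push_str(&rest[..content_start]);
            rest = &rest[content_start..];
            continue;
        }
        let Some(len) = rest[content_start..].find("</w:t>") else { break };
        // Copy everything up to and including the opening tag unchanged.
        out.push_str(&rest[..content_start]);
        if let Some(text) = pending.take() {
            out.push_str(&text);
        }
        // Skip the old text; the closing tag is copied on the next pass.
        rest = &rest[content_start + len..];
    }
    out.push_str(rest);
    out
}
```

Because the properties are never parsed or rewritten, they cannot be altered by accident. The cost of this strategy is that the text now sits in the first run only, so it takes on that run's formatting.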

Limitations

  • If a paragraph’s runs have different formatting (e.g. a word in bold mid-sentence), the handler may collapse them into a single run to simplify reconstruction. Formatting that applied to only part of the paragraph may then be lost or end up on different text, so mixed-format paragraphs can look different after translation.
  • Text inside text boxes, headers, and footers may not be translated in the current implementation. Check the source for which XML parts are currently processed.
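
When checking which parts a handler covers, it helps to know where each kind of text lives in the archive. The sketch below classifies ZIP entry names using Word's conventional part names (word/document.xml, word/header1.xml, word/footer1.xml). The enum and function are hypothetical and not taken from the handler. The conventional names are also only a default: the authoritative mapping is the relationships part, word/_rels/document.xml.rels. Text boxes are not separate parts; their content is nested inside the main document XML.

```rust
/// Kinds of archive entries relevant to translation coverage (illustrative only).
#[derive(Debug, PartialEq)]
enum PartKind {
    MainDocument,
    Header,
    Footer,
    Other,
}

/// Classifies a ZIP entry name by Word's conventional part naming.
fn classify_part(name: &str) -> PartKind {
    let Some(file) = name.strip_prefix("word/") else { return PartKind::Other };
    let Some(stem) = file.strip_suffix(".xml") else { return PartKind::Other };
    // Matches names such as "header1" or "footer12": the prefix plus digits only.
    let numbered = |prefix: &str| {
        stem.strip_prefix(prefix)
            .map_or(false, |n| !n.is_empty() && n.bytes().all(|b| b.is_ascii_digit()))
    };
    if stem == "document" {
        PartKind::MainDocument
    } else if numbered("header") {
        PartKind::Header
    } else if numbered("footer") {
        PartKind::Footer
    } else {
        PartKind::Other
    }
}
```

A handler that only opens the MainDocument part will leave Header and Footer parts in the source language, which matches the limitation described above.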