
Translation Service Layer

The Translation Service Layer sits between the document format handlers and the external TMT API. Its job is to take large, unstructured text extracted from a document and produce translated text efficiently — without hammering the API or doing redundant work.

Responsibilities

  • Text segmentation: split raw strings into sentences using linguistic delimiters (., !, ?, and the Devanagari danda ।).
  • Concurrency management: an async Semaphore limits the number of simultaneous API requests.
  • Deduplication & caching: identical sentences within or across documents are translated only once.
  • In-flight merging: if two tasks request the same sentence at the same time, only one network call is made; the second waits on a tokio::sync::oneshot channel for the result.
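The segmentation step can be sketched in plain Rust as shown here. This is a minimal illustration, not the service's actual split_sentences(). It assumes that each delimiter stays attached to its sentence and that a fragment shorter than 5 characters (the minimum fragment length listed under Service Constraints) is appended, with its original spacing, to the sentence before it. The constants DELIMITERS and MIN_FRAGMENT_CHARS are hypothetical names.

```rust
/// Sentence delimiters: '.', '!', '?' and the Devanagari danda '।' (U+0964).
const DELIMITERS: [char; 4] = ['.', '!', '?', '\u{0964}'];
/// Hypothetical name: fragments shorter than this many characters are merged backwards.
const MIN_FRAGMENT_CHARS: usize = 5;

/// Split `text` into trimmed sentences, keeping each delimiter on its sentence.
fn split_sentences(text: &str) -> Vec<String> {
    // Pass 1: cut right after every delimiter character.
    let mut raw: Vec<String> = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if DELIMITERS.contains(&ch) {
            raw.push(std::mem::take(&mut current));
        }
    }
    raw.push(current); // trailing text without a delimiter (may be empty)

    // Pass 2: drop empty pieces; append short fragments to the previous sentence.
    let mut sentences: Vec<String> = Vec::new();
    for piece in &raw {
        let trimmed = piece.trim();
        if trimmed.is_empty() {
            continue;
        }
        if trimmed.chars().count() < MIN_FRAGMENT_CHARS {
            if let Some(prev) = sentences.last_mut() {
                // Keep the fragment's original leading spacing ("you? Ok.", "Wait...").
                prev.push_str(piece.trim_end());
                continue;
            }
        }
        sentences.push(trimmed.to_string());
    }
    sentences
}
```

Counting with chars() rather than bytes matters here: each Devanagari code point takes three bytes in UTF-8, so a byte-based threshold would treat short Hindi fragments as long ones.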

Service Constraints

| Constraint              | Value / Mechanism                                   |
|-------------------------|-----------------------------------------------------|
| Max request size        | MAX_REQUEST_TEXT_BYTES (5,000 chars)                |
| Concurrency             | tokio::sync::Semaphore (capacity = --concurrency)   |
| Sentence delimiters     | . ! ? ।                                             |
| Minimum fragment length | 5 characters (shorter fragments are merged)         |
| Deduplication           | tokio::sync::oneshot channels in a shared HashMap   |
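To illustrate how a semaphore caps the number of in-flight requests, the following sketch uses only the standard library. A small counting Semaphore built from Mutex and Condvar stands in for tokio::sync::Semaphore, and OS threads stand in for async tasks. The names Semaphore and max_in_flight, and the 10 ms sleep that simulates network latency, are hypothetical. The function reports the peak number of simulated calls running at once, which should never exceed the capacity.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore (std-only stand-in for tokio::sync::Semaphore).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(capacity: usize) -> Self {
        Semaphore { permits: Mutex::new(capacity), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Return a permit and wake one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Simulate `n_requests` API calls with at most `concurrency` running at once.
/// Returns the highest number of calls observed in flight simultaneously.
fn max_in_flight(n_requests: usize, concurrency: usize) -> usize {
    let sem = Arc::new(Semaphore::new(concurrency));
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..n_requests)
        .map(|_| {
            let (sem, active, peak) = (Arc::clone(&sem), Arc::clone(&active), Arc::clone(&peak));
            thread::spawn(move || {
                sem.acquire();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(10)); // simulated network latency
                active.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

In the async service, tokio's Semaphore::acquire returns a SemaphorePermit that hands the permit back when it is dropped, so the explicit release() call here corresponds to that permit going out of scope.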

Pipeline Flow

Document Handler
      │
      │  raw text block
      ▼
TranslationService::translate_text()
      │
      ├─ split_sentences()          # split on . ! ? ।
      │
      ├─ for each sentence:
      │     ├─ cache hit?  ──▶  return cached result
      │     ├─ in-flight?  ──▶  wait on oneshot channel
      │     └─ new request ──▶  acquire Semaphore permit
      │                              │
      │                        TmtClient::translate()
      │                              │
      │                         (rate limiting / retry)
      │                              │
      │                         translated sentence
      │
      └─ rejoin sentences ──▶  translated text block
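The cache-hit / in-flight / new-request branch in the diagram can be sketched as a single map from sentence to state. This std-only version is an illustration under several assumptions, not the service's code: threads replace tokio tasks, std::sync::mpsc channels play the role of the tokio::sync::oneshot channels, and fake_translate (which only uppercases text) is a hypothetical stand-in for TmtClient::translate(). The Dedup type, the Entry enum, and translate_sentences are likewise hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Per-sentence state in the shared map.
enum Entry {
    Done(String),                  // cached translation
    InFlight(Vec<Sender<String>>), // callers waiting for the leader's result
}

struct Dedup {
    map: Mutex<HashMap<String, Entry>>,
    network_calls: AtomicUsize,
}

impl Dedup {
    fn new() -> Self {
        Dedup { map: Mutex::new(HashMap::new()), network_calls: AtomicUsize::new(0) }
    }

    /// Hypothetical stand-in for TmtClient::translate(): counts the call and uppercases.
    fn fake_translate(&self, sentence: &str) -> String {
        self.network_calls.fetch_add(1, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(20)); // simulated latency
        sentence.to_uppercase()
    }

    fn translate(&self, sentence: &str) -> String {
        let waiter = {
            let mut map = self.map.lock().unwrap();
            match map.get_mut(sentence) {
                // Cache hit: return the stored translation.
                Some(Entry::Done(hit)) => return hit.clone(),
                // In flight: register a channel and wait for the leader.
                Some(Entry::InFlight(waiters)) => {
                    let (tx, rx) = mpsc::channel();
                    waiters.push(tx);
                    Some(rx)
                }
                // New request: mark it in flight; this caller becomes the leader.
                None => {
                    map.insert(sentence.to_string(), Entry::InFlight(Vec::new()));
                    None
                }
            }
        }; // lock released here, before any waiting or network work

        if let Some(rx) = waiter {
            return rx.recv().expect("leader dropped without sending");
        }

        let result = self.fake_translate(sentence);
        let previous = self
            .map
            .lock()
            .unwrap()
            .insert(sentence.to_string(), Entry::Done(result.clone()));
        if let Some(Entry::InFlight(waiters)) = previous {
            for tx in waiters {
                let _ = tx.send(result.clone()); // ignore waiters that went away
            }
        }
        result
    }

    /// Translate pre-split sentences in order and rejoin them with single spaces.
    fn translate_sentences(&self, sentences: &[&str]) -> String {
        sentences.iter().map(|s| self.translate(s)).collect::<Vec<_>>().join(" ")
    }
}
```

The lock is held only while the map is inspected or updated, never across the simulated network call, so other sentences can proceed while one is being translated. One caveat of this sketch: if the leading call panicked, its waiters would block forever, so a production version would need to remove the InFlight entry on failure.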

Sub-pages