Translation Service Layer
The Translation Service Layer sits between the document format handlers and the external TMT API. Its job is to take large, unstructured text extracted from a document and produce translated text efficiently — without hammering the API or doing redundant work.
Responsibilities
- Text segmentation: split raw strings into sentences using linguistic delimiters (`.`, `!`, `?`, and the Devanagari danda `।`).
- Concurrency management: an async `Semaphore` limits the number of simultaneous API requests.
- Deduplication & caching: identical sentences within or across documents are translated only once.
- In-flight merging: if two tasks request the same sentence at the same time, only one network call is made; the second waits on a `tokio::sync::oneshot` channel for the result.
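
The segmentation step can be sketched as follows. This is a minimal illustration, not the service's actual `split_sentences()`: the signature, the `DELIMITERS` constant, and the choice to keep each delimiter attached to its sentence are all assumptions. It splits on every delimiter occurrence, so abbreviations and decimals such as `3.14` would be split as well.

```rust
/// Sentence delimiters named in the list above (assumed constant name).
const DELIMITERS: [char; 4] = ['.', '!', '?', '।'];

/// Splits `text` into trimmed sentences, keeping each delimiter attached to
/// the sentence it ends. Trailing text without a delimiter becomes the last
/// sentence; whitespace-only pieces are dropped.
pub fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();

    for ch in text.chars() {
        current.push(ch);
        if DELIMITERS.contains(&ch) {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }

    // Whatever follows the final delimiter is still a sentence.
    let tail = current.trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}
```

For example, `split_sentences("Hello world. How are you? नमस्ते। Fine")` yields `["Hello world.", "How are you?", "नमस्ते।", "Fine"]`. Iterating over `char`s rather than bytes keeps the multi-byte danda intact.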
Service Constraints
| Constraint | Value / Mechanism |
|---|---|
| Max request size | MAX_REQUEST_TEXT_BYTES (5,000 chars) |
| Concurrency | tokio::sync::Semaphore (capacity = --concurrency) |
| Sentence delimiters | . ! ? । |
| Minimum fragment length | 5 characters (shorter fragments are merged) |
| Deduplication | tokio::sync::oneshot channels in a shared HashMap |
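
The minimum fragment length rule could look roughly like the sketch below. The table does not say which neighbour a short fragment is merged into or how the pieces are joined, so this version assumes "merge into the previous fragment, or into the next one when there is no previous fragment" and joins with a single space. The names `MIN_FRAGMENT_CHARS` and `merge_short_fragments` are hypothetical.

```rust
/// Minimum fragment length from the table, counted in characters (assumed name).
const MIN_FRAGMENT_CHARS: usize = 5;

/// Merges fragments shorter than `MIN_FRAGMENT_CHARS` into a neighbour.
/// Assumption: a short fragment joins the previous fragment; short fragments
/// at the very start are held back and prepended to the first long fragment.
pub fn merge_short_fragments(fragments: Vec<String>) -> Vec<String> {
    let mut merged: Vec<String> = Vec::new();
    // Short leading fragments that have no predecessor yet.
    let mut pending = String::new();

    for fragment in fragments {
        // Count characters, not bytes, so Devanagari text is measured fairly.
        let is_short = fragment.chars().count() < MIN_FRAGMENT_CHARS;
        if is_short {
            if let Some(previous) = merged.last_mut() {
                previous.push(' ');
                previous.push_str(&fragment);
            } else {
                if !pending.is_empty() {
                    pending.push(' ');
                }
                pending.push_str(&fragment);
            }
        } else if pending.is_empty() {
            merged.push(fragment);
        } else {
            pending.push(' ');
            pending.push_str(&fragment);
            merged.push(std::mem::take(&mut pending));
        }
    }

    // The input contained only short fragments: emit them as one piece.
    if !pending.is_empty() {
        merged.push(pending);
    }
    merged
}
```

With the input `["Hi.", "This is a longer sentence.", "Ok!", "Another full sentence."]` the result is `["Hi. This is a longer sentence. Ok!", "Another full sentence."]`, which also reduces the number of API requests from four to two.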
Pipeline Flow
    Document Handler
        │
        │  raw text block
        ▼
    TranslationService::translate_text()
        │
        ├─ split_sentences()              # split on . ! ? ।
        │
        ├─ for each sentence:
        │     ├─ cache hit?   ──▶ return cached result
        │     ├─ in-flight?   ──▶ wait on oneshot channel
        │     └─ new request  ──▶ acquire Semaphore permit
        │                              │
        │                         TmtClient::translate()
        │                              │
        │                         (rate limiting / retry)
        │                              │
        │                         translated sentence
        │
        └─ rejoin sentences ──▶ translated text block
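
The three-way branch in the flow (cache hit, in-flight, new request) can be illustrated with a simplified, blocking sketch. It deliberately swaps the async pieces for standard-library ones so it runs without dependencies: `std::sync::mpsc` channels stand in for `tokio::sync::oneshot`, threads stand in for tasks, and the semaphore permit is omitted. `Dedup`, `Entry`, and the `backend` closure are hypothetical names, not the service's real types.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;

/// State of one sentence in the shared map.
enum Entry {
    /// Translation finished: later callers get a cache hit.
    Done(String),
    /// A request is in flight: each waiting caller parks a sender here.
    Pending(Vec<Sender<String>>),
}

/// Hypothetical deduplicating front end for a translation backend.
pub struct Dedup {
    entries: Mutex<HashMap<String, Entry>>,
}

impl Dedup {
    pub fn new() -> Self {
        Dedup { entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the translation of `sentence`. `backend` runs at most once per
    /// distinct sentence, however many callers ask for it concurrently.
    pub fn translate<F>(&self, sentence: &str, backend: F) -> String
    where
        F: FnOnce(&str) -> String,
    {
        let mut map = self.entries.lock().unwrap();
        let waiter = match map.get_mut(sentence) {
            // Cache hit: answer immediately.
            Some(Entry::Done(text)) => return text.clone(),
            // In-flight: register a channel and wait for the leader.
            Some(Entry::Pending(waiters)) => {
                let (tx, rx) = channel();
                waiters.push(tx);
                Some(rx)
            }
            // New request: this caller becomes the leader.
            None => {
                map.insert(sentence.to_string(), Entry::Pending(Vec::new()));
                None
            }
        };
        drop(map); // never hold the lock while waiting or calling the backend

        if let Some(rx) = waiter {
            return rx.recv().expect("leader finished without sending a result");
        }

        // Leader path: the real service would acquire a semaphore permit here.
        let translated = backend(sentence);
        let mut map = self.entries.lock().unwrap();
        let previous = map.insert(sentence.to_string(), Entry::Done(translated.clone()));
        if let Some(Entry::Pending(waiters)) = previous {
            for tx in waiters {
                // A waiter that gave up has dropped its receiver; ignore it.
                let _ = tx.send(translated.clone());
            }
        }
        translated
    }
}
```

Sharing one `Dedup` behind an `Arc` across several threads that all request the same sentence results in a single `backend` call; every other caller either waits on its channel or hits the cache. This sketch does not handle a failing or panicking leader, which a production version would need to cover so that waiters are not left without a result.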