TranslationService & Sentence Splitting

TranslationService

TranslationService (defined in src/translate/service.rs) is the single public interface that format handlers call. It holds:

  • An Arc<TmtClient> for HTTP communication.
  • A Semaphore to cap concurrent requests.
  • A Mutex<HashMap<CacheKey, CacheEntry>> for deduplication and caching.
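The service's Semaphore is presumably an async one (for example tokio::sync::Semaphore). The sketch below is a blocking, std-only stand-in built from Mutex and Condvar. It shows how a fixed number of permits caps concurrent work. CountingSemaphore and Permit are hypothetical names and are not part of the codebase.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical std-only counting semaphore: at most `permits` holders at once.
pub struct CountingSemaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

/// RAII guard; dropping it returns the permit.
pub struct Permit<'a> {
    sem: &'a CountingSemaphore,
}

impl CountingSemaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut available = self.permits.lock().unwrap();
        while *available == 0 {
            available = self.cv.wait(available).unwrap();
        }
        *available -= 1;
        Permit { sem: self }
    }
}

impl Drop for Permit<'_> {
    // Give the permit back and wake one waiting thread.
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.cv.notify_one();
    }
}
```

Because the permit is released in Drop, a request that fails or returns early still frees its slot. An async semaphore follows the same acquire/release pattern without blocking the thread.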

translate_text

The primary method signature (conceptually):

pub async fn translate_text(
    &self,
    text: &str,
    src_lang: &str,
    tgt_lang: &str,
) -> Result<String, AppError>

Internally it:

  1. Calls split_sentences(text) to obtain a list of sentence fragments.
  2. Spawns a concurrent task per fragment (bounded by the Semaphore).
  3. Each task checks the cache / in-flight map before making a network call.
  4. Joins all translated fragments back into a single string.
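The four steps can be sketched synchronously with only the standard library. This is an illustration under stated assumptions, not the service's code. fake_remote_translate stands in for the TmtClient call. A fixed pool of scoped worker threads replaces the Semaphore-bounded async tasks. The cache is a plain HashMap<String, String> keyed on the fragment alone. In-flight deduplication and short-fragment merging are omitted, and fragments are joined with a single space.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the HTTP call made through TmtClient.
fn fake_remote_translate(fragment: &str) -> String {
    fragment.to_uppercase()
}

/// Simplified splitter (no merging): ends a fragment after '.', '!', '?' or '।'.
fn split_sentences(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?' | '।') {
            let t = current.trim();
            if !t.is_empty() {
                out.push(t.to_string());
            }
            current.clear();
        }
    }
    let rest = current.trim();
    if !rest.is_empty() {
        out.push(rest.to_string());
    }
    out
}

/// Blocking sketch: split, translate with at most `max_concurrent` workers,
/// consult the cache first, then join the results in input order.
fn translate_text(text: &str, max_concurrent: usize, cache: &Mutex<HashMap<String, String>>) -> String {
    let fragments = split_sentences(text);
    // One slot per fragment keeps the output in the original order.
    let results: Vec<Mutex<Option<String>>> = fragments.iter().map(|_| Mutex::new(None)).collect();
    let next = AtomicUsize::new(0);
    // A fixed worker pool caps concurrency (stand-in for the Semaphore).
    let workers = max_concurrent.max(1).min(fragments.len());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::SeqCst);
                if i >= fragments.len() {
                    break;
                }
                let fragment = &fragments[i];
                // Check the cache before "calling the network".
                let cached = cache.lock().unwrap().get(fragment).cloned();
                let translated = cached.unwrap_or_else(|| {
                    let t = fake_remote_translate(fragment);
                    cache.lock().unwrap().insert(fragment.clone(), t.clone());
                    t
                });
                *results[i].lock().unwrap() = Some(translated);
            });
        }
    });
    // Join the translated fragments (the space separator is an assumption).
    results
        .into_iter()
        .map(|slot| slot.into_inner().unwrap().unwrap())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Each worker writes into the slot for its fragment's position, so the joined output keeps the input order even when fragments finish out of order. A second call with an already-seen sentence is served from the cache without invoking fake_remote_translate.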

Cache Key

Each entry in the internal cache is keyed on a CacheKey:

struct CacheKey {
    text: String,
    src_lang: String,
    tgt_lang: String,
}

This means the same sentence translated between different language pairs is cached independently.
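A minimal sketch of why the language pair belongs in the key. The derives are an assumption, since any HashMap key must implement Eq and Hash. The cache stores plain Strings instead of CacheEntry, and cached_or_insert is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Mirrors the CacheKey fields above; the derives are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CacheKey {
    text: String,
    src_lang: String,
    tgt_lang: String,
}

impl CacheKey {
    fn new(text: &str, src_lang: &str, tgt_lang: &str) -> Self {
        Self {
            text: text.to_string(),
            src_lang: src_lang.to_string(),
            tgt_lang: tgt_lang.to_string(),
        }
    }
}

/// Returns the cached value for `key`, calling `translate` only on a miss.
fn cached_or_insert(
    cache: &mut HashMap<CacheKey, String>,
    key: CacheKey,
    translate: impl FnOnce() -> String,
) -> String {
    cache.entry(key).or_insert_with(translate).clone()
}
```

For example, caching "Hello." under ("en", "ne") and again under ("en", "fr") produces two separate entries. A repeated lookup under ("en", "ne") returns the stored value without running the closure. The language codes here are illustrative only.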

split_sentences

Defined in src/translate/sentence.rs, split_sentences divides a text block on the following delimiters:

Delimiter   Meaning
.           Latin full stop
!           Exclamation mark
?           Question mark
।           Devanagari danda (U+0964), used in Nepali and Tamang
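The splitting rule can be sketched as a single pass over the characters. This is an assumption about the behaviour, not the contents of src/translate/sentence.rs. Each delimiter stays attached to the fragment it ends, surrounding whitespace is trimmed, and trailing text without a delimiter becomes its own fragment. Short-fragment merging is left out, and split_on_delimiters is a hypothetical name.

```rust
/// The four delimiters from the table.
const DELIMITERS: [char; 4] = ['.', '!', '?', '।'];

/// Splits after each delimiter, keeping it attached to its fragment.
fn split_on_delimiters(text: &str) -> Vec<String> {
    let mut fragments = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if DELIMITERS.contains(&ch) {
            push_trimmed(&mut fragments, &current);
            current.clear();
        }
    }
    // Trailing text without a final delimiter still forms a fragment.
    push_trimmed(&mut fragments, &current);
    fragments
}

/// Pushes `s` with surrounding whitespace removed, skipping empty results.
fn push_trimmed(out: &mut Vec<String>, s: &str) {
    let t = s.trim();
    if !t.is_empty() {
        out.push(t.to_string());
    }
}
```

A naive split like this also breaks on abbreviations and decimal numbers such as "3.14", so a production splitter may need extra rules for those cases.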

Fragment Merging

Short fragments (fewer than 5 characters) are merged with the next fragment before being sent to the API. This avoids spending API quota on punctuation-only or one-word fragments and gives the server more context for an accurate translation.
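One way to implement the merge, under stated assumptions. Length is counted in Unicode scalar values (chars) rather than bytes, which matters for Devanagari text. Merged pieces are joined with a single space. The text does not say what happens to a short final fragment with nothing after it, so this sketch appends it to the previous fragment. merge_short and MIN_FRAGMENT_CHARS are hypothetical names.

```rust
/// Threshold from the text: fragments shorter than this are merged.
const MIN_FRAGMENT_CHARS: usize = 5;

/// Prepends each short fragment to the one after it.
fn merge_short(fragments: Vec<String>) -> Vec<String> {
    let mut merged: Vec<String> = Vec::new();
    let mut pending = String::new();
    for fragment in fragments {
        if !pending.is_empty() {
            pending.push(' ');
        }
        pending.push_str(&fragment);
        // Emit once the accumulated text is long enough; otherwise carry it forward.
        if pending.chars().count() >= MIN_FRAGMENT_CHARS {
            merged.push(std::mem::take(&mut pending));
        }
    }
    // A short final fragment has no successor; attach it to the previous one.
    if !pending.is_empty() {
        if merged.is_empty() {
            merged.push(pending);
        } else {
            let last = merged.len() - 1;
            merged[last].push(' ');
            merged[last].push_str(&pending);
        }
    }
    merged
}
```

For instance, ["Hi.", "How are you?"] becomes ["Hi. How are you?"], while the three fragments in the example that follows pass through unchanged because each has at least five characters.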

Example

Input:

Hello world. How are you? I am fine!

Fragments after splitting and merging:

["Hello world.", "How are you?", "I am fine!"]