TranslationService & Sentence Splitting
TranslationService
TranslationService (defined in src/translate/service.rs) is the single public interface that format handlers call. It holds:
- An Arc<TmtClient> for HTTP communication.
- A Semaphore to cap concurrent requests.
- A Mutex<HashMap<CacheKey, CacheEntry>> for deduplication and caching.
translate_text
The primary method signature (conceptually):
pub async fn translate_text(
    &self,
    text: &str,
    src_lang: &str,
    tgt_lang: &str,
) -> Result<String, AppError>
Internally it:
- Calls split_sentences(text) to obtain a list of sentence fragments.
- Spawns a concurrent task per fragment (bounded by the Semaphore).
- Has each task check the cache / in-flight map before making a network call.
- Joins all translated fragments back into a single string.
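The steps above can be sketched with the standard library alone. This is a rough illustration rather than the code in src/translate/service.rs: batches of scoped threads stand in for the async tasks and the Semaphore, fake_translate is a hypothetical placeholder for the network call, the cache is keyed on the fragment text only, in-flight deduplication is left out, and joining with a single space is an assumption. Fragments are passed in already split to keep the example short.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Assumption: a plain fragment -> translation map stands in for the real cache.
type Cache = Mutex<HashMap<String, String>>;

/// Hypothetical placeholder for the HTTP call made through TmtClient.
fn fake_translate(fragment: &str) -> String {
    format!("[{fragment}]")
}

/// Checks the cache first and only "translates" on a miss.
fn translate_fragment(cache: &Cache, fragment: &str) -> String {
    // The lock guard is dropped at the end of this statement.
    let cached = cache.lock().unwrap().get(fragment).cloned();
    if let Some(hit) = cached {
        return hit;
    }
    let translated = fake_translate(fragment);
    cache
        .lock()
        .unwrap()
        .insert(fragment.to_string(), translated.clone());
    translated
}

/// Translates fragments with at most `max_concurrent` workers at a time,
/// then joins the results in their original order.
fn translate_text_sketch(cache: &Cache, fragments: &[&str], max_concurrent: usize) -> String {
    let mut translated = Vec::with_capacity(fragments.len());
    // Each batch runs concurrently; batches run one after another.
    for batch in fragments.chunks(max_concurrent.max(1)) {
        let results = thread::scope(|scope| {
            // Spawn every worker in the batch before joining any of them.
            let handles: Vec<_> = batch
                .iter()
                .map(|&fragment| scope.spawn(move || translate_fragment(cache, fragment)))
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("worker thread panicked"))
                .collect::<Vec<String>>()
        });
        translated.extend(results);
    }
    // Assumption: fragments are re-joined with a single space.
    translated.join(" ")
}
```

Batching is a coarser limit than a semaphore, since a slow fragment holds up its whole batch, but it keeps the same guarantee that no more than max_concurrent calls run at once.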
Cache Key
Each entry in the internal cache is keyed on a CacheKey:
struct CacheKey {
    text: String,
    src_lang: String,
    tgt_lang: String,
}
This means the same sentence translated between different language pairs is cached independently.
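As a small illustration of that independence, the sketch below stores the same sentence under two language pairs. The derive list is what a HashMap key needs in general and is assumed here rather than copied from the source. The cache_key helper, the String value type (standing in for CacheEntry), the language codes, and the placeholder translations are all hypothetical.

```rust
use std::collections::HashMap;

/// Same fields as the documented CacheKey; Eq and Hash are required for
/// HashMap keys (deriving them is an assumption about the real type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CacheKey {
    text: String,
    src_lang: String,
    tgt_lang: String,
}

/// Hypothetical convenience constructor used only by this example.
fn cache_key(text: &str, src_lang: &str, tgt_lang: &str) -> CacheKey {
    CacheKey {
        text: text.to_string(),
        src_lang: src_lang.to_string(),
        tgt_lang: tgt_lang.to_string(),
    }
}

/// Builds a cache holding one sentence under two different target languages.
/// Values are placeholder strings, not real translations.
fn build_example_cache() -> HashMap<CacheKey, String> {
    let mut cache = HashMap::new();
    cache.insert(
        cache_key("Hello.", "en", "ne"),
        "<Nepali translation>".to_string(),
    );
    cache.insert(
        cache_key("Hello.", "en", "taj"),
        "<Tamang translation>".to_string(),
    );
    cache
}
```

Both inserts survive because the keys differ in tgt_lang. A lookup with the source and target swapped misses, since all three fields take part in equality and hashing.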
split_sentences
Defined in src/translate/sentence.rs, split_sentences divides a text block on the following delimiters:
| Delimiter | Meaning |
|---|---|
| . | Latin full stop |
| ! | Exclamation mark |
| ? | Question mark |
| । | Devanagari danda (used in Nepali and Tamang) |
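A minimal sketch of splitting on these four delimiters follows. It is not the code in src/translate/sentence.rs: the name split_sentences_sketch is hypothetical, and keeping each delimiter attached to its sentence, trimming whitespace, and keeping trailing text that lacks a delimiter are all assumptions.

```rust
/// Sentence terminators taken from the delimiter table.
const DELIMITERS: [char; 4] = ['.', '!', '?', '।'];

/// Splits `text` after each delimiter, keeping the delimiter with its sentence.
/// Assumption: surrounding whitespace is trimmed and empty pieces are dropped.
fn split_sentences_sketch(text: &str) -> Vec<String> {
    let mut fragments = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if DELIMITERS.contains(&ch) {
            let piece = current.trim();
            if !piece.is_empty() {
                fragments.push(piece.to_string());
            }
            current.clear();
        }
    }
    // Assumption: trailing text without a final delimiter is still a fragment.
    let rest = current.trim();
    if !rest.is_empty() {
        fragments.push(rest.to_string());
    }
    fragments
}
```

Iterating over chars rather than bytes matters here because the danda is a multi-byte character in UTF-8. With this naive approach, an ellipsis such as "Wait..." produces two punctuation-only fragments after "Wait.".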
Fragment Merging
Short fragments (fewer than 5 characters) are merged with the next fragment before being sent to the API. This avoids wasting API quota on punctuation-only or single-word fragments and ensures the server has enough context for accurate translation.
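One way the merging rule could look is sketched below, under several assumptions: length is counted in Unicode scalar values rather than bytes, merged pieces are joined with a single space, merging continues until the combined fragment reaches the threshold, and a short fragment at the very end (with nothing after it to merge into) is kept as-is. The names MIN_FRAGMENT_CHARS and merge_short_fragments are hypothetical.

```rust
/// Threshold from the text: fragments with fewer characters than this are merged.
const MIN_FRAGMENT_CHARS: usize = 5;

/// Merges each short fragment into the fragment that follows it.
fn merge_short_fragments(fragments: Vec<String>) -> Vec<String> {
    let mut merged = Vec::new();
    // Short text accumulated so far, waiting for the next fragment.
    let mut pending = String::new();
    for fragment in fragments {
        if !pending.is_empty() {
            pending.push(' '); // assumption: merged pieces are space-separated
        }
        pending.push_str(&fragment);
        // Count characters, not bytes, so Devanagari text is measured fairly.
        if pending.chars().count() >= MIN_FRAGMENT_CHARS {
            merged.push(std::mem::take(&mut pending));
        }
    }
    // Assumption: a short final fragment has nothing to merge into and is kept.
    if !pending.is_empty() {
        merged.push(pending);
    }
    merged
}
```

For instance, "Hi." followed by "How are you?" becomes the single fragment "Hi. How are you?", while fragments that already meet the threshold pass through unchanged.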
Example
Input:
Hello world. How are you? I am fine!
Fragments after splitting and merging:
["Hello world.", "How are you?", "I am fine!"]