Google Translate and DeepL still give completely different outputs for the same sentence in 2026. Why hasn't this been solved yet?
Tried something out of curiosity last week. Took a few sentences with slightly tricky phrasing and ran them through several MT engines. Same input, same language pair, completely different outputs. Not just stylistic differences, actual meaning divergence in some cases.
I get that training data and architecture choices differ, but we're years into transformer-based MT now, and the gap between leading engines on the same input still surprises me sometimes.
Has anyone else noticed this? Is it inherent to how these models work, or will more training data eventually close the gap? And does it actually matter for most use cases, or is it only a problem at the edges?