
Moving MTQE from scores to operational signals in production pipelines
Manuel Herranz from Pangeanic here. Long time no posting!
We have been focusing heavily on how to move MTQE beyond passive, post-facto scoring and turn it into a dynamic routing layer in production. I just published a deep dive on why enterprise localization needs to shift away from raw scalar metrics and toward actionable operational signals.
The core argument is that a machine translation can be perfectly fluent and linguistically accurate, yet still fail the job if it ignores client-specific terminology, glossaries, or contextual risk profiles. QE models such as COMETKiwi have been useful for general evaluation, but true production automation requires an adaptive control layer that triggers human review or automatic corrective post-editing per use case, client, job, or industry, based on actual asset compliance rather than a generic confidence number.
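To make the routing idea concrete, here is a minimal sketch of such a control layer. It is an illustration under stated assumptions, not the implementation described in the article: `JobProfile`, `route_segment`, the 0.80 threshold, and the substring-based glossary check are all hypothetical, and the QE score is assumed to come from whatever model is already in the pipeline.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"
    AUTO_POST_EDIT = "auto_post_edit"
    HUMAN_REVIEW = "human_review"


@dataclass
class JobProfile:
    # Hypothetical per-client / per-job settings; the values are illustrative.
    glossary: dict[str, str]      # source term -> required target term
    min_qe_score: float = 0.80    # below this, always route to a human
    high_risk: bool = False       # e.g. legal or medical content


def glossary_violations(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return source terms whose required target term is missing.

    Naive case-insensitive substring match; a real check would need
    tokenization and morphology handling.
    """
    src, tgt = source.lower(), target.lower()
    return [s for s, t in glossary.items() if s.lower() in src and t.lower() not in tgt]


def route_segment(source: str, target: str, qe_score: float, profile: JobProfile):
    """Combine a generic QE score with asset compliance to pick a route."""
    violations = glossary_violations(source, target, profile.glossary)
    if qe_score < profile.min_qe_score:
        return Route.HUMAN_REVIEW, violations
    if violations:
        # A fluent, high-scoring output can still break the client's terminology.
        route = Route.HUMAN_REVIEW if profile.high_risk else Route.AUTO_POST_EDIT
        return route, violations
    return Route.PUBLISH, violations


profile = JobProfile(glossary={"invoice": "factura"})
print(route_segment("Send the invoice today", "Envíe la cuenta hoy", 0.93, profile))
# (<Route.AUTO_POST_EDIT: 'auto_post_edit'>, ['invoice'])
```

The point of the sketch is that the second translation scores well yet is still routed away from publication, because the decision depends on the client's glossary and risk profile rather than on the score alone.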
To me, this is a paradigm shift from treating QE as a passive audit log to using it as an active routing mechanism. If you are interested in the workflow architecture and in how we balance data risks across content domains, the full article is here:
https://blog.pangeanic.com/mtqe-is-becoming-a-translation-control-layer-from-scores-to-adaptive-quality-workflows
Given the deep technical focus of this community, I would love to get your thoughts on the practical hurdles of automated thresholding. How are your teams handling the engineering challenges of real-time routing, and where do you see the boundary between automated gating and human veto power?