u/AI_Guy_In_Fintech

Indian-accent English speech recognition

Been testing a bunch of ASR models lately, and I think I’ve found the best one so far for English with Indian accents.

NVIDIA’s Parakeet TDT 0.6B v2 has been surprisingly good. Accent handling feels much more natural compared to a lot of models that struggle with Indian pronunciation, mixed speech patterns, or common regional variations.

What stood out for me:

✅ Better recognition of Indian English accents

✅ Strong transcription quality

✅ Fast and lightweight (0.6B)

✅ Handles real-world speech better than expected

Model: parakeet-tdt-0.6b-v2 on Hugging Face

Curious if others here have tried it against Whisper, Moonshine, or other recent ASR models. So far this might be my favorite for Indian English use cases.
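
For anyone who wants to run that comparison on their own clips, here is a minimal sketch of a word error rate (WER) check. WER is my assumption for the metric (the post doesn't name one), and the `wer` helper is hypothetical rather than part of any model's toolkit. It only lowercases and splits on whitespace, so punctuation and number formatting would need proper normalization before scores are comparable across models.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "i need five hundred rupees"
    print(wer(truth, "i need five hundred rupee"))
```

Running each model over the same set of Indian English recordings and averaging this score per model would give a like-for-like number to put next to the subjective impressions.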

Anyone else tested it?

u/AI_Guy_In_Fintech — 4 days ago

Indic - Text to digit

I’ve been working extensively on word-to-number conversion inside Indic language sentences across 11 Indian languages, and the results have been surprisingly good so far.

The goal is to detect and normalize number words embedded in natural sentences into numeric values.

Examples:

Hindi: “मुझे पांच सौ रुपये चाहिए” → “मुझे 500 रुपये चाहिए”

Telugu: “నాకు ఐదు వందల రూపాయలు కావాలి” → “నాకు 500 రూపాయలు కావాలి”

Tamil: “எனக்கு ஐநூறு ரூபாய் வேண்டும்” → “எனக்கு 500 ரூபாய் வேண்டும்”
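
To make the idea concrete, here is a minimal rule-based sketch for Hindi only. It is not the package described in this post: the tiny `UNITS` and `MULTIPLIERS` tables and the `normalize_hindi_numbers` function are hypothetical names made up for illustration, and a real system would need full vocabularies, inflections, and punctuation handling.

```python
import unicodedata


def _nfc(text: str) -> str:
    # Normalize so that differently encoded nukta forms compare equal.
    return unicodedata.normalize("NFC", text)


# Deliberately tiny, illustrative vocabularies (assumptions, not a full lexicon).
UNITS = {_nfc(k): v for k, v in {
    "एक": 1, "दो": 2, "तीन": 3, "चार": 4, "पांच": 5, "पाँच": 5,
    "छह": 6, "सात": 7, "आठ": 8, "नौ": 9, "दस": 10, "बीस": 20, "पचास": 50,
}.items()}
MULTIPLIERS = {_nfc(k): v for k, v in {
    "सौ": 100, "हजार": 1_000, "हज़ार": 1_000,
    "लाख": 100_000, "करोड़": 10_000_000,
}.items()}


def normalize_hindi_numbers(sentence: str) -> str:
    """Replace runs of known Hindi number words with digits."""
    out = []
    total = 0       # value closed off by thousand/lakh/crore so far
    current = 0     # value still being built (units and hundreds)
    in_number = False

    def flush() -> None:
        nonlocal total, current, in_number
        if in_number:
            out.append(str(total + current))
        total, current, in_number = 0, 0, False

    for token in sentence.split():
        key = _nfc(token)
        if key in UNITS:
            current += UNITS[key]
            in_number = True
        elif key in MULTIPLIERS:
            scale = MULTIPLIERS[key]
            if scale == 100:
                current = max(current, 1) * 100
            else:
                total += max(current, 1) * scale
                current = 0
            in_number = True
        else:
            flush()            # a non-number word ends the current run
            out.append(token)
    flush()
    return " ".join(out)


if __name__ == "__main__":
    print(normalize_hindi_numbers("मुझे पांच सौ रुपये चाहिए"))
```

The sketch splits on whitespace only, so a number word with punctuation attached is left untouched, and two adjacent standalone numbers would be summed into one. Those are exactly the kinds of edge cases a real implementation has to handle.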

Currently supported languages:

Hindi

Bengali

Telugu

Tamil

Kannada

Malayalam

Marathi

Gujarati

Punjabi

Odia

Assamese

The system handles:

Sentence-level normalization

Indian numbering system (lakh/crore)

Mixed numeric + textual forms

Unicode/script variations

Noisy ASR/transcribed text

Language-specific patterns and inflections
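
As a small illustration of the Unicode/script item above, here is a sketch using only Python's standard `unicodedata` module. The `normalize_script` function is a hypothetical helper, not the package's API. It applies NFC normalization and maps native-script decimal digits to ASCII, which is one plausible preprocessing step before any number-word matching.

```python
import unicodedata


def normalize_script(text: str) -> str:
    """NFC-normalize text and map native-script decimal digits to ASCII."""
    # NFC gives one canonical encoding per character. For Devanagari nukta
    # letters this means the base letter followed by the nukta sign, because
    # the precomposed code points are excluded from composition.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        # unicodedata.decimal returns the digit value for any decimal digit
        # (Devanagari, Telugu, Tamil, ASCII, ...) or the default otherwise.
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


if __name__ == "__main__":
    print(normalize_script("\u096b\u0966\u0966"))  # Devanagari digits five, zero, zero
    print(normalize_script("\u0c6b\u0c66\u0c66"))  # Telugu digits five, zero, zero
```

Normalizing first means a lookup table only needs one spelling per word instead of one per encoding variant.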

I’ve spent quite a bit of time refining edge cases and multilingual behavior, and it’s now working pretty reliably across diverse sentence structures.

I’m also planning to share the package publicly soon. Would love feedback from people working in:

Indic NLP

ASR/text normalization

Multilingual tokenization

Speech pipelines

Production NLP systems

Curious to know:

What edge cases would you test?

Any benchmark datasets I should evaluate on?

Would a lightweight rule-based package still be useful alongside LLM pipelines?

Happy to discuss approaches and share more details if there’s interest.

u/AI_Guy_In_Fintech — 7 days ago


Indian Spoken Language detection model

Hey everyone,

Over the past few months, I’ve been building a spoken language identification (LID) model focused specifically on Indic languages and real-world conversational speech.

The model can automatically detect the spoken language directly from audio input, even in noisy telephony-style conversations.

Supported Languages

Hindi

English

Bengali

Marathi

Tamil

Telugu

Kannada

Malayalam

Gujarati

Punjabi

What the Model Handles

Short utterances

Call-center / telephony audio

Conversational speech

Background noise

Indian accents & regional variations

Some level of code-mixed speech

Tech Stack

PyTorch

Deep learning–based audio classification

Custom preprocessing pipeline

Audio embeddings + transformer/CNN experiments

Automated evaluation & benchmarking workflows

Biggest Challenges

One thing I underestimated was how difficult Indic spoken LID becomes in real-world data.

Some major issues:

Similar phonetics across languages

Hindi mixed with regional languages

Accent & dialect diversity

Imbalanced datasets

Extremely short voice samples

Noisy customer-support recordings

A lot of effort went into preprocessing, balancing, and improving robustness.
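
One common balancing technique is inverse-frequency class weighting. I'm using it here as an illustrative assumption, since the post doesn't say which balancing method was used, and the `balanced_class_weights` helper name is hypothetical. The resulting weights could be passed to a weighted loss or used to drive a sampler.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {lang: n_samples / (n_classes * count) for lang, count in counts.items()}


if __name__ == "__main__":
    # Toy label distribution: Hindi dominates, Punjabi is rare.
    labels = ["hi"] * 6 + ["ta"] * 3 + ["pa"] * 1
    for lang, weight in balanced_class_weights(labels).items():
        print(lang, round(weight, 3))
```

Rare languages get weights above 1 and dominant ones get weights below 1, so each class contributes roughly equally to the training signal.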

Potential Use Cases

IVR language routing

Multilingual voice assistants

ASR model selection

Customer support automation

Speech analytics

Voice AI systems for India
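
For the IVR routing and ASR model selection cases, the glue logic might look roughly like the sketch below. Everything in it is hypothetical: the `ASR_BY_LANGUAGE` table, the model names, the `0.6` threshold, and the `route` function are placeholders for illustration. It also assumes the LID model exposes per-language scores.

```python
from typing import Mapping

# Hypothetical mapping from detected language code to a downstream ASR model name.
ASR_BY_LANGUAGE = {
    "hi": "asr-hindi",
    "en": "asr-english",
    "ta": "asr-tamil",
}


def route(scores: Mapping[str, float],
          min_confidence: float = 0.6,
          fallback: str = "ask_caller") -> str:
    """Pick an ASR model from LID scores, falling back when unsure."""
    if not scores:
        return fallback
    language, confidence = max(scores.items(), key=lambda item: item[1])
    # Low confidence or an unsupported language: do not guess, use the fallback
    # (for example, an IVR prompt asking the caller to choose a language).
    if confidence < min_confidence or language not in ASR_BY_LANGUAGE:
        return fallback
    return ASR_BY_LANGUAGE[language]


if __name__ == "__main__":
    print(route({"hi": 0.82, "mr": 0.11, "en": 0.07}))
    print(route({"hi": 0.45, "mr": 0.40, "en": 0.15}))
```

An explicit fallback matters most for short utterances and code-mixed speech, where the top two scores are often close.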

Current Focus

Right now I’m experimenting with:

Better short-utterance detection

Robustness on noisy audio

Reducing confusion between related languages

Faster inference for production deployment
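
One simple way to track confusion between related languages is to count the most frequent (true, predicted) error pairs on a held-out set. This is a minimal sketch: the `top_confusions` helper is hypothetical and the labels in the example are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def top_confusions(y_true: Sequence[str],
                   y_pred: Sequence[str],
                   k: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return the k most frequent (true, predicted) misclassification pairs."""
    errors = Counter(
        (truth, guess) for truth, guess in zip(y_true, y_pred) if truth != guess
    )
    return errors.most_common(k)


if __name__ == "__main__":
    # Made-up labels purely for illustration.
    y_true = ["hi", "hi", "mr", "mr", "pa", "ta"]
    y_pred = ["hi", "mr", "hi", "hi", "hi", "ta"]
    for (truth, guess), count in top_confusions(y_true, y_pred):
        print(f"{truth} -> {guess}: {count}")
```

Breaking the same counts down by utterance length or noise level would show whether a confused pair is a short-audio problem or a genuine phonetic overlap.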

Looking for Feedback

Would especially appreciate:

Good Indic LID benchmarks/datasets

Ideas for handling heavy code-mixing

Production deployment suggestions

Interest in an open-source release

Happy to discuss architecture choices, datasets, or experiments if people are interested.

u/AI_Guy_In_Fintech — 8 days ago