u/Pioskeff

I'm building an Ekegusii ↔ English NLP translator for a critically low-resource Bantu language in Kenya. Here's where I am and what I'm figuring out next

Hey everyone 👋 Long-time lurker, first-time poster. I've been self-teaching NLP over the past few months and got hit with an idea I can't shake: building a machine translation system for Ekegusii (also called Gusii), a Bantu language spoken by the Gusii people in western Kenya, with roughly 2–3 million speakers.

Ekegusii is critically underrepresented in NLP. There's almost no public tooling, no pre-trained models, and very little parallel data available online. I want to change that, starting with an Ekegusii ↔ English translator, with Kiswahili as a future target.

What I've done so far:

  • Found a large parallel corpus: the Bible in both Ekegusii and English
  • Parsed and aligned it into a structured .json file with paired sentence entries: { "ekegusii": "...", "english": "..." }
  • 31,000 verse-level pairs. Not huge, but a real start for a low-resource language
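For anyone curious how the paired entries could be prepared for training, here is a minimal sketch. It assumes the .json file is a single top-level array of `{ "ekegusii": ..., "english": ... }` objects. The function names, split fractions, and seed are illustrative choices of mine, not something the dataset dictates.

```python
import json
import random

def load_pairs(path):
    """Load a JSON array of {"ekegusii": ..., "english": ...} objects.
    Assumes the whole file is one top-level array."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def clean_pairs(pairs):
    """Drop entries with an empty side and exact duplicate pairs, keeping order."""
    seen = set()
    cleaned = []
    for p in pairs:
        src = (p.get("ekegusii") or "").strip()
        tgt = (p.get("english") or "").strip()
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append({"ekegusii": src, "english": tgt})
    return cleaned

def split_pairs(pairs, dev_frac=0.05, test_frac=0.05, seed=13):
    """Shuffle with a fixed seed and cut into train/dev/test lists."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```

Removing exact duplicates before splitting matters with Bible text, since repeated verses could otherwise land in both train and test and inflate scores.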

Where I'm stuck / what I'm figuring out next:

  • Should I fine-tune an existing multilingual model (e.g. mBART-50, NLLB-200, or Helsinki-NLP opus-mt) or try to build something smaller from scratch given compute constraints?
  • Bible text is highly formal and domain-specific. How much will that hurt generalization?
  • Tokenization: Ekegusii has rich morphology, so I'm wondering whether a standard BPE tokenizer will handle it well
  • Data augmentation strategies for low-resource MT?
  • Has anyone worked on low-resource African language MT before? Any advice, papers, or communities I should know about? Would love to connect with others working on similar problems.
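On the tokenization question, one rough check is "fertility": the average number of subword tokens a tokenizer produces per word. A much higher value on the Ekegusii side than on the English side would suggest the tokenizer is fragmenting words heavily. The sketch below is only illustrative. `toy_tokenize` is a made-up stand-in so the snippet runs without downloads. In practice you would pass a real tokenizer's function instead, for example the `tokenize` method of a Hugging Face tokenizer (an assumption about tooling, not something I've settled on).

```python
def fertility(sentences, tokenize):
    """Average number of subword tokens per whitespace-separated word."""
    n_words = 0
    n_tokens = 0
    for s in sentences:
        n_words += len(s.split())
        n_tokens += len(tokenize(s))
    return n_tokens / n_words if n_words else 0.0

def toy_tokenize(sentence, max_len=4):
    """Hypothetical stand-in tokenizer: cut each word into chunks of at most
    max_len characters. Replace with a real subword tokenizer."""
    pieces = []
    for word in sentence.split():
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return pieces

# "abcd" stays whole and "efghijkl" splits in two: 3 tokens over 2 words.
print(fertility(["abcd efghijkl"], toy_tokenize))  # 1.5
```

Running it separately over the `ekegusii` and `english` fields of the corpus gives two numbers to compare for each candidate tokenizer.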

Happy to share the dataset and code publicly once it's cleaned up. I would love for this to become a community resource.

u/Pioskeff — 20 hours ago
▲ 8 r/SharpBoys+1 crossposts

Mwosho Chronicles

"You send the logins first, then I'll send the money." A game of cat and mouse 😂😂 khabusieee 😤

u/Pioskeff — 2 days ago

My 100th plus time trying to be sharp

So my homie told me he's got a legit guy who has some logs, so I made a purchase: 7 Gs gone. Out of nowhere I'm told that to get the log I have to get the RDP, and a whopping ten Gs, my last, went just like that. Then after I paid, they didn't send anything. Now I'm being told I need an "rdv". I was like, what even is an rdv? I laughed, and I knew right there I'd been played: one clean mwosho. I laughed first. So that's how I'm back to zero, all because I wanted a bit of quick money 😅 I don't even have much to add 😭😭 I've learnt to stay guarded with Telegram guys.

Otherwise, right now I have UK and US proxies. If you have a way someone can earn even just enough for supper and rent, link me up 🙂 I'm good at AI training tasks and data annotation, I'm learning ML (mostly NLP and computer vision), and I build agents and automate tasks with them.

Bring the gigs, folks

u/Pioskeff — 4 days ago