r/kaggle

▲ 1 r/kaggle

Our Gemma 4 competition submission: offline disaster mesh app with on-device AI

A friend and I just wrapped up our submission for the Gemma 4 competition. We built MeshGemma, a disaster response app that runs Gemma 4 on-device with no internet and meshes phones together over Bluetooth when cell towers go down. It reads injury photos, answers medical questions offline, and compresses incident data to 200 bytes for radio uplink. We filmed it on the heath next to an actual wildfire zone in the Netherlands.
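The post doesn't say how the 200-byte incident packets are built. One plausible approach, sketched here under loudly stated assumptions, is a fixed binary header packed with Python's `struct` module plus a zlib-compressed free-text note that gets trimmed until the whole packet fits the budget. The field layout, the `version` byte, and the function names `pack_incident`/`unpack_incident` are hypothetical, not MeshGemma's actual format.

```python
import struct
import zlib

MAX_PAYLOAD = 200  # byte budget mentioned in the post

# Hypothetical header layout (little-endian, no padding), 16 bytes total:
# version u8, severity u8, casualties u16, lat f32, lon f32, unix time u32
HEADER = struct.Struct("<BBHffI")


def pack_incident(severity, casualties, lat, lon, timestamp, note):
    """Pack one incident into at most MAX_PAYLOAD bytes (hypothetical format)."""
    header = HEADER.pack(1, severity, casualties, lat, lon, timestamp)
    text = note
    while True:
        body = zlib.compress(text.encode("utf-8"), 9)
        if len(header) + len(body) <= MAX_PAYLOAD:
            return header + body
        # Too big: drop roughly the last 10% of the note and try again.
        # An empty note always fits, so this loop terminates.
        text = text[: len(text) - max(1, len(text) // 10)]


def unpack_incident(packet):
    """Inverse of pack_incident."""
    version, severity, casualties, lat, lon, timestamp = HEADER.unpack_from(packet)
    note = zlib.decompress(packet[HEADER.size:]).decode("utf-8")
    return {
        "version": version, "severity": severity, "casualties": casualties,
        "lat": lat, "lon": lon, "timestamp": timestamp, "note": note,
    }
```

Trimming the note *before* compressing, rather than cutting the compressed bytes, keeps every packet decodable; the trade-off is that long reports lose their tail. Storing coordinates as 32-bit floats keeps sub-meter precision while using only 8 bytes.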

The submission is locked now, but we're happy to talk about what we built.

https://www.kaggle.com/competitions/gemma-4-good-hackathon/writeups/new-writeup-1778607604484

u/Guus196 — 2 days ago
▲ 17 r/kaggle+1 crossposts

I made the largest public gender-labeled Japanese name dataset, 731k+ names

Built by merging five existing public datasets into one. I've also scraped the 69k Wikipedia names myself.

Kaggle Dataset License: CC BY-SA 4.0

| Dataset | Size | Male % | Notes |
|---|---|---|---|
| Wikipedia | 69,209 | 44.1% | Real attested people, 87% have birth year |
| ENAMDICT | 116,009 | 16.4% | Dictionary-based, heavily skewed female |
| Facebook 530M leak | 392,434 | 60.6% | Largest source, kanji or kana only |
| GenDec | 64,139 | 49.8% | |
| 名前由来 ("name origin") | 89,635 | 60.4% | Popularity rankings, not real frequency |
| Total | 731,426 | 51.0% | |

Each individual dataset has its own gaps — size, quality, or skew — but combining them gives a more complete picture. The Wikipedia subset is the only one covering real individuals and has a temporal dimension through birth years. ENAMDICT skews female partly because Japanese female names have more variety. The Facebook data is massive but only records kanji or kana, not both.
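The Total row's male share is a size-weighted average of the per-source shares, which is how a large male-leaning source (Facebook) and a strongly female-leaning one (ENAMDICT) balance out to roughly 51%. A quick check using only the table's own numbers; `SOURCES` and `pooled_male_share` are illustrative names, not part of the dataset:

```python
# (size, male share in %) per source, copied from the table above
SOURCES = {
    "Wikipedia": (69_209, 44.1),
    "ENAMDICT": (116_009, 16.4),
    "Facebook 530M leak": (392_434, 60.6),
    "GenDec": (64_139, 49.8),
    "名前由来": (89_635, 60.4),
}


def pooled_male_share(sources):
    """Return (total names, size-weighted male share in %)."""
    total = sum(size for size, _ in sources.values())
    male = sum(size * pct / 100 for size, pct in sources.values())
    return total, 100 * male / total


total, share = pooled_male_share(SOURCES)
print(f"{total:,} names, {share:.2f}% male")
```

The recomputed share lands within the rounding of the per-source percentages, which are each given to one decimal place.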

Use cases: gender inference (training classifiers without LLMs), Japanese NLP (NER, tokenization, reading prediction), cross-source data quality research
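For the first use case, a classifier doesn't need to be large. Below is a minimal sketch of a character-bigram Naive Bayes classifier in pure Python. The handful of hiragana names is made-up toy data standing in for the dataset, and `NameGenderNB` is a hypothetical name; this is not the gender prediction model mentioned in the post, whose approach isn't described.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(name):
    """Character bigrams with start (^) and end ($) markers."""
    padded = f"^{name}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class NameGenderNB:
    """Multinomial Naive Bayes over character bigrams, with add-one smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_bigrams(name)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            grams = self.gram_counts[label]
            denom = sum(grams.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for g in char_bigrams(name):
                score += math.log((grams[g] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data; real training would use the dataset's name and gender columns.
names = ["はなこ", "ゆうこ", "さちこ", "なおみ", "ひろみ",
         "たろう", "じろう", "けんた", "しょうた", "いちろう"]
labels = ["F"] * 5 + ["M"] * 5
model = NameGenderNB().fit(names, labels)
print(model.predict("みちこ"), model.predict("さぶろう"))
```

The `$` end marker lets the model pick up name endings, which carry much of the gender signal in Japanese given names (for example 子/こ in many female names). On the real data you would hold out a test split and probably add longer n-grams.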

I'm also working on a gender prediction model and will post it when it's ready. It has around 90% accuracy.

u/Careful_Sand_7838 — 4 days ago
▲ 140 r/kaggle+5 crossposts

I wanted to understand how Kaggle Kernels work, so I built a minimal version locally — inspired by the real Kaggle kernel design.

Each notebook session runs in its own k8s pod:

- Start → pod spins up

- Run cells → executed in the kernel, with its state managed

- Stop → pod is destroyed
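A minimal sketch of the Start/Stop half of that lifecycle, written for the official `kubernetes` Python client's `CoreV1Api` (`create_namespaced_pod` / `delete_namespaced_pod`). It is not taken from the repo: the `kernels` namespace (which must already exist), the image, the resource limits, and the `kernel-<id>` naming scheme are all assumptions, and running cells (talking to the kernel inside the pod) is left out.

```python
import uuid

NAMESPACE = "kernels"                           # hypothetical namespace
KERNEL_IMAGE = "quay.io/jupyter/base-notebook"  # assumption: any Jupyter-capable image


def pod_manifest(session_id, image=KERNEL_IMAGE):
    """Build a Pod spec for one notebook session (one session = one pod)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"kernel-{session_id}",
            "labels": {"app": "notebook-kernel", "session": session_id},
        },
        "spec": {
            "restartPolicy": "Never",  # a crashed kernel is not silently restarted
            "containers": [{
                "name": "kernel",
                "image": image,
                "ports": [{"containerPort": 8888}],  # Jupyter's default port
                "resources": {"limits": {"cpu": "1", "memory": "1Gi"}},
            }],
        },
    }


def start_session(api, image=KERNEL_IMAGE):
    """Start: create a fresh pod and return the session id."""
    session_id = uuid.uuid4().hex[:8]
    api.create_namespaced_pod(namespace=NAMESPACE, body=pod_manifest(session_id, image))
    return session_id


def stop_session(api, session_id):
    """Stop: delete the pod; all kernel state is discarded with it."""
    api.delete_namespaced_pod(name=f"kernel-{session_id}", namespace=NAMESPACE)
```

On Minikube, `api` would come from `config.load_kube_config()` followed by `client.CoreV1Api()`. Deleting the pod rather than pausing it is what gives each session a clean slate: no kernel state survives Stop.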

This helped me understand execution, isolation, and lifecycle under the hood.

You can deploy it easily on Minikube.

GitHub: https://github.com/mageshkrishna/k8s-kaggle-kernel-clone

If you find it useful, consider starring the repo ⭐

u/Formal-Woodpecker-78 — 14 days ago
▲ 18 r/kaggle

36 closed S. pneumoniae genomes for structural pangenomics

Dataset: https://www.kaggle.com/datasets/qasimhu/s-pneumoniae-structural-pangenomics-cohort

This dataset provides a high-fidelity genomic cohort of Streptococcus pneumoniae, curated specifically for structural pangenomics. In clinical microbiology, understanding the genetic plasticity of this pathogen is critical: its accessory genome, which includes mobile genetic elements such as plasmids and phages, directly influences strain-dependent gene essentiality and the evolution of antimicrobial resistance. For the Kaggle data science and machine learning community, the dataset is a chance to apply advanced deep learning architectures, such as sequence transformers and graph neural networks, to complex, high-dimensional biological data, and to develop algorithms that bridge the gap between raw genomic sequences and clinical outcomes like antimicrobial resistance and pathogen evolution.
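Applying a sequence transformer usually starts with turning each genome into tokens. A minimal pure-Python sketch follows, assuming the genomes come as FASTA text (the post doesn't state the file format). Overlapping k-mers are one common tokenization choice, not necessarily what the dataset author intends; `k=6` is an arbitrary default and all function names are hypothetical.

```python
ACGT = frozenset("ACGT")


def parse_fasta(lines):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def kmer_tokens(seq, k=6, stride=1):
    """Overlapping k-mers; windows containing N or other ambiguity codes are skipped."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)
            if set(seq[i:i + k]) <= ACGT]


def build_vocab(token_lists):
    """Deterministic vocabulary: special tokens first, then k-mers in sorted order."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for kmer in sorted({t for tokens in token_lists for t in tokens}):
        vocab[kmer] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Map tokens to integer ids, falling back to [UNK]."""
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]
```

A closed S. pneumoniae genome is around two million bases, far beyond a typical transformer context window, so in practice the token stream would be sliced into fixed-length windows before training.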

u/Obvious_Sky6614 — 13 days ago
▲ 11 r/kaggle

Vision Transformer using TF

Hi everyone, I was playing around with fine-tuning a Vision Transformer (from Hugging Face) using TensorFlow, and here is a summary of the lessons learned:

- Ensemble heads don't help; a full-model ensemble might, but it is likely too resource-intensive.

- Sequentially unfreezing layers during fine-tuning improved performance.

- A cosine decay learning rate schedule with warm-up yielded better fine-tuning results.

- Data augmentation helped on the original dataset but appeared to confuse the model on the extended data.

- Transformers 5.x dropped TensorFlow support, so pin to transformers==4.44.0.

- Keras doesn't summarize layers correctly in this setup; a workaround is needed.

Notebook: https://www.kaggle.com/code/thomasprzilliox/vision-transformer-vit-for-flower-classification

Does anyone have a good solution for the last point? Any tricks to get model.summary() working with every Hugging Face model?
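The warm-up/cosine and sequential-unfreezing lessons can be sketched framework-free in plain Python. This is a minimal sketch, not the notebook's code: the peak rate, warm-up length, number of stages, and the idea of keeping the top-N encoder blocks trainable are all assumptions, and the function names are hypothetical. In Keras you could feed `warmup_cosine_lr` through a custom `LearningRateSchedule` or a `LearningRateScheduler` callback, and assign each mask entry to the matching block's `trainable` attribute (then recompile); how the blocks are exposed depends on the Hugging Face model class.

```python
import math


def warmup_cosine_lr(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warm-up to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def unfreeze_schedule(num_blocks, num_stages):
    """How many of the top encoder blocks are trainable at each stage (ceil split)."""
    return [((stage + 1) * num_blocks + num_stages - 1) // num_stages
            for stage in range(num_stages)]


def trainable_mask(num_blocks, num_trainable):
    """True for the top num_trainable blocks, False for the frozen ones below them."""
    cutoff = num_blocks - num_trainable
    return [i >= cutoff for i in range(num_blocks)]


# Example: a 12-block encoder unfrozen in 4 stages, top blocks first.
for n in unfreeze_schedule(12, 4):
    print(n, trainable_mask(12, n))
```

Unfreezing from the top down leaves the generic low-level features of the pretrained backbone untouched until the new head has stabilized.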

u/tzilliox — 14 days ago