r/datasets
ORKUT [text only] dataset, created from Internet Archive raw data
So guys, I'm still uploading: about 150 GB, roughly 1.1 billion replies, mostly from Brazilian users (pt-BR).
Also take a look at https://github.com/rodrigosf672/orkut-pydataglobal2025 and https://snap.stanford.edu/data/com-Orkut.html
For now this one is just raw data; I'll do ML analysis on it later. If anyone wants to write a paper together about it, DM me.
Anyway, it's on HF: SalatielJordao/orkut-communities
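If you just want to peek at a few rows without pulling all ~150 GB, something like this rough sketch might work. It assumes the repo loads with the Hugging Face `datasets` library using its default config and a `train` split. That is a guess, since the upload is still in progress and the layout may differ. The helper names `peek` and `stream_orkut` are made up for illustration.

```python
from itertools import islice


def peek(records, n=3):
    """Return the first n items of any iterable as a list."""
    return list(islice(records, n))


def stream_orkut(repo_id="SalatielJordao/orkut-communities", split="train"):
    """Open the dataset in streaming mode so rows are fetched lazily.

    Assumption: the repo loads with its default config and has this split.
    """
    # Imported here so peek() stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


# Usage (needs network access and `pip install datasets`):
#     for row in peek(stream_orkut(), n=3):
#         print(row)
```

Streaming returns an iterable dataset, so `islice` stops after the first few rows instead of reading everything.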
u/Grand-Prize1371 — 22 hours ago