▲ 1 r/datasets The largest-scale source of LLM data is now available from anywhere. Crazy speed via CDN, no egress. (commoncrawl.org) u/qlhoest — 15 hours ago
▲ 47 r/datasets +1 crossposts Structured Wikipedia now in Parquet format (en/fr) - one line of Python to load in pandas/polars (huggingface.co) u/qlhoest — 2 days ago
▲ 62 r/LocalLLaMA 1M datasets on HF! This community is gold! Congrats on pushing AI forward together with open datasets! u/qlhoest — 10 days ago