u/chill-botulism

[dataset] 2.3M U.S. employer profiles joined across 16 federal enforcement agencies (OSHA, EPA, EEOC, WHD, MSHA, and more) — free, CC BY 4.0

Full disclosure [self-promotion]: I'm the solo builder. Happy to answer questions about the data, methodology, or entity resolution approach.

I built FastDOL, a platform that links federal workplace enforcement records across agencies into a single employer profile. The government publishes this data, but each agency has its own database, its own identifiers, and its own terrible search UI.
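
To give a feel for what "linking into a single employer profile" involves, here is a toy sketch. It is not the actual FastDOL pipeline: the field names (`agency`, `name`, `state`), the suffix list, and the match key are all assumptions made up for illustration, and real entity resolution has to handle addresses, DBAs, typos, and parent-company relationships.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to strip; a real pipeline needs far more.
SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "company", "corporation", "incorporated"}

def normalize_name(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def link_records(records):
    """Group agency records under a (normalized name, state) key."""
    profiles = defaultdict(lambda: {"agencies": set(), "records": []})
    for rec in records:
        key = (normalize_name(rec["name"]), rec["state"].upper())
        profiles[key]["agencies"].add(rec["agency"])
        profiles[key]["records"].append(rec)
    return dict(profiles)

# Made-up example records, one per agency database.
records = [
    {"agency": "OSHA", "name": "Acme Widgets, Inc.", "state": "OH"},
    {"agency": "WHD", "name": "ACME WIDGETS INC", "state": "oh"},
    {"agency": "EPA", "name": "Acme Widgets LLC", "state": "OH"},
    {"agency": "MSHA", "name": "Zenith Mining Co.", "state": "WV"},
]
profiles = link_records(records)
for key, profile in profiles.items():
    print(key, sorted(profile["agencies"]))
```

Exact matching on a normalized key is the simplest possible approach; it over-merges unrelated employers that share a name within a state and under-merges spelling variants, which is why production systems usually add fuzzy matching and address or identifier evidence.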

The cross-agency dataset links enforcement records from OSHA, WHD, MSHA, EPA, EEOC, OFCCP, OFLC, and others at the employer level, with parent-company rollup. One interesting finding: employers cited by 3+ agencies have a 3.4x higher worker fatality rate than employers cited by 1-2 agencies.
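
For anyone who wants to run this kind of comparison themselves, a rate ratio between the two groups can be sketched as below. The post does not spell out how the fatality rate is defined, so this sketch assumes "fatalities per employer"; the field names (`agency_count`, `fatalities`) are hypothetical and the numbers are made-up toy data, not values from the dataset.

```python
def fatality_rate(employers):
    """Mean fatalities per employer (an assumed definition of 'rate')."""
    return sum(e["fatalities"] for e in employers) / len(employers)

def rate_ratio(employers, threshold=3):
    """Ratio of the rate for employers cited by >= threshold agencies
    to the rate for employers cited by fewer (but at least one)."""
    multi = [e for e in employers if e["agency_count"] >= threshold]
    few = [e for e in employers if 1 <= e["agency_count"] < threshold]
    return fatality_rate(multi) / fatality_rate(few)

# Toy data for illustration only.
employers = [
    {"agency_count": 3, "fatalities": 2},
    {"agency_count": 4, "fatalities": 0},
    {"agency_count": 1, "fatalities": 1},
    {"agency_count": 1, "fatalities": 0},
    {"agency_count": 2, "fatalities": 0},
    {"agency_count": 2, "fatalities": 0},
]
ratio = rate_ratio(employers)
print(f"rate ratio: {ratio:.2f}")
```

A raw ratio like this does not control for employer size or industry, so larger employers (who are more likely to be cited by several agencies) can inflate it; normalizing by headcount or hours worked would be a natural refinement if those fields are available.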

Four open datasets are available so far, all CC BY 4.0:

  • Cross-Agency Federal Violations by Employer (~2.3M rows)
  • OSHA Construction Enforcement by Employer (377K rows)
  • OSHA Citations Q1 2026 (28,827 rows, citation-level)
  • WHD Wage Theft Enforcement Actions by Employer

All hosted on Hugging Face, Kaggle, and Zenodo with DOIs. Full schema, methodology, and BibTeX on the canonical pages: https://www.fastdol.com/datasets

u/chill-botulism — 1 day ago