u/Deep_Giraffe_2615

Hi,

Quick question about public/open data and p-hacking.

Suppose first that public datasets (e.g. open government data) are analysed by an unknown number of different groups, possibly testing or modelling different but similar hypotheses. Suppose also that non-significant results are published less often or not at all, or are at least much less likely to be widely shared or to reach the public.

Are we then in a position where we don't know how many hypotheses have been tested (because the work is being done by separate groups), so that the likelihood of spurious significant results being published and shared greatly increases? No individual group is p-hacking, but collectively we are bound to find (and publish) spurious results, because lots of independent teams are analysing the data in slightly different ways and don't know about each other's tests.
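
To make the worry concrete, here is a rough sketch of what I have in mind. It is a toy under stated assumptions, not a model of any real dataset: every team tests a different pure-noise predictor against the same outcome column at alpha = 0.05, the tests are independent, and only significant results get shared. All names (`simulate`, `n_teams`, and so on) are made up for illustration, and the p-value uses the Fisher z normal approximation rather than an exact test. If the tests really are independent, the chance that at least one of k teams gets a "hit" should be about 1 - (1 - alpha)^k, which for 20 teams is roughly 0.64.

```python
import math
import random

ALPHA = 0.05  # assumed significance threshold used by every team


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(x, y):
    """Two-sided p-value for zero correlation (Fisher z approximation)."""
    z = math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def chance_of_false_positive(n_teams, alpha=ALPHA):
    """P(at least one spurious hit) for n_teams independent tests."""
    return 1 - (1 - alpha) ** n_teams


def simulate(n_teams, n_rows=100, n_datasets=300, alpha=ALPHA, seed=0):
    """Fraction of pure-noise datasets where at least one team finds p < alpha."""
    rng = random.Random(seed)
    datasets_with_hit = 0
    for _ in range(n_datasets):
        # One shared "public" outcome column, unrelated to anything.
        outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
        for _ in range(n_teams):
            # Each team picks its own predictor; here it is always noise.
            predictor = [rng.gauss(0, 1) for _ in range(n_rows)]
            if p_value(predictor, outcome) < alpha:
                datasets_with_hit += 1
                break  # one publishable "finding" is enough
    return datasets_with_hit / n_datasets


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(chance_of_false_positive(k), 3), round(simulate(k), 3))
```

In practice the teams' tests would be correlated (same rows, similar hypotheses), so the real collective rate is probably somewhat different from the independent-tests formula, but the direction of the effect should be the same.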

Do we need to be more suspicious of results from open data?

Or have I totally got the wrong end of the stick?

Public data and accidental, collective p-hacking