
I built a dataset on SDXL + InstantID architecture and tested 14 popular deepfake detectors
I tested 14 popular deepfake detectors on a dataset built with the SDXL + InstantID architecture. Six of them performed at or below random chance (dataset and blog post below).
About a year removed from my last research project, I've gotten the itch to dip a toe back in. Releasing full-blown papers would be difficult to sustain, so I've opted for a Substack instead. Here is the TLDR:
What did I do?
I compiled 26K real + generated face crops across 12 demographic cells and benchmarked 14 popular open-source models.
What were the results?
Only two detectors achieve near-perfect rank ordering. Only one is deployable as shipped.
Fairness drift is visible in 12 of 14 detectors. Per-cell AUC spread ranges from 0 (cell-invariant) to 0.54 (catastrophic). The aggregate AUC hides where they break.
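For anyone who wants to run the same kind of per-cell check on their own detector, here is a minimal sketch of how an aggregate AUC can hide a weak cell. It is not the benchmark's actual code: the function names, the toy scores, and the cell labels are all made up for illustration, and I'm assuming "spread" means max per-cell AUC minus min per-cell AUC.

```python
from collections import defaultdict


def auc(labels, scores):
    """ROC AUC as the probability that a fake (label 1) outscores a real (label 0).

    Ties count as half a win. Pairwise and quadratic, so fine for a sketch only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_cell_auc(rows):
    """rows: iterable of (cell, label, score). Returns {cell: AUC}."""
    by_cell = defaultdict(lambda: ([], []))
    for cell, label, score in rows:
        by_cell[cell][0].append(label)
        by_cell[cell][1].append(score)
    return {cell: auc(labels, scores) for cell, (labels, scores) in by_cell.items()}


def auc_spread(cell_aucs):
    """Assumed definition: best cell AUC minus worst cell AUC."""
    return max(cell_aucs.values()) - min(cell_aucs.values())


# Hypothetical toy data: (demographic cell, 1 = generated / 0 = real, detector score)
rows = [
    ("cell_a", 1, 0.9), ("cell_a", 1, 0.8), ("cell_a", 0, 0.2), ("cell_a", 0, 0.1),
    ("cell_b", 1, 0.6), ("cell_b", 1, 0.3), ("cell_b", 0, 0.5), ("cell_b", 0, 0.4),
]

aggregate = auc([label for _, label, _ in rows], [score for _, _, score in rows])
cells = per_cell_auc(rows)

print("aggregate AUC:", aggregate)      # 0.875
print("per-cell AUC:", cells)           # cell_a: 1.0, cell_b: 0.5
print("spread:", auc_spread(cells))     # 0.5
```

In this toy case the aggregate looks respectable at 0.875 while one cell sits at chance. For real data you would likely swap the pairwise loop for scikit-learn's roc_auc_score, which scales much better.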
Next, I'll most likely be targeting liveness detection and working with a more frontier architecture. If you have a model in mind for the next benchmark, please comment.
Read the full blog post here: https://babalolad.substack.com/p/i-tested-14-deepfake-detectors-on
Access the dataset here: https://huggingface.co/datasets/danb21/synthetic-face-sdxl-instantid-bench