u/OkButterscotch8174

I built a tool that tells you where any photo was taken using AI — here's what I learned about geolocation accuracy

Been lurking here for a while and finally shipped something worth sharing.

A few months ago I got obsessed with a simple question: how accurately can AI determine the location of a random photo? Not just "probably Europe", but actual coordinates.

Turns out it's a genuinely hard problem. The naive approach (just ask Claude/GPT to look at the image) gets you maybe 40-50% accuracy on urban photos and falls apart completely on rural ones.

So I went deeper. The pipeline I ended up with:

  1. EXIF extraction first — if GPS metadata exists, done instantly, zero AI needed. Covers ~20% of mobile photos.
  2. Visual feature extraction via a fast/cheap model — pulls out specific searchable elements (architecture style, visible text, infrastructure details) with a specificity score. Low-score generic queries get dropped before they waste API calls.
  3. Google Vision Web Detection + Landmark Detection in parallel — if the image exists somewhere on the web or contains a known landmark, this catches it.
  4. Web search on the high-specificity queries — feeds real-world results back into the final reasoning step.
  5. Final reasoning with a stronger model that gets the image + all aggregated context. Contradiction detection built in — if web results point to 3+ different locations it flags it and tells the model to weight visual analysis higher.
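
A rough Python sketch of how the gating in steps 1, 2 and 5 could look. This is not the code behind the tool: the function names, the `Query` shape, and the 0.6 specificity cutoff are all assumptions made up for illustration. Only the "3+ different locations" rule comes from the description above.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the post only says low-score queries get dropped.
MIN_SPECIFICITY = 0.6
# From step 5: 3+ distinct locations in web results counts as a contradiction.
CONTRADICTION_THRESHOLD = 3


@dataclass
class Query:
    text: str
    specificity: float  # assumed scale: 0.0 (generic) to 1.0 (very specific)


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Step 1: convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def filter_queries(queries, min_score=MIN_SPECIFICITY):
    """Step 2: keep only queries specific enough to be worth a search call."""
    kept = [q for q in queries if q.specificity >= min_score]
    return sorted(kept, key=lambda q: q.specificity, reverse=True)


def has_contradiction(web_locations, threshold=CONTRADICTION_THRESHOLD):
    """Step 5: flag when web results point to `threshold` or more distinct places."""
    distinct = {loc.strip().lower() for loc in web_locations if loc.strip()}
    return len(distinct) >= threshold


if __name__ == "__main__":
    queries = [Query("red brick building", 0.2), Query("shop sign with street name", 0.9)]
    print([q.text for q in filter_queries(queries)])
    print(has_contradiction(["Paris", "Lyon", "Porto"]))
```

When `has_contradiction` returns true, the final prompt would carry a flag telling the stronger model to trust the visual analysis over the web results. Comparing location strings by lowercase match is deliberately naive; a real version would probably compare geocoded coordinates.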

Total cost per analysis: under €0.02. Most of the accuracy gains came from steps 2-4, not from using a more expensive model.

The interesting failure cases:

- Photos with visible text are almost always located correctly

- Rural/forest photos are still genuinely hard regardless of pipeline

- The confidently wrong AI answers dropped significantly once I added the web search layer

Built it as a SaaS with multi-prediction output (up to 4 ranked hypotheses with confidence %), radius estimate, and a 3D map view.
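
One way the multi-prediction output could be shaped, as a minimal Python sketch. The `Hypothesis` fields and the idea of normalizing raw scores across the top candidates to get a confidence % are assumptions for illustration, not how the tool necessarily computes it; the cap of 4 ranked results is from the post.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    label: str
    lat: float
    lon: float
    score: float      # raw, unnormalized score (assumed input)
    radius_km: float  # uncertainty radius around (lat, lon)


def rank_hypotheses(candidates, max_results=4):
    """Return up to `max_results` (hypothesis, confidence %) pairs, best first."""
    top = sorted(candidates, key=lambda h: h.score, reverse=True)[:max_results]
    total = sum(h.score for h in top)
    if total <= 0:
        return []
    # Confidence is each score's share of the kept candidates' total.
    return [(h, round(100 * h.score / total, 1)) for h in top]
```

Normalizing only over the kept candidates means the percentages always sum to roughly 100, which reads well in a UI but can overstate certainty when every candidate is weak.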

Still early but the technical side was interesting enough to share.

Happy to go deep on any part of the pipeline if useful.

u/OkButterscotch8174 — 1 day ago

https://preview.redd.it/p0onwzf2oc2h1.png?width=1906&format=png&auto=webp&s=c05cf5fec5a45adf41a7c70cc8545aa764a1a9bc

https://preview.redd.it/8hc4czf2oc2h1.png?width=1822&format=png&auto=webp&s=691c6df429b84ec3a7f8e6c938e2b859d401477d
