









As you probably know, "Open image in new tab" doesn’t work by default on Reddit: it captures the request and embeds the image again. It’s annoying.
Here’s a snippet I paste into the console to access the image itself:
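// $ is the DevTools console shorthand for document.querySelector
i0 = $('.max-h-full')
s = i0.src
// Fetch the image bytes and wrap them in a local blob: URL
r = await fetch(s)
b = await r.blob()
u = URL.createObjectURL(b)
// Replace the whole page with a plain <img> pointing at the blob
i = document.createElement('img')
i.src = u
document.body.replaceChildren()
document.body.append(i)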
This leaves only the image on the page, so you can zoom in further or save it. Any other methods?
In a thread on r/datahoarder, I got help downloading a whole TikTok channel. Now I’m thinking about trying to make the on-screen text searchable. I used this Deno script (yah I used AI 💀) to 1) extract frames every so often, 2) run OCR on the frames, and 3) generate a WebVTT file. The results are pretty meh, as shown in the image.
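For reference, the last step of that pipeline (turning per-frame OCR text into WebVTT cues) can be sketched roughly like this. This is not the actual script, just a minimal illustration: the input shape (`time` in seconds, `text` from OCR) and the function names `formatTimestamp`, `cleanCueText`, and `toWebVTT` are made up for the example, and each cue is assumed to last one sampling interval.

```javascript
// Hypothetical input: one entry per sampled frame, e.g. { time: 12, text: "some caption" }.

// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms % 1000, 3)}`;
}

// Cue text may not contain blank lines or the "-->" arrow, so strip both.
function cleanCueText(text) {
  return text
    .replaceAll("-->", "->")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .join("\n");
}

// Build the file: a "WEBVTT" header, then one cue per frame that has any text.
// `interval` is the assumed number of seconds between sampled frames.
function toWebVTT(frames, interval) {
  const cues = frames
    .map((f) => ({ time: f.time, text: cleanCueText(f.text) }))
    .filter((f) => f.text !== "")
    .map(
      (f) =>
        `${formatTimestamp(f.time)} --> ${formatTimestamp(f.time + interval)}\n${f.text}`
    );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

Calling `toWebVTT(frames, 2)` on frames sampled every two seconds gives one cue per non-empty frame, which is also why noisy OCR output turns straight into noisy subtitles.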
It’s not useless output, but there’s tons of noise.
What about a consensus approach?
Not sure if this is the right term, but I found myself thinking about how the text is stable with respect to the frame, whereas the speaker is moving around. It seems like OCR would be more successful if I computed the "average" of several images in sequence (a bit like compression, come to think of it, but finding the parts that would be compressed…).
Anyway, if I wanted to try this, do you have any suggestions about how I might get it done? Maybe with ImageMagick?
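To make the "average" idea concrete, here is a rough sketch of combining several frames pixel by pixel. It assumes each frame has already been decoded to a grayscale `Uint8Array` of the same size (the decoding step is not shown, and `meanFrame` / `medianFrame` are names invented for this example). The mean blurs anything that moves, while the median tends to keep whatever value most frames agree on, which is closer to a "consensus". ImageMagick has an `-evaluate-sequence` operator (with `mean` and `median` among its methods) that looks like it does the same thing across a list of images, though the exact invocation is worth checking against its docs.

```javascript
// frames: array of Uint8Array, grayscale 0-255, all the same length (assumption).

// Per-pixel mean: moving content smears out, static content stays sharp.
function meanFrame(frames) {
  const n = frames[0].length;
  const out = new Uint8Array(n);
  for (let i = 0; i < n; i++) {
    let sum = 0;
    for (const frame of frames) sum += frame[i];
    out[i] = Math.round(sum / frames.length);
  }
  return out;
}

// Per-pixel median: a pixel keeps its value if most frames agree on it,
// so a speaker passing through a few frames is largely ignored.
function medianFrame(frames) {
  const n = frames[0].length;
  const count = frames.length;
  const out = new Uint8Array(n);
  const column = new Array(count);
  const mid = count >> 1;
  for (let i = 0; i < n; i++) {
    for (let f = 0; f < count; f++) column[f] = frames[f][i];
    column.sort((a, b) => a - b);
    out[i] =
      count % 2 === 1
        ? column[mid]
        : Math.round((column[mid - 1] + column[mid]) / 2);
  }
  return out;
}
```

The combined frame would then go through OCR instead of the individual screencaps.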
Another tricky detail is how not to lose the timestamps: if I’m computing the average of a moving window of screencaps, some windows will be better than others because they will contain only one caption…
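One possible way to keep the timestamps, sketched under some assumptions: carry each window's first and last frame time along with it, and score how much the pixels inside the window disagree. A window that straddles a caption change should score noticeably worse than one that sits inside a single caption. The frame shape (`time`, `pixels`) and the name `windowStats` are hypothetical, and the score only means something if `pixels` is cropped to roughly the caption area, since otherwise the speaker's movement dominates.

```javascript
// frames: [{ time: seconds, pixels: Uint8Array }], sorted by time,
// all pixel arrays the same length (assumption).
function windowStats(frames, size) {
  const results = [];
  for (let start = 0; start + size <= frames.length; start++) {
    const win = frames.slice(start, start + size);
    const n = win[0].pixels.length;
    let totalDeviation = 0;
    for (let i = 0; i < n; i++) {
      // Mean of this pixel across the window...
      let sum = 0;
      for (const f of win) sum += f.pixels[i];
      const mean = sum / size;
      // ...and how far each frame strays from it.
      for (const f of win) totalDeviation += Math.abs(f.pixels[i] - mean);
    }
    results.push({
      start: win[0].time, // timestamp of the first frame in the window
      end: win[size - 1].time, // timestamp of the last frame in the window
      instability: totalDeviation / (n * size), // 0 means all frames identical
    });
  }
  return results;
}
```

Windows with low `instability` are the candidates worth averaging and sending to OCR, and their `start` / `end` values can be used directly as cue times.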
Anyway, any suggestions welcome. 🙏