u/grio43

Trying to find a Cloudflare scraping solution

I am scraping a few TB worth of data.
In my experience, 5G proxies in real-world use only reach 4G speeds, and from what I have seen most providers throttle you.
Residential proxies would be too expensive, since the cost would add up fast.

From what I understand, Cloudflare curb-stomps datacenter proxies now.

The overall project is about 10 TB, and I have 3 TB left to get. I was able to get the majority of it with my personal IP before captchas started hitting. From what I understand, captcha solvers don't work unless you have a proxy.
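When captchas start showing up on a personal IP, one common first step is to slow down rather than push harder. Below is a minimal sketch of exponential backoff with jitter for a single download. It assumes the site signals throttling with HTTP 403, 429, or 503 (an assumption, not something confirmed in the post). `fetch_with_backoff` and its parameters are hypothetical names, and `fetch` is injected so the retry logic can be tested offline.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical set of "slow down" status codes; adjust to what the site returns.
THROTTLE_STATUSES = {403, 429, 503}


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, bytes]],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(url), backing off exponentially while the site throttles us.

    fetch returns (status_code, body) and is injected so the retry logic
    can be tested without touching the network. Returns the body on
    HTTP 200, or None on a non-retryable status or when all attempts
    are throttled.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status not in THROTTLE_STATUSES:
            return None  # e.g. 404: retrying will not help
        if attempt + 1 < max_attempts:
            # Exponential backoff (2, 4, 8, ... seconds with the defaults),
            # capped at max_delay, with full jitter so parallel workers
            # do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return None
```

In practice `fetch` would wrap whatever HTTP client the scraper already uses. The idea is that repeated 403/429/503 responses are treated as a signal to back off instead of retrying immediately.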

reddit.com
u/grio43 — 6 days ago

Cloudflare captcha

Hello,

I tried a few 5G proxy providers, and their real-world speeds are on the low end of 4G.
I was wondering whether datacenter proxies would be okay with a captcha solver. I am scraping images through direct links to each image, so slow TTFB (time to first byte) or download speed will greatly hinder me.

Any recommendations? So far I have gotten 4 million of the 6.7 million images I was targeting without any proxy at all.
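Because the images are direct links, comparing proxies mostly comes down to per-request latency and throughput. Below is a rough way to measure both with only the Python standard library. `measure_ttfb` is a hypothetical helper. TTFB here is measured up to the first body byte, which is close to, but not identical to, how browser dev tools define it.

```python
import time
import urllib.request
from typing import Optional, Tuple


def measure_ttfb(
    url: str,
    timeout: float = 30.0,
    opener: Optional[urllib.request.OpenerDirector] = None,
) -> Tuple[float, float, int]:
    """Return (ttfb_seconds, total_seconds, bytes_received) for one GET.

    TTFB is measured from just before the request is sent until the first
    byte of the response body is read, so it includes connection setup,
    TLS, and the response headers. Pass an opener built with a
    ProxyHandler to measure through a proxy.
    """
    opener = opener or urllib.request.build_opener()
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as resp:
        first = resp.read(1)  # blocks until the first body byte arrives
        ttfb = time.perf_counter() - start
        rest = resp.read()  # drain the remainder of the body
    total = time.perf_counter() - start
    return ttfb, total, len(first) + len(rest)
```

To compare providers, call it on the same image URLs through each candidate proxy, for example with an opener from `urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))`, where `proxy_url` is whatever the provider gives you. A single request is noisy, so averaging over a few dozen images says more than one sample.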

u/grio43 — 7 days ago

[task] Long-form video editing

I'm possibly looking for a long-form video editor.

I record my audio in post. Mostly I'm looking for color correction and video cleanup, plus covering up my face if it appears in a video.

I'm not 100% sure I'll go with an editor yet. The most time-consuming parts for me are recording audio in post and finding music on Pixabay.

I record my videos with a Sony a7 IV and change lenses as needed for what I'm doing.

Feel free to shoot me a DM with questions or offers.

u/grio43 — 8 days ago

Need some advice on hand shake

watch?v=RQxgzuMXIhk

I am using a Sony a7 with stabilized lenses.

I have bad anxiety and hand shakes. I am not sure a camera gimbal would be right, since the locked target location would change.

Any advice would be appreciated.

u/grio43 — 11 days ago

Trained a ViT model from scratch for auto-tagging

I recently trained a new anime image tagging model. To prep the data, I used SmilingWolf v3 to fix 300k bad tags and fill in 1M missing ones. I also trained an initial baseline model to help identify and add around 30k low-frequency tags.
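One generic way to use a teacher tagger for this kind of cleanup is to add tags it is very confident about and drop existing tags it is confident are wrong. The sketch below shows that idea. It is not necessarily how the cleanup above was done. `refine_tags` and both thresholds are hypothetical, and tags outside the teacher's vocabulary are left untouched.

```python
from typing import Dict, Set


def refine_tags(
    human_tags: Set[str],
    teacher_probs: Dict[str, float],
    add_threshold: float = 0.85,   # hypothetical: add missing tags above this
    drop_threshold: float = 0.05,  # hypothetical: drop existing tags below this
) -> Set[str]:
    """Merge a teacher tagger's predictions into one image's existing tags.

    - Tags the teacher scores >= add_threshold are added if missing.
    - Tags the teacher scores <= drop_threshold are removed as likely bad.
    - Tags scored in between keep their human label.
    - Tags the teacher has no score for (outside its vocabulary) are kept.
    The input set is not modified.
    """
    refined = set(human_tags)
    for tag, p in teacher_probs.items():
        if p >= add_threshold:
            refined.add(tag)
        elif p <= drop_threshold:
            refined.discard(tag)
    return refined
```

Keeping a wide band between the two thresholds means the teacher only overrides the original labels when it is very sure, which limits how much of the teacher's own mistakes leak into the training data.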

The current V1 model is a 320x320 ViT. V1.1 is currently training at 448x448, and the higher resolution is already improving accuracy. My next goal is to wait for a 2025 dataset, clean it heavily, and train from scratch with better vocab structures (e.g., artist:name).
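Part of why the 448x448 run helps is that the model sees more patch tokens per image. Below is a quick back-of-the-envelope sketch. It assumes a patch size of 16 (the post does not state the patch size) and ignores the class token. `num_patches` and `attention_cost_ratio` are illustrative helpers.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Patch tokens a ViT produces for a square image (class token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError(f"{image_size} is not divisible by patch size {patch_size}")
    side = image_size // patch_size
    return side * side


def attention_cost_ratio(new_size: int, old_size: int, patch_size: int) -> float:
    """Relative self-attention cost, which grows with the square of the token count."""
    ratio = num_patches(new_size, patch_size) / num_patches(old_size, patch_size)
    return ratio ** 2


for size in (320, 448):
    print(size, num_patches(size, 16))  # 320 -> 400 tokens, 448 -> 784 tokens
print(round(attention_cost_ratio(448, 320, 16), 2))  # about 3.84
```

Under that assumption, the 448 model gets about 1.96x as many tokens and roughly 3.8x the self-attention compute per image. The MLP layers only scale linearly with token count, so the total cost increase is somewhere between those two numbers.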

You can find the model, card, and demo space on Hugging Face: https://huggingface.co/Grio43/OppaiOracle
Live use of the model: https://huggingface.co/spaces/Grio43/OppaiOracle

CPU-based tagger:
https://huggingface.co/spaces/Grio43/OppaiCPU

Self hosted web interface:
https://huggingface.co/Grio43/OppaiOracle/tree/main/web_interface

Someone had issues loading the interface on their local machine. Please DM me if you have trouble. I need to figure out standalone setup issues for general users.

u/grio43 — 13 days ago