r/ProxyEngineering

▲ 23 r/ProxyEngineering+1 crossposts

Testing 8 different proxy providers

Hey folks. We ran a quick benchmark last week across 8 residential proxy providers for a price-tracking use case we're working on. Sharing the numbers since most "comparisons" out there are clearly affiliate bait.

Methodology, disclosed upfront: 500 requests per provider on the same rotating residential tier, tested against Cloudflare-protected targets, with Amazon and Walmart included. We measured the 200 OK rate and the median response time. No free trials; we paid for every provider out of the business budget.

| Provider | Success Rate | Median Latency | Additional Comments |
|---|---|---|---|
| Oxylabs | 97.8% | 0.74s | Good consistent geo coverage, few error codes |
| Decodo | 97.1% | 0.79s | Strong value at mid-tier pricing |
| Bright Data | 96.6% | 0.83s | Pricing is crazy for smaller scale, but stable IPs |
| NetNut | 95.2% | 0.69s | Fast but more blocks on Walmart |
| SOAX | 94.7% | 0.95s | Good for geo-targeting, slower in general |
| NodeMaven | 93.8% | 1.02s | Surprised us, better than expected, we haven't heard of NodeMaven before, decided to try it out |
| Rayobyte | 91.3% | 1.14s | Decent for low-tier targets |
| Webshare | 89.6% | 1.31s | Cheap, but comes with downsides, such as latency and success rate |
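
For anyone who wants to reproduce the two columns above on their own request logs, here is a minimal sketch of how the numbers could be computed. It assumes each request was logged as a (status code, elapsed seconds) pair and that the median is taken over all requests, not just the successful ones; the post doesn't specify either, and the `summarize` helper and sample data are made up for illustration.

```python
from statistics import median


def summarize(results):
    """Return (success rate in %, median latency in s) for one provider.

    `results` is a list of (status_code, elapsed_seconds) tuples.
    Only an exact 200 counts as a success, matching the "200 OK rate" metric.
    """
    if not results:
        raise ValueError("no results to summarize")
    ok = sum(1 for status, _ in results if status == 200)
    success_rate = 100.0 * ok / len(results)
    median_latency = median(elapsed for _, elapsed in results)
    return round(success_rate, 1), round(median_latency, 2)


# Hypothetical sample: four requests, one of them blocked with a 403.
sample = [(200, 0.70), (200, 0.80), (403, 1.50), (200, 0.74)]
print(summarize(sample))  # (75.0, 0.77)
```

One caveat: a 200 can still be a challenge page or an empty shell, so checking the response body for the expected content would make the success rate stricter.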

Oxylabs takes the top spot on consistency and geo-targeting. Decodo punches above its price. Bright Data's IP quality is still up there, but the cost structure only makes sense if you're pushing serious volume; it's mainly oriented towards enterprises, which is understandable. If you're running under 50GB/month, skip Bright Data entirely, since the per-GB cost at that volume doesn't add up. Unless, of course, you'd like to test it out like we did.

Anyone run these against tougher targets like Ticketmaster or Zillow? Would like to understand how the performance shifts on all of these providers once you target something really "heavy".

Any suggestions for other providers to test? I know the main ones are the big players in the market, but sometimes "mid-tier" providers work well too at decent pricing if you're not scaling. I guess it's also possible to just use dedicated scraping solutions. They might be cheaper, and you don't have to worry about maintaining the IPs, since all of the infra is managed by the provider.

reddit.com
u/night_2_dawn — 1 day ago

Anyone else using proxies and still getting inconsistent results?

Proxies definitely help with access, but I learned they’re not the whole setup. At first I thought changing IPs was enough, but if everything still runs from the same browser setup, same device traces, same habits, and same workspace, accounts can still end up feeling connected.

That’s why I started using Geelark. I can match different proxies with separate cloud phone environments, so each account feels more isolated instead of everything overlapping. Made testing way easier because I wasn’t just changing the IP anymore, I was changing the whole environment too.

Anyone else using proxies but still getting weird account behavior sometimes?


Excel can scrape websites directly and dump the data into your sheet??

I had to share it with you all. Every week I'd manually open a supplier's pricing page, copy the table, paste it into Excel, fix the formatting that always broke, delete the garbage rows, and repeat for 24 different sites. Easily around 2 hours every Friday. I just assumed that's how it worked. Nobody told me there's another way. Go figure. Turns out VBA can open a web page, parse the HTML, and pull exactly what you need, no browser, no copy-paste, no cleanup:

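Sub ScrapePrices()
    ' Late binding, so no extra references are needed in the VBA editor.
    Dim http As Object, html As Object
    Dim priceRows As Object, i As Long

    ' Fetch the page synchronously (False = wait for the response).
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://example-supplier.com/prices", False
    http.Send
    If http.Status <> 200 Then
        MsgBox "Request failed with HTTP status " & http.Status
        Exit Sub
    End If

    ' Parse the response in an in-memory HTML document.
    Set html = CreateObject("HTMLFile")
    html.body.innerHTML = http.responseText

    ' Write the text of every element with class "price-row" into column A.
    Set priceRows = html.getElementsByClassName("price-row")
    For i = 0 To priceRows.Length - 1
        ActiveSheet.Cells(i + 1, 1).Value = priceRows(i).innerText
    Next i
End Sub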

It hits the URL, grabs the HTML, finds every element with the class price-row, and writes each one into your sheet. The whole thing runs in under 15 seconds. What used to ruin my Friday afternoons now happens while I get coffee and chat with colleagues, without feeling anxious about the work still waiting for me.

You do need to peek at the page source (F12 in your browser) to find the right class or tag to target, but that takes about 2 minutes once you know what you're looking for. It works on any site that doesn't require a login or JavaScript to load the content. If you've been manually copy-pasting from the same websites over and over, you're probably one macro away from never doing it again.

TLDR: The code uses MSXML2.XMLHTTP and HTMLFile to fetch and parse a webpage.

If y'all have further improvements, let me know. I'm fairly new to the automation world, but boy do I love it already.

u/HezzyBear_97 — 2 days ago
▲ 8 r/ProxyEngineering+1 crossposts

How to tell if your proxies are getting detected

A lot of people think proxies either “work” or “don’t work”.

In reality, proxy detection usually happens gradually.

Your requests may still go through while websites quietly lower your trust score in the background.

Some common signs that your proxies are starting to get detected:

  • captcha frequency suddenly increases
  • logins require extra verification
  • sessions expire much faster
  • account actions stop getting reach/impressions
  • checkout flows fail more often
  • certain accounts get flagged repeatedly
  • request success rates slowly decline over time

One of the biggest mistakes beginners make is only testing whether a site loads.

That tells you almost nothing about long-term proxy quality.

What actually matters is:

  • session stability
  • IP reputation
  • fraud scoring
  • subnet quality
  • consistency across multiple days

A proxy setup that survives 50 requests can still completely fail at 5,000 requests.

If you’re serious about testing proxies, monitor:

  • captcha rates
  • account health
  • trust score behavior
  • request success over time
  • how different subnets perform

Most detection systems today care much more about behavioral patterns and IP reputation than just “is this a proxy”.
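
To make "request success over time" concrete, here is a rough sketch of one way to spot the gradual decline described above. It assumes you already log each request as a simple success/failure flag in chronological order; the window size of 500 and the 5-point drop threshold are arbitrary illustration values, not recommendations from the post.

```python
def windowed_success(outcomes, window=500):
    """Split a chronological list of booleans into fixed-size windows
    and return the success rate of each window."""
    rates = []
    for start in range(0, len(outcomes), window):
        chunk = outcomes[start:start + window]
        rates.append(sum(chunk) / len(chunk))
    return rates


def is_declining(rates, drop=0.05):
    """True if the latest window is at least `drop` below the first one."""
    return len(rates) >= 2 and (rates[0] - rates[-1]) >= drop


# Hypothetical log: the first 500 requests succeed 98% of the time,
# the next 500 only 90% of the time.
outcomes = [True] * 490 + [False] * 10 + [True] * 450 + [False] * 50
rates = windowed_success(outcomes)
print(rates, is_declining(rates))
```

The same windowing works for captcha rates: log a flag per request for "got a captcha" and watch whether the per-window rate climbs.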

u/tinyprincesslol — 1 day ago
▲ 34 r/ProxyEngineering+2 crossposts

FireCrawl just hit 121k GitHub stars and I have a LOT of questions, the hype, the pricing trap, and what's actually going on

Okay, so I've been in the web scraping game for quite some time now. I was browsing the GitHub top-100 stars list yesterday and saw FireCrawl sitting at #73 globally with over 120k stars. That's ahead of Node.js. That puts it in the same breath as projects that have been around for a decade.

For context, at the end of 2024 they celebrated 20k stars. They raised their Series A in August 2025 at 43k stars. Now it's 120k+. That's roughly 3x growth in under a year, for what is essentially a web scraping API aimed at AI developers. What in the world happened? How did a scraping API beat Node.js in stars?

The repo describes itself as "search, scrape, and clean the web for AI agents." Useful, I'd say. But 120k-star useful?? There are open-source alternatives like Crawl4AI with 65k stars doing very similar things for free. Is it just incredible timing with the AI/RAG pipeline wave, or is there a genuine technical moat here that the community is rewarding?

My main concern: is the star count organic? I'm not accusing anyone of anything, but a jump from ~20k to 120k in roughly 16 months is one of the most aggressive trajectories I've seen outside of projects with massive corporate backing (and I'm thinking of Microsoft's markitdown). FireCrawl got a $14.5M Series A from Nexus and YC. Is any of that marketing spend showing up in developer mindshare as stars? I'm genuinely curious how you break into the GitHub top-100 that fast.

Additionally, can someone explain the pricing to me without making my head hurt? On the surface it looks simple: 1 credit = 1 page scraped. But the moment you turn on anything useful (AI extraction, JSON output, Enhanced Mode), you're burning 5–9 credits per page. The Hobby plan at $16/month gives you 3,000 credits, which sounds great until you realize that's only ~333 pages with JSON + Enhanced Mode enabled. A 500-page website on the Hobby plan exceeds your entire monthly allowance in a single scrape.
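
The credit math is easy to check. A quick sketch using only the figures quoted in this post (3,000 credits on the Hobby plan, 1 credit for a plain page, 5 to 9 credits with the extras on); the helper name is made up:

```python
HOBBY_CREDITS = 3000  # monthly credits on the Hobby plan, per the post


def pages_per_month(monthly_credits, credits_per_page):
    """Whole pages you can scrape before the monthly credits run out."""
    return monthly_credits // credits_per_page


print(pages_per_month(HOBBY_CREDITS, 1))  # 3000 pages at 1 credit each
print(pages_per_month(HOBBY_CREDITS, 5))  # 600 pages at the low end of 5-9
print(pages_per_month(HOBBY_CREDITS, 9))  # 333 pages at the high end
print(500 * 9)                            # 4500 credits for a 500-page site
```

So a 500-page site at 9 credits per page needs 4,500 credits, which is 1.5x the monthly allowance.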
Now before someone says "just self-host it": that's an option, yes, it's AGPL-3.0 open source. But the self-hosted version is deliberately crippled: no Fire-Engine (their proprietary anti-bot system), no proxy rotation, no Actions endpoint, no browser sandbox. The stuff that actually makes it worth paying for is cloud-only. AGPL also means commercial self-hosting has licensing implications your legal team needs to look at, and that's if you're within a company. If you're an individual developer, well, that can get quite expensive.

To be fair, the product genuinely seems excellent. Zapier, Shopify, Replit, and Apple are customers. The clean markdown output uses 67% fewer tokens than raw HTML. The MCP server integration means you can pipe live web data straight into Cursor or Claude. That's real value, and the community clearly feels it.

But I keep coming back to the same question: is this one of the best-marketed developer tools of the AI era, or is it genuinely the best technical solution? Someone kindly explain what is going on with FireCrawl.

u/Gwapong_Klapish — 4 days ago
▲ 14 r/ProxyEngineering+1 crossposts

Self-hosted log aggregation for a small homelab?

I'm trying to set up centralized log collection for my homelab, currently running about 8 machines across a mix of Proxmox VMs and bare metal. I've looked at hosted solutions like Datadog and Logtail, but I want something I control entirely, ideally without phoning home or requiring a cloud account. What I have in mind: open-source, lightweight enough to run on a small NUC, supports structured logs (JSON), and has a decent query UI. Grafana Loki keeps coming up. Is it actually usable at this scale, or is it overkill? If you've moved away from ELK because of resource usage, I'd like to hear about it. All suggestions are welcome.

u/WarAndPeace06 — 7 days ago

Proxy Concern


Good day, I have a question regarding shared static residential proxies.

I uploaded my first video on TikTok and got 60k+ views from tier-one countries, with a proper warm-up procedure and everything. However, 4 days later I got shadowbanned, as in 0 views. Is it because the static residential proxy I used was shared, which led to it getting abused and eventually to me getting shadowbanned on TikTok?

I ask because I made one brand-new account with the same IP from the provided proxy info and properly warmed it up for 3 days straight. Then, when I decided to upload a video, I got 0 views.

Should I buy a new static residential proxy from IPRoyal, but the private one instead of the shared subscription?

u/Johndavis70 — 7 days ago
▲ 25 r/ProxyEngineering+4 crossposts

ChatGPT lawsuit opinions

I've been following the OpenAI lawsuits, and there's one detail I can't stop thinking about: a 19-year-old asked ChatGPT about mixing sedatives. It acknowledged the combo "could be risky", then gave him dosages anyway, added Benadryl to the recommendation, and told him to go lie in a dark room instead of seeking help. He died. Source.

The Canadian case is somehow worse. OpenAI's own safety team flagged the shooter's account for "gun violence activity and planning" months before the attack and pushed to notify authorities. Management said no. Source.

At some point "we're just a general-purpose tool" stops being a defense. Where that point lies is what these trials are actually going to decide.

Guardrails are coming whether the industry wants them or not. Every lawsuit forces a paper trail. And when harmful outputs become a liability, the instinct is aggressive filtering, mandatory escalation triggers, and activity logging with retention policies. That's fine for consumer chat, but for more technical users it's going to be brutal.

The real risk for scraping and agentic workflows is over-correction. If "how do I access this data at scale" gets flagged the same way "how do I build a weapon" does, open-weights models win by default. It would make me want to just run a model locally and skip the compliance layer entirely. The smarter play would be tiered access: stricter defaults for consumer products, more permissive behavior for verified API users with actual business context. But that requires product nuance, and right now OpenAI is in legal defense mode.

My bet is that we should expect more API friction over the next 12-18 months. Local models are about to get a lot more interesting.

u/ahiqshb — 9 days ago
▲ 44 r/ProxyEngineering+2 crossposts

Stop throwing residential proxies at everything, your fingerprint is the actual problem

Aight, listen up, Imma keep it real with you. I know this is going to rub some people the wrong way, but I've been doing this long enough to feel confident saying it: most of you don't have a proxy problem, you have a fingerprint problem, and you're spending $200+/month on residential bandwidth to brute-force your way around it.

I get it. Residential proxies feel like the safe default. The IP looks clean (for those of you checking fraud scores), it passes basic geo checks, and every provider markets them like they're the golden ticket. But if your TLS fingerprint screams "Python requests library" or your browser automation is leaking navigator properties that no real Chrome session would ever have, it genuinely does not matter how pristine your IP is. Cloudflare, Akamai, DataDome: they all fingerprint the client now, not just the address. A burnt datacenter IP with a properly spoofed JA3 hash and realistic header order will outperform a fresh residential IP attached to a naked requests.get() call nine times out of ten.

I ran a test a few weeks ago across about 15 mid-sized e-commerce sites protected by Cloudflare. Datacenter proxies with curl-impersonate had a ~91% success rate. Residential proxies with default Python requests headers? Around 60%. The residential IPs were objectively "better" IPs; they just didn't matter because the request itself was the red flag.

I think the proxy industry benefits from people not understanding this. The less you know about TLS fingerprinting and HTTP/2 header frames, the more bandwidth you burn through rotating IPs trying to find one that "works." That churn is literally their revenue model.

Before you upgrade your proxy plan, spend an afternoon with curl-impersonate or look into how got-scraping handles fingerprint randomization. Learn what JA3 and JA4 fingerprints actually are and how to check yours against real browser signatures. You might find that the $30/month datacenter plan you dismissed does the job just fine once your client stops identifying itself as a bot on the first handshake.

Now, I'm not saying residential proxies are useless. For account management, social media automation, and anything session-heavy where the IP itself gets scored over time, they're still the right call. But for scraping? Fix your fingerprint first. Then decide if you actually need the expensive IPs.
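
For anyone wondering what a JA3 fingerprint actually is, here is a small sketch of how the hash is derived. JA3 joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) into one string, with commas between fields and dashes between values, and takes the MD5 of it. The numeric values below are made up for illustration and are not any real browser's handshake; real implementations also drop GREASE values first, which this sketch skips.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five fields joined by ',', values joined by '-'."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(ja3):
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Two hypothetical ClientHellos that differ only in cipher order.
hello_a = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
hello_b = ja3_string(771, [4866, 4865, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

print(hello_a)
print(ja3_hash(hello_a) == ja3_hash(hello_b))  # False
```

Nothing in that string depends on the IP address, which is the whole point: rotating proxies changes where the handshake comes from, not what it looks like.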

u/MemeLord-Jenkins — 11 days ago
▲ 23 r/ProxyEngineering+2 crossposts

Stop hardcoding your scraper logic: use the browser's Copy as cURL first

My two cents. Most people spend a bunch of time reverse engineering request headers when the browser will just hand them to you. Next time you find an API call in the Network tab, right-click it and hit Copy as cURL. Paste it into your terminal and it works instantly, cookies, headers and all. From there you can import it directly into Postman or use a tool like curlconverter to turn it into clean Python requests code in seconds. The browser already did the hard work of figuring out what the server needs. There's no reason to reconstruct that by hand.
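
To show roughly what a converter does with the copied command, here is a deliberately tiny sketch. This is not how curlconverter is implemented; it only understands a bare URL, -H/--header and -b/--cookie, ignores every other flag, and the example.com command is a made-up stand-in for what the browser would give you.

```python
import shlex


def parse_curl(command):
    """Extract (url, headers, cookie string) from a 'Copy as cURL (bash)' command.

    Only a bare URL, -H/--header and -b/--cookie are handled; other flags
    are skipped, so treat this as an illustration, not a full parser.
    """
    # Join the backslash-continued lines before tokenizing.
    tokens = shlex.split(command.replace("\\\n", " "))
    url, headers, cookies = None, {}, None
    i = 1  # tokens[0] is "curl"
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header") and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        elif tok in ("-b", "--cookie") and i + 1 < len(tokens):
            cookies = tokens[i + 1]
            i += 2
        else:
            if url is None and not tok.startswith("-"):
                url = tok
            i += 1
    return url, headers, cookies


# Hypothetical command as copied from the Network tab.
copied = """curl 'https://example.com/api/prices?page=1' \\
  -H 'accept: application/json' \\
  -H 'user-agent: Mozilla/5.0' \\
  -b 'session=abc123'"""

url, headers, cookies = parse_curl(copied)
print(url)
print(headers)
print(cookies)
```

From there, `requests.get(url, headers=headers)` replays the same call from Python, with the cookie string added as a `cookie` header if the endpoint needs it.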

u/Bharath0224 — 11 days ago

The ASN dimension nobody talks about enough

Here's something I don't see discussed much: ASN-level scoring is getting granular. It's not just "is this a datacenter ASN" anymore. Some residential providers concentrate their IPs in a handful of ASNs, and those specific ASNs are getting burned because the ratio of bot-to-human traffic from them is absurdly high. I've seen targets where AS7922 (Comcast) IPs sail through but AS20001 (Charter/Spectrum) IPs from the same provider get challenged at 3x the rate. That's not because Spectrum is inherently worse, but because that specific provider had most of their pool concentrated there and the target's ML model learned the pattern.

What I do: I check the ASN distribution of my proxy pool before committing to a provider. If more than 40% of their residential IPs resolve to the same 3 ASNs, that's a red flag. You want entropy. Real user traffic is distributed across hundreds of ASNs in any given metro area, and your scraping traffic should approximate that distribution.
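
A minimal sketch of that check, assuming you've already sampled exit IPs from the pool and mapped each one to its ASN with whatever lookup you prefer (that step is not shown). The 40% / top-3 rule is the one described above; the sample pool and the extra ASNs in it are made up.

```python
from collections import Counter
from math import log2


def asn_concentration(asns, top_n=3):
    """Return (share of IPs in the top_n ASNs, Shannon entropy in bits)."""
    if not asns:
        raise ValueError("empty ASN sample")
    counts = Counter(asns)
    total = len(asns)
    top_share = sum(c for _, c in counts.most_common(top_n)) / total
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return top_share, entropy


# Hypothetical sample of 10 exit IPs, already resolved to ASNs.
pool = ["AS7922"] * 4 + ["AS20001"] * 3 + ["AS701", "AS7018", "AS22773"]
top_share, entropy = asn_concentration(pool)
print(f"top-3 share: {top_share:.0%}, entropy: {entropy:.2f} bits")
if top_share > 0.40:
    print("red flag: pool is concentrated in a few ASNs")
```

Higher entropy means the pool is spread more evenly. With a sample this small the numbers mean little, so in practice you'd want a few hundred exit IPs before judging a provider.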

Quick note on mobile/4G proxies, since I know this sub likes the deep dives. CGNAT is still your best friend here. Carriers stuff thousands of real users behind a single IP, which means even aggressive anti-bot systems are reluctant to block mobile ASN ranges. Why? Too much collateral damage. But the window is closing. I'm seeing more targets fingerprint at the device/viewport level specifically for mobile ASNs, because they know that traffic coming from mobile IPs with a desktop user-agent and a 1920x1080 viewport is almost certainly not a real phone. If you're using mobile proxies, your entire request profile needs to actually look mobile: viewport, user-agent, accept headers, even the connection timing patterns. Mobile connections have higher, more variable latency than fiber, so if you're hitting a site from a "mobile" IP with a consistent 8ms RTT, that's a signal.
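
A rough self-audit along those lines, as a sketch only. The checks mirror the signals mentioned above (user-agent, viewport, RTT behavior), but every threshold here is an assumption picked for illustration: the 500px width cutoff, the 20ms mean and 2ms spread are not values any anti-bot vendor has published, and the function name is made up.

```python
from statistics import mean, pstdev


def mobile_profile_warnings(user_agent, viewport_width, rtts_ms):
    """List the ways a request profile fails to look like a real phone.

    All thresholds are illustrative assumptions, not known detection rules.
    """
    warnings = []
    # Phone browsers typically include the token "Mobile" in their user-agent.
    if "Mobile" not in user_agent:
        warnings.append("user-agent does not look mobile")
    # Assumed cutoff: phone viewports are usually well under 500 CSS pixels wide.
    if viewport_width > 500:
        warnings.append("viewport is desktop-sized")
    # Assumed cutoffs: cellular links are rarely this fast or this steady.
    if mean(rtts_ms) < 20 and pstdev(rtts_ms) < 2:
        warnings.append("RTT is too low and too steady for a cellular link")
    return warnings


desktop_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36"
phone_ua = "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 Chrome/124.0 Mobile Safari/537.36"

print(mobile_profile_warnings(desktop_ua, 1920, [8, 8, 9, 8]))
print(mobile_profile_warnings(phone_ua, 390, [45, 80, 60, 120]))
```

The first call trips all three checks; the second returns an empty list.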

All in all, the most expensive residential bandwidth in the world won't save you from a bad session model. But a well-designed session model can make cheap datacenter proxies work on targets you assumed required residential.

u/boomersruinall — 10 days ago
▲ 16 r/ProxyEngineering+2 crossposts

HTTP/3 residential proxies

Has anyone had any luck with them, particularly for scraping? How does the success rate compare to regular HTTP/HTTPS or SOCKS5 proxies?

u/WarAndPeace06 — 14 days ago