A Hands-On Review of Thordata for Data Engineers

If you do any kind of serious data engineering work that involves pulling public data off the open web, you've probably hit the same wall I did last quarter: your scrapers run beautifully for a week, then suddenly every request returns a 403, a CAPTCHA, or a beautifully rendered page of nothing. Welcome to anti-bot 2026.

I've been quietly testing Thordata for the past two weeks across three production-ish pipelines, and this is the honest, unvarnished writeup. No fluff, no "top 10" filler — just what it's like to actually wire this thing into a real ETL stack.

Thordata

High-quality proxy service for web data scraping

Pricing: Residential from $0.65/GB, ISP from $0.75/IP, Unlimited from $69/day

Why I Even Looked at Thordata

My team was using a bigger-name competitor (you can probably guess) and the bill was creeping toward four figures a month for what amounts to a few hundred GB of residential traffic. When the budget conversation came up, I started shopping. Thordata kept appearing in proxy comparison roundups with two specific claims that made me curious:

Residential proxies starting at $0.65/GB (about half what we were paying)
An unlimited daily plan at $69/day with unrestricted bandwidth

For a data engineer running batch jobs, that second one is genuinely interesting. Most providers charge by GB, which makes high-volume crawls financially terrifying. A flat rate flips the math entirely.
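To put a number on "flips the math," here's a back-of-the-envelope sketch of the break-even point between the two plans. It uses only the two starting prices quoted above; the function names are mine, and it ignores volume discounts or plan minimums, which I haven't verified.

```python
# Break-even between per-GB residential pricing and the flat daily plan.
# Both prices are the advertised starting prices quoted in this review.
RESIDENTIAL_USD_PER_GB = 0.65
UNLIMITED_USD_PER_DAY = 69.0


def break_even_gb_per_day(per_gb: float = RESIDENTIAL_USD_PER_GB,
                          flat_per_day: float = UNLIMITED_USD_PER_DAY) -> float:
    """Daily volume (GB) above which the flat plan is cheaper than per-GB billing."""
    return flat_per_day / per_gb


def cheaper_plan(expected_gb: float) -> str:
    """Return which plan costs less for one day at the given volume."""
    per_gb_cost = expected_gb * RESIDENTIAL_USD_PER_GB
    return "unlimited" if per_gb_cost > UNLIMITED_USD_PER_DAY else "per-gb"


print(f"Break-even: {break_even_gb_per_day():.0f} GB/day")
```

At these list prices the crossover sits at roughly 106 GB per day. Below that, pay-as-you-go wins; above it, every extra gigabyte on the flat plan is effectively free.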

The Setup: What I Tested

I ran Thordata against three workloads I actually deal with day to day:

E-commerce SKU monitoring — ~50K product pages/day across three retailers, geo-targeted to the US
SERP tracking — Google rank checks for 2K keywords from five different countries
Bulk dataset collection — A one-shot crawl of ~3M public pages for a model training experiment

The first two used residential proxies on the pay-as-you-go plan. The third was the perfect excuse to stress-test the Unlimited plan.

Integration: How Painful Is It?

This is where most proxy services lose me. I've onboarded six different providers in the last two years and the dashboards are universally terrible. Thordata's isn't winning any design awards either, but the actual integration is straightforward.

You get a username/password combo and an endpoint. From Python:

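import requests

# "user" and "pass" are placeholders; use the credentials from your dashboard.
proxies = {
    "http": "http://user-zone-resi:pass@gw.thordata.com:9999",
    "https": "http://user-zone-resi:pass@gw.thordata.com:9999",
}
url = "https://example.com/"  # any target page
response = requests.get(url, proxies=proxies, timeout=30)
response.raise_for_status()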

That's it. Sticky sessions are appended to the username (user-session-abc123), and geo-targeting works the same way (user-country-us-state-ca). If you've used any proxy provider before, the muscle memory transfers immediately. There's also a Chrome Proxy Manager extension for ad-hoc work, which I appreciated for one-off debugging.
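Since everything rides on the username, a tiny helper keeps the string-building in one place. Treat this as a sketch: the `country`, `state`, and `session` flags come from the examples above, but how they combine with the zone suffix, and in what order, is my assumption, so compare the output against the credentials your dashboard generates. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote

GATEWAY = "gw.thordata.com:9999"  # endpoint as shown in the example above


def build_proxy_url(user: str, password: str, *, zone: str = "resi",
                    country: Optional[str] = None, state: Optional[str] = None,
                    session: Optional[str] = None) -> str:
    """Assemble a proxy URL with targeting flags appended to the username.

    The flag order (zone, country, state, session) is an assumption.
    """
    parts = [user, "zone", zone]
    if country:
        parts += ["country", country.lower()]
    if state:
        parts += ["state", state.lower()]
    if session:
        parts += ["session", session]
    username = "-".join(parts)
    # Percent-encode credentials so characters like "@" or ":" don't break the URL.
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{GATEWAY}"


def proxies_for(**kwargs) -> dict:
    """Return a mapping in the shape requests expects for its proxies argument."""
    url = build_proxy_url(**kwargs)
    return {"http": url, "https": url}
```

Calling `proxies_for(user="user", password="pass", country="US", state="CA")` gives a dict you can pass straight to `requests.get`. Reusing the same `session` value across calls is what would pin a sticky IP, if the provider honors it the way the username convention suggests.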

Performance: The Numbers That Actually Matter

Let's get to the part data engineers care about. I logged everything across the two-week test.

Residential Pool

Success rate on e-commerce targets: 94.2% (vs 96.1% with the previous provider)
Median time to first byte: 1.4 s
CAPTCHA rate: ~3% on Google SERPs, near-zero on retail

That roughly 2-point success-rate gap (1.9 points, to be exact) is real, and on certain targets (looking at you, anything Cloudflare-Enterprise) it widened. But for the majority of public data sources, it is well within "acceptable trade for half the price."
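If you want to produce the same three metrics for your own targets, they fall out of a simple per-request log. The record fields below (`status`, `ttfb_s`, `captcha`) are hypothetical names for this sketch, not anything Thordata emits, and "success" is defined here as a 2xx response that is not a CAPTCHA page, which may not match how you count it.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable


@dataclass
class RequestRecord:
    status: int      # HTTP status code of the final response
    ttfb_s: float    # time to first byte, in seconds
    captcha: bool    # True if the body was a CAPTCHA challenge


def summarize(records: Iterable[RequestRecord]) -> dict:
    """Compute success rate, CAPTCHA rate, and median time to first byte."""
    rows = list(records)
    if not rows:
        raise ValueError("no records to summarize")
    ok = [r for r in rows if 200 <= r.status < 300 and not r.captcha]
    return {
        "success_rate": len(ok) / len(rows),
        "captcha_rate": sum(r.captcha for r in rows) / len(rows),
        "median_ttfb_s": median(r.ttfb_s for r in rows),
    }
```

Run it once per provider and per target. A single blended number hides exactly the kind of per-target variation described above.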

Unlimited Plan Stress Test

This is where it got fun. I pointed a 32-worker scraper at the bulk collection job and let it run.

Throughput: ~18 Mb/s sustained per IP, with multiple threads running concurrently
Total data transferred in 24h: ~2.3 TB (which would have been eye-watering on a per-GB plan)
Cost: $69 flat
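As a sanity check on those figures, the arithmetic below converts the daily total into an average line rate and into the equivalent per-GB bill. It assumes decimal units (1 TB = 1,000 GB) and the $0.65/GB starting price; a real per-GB invoice could differ.

```python
TOTAL_TB = 2.3                  # data transferred in 24 hours (from the test above)
SECONDS_PER_DAY = 24 * 60 * 60
USD_PER_GB = 0.65               # advertised residential starting price
FLAT_USD = 69.0                 # Unlimited plan, one day

total_gb = TOTAL_TB * 1000      # decimal units: 1 TB = 1000 GB
# 1 GB = 8000 megabits, spread evenly over the day
avg_mbit_per_s = total_gb * 8 * 1000 / SECONDS_PER_DAY
per_gb_bill = total_gb * USD_PER_GB
effective_usd_per_gb = FLAT_USD / total_gb

print(f"Average aggregate throughput: {avg_mbit_per_s:.0f} Mb/s")
print(f"Same volume at per-GB pricing: ${per_gb_bill:,.0f}")
print(f"Effective flat-plan rate: ${effective_usd_per_gb:.3f}/GB")
```

That works out to about 213 Mb/s averaged over the day, roughly $1,495 at the per-GB rate, and an effective $0.03/GB on the flat plan. At ~18 Mb/s per IP, the average implies something like a dozen connections busy at any given moment.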

For anyone doing AI training data collection or large-scale archival crawls, a flat daily rate is hard to beat at this volume. I'd recommend pairing it with a proper job scheduler so you only buy days you will actually fill.
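The worker setup for a job like this is nothing exotic. Here's a minimal sketch of what a 32-worker fetch loop can look like with a thread pool. It is an illustration, not the exact scraper from the test, and the `fetch` callable is injected so you can put `requests`, `httpx`, or anything else behind the proxy.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def crawl(urls: Iterable[str], fetch: Callable[[str], bytes],
          workers: int = 32) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Fetch URLs concurrently; return (successes, failures) keyed by URL."""
    results: Dict[str, bytes] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # keep going; record the failure for a retry pass
                errors[url] = exc
    return results, errors
```

A typical `fetch` would be something like `lambda u: requests.get(u, proxies=proxies, timeout=30).content`. For a multi-million-page crawl you would write each result to storage as it completes rather than hold everything in a dict, and feed URLs in batches instead of submitting them all at once.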

The Web Scraper API: Mixed Feelings

Thordata also offers a managed Web Scraper API that handles JS rendering, CAPTCHAs, and fingerprint spoofing. You POST a target URL and get clean structured data back.

It works well — I'd estimate it cut my CAPTCHA-handling code by about 80% on a tricky JS-heavy target. But there's a real cost: each Scraper API request runs you significantly more than a raw proxy request. For most pipelines I'd still wire up Playwright + raw residential proxies and pocket the savings. The Scraper API is a great fit when engineering time is more expensive than infra cost — which, to be fair, is most teams.
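For the do-it-yourself route, this is roughly what Playwright behind a raw residential proxy looks like. It's a sketch under assumptions: the gateway address is the one from the earlier example, the credentials are placeholders, and the helper names are mine. It only covers rendering; CAPTCHA handling and fingerprint work are still on you.

```python
def playwright_proxy(username: str, password: str,
                     server: str = "http://gw.thordata.com:9999") -> dict:
    """Proxy settings in the shape Playwright's launch() accepts."""
    return {"server": server, "username": username, "password": password}


def render(url: str, proxy: dict, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium through the proxy and return its HTML."""
    # Imported here so the helper above is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=proxy)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms, wait_until="domcontentloaded")
            return page.content()
        finally:
            browser.close()
```

Usage is `render("https://example.com/", playwright_proxy("user-zone-resi", "pass"))`. Launching a browser per page is slow, so in a real pipeline you would keep one browser open and reuse it across many pages.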

If you're comparing scraper APIs specifically, our scraper API roundup covers the alternatives.

Where Thordata Falls Short

I want to be fair. A few things bugged me:

Support response time was inconsistent. One ticket got answered in 20 minutes; another took two days. For a paid service that's annoying.
Documentation has gaps. The Scraper API examples are thin and a few SDKs are missing. I had to read the dashboard's network tab once to figure out a parameter name.
Newer company. Thordata launched in 2022, which means it doesn't have the multi-decade reputation of established proxy giants. For mission-critical workloads where you need a procurement-friendly vendor, this might matter.
Some user reviews flag stability issues. I didn't personally hit any during my test, but if you're dependent on five-nines reliability, do your own pilot.

How It Slots Into a Data Engineering Stack

Here's how I'd actually deploy Thordata at a real company:

Airflow/Dagster DAG triggers a containerized scraper on a schedule
Thordata residential proxies for the bulk of public-web HTTP requests
Thordata Scraper API specifically for the 5–10% of targets that need full browser rendering
Unlimited plan spun up only on days when you're doing big crawls (you can buy single days)
Output lands in object storage, then gets processed by your usual data pipeline tools

The tier mixing is what makes this economical. You don't pay for capacity you don't use.
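In scheduler terms, the tier decision can be a small pure function that runs before the scraper container starts. The sketch below is illustrative only: the `Job` fields and tier labels are hypothetical, the threshold is just the flat daily price divided by the per-GB price, and it assumes the Unlimited and pay-as-you-go pools are interchangeable for your targets, which you should verify.

```python
from dataclasses import dataclass
from typing import Dict, List

BREAK_EVEN_GB = 69.0 / 0.65  # flat daily price / per-GB price, from the quoted rates


@dataclass
class Job:
    name: str
    expected_gb: float       # estimated transfer for one day's run
    needs_rendering: bool    # True if the target requires a full browser


def plan_day(jobs: List[Job]) -> Dict[str, str]:
    """Assign each job a tier for one day.

    Rendering-heavy jobs go to the scraper API. Everything else shares the
    flat plan only if the day's combined raw-proxy volume clears break-even.
    """
    raw_gb = sum(j.expected_gb for j in jobs if not j.needs_rendering)
    bulk_tier = "unlimited" if raw_gb > BREAK_EVEN_GB else "residential"
    return {j.name: ("scraper_api" if j.needs_rendering else bulk_tier) for j in jobs}
```

An Airflow or Dagster task can call `plan_day` with the day's job list and pass the result to the scraper as configuration. On a normal day everything lands on pay-as-you-go, and a big crawl tips the whole day onto the flat plan.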

Who Should Actually Use This?

Good fit:

Solo data engineers and small teams scaling scraping past free-tier limits
AI/ML teams collecting training data where bandwidth bills are the real constraint
Anyone running price monitoring, SERP tracking, or SEO data collection at moderate scale

Probably not a fit:

Enterprise teams that need SOC 2 audited vendors and a named CSM
Anyone scraping a single high-value target where 2 extra percentage points of success rate are worth 2x the price
Teams that need 24/7 white-glove support

My Verdict After Two Weeks

Thordata is a serious, capable proxy provider that punches above its price tag. It's not the absolute best at any single thing, but the price-to-performance ratio is excellent, and the Unlimited plan is a genuine cost killer for high-volume work. There's a free 1 GB trial that takes about three minutes to sign up for, which is the right way to test it on your own targets.

Will I switch our production pipelines over? Probably yes for the bulk-collection jobs, no for the latency-critical SERP work, where the previous provider's higher success rate justifies the premium. That kind of mixed deployment is, honestly, where most engineering teams should land regardless of which providers they pick.

If you want to see how Thordata stacks up directly against other options, our proxy and scraping roundup covers the full landscape.

Frequently Asked Questions

Is Thordata good for data engineers specifically?

Yes, particularly for batch and ETL workloads. The unlimited daily plan and pay-as-you-go residential pricing fit well with how data engineering teams budget infrastructure. The integration is also vanilla HTTP proxy, so it drops into any pipeline that uses requests, httpx, scrapy, or Playwright.

How does Thordata pricing actually compare to Bright Data or Oxylabs?

Residential proxies on Thordata start at $0.65/GB versus roughly $4–8/GB for the established competitors. The gap is real but you're trading some success rate (around 2 percentage points in my tests) and brand maturity for the savings.

Can I use Thordata for AI training data collection?

This is arguably the strongest use case. The Unlimited plan at $69/day with unrestricted bandwidth makes massive crawls financially predictable. I pulled 2.3 TB in a single day for a flat fee, which would have cost thousands on per-GB pricing.

Does the Web Scraper API handle JavaScript-heavy sites?

Yes. It does headless browser rendering, fingerprint spoofing, and CAPTCHA solving on the server side. You POST a URL and get clean HTML or structured data back. It works well, but costs more per request than raw residential proxies, so use it selectively.

What about IP geo-targeting accuracy?

Country-level targeting is reliable. State and city-level targeting is good but not perfect — I saw the occasional IP located one state over from where I requested. For most use cases (SERP tracking, market research) this is fine. For ad verification where exact geography matters, do your own validation.
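Doing your own validation can be as simple as sampling exit IPs, looking each one up with whatever geolocation source you trust, and counting disagreements. The sketch below covers only the counting step; where the "observed" region comes from is left to you, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def geo_mismatch_rate(samples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (requested, observed) region pairs that disagree, ignoring case."""
    pairs = list(samples)
    if not pairs:
        raise ValueError("no samples")
    misses = sum(
        1 for want, got in pairs
        if want.strip().lower() != got.strip().lower()
    )
    return misses / len(pairs)
```

Feed it pairs like `("CA", "NV")` collected over a few hundred requests per region. Keep in mind that geolocation databases disagree with each other, so some of the measured mismatch is the lookup's error rather than the proxy's.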

Is there a free trial?

Yes, Thordata offers a free 1 GB residential proxy trial for new users. That's enough to run a meaningful test against your own targets before you commit to a paid plan.

Should I replace my current proxy provider with Thordata?

Depends on your bottleneck. If price is the constraint and your targets aren't unusually hostile, the savings are substantial and the quality is solid. If you're scraping high-value targets where every percentage point of success rate matters, keep your incumbent for those and use Thordata for everything else. Mixed deployment is the move.