services / scraping
Clean data, documented schemas, zero hand-waving.
Playwright-based extraction with responsible rate limits. Every scrape ships with a schema doc so a new hire can use it on day one.
shopify_store_products.csv
| handle | title | vendor | price_usd | inventory | first_seen |
|---|---|---|---|---|---|
| walnut-desk | Walnut Writing Desk | Hearth Co | 489.00 | in_stock | 2026-03-12 |
| oak-chair | Oak Captain's Chair | Hearth Co | 219.00 | low | 2026-03-12 |
| linen-throw | Linen Throw Blanket | Slow North | 78.00 | in_stock | 2026-03-14 |
| brass-lamp | Brass Task Lamp | Wayhome | 134.00 | out | 2026-03-14 |
| wool-rug-6x9 | Hand-knotted Wool Rug 6x9 | Revival | 1480.00 | in_stock | 2026-03-15 |
| ceramic-vase | Matte Ceramic Vase | Field Kit | 42.00 | in_stock | 2026-03-16 |
showing 6 of 1,287 rows
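A minimal sketch of how a consumer might load a delivered CSV into typed values, assuming the six columns shown above with ISO dates and plain decimal prices. The function name `parse_row` and the inline sample are illustrative, not part of any deliverable.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Inventory values documented for this dataset.
INVENTORY_STATES = {"in_stock", "low", "out", "unknown"}


def parse_row(row: dict) -> dict:
    """Convert one CSV row (all strings) into typed values; raise on bad data."""
    inventory = row["inventory"]
    if inventory not in INVENTORY_STATES:
        raise ValueError(f"unexpected inventory value: {inventory!r}")
    return {
        "handle": row["handle"],
        "title": row["title"],
        "vendor": row["vendor"],
        # Strip any thousands separator before converting to Decimal.
        "price_usd": Decimal(row["price_usd"].replace(",", "")),
        "inventory": inventory,
        "first_seen": date.fromisoformat(row["first_seen"]),
    }


# Two rows copied from the preview above, in plain CSV form.
SAMPLE_CSV = """handle,title,vendor,price_usd,inventory,first_seen
walnut-desk,Walnut Writing Desk,Hearth Co,489.00,in_stock,2026-03-12
brass-lamp,Brass Task Lamp,Wayhome,134.00,out,2026-03-14
"""

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Using `Decimal` rather than `float` keeps prices exact, which matters once rows from several targets are merged and compared.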
targets we handle well
Most sites, most days.
- 01 Shopify — products, collections, variants, inventory via /products.json + DOM.
- 02 Amazon — listings, BSR, review snapshots. Per TOS, no bulk reviews.
- 03 Lazada / Shopee — PH + SEA marketplaces, listing + seller data.
- 04 Google Maps — local business lists with phone, hours, review count.
- 05 Bespoke — anything Playwright can drive. Give us a URL + auth.
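For the Shopify case, the mapping from a `/products.json` entry to one output row can be sketched roughly as below. This is a sketch under assumptions, not our production extractor: the payload shape (`handle`, `title`, `vendor`, `tags`, `variants[].price`, `variants[].available`, `images`) is what public storefront feeds commonly return but can vary by store, the store is assumed to price in USD, and `product_to_row`, the sample payload, and the `example-store.test` URL are hypothetical. The `low` inventory state is not derived here, since a boolean availability flag cannot express it.

```python
from decimal import Decimal


def product_to_row(product: dict, store_url: str) -> dict:
    """Flatten one product entry into a row (assumed payload shape)."""
    variants = product.get("variants") or []
    first = variants[0] if variants else {}

    # Map the first variant's availability flag onto the inventory enum.
    available = first.get("available")
    if available is True:
        inventory = "in_stock"
    elif available is False:
        inventory = "out"
    else:
        inventory = "unknown"

    # Tags may arrive as a list or as one comma-separated string.
    tags = product.get("tags") or []
    if isinstance(tags, str):
        tags = [t.strip() for t in tags.split(",") if t.strip()]

    price = first.get("price")
    return {
        "handle": product["handle"],
        "title": product["title"],
        "vendor": product.get("vendor", ""),
        "price_usd": Decimal(price) if price is not None else None,
        "inventory": inventory,
        "url": f"{store_url.rstrip('/')}/products/{product['handle']}",
        "tags": list(tags),
        "variants": len(variants),
        "images": len(product.get("images") or []),
    }


# Made-up payload for illustration only.
sample = {
    "handle": "walnut-desk",
    "title": "Walnut Writing Desk",
    "vendor": "Hearth Co",
    "tags": "desk, walnut",
    "variants": [{"price": "489.00", "available": True}],
    "images": [{"src": "https://example-store.test/desk.jpg"}],
}
row = product_to_row(sample, "https://example-store.test/")
```

Fields the feed does not expose reliably are the ones that would fall to the DOM pass.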
every scrape ships a schema
No "what does this column mean?"
schema.json · json
{
"handle": "string · shopify product slug",
"title": "string · product display title",
"vendor": "string · brand name, as reported by store",
"price_usd": "decimal · first variant price, USD",
"inventory": "enum · in_stock | low | out | unknown",
"first_seen": "date · first time this handle appeared in our scrapes",
"url": "string · canonical product URL",
"tags": "array · product tags from store",
"variants": "int · count of variants",
"images": "int · count of images"
}
pricing
Fixed scope. Paid on delivery.
Single scrape
$500
One target, up to 5,000 rows, CSV + schema doc. 3–5 business days.
- Single site / API / feed
- ≤5,000 rows
- CSV + JSON + schema doc
- 1-pager of findings
- 50/50 payment
Multi-target (most common)
$1,400
Up to 5 related targets, merged CSV, comparative notes.
- Up to 5 related targets
- ≤25,000 rows total
- Merged + normalized CSV
- Comparative 2-pager
- 7 business days
Recurring feed
$600 / mo
Scheduled re-scrapes with diffing. You get only the changes.
- Daily or weekly cadence
- Diff-only CSV exports
- Change-log email digest
- R2 or Sheets delivery
- Min. 3-month commit
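To make "diff-only" concrete, here is a minimal sketch of comparing two scrape snapshots keyed on `handle`. It assumes each snapshot is a list of row dicts with a unique key. The function name `diff_snapshots`, the `change` and `changed_fields` columns, and the sample changes are illustrative assumptions, not the actual export format.

```python
def diff_snapshots(previous: list, current: list, key: str = "handle") -> list:
    """Return only rows that were added, modified, or removed between scrapes."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    changes = []

    for k, row in curr.items():
        if k not in prev:
            changes.append({"change": "added", **row})
        elif row != prev[k]:
            # List every field whose value differs between the two snapshots.
            fields = sorted(
                f for f in row.keys() | prev[k].keys()
                if row.get(f) != prev[k].get(f)
            )
            changes.append(
                {"change": "modified", "changed_fields": ",".join(fields), **row}
            )

    for k, row in prev.items():
        if k not in curr:
            changes.append({"change": "removed", **row})

    return changes


# Invented before/after snapshots for illustration.
yesterday = [
    {"handle": "walnut-desk", "price_usd": "489.00", "inventory": "in_stock"},
    {"handle": "brass-lamp", "price_usd": "134.00", "inventory": "out"},
]
today = [
    {"handle": "walnut-desk", "price_usd": "489.00", "inventory": "low"},
    {"handle": "ceramic-vase", "price_usd": "42.00", "inventory": "in_stock"},
]
changes = diff_snapshots(yesterday, today)
```

In this example `changes` holds three rows: `walnut-desk` as modified (inventory), `ceramic-vase` as added, and `brass-lamp` as removed. Unchanged rows never appear.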
Scraping is only performed on targets you are authorized to access. We do not bypass authentication, circumvent robots.txt for prohibited paths, or exfiltrate personal data without a lawful basis.
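As an illustration of the robots.txt part of that policy, a pre-flight check can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `example-scraper` user agent, and the `example-store.test` URLs below are made up for the example. A real run would fetch the target's own robots.txt first.

```python
from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt: str, user_agent: str, urls: list) -> dict:
    """Report, per URL, whether the given robots.txt permits fetching it."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Hypothetical robots.txt for illustration.
EXAMPLE_ROBOTS = """User-agent: *
Disallow: /checkout
Disallow: /cart
"""

result = allowed_paths(
    EXAMPLE_ROBOTS,
    "example-scraper",
    [
        "https://example-store.test/products.json",
        "https://example-store.test/checkout",
    ],
)
```

Any URL that maps to `False` is dropped from the crawl plan before a browser is launched.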