u/Additional-Elk-3712

Just open-sourced my personal scraping engine: tiny self-contained binary with Lua scripting
▲ 41 r/thewebscrapingclub+1 crossposts

Just open-sourced my personal scraping engine: tiny self-contained binary with Lua scripting

I originally built it for myself because I wanted something extremely lightweight that runs in the background like it never existed. It's called SpyWeb.

It's designed to be "set and forget." I've had it running for months on my PC tracking job boards without a single crash or memory leak.

Specific features:

  • Zero Runtime: Self-contained ~7MB binary. No Python, Node, or Docker needed.
  • Low Footprint: Uses <5MB RAM at idle.
  • Lua Scripting: Use Lua to handle complex logic like custom headers, JS rendering, advanced monitoring, etc.
  • Hot Reloading: Change a config or Lua script and the job respawns instantly, no restarts.
  • Web Dashboard: Simple local UI to monitor scrape data in real-time.
  • Desktop Alerts: Built-in support for system notifications and webhooks.
  • Embedded DB: Built-in KV store so you don't need a separate database.
  • CDP Support: Controls any Chromium or CDP-compatible browser via Lua for JS-heavy sites.
  • Dual Mode: CLI for servers and a System Tray version for silent background runs.
  • Deduplication: Internal database ensures you never see the same result twice.

I just released the beta with CDP integration. If you need something that just sits in the background and sips resources while actually being maintainable, check it out.

Set up is very easy and straightforward: for server-side rendered pages, it's just a few lines of config (URL, selectors, fields). For JS-heavy sites, you can write a little Lua to launch a browser and drive the workflow.

You can check it out here: https://github.com/spyweb-app/spyweb

u/Additional-Elk-3712 — 6 days ago