r/madeinpython

GitHub has a serious fake engagement problem, and I wanted to see how visible it actually is through the public API. It's worse than I thought after going down that rabbit hole...
▲ 240 r/madeinpython+16 crossposts

Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars.

What I built

phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers):

  1. Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes
  2. Pulls star and fork events from the last 24 hours per repo
  3. Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history)
  4. Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%)
  5. Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window
  6. Files an issue directly on the targeted repo so the maintainer knows what's happening
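
The weighted model in step 4 could be sketched roughly like this. The weights come from the list above; everything else is an assumption for illustration: the per-signal values (0.0 = looks normal, 1.0 = looks suspicious), the function names, the 0.7 cutoff, and the "unflagged" label. The README is the place to check the real scoring model.

```python
# Weights as listed in the post; the sub-scores fed into them are hypothetical.
WEIGHTS = {
    "account_age": 0.35,
    "profile_completeness": 0.30,
    "repo_patterns": 0.25,
    "activity_history": 0.10,
}


def composite_score(signals: dict[str, float]) -> float:
    """Combine per-signal suspicion values (each in [0, 1]) into one score."""
    unknown = set(signals) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown signals: {sorted(unknown)}")
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Missing signals count as 0.0 (not suspicious) in this sketch.
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def classify(score: float, threshold: float = 0.7) -> str:
    """Map a composite score to a label. The 0.7 cutoff is an assumed value."""
    return "likely_fake" if score >= threshold else "unflagged"


if __name__ == "__main__":
    # A brand-new, empty account with only untouched forks.
    signals = {
        "account_age": 1.0,
        "profile_completeness": 1.0,
        "repo_patterns": 0.8,
        "activity_history": 0.5,
    }
    score = composite_score(signals)
    print(round(score, 2), classify(score))
```

Because the weights sum to 1, the composite stays in the same 0 to 1 range as the inputs, which keeps a single threshold meaningful.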

Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended.
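
A minimal sketch of that kind of deterministic fingerprint, assuming the members are identified by login. The normalization (lowercasing, de-duplication), the newline separator, and the 16-character truncation are assumptions, not necessarily what phantomstars does.

```python
import hashlib


def campaign_id(members: list[str], length: int = 16) -> str:
    """Return a stable ID for a set of account logins.

    Sorting makes the result independent of discovery order, and
    lowercasing reflects that GitHub logins are case-insensitive.
    """
    normalized = sorted({m.strip().lower() for m in members})
    payload = "\n".join(normalized).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


if __name__ == "__main__":
    run_1 = campaign_id(["bot-c", "bot-a", "bot-b"])
    run_2 = campaign_id(["bot-b", "bot-c", "bot-a"])
    print(run_1 == run_2)  # same members, different order
```

One thing to keep in mind with this sketch: a plain hash of the full member set changes as soon as one member is added or dropped, so matching a farm across days while accounts disappear would need some extra step, such as comparing member overlap between runs.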

What the pattern actually looks like

It's remarkably consistent. Here is what a fake engagement campaign looks like in the raw data:

  • 40-200 accounts, all created within the same 1-2 week window
  • Zero original repositories, or only forks they never touched
  • No bio, no location, no followers, no following
  • All of them starring the same repo within a 90-minute window
  • The target repo usually has a name implying it's a tool, hack, executor, or generator
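
The "everyone stars within a short window" pattern is what timestamp clustering with union-find is meant to catch. Here is a rough sketch of the idea using the 3-hour window and 4+ group size mentioned earlier. The event tuple shape and the function names are assumptions for illustration, and the input is assumed to contain only accounts already scored as suspicious.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class UnionFind:
    """Minimal disjoint-set structure keyed by account login."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def find_campaigns(events, window=timedelta(hours=3), min_size=4):
    """Group accounts whose engagement on the same repo falls close in time.

    events: iterable of (repo, login, timestamp) tuples.
    Returns a list of sorted login lists, one per detected group.
    """
    by_repo = defaultdict(list)
    for repo, login, ts in events:
        by_repo[repo].append((ts, login))

    uf = UnionFind()
    for repo_events in by_repo.values():
        repo_events.sort()
        # After sorting, linking neighbours within the window produces the
        # same groups as linking every pair within the window.
        for (prev_ts, prev_login), (ts, login) in zip(repo_events, repo_events[1:]):
            uf.find(prev_login)
            uf.find(login)
            if ts - prev_ts <= window:
                uf.union(prev_login, login)

    groups = defaultdict(set)
    for login in list(uf.parent):
        groups[uf.find(login)].add(login)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    events = [("x/tool", f"bot{i}", base + timedelta(minutes=5 * i)) for i in range(5)]
    events.append(("x/tool", "late-user", base + timedelta(hours=12)))
    print(find_campaigns(events))
```

Note that chaining means a group can span more than the window overall if each account is close to the next one; a stricter version would cap the total span of a group.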

Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast.

Notifying the affected repo

When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first.

Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently.
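
The notification rule described above could be expressed as a small predicate. This is only a sketch: the function and parameter names are made up, and how the real tool counts engagers or detects disabled issues is not shown here.

```python
def should_open_issue(
    fake_count: int,
    total_count: int,
    campaign_detected: bool,
    issues_enabled: bool = True,
    ratio_threshold: float = 0.40,
) -> bool:
    """Decide whether to notify a repo, following the rule in the post."""
    if not issues_enabled:
        return False  # repos with issues disabled are skipped silently
    if campaign_detected:
        return True
    if total_count == 0:
        return False  # no engagement data, nothing to report
    return fake_count / total_count >= ratio_threshold


if __name__ == "__main__":
    print(should_open_issue(185, 185, campaign_detected=False))
    print(should_open_issue(3, 50, campaign_detected=False))
```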

Why I built this

Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. It turned out to be more measurable than I expected.

It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/

The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users.

All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq.
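
For anyone who would rather read the data from Python than jq, append-only JSONL is easy to work with: one JSON object per line, and new runs only ever add lines. A minimal sketch follows; the file name and the field names (`login`, `label`, `score`) are assumptions, so check the repo for the actual schema.

```python
import json
from pathlib import Path


def append_records(path, records):
    """Append one JSON object per line; existing lines are never rewritten."""
    with Path(path).open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_records(path):
    """Yield each record in the file, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical file and fields, for illustration only.
    append_records("scan.jsonl", [{"login": "bot-a", "label": "likely_fake", "score": 0.91}])
    flagged = [r for r in read_records("scan.jsonl") if r.get("label") == "likely_fake"]
    print(len(flagged))
```

With the same assumed field name, the equivalent jq filter would be `jq -c 'select(.label == "likely_fake")' scan.jsonl`.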

Repo: https://github.com/tg12/phantomstars

Findings are probabilistic and false positives exist; the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process.

Questions welcome on the detection approach, GraphQL batching, or campaign ID stability.
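
On GraphQL batching: one common approach is to request many users in a single query by giving each `user(login: ...)` lookup its own alias. The sketch below only builds the request payload and does not send it. The field selection, the alias and variable naming, and the batch size of 50 are assumptions, and the tool itself may batch differently.

```python
# Fields from GitHub's GraphQL User type; which ones the tool
# actually requests is an assumption.
PROFILE_FIELDS = """
    login
    createdAt
    bio
    location
    followers { totalCount }
    following { totalCount }
    repositories(isFork: false) { totalCount }
"""


def build_batch_query(logins):
    """Build one GraphQL payload that looks up every login via aliases."""
    if not logins:
        raise ValueError("at least one login is required")
    var_defs = ", ".join(f"$l{i}: String!" for i in range(len(logins)))
    selections = "\n".join(
        f"  u{i}: user(login: $l{i}) {{{PROFILE_FIELDS}  }}"
        for i in range(len(logins))
    )
    query = f"query({var_defs}) {{\n{selections}\n}}"
    # Passing logins as variables avoids interpolating them into the query text.
    variables = {f"l{i}": login for i, login in enumerate(logins)}
    return {"query": query, "variables": variables}


def chunked(items, size=50):
    """Split a list into batches; 50 per request is an assumed batch size."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    payload = build_batch_query(["octocat", "torvalds"])
    print(payload["query"])
```

Each payload would then be sent as JSON in a POST to `https://api.github.com/graphql` with a bearer token, and the response keys (`u0`, `u1`, ...) map back to the input order.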

u/SyntaxOfTheDamned — 1 day ago
▲ 12 r/madeinpython+5 crossposts

I’ve put together a small CVE proof-of-concept search tool.

The idea is simple: search public repositories and references that appear to contain proof-of-concept exploit material, indexed by CVE identifier.

You can search by an exact CVE ID, for example:

CVE-2021-44228

Or use a partial string/vendor keyword to find related results.
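
The two search modes could look roughly like this over an in-memory index. This is an illustrative sketch, not the tool's implementation: the index shape (CVE ID mapped to a list of reference strings), the function name, and the placeholder references are all assumptions, and the default cap of 3 mirrors the public-tier result limit.

```python
import re

# CVE IDs are "CVE-", a four-digit year, then four or more digits.
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)


def search(index, query, limit=3):
    """Exact lookup for full CVE IDs, substring match for anything else."""
    q = query.strip()
    if not q:
        return []
    if CVE_ID.match(q):
        refs = index.get(q.upper())
        return [(q.upper(), refs)] if refs else []
    needle = q.lower()
    hits = [
        (cve, refs)
        for cve, refs in sorted(index.items())
        if needle in cve.lower() or any(needle in ref.lower() for ref in refs)
    ]
    return hits[:limit]


if __name__ == "__main__":
    # Placeholder references, not real PoC links.
    index = {
        "CVE-2021-44228": ["https://example.invalid/log4j-poc"],
        "CVE-2021-45046": ["https://example.invalid/log4j-followup"],
    }
    print(search(index, "cve-2021-44228"))
    print(search(index, "log4j"))
```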

Current index:

7,466 CVEs

This is not a vulnerability scanner, exploit database, exploit marketplace, or “AI cyber threat intelligence” dashboard.

It is a lightweight lookup layer for quickly checking what public PoC material appears to exist around a vulnerability and which references are associated with it.

The public tier is deliberately limited:

  • 3 searches per day
  • 3 results per query
  • email submission for extended access / whitelist

The intended use case is research, triage, defensive validation, and quick visibility into public PoC exposure.

I’m keeping it narrow on purpose. No magic scoring. No fake certainty. No pretending that the presence of public PoC material automatically means a system is exploitable.

Just indexed public references by CVE, with the limitations kept visible.

Feedback welcome, especially around:

  • search behaviour
  • useful filters
  • false positives
  • reference quality signals
  • whether vendor/keyword lookup is useful or too noisy
u/SyntaxOfTheDamned — 1 day ago
▲ 81 r/madeinpython+2 crossposts

Typio: Make Your Terminal Type Like a Human

Typio is a lightweight Python library that prints text to the terminal as if it were being typed by a human. It supports multiple typing modes (character, word, line, sentence, typewriter, and adaptive), configurable delays and jitter for natural variation, and seamless integration with existing code via a simple function or a decorator. Typio is designed to be minimal, extensible, and safe, making it ideal for demos, CLIs, tutorials, and storytelling in the terminal.
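
To illustrate the general technique (a base delay plus random jitter between output units), here is a standalone sketch. It does not use Typio and is not its API: the function name, parameters, and the two modes shown are made up for this example, so see the repo for the real interface.

```python
import random
import re
import sys
import time


def type_out(text, delay=0.04, jitter=0.02, mode="char",
             out=sys.stdout, sleep=time.sleep, rng=random):
    """Write text piece by piece, pausing a slightly random time after each."""
    if mode == "char":
        units = list(text)
    elif mode == "word":
        # Keep trailing whitespace attached to each word so nothing is lost.
        units = re.findall(r"\S+\s*|\s+", text)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    for unit in units:
        out.write(unit)
        out.flush()
        # Jitter varies the pause around the base delay; never sleep negative.
        sleep(max(0.0, delay + rng.uniform(-jitter, jitter)))


if __name__ == "__main__":
    type_out("Hello from the terminal!\n")
    type_out("Now one word at a time.\n", delay=0.15, mode="word")
```

Passing `out` and `sleep` as parameters keeps the function easy to test without real pauses.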

Repo: https://github.com/sepandhaghighi/typio

u/sepandhaghighi — 3 days ago
▲ 2 r/madeinpython+2 crossposts

I built a fast Python API to extract Open Graph tags and link previews.

Hey r/roastmystartup

I was recently looking into how chat apps like Discord or iMessage generate those beautiful link preview cards and realized the process is surprisingly annoying. You have to parse Open Graph tags, Twitter Cards, and regular meta tags, figure out canonical URLs, and hunt down favicons in weird HTML link elements.

So I built LinkVault API to do all the heavy lifting in a single, fast API call.

What it does:

Extracts Title, Description, Image, Favicon, Theme Colors, and structured JSON-LD data.

Handles all the complex fallbacks (e.g., if Open Graph is missing, it intelligently falls back to standard HTML tags).

Built-in caching for ultra-fast response times.

Returns clean, standardized JSON.
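
To give a feel for the fallback logic, here is a minimal standard-library sketch. It is not LinkVault's code: the class and function names, the fallback order, and the output keys are assumptions, and it skips fetching, caching, theme colors, and JSON-LD entirely.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collect meta tags, the <title> text, and the first favicon link."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_text = ""
        self.favicon = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            # Open Graph uses property=..., Twitter/standard tags use name=...
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta.setdefault(key.lower(), a["content"])
        elif tag == "link" and "icon" in (a.get("rel") or "").lower().split():
            self.favicon = self.favicon or a.get("href")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


def extract_preview(html: str) -> dict:
    """Return preview fields, preferring Open Graph and falling back in order."""
    parser = PreviewParser()
    parser.feed(html)

    def first(*keys):
        return next((parser.meta[k] for k in keys if k in parser.meta), None)

    return {
        "title": first("og:title", "twitter:title") or parser.title_text.strip() or None,
        "description": first("og:description", "twitter:description", "description"),
        "image": first("og:image", "twitter:image"),
        "favicon": parser.favicon,
    }


if __name__ == "__main__":
    page = '<html><head><title>Plain page</title><meta name="description" content="No OG tags here"></head></html>'
    print(extract_preview(page))
```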

I just launched it on RapidAPI with a completely free tier so developers can easily drop it into their projects without worrying about hosting scrapers themselves.

I would love for you guys to test it out or roast it!

Link: https://rapidapi.com/mccarthymael2011/api/linkvault-link-preview-api

u/DifferentChampion831 — 14 days ago