u/cstocks

Found a hack for getting my CV to stand out much more, but not sure it's ethical

My CV recently started getting a much better response rate thanks to open source contributions. Disclaimer: the PRs were all AI-generated.

So the idea was to contribute to popular open source repositories and then list them in my CV as projects I've contributed to. I used [Probus](https://github.com/etairl/Probus) to automatically detect vulnerabilities and open PRs for them in Vercel's AI SDK (#14751, #14750), n8n (#29405) and haystack (#11248). BTW I suspect even using Claude directly might be good enough.

So far I'm seeing very good results in terms of callbacks from recruiters who get all hyped up that I contributed to n8n, but I do wonder if I'm gonna get "called out" in a live interview with a more technical person...

Thoughts? Is this "ethical"?

reddit.com
u/cstocks — 3 days ago

Heads-up if you run a LangGraph.js app with MongoDBSaver: there's a way for a malicious user to read other people's checkpoints (full conversation state, tool I/O, the lot) by sending a crafted thread_id in their request. Easy to mitigate on your side in one line; upstream fix is in flight.

TL;DR: coerce thread_id to a string before it reaches the saver. String(req.body.thread_id) or z.string().parse(...) is enough.

The bug

// libs/checkpoint-mongodb/src/checkpoint.ts
const { thread_id, checkpoint_ns = "", checkpoint_id } = config.configurable ?? {};
const query = { thread_id, checkpoint_ns };
this.db.collection(...).find(query).sort("checkpoint_id", -1).limit(1);

Attacker payload:

{ "thread_id": { "$gt": "" }, "checkpoint_ns": { "$ne": null } }

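find then matches every checkpoint ({ $gt: "" } matches any non-empty string thread_id, { $ne: null } matches any existing checkpoint_ns), and the descending sort plus limit(1) returns the latest one in the whole collection, victim's data and all. app.invoke() calls getTuple automatically when a saver is configured, so any chat handler that takes thread_id from the request body triggers it.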
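To see why an operator object is so much worse than a wrong string, here is a toy model, in Python, of the tiny slice of MongoDB query semantics the payload relies on (plain equality, $gt and $ne). It is not the MongoDB driver and not LangGraph.js code; field_matches, find_latest and the sample checkpoints are made up for illustration, and the sort/limit step only mirrors the .sort("checkpoint_id", -1).limit(1) call quoted above.

```python
from typing import Any


def field_matches(value: Any, condition: Any) -> bool:
    """Toy subset of MongoDB field matching: equality, $gt and $ne only."""
    if isinstance(condition, dict):
        # An operator object: every operator listed must hold.
        for op, operand in condition.items():
            if op == "$gt":
                # Like MongoDB's type bracketing, only compare strings with strings.
                if not (isinstance(value, str) and isinstance(operand, str) and value > operand):
                    return False
            elif op == "$ne":
                if value == operand:
                    return False
            else:
                raise ValueError(f"operator not modelled: {op}")
        return True
    # A plain value: exact equality, which is what the saver intends.
    return value == condition


def find_latest(docs: list[dict], query: dict) -> list[dict]:
    """Filter, then mimic .sort("checkpoint_id", -1).limit(1)."""
    hits = [d for d in docs if all(field_matches(d.get(k), c) for k, c in query.items())]
    hits.sort(key=lambda d: d["checkpoint_id"], reverse=True)
    return hits[:1]


# Made-up checkpoints belonging to two other users.
checkpoints = [
    {"thread_id": "alice-thread", "checkpoint_ns": "", "checkpoint_id": "ckpt-001", "state": "alice private state"},
    {"thread_id": "bob-thread", "checkpoint_ns": "", "checkpoint_id": "ckpt-002", "state": "bob private state"},
]

# What the saver expects: a string thread_id that only matches the caller's own thread.
print(find_latest(checkpoints, {"thread_id": "mallory-thread", "checkpoint_ns": ""}))  # []

# The payload from the post: operator objects instead of strings match everything.
payload = {"thread_id": {"$gt": ""}, "checkpoint_ns": {"$ne": None}}
print([d["state"] for d in find_latest(checkpoints, payload)])  # ['bob private state']

# Coerced to a string first, the payload is inert text that matches no thread.
coerced = {"thread_id": str(payload["thread_id"]), "checkpoint_ns": ""}
print(find_latest(checkpoints, coerced))  # []
```

The last call is the whole point of the one-line fix: once thread_id is a string (String(...) in JS, str() in this toy), the operator object can no longer widen the query.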

Are you affected?

Yes, if all three apply:

  • You use MongoDBSaver.
  • thread_id (or the whole configurable blob) comes from a JSON body or Express qs-parsed query (?thread_id[$gt]= parses into { $gt: "" }).
  • You don't coerce/validate it to a string.

Not affected if thread_id is server-issued (session/JWT), comes from a URL path param, or you're already validating with Zod / typeof === "string".

Mitigation

const thread_id = String(req.body.thread_id ?? "");
// or: z.string().parse(req.body.thread_id)

That closes every payload I tried. The list() method in the same file already has this guard on its filter arg; getTuple just got missed.

Status

Issue: https://github.com/langchain-ai/langgraphjs/issues/2351

Detected automatically with Probus

u/cstocks — 22 days ago
▲ 2 r/Rag

If your RAG pipeline ingests user-influenced data into image documents (uploads, tool-call arguments, third-party feeds, deserialized records), there's a footgun in llama-index-core worth knowing about.

There's a metadata field on ImageDocument (file_path) that, if set to a path, gets opened and base64-encoded with no validation. No "is this actually an image" check, no allow-listed directory, no symlink check. The bytes then ride along to the multimodal model, which usually echoes them back when asked to describe the image.

The practical effect is that anything the process can read is reachable: config files, cloud credential files, K8s tokens, .env, etc.

from llama_index.core.schema import ImageDocument
from llama_index.core.multi_modal_llms.generic_utils import image_documents_to_base64


doc = ImageDocument(metadata={"file_path": "/etc/passwd"})
print(image_documents_to_base64([doc]))  # base64 of /etc/passwd

Per the project's security policy, path validation is treated as the app's responsibility. So if you're shipping a RAG product on llama-index, you should:

  • Stop honoring the file_path metadata key entirely if you can
  • Otherwise, resolve the path and require it to live under a known image directory
  • Reject symlinks, validate MIME and size
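A minimal sketch of the second and third bullets in Python, assuming all legitimate images live under one directory you control. validate_image_path and MAX_IMAGE_BYTES are hypothetical names, not llama-index APIs, and the magic-byte check only recognises PNG, JPEG and GIF; extend it for the formats you actually accept.

```python
from pathlib import Path

# Leading "magic" bytes of a few common image formats (deliberately not exhaustive).
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 spec)
    b"GIF89a",             # GIF (1989 spec)
)

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical cap: 10 MiB


def validate_image_path(raw_path: str, image_root: str,
                        max_bytes: int = MAX_IMAGE_BYTES) -> Path:
    """Return the resolved path if it is a plausible image under image_root.

    Raises ValueError for anything that should not be opened.
    """
    root = Path(image_root).resolve(strict=True)
    candidate = Path(raw_path)
    if not candidate.is_absolute():
        candidate = root / candidate

    # Refuse a symlink as the final path component outright.
    if candidate.is_symlink():
        raise ValueError("symlinks are not allowed")

    try:
        # resolve() follows symlinks anywhere in the path and collapses "..",
        # so the containment check below also catches traversal tricks.
        resolved = candidate.resolve(strict=True)
    except OSError as exc:
        raise ValueError("path does not exist") from exc

    if not resolved.is_relative_to(root):  # Python 3.9+
        raise ValueError("path escapes the image directory")
    if not resolved.is_file():
        raise ValueError("not a regular file")
    if resolved.stat().st_size > max_bytes:
        raise ValueError("file too large")

    # Sniff the content instead of trusting the extension or a client-supplied MIME type.
    with resolved.open("rb") as fh:
        head = fh.read(16)
    if not any(head.startswith(sig) for sig in IMAGE_SIGNATURES):
        raise ValueError("not a recognised image format")
    return resolved
```

There is still a window between validating a path and something else opening it later, so where you can, read the bytes yourself right after validation instead of handing the path back to the document. For the first bullet, simply dropping the key from untrusted input (e.g. metadata.pop("file_path", None) before building the document) is simpler still.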

Tracking issue: https://github.com/run-llama/llama_index/issues/21512

Detected automatically by Probus: https://github.com/etairl/Probus

u/cstocks — 22 days ago

Getting attention (i.e. marketing) is the really hard part. Gluing pieces of code together is always the easy (and fun) part. For that one project that somehow blew up (whether in GitHub stars, signups or w/e), how did it happen?

u/cstocks — 25 days ago