r/Kolsetu

There are two kinds of engineering teams: those who think they understand open-source licensing, and those who have been burned badly enough to read SPDX identifiers the way historians read ancient curses.

Modern software is not written, it is assembled. Every npm install is a blind date with legal consequences. Every dependency is a houseguest who may or may not steal the silverware. Copyleft licences are the guests who insist that since they helped you move a sofa, they now own half your living room.

Most companies handle open-source governance the way a toddler handles a biscuit tin: they know the rules exist, they know the consequences are real, but they also know nobody is watching closely enough. Yet. That is how codebases end up with "just one more tiny library", the way nights out end with "just one more drink": quietly, consistently, and with catastrophic potential. When that moment arrives you do not just fail an audit. You discover the software-supply-chain equivalent of finding skid marks somewhere they absolutely should not be. Nobody wants to investigate how they got there, but everyone agrees something very wrong has happened.

Our rule is simple: one whitelist of approved licences. If a licence is not on it, the build fails. Instantly. Automatically. Without discussion. The pipeline does not negotiate with "tiny". Want a new licence added? That is not a ticket. That is a mythic quest. You begin bright-eyed and full of optimism. Somewhere around clause 3(b), doubt creeps in. Somewhere around clause 11, you begin to age. Junior developers become seniors. Seniors start researching ergonomic chairs. You communicate only in SPDX identifiers. And then, if you return from this bureaucratic hellscape carrying your legal analysis, compatibility matrix, and the blessing of the Ancient Gods of Compliance: congratulations, you have reached your personal Ithaca. You are older. You are wiser. You are traumatised. You have prevailed. That is precisely why it works: once a licence survives that odyssey, it is safe forever.
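For the curious, here is a minimal sketch of what such a gate might look like. It assumes your scanner has already produced a mapping from package name to SPDX identifier. The licences in the list, the package names, and the function names `find_violations` and `gate` are all made up for illustration; this is not our actual pipeline.

```python
import sys

# Hypothetical approved list: these SPDX identifiers are examples only.
APPROVED_LICENCES = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


def find_violations(package_licences, approved=APPROVED_LICENCES):
    """Return sorted (package, licence) pairs whose licence is not approved.

    A missing or empty licence counts as a violation, never as a pass.
    """
    violations = []
    for package, licence in package_licences.items():
        if not licence or licence not in approved:
            violations.append((package, licence or "UNKNOWN"))
    return sorted(violations)


def gate(package_licences):
    """Return an exit code: 0 if every licence is approved, 1 otherwise."""
    violations = find_violations(package_licences)
    for package, licence in violations:
        print(f"BLOCKED: {package} uses unapproved licence {licence}", file=sys.stderr)
    return 1 if violations else 0
```

In a CI job you would hand the result to `sys.exit`, for example `sys.exit(gate(scanned))`, so that a single unapproved or unknown licence fails the build.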

This only works if you start before the codebase does, and you scan everything: every dependency, every transitive dependency, every licensing string hiding inside someone's weekend side project. Try bolting enforcement onto an existing product and every old commit becomes a crime scene, every release a hostage negotiation with your own history. Relying on developers to remember licence obligations is like relying on office colleagues not to steal biscuits. Noble in theory. Hilarious in practice.

A quick field guide to what you are actually dealing with:

Permissive (MIT, BSD, Apache 2.0): Take the biscuit, eat it, build a billion-dollar company with it, close the recipe, sell it on Etsy. Just credit the baker. Easy.
Weak copyleft (LGPL): You can link to the library from proprietary code. But if you modify the library itself, you publish those modifications. You may borrow the biscuit to dip in your tea. Change the recipe though, and everyone gets to see it. Fair is fair.
Strong copyleft (GPL): You touched the biscuit. The biscuit now owns you. Any derivative work you distribute has to be released under the GPL. No half measures, no private batches, no "we only used a tiny bit."
Network copyleft (AGPL): Closes the SaaS loophole - letting users interact with your modified version over a network is enough to trigger the source-sharing obligation, even if you never ship a binary. AGPL does not care whether you ate the biscuit, photographed it, or just looked at it through glass. If you touched the dough, the world gets your recipe.
Public domain / Unlicense: Biscuits left on the office counter with no note. They might be safe. They might not. If you eat them, that is on you.
Custom licences: The legal equivalent of discovering someone else's pre-chewed biscuit in your mouth. You do not know where it has been. You want it out immediately.

The part that gets people: even if you did not install copyleft code, your dependency might have, and your dependency's dependency might have, and suddenly your entire product is GPL because some cheerful library three layers deep refused to play by permissive rules. You did not eat the biscuit. You ate the cake made with the biscuit crumbs nobody declared. Now the whole bakery is public.
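To make the "three layers deep" problem concrete, here is a rough sketch of a transitive walk. It assumes you already have a dependency graph and a licence per package from your tooling. The prefix list, the function names, and the shape of the inputs are assumptions for illustration; a real policy would need to handle every SPDX variant and licence expressions.

```python
from collections import deque

# Illustrative only: a real policy would cover every copyleft identifier.
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")


def is_copyleft(licence):
    """True if the SPDX identifier starts with one of the prefixes above."""
    return licence is not None and licence.startswith(COPYLEFT_PREFIXES)


def copyleft_paths(root, dependencies, licences):
    """Breadth-first walk from root; map each copyleft package to its shortest path.

    dependencies: package -> list of direct dependencies
    licences:     package -> SPDX identifier (packages without an entry are
                  skipped here; a real gate should flag them instead)
    """
    paths = {}
    seen = {root}
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        for dep in dependencies.get(path[-1], []):
            if dep in seen:
                continue
            seen.add(dep)
            new_path = path + [dep]
            if is_copyleft(licences.get(dep)):
                paths[dep] = new_path
            queue.append(new_path)
    return paths
```

Returning the path rather than just the package name matters in practice: the first question anyone asks is "who invited it?", and the path answers that.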

During our ISO 27001 certification, our auditors flagged our open-source governance as exemplary. Not because we write hymns praising SPDX formats, but because we could prove with logs, automation, and history that nothing enters our codebase unless it is licensed correctly and enforced automatically. Governance without enforcement is polite fan fiction. Governance with automation is evidence. Auditors love evidence more than oxygen.

Fancy reading more articles and blog posts? Here you go: https://kolsetu.com/blog

A quick confession. I usually stay away from technical implementation detail, because when it comes to actual engineering I am about as useful as a chocolate soldering iron. This post is different. Sub-processors and data flows are one of the very few areas where I genuinely know what I am talking about. So for once, consider me briefly competent.

At some point in the last year, someone on your team added an SDK. Then another one. A webhook. An error monitoring agent. An AI API where you pipe user input and get a response back. Each of those felt like a tool decision. Each of them is also a data governance decision. The two things rarely happened in the same conversation.

Under GDPR, every third party handling your users' personal data on your behalf is a data processor. You are legally responsible for them. If something goes wrong, regulators will ask you to account for the full chain, not just your own code. Most teams cannot do that, not because they are careless, but because the integrations accumulated faster than the documentation did.

Start with your network traffic, not a spreadsheet. Pull every outbound destination your application connects to. Then walk the codebase: every SDK, every API key, every webhook endpoint, every external call with a payload that could be linked back to a person.
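A small sketch of that first step, assuming you can export outbound request URLs from your proxy or egress logs as plain strings. The function names and the idea of a "documented" host list are illustrative assumptions, and the `.example` hostnames in any usage are placeholders.

```python
from urllib.parse import urlsplit


def outbound_hosts(urls):
    """Return the set of hostnames found in a list of URLs.

    Entries without a scheme and host (for example free-text log lines)
    yield no hostname and are skipped.
    """
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # already lowercased by urllib
        if host:
            hosts.add(host)
    return hosts


def undocumented_hosts(urls, documented):
    """Hosts the application talks to that are missing from the documented list."""
    return sorted(outbound_hosts(urls) - {h.lower() for h in documented})
```

Anything `undocumented_hosts` returns is a destination nobody has written down yet, which is exactly the list you then walk through the codebase to explain.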

Common places builders miss: error monitoring tools (stack traces contain more PII than you think), AI model APIs (their DPA terms have evolved: the version you accepted at signup may not reflect current terms), staging environments (often connected to the same third-party tools as production, sometimes with production data), and libraries with embedded analytics that phone home by default. "We did not know the library was doing that" is not a sentence that lands well in front of a regulator.

For each integration, three questions. Does it receive personal data? Do you have a signed DPA with this vendor? Where is the data processed (and if outside the EU, is there a valid transfer mechanism documented)?
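If it helps to see those questions as a data structure, here is one possible sketch. The field names, the `gaps` function, and the simplification of treating "EU" as a single region label are my assumptions; your lawyer will have opinions about the last one in particular.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Integration:
    name: str
    receives_personal_data: bool
    dpa_signed: bool
    processing_region: str                     # e.g. "EU" or "US"
    transfer_mechanism: Optional[str] = None   # None means nothing is documented


def gaps(integration):
    """Return the open issues for one integration, as a list of strings."""
    # No personal data means the DPA and transfer questions do not apply.
    if not integration.receives_personal_data:
        return []
    found = []
    if not integration.dpa_signed:
        found.append("no signed DPA")
    if integration.processing_region != "EU" and not integration.transfer_mechanism:
        found.append("no documented transfer mechanism")
    return found
```

An empty list means the integration has an answer to all three questions; anything else is an item for the triage pile.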

Two things worth actually reading in any DPA before you accept it: whether the vendor can use your data to train their models, and what happens to your data when you stop paying. Both are frequently worse than the marketing copy implies. A vendor that refuses to sign a DPA entirely is a red flag, not a negotiating position.

Triage what you find, fix the highest-risk gaps first, and write it down. GDPR requires a Record of Processing Activities (Article 30), and what you just built is the foundation. Add a processor review to your integration checklist so you are never doing this retrospectively under deadline pressure.

Regulators are not auditing most startups. The real reason to do this is simpler: a breach involving data you did not know was being processed by a vendor you had not mapped is a different category of problem than one you can fully account for. One is an incident. The other is evidence that your data governance does not exist.

Do the audit. A few days of uncomfortable discovery, and then you know what you actually have.

Full step-by-step breakdown here: https://kolsetu.com/blog/your-processor-list-is-longer-than-you-think

Open-source licensing will bite you

You probably have more data processors than you think