u/exoxfanel

Tried a two-agent workflow for an API integration spec this week. The result was about 80% ready to publish.

Tried something at work this week that worked better than I expected, and I want to share where it broke down because that's the more useful part.

Context: I'm a Tech BA on a payments API project at a Canadian bank. ISO 20022 backend, Kafka, AWS, the usual stack. New integration coming in, and I needed to produce a functional analysis page in Confluence that matched the team's reference format.

The setup was two agents in sequence. Agent 1 had access to the legacy services codebase and I asked it for the complete flow of the existing integration. Request mapping, validations, Kafka topics, error handling, the lot. Agent 2 had MCP access to Confluence with my PAT, plus the new API contract, plus a link to a reference page that represented "good" in our team. I gave it Agent 1's output, my own writeup of what the new service needs to do, and asked it to produce the new page.

The draft was honestly close to ship-quality on the technical mapping. Giving it the contract was the unlock there: it stopped hallucinating field names, and the request/response examples actually validated. Field by field, it was solid.

Where I spent my cleanup time was somewhere else entirely.

The agent treated the reference page as inspiration instead of as a contract. It added sections that weren't in the reference (an assumptions block, a risks block, a glossary it invented). It drifted on the color scheme for the Confluence panels and lozenges, which I think is because I gave it the rendered page instead of the storage format. And it put code snippets inside a functional analysis page, which on our team is a hard line: code lives in the technical analysis page, not this one. The agent doesn't know that's a team convention. It just knows "technical documentation usually has code blocks."

Lesson I'm taking from this: the reference page is doing double duty as both the template and the example, and agents can't separate those two roles. Next time I'm going to maintain a stripped-down template with empty sections and inline notes like "field mapping table goes here", separate from a finished example for tone. Then the prompt becomes "fill the template, use the example for tone only, do not copy its structure."
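A rough sketch of how the "fill the template, don't add to it" rule could be checked automatically before I read a draft. It assumes the draft is available as markdown-style text with `#` headings, which is my simplification (Confluence storage format is XML and would need a real parser), and the section names are made up for illustration.

```python
import re

# Hypothetical section names; the real template would define its own list.
TEMPLATE_SECTIONS = ["Overview", "Field Mapping", "Validations", "Error Handling"]

FENCE = "`" * 3  # marker that opens a fenced code block in markdown-style text


def lint_draft(draft: str, allowed_sections: list[str]) -> list[str]:
    """Return the format violations found in a markdown-style draft."""
    problems = []
    # Collect heading titles from lines such as "# Overview" or "## Overview".
    headings = [
        m.group(1).strip()
        for m in re.finditer(r"^#+\s+(.+)$", draft, re.MULTILINE)
    ]
    for heading in headings:
        if heading not in allowed_sections:
            problems.append(f"unexpected section: {heading}")
    for section in allowed_sections:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # Team convention from the post: no code on a functional analysis page.
    if FENCE in draft:
        problems.append("code block found in functional analysis page")
    return problems
```

Running this on each draft would flag an invented glossary or a stray code block right away, and the list of problems can be pasted back to the agent as a correction prompt.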

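The other thing I'll change is the prompt order. Constraints first (no added sections, no code blocks, match these macros), task last, so the task gets read in light of the constraints instead of the constraints arriving as an afterthought. Agents seem to weight recent instructions more heavily, though, so I'll also repeat the key constraints in one line after the task so they survive the generation.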
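For what it's worth, here is roughly how I'd assemble that prompt so the order stays fixed from one run to the next. The function name and the block labels are my own invention, not anything a particular agent framework expects.

```python
def build_prompt(constraints: list[str], template: str, example: str, task: str) -> str:
    """Assemble a prompt with constraints first and the task last."""
    parts = ["CONSTRAINTS (these override everything below):"]
    parts += [f"- {c}" for c in constraints]
    parts += [
        "",
        "TEMPLATE (fill this, keep its structure exactly):",
        template,
        "",
        "EXAMPLE (tone only, do not copy its structure):",
        example,
        "",
        "TASK:",
        task,
    ]
    return "\n".join(parts)


prompt = build_prompt(
    constraints=["no added sections", "no code blocks", "match these macros"],
    template="Overview:\nField mapping table goes here",
    example="(finished page pasted here)",
    task="Produce the functional analysis page for the new integration.",
)
```

Keeping the template and the example in separately labeled blocks is the point: the agent gets told which one is structure and which one is tone.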

PS, audit traceability is a thing worth thinking about if you do this in a regulated shop. Every edit Agent 2 made to Confluence shows up under my name in the page history. Fine for a draft, less fine if it's pushing directly to a page devs reference.

Curious if anyone else has run this kind of two-agent setup for spec work. Especially interested in how you handle the "follow the format exactly" problem, because that's where mine still leaks.

reddit.com
u/exoxfanel — 5 days ago

A recruiter sent me an interesting JD for a BA role on an MLOps/LLMOps project at a bank, so I studied to prep for the interview.

I'm a technical BA in banking, payments and wealth management being my strong suits. I use AI heavily at work, but LLMOps, MLOps pipelines, RAG architecture, Azure OpenAI, and RAGAS metrics were mostly unknown to me. I like these subjects, and it can't hurt to be a future-proof BA who knows them. So I studied the subject for an entire weekend to prep for a potential interview.

Here's a quick summary of my findings:

MLOps = DevOps for ML models. Your job as a BA is to define what "done" looks like at each stage: minimum accuracy before a model goes to prod, what drift threshold triggers retraining, what the escalation path looks like when the model degrades. Functional requirements. Same job, new nouns.
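To make the "define what done looks like" point concrete, here is a small sketch of those requirements written as checkable rules. The threshold values and names are hypothetical; in practice the numbers come out of the requirements workshop, not from the BA's head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelGate:
    min_accuracy: float  # minimum accuracy before a model may go to prod
    max_drift: float     # drift score above which retraining is triggered


def can_promote(gate: ModelGate, accuracy: float) -> bool:
    """Pre-production gate: is the model accurate enough to ship?"""
    return accuracy >= gate.min_accuracy


def monitor_action(gate: ModelGate, accuracy: float, drift: float) -> str:
    """Production rule: what happens when the live model is measured."""
    if drift > gate.max_drift:
        return "retrain"
    if accuracy < gate.min_accuracy:
        return "escalate"
    return "ok"


# Hypothetical thresholds, for illustration only.
gate = ModelGate(min_accuracy=0.92, max_drift=0.20)
```

Each branch is one functional requirement with an owner and a signoff, which is why this feels like the same job with new nouns.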

LLMOps flips your testing model. You can't write "assert output == expected" for a model response because it's probabilistic. You need structural assertions (does it include a citation?), constraint assertions, and RAGAS metrics, specifically faithfulness, which catches hallucinations. In banking, a hallucinated answer would be a major compliance issue.
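A minimal sketch of what structural and constraint assertions could look like in plain Python. The citation format, the forbidden phrases, and the word limit are all assumptions I made up for the example. Faithfulness is not covered here, since scoring it needs an evaluation tool that compares the answer against the retrieved documents.

```python
import re

# Hypothetical citation format, e.g. "[doc:fees-2024]".
CITATION = re.compile(r"\[doc:[\w-]+\]")
# Hypothetical phrases the bank would never want in an answer.
FORBIDDEN = ("guaranteed return", "financial advice")


def check_response(text: str, max_words: int = 150) -> list[str]:
    """Return the failed checks for one model response (empty list means pass)."""
    failures = []
    # Structural assertion: the answer must cite at least one document.
    if not CITATION.search(text):
        failures.append("no citation")
    # Constraint assertions: length limit and forbidden wording.
    if len(text.split()) > max_words:
        failures.append("too long")
    lowered = text.lower()
    for phrase in FORBIDDEN:
        if phrase in lowered:
            failures.append(f"forbidden phrase: {phrase}")
    return failures
```

None of these checks compare the answer to an expected string, so they hold up even when the wording changes from one run to the next.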

RAG is everywhere. The LLM doesn't answer from memory. It retrieves relevant documents at query time and grounds its answer in them. As a BA you define what documents are in scope, how they're chunked, what metadata each chunk carries, and what the citation requirement looks like in the output spec. Functional requirements problem dressed in AI vocabulary.
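Here is how I picture the chunk and metadata requirement, as a toy sketch. Splitting on a fixed word count is the simplest possible chunking rule and is my assumption, as are the metadata fields; a real spec would pick both based on the documents in scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str          # which source document the text came from
    section: str         # section title, used in the citation
    effective_date: str  # lets the retriever filter out superseded policies
    text: str


def chunk_document(doc_id: str, section: str, effective_date: str,
                   text: str, max_words: int = 50) -> list[Chunk]:
    """Split text into fixed-size word windows, copying metadata onto each chunk."""
    words = text.split()
    return [
        Chunk(doc_id, section, effective_date, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def cite(chunk: Chunk) -> str:
    """Format the citation the output spec would require for this chunk."""
    return f"[{chunk.doc_id} / {chunk.section}]"
```

Because every chunk carries its own metadata, the citation requirement can be met from the chunk alone, without going back to the source document.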

My existing skills transfer more than I expected. Writing an API spec is similar to writing a RAG functional spec. Kafka pipeline UAT maps to LLM evaluation criteria. Payment SLAs map to LLM latency NFRs. The gap is vocabulary, not fundamentals.

It's fun to see new job postings for BAs and functional analysts with different workflows and technologies.

If you're a BA in a similar situation, whether assigned to an AI project cold or prepping for an AI functional analyst interview at a bank, I would love to share or chat. Open to questions, comments and DMs as usual.

u/exoxfanel — 12 days ago

Been working on this for two years now. Started with a blank Confluence page and a colorful Klaxoon board. Now we have a real system with more than 10 microservices, real use cases, a test strategy, real signoffs lined up, and a go-live date in the summer.

Figured I would share this before the madness starts. If you are a BA, TBA, junior PM, or anyone heading into your first big production launch in a regulated environment, ask away.

Some context on what I worked on without naming names. Payments rail. ISO 20022 backend API. Multi-pod deployment in EKS. Real TPS targets, real downstream dependencies, real management pressure. The kind of project where you cannot fake the test results because our consumer/client (frontend) has an army of QAs. They expect defects to be resolved within a few days.

Here are a few things I learned the hard way that nobody told me when we started:

Learn Splunk and Datadog (and how to link between the two) before you need them. As a BA you should be able to pull a log trace by payment ID, read a latency dashboard and a flamegraph, and know what metrics your service actually emits. If you have to ask the dev team to read logs for you during an incident, you are dead weight in the war room. Spend a Friday afternoon writing your own Splunk queries (add them to your bookmarks) and clicking through your service dashboard until you can navigate it without help. Our SA made a very nice Datadog dashboard, and it helps a lot.

The single pod stress test is the one that actually matters (take historical TPS levels as a reference). Everyone wants to skip straight to the multi pod scaled out test because it gives the impressive number. But if you do not know your single pod saturation point you cannot do capacity math, and you will get caught in the readiness review when someone asks for it. BTW, putting the CPR tests in place took much longer than we thought. It is quite finicky: you need to use the right library, have separate pods, and deal with other challenges.
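The capacity math I mean is nothing fancy. A back-of-the-envelope sketch, assuming throughput scales linearly with pod count (which it often does not once downstream dependencies saturate) and using made-up numbers:

```python
import math


def pods_needed(peak_tps: float, single_pod_tps: float, headroom: float = 0.7) -> int:
    """Estimate pod count from the single pod saturation point.

    headroom is the fraction of saturation you are willing to run at;
    0.7 is an assumed default, not a standard.
    """
    if single_pod_tps <= 0 or not 0 < headroom <= 1:
        raise ValueError("single_pod_tps must be > 0 and headroom in (0, 1]")
    return math.ceil(peak_tps / (single_pod_tps * headroom))


# Hypothetical figures: 500 TPS peak target, one pod saturates at 120 TPS.
print(pods_needed(peak_tps=500, single_pod_tps=120))  # prints 6
```

Without the measured single pod number there is nothing to plug into the denominator, which is exactly the question that comes up in the readiness review.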

Production readiness is a process that needs to be well documented. The meeting is where the document gets challenged. If you walk in without the test plan, the volume profile, the NHP + retry scenarios, and the runbook, the meeting is going to go badly and you will be put on the spot. If you are not sure about the documentation, ask the RUN team or SRE.

External systems will tell you their component scales well. That's not always the case: CPR tests exposed many liars.

Happy to go into any of this. Test plan structure, how to write requirements that survive a perf review, how to push back on architects when the design is flawed, how to talk to SREs without sounding like you are reading from a script or a noob, how to handle the requirements freeze when the business changes its mind D-30 to live.

Open to comments or DMs, will do my best to answer.
PS: Yes, I did use my dear friend Claude to give me a first skeleton.

u/exoxfanel — 18 days ago