u/TranslatorRude4917 — reddlx

I "accidentally" turned my e2e tests into MCP tools

Hey guys!

I've been pimping Playwright for a while - chasing my obsession with building a tool that lets me create e2e tests quickly while enforcing best practices like proper use of fixtures, semantic POM etc.
I'm pretty far already - UI-based e2e test recording works, giving me proper test steps, POM, UI and API tests - but my current project at work gave me an idea that sent me on a side quest.

tl;dr:
Check the video:
- I record our dashboard creation flow using my tool in Cursor
- Cursor writes POM, fixtures, e2e test, WebMCP tool definition, wiring
- I ask the AI assistant to create a new dashboard for me
- The assistant creates the dashboard using the newly recorded flow

I've been working on creating our in-app AI assistant during my day job. One of our main goals is helping our users with onboarding: explaining to them how certain features work and where they can find stuff on the UI.
I wanted to take it a step further, since imo showing is better than telling. Certain UI assistant libraries (we're using CopilotKit) allow calling FE tools and MCPs. My idea was to expose our main user flows as FE tools to our assistant, so it can do things on the user's behalf - or show them how when prompted.

I modified my tool to not only generate POM and e2e tests, but also FE tool and MCP definitions from the same, single source of truth.
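A minimal sketch of what "one source of truth, several outputs" could look like, assuming a recorded flow is stored as plain data (a name, a description, typed parameters and step names). `FlowDefinition`, `ToolDescriptor` and `toToolDescriptor` are hypothetical names for illustration, not the format the tool in this post actually emits. The descriptor only mirrors the general MCP tool shape (`name`, `description`, JSON Schema `inputSchema`); it does not show how WebMCP or CopilotKit register tools.

```typescript
// Hypothetical shape of a recorded flow, stored as plain data.
type ParamType = "string" | "number" | "boolean";

interface FlowDefinition {
  name: string;
  description: string;
  params: Record<string, { type: ParamType; description: string }>;
  steps: string[]; // human-readable step names captured during recording
}

// General MCP-style tool shape: name, description and a JSON Schema for inputs.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: ParamType; description: string }>;
    required: string[];
  };
}

// Project the single source of truth (the flow) into a tool descriptor.
// In this sketch, other generators (POM, e2e test) would read the same FlowDefinition.
function toToolDescriptor(flow: FlowDefinition): ToolDescriptor {
  const properties: ToolDescriptor["inputSchema"]["properties"] = {};
  for (const [key, spec] of Object.entries(flow.params)) {
    properties[key] = { type: spec.type, description: spec.description };
  }
  return {
    name: flow.name,
    description: `${flow.description} Steps: ${flow.steps.join(" -> ")}.`,
    inputSchema: {
      type: "object",
      properties,
      // Simplification: every recorded parameter is treated as required.
      required: Object.keys(flow.params),
    },
  };
}

const createDashboard: FlowDefinition = {
  name: "create_dashboard",
  description: "Create a new dashboard.",
  params: { title: { type: "string", description: "Title of the new dashboard" } },
  steps: ["open dashboards", "click 'New dashboard'", "fill title", "save"],
};

console.log(JSON.stringify(toToolDescriptor(createDashboard), null, 2));
```

Keeping the flow as data rather than code is what makes the projection cheap: every output is just another function over the same record.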

So now from one recording, I'm able to generate:
- A single flow.spec.ts file that can execute the same flow in 3 modes:
  - UI-based e2e test
  - API e2e test
  - FE tool test (via WebMCP bridge)
- WebMCP tools for use by any AI assistant (Claude, Codex etc.)
- Wiring of the WebMCP tools into our in-app CopilotKit assistant
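The "same flow, 3 modes" idea can be sketched as a flow written once against a driver interface, with one driver per mode. Everything here is hypothetical (`DashboardDriver`, `makeDriver`, `FakeBackend`, `createDashboardFlow`); the fake backend only exists so the sketch runs without a browser or server. In an actual Playwright suite the mode would more likely be chosen per project, for example through a fixture option.

```typescript
type Mode = "ui" | "api" | "tool";

// Hypothetical driver: the same capability, reachable through different surfaces.
interface DashboardDriver {
  readonly mode: Mode;
  createDashboard(title: string): void;
  listDashboards(): string[];
}

// In-memory stand-in for the backend so the sketch runs on its own.
class FakeBackend {
  private dashboards: string[] = [];
  create(title: string): void {
    this.dashboards.push(title);
  }
  list(): string[] {
    return [...this.dashboards];
  }
}

function makeDriver(mode: Mode, backend: FakeBackend, log: string[]): DashboardDriver {
  return {
    mode,
    createDashboard(title: string) {
      // A real driver would go through page objects (ui), HTTP calls (api)
      // or the registered assistant tool (tool). Here every mode only logs
      // its route and writes to the same fake backend.
      log.push(`[${mode}] create "${title}"`);
      backend.create(title);
    },
    listDashboards() {
      return backend.list();
    },
  };
}

// The flow is written once and knows nothing about the mode it runs in.
function createDashboardFlow(driver: DashboardDriver, title: string): boolean {
  driver.createDashboard(title);
  return driver.listDashboards().includes(title);
}

const log: string[] = [];
for (const mode of ["ui", "api", "tool"] as const) {
  const passed = createDashboardFlow(makeDriver(mode, new FakeBackend(), log), `Sales (${mode})`);
  console.log(`${mode}: ${passed ? "passed" : "failed"}`);
}
```

The assertion (`includes(title)`) lives in the flow, so all three modes are held to the same promise; only the route to it differs.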

It's still super early, but I've always been fascinated by the idea of having a single source of truth for features, exposing them to the world through different interfaces (UI, API, MCP, whatever you want).

Next things I probably want to do:
- define API-based WebMCP tools using the same approach, so the user can choose if they want the UI showcase or the fast track.
- Zoom out a little, and consider what this means from a security perspective :D

What's your opinion? Have you tried something similar on your own?
Is this something you would find useful or exciting, either from the testing or the user-facing/UX perspective?

u/TranslatorRude4917 — 2 days ago

▲ 24 r/softwarearchitecture+6 crossposts

Convergence Mechanisms: Confidence in the Age of Agentic Engineering

A useful agentic change does not end when the diff appears. It ends when the system is coherent again.

I watched this exact loop last month: we asked an agent to tighten signup validation. It updated the form, the server-side validator, even the e2e test. Green across the board. We shipped. Two minutes later we realized the password reset was broken.

A software change is not merely code moving. It is a shift in the requirement set.
There's a gap between the physical system (code, tests, schemas, diffs) and the theoretical system (requirements, contracts, constraints). Agents edit the former. We care about the latter.

When those two layers drift apart, an agent can satisfy the explicit task while breaking an implicit requirement nobody named and no automated check protects.

This post is my attempt to reason about that gap, and how to structure an agentic engineering harness around requirements, contracts, and deterministic feedback loops instead of just writing longer instruction files.
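One way to make "an implicit requirement nobody named" concrete is a deterministic gate that runs after every agent change: requirements are written down explicitly with the code areas they depend on and the checks that protect them, and the gate lists the requirements a change touches that have no check at all. This is a minimal sketch under those assumptions; the requirement ids, the shared `passwordValidator` area and the 12-character rule are made up to mirror the signup/password-reset story, not taken from it.

```typescript
// Hypothetical requirement record: written down explicitly, with the code
// areas it depends on and the automated checks that protect it.
interface Requirement {
  id: string;
  statement: string;
  dependsOn: string[]; // code areas (modules, schemas) the requirement relies on
  checks: string[]; // ids of automated checks (e2e tests, contract tests, ...)
}

// Deterministic gate: given the areas a change touched, return the requirements
// that are affected by the change but protected by no automated check at all.
function unguardedRequirements(reqs: Requirement[], changedAreas: string[]): Requirement[] {
  const changed = new Set(changedAreas);
  return reqs.filter(
    (r) => r.checks.length === 0 && r.dependsOn.some((area) => changed.has(area)),
  );
}

const requirements: Requirement[] = [
  {
    id: "REQ-signup-password",
    statement: "Signup rejects passwords shorter than 12 characters",
    dependsOn: ["passwordValidator", "signupForm"],
    checks: ["e2e:signup-validation"],
  },
  {
    id: "REQ-reset-password",
    statement: "A user can reset their password via the emailed link",
    dependsOn: ["passwordValidator", "resetForm"],
    checks: [], // the implicit requirement nobody wired a check to
  },
];

// The agent's diff touched the (assumed) shared validator and the signup form.
const gaps = unguardedRequirements(requirements, ["passwordValidator", "signupForm"]);
for (const r of gaps) {
  console.log(`Unguarded: ${r.id} (${r.statement})`);
}
```

A gate like this only catches requirements with zero checks; it says nothing about checks that exist but don't actually exercise the changed code, which would need coverage data or something similar.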

If you're interested, give it a read. If not, maybe let me know what I could do better!

Appreciate any feedback, and happy to partake in discussions :)

abelenekes.com

u/TranslatorRude4917 — 7 hours ago

▲ 1 r/AI_Agents

Hey, FE dev here, working at SaaS startups for over a decade, plus coding a couple of side projects on my own - none released yet, but hope dies last :D

At my current team we’re actively working on integrating an AI assistant into our product, and the more time I spend on this project, the more I think about this:

Right now, if you want an assistant to do something useful in your app, you usually end up exposing the same product flows in a bunch of different, very product-specific ways.

Take something like user or team management. In many products that exists through:

- the regular UI
- internal/public API
- custom MCP
- in-app assistant actions
- sometimes even frontend tools where the agent literally navigates the UI to do the work

The user wants one thing done, but we keep rebuilding different ways to access the same capability depending on whether the caller is a human in the app, another system, or an AI assistant.

I think web apps should expose their key user flows in some more standard way, and users should be able to bring their own assistant to them, instead of every product rebuilding its own separate assistant layer around the same flows.

Imo that's more or less the direction WebMCP is heading, and once it becomes a standard (it's already being built into Google Chrome), I think the value is pretty big:

- centralized feature surface in the browser, products exposing flows once instead of rebuilding them for every surface
- less product-specific integration work
- more unified web experience
- users not being locked into each product’s assistant and product
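To make "exposing flows once" a bit more concrete: a minimal sketch, assuming a capability is defined once as a named, described handler and each surface is a thin adapter around it. `Capability`, `asHttpHandler` and `asToolCall` are hypothetical names, not an existing framework and not the WebMCP API.

```typescript
// Hypothetical capability: described once, implemented once.
interface Capability<I, O> {
  name: string;
  description: string;
  run(input: I): O;
}

interface Member {
  email: string;
  role: "admin" | "member";
}

const team: Member[] = []; // in-memory stand-in for the real team store

const inviteMember: Capability<Member, Member> = {
  name: "invite_member",
  description: "Invite a user to the current team.",
  run({ email, role }) {
    if (team.some((m) => m.email === email)) {
      throw new Error(`${email} is already a member`);
    }
    const member: Member = { email, role };
    team.push(member);
    return member;
  },
};

// Surface 1: an HTTP-style handler (request body in, status + body out).
function asHttpHandler<I, O>(cap: Capability<I, O>) {
  return (body: I): { status: number; body: O | { error: string } } => {
    try {
      return { status: 200, body: cap.run(body) };
    } catch (e) {
      return { status: 400, body: { error: (e as Error).message } };
    }
  };
}

// Surface 2: an assistant tool call (untyped JSON arguments in, text result out).
// A real adapter would validate args against a schema instead of casting.
function asToolCall<I, O>(cap: Capability<I, O>) {
  return (args: unknown): { isError: boolean; text: string } => {
    try {
      return { isError: false, text: JSON.stringify(cap.run(args as I)) };
    } catch (e) {
      return { isError: true, text: (e as Error).message };
    }
  };
}

const postInvite = asHttpHandler(inviteMember);
const callInviteTool = asToolCall(inviteMember);

console.log(postInvite({ email: "ana@example.com", role: "admin" }).status); // 200
// Same business rule on the other surface: reports that ana@example.com is already a member.
console.log(callInviteTool({ email: "ana@example.com", role: "member" }));
```

The duplicate-member rule lives in exactly one place; the adapters only translate inputs and errors into each caller's conventions.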

Maybe I’m overly excited because I’m close to the problem right now, but I can’t really shake the feeling that this is where things are heading.

Wdyt, will this eventually settle into a standard model?

reddit.com

u/TranslatorRude4917 — 22 days ago

▲ 1 r/AgentsOfAI

Hey, FE dev here, working at SaaS startups for over a decade, plus coding a couple of side projects on my own - none released yet, but hope dies last :D

At my current team we’re actively working on integrating an AI assistant into our product, and the more time I spend on this project, the more I think about this:
Right now, if you want an assistant to do something useful in your app, you usually end up exposing the same product flows in a bunch of different, very product-specific ways.

Take something like user or team management. In many products that exists through:
- the regular UI
- internal/public API
- custom MCP
- in-app assistant actions
- sometimes even frontend tools where the agent literally navigates the UI to do the work

As a developer, it’s super exciting. Obviously no one has figured it out yet, and there’s a lot of experimentation happening. But at the same time, it also starts to feel messy and not really like something that scales.
The user wants one thing done, but we keep rebuilding different ways to access the same capability depending on whether the caller is a human in the app, another system, or an AI assistant.

Imo that's more or less the direction WebMCP is heading, and once it becomes a standard (it's already being built into Google Chrome), I think the value is pretty big:
- centralized feature surface in the browser, products exposing flows once instead of rebuilding them for every surface
- less product-specific integration work
- more unified web experience
- users not being locked into each product’s assistant and product

Maybe I’m overly excited because I’m close to the problem right now, but I can’t really shake the feeling that this is where things are heading.

Wdyt, will this eventually settle into a standard model?

reddit.com

u/TranslatorRude4917 — 22 days ago

▲ 23 r/Playwright+5 crossposts

Hey guys,

A while ago I posted here about the gap between what an e2e test says it protects and what it actually checks.

That discussion raised a few good questions, especially around whether I was just arguing for page objects or trying to force everything into application-level tests.

I spent some time thinking deeper about the problem, and now I think the thing I've been trying to name more precisely is this:

A test can be perfectly clean and still change for the wrong reasons if it is anchored to a different scope than the promise it claims to protect.

Example:

import { test, expect } from '@playwright/test';

test('create business party', async ({ page }) => {
  const partyList = page.getByTestId('Components.PartyList');

  await partyList.getByRole('button', { name: /add party/i }).click();

  const modal = page.getByTestId('Components.PartyModal');
  await modal.getByRole('button', { name: /business/i }).click();

  const entityName = modal.getByTestId('Components.PartyModal.PartyModalBusinessForm.entityName');
  await entityName.getByRole('combobox').fill('Acme Inc.');
  await entityName.getByRole('option', { name: /create/i }).click();

  await modal.getByTestId('Components.PartyModal.submitButton').click();

  await expect(partyList.getByTestId('Components.PartyList.PartyRow').filter({ hasText: 'Acme Inc.' })).toBeVisible();
});

Nothing is wrong with this by itself.

But if the promise is just:

>a business party can be created

then this test is anchored to a much more UI-specific scope:
- there is a party list with an add-party entry point
- the flow starts there
- it happens through a modal
- that modal has a business tab
- etc...

That may be exactly what you want to protect. But then it is a UI-scope contract.
Same promise space, different scope:

test('create business party', async ({ parties }) => {
  await parties
    .addBusiness({ companyName: 'Acme Inc.' })
    .create();
  await expect.poll(async () => parties.get('Acme Inc.')).not.toBeUndefined();
});
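The `parties` fixture in the second test is where the scope decision lives. A minimal sketch of what could sit behind it, assuming a `Parties` class whose methods name capabilities and a `PartiesBackend` seam that decides the mechanics. All names here are hypothetical, and the in-memory backend only makes the sketch runnable. In Playwright the fixture itself would typically be provided via `test.extend`, handing the test a UI-backed or API-backed implementation of the seam.

```typescript
// Hypothetical application-scope API behind the `parties` fixture.
interface Party {
  kind: "business" | "person";
  name: string;
}

// The seam: a UI-backed implementation would drive the party list and modal,
// an API-backed one would call the backend directly.
interface PartiesBackend {
  createParty(party: Party): Promise<void>;
  findParty(name: string): Promise<Party | undefined>;
}

class BusinessPartyDraft {
  private readonly backend: PartiesBackend;
  private readonly companyName: string;

  constructor(backend: PartiesBackend, companyName: string) {
    this.backend = backend;
    this.companyName = companyName;
  }

  create(): Promise<void> {
    return this.backend.createParty({ kind: "business", name: this.companyName });
  }
}

// What the test sees: capabilities, no locators.
class Parties {
  private readonly backend: PartiesBackend;

  constructor(backend: PartiesBackend) {
    this.backend = backend;
  }

  addBusiness({ companyName }: { companyName: string }): BusinessPartyDraft {
    return new BusinessPartyDraft(this.backend, companyName);
  }

  get(name: string): Promise<Party | undefined> {
    return this.backend.findParty(name);
  }
}

// In-memory backend so the sketch runs on its own.
class InMemoryPartiesBackend implements PartiesBackend {
  private readonly parties: Party[] = [];

  async createParty(party: Party): Promise<void> {
    this.parties.push(party);
  }

  async findParty(name: string): Promise<Party | undefined> {
    return this.parties.find((p) => p.name === name);
  }
}

const parties = new Parties(new InMemoryPartiesBackend());
parties
  .addBusiness({ companyName: "Acme Inc." })
  .create()
  .then(() => parties.get("Acme Inc."))
  .then((party) => console.log(party));
```

The UI-scope steps from the first test would then become one possible implementation of `PartiesBackend.createParty`, so a modal or tab change touches that implementation rather than every test that creates a party.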

UI-scope tests are completely valid when the thing you want to protect is UI behavior. Application-scope tests are valid when the thing you want to protect is the capability itself.

The problem starts when the test sounds like it protects one scope, but is actually tied to another.
And if a test is truly UI-scope, it is worth asking whether e2e is the right place for it, or whether a smaller UI/component test would give faster, more focused feedback.

Imo that is where a lot of brittleness comes from. And it's not just naming alignment. Once those two are aligned, the whole suite - and maybe your whole testing strategy - gets much easier to reason about:
- UI-scope tests change when UI behavior changes
- application-scope tests change when the application capability changes
- mechanics can still break, but the fix is easier to locate
- "should this really be an e2e test?" is easier to answer
- it becomes easier to see when a lower-level test is creating more churn than the promise is worth

If you're interested, I wrote the longer version with a fuller example and more on scope alignment in the linked post.

Glad to jump back in the trenches arguing about testing practices :D

u/TranslatorRude4917 — 13 days ago