u/Guilty-Effect-3771

Building and Testing MCP Servers from SDK Authors
r/mcp

Hey r/mcp, I am Pietro from Manufact (https://manufact.com). We build open source dev tools and infrastructure for MCP.

You might know us for mcp-use (https://github.com/mcp-use/mcp-use), our open source full-stack SDK for building MCP servers and clients.

At Manufact we gave ourselves the mission, and the delight, of writing as many MCP servers as we could. Along the way we honed our SDK to offer the best possible developer/agent experience.

Testing/developing MCP servers is a pain because:

- Configuring MCPs in normal clients is not an easy feat. People already complain that installing them is hard; imagine having to refresh them every time you make a change.
- Testing does not only mean checking that tools work one at a time, but also making sure agents understand them and call them in the right way and order.
- If installing an MCP locally is a challenge, it is even harder on the remote clients where people are actually going to use your product (claude.ai, chatgpt.com).
- The model and system prompt (the agent) that end up using your server vary greatly. Some people might be using Opus 4.7 from Claude Code, others Instant on chatgpt.com, and the model's ability to call your tool varies a lot. Testing on GPT-5.5 locally and testing on ChatGPT with the same model yield very different experiences.

First: local development loop

Two things made web development frameworks like Next and Vite better than anything else: HMR and a preview on localhost.

What is the preview of an MCP? In our opinion, a chat. Every time you run npm run dev on an mcp-use server, we serve an inspector on localhost, automatically connected to your MCP server. It has a BYOK chat, a way to test tools one by one, and detailed metadata about your MCP server so you can make sure it is compliant.

An interesting technical challenge here was making an MCP client that runs completely (or almost completely) in the browser.
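To give a feel for what a metadata check can look like, here is a minimal sketch of a tool-definition lint. It is not the inspector's actual logic: `ToolDefinition` and `lintTool` are hypothetical names, and the rules are an assumption based on the tool shape in the MCP spec (a `name`, an optional `description`, and an `inputSchema` whose `type` is `"object"`). Treating a missing description as a problem is a recommendation, not a spec requirement.

```typescript
// Hypothetical lint for MCP tool metadata. Fields are typed loosely because
// the input is assumed to come from an untrusted tools/list response.
type ToolDefinition = {
  name?: unknown;
  description?: unknown;
  inputSchema?: { type?: unknown };
};

// Returns a list of human-readable problems; an empty list means no issues found.
function lintTool(tool: ToolDefinition): string[] {
  const problems: string[] = [];
  if (typeof tool.name !== "string" || tool.name.length === 0) {
    problems.push("name must be a non-empty string");
  }
  // Optional in the spec, but flagged here because models rely on it to pick a tool.
  if (typeof tool.description !== "string" || tool.description.trim() === "") {
    problems.push("description is missing or empty");
  }
  if (tool.inputSchema?.type !== "object") {
    problems.push('inputSchema.type must be "object"');
  }
  return problems;
}

const goodTool: ToolDefinition = {
  name: "search_docs",
  description: "Search the docs by keyword",
  inputSchema: { type: "object" },
};
const badTool: ToolDefinition = { name: "", inputSchema: { type: "string" } };

console.log(lintTool(goodTool));
console.log(lintTool(badTool));
```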

About HMR: this was not easy. There are a few ways to do it, and we chose the hard but proper one: we implemented HMR using the protocol primitives. If you change a tool, we do not hard refresh the server and cancel the previous MCP session. Instead we send a notifications/tools/list_changed notification (part of the spec) to the client, which then knows it should reload the tools. For UI elements we use Vite HMR and forward the UI changes to all elements of the inspector, so for instance you can change the UI element your MCP returns and see the change live in the embedded chat. (This is pretty marvellous to look at.)
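The tool-reload flow can be sketched in a few lines. This is an in-memory illustration only, not the mcp-use implementation: there is no real transport, and `ToolRegistry` and `SketchClient` are hypothetical names. The notification method is the one from the MCP spec, and in a real server you would also declare the `tools: { listChanged: true }` capability during initialization.

```typescript
type Tool = { name: string; description: string };

// JSON-RPC 2.0 notification: it has a method but no id, so no reply is expected.
type Notification = { jsonrpc: "2.0"; method: string };
type Listener = (message: Notification) => void;

class ToolRegistry {
  private tools = new Map<string, Tool>();
  private listeners: Listener[] = [];

  // A connected session subscribes here; the session itself stays open.
  onNotification(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Would be called by a (hypothetical) file watcher when a tool's source changes.
  upsertTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
    const message: Notification = {
      jsonrpc: "2.0",
      method: "notifications/tools/list_changed",
    };
    for (const listener of this.listeners) listener(message);
  }

  // Stand-in for the server's tools/list handler.
  listTools(): Tool[] {
    return [...this.tools.values()];
  }
}

class SketchClient {
  tools: Tool[] = [];
  refreshCount = 0;
  private registry: ToolRegistry;

  constructor(registry: ToolRegistry) {
    this.registry = registry;
    registry.onNotification((message) => {
      // On list_changed, re-fetch the tool list instead of reconnecting.
      if (message.method === "notifications/tools/list_changed") this.refresh();
    });
    this.refresh(); // initial tools/list after connecting
  }

  private refresh(): void {
    this.tools = this.registry.listTools();
    this.refreshCount += 1;
  }
}

const registry = new ToolRegistry();
const client = new SketchClient(registry);
registry.upsertTool({ name: "search", description: "Search docs" });
registry.upsertTool({ name: "search", description: "Search docs and issues" });
console.log(client.tools);
```

The point of the design is that the client object, and therefore the session, survives both edits; only the tool list is replaced.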

This sped up the development of MCPs by a lot.

You can try out our inspector by running

 npx @mcp-use/inspector

You can also try the hosted version at inspector.manufact.com, or just use our SDK.

Bonus: one thing I often do is launch Claude Code with --chrome enabled and tell it to go to the inspector URL to test the server. This creates a closed loop for the agent that makes developing MCPs with it much more predictable.

Second: testing on other clients (Disclaimer: this is a cloud feature)

Have you ever installed an MCP on ChatGPT? You have to be in developer mode, install the app through a pretty buggy dialog, and it's very confusing which version of the MCP you're actually using when talking to the model. This is due to ChatGPT's aggressive caching of MCP app UI resources. Generally a good thing, but with cache comes the crash.

(Disclaimer: this part is a paid feature. The Inspector and mcp-use are open source and MIT licensed.)

To make this better we built an automated testing feature. You define test cases in the regular agent-testing shape (user message, expected tool calls, rubrics). GPT-5.5 from the API and GPT-5.5 inside ChatGPT are wildly different experiences. Same model, different client, different behavior. So we test on the actual client: browser agents install the app and run the tests on the clients themselves.
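For readers who have not seen the agent-testing shape before, here is a minimal sketch of it. The field names (`userMessage`, `expectedToolCalls`, `rubrics`), the example tools, and the `calledInOrder` helper are all hypothetical and are not the schema of our product. The check shown is a simple in-order (subsequence) match on tool names; rubrics are left as free text because they are assumed to be graded separately by a model or a human.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

type AgentTestCase = {
  userMessage: string;
  expectedToolCalls: string[]; // tool names, in the order they should occur
  rubrics: string[]; // free-text criteria, graded outside this sketch
};

// True if every expected tool name appears in the observed calls, in order.
// Extra calls in between are tolerated.
function calledInOrder(expected: string[], observed: ToolCall[]): boolean {
  let next = 0;
  for (const call of observed) {
    if (next < expected.length && call.name === expected[next]) next += 1;
  }
  return next === expected.length;
}

const testCase: AgentTestCase = {
  userMessage: "Book me the cheapest flight to Lisbon on Friday",
  expectedToolCalls: ["search_flights", "book_flight"],
  rubrics: ["Confirms the price with the user before booking"],
};

// Tool calls as they might be extracted from a recorded conversation.
const observed: ToolCall[] = [
  { name: "search_flights", args: { to: "LIS" } },
  { name: "get_weather", args: { city: "Lisbon" } },
  { name: "book_flight", args: { flightId: "F1" } },
];

console.log(calledInOrder(testCase.expectedToolCalls, observed));
```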

Once the session is over, you get the results plus screenshots and a screen recording of the conversation. These turned out to be super useful for sharing new versions of MCP apps between teams as well.

You can set it up so these tests run on every deploy on a given branch, and gate promotion to production on them, so you know you're not shipping a broken app. Apps can break in all sorts of unexpected ways.
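The gating rule itself is simple. A minimal sketch, with hypothetical names (`TestResult`, `gatePromotion`) and one assumption of mine: an empty test run counts as a failure, so a misconfigured suite cannot pass silently.

```typescript
type TestResult = { name: string; passed: boolean };
type GateDecision = { promote: boolean; failures: string[] };

// Promote only if at least one test ran and none failed.
function gatePromotion(results: TestResult[]): GateDecision {
  const failures = results.filter((r) => !r.passed).map((r) => r.name);
  return { promote: results.length > 0 && failures.length === 0, failures };
}

const decision = gatePromotion([
  { name: "books a flight", passed: true },
  { name: "renders the widget", passed: false },
]);
console.log(decision.promote, decision.failures);
```

In a CI job you would fail the step when `promote` is false and print `failures` so the broken cases are visible in the log.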

I'd love to hear thoughts and feedback, and specifically how (or if) people are testing their MCP servers, both in production and locally.

(I started writing MCPs in February 2025, when no tooling was available and clients had hardly any support. I'd love to see how people are doing this today.)

u/Guilty-Effect-3771 — 10 days ago