A curated, least-privilege tool surface
Deny-by-default, per-capability scopes, auditable consent. Windows' MCP proxy/registry model is a good reference.
MCP is great plumbing, not a magic wand. Treating it like "USB-C for APIs" confuses connectivity with capability.
Somewhere between a keynote and a LinkedIn carousel, the message morphed into: "Expose everything through MCP and your agent just… works." Like USB-C: plug-and-play.
Even Anthropic's own docs use the USB-C metaphor. It's a useful mental model for the plug. It's a terrible one for the driver stack. USB-C only "works" because the OS still has drivers, permissions, and policies. MCP is the socket. You still owe the system.
References: Anthropic MCP docs · MCP getting started · Anthropic MCP announcement
No. The hype is. MCP is a solid port that reduces wiring. But the "plug-and-play agent" narrative sets business users up to skip the engineering—then blame the protocol when reality bites.
A more honest message: MCP lowers integration cost; it does not eliminate system design. You still need semantics, safety, and strategy.
MCP gives you a consistent way to discover tools, invoke them, and pass messages between clients/servers. That's valuable! But MCP does not decide:
Even Microsoft's Windows rollout treats MCP as one layer in a security-first architecture (proxy mediation, tool-level authorization, isolation, and a curated server registry). If MCP were "plug-and-play," none of that would be necessary.
References: Microsoft MCP security · Windows AI development
If the "USB-C for APIs" story were enough, we'd see near-perfect function calls. We don't.
References: Gorilla hallucination analysis · ToolScan paper · OpenAI computer agent · WebArena benchmark
When you give an agent a buffet of endpoints, three predictable failures show up:
OWASP's LLM Top-10 calls out insecure plugin design and prompt injection. MCP doesn't neutralize that—architecture does.
Multiple tools claim the same capability with different shapes. The model picks poorly or "shotguns" calls. BFCL/ToolScan were built to measure exactly this class of errors.
Real APIs have enums, idempotency, and cross-field constraints. ComplexFuncBench shows the error rates spike when parameter values must be reasoned about, not just copied.
And that's before indirect prompt injection—malicious instructions hiding in web pages, emails, or files that your agent ingests, then forwards straight through your shiny MCP pipe to a sensitive tool. Recent papers show many defenses can be bypassed with adaptive attacks (>50% success in tests).
References: OWASP LLM Top-10 · Indirect prompt injection overview
USB-C works because the OS handles drivers, permissions, and policies. With agents, you still need:
A curated, least-privilege tool surface
Deny-by-default, per-capability scopes, auditable consent. Windows' MCP proxy/registry model is a good reference.
A tool-selection router
RAG over your tool docs and a router model beat "let the LLM guess." Gorilla's retrieval-augmented approach is the canonical starting point.
Hard schemas & contract tests
Typed parameters, constraints, golden examples, and failing fast on invalid payloads. Use ToolScan/BFCL-style diagnostics to see what's actually breaking.
Runtime guardrails
Dry-run/preview, reversible operations, and "damage confinement." GoEX formalizes undo + blast-radius limits. That's systems engineering, not protocol sugar.
Evals that reflect your reality
Reliability, time-to-result, safe-fail behavior—measured on your tasks. WABER focuses on reliability/efficiency for web agents; similar practice should exist for your domain.
One domain, well-defined success metrics, and an eval harness (reliability, TTR, safe-fail).
Types, enums, cross-field constraints, idempotency keys, and golden payloads. Auto-generate validators and reject fuzzy inputs fast.
Few-shot policy + embeddings over your tool catalog. Disallow unrecognized capabilities.
Dry-run, preview diffs, reversibility, and human-in-the-loop for money-moving or prod-mutating steps.
Sanitize, isolate, and scope tokens. Map your surface to OWASP LLM risks and test it.
MCP reduces wiring work. It does not auto-solve tool ambiguity, parameter validity, or enterprise safety.
MCP is the port. Your job is the drivers, the policies, and the tests. Ship that—then the metaphor will finally fit.