Building MCP Discovery with Knowledge Graphs: A PeKG Pattern Guide
title: "Why MCP discovery is broken (and how knowledge graphs fix it)"
tags: [ai, security, tutorial, opensource]
Last week, I watched an agent do something very human: it picked the first tool that looked right.
It needed to “get customer data.” The MCP server had three similarly named tools:
- get_customer
- customer_lookup
- fetch_account_profile
The agent guessed wrong, hit the broader tool, and pulled way more data than the task actually needed.
Nothing malicious. Just bad discovery.
That’s the part people skip when they talk about MCP: not how to expose tools, but how agents decide which tool to use safely, consistently, and with enough context to avoid overreach.
A flat list of tools doesn’t scale. Once your MCP ecosystem grows past a handful of functions, discovery becomes a search problem. And search without structure gets messy fast.
One pattern that’s worked well is treating MCP discovery like a knowledge graph problem: model tools, permissions, resources, and relationships as nodes and edges, then let agents query the graph instead of guessing from names.
The problem with “just list the tools”
Most MCP discovery today looks roughly like this:
- Ask server for available tools
- Read names/descriptions
- Pick the closest match
- Hope auth, scope, and side effects line up
That works for 5 tools. It gets risky at 50.
The failure modes are predictable:
- tools with overlapping names
- stale descriptions
- hidden side effects
- missing auth context
- no way to rank “least privilege” options
- no memory of which tools are related to which resources
If an agent sees:
- delete_file
- archive_file
- update_file_metadata
…it needs more than descriptions. It needs to know:
- which resource types each tool touches
- whether the action is read/write/delete
- what scopes or approvals are required
- whether there’s a safer alternative
- whether this tool is commonly used in the current workflow
That’s graph-shaped data.
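One way to make that concrete is to store each fact as a typed edge. The sketch below is dependency-free JavaScript. The three tool names come from the list above; the `File` resource, the scope names, the `safer_than` relationship between archiving and deleting, and the `edgesFrom` / `edgesTo` helpers are all assumptions made for illustration.

```javascript
// Each fact is one typed edge: [source, edgeType, target].
// Resource names, scope names, and the safer_than edge are illustrative assumptions.
const edges = [
  ["delete_file", "acts_on", "File"],
  ["delete_file", "action", "delete"],
  ["delete_file", "requires_scope", "delete:file"],
  ["archive_file", "acts_on", "File"],
  ["archive_file", "action", "write"],
  ["archive_file", "requires_scope", "write:file"],
  ["archive_file", "safer_than", "delete_file"],
  ["update_file_metadata", "acts_on", "File"],
  ["update_file_metadata", "action", "write"],
  ["update_file_metadata", "requires_scope", "write:file"],
];

// Return every target reachable from `source` over edges of one type.
function edgesFrom(source, type) {
  return edges
    .filter(([from, edgeType]) => from === source && edgeType === type)
    .map(([, , to]) => to);
}

// Return every source that points at `target` over edges of one type.
function edgesTo(target, type) {
  return edges
    .filter(([, edgeType, to]) => to === target && edgeType === type)
    .map(([from]) => from);
}

console.log(edgesTo("File", "acts_on"));              // every tool that touches File
console.log(edgesFrom("archive_file", "safer_than")); // tools that archive_file is safer than
```

Questions like "which tools touch this resource?" or "is there a safer alternative?" then become lookups over edges rather than guesses from names.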
The PeKG pattern
A useful pattern here is PeKG: a Permission-enriched Knowledge Graph.
At a high level:
- Nodes = tools, resources, scopes, policies, users/agents, workflows
- Edges = can_access, modifies, requires_scope, belongs_to, safer_than, commonly_follows
Instead of asking:
“What tools exist?”
the agent asks:
“What tool can read customer_profile, with my current scope, in this environment, with the lowest risk?”
That’s a much better question.
Here’s the mental model:
[Agent]
   | has_scope
   v
[read:customer]
   ^
   | requires_scope
[get_customer_profile] ----modifies/read----> [CustomerProfile]
   |
   | safer_than
   v
[fetch_full_account_record] ----read----> [AccountRecord]
With this structure, discovery becomes ranking and filtering, not blind selection.
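The filter-then-rank step might look like the following sketch. The `risk` scores, the `resources` and `scopes` arrays, the `update_customer_profile` tool, and the `discover` function are assumptions made for illustration; so is the idea that the full account record also contains the customer profile. None of this is part of MCP itself.

```javascript
// Hypothetical tool records; `risk` is an assumed score (lower = narrower access).
const tools = [
  { name: "fetch_full_account_record", action: "read",
    resources: ["AccountRecord", "CustomerProfile"],
    scopes: ["read:customer"], risk: 3 },
  { name: "get_customer_profile", action: "read",
    resources: ["CustomerProfile"],
    scopes: ["read:customer"], risk: 1 },
  { name: "update_customer_profile", action: "write",
    resources: ["CustomerProfile"],
    scopes: ["write:customer"], risk: 5 },
];

// Keep tools that match the request and whose required scopes the agent
// holds, then order the survivors from lowest to highest risk.
function discover({ action, resource, agentScopes }) {
  const held = new Set(agentScopes);
  return tools
    .filter(t => t.action === action && t.resources.includes(resource))
    .filter(t => t.scopes.every(s => held.has(s)))
    .sort((a, b) => a.risk - b.risk)
    .map(t => t.name);
}

const ranked = discover({
  action: "read",
  resource: "CustomerProfile",
  agentScopes: ["read:customer"],
});
console.log(ranked); // narrowest matching tool first
```

The broad tool is still discoverable, but it only wins if the narrow one is filtered out.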
What to put in the graph
You don’t need a giant ontology to get value. Start with:
- Tool
  - name
  - description
  - action type (read, write, delete, execute)
  - side effect level
- Resource
  - domain object affected (Customer, Invoice, Repo, File)
- Scope / Permission
  - required auth scopes
- Policy
  - approval needed, environment restrictions, time constraints
- Agent
  - current identity, delegated scopes, trust level
Then add a few useful edges:
TOOL -> REQUIRES_SCOPE -> SCOPE
TOOL -> ACTS_ON -> RESOURCE
AGENT -> HAS_SCOPE -> SCOPE
TOOL_A -> SAFER_THAN -> TOOL_B
TOOL -> PART_OF -> WORKFLOW
That alone improves discovery a lot.
A tiny runnable example
You don’t need Neo4j on day one. Even a simple in-memory graph beats a flat array of tool descriptions.
npm install graphology
// graphology's CommonJS export is the Graph class.
const Graph = require("graphology");
const g = new Graph();

// Nodes: one agent, two tools, one scope.
g.addNode("agent", { type: "agent", scopes: ["read:customer"] });
g.addNode("get_customer_profile", { type: "tool", action: "read" });
g.addNode("fetch_full_account_record", { type: "tool", action: "read" });
g.addNode("read:customer", { type: "scope" });

// Edges point from each tool to the scope it requires, and from the agent to the scope it holds.
g.addEdge("get_customer_profile", "read:customer", { type: "requires_scope" });
g.addEdge("fetch_full_account_record", "read:customer", { type: "requires_scope" });
g.addEdge("agent", "read:customer", { type: "has_scope" });
// Record that the profile tool is the narrower option.
g.addEdge("get_customer_profile", "fetch_full_account_record", { type: "safer_than" });

// Walk agent -> scopes it points at -> nodes pointing at those scopes, keeping only tools.
const candidates = g.outNeighbors("agent")
.flatMap(scope => g.inNeighbors(scope))
.filter(node => g.getNodeAttribute(node, "type") === "tool");
console.log("Discovered tools:", candidates);
It’s simple, but it shows the idea: discover tools by relationship, not string matching.
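The example above discovers both tools but never consults the safer_than edge. A possible next step is sketched here in plain JavaScript so it runs without graphology; the `preferSafest` function and the pair-based edge list are hypothetical, not part of any library.

```javascript
// safer_than edges as [saferTool, riskierTool] pairs (names from the example above).
const saferThan = [["get_customer_profile", "fetch_full_account_record"]];

// Keep only candidates that no other candidate is declared safer than.
function preferSafest(candidates) {
  const inPlay = new Set(candidates);
  const dominated = new Set(
    saferThan
      .filter(([safer, riskier]) => inPlay.has(safer) && inPlay.has(riskier))
      .map(([, riskier]) => riskier)
  );
  return candidates.filter(tool => !dominated.has(tool));
}

const discovered = ["get_customer_profile", "fetch_full_account_record"];
console.log(preferSafest(discovered));
```

A riskier tool is only dropped when its safer alternative is itself a candidate, so the agent is never left with nothing because of an edge to a tool it cannot use.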
Why this works better for MCP
MCP already gives you a way to expose tools. The missing layer is decision support.
A PeKG-style discovery layer helps with:
- least privilege: prefer narrower tools over broad ones
- better ranking: use graph distance, policy, and workflow context
- safer automation: avoid tools requiring approvals the agent doesn’t have
- explainability: “I chose this tool because it matches the resource and your delegated scope”
- cross-server discovery: unify tools from multiple MCP servers without flattening everything into one giant menu
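For the cross-server case, one rough approach is to namespace tools by server and index them by resource. This is only a sketch: the server ids (`crm`, `billing`), the tool records, and the `indexByResource` helper are invented for illustration.

```javascript
// Hypothetical tool listings from two MCP servers; the server ids are made up.
const servers = {
  crm: [{ name: "get_customer", resource: "Customer" }],
  billing: [{ name: "get_customer", resource: "Invoice" }],
};

// Prefix each tool with its server id so same-named tools stay distinct nodes,
// and index them by resource so discovery starts from the resource, not the name.
function indexByResource(serverTools) {
  const index = new Map();
  for (const [serverId, tools] of Object.entries(serverTools)) {
    for (const tool of tools) {
      const nodeId = `${serverId}/${tool.name}`;
      const existing = index.get(tool.resource) ?? [];
      index.set(tool.resource, [...existing, nodeId]);
    }
  }
  return index;
}

const byResource = indexByResource(servers);
console.log(byResource.get("Customer"));
console.log(byResource.get("Invoice"));
```

Two tools that share a name no longer compete on name alone; the resource each one acts on separates them.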
If your policy needs are getting serious, a policy engine such as OPA (Open Policy Agent) is often the right answer for enforcement. The graph helps with discovery and reasoning; policy engines make the hard allow/deny decisions.
Use both if you need both.
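Here is a minimal sketch of that split. `policyAllows` is a hypothetical stand-in for a call to a real policy engine, and its rule (non-read actions in prod need approval) is invented for the example; `choose` and the tool records are assumptions too.

```javascript
// Stand-in for a real policy engine call. The rule is an assumption:
// anything other than a read in "prod" needs an explicit approval.
function policyAllows({ tool, environment, approved }) {
  if (environment === "prod" && tool.action !== "read") return approved;
  return true;
}

// `suggestions` is the ranked output of discovery, safest first.
// Returns the first tool the policy permits, plus a note for each one it skipped.
function choose(suggestions, context) {
  const skipped = [];
  for (const tool of suggestions) {
    if (policyAllows({ tool, ...context })) {
      return { chosen: tool.name, skipped };
    }
    skipped.push(`${tool.name}: denied by policy`);
  }
  return { chosen: null, skipped };
}

const suggestions = [
  { name: "archive_file", action: "write" },
  { name: "delete_file", action: "delete" },
];

const stagingDecision = choose(suggestions, { environment: "staging", approved: false });
const prodDecision = choose(suggestions, { environment: "prod", approved: false });
console.log(stagingDecision);
console.log(prodDecision);
```

The graph never grants access here. It only orders the options, and the policy check can reject every one of them.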
Practical implementation advice
If you’re building this for real, keep it boring:
- Normalize tool metadata
  - action type
  - resource type
  - required scopes
  - side effect level
- Build the graph from MCP manifests
  - don’t hand-maintain it forever
- Rank by safety first
  - read over write
  - narrow over broad
  - no-approval over approval-required when appropriate
- Separate discovery from authorization
  - graph suggests
  - policy decides
- Log why a tool was chosen
  - this matters when agents surprise you
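Several of these steps fit in a few lines. The sketch below assumes a normalized metadata record per tool; the field names (`action`, `resource`, `scopes`, `sideEffects`), the `update_customer_profile` tool, and the ordering of action types (including the choice to rank execute above delete) are assumptions, not anything MCP defines.

```javascript
// Hypothetical normalized metadata, one record per tool. In practice this
// would be derived from whatever each MCP server reports about its tools.
const manifest = [
  { name: "update_customer_profile", action: "write", resource: "CustomerProfile",
    scopes: ["write:customer"], sideEffects: "high" },
  { name: "get_customer_profile", action: "read", resource: "CustomerProfile",
    scopes: ["read:customer"], sideEffects: "none" },
];

// Turn each record into typed edges so the graph is generated, not hand-edited.
function toEdges(records) {
  return records.flatMap(t => [
    { from: t.name, type: "acts_on", to: t.resource },
    ...t.scopes.map(s => ({ from: t.name, type: "requires_scope", to: s })),
  ]);
}

// Assumed safety-first ordering of action types (lower is preferred).
const ACTION_RANK = { read: 0, write: 1, delete: 2, execute: 3 };

// Pick the safest tool for a resource and say why it was picked.
function pickAndExplain(records, resource) {
  const [best] = records
    .filter(t => t.resource === resource)
    .sort((a, b) => ACTION_RANK[a.action] - ACTION_RANK[b.action]);
  if (!best) return null;
  return {
    tool: best.name,
    reason: `${best.action} on ${resource}, scopes: ${best.scopes.join(", ")}`,
  };
}

const graphEdges = toEdges(manifest);
const choice = pickAndExplain(manifest, "CustomerProfile");
console.log(graphEdges.length, "edges generated");
console.log(JSON.stringify(choice)); // one structured log line per decision
```

Keeping the reason next to the choice means the log can answer "why this tool?" without replaying the whole discovery step.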
Try it yourself
If you want to test how your MCP server exposes tools and whether the metadata is actually helping discovery:
- Want to check your MCP server? Try https://tools.authora.dev
- Run npx @authora/agent-audit to scan your codebase
- Add a verified badge to your agent: https://passport.authora.dev
- Check out https://github.com/authora-dev/awesome-agent-security for more resources
MCP discovery is going to matter a lot more as tool ecosystems get bigger. Flat lists are fine for demos. In production, agents need relationships, constraints, and context.
That’s why knowledge graph patterns are worth borrowing here.
How are you handling MCP tool discovery today: embeddings, heuristics, hand-written routing, or something graph-shaped? Drop your approach below.
-- Authora team
This post was created with AI assistance.