Building MCP Discovery with Knowledge Graphs: A PeKG Pattern Guide

title: "Why MCP discovery is broken (and how knowledge graphs fix it)"

tags: [ai, security, tutorial, opensource]

Last week, I watched an agent do something very human: it picked the first tool that looked right.

It needed to “get customer data.” The MCP server had three similarly named tools:

get_customer
customer_lookup
fetch_account_profile

The agent guessed wrong, hit the broader tool, and pulled way more data than the task actually needed.

Nothing malicious. Just bad discovery.

That’s the part people skip when they talk about MCP: not how to expose tools, but how agents decide which tool to use safely, consistently, and with enough context to avoid overreach.

A flat list of tools doesn’t scale. Once your MCP ecosystem grows past a handful of functions, discovery becomes a search problem. And search without structure gets messy fast.

One pattern that’s worked well is treating MCP discovery like a knowledge graph problem: model tools, permissions, resources, and relationships as nodes and edges, then let agents query the graph instead of guessing from names.

The problem with “just list the tools”

Most MCP discovery today looks roughly like this:

Ask server for available tools
Read names/descriptions
Pick the closest match
Hope auth, scope, and side effects line up

That works for 5 tools. It gets risky at 50.

The failure modes are predictable:

tools with overlapping names
stale descriptions
hidden side effects
missing auth context
no way to rank “least privilege” options
no memory of which tools are related to which resources

If an agent sees:

delete_file
archive_file
update_file_metadata

…it needs more than descriptions. It needs to know:

which resource types each tool touches
whether the action is read/write/delete
what scopes or approvals are required
whether there’s a safer alternative
whether this tool is commonly used in the current workflow

That’s graph-shaped data.
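To make that concrete, here is a minimal sketch of the same three file tools with the extra facts attached. The field names (`resource`, `action`, `requiresScope`, `needsApproval`), the scope strings, and the risk ordering are all assumptions made up for illustration; none of them come from MCP itself.

```javascript
// Hypothetical metadata for the three tools above; field names are illustrative.
const tools = [
  { name: "delete_file", resource: "File", action: "delete", requiresScope: "files:delete", needsApproval: true },
  { name: "archive_file", resource: "File", action: "write", requiresScope: "files:write", needsApproval: false },
  { name: "update_file_metadata", resource: "File", action: "write", requiresScope: "files:write", needsApproval: false },
];

// Lower number = less destructive. The ordering itself is an assumption.
const ACTION_RISK = { read: 0, write: 1, execute: 2, delete: 3 };

// Return the tools that touch a resource type, least destructive first.
function toolsFor(resource) {
  return tools
    .filter(t => t.resource === resource)
    .sort((a, b) => ACTION_RISK[a.action] - ACTION_RISK[b.action]);
}

const ranked = toolsFor("File").map(t => t.name);
console.log(ranked); // delete_file ends up last
```

Even this flat version already answers "is there a safer alternative?" in a way that names and descriptions alone cannot.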

The PeKG pattern

A useful pattern here is PeKG: a Permission-enriched Knowledge Graph.

At a high level:

Nodes = tools, resources, scopes, policies, users/agents, workflows
Edges = can_access, modifies, requires_scope, belongs_to, safer_than, commonly_follows

Instead of asking:

“What tools exist?”

the agent asks:

“What tool can read customer_profile, with my current scope, in this environment, with the lowest risk?”

That’s a much better question.

Here’s the mental model:

[Agent]
   | has_scope
   v
[read:customer]
   ^
   | requires_scope
[get_customer_profile] ----read----> [CustomerProfile]
   |
   | safer_than
   v
[fetch_full_account_record] ----read----> [AccountRecord]

With this structure, discovery becomes ranking and filtering, not blind selection.
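As a rough sketch of that filtering step, the diagram can be written as plain `[from, type, to]` triples and queried directly. The helper names (`out`, `toolsThatCanRead`) are hypothetical, and a real graph store would replace the array scan.

```javascript
// Edges from the diagram above as [from, type, to] triples.
const edges = [
  ["agent", "has_scope", "read:customer"],
  ["get_customer_profile", "requires_scope", "read:customer"],
  ["get_customer_profile", "read", "CustomerProfile"],
  ["fetch_full_account_record", "requires_scope", "read:customer"],
  ["fetch_full_account_record", "read", "AccountRecord"],
  ["get_customer_profile", "safer_than", "fetch_full_account_record"],
];

// All targets reachable from `from` over edges of one type.
const out = (from, type) =>
  edges.filter(([f, t]) => f === from && t === type).map(([, , to]) => to);

// Tools that read `resource` and whose required scopes the agent already holds.
function toolsThatCanRead(agent, resource) {
  const held = new Set(out(agent, "has_scope"));
  const readers = edges
    .filter(([, type, to]) => type === "read" && to === resource)
    .map(([from]) => from);
  return readers.filter(tool => out(tool, "requires_scope").every(s => held.has(s)));
}

console.log(toolsThatCanRead("agent", "CustomerProfile")); // [ 'get_customer_profile' ]
console.log(toolsThatCanRead("stranger", "CustomerProfile")); // []
```

The agent never sees `fetch_full_account_record` for this request, because that tool does not act on the resource it asked about.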

What to put in the graph

You don’t need a giant ontology to get value. Start with:

Tool
- name
- description
- action type (read, write, delete, execute)
- side effect level
Resource
- domain object affected (Customer, Invoice, Repo, File)
Scope / Permission
- required auth scopes
Policy
- approval needed, environment restrictions, time constraints
Agent
- current identity, delegated scopes, trust level

Then add a few useful edges:

TOOL -> REQUIRES_SCOPE -> SCOPE
TOOL -> ACTS_ON -> RESOURCE
AGENT -> HAS_SCOPE -> SCOPE
TOOL_A -> SAFER_THAN -> TOOL_B
TOOL -> PART_OF -> WORKFLOW

That alone improves discovery a lot.
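A small schema like this is also easy to check mechanically. The sketch below assumes node ids are mapped to one of the node types listed above and rejects edges that connect the wrong kinds of node; the node ids and the `validateEdge` helper are hypothetical.

```javascript
// Node ids mapped to their node type. The ids are illustrative.
const nodeTypes = new Map([
  ["agent", "Agent"],
  ["get_customer_profile", "Tool"],
  ["fetch_full_account_record", "Tool"],
  ["read:customer", "Scope"],
  ["CustomerProfile", "Resource"],
  ["customer_support", "Workflow"],
]);

// Allowed [sourceType, targetType] per edge type, mirroring the edge list above.
const SCHEMA = new Map([
  ["REQUIRES_SCOPE", ["Tool", "Scope"]],
  ["ACTS_ON", ["Tool", "Resource"]],
  ["HAS_SCOPE", ["Agent", "Scope"]],
  ["SAFER_THAN", ["Tool", "Tool"]],
  ["PART_OF", ["Tool", "Workflow"]],
]);

// Return a list of problems; an empty list means the edge fits the schema.
function validateEdge([from, type, to]) {
  const rule = SCHEMA.get(type);
  if (!rule) return [`unknown edge type: ${type}`];
  const [fromType, toType] = rule;
  const problems = [];
  if (nodeTypes.get(from) !== fromType) problems.push(`${from} is not a ${fromType}`);
  if (nodeTypes.get(to) !== toType) problems.push(`${to} is not a ${toType}`);
  return problems;
}

const good = validateEdge(["get_customer_profile", "ACTS_ON", "CustomerProfile"]);
const bad = validateEdge(["agent", "REQUIRES_SCOPE", "CustomerProfile"]);
console.log(good); // []
console.log(bad);  // two problems: wrong source type and wrong target type
```

Running a check like this when the graph is built catches mislabeled tools before an agent ever queries them.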

A tiny runnable example

You don’t need Neo4j on day one. Even a simple in-memory graph beats a flat array of tool descriptions.

npm install graphology

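const Graph = require("graphology");

// The default graphology Graph is mixed, so addEdge creates directed edges.
const g = new Graph();

// Nodes: one agent, two tools, one scope.
g.addNode("agent", { type: "agent", scopes: ["read:customer"] });
g.addNode("get_customer_profile", { type: "tool", action: "read" });
g.addNode("fetch_full_account_record", { type: "tool", action: "read" });
g.addNode("read:customer", { type: "scope" });

// Edges point from whatever needs or holds a scope to the scope itself.
g.addEdge("get_customer_profile", "read:customer", { type: "requires_scope" });
g.addEdge("fetch_full_account_record", "read:customer", { type: "requires_scope" });
g.addEdge("agent", "read:customer", { type: "has_scope" });
g.addEdge("get_customer_profile", "fetch_full_account_record", { type: "safer_than" });

// Walk agent -> scopes it holds -> nodes pointing at those scopes, then keep tools.
// The Set removes duplicates once several held scopes lead to the same tool.
// Note: this matches tools that need ANY held scope, not tools whose scopes are ALL held.
const candidates = [...new Set(
  g.outNeighbors("agent").flatMap(scope => g.inNeighbors(scope))
)].filter(node => g.getNodeAttribute(node, "type") === "tool");

console.log("Discovered tools:", candidates);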

It’s simple, but it shows the idea: discover tools by relationship, not string matching.
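The example above adds a `safer_than` edge but never reads it. One possible next step, sketched here without dependencies so it runs on its own, is to drop any candidate that has a safer candidate available. The `preferSafest` helper is hypothetical, and it only looks at direct `safer_than` edges, which assumes both tools can actually satisfy the task.

```javascript
// Candidates as the discovery step above would produce them.
const candidates = ["fetch_full_account_record", "get_customer_profile"];

// safer_than edges as [saferTool, riskierTool] pairs.
const saferThan = [["get_customer_profile", "fetch_full_account_record"]];

// Drop every candidate for which a safer candidate is also available.
function preferSafest(candidates, saferThan) {
  const available = new Set(candidates);
  const dominated = new Set(
    saferThan
      .filter(([safer, riskier]) => available.has(safer) && available.has(riskier))
      .map(([, riskier]) => riskier)
  );
  return candidates.filter(tool => !dominated.has(tool));
}

console.log(preferSafest(candidates, saferThan)); // [ 'get_customer_profile' ]
```

If the safer tool is not among the candidates (for example because the agent lacks its scope), the broader tool stays in the list rather than being silently removed.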

Why this works better for MCP

MCP already gives you a way to expose tools. The missing layer is decision support.

A PeKG-style discovery layer helps with:

least privilege: prefer narrower tools over broad ones
better ranking: use graph distance, policy, and workflow context
safer automation: avoid tools requiring approvals the agent doesn’t have
explainability: “I chose this tool because it matches the resource and your delegated scope”
cross-server discovery: unify tools from multiple MCP servers without flattening everything into one giant menu
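For the cross-server point, a minimal sketch might namespace each tool by its server and index by resource instead of merging names into one list. The server names, tools, and the `server/tool` id format below are invented for illustration.

```javascript
// Hypothetical tool lists from two MCP servers; names and tools are made up.
const servers = {
  crm: [{ name: "get_customer", resource: "Customer", action: "read" }],
  billing: [
    { name: "get_customer", resource: "Customer", action: "read" },
    { name: "list_invoices", resource: "Invoice", action: "read" },
  ],
};

// Namespace each tool by its server so identical names do not collide,
// then index by resource so discovery can ask "which tools touch Customer?".
function indexByResource(servers) {
  const index = new Map();
  for (const [server, tools] of Object.entries(servers)) {
    for (const tool of tools) {
      const id = `${server}/${tool.name}`;
      if (!index.has(tool.resource)) index.set(tool.resource, []);
      index.get(tool.resource).push({ id, server, action: tool.action });
    }
  }
  return index;
}

const byResource = indexByResource(servers);
console.log(byResource.get("Customer").map(t => t.id));
// [ 'crm/get_customer', 'billing/get_customer' ]
```

Both `get_customer` tools survive as distinct nodes, so ranking can still tell them apart by server, scope, or policy.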

And if your policy needs are getting serious, a policy engine like OPA (Open Policy Agent) is often the right answer for enforcement. The graph helps with discovery and reasoning; the policy engine makes the hard allow/deny decisions.

Use both if you need both.
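One way to keep the two roles apart is sketched below: discovery hands over an ordered list of suggestions, and a separate function has the final say. `policyAllows` is a local stand-in for a call to a policy engine, and its rule is invented for this example.

```javascript
// Discovery output: suggestions in whatever order discovery produced them.
const suggestions = [
  { tool: "fetch_full_account_record", action: "read", environment: "prod" },
  { tool: "get_customer_profile", action: "read", environment: "prod" },
];

// Stand-in for a policy engine call. The rule below is invented for this sketch.
function policyAllows(input) {
  const deniedInProd = new Set(["fetch_full_account_record"]);
  return !(input.environment === "prod" && deniedInProd.has(input.tool));
}

// Take the first suggestion the policy accepts; every suggestion is checked before use.
function pickTool(suggestions, allows) {
  return suggestions.find(s => allows(s)) ?? null;
}

const chosen = pickTool(suggestions, policyAllows);
console.log(chosen && chosen.tool); // get_customer_profile
```

Because `pickTool` takes the policy check as an argument, swapping the stub for a real engine does not touch the discovery code.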

Practical implementation advice

If you’re building this for real, keep it boring:

Normalize tool metadata
- action type
- resource type
- required scopes
- side effect level
Build the graph from MCP manifests
- don’t hand-maintain it forever
Rank by safety first
- read over write
- narrow over broad
- no-approval over approval-required when appropriate
Separate discovery from authorization
- graph suggests
- policy decides
Log why a tool was chosen
- this matters when agents surprise you
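The first three items and the last one can be sketched together. This assumes the server fills in the optional tool `annotations` from the MCP specification (`readOnlyHint`, `destructiveHint`), which are advisory hints only. The `extra` side table, the mapping from hints to action types, and the risk ordering are assumptions for illustration.

```javascript
// Tool definitions shaped like an MCP tools/list result.
const listed = [
  { name: "get_customer_profile", description: "Read one profile", annotations: { readOnlyHint: true } },
  { name: "delete_customer", description: "Remove a customer", annotations: { destructiveHint: true } },
  { name: "update_customer_email", description: "Change email", annotations: { destructiveHint: false } },
];

// Hypothetical side table for facts the tool definition does not carry.
const extra = {
  get_customer_profile: { resource: "Customer", scopes: ["read:customer"] },
  delete_customer: { resource: "Customer", scopes: ["delete:customer"] },
  update_customer_email: { resource: "Customer", scopes: ["write:customer"] },
};

// Map hints to one action type. Missing hints get the most cautious reading,
// so an unannotated tool is treated as delete-level risk (an assumption).
function normalize(tool) {
  const hints = tool.annotations ?? {};
  const action = hints.readOnlyHint === true ? "read"
    : hints.destructiveHint === false ? "write"
    : "delete";
  return { name: tool.name, action, ...(extra[tool.name] ?? { resource: null, scopes: [] }) };
}

const RISK = { read: 0, write: 1, delete: 2 };

// Rank safest first and record why the top choice won.
const ranked = listed.map(normalize).sort((a, b) => RISK[a.action] - RISK[b.action]);
const decision = {
  tool: ranked[0].name,
  reason: `lowest-risk action (${ranked[0].action}) on ${ranked[0].resource}`,
};
console.log(JSON.stringify(decision));
```

The `decision` object is what would go to the log: a tool name plus a reason a human can read later.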

Try it yourself

If you want to test how your MCP server exposes tools and whether the metadata is actually helping discovery:

Check your MCP server at https://tools.authora.dev
Run npx @authora/agent-audit to scan your codebase
Add a verified badge to your agent: https://passport.authora.dev
Check out https://github.com/authora-dev/awesome-agent-security for more resources

MCP discovery is going to matter a lot more as tool ecosystems get bigger. Flat lists are fine for demos. In production, agents need relationships, constraints, and context.

That’s why knowledge graph patterns are worth borrowing here.

How are you handling MCP tool discovery today: embeddings, heuristics, hand-written routing, or something graph-shaped? Drop your approach below.

-- Authora team

This post was created with AI assistance.
