Building MCP Discovery with Knowledge Graphs: A PeKG Pattern Guide

Last week, I watched an agent do something very human: it picked the first tool that looked right.

It needed to “get customer data.” The MCP server had three similarly named tools:

  • get_customer
  • customer_lookup
  • fetch_account_profile

The agent guessed wrong, hit the broader tool, and pulled way more data than the task actually needed.

Nothing malicious. Just bad discovery.

That’s the part people skip when they talk about MCP: not how to expose tools, but how agents decide which tool to use safely, consistently, and with enough context to avoid overreach.

A flat list of tools doesn’t scale. Once your MCP ecosystem grows past a handful of functions, discovery becomes a search problem. And search without structure gets messy fast.

One pattern that’s worked well is treating MCP discovery like a knowledge graph problem: model tools, permissions, resources, and relationships as nodes and edges, then let agents query the graph instead of guessing from names.

The problem with “just list the tools”

Most MCP discovery today looks roughly like this:

  1. Ask server for available tools
  2. Read names/descriptions
  3. Pick the closest match
  4. Hope auth, scope, and side effects line up

That works for 5 tools. It gets risky at 50.

The failure modes are predictable:

  • tools with overlapping names
  • stale descriptions
  • hidden side effects
  • missing auth context
  • no way to rank “least privilege” options
  • no memory of which tools are related to which resources

If an agent sees:

  • delete_file
  • archive_file
  • update_file_metadata

…it needs more than descriptions. It needs to know:

  • which resource types each tool touches
  • whether the action is read/write/delete
  • what scopes or approvals are required
  • whether there’s a safer alternative
  • whether this tool is commonly used in the current workflow

That’s graph-shaped data.
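One way to see why: write those facts down as structured fields and they fall straight into (source, relationship, target) triples. The sketch below is illustrative only. The field names (`action`, `resource`, `scopes`, `saferAlternative`) and the scope strings are assumptions made for this example, not something MCP defines.

```javascript
// Hypothetical metadata for the three file tools above.
// Field names and scope strings are assumptions for illustration.
const tools = [
  { name: "delete_file", action: "delete", resource: "File", scopes: ["delete:file"], saferAlternative: "archive_file" },
  { name: "archive_file", action: "write", resource: "File", scopes: ["write:file"], saferAlternative: null },
  { name: "update_file_metadata", action: "write", resource: "File", scopes: ["write:file"], saferAlternative: null },
];

// Flatten one record into [source, edgeType, target] triples.
function toEdges(tool) {
  const edges = [[tool.name, "acts_on", tool.resource]];
  for (const scope of tool.scopes) {
    edges.push([tool.name, "requires_scope", scope]);
  }
  if (tool.saferAlternative) {
    // The alternative is the safer tool, so the edge starts from it.
    edges.push([tool.saferAlternative, "safer_than", tool.name]);
  }
  return edges;
}

const edges = tools.flatMap(toEdges);
console.log(edges);
```

Every answer the agent needs ("what does this touch?", "is there something safer?") becomes a lookup over those triples instead of a guess from a description.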

The PeKG pattern

A useful pattern here is PeKG: a Permission-enriched Knowledge Graph.

At a high level:

  • Nodes = tools, resources, scopes, policies, users/agents, workflows
  • Edges = can_access, modifies, requires_scope, belongs_to, safer_than, commonly_follows

Instead of asking:

“What tools exist?”

the agent asks:

“What tool can read customer_profile, with my current scope, in this environment, with the lowest risk?”

That’s a much better question.

Here’s the mental model:

[Agent]
   | has_scope
   v
[read:customer]
   ^
   | requires_scope
[get_customer_profile] ----read----> [CustomerProfile]
   |
   | safer_than
   v
[fetch_full_account_record] ----read----> [AccountRecord]

With this structure, discovery becomes ranking and filtering, not blind selection.
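As a rough sketch of what "ranking and filtering" could look like, here is the question from earlier answered over a plain list of typed edges. It is a minimal illustration, not a reference implementation: the `findTools` helper is hypothetical, and the edge saying `fetch_full_account_record` also reads `CustomerProfile` is an assumption added so that both tools compete for the same request.

```javascript
// Graph as typed edges: [source, type, target].
const edges = [
  ["agent", "has_scope", "read:customer"],
  ["get_customer_profile", "requires_scope", "read:customer"],
  ["get_customer_profile", "read", "CustomerProfile"],
  ["fetch_full_account_record", "requires_scope", "read:customer"],
  ["fetch_full_account_record", "read", "AccountRecord"],
  // Assumption: the full account record also exposes the profile.
  ["fetch_full_account_record", "read", "CustomerProfile"],
  ["get_customer_profile", "safer_than", "fetch_full_account_record"],
];

// Targets of edges of a given type leaving a node.
const out = (node, type) =>
  edges.filter(([s, t]) => s === node && t === type).map(([, , target]) => target);

// Hypothetical helper: tools that can perform `action` on `resource`
// using only scopes the agent holds, safest first.
function findTools(agent, action, resource) {
  const held = new Set(out(agent, "has_scope"));
  const tools = [
    ...new Set(edges.filter(([, t]) => t === "requires_scope").map(([s]) => s)),
  ];
  return tools
    .filter((tool) => out(tool, action).includes(resource))
    .filter((tool) => out(tool, "requires_scope").every((s) => held.has(s)))
    // Tools marked safer than more alternatives sort first.
    .sort((a, b) => out(b, "safer_than").length - out(a, "safer_than").length);
}

const ranked = findTools("agent", "read", "CustomerProfile");
console.log(ranked);
```

Counting `safer_than` edges is a deliberately crude risk signal. A real ranking would likely combine it with side effect level and policy context.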

What to put in the graph

You don’t need a giant ontology to get value. Start with:

  • Tool
    • name
    • description
    • action type (read, write, delete, execute)
    • side effect level
  • Resource
    • domain object affected (Customer, Invoice, Repo, File)
  • Scope / Permission
    • required auth scopes
  • Policy
    • approval needed, environment restrictions, time constraints
  • Agent
    • current identity, delegated scopes, trust level

Then add a few useful edges:

  • TOOL -> REQUIRES_SCOPE -> SCOPE
  • TOOL -> ACTS_ON -> RESOURCE
  • AGENT -> HAS_SCOPE -> SCOPE
  • TOOL_A -> SAFER_THAN -> TOOL_B
  • TOOL -> PART_OF -> WORKFLOW

That alone improves discovery a lot.
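To keep a small graph like this honest, it can help to check each edge against the node types it is supposed to connect. The following is a sketch under assumptions: the `ToolGraph` class is hypothetical and dependency-free, and `get_invoice` and `read:invoice` are made-up names used only to exercise it.

```javascript
// Allowed edge types and the node types they connect (from the list above).
const EDGE_SCHEMA = {
  REQUIRES_SCOPE: ["tool", "scope"],
  ACTS_ON: ["tool", "resource"],
  HAS_SCOPE: ["agent", "scope"],
  SAFER_THAN: ["tool", "tool"],
  PART_OF: ["tool", "workflow"],
};

// Hypothetical minimal graph that rejects edges violating the schema.
class ToolGraph {
  constructor() {
    this.nodes = new Map(); // id -> { type, ...attributes }
    this.edges = []; // { from, type, to }
  }

  addNode(id, attrs) {
    this.nodes.set(id, attrs);
    return this;
  }

  addEdge(from, type, to) {
    const rule = EDGE_SCHEMA[type];
    if (!rule) throw new Error(`Unknown edge type: ${type}`);
    const [fromType, toType] = rule;
    const fromOk = this.nodes.get(from)?.type === fromType;
    const toOk = this.nodes.get(to)?.type === toType;
    if (!fromOk || !toOk) {
      throw new Error(`${type} must connect ${fromType} -> ${toType}`);
    }
    this.edges.push({ from, type, to });
    return this;
  }
}

const graph = new ToolGraph()
  .addNode("get_invoice", { type: "tool", action: "read", sideEffect: "none" })
  .addNode("Invoice", { type: "resource" })
  .addNode("read:invoice", { type: "scope" })
  .addEdge("get_invoice", "ACTS_ON", "Invoice")
  .addEdge("get_invoice", "REQUIRES_SCOPE", "read:invoice");

console.log(graph.edges.length);
```

Rejecting malformed edges at insert time means a mislabeled manifest fails loudly instead of quietly skewing discovery.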

A tiny runnable example

You don’t need Neo4j on day one. Even a simple in-memory graph beats a flat array of tool descriptions.

npm install graphology

const Graph = require("graphology");

// On the default (mixed) graph, addEdge creates directed edges.
const g = new Graph();

// Nodes: the agent, two candidate tools, and the scope they share.
g.addNode("agent", { type: "agent", scopes: ["read:customer"] });
g.addNode("get_customer_profile", { type: "tool", action: "read" });
g.addNode("fetch_full_account_record", { type: "tool", action: "read" });
g.addNode("read:customer", { type: "scope" });

// Edges: what each tool requires, what the agent holds,
// and which tool is the narrower (safer) option.
g.addEdge("get_customer_profile", "read:customer", { type: "requires_scope" });
g.addEdge("fetch_full_account_record", "read:customer", { type: "requires_scope" });
g.addEdge("agent", "read:customer", { type: "has_scope" });
g.addEdge("get_customer_profile", "fetch_full_account_record", { type: "safer_than" });

// Walk agent -> scope (out-edges), then scope -> sources of in-edges.
// The agent itself points at the scope, so keep only tool nodes.
const candidates = g.outNeighbors("agent")
  .flatMap(scope => g.inNeighbors(scope))
  .filter(node => g.getNodeAttribute(node, "type") === "tool");

console.log("Discovered tools:", candidates);

It’s simple, but it shows the idea: discover tools by relationship, not string matching.
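One caveat about that traversal: it returns any tool that shares at least one scope with the agent, so a tool needing two scopes would still show up when the agent holds only one of them. Here is a stricter variant, sketched with plain Maps and Sets rather than graphology so it runs with no install. The `export_customers` tool and the `export:customer` scope are hypothetical, added only to show the difference.

```javascript
// Scopes each tool requires. export_customers is a hypothetical
// tool that needs a second scope the agent does not hold.
const requiredScopes = new Map([
  ["get_customer_profile", ["read:customer"]],
  ["fetch_full_account_record", ["read:customer"]],
  ["export_customers", ["read:customer", "export:customer"]],
]);

const agentScopes = new Set(["read:customer"]);

// Loose match: the tool shares at least one scope with the agent.
const sharesAnyScope = [...requiredScopes]
  .filter(([, scopes]) => scopes.some((s) => agentScopes.has(s)))
  .map(([tool]) => tool);

// Strict match: the agent holds every scope the tool requires.
const usable = [...requiredScopes]
  .filter(([, scopes]) => scopes.every((s) => agentScopes.has(s)))
  .map(([tool]) => tool);

console.log("Shares a scope:", sharesAnyScope); // includes export_customers
console.log("Actually usable:", usable); // excludes export_customers
```

The strict check is the one you want feeding a ranking step, since a tool the agent cannot call is not a real candidate.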

Why this works better for MCP

MCP already gives you a way to expose tools. The missing layer is decision support.

A PeKG-style discovery layer helps with:

  • least privilege: prefer narrower tools over broad ones
  • better ranking: use graph distance, policy, and workflow context
  • safer automation: avoid tools requiring approvals the agent doesn’t have
  • explainability: “I chose this tool because it matches the resource and your delegated scope”
  • cross-server discovery: unify tools from multiple MCP servers without flattening everything into one giant menu
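For the cross-server point, a minimal sketch of the idea is to index tools by the resource they act on and keep the server in the tool id, so two servers can both expose `get_customer` without colliding. The server ids (`crm`, `billing`), the tool listings, and the field names here are all assumptions for illustration.

```javascript
// Hypothetical tool listings from two MCP servers.
const servers = {
  crm: [{ name: "get_customer", action: "read", resource: "Customer" }],
  billing: [
    { name: "get_customer", action: "read", resource: "BillingAccount" },
    { name: "refund_invoice", action: "write", resource: "Invoice" },
  ],
};

// Index tools by resource. The id keeps the server prefix so
// same-named tools from different servers stay distinct.
const byResource = new Map();
for (const [server, tools] of Object.entries(servers)) {
  for (const tool of tools) {
    const id = `${server}/${tool.name}`;
    const list = byResource.get(tool.resource) ?? [];
    list.push({ id, server, ...tool });
    byResource.set(tool.resource, list);
  }
}

const customerTools = byResource.get("Customer").map((t) => t.id);
console.log(customerTools);
```

An agent asking for something that reads `Customer` now gets one precise answer, even though a name-based search would have matched both servers.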

To be plain about the limits: if your policy needs are getting serious, a policy engine like OPA (Open Policy Agent) is often the right answer for enforcement. The graph helps with discovery and reasoning; the policy engine makes the final allow/deny decision.

Use both if you need both.

Practical implementation advice

If you’re building this for real, keep it boring:

  1. Normalize tool metadata
    • action type
    • resource type
    • required scopes
    • side effect level
  2. Build the graph from MCP manifests
    • don’t hand-maintain it forever
  3. Rank by safety first
    • read over write
    • narrow over broad
    • no-approval over approval-required when appropriate
  4. Separate discovery from authorization
    • graph suggests
    • policy decides
  5. Log why a tool was chosen
    • this matters when agents surprise you
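Steps 3 to 5 can be sketched in a few lines. Treat this as an illustration under assumptions rather than a recommended design: the risk ordering in `ACTION_RISK`, the `needsApproval` field, and the `policyAllows` function are all hypothetical, and `policyAllows` is only a stand-in for a call to a real policy engine.

```javascript
// Lower number = safer. This ordering is an assumption.
const ACTION_RISK = { read: 0, write: 1, execute: 2, delete: 3 };

const candidates = [
  { name: "delete_file", action: "delete", needsApproval: true },
  { name: "archive_file", action: "write", needsApproval: false },
  { name: "update_file_metadata", action: "write", needsApproval: true },
];

// Discovery: sort by action risk, then prefer tools needing no approval.
function rank(tools) {
  return [...tools].sort(
    (a, b) =>
      ACTION_RISK[a.action] - ACTION_RISK[b.action] ||
      Number(a.needsApproval) - Number(b.needsApproval)
  );
}

// Authorization: stand-in for a real policy engine decision.
function policyAllows(tool, context) {
  return !tool.needsApproval || context.approved;
}

// Take the safest candidate the policy accepts and record why.
function choose(tools, context) {
  for (const tool of rank(tools)) {
    if (policyAllows(tool, context)) {
      const reason = `${tool.name}: action=${tool.action}, needsApproval=${tool.needsApproval}`;
      console.log("chosen", reason);
      return { tool: tool.name, reason };
    }
  }
  return { tool: null, reason: "no candidate passed policy" };
}

const decision = choose(candidates, { approved: false });
```

The graph side only orders the options. The allow/deny call stays in one function you can later swap for your policy engine, and the returned `reason` is what goes in the audit log.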

Try it yourself

If you want to test how your MCP server exposes tools and whether the metadata is actually helping discovery:

  • Check your MCP server at https://tools.authora.dev
  • Run npx @authora/agent-audit to scan your codebase
  • Add a verified badge to your agent: https://passport.authora.dev
  • Check out https://github.com/authora-dev/awesome-agent-security for more resources

MCP discovery is going to matter a lot more as tool ecosystems get bigger. Flat lists are fine for demos. In production, agents need relationships, constraints, and context.

That’s why knowledge graph patterns are worth borrowing here.

How are you handling MCP tool discovery today: embeddings, heuristics, hand-written routing, or something graph-shaped? Drop your approach below.

-- Authora team

This post was created with AI assistance.
