MCP API Reference
This document provides comprehensive API documentation for all MCP (Model Context Protocol) tools exposed by Context Engine's dual-server architecture.
On this page:
- Overview
- Memory Server API - memory_store(), memory_find()
- Indexer Server API - repo_search(), context_search(), context_answer(), info_request(), etc.
- Response Schemas
- Error Handling
Overview
Context Engine exposes two MCP servers:
- Memory Server: Knowledge base storage and retrieval (port 8000 SSE, port 8002 HTTP)
- Indexer Server: Code search, indexing, and management (port 8001 SSE, port 8003 HTTP)
Both servers support SSE and HTTP RMCP transports simultaneously.
Transports & IDE Integration
For each server, two transports are available:
SSE (Server-Sent Events)
- Memory: http://localhost:8000/sse
- Indexer: http://localhost:8001/sse
- Typically used via mcp-remote or legacy MCP clients.
HTTP (streamable MCP over HTTP)
- Memory: http://localhost:8002/mcp
- Indexer: http://localhost:8003/mcp
- Health:
  - Memory: http://localhost:18002/readyz
  - Indexer: http://localhost:18003/readyz
- Tools (for debugging): GET /tools on the health ports.
Recommendation for IDEs: Prefer the HTTP /mcp endpoints when integrating with IDE clients (Claude Code, Windsurf, etc.). HTTP uses a simple request/response pattern where initialize completes before listTools and other calls, avoiding initialization races.
When using SSE via mcp-remote, some clients may send MCP messages (for example listTools) in parallel on a fresh session before initialize has fully completed. FastMCP enforces that only initialize may be processed during initialization; if a non-initialize request arrives too early, the server can log:
Failed to validate request: Received request before initialization was complete
This manifests as tools/resources only appearing after a second reconnect. Switching the IDE to talk directly to the HTTP /mcp endpoints avoids this class of issue.
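The ordering requirement described above can be sketched as a list of JSON-RPC messages. This is a minimal illustration, not client code from Context Engine: the protocol version string, client name, and the build_handshake helper are assumptions, and tools/list is the MCP wire method that corresponds to listTools.

```python
from itertools import count

# Assumption: placeholder protocol version; use the one your client negotiates.
PROTOCOL_VERSION = "2025-03-26"


def build_handshake(client_name="example-client"):
    """Return MCP JSON-RPC messages in the order a client should send them."""
    ids = count(1)
    initialize = {
        "jsonrpc": "2.0",
        "id": next(ids),
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.0.0"},
        },
    }
    # Notifications carry no "id"; this one signals that initialization is done.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    # Only after the two messages above should other requests be sent.
    list_tools = {"jsonrpc": "2.0", "id": next(ids), "method": "tools/list"}
    return [initialize, initialized, list_tools]
```

A client that waits for the initialize response before sending the remaining messages should not trigger the "Received request before initialization was complete" log line.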
Memory Server API
memory_store()
Store information with rich metadata for later retrieval and search.
Parameters:
- information (str, required): Clear natural language description of the content to store
- metadata (dict, optional): Structured metadata with the following schema:
  - kind (str, optional): Category type - one of:
    - "snippet": Code snippet or pattern
    - "explanation": Technical explanation
    - "pattern": Design pattern or approach
    - "example": Usage example
    - "reference": Reference information
  - language (str, optional): Programming language (e.g., "python", "javascript", "go")
  - path (str, optional): File path context for code-related entries
  - tags (list[str], optional): Searchable tags for categorization
  - priority (int, optional): Importance ranking (1-10, higher = more important)
  - topic (str, optional): High-level topic classification
  - code (str, optional): Actual code content (for snippet kind)
  - author (str, optional): Author or source attribution
  - created_at (str, optional): ISO timestamp (auto-generated if omitted)
Returns:
{
"ok": true,
"id": "uuid-string",
"message": "Successfully stored information"
}
Example:
{
"information": "Efficient Python pattern for processing large files using generators to minimize memory usage",
"metadata": {
"kind": "pattern",
"language": "python",
"path": "utils/file_processor.py",
"tags": ["python", "generators", "memory-efficient", "performance"],
"priority": 8,
"topic": "performance optimization",
"code": "def process_large_file(file_path):\n with open(file_path) as f:\n for line in f:\n yield process_line(line)"
}
}
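A client may want to check arguments against the schema above before calling memory_store(). The following is a minimal sketch of such a check. The helper name build_memory_store_args is hypothetical, and whether the server performs the same validation is not stated in this reference.

```python
ALLOWED_KINDS = {"snippet", "explanation", "pattern", "example", "reference"}


def build_memory_store_args(information, **metadata):
    """Build a memory_store() argument dict, rejecting values the schema rules out."""
    if not information or not information.strip():
        raise ValueError("information must be a non-empty string")
    kind = metadata.get("kind")
    if kind is not None and kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    priority = metadata.get("priority")
    if priority is not None and not 1 <= priority <= 10:
        raise ValueError("priority must be between 1 and 10")
    args = {"information": information}
    if metadata:
        # metadata is optional, so it is only attached when something was given.
        args["metadata"] = metadata
    return args
```

Calling build_memory_store_args("Generator pattern", kind="pattern", priority=8) yields a dict with the same shape as the example request above.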
memory_find()
Search stored memories using hybrid retrieval (semantic + lexical search).
Parameters:
- query (str, required): Search query or question
- kind (str, optional): Filter by entry kind (snippet, explanation, pattern, etc.)
- language (str, optional): Filter by programming language
- topic (str, optional): Filter by topic
- tags (str or list[str], optional): Filter by tags (comma-separated string or list)
- limit (int, default 10): Maximum number of results to return
- priority_min (int, optional): Minimum priority threshold (1-10)
Returns:
{
"ok": true,
"results": [
{
"id": "uuid-string",
"information": "Full stored information text",
"metadata": {
"kind": "pattern",
"language": "python",
"path": "utils/file_processor.py",
"tags": ["python", "generators"],
"priority": 8,
"topic": "performance",
"created_at": "2024-01-15T10:30:00Z"
},
"score": 0.89,
"highlights": ["<<efficient>> Python pattern", "<<memory usage>>"]
}
],
"total": 15,
"query": "python file processing generators"
}
Example:
{
"query": "database connection pooling patterns",
"language": "python",
"kind": "pattern",
"limit": 5
}
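Because tags accepts either a comma-separated string or a list, callers that build the filter programmatically may find it convenient to normalize both forms first. This sketch shows one way to do that on the client side. The normalize_tags helper is hypothetical and is not part of the Memory Server.

```python
def normalize_tags(tags):
    """Turn a comma-separated string or a list into a clean list of tag strings."""
    if tags is None:
        return []
    if isinstance(tags, str):
        tags = tags.split(",")
    # Drop empty entries and surrounding whitespace from each tag.
    return [tag.strip() for tag in tags if tag and tag.strip()]
```

Both normalize_tags("python, generators") and normalize_tags(["python", "generators"]) produce the same list, which can then be passed as the tags argument.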
Indexer Server API
repo_search()
Perform hybrid code search combining dense semantic, lexical BM25, and optional neural reranking.
Core Parameters:
- query (str or list[str], required): Search query or list of queries for query fusion
- limit (int, default 10): Maximum total results to return
- per_path (int, default 2): Maximum results per file path
Cross-Codebase Isolation:
- repo (str or list[str], optional): Filter results to specific repository(ies)
  - Single repo: "pathful-commons-app" - Search only this repo
  - Multiple repos: ["frontend", "backend"] - Search related repos together
  - All repos: "*" - Explicitly search all indexed repos (disable auto-filter)
  - Default: Auto-detects current repo from CURRENT_REPO env when REPO_AUTO_FILTER=1
Content Filters:
- language (str, optional): Filter by programming language
- path_glob (str or list[str], optional): Glob patterns for path filtering
- under (str, optional): Limit search to specific directory path
- not_glob (str or list[str], optional): Exclude paths matching these patterns
Code Structure Filters:
- symbol (str, optional): Search for specific function, class, or variable names
- kind (str, optional): Filter by code construct type:
  - "function": Function definitions
  - "class": Class definitions
  - "variable": Variable assignments
  - "import": Import statements
  - "comment": Comments and docstrings
Search Options:
- include_snippet (bool, default true): Include code snippet in results
- context_lines (int, default 3): Number of context lines around snippet
- highlight_snippet (bool, default true): Highlight matching tokens in snippet
Reranking Options:
- rerank_enabled (bool, optional): Override default reranker setting
- rerank_top_n (int, default 50): Number of candidates to consider for reranking
- rerank_return_m (int, default 12): Number of results to return after reranking
Reranking uses a blended scoring approach that preserves symbol match boosts:
- Blend weight (RERANK_BLEND_WEIGHT, default 0.6): Ratio of neural reranker score to fusion score
- Post-rerank symbol boost (POST_RERANK_SYMBOL_BOOST, default 1.0): Applied after blending to ensure exact symbol matches rank highest even when the neural reranker disagrees
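To make the blending concrete, here is a small sketch of how the two settings could combine. It rests on two assumptions that this reference does not confirm: that the blend is a linear interpolation between the reranker and fusion scores, and that the symbol boost is added to the blended score. The function name blended_score is illustrative only.

```python
def blended_score(rerank_score, fusion_score, exact_symbol_match=False,
                  blend_weight=0.6, symbol_boost=1.0):
    """Blend reranker and fusion scores, then apply the symbol boost.

    Assumptions: linear interpolation for the blend, additive boost.
    """
    blended = blend_weight * rerank_score + (1.0 - blend_weight) * fusion_score
    if exact_symbol_match:
        # Applied after blending, so a low reranker score cannot cancel it.
        blended += symbol_boost
    return blended
```

Under these assumptions, a result with an exact symbol match and a low reranker score still outranks a non-matching result that the reranker prefers, which is the behavior the boost is described as providing.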
Response Format:
{
"ok": true,
"results": [
{
"score": 0.89,
"path": "src/search/hybrid_search.py",
"symbol": "hybrid_search",
"start_line": 45,
"end_line": 67,
"snippet": "def hybrid_search(query, limit=10):\n # ReFRAG-inspired implementation\n results = []\n return results",
"highlights": ["<<ReFRAG-inspired>> implementation"],
"components": {
"dense_score": 0.85,
"lexical_score": 0.42,
"reranker_score": 0.91,
"final_score": 0.89
},
"metadata": {
"language": "python",
"kind": "function",
"complexity": "medium",
"tokens": 156
}
}
],
"total": 15,
"used_rerank": true,
"search_time_ms": 127,
"query": "asyncio subprocess management python"
}
Examples:
Basic Search:
{
"query": "asyncio subprocess management",
"limit": 10,
"language": "python"
}
Advanced Search with Multiple Filters:
{
"query": ["database connection", "sqlalchemy pool"],
"language": "python",
"path_glob": "**/db/**/*.py",
"not_glob": ["**/test_*.py", "**/migrations/**"],
"kind": "function",
"limit": 20,
"per_path": 3,
"rerank_enabled": true
}
Symbol Search:
{
"query": "hybrid_search",
"symbol": "hybrid_search",
"language": "python",
"include_snippet": true
}
Cross-Codebase Search (multi-repo):
{
"query": "authentication middleware",
"repo": ["frontend", "backend"],
"limit": 15
}
Single Repo Search:
{
"query": "user authentication",
"repo": "my-repo",
"include_snippet": true
}
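The interaction between limit and per_path can be illustrated with a short sketch. It assumes results arrive already ranked and that the per-file cap is applied while walking that ranking. The server's actual selection logic may differ, and cap_per_path is a hypothetical helper.

```python
from collections import Counter


def cap_per_path(results, limit=10, per_path=2):
    """Keep ranked results, at most per_path per file and limit overall."""
    seen = Counter()
    kept = []
    for hit in results:
        if len(kept) >= limit:
            break
        if seen[hit["path"]] >= per_path:
            # This file already contributed its share; skip further hits.
            continue
        seen[hit["path"]] += 1
        kept.append(hit)
    return kept
```

With per_path=2, a file that matches many times cannot crowd out other files, which is why raising per_path (as in the advanced example above) is useful when you expect several relevant functions in one module.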
context_search()
Blend code search results with memory entries for comprehensive context.
Parameters:
All repo_search parameters (including repo for cross-codebase isolation) plus:
- include_memories (bool, default true): Whether to include memory results
- memory_weight (float, default 1.0): Weight for memory results vs code results
- per_source_limits (dict, optional): Limits per source type: { "code": 8, "memory": 4 }
Returns:
{
"ok": true,
"results": [
{
"source": "code",
"score": 0.89,
"path": "src/db/connection.py",
"symbol": "create_pool",
"snippet": "def create_pool(database_url):\n return create_engine(database_url, pool_size=10)"
},
{
"source": "memory",
"score": 0.85,
"id": "uuid-string",
"information": "Database connection pooling best practices for high-concurrency applications",
"metadata": {
"kind": "pattern",
"language": "python",
"priority": 9
}
}
],
"total": 12,
"sources": ["code", "memory"],
"query": "database connection pooling"
}
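The following sketch shows one plausible reading of how memory_weight and per_source_limits shape the merged list. It assumes the weight multiplies memory scores and that limits are applied per source before merging. Neither detail is confirmed here, and blend_sources is a hypothetical name.

```python
def blend_sources(code_hits, memory_hits, memory_weight=1.0, per_source_limits=None):
    """Merge code and memory hits into one list sorted by weighted score."""
    limits = per_source_limits or {}
    merged = []
    sources = (("code", code_hits, 1.0), ("memory", memory_hits, memory_weight))
    for source, hits, weight in sources:
        ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
        cap = limits.get(source)
        if cap is not None:
            ranked = ranked[:cap]
        for hit in ranked:
            # Assumption: the weight scales the score of memory hits only.
            merged.append({**hit, "source": source, "score": hit["score"] * weight})
    merged.sort(key=lambda h: h["score"], reverse=True)
    return merged
```

Lowering memory_weight below 1.0 pushes memory entries down the merged ranking, while a value above 1.0 lets stored knowledge outrank closely scored code hits.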
context_answer()
Generate natural language answers using retrieval-augmented generation with a local LLM.
Core Parameters:
- query (str or list[str], required): Question or query to answer
- budget_tokens (int, optional): Token budget for context assembly (default from config)
- include_snippet (bool, default true): Include code snippets in context
Retrieval Parameters:
All repo_search parameters supported for context retrieval.
LLM Parameters:
- max_tokens (int, optional): Maximum tokens in generated answer
- temperature (float, default 0.3): Sampling temperature (lower = more deterministic)
- mode (str, default "stitch"): Context assembly mode ("stitch" or "pack")
- expand (bool, default false): Enable query expansion
Response Format:
{
"ok": true,
"answer": "Context Engine uses ReFRAG-inspired micro-chunking with 16-token windows and 8-token stride to achieve precise code retrieval. The span budgeting system ensures efficient token usage while maintaining context relevance.",
"citations": [
{
"path": "scripts/hybrid_search.py",
"start_line": 156,
"end_line": 162,
"snippet": "# ReFRAG micro-chunking\nWINDOW_SIZE = 16\nSTRIDE = 8",
"relevance": 0.92
},
{
"path": "scripts/utils.py",
"start_line": 89,
"end_line": 95,
"snippet": "def micro_chunk(text, window_size=16, stride=8):",
"relevance": 0.87
}
],
"query": ["How does Context Engine implement micro-chunking?"],
"used_context_tokens": 1247,
"generation_time_ms": 2340,
"decoder_used": "llamacpp"
}
Example:
{
"query": "What is the best way to handle database connections in Python web applications?",
"budget_tokens": 2000,
"language": "python",
"expand": true,
"temperature": 0.2
}
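Consumers of context_answer() usually want to show the citations alongside the answer. This sketch formats the citations array from the response format above into one line per source. The format_citations helper is hypothetical and relies only on the fields shown in that response.

```python
def format_citations(response):
    """Render each citation as 'path:start-end (relevance 0.00)'."""
    if not response.get("ok"):
        return []
    lines = []
    for citation in response.get("citations", []):
        lines.append(
            f'{citation["path"]}:{citation["start_line"]}-{citation["end_line"]} '
            f'(relevance {citation["relevance"]:.2f})'
        )
    return lines
```

Applied to the sample response, the first line would read scripts/hybrid_search.py:156-162 (relevance 0.92).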
info_request()
Simplified codebase retrieval with optional explanation mode. Drop-in replacement for basic codebase retrieval tools with human-readable result descriptions.
Primary Parameters:
- info_request (str, required): Natural language description of the code you're looking for
- information_request (str): Alias for info_request
Explanation Mode:
- include_explanation (bool, default false): Add summary, primary_locations, related_concepts, grouped_results, and confidence metrics
- include_relationships (bool, default false): Add imports_from, calls, related_paths to each result
Filter Parameters:
- limit (int): Maximum results (smart defaults: 15 for short queries, 8 for questions, 10 otherwise)
- language (str, optional): Filter by programming language
- under (str, optional): Limit search to specific directory
- repo (str or list[str], optional): Filter by repository name(s)
- path_glob (str or list[str], optional): Glob patterns for file paths
Snippet Options:
- include_snippet (bool, default true): Include code snippets
- context_lines (int, default 5): Lines of context around matches
Returns (basic mode):
{
"ok": true,
"results": [
{
"score": 0.85,
"path": "/work/src/hooks/useAuth.tsx",
"symbol": "useAuth",
"start_line": 15,
"end_line": 45,
"information": "Found 'useAuth' in useAuth.tsx (lines 15-45)",
"relevance_score": 0.85,
"snippet": "export function useAuth() { ... }"
}
],
"total": 10,
"search_strategy": "hybrid+rerank"
}
Returns (with include_explanation: true):
{
"ok": true,
"results": [...],
"total": 10,
"search_strategy": "hybrid+rerank+lang:typescript",
"summary": "Found 10 results related to 'authentication hook' across 5 files",
"primary_locations": [
"/work/src/hooks/useAuth.tsx",
"/work/src/context/AuthContext.tsx"
],
"related_concepts": ["auth", "hook", "context", "session", "token"],
"grouped_results": {
"by_file": {
"/work/src/hooks/useAuth.tsx": {
"count": 3,
"top_symbols": ["useAuth", "AuthProvider", "useSession"]
}
}
},
"confidence": {
"level": "high",
"score": 0.78,
"top_score": 0.85,
"symbol_matches": 2
},
"query_understanding": {
"intent": "search_for_code",
"detected_language": "typescript",
"detected_symbols": ["useAuth"],
"search_strategy": "hybrid+rerank+lang:typescript"
}
}
Returns (with include_relationships: true):
{
"results": [
{
"information": "Found 'useAuth' in useAuth.tsx (lines 15-45)",
"relationships": {
"imports_from": ["react", "@/context/AuthContext"],
"calls": ["useState", "useContext", "fetchUser"],
"symbol_path": "useAuth",
"related_paths": ["/work/src/context/AuthContext.tsx"]
}
}
]
}
Smart Limits:
- Short queries (1-2 words): 15 results for broader coverage
- Question queries ("how does", "what is"): 8 results for focused answers
- Default: 10 results
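The smart limit rules above can be sketched as a small function. The question detection here uses only the two prefixes this section names, and checking for questions before counting words is an assumption. The real heuristic may recognize more patterns, and smart_limit is a hypothetical name.

```python
# Only the two question prefixes named in this reference.
QUESTION_PREFIXES = ("how does", "what is")


def smart_limit(query, default=10):
    """Pick a result limit from the shape of the query."""
    text = query.strip().lower()
    if text.startswith(QUESTION_PREFIXES):
        return 8  # focused answers for question-style queries
    if len(text.split()) <= 2:
        return 15  # broader coverage for short queries
    return default
```

An explicit limit in the request takes the place of these defaults, as in the example request for this tool.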
Search Strategy Labels:
- hybrid - Base hybrid search (dense + lexical)
- +rerank - Neural reranker applied
- +repo_filtered - Filtered to specific repo(s)
- +lang:python - Filtered by language
- +path_filtered - Filtered by directory
Environment Variables:
- INFO_REQUEST_LIMIT=10 - Default result limit
- INFO_REQUEST_CONTEXT_LINES=5 - Default context lines
- INFO_REQUEST_EXPLAIN_DEFAULT=0 - Enable explanation mode by default
- INFO_REQUEST_RELATIONSHIPS=0 - Enable relationships by default
Example:
{
"info_request": "authentication middleware",
"include_explanation": true,
"include_relationships": true,
"language": "python",
"limit": 5
}
qdrant_index()
Index or reindex code from the mounted workspace.
Parameters:
- subdir (str, optional): Subdirectory to index (default: entire workspace)
- recreate (bool, default false): Drop and recreate collection before indexing
- collection (str, optional): Override default collection name
Returns:
{
"ok": true,
"operation": "index",
"subdir": "",
"collection": "my-workspace",
"recreate": false,
"stats": {
"files_processed": 1250,
"chunks_created": 8432,
"vectors_generated": 8432,
"processing_time_seconds": 127,
"errors": 0
},
"message": "Indexing completed successfully"
}
qdrant_prune()
Remove stale points from the collection (files that no longer exist).
Parameters: None (operates on current workspace)
Returns:
{
"ok": true,
"operation": "prune",
"points_removed": 47,
"points_before": 15234,
"points_after": 15187,
"processing_time_ms": 892,
"message": "Pruning completed successfully"
}
qdrant_status()
Get comprehensive status information about the collection and indexing state.
Parameters:
- collection (str, optional): Override default collection name
- max_points (int, default 5000): Maximum points to scan for timestamp analysis
- batch (int, default 1000): Batch size for scanning
Returns:
{
"ok": true,
"collection": "my-workspace",
"exists": true,
"count": 15234,
"scanned_points": 5000,
"last_ingested_at": {
"unix": 1705123456,
"iso": "2024-01-13T15:30:56Z"
},
"last_modified_at": {
"unix": 1705124123,
"iso": "2024-01-13T15:35:23Z"
},
"vectors_config": {
"fast-bge-base-en-v1.5": 384,
"lex": 4096
},
"storage_size_mb": 245.7,
"status": "healthy"
}
qdrant_list()
List all available Qdrant collections.
Parameters: None
Returns:
{
"ok": true,
"collections": [
{
"name": "my-workspace",
"vectors_count": 15234,
"segments_count": 12,
"points_count": 15234,
"indexed_vectors_count": 15234,
"status": "green",
"optimizer_status": "ok"
}
]
}
workspace_info()
Read workspace state and default collection information.
Parameters:
- workspace_path (str, optional): Override workspace path (default: current workspace)
Returns:
{
"ok": true,
"workspace_path": "/work",
"default_collection": "context-engine-workspace",
"source": "state_file",
"state": {
"workspace_id": "workspace-uuid",
"created_at": "2024-01-10T09:15:00Z",
"last_indexed": "2024-01-13T15:30:56Z",
"files_count": 1250,
"total_size_bytes": 52428800
}
}
list_workspaces()
Scan for all workspaces with .codebase/state.json files.
Parameters:
- search_root (str, optional): Root directory to scan (default: parent of workspace)
Returns:
{
"ok": true,
"workspaces": [
{
"workspace_path": "/work",
"collection_name": "context-engine-workspace",
"last_updated": "2024-01-13T15:30:56Z",
"indexing_state": "completed"
},
{
"workspace_path": "/work/project-b",
"collection_name": "project-b-workspace",
"last_updated": "2024-01-12T11:20:30Z",
"indexing_state": "in_progress"
}
]
}
expand_query()
Generate alternative query variations using an LLM decoder (requires REFRAG_DECODER=1).
Supports three runtime backends via REFRAG_RUNTIME:
- llamacpp (default): Local llama.cpp server
- glm: ZhipuAI GLM-4 API (disables deep thinking for fast JSON output)
- minimax: MiniMax M2 API
Parameters:
- query (str or list[str], required): Original query or queries to expand
- max_new (int, default 2): Maximum number of alternative queries to generate (0-2)
Returns:
{
"ok": true,
"original_query": "python asyncio subprocess",
"alternates": [
"python asynchronous process management",
"asyncio subprocess handling"
],
"total_queries": 3,
"decoder_used": "minimax"
}
On decoder error, falls back to suffix-based expansion with "decoder_used": "fallback".
If expansion fails entirely, returns "ok": false with an error message.
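A caller can treat the three outcomes (decoder success, suffix-based fallback, and total failure) uniformly. The sketch below collects the queries to run and reports whether the alternates came from a decoder. The queries_to_run helper is hypothetical and uses only the response fields documented above.

```python
def queries_to_run(original_query, response):
    """Return (queries, from_decoder) for an expand_query() response."""
    if not response.get("ok"):
        # Expansion failed entirely: search with the original query only.
        return [original_query], False
    alternates = [
        q for q in response.get("alternates", []) if q and q != original_query
    ]
    # "fallback" marks suffix-based expansion rather than a decoder result.
    from_decoder = response.get("decoder_used") != "fallback"
    return [original_query, *alternates], from_decoder
```

The resulting list can be passed as the query argument of repo_search(), which accepts a list of queries for query fusion.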
code_search()
Exact alias of repo_search() for discoverability. Same parameters and return shape.
qdrant_index_root()
Index the entire workspace root (/work).
Parameters:
- recreate (bool, default false): Drop and recreate collection before indexing
- collection (str, optional): Target collection name
Returns: Subprocess result with indexing status.
search_tests_for()
Find test files related to a query. Presets common test file globs.
Parameters:
- query (str or list[str], required): Search query
- limit (int, optional): Max results
- include_snippet (bool, optional): Include code snippets
- language (str, optional): Filter by language
Returns: Same shape as repo_search().
search_config_for()
Find configuration files related to a query. Presets config file globs (yaml/json/toml/etc).
Parameters: Same as search_tests_for().
Returns: Same shape as repo_search().
search_callers_for()
Heuristic search for callers/usages of a symbol.
Parameters:
- query (str, required): Symbol name to find callers for
- limit (int, optional): Max results
- language (str, optional): Filter by language
Returns: Same shape as repo_search().
search_importers_for()
Find files likely importing or referencing a module/symbol.
Parameters: Same as search_callers_for().
Returns: Same shape as repo_search().
pattern_search()
Find structurally similar code patterns across languages. Requires PATTERN_VECTORS=1.
Parameters:
- query (str, required): Code snippet OR natural language pattern description
- language (str, default "python"): Language hint for code queries
- limit (int, default 10): Maximum results
- min_score (float, default 0.3): Similarity threshold
- include_snippet (bool): Include code in results
- target_languages (list[str]): Filter target languages
Response:
{
"ok": true,
"results": [{"path": "...", "start_line": 45, "score": 0.94, "control_flow_signature": "L2_2_B0_T2_M0__C_TL"}],
"total": 5,
"query_signature": "L2_2_B0_T2_M0__C_TL",
"query_mode": "code"
}
Signature format: L{loop_depth}_{count}_B{branches}_T{try}_M{match}_{flags} where flags include TL (retry pattern), BL (filter pattern).
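The signature string can be split into its fields with a regular expression. This is a sketch based only on the format line and the sample value above. It assumes empty segments in the flags part (as in the double underscore of the sample) carry no meaning, and parse_signature is a hypothetical helper.

```python
import re

SIGNATURE_RE = re.compile(r"^L(\d+)_(\d+)_B(\d+)_T(\d+)_M(\d+)_(.*)$")


def parse_signature(signature):
    """Split a control-flow signature into numeric fields and a list of flags."""
    match = SIGNATURE_RE.match(signature)
    if match is None:
        raise ValueError(f"unrecognized signature: {signature!r}")
    loop_depth, count, branches, tries, matches = (int(g) for g in match.groups()[:5])
    # Assumption: empty segments between underscores are ignored.
    flags = [flag for flag in match.group(6).split("_") if flag]
    return {
        "loop_depth": loop_depth,
        "count": count,
        "branches": branches,
        "try": tries,
        "match": matches,
        "flags": flags,
    }
```

For the sample value L2_2_B0_T2_M0__C_TL this yields a loop depth of 2, two try blocks, and the flags C and TL. Only TL and BL are described in this reference, so other flags should be treated as opaque.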
Example:
{"query": "for i in range(3): try: fetch() except: sleep(i)", "include_snippet": true}
symbol_graph()
First-class symbol graph navigation using indexed metadata fields:
- metadata.calls (call graph)
- metadata.imports (imports graph)
- metadata.symbol / metadata.symbol_path (definitions)
Supports three query types:
- "callers": "Who calls X?"
- "definition": "Where is X defined?"
- "importers": "What imports Y?"
If there are no graph hits, symbol_graph falls back to semantic search and returns the same response shape.
Parameters:
- symbol (str, required): Symbol name (function/class/module) to navigate
- query_type (str, default "callers"): One of "callers", "definition", "importers"
- limit (int, default 20): Max results
- language (str, optional): Filter by language
- under (str, optional): Path prefix filter (directory)
- output_format (str, optional): "json" (default) or "toon"
Examples:
{"symbol": "ASTAnalyzer", "query_type": "definition", "limit": 10}
{"symbol": "get_embedding_model", "query_type": "callers", "under": "scripts/", "limit": 10}
{"symbol": "qdrant_client", "query_type": "importers", "limit": 10}
Returns:
{
"results": [
{
"path": "scripts/ingest/chunking.py",
"start_line": 12,
"end_line": 88,
"symbol_path": "ASTAnalyzer",
"kind": "class"
}
],
"symbol": "ASTAnalyzer",
"query_type": "definition",
"count": 1,
"collection": "codebase"
}
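When calling symbol_graph() over the HTTP /mcp endpoint, the arguments shown in the examples are wrapped in an MCP tools/call request. The sketch below builds that envelope and rejects unknown query types before anything is sent. The symbol_graph_call helper and the request id are illustrative assumptions.

```python
QUERY_TYPES = ("callers", "definition", "importers")


def symbol_graph_call(symbol, query_type="callers", request_id=1, **filters):
    """Build a JSON-RPC tools/call request for the symbol_graph tool."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"query_type must be one of {QUERY_TYPES}")
    arguments = {"symbol": symbol, "query_type": query_type}
    # Optional filters (limit, language, under, output_format) pass through.
    arguments.update({k: v for k, v in filters.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "symbol_graph", "arguments": arguments},
    }
```

For instance, symbol_graph_call("get_embedding_model", under="scripts/", limit=10) produces a request whose arguments match the second example above.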
change_history_for_path()
Summarize recent change metadata for a file path from the index.
Parameters:
- path (str, required): Relative path under /work
- collection (str, optional): Target collection
- max_points (int, optional): Cap on scanned points
Returns:
{
"ok": true,
"summary": {
"path": "scripts/ctx.py",
"last_modified": "2025-01-15T14:22:00"
}
}
collection_map()
Return collection↔repo mappings with optional Qdrant payload samples.
Parameters:
- search_root (str, optional): Directory to scan
- collection (str, optional): Filter by collection
- repo_name (str, optional): Filter by repo
- include_samples (bool, optional): Include payload samples
- limit (int, optional): Max entries
Returns: Mapping of collections to repositories.
set_session_defaults() (Indexer)
Set default collection for subsequent calls on the same session.
Parameters:
- collection (str, optional): Default collection name
- session (str, optional): Session token for cross-connection reuse
Returns:
{
"ok": true,
"session": "abc123",
"defaults": {"collection": "codebase"},
"applied": "connection"
}
Error Handling
All API methods follow consistent error handling patterns:
Standard Error Response
{
"ok": false,
"error": "Error type and description",
"error_code": "VALIDATION_ERROR",
"details": {
"field": "query",
"message": "Query cannot be empty"
}
}
Common Error Codes
- VALIDATION_ERROR: Invalid parameter values
- COLLECTION_NOT_FOUND: Specified collection doesn't exist
- INDEXING_ERROR: Failed during indexing operation
- SEARCH_ERROR: Search operation failed
- DECODER_ERROR: LLM decoder operation failed
- TIMEOUT_ERROR: Operation timed out
- RATE_LIMIT_ERROR: Too many requests
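Since every tool reports failure through the same "ok": false shape, a client can convert responses into exceptions in one place. This is a minimal sketch. ToolError, unwrap, and is_retryable are hypothetical names, and treating timeouts and rate limits as the retryable codes is an assumption rather than documented behavior.

```python
class ToolError(Exception):
    """Raised when a tool response carries "ok": false."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}


# Assumption: these two codes describe transient conditions worth retrying.
RETRYABLE_CODES = {"TIMEOUT_ERROR", "RATE_LIMIT_ERROR"}


def unwrap(response):
    """Return the response unchanged on success, raise ToolError otherwise."""
    if response.get("ok"):
        return response
    raise ToolError(
        response.get("error_code", "UNKNOWN"),
        response.get("error", "unknown error"),
        response.get("details"),
    )


def is_retryable(error):
    """Tell whether a ToolError is worth retrying after a delay."""
    return error.code in RETRYABLE_CODES
```

Wrapping each tool call in unwrap() keeps the calling code free of repeated ok checks, and the details field stays available on the exception for validation errors.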
Rate Limits and Quotas
- Default timeout: 30 seconds per operation
- Maximum query length: 1000 characters
- Maximum result limit: 100 results per search
- Memory storage: Configurable per deployment
- Batch indexing limits: Configurable via environment variables
Transport-Specific Behavior
Both SSE and HTTP RMCP transports expose the same tools, arguments, and response shapes. The choice of transport affects only how MCP messages are carried, not what the tools do.
- SSE (/sse) is primarily intended for use behind mcp-remote or legacy clients.
- HTTP (/mcp) is recommended for IDE integrations and direct tooling because it uses a simple request/response pattern where initialize completes before listTools and other calls, avoiding known initialization races in some SSE clients.
When in doubt, prefer the HTTP /mcp endpoints described in the Overview.
This API reference should enable developers to effectively integrate Context Engine's MCP tools into their applications and workflows.