# Configuration

Complete environment variable reference for Context Engine.

## Core Settings

| Name | Description | Default |
|------|-------------|---------|
| `COLLECTION_NAME` | Qdrant collection name (unified across all repos) | `codebase` |
| `REPO_NAME` | Logical repo tag stored in payload for filtering | auto-detect from git/folder |
| `HOST_INDEX_PATH` | Host path mounted at `/work` in containers | current repo (`.`) |
| `QDRANT_URL` | Qdrant base URL | container: `http://qdrant:6333`; local: `http://localhost:6333` |
| `MULTI_REPO_MODE` | Enable multi-repo collections (each subdir gets its own collection) | `0` (disabled) |
| `LOG_LEVEL` | Logging verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` | `INFO` |
| `CTXCE_AUTH_ENABLED` | Enable API authentication (requires token header) | `0` (disabled) |
| `CTXCE_AUTH_ADMIN_TOKEN` | Admin token for authenticated requests | unset |
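
A minimal `.env` for a single-repo setup might look like this (values are illustrative):

```bash
# Minimal .env -- illustrative values, adjust for your environment
COLLECTION_NAME=codebase
REPO_NAME=my-service          # otherwise auto-detected from git/folder
QDRANT_URL=http://localhost:6333
LOG_LEVEL=INFO
```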

## Tool Description Customization

Override default MCP tool descriptions (useful for agent tuning).

| Name | Description | Default |
|------|-------------|---------|
| `TOOL_STORE_DESCRIPTION` | Custom description for the `memory_store` tool | built-in |
| `TOOL_FIND_DESCRIPTION` | Custom description for the `memory_find` tool | built-in |
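
For example, you might steer an agent toward storing only durable notes (the wording here is illustrative, not the built-in text):

```bash
# Illustrative override -- tune the description your agent sees for memory_store
TOOL_STORE_DESCRIPTION="Store durable architectural decisions and conventions; skip transient debugging notes."
```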

## Embedding Models

Context Engine supports multiple embedding models via the `EMBEDDING_MODEL` and `EMBEDDING_PROVIDER` settings.

### Default (BGE-base)

The default configuration uses `BAAI/bge-base-en-v1.5` via fastembed:

| Name | Description | Default |
|------|-------------|---------|
| `EMBEDDING_MODEL` | Model name for dense embeddings | `BAAI/bge-base-en-v1.5` |
| `EMBEDDING_PROVIDER` | Backend provider | `fastembed` |
| `EMBEDDING_SEED` | Seed for deterministic embeddings (used in benchmarks) | unset |

### Qwen3-Embedding (Experimental)

Qwen3-Embedding-0.6B offers improved semantic understanding with instruction-aware encoding. Enable via feature flag:

| Name | Description | Default |
|------|-------------|---------|
| `QWEN3_EMBEDDING_ENABLED` | Enable Qwen3 embedding support | `0` (disabled) |
| `QWEN3_QUERY_INSTRUCTION` | Add instruction prefix to search queries | `1` (enabled when Qwen3 active) |
| `QWEN3_INSTRUCTION_TEXT` | Custom instruction prefix | `Instruct: Given a code search query, retrieve relevant code snippets\nQuery:` |

Setup:

```bash
# In .env
QWEN3_EMBEDDING_ENABLED=1
EMBEDDING_MODEL=electroglyph/Qwen3-Embedding-0.6B-onnx-uint8
QWEN3_QUERY_INSTRUCTION=1
# Optional: customize instruction
# QWEN3_INSTRUCTION_TEXT=Instruct: Find code implementing this feature\nQuery:
```

Important: Switching embedding models requires a full reindex:

```bash
make reset-dev-dual  # Recreates collection and reindexes
```

Dimension comparison:

| Model | Dimensions | Notes |
|-------|------------|-------|
| BGE-base-en-v1.5 | 768 | Default, well-tested |
| Qwen3-Embedding-0.6B | 1024 | Instruction-aware, experimental |

## Indexing & Micro-Chunks

| Name | Description | Default |
|------|-------------|---------|
| `INDEX_MICRO_CHUNKS` | Enable token-based micro-chunking | `0` (off) |
| `MAX_MICRO_CHUNKS_PER_FILE` | Cap micro-chunks per file | `200` |
| `TOKENIZER_URL` | HF `tokenizer.json` URL (for Make download) | n/a |
| `TOKENIZER_PATH` | Local path where the tokenizer is saved (Make) | `models/tokenizer.json` |
| `TOKENIZER_JSON` | Runtime path for the tokenizer (indexer) | `models/tokenizer.json` |
| `USE_TREE_SITTER` | Enable tree-sitter parsing (py/js/ts) | `1` (on) |
| `INDEX_USE_ENHANCED_AST` | Enable advanced AST-based semantic chunking | `1` (on) |
| `INDEX_SEMANTIC_CHUNKS` | Enable semantic chunking (preserve function/class boundaries) | `1` (on) |
| `INDEX_CHUNK_LINES` | Lines per chunk (non-micro mode) | `120` |
| `INDEX_CHUNK_OVERLAP` | Overlap lines between chunks | `20` |
| `INDEX_BATCH_SIZE` | Upsert batch size | `64` |
| `INDEX_PROGRESS_EVERY` | Log progress every N files | `200` |
| `SMART_SYMBOL_REINDEXING` | Reuse embeddings when only symbols change | `1` (enabled) |
| `MAX_CHANGED_SYMBOLS_RATIO` | Threshold for full reindex vs smart update | `0.6` |
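
Micro-chunking also needs a tokenizer available at runtime; a sketch, assuming the default paths from the table above:

```bash
# Enable token-based micro-chunking (sketch; paths assume the defaults above)
INDEX_MICRO_CHUNKS=1
MAX_MICRO_CHUNKS_PER_FILE=200
TOKENIZER_JSON=models/tokenizer.json   # fetch beforehand via the Make download target
```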

## Query Optimization

Dynamic `HNSW_EF` tuning and intelligent query routing, for up to 2x faster simple queries.

| Name | Description | Default |
|------|-------------|---------|
| `QUERY_OPTIMIZER_ADAPTIVE` | Enable adaptive EF optimization | `1` (on) |
| `QUERY_OPTIMIZER_MIN_EF` | Minimum EF value | `64` |
| `QUERY_OPTIMIZER_MAX_EF` | Maximum EF value | `512` |
| `QUERY_OPTIMIZER_SIMPLE_THRESHOLD` | Complexity threshold for simple queries | `0.3` |
| `QUERY_OPTIMIZER_COMPLEX_THRESHOLD` | Complexity threshold for complex queries | `0.7` |
| `QUERY_OPTIMIZER_SIMPLE_FACTOR` | EF multiplier for simple queries | `0.5` |
| `QUERY_OPTIMIZER_SEMANTIC_FACTOR` | EF multiplier for semantic queries | `1.0` |
| `QUERY_OPTIMIZER_COMPLEX_FACTOR` | EF multiplier for complex queries | `2.0` |
| `QUERY_OPTIMIZER_DENSE_THRESHOLD` | Complexity threshold for dense-only routing | `0.2` |
| `QUERY_OPTIMIZER_COLLECTION_SIZE` | Approximate collection size for scaling | `10000` |
| `QDRANT_EF_SEARCH` | Base `HNSW_EF` value (overridden by the optimizer) | `128` |
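
As a worked example of how these knobs plausibly combine (the exact formula lives in the optimizer; multiplying the base EF by the matching factor and clamping is an assumption based on the names):

```bash
# Hypothetical worked example, assuming ef = base * factor, clamped to [min, max]:
#   complexity 0.25 < QUERY_OPTIMIZER_SIMPLE_THRESHOLD (0.3)  -> "simple" query
#     ef = QDRANT_EF_SEARCH (128) * QUERY_OPTIMIZER_SIMPLE_FACTOR (0.5) = 64
#     clamped to [QUERY_OPTIMIZER_MIN_EF=64, QUERY_OPTIMIZER_MAX_EF=512] -> 64
#   complexity 0.80 > QUERY_OPTIMIZER_COMPLEX_THRESHOLD (0.7) -> "complex" query
#     ef = 128 * QUERY_OPTIMIZER_COMPLEX_FACTOR (2.0) = 256
```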

## Watcher Settings

| Name | Description | Default |
|------|-------------|---------|
| `WATCH_DEBOUNCE_SECS` | Debounce between FS events | `1.5` |
| `INDEX_UPSERT_BATCH` | Upsert batch size (watcher) | `128` |
| `INDEX_UPSERT_RETRIES` | Retry count | `5` |
| `INDEX_UPSERT_BACKOFF` | Seconds between retries | `0.5` |
| `QDRANT_TIMEOUT` | HTTP timeout in seconds | watcher: `60`; search: `20` |
| `MCP_TOOL_TIMEOUT_SECS` | Max duration for long-running MCP tools | `3600` |

## Reranker

Cross-encoder reranking improves search quality by scoring query-document pairs directly. Context Engine supports two configuration methods:

### FastEmbed Model Name

Set `RERANKER_MODEL` to use FastEmbed's auto-downloading cross-encoder models:

| Name | Description | Default |
|------|-------------|---------|
| `RERANKER_MODEL` | FastEmbed reranker model name | unset |
| `RERANKER_ENABLED` | Enable reranker by default | `1` (enabled) |

Popular models:

- `jinaai/jina-reranker-v2-base-multilingual` - Multilingual, good quality
- `BAAI/bge-reranker-base` - English-focused, fast
- `Xenova/ms-marco-MiniLM-L-6-v2` - Lightweight, fast inference

Example:

```bash
RERANKER_MODEL=jinaai/jina-reranker-v2-base-multilingual
RERANKER_ENABLED=1
```

### Manual ONNX Paths (Legacy)

For custom models or explicit control, set both the ONNX path and the tokenizer:

| Name | Description | Default |
|------|-------------|---------|
| `RERANKER_ONNX_PATH` | Local ONNX cross-encoder model path | unset |
| `RERANKER_TOKENIZER_PATH` | Tokenizer path for the reranker | unset |
| `RERANKER_ENABLED` | Enable reranker by default | `1` (enabled) |

Note: If both `RERANKER_MODEL` and `RERANKER_ONNX_PATH` are set, `RERANKER_MODEL` takes priority.

### Reranker Tuning

| Name | Description | Default |
|------|-------------|---------|
| `RERANKER_TOPN` | Candidates to retrieve before reranking | `50` |
| `RERANKER_RETURN_M` | Final results after reranking | `12` |
| `RERANKER_TIMEOUT_MS` | Rerank timeout in milliseconds | `2000` |
| `RERANK_BLEND_WEIGHT` | Ratio of rerank vs fusion score (0.0-1.0) | `0.6` |
| `RERANK_TIMEOUT_FLOOR_MS` | Min timeout to avoid cold-start failures | `1000` |
| `POST_RERANK_SYMBOL_BOOST` | Score boost for exact symbol matches after rerank | `1.0` |
| `EMBEDDING_WARMUP` | Warm up the embedding model on startup | `0` (disabled) |
| `RERANK_WARMUP` | Warm up the reranker model on startup | `0` (disabled) |
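
`RERANK_BLEND_WEIGHT` is described as a ratio, so presumably the final score is a linear blend; a sketch of that reading (the exact formula is an assumption, not confirmed here):

```bash
# Assumed blend: with RERANK_BLEND_WEIGHT=0.6,
#   final = 0.6 * rerank_score + 0.4 * fusion_score
# Raise toward 1.0 to trust the cross-encoder more, lower it to favor hybrid fusion.
RERANK_BLEND_WEIGHT=0.6
```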

## Learning Reranker

The learning reranker trains a lightweight neural network (TinyScorer) to improve search rankings over time. See Architecture for details.

This feature is optional and enabled by default. To disable:

```bash
# Disable learning scorer in search results
RERANK_LEARNING=0

# Disable event logging (no training data collected)
RERANK_EVENTS_ENABLED=0

# Or simply don't run the learning_worker container
```

### Enable/Disable

| Name | Description | Default |
|------|-------------|---------|
| `RERANK_LEARNING` | Enable learning scorer in search results | `1` (enabled) |
| `RERANK_EVENTS_ENABLED` | Enable event logging for training | `1` (enabled) |
| `RERANK_EVENTS_SAMPLE_RATE` | Fraction of events to log (0.0-1.0) | `0.33` |

### Weight Management

| Name | Description | Default |
|------|-------------|---------|
| `RERANKER_WEIGHTS_DIR` | Directory for learned weight files | `/tmp/rerank_weights` |
| `RERANKER_WEIGHTS_RELOAD_INTERVAL` | How often to check for new weights (seconds) | `60` |
| `RERANKER_MAX_CHECKPOINTS` | Number of weight versions to retain | `5` |

### Learning Rate

| Name | Description | Default |
|------|-------------|---------|
| `RERANKER_LR_DECAY_STEPS` | Updates between learning rate decays | `1000` |
| `RERANKER_LR_DECAY_RATE` | Decay multiplier (e.g., `0.95` = 5% reduction) | `0.95` |
| `RERANKER_MIN_LR` | Minimum learning rate floor | `0.0001` |
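
With the defaults, the schedule works out as follows (assuming standard step decay with a floor, which is how these names usually compose):

```bash
# Assuming lr = max(initial * rate^(updates / steps), floor):
#   start:              lr = 0.001              (RERANK_LEARNING_RATE)
#   after 1000 updates: lr = 0.001 * 0.95   = 0.00095
#   after 2000 updates: lr = 0.001 * 0.95^2 = 0.0009025
#   ...never dropping below RERANKER_MIN_LR = 0.0001
```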

### Event Logging

| Name | Description | Default |
|------|-------------|---------|
| `RERANK_EVENTS_DIR` | Directory for search event logs | `/tmp/rerank_events` |
| `RERANK_EVENTS_RETENTION_DAYS` | Days to keep event files before cleanup | `7` |

### Learning Worker

| Name | Description | Default |
|------|-------------|---------|
| `RERANK_LEARNING_BATCH_SIZE` | Number of events per training batch | `32` |
| `RERANK_LEARNING_POLL_INTERVAL` | Seconds between checks for new events | `30` |
| `RERANK_LEARNING_RATE` | Initial learning rate for TinyScorer | `0.001` |
| `RERANK_LLM_TEACHER` | Enable LLM-teacher guided learning | `1` (enabled) |
| `RERANK_LLM_SAMPLE_RATE` | Fraction of queries to evaluate with the LLM teacher | `1.0` |
| `RERANK_VICREG_WEIGHT` | Weight for VICReg consistency loss | `0.1` |

## Decoder (llama.cpp / OpenAI / GLM / MiniMax)

| Name | Description | Default |
|------|-------------|---------|
| `REFRAG_DECODER` | Enable decoder for `context_answer` (required for llamacpp) | `1` (enabled) |
| `REFRAG_RUNTIME` | Decoder backend: `llamacpp`, `openai`, `glm`, or `minimax` | `llamacpp` |
| `LLAMACPP_URL` | llama.cpp server endpoint | `http://llamacpp:8080` or `http://host.docker.internal:8081` |
| `LLAMACPP_TIMEOUT_SEC` | Decoder request timeout | `300` |
| `DECODER_MAX_TOKENS` | Max tokens for decoder responses | `4000` |
| `REFRAG_DECODER_MODE` | `prompt` or `soft` (`soft` requires patched llama.cpp) | `prompt` |
| `OPENAI_API_KEY` | API key for OpenAI provider | unset |
| `OPENAI_MODEL` | OpenAI model name | `gpt-4.1-mini` |
| `OPENAI_API_BASE` | OpenAI API base URL (supports Azure/compatible endpoints) | `https://api.openai.com/v1` |
| `GLM_API_KEY` | API key for GLM provider | unset |
| `GLM_MODEL` | GLM model name (used for `context_answer`) | `glm-4.6` |
| `GLM_MODEL_FAST` | GLM model for `expand_query`/simple tasks (higher concurrency) | `glm-4.5` |
| `GLM_TIMEOUT_SEC` | GLM request timeout in seconds | unset |
| `PSEUDO_BATCH_CONCURRENCY` | Parallel API calls for pseudo-tag indexing (`1` = sequential, `4` = 4x speedup) | `1` |
| `MINIMAX_API_KEY` | API key for MiniMax M2 provider | unset |
| `MINIMAX_MODEL` | MiniMax model name | `MiniMax-M2` |
| `MINIMAX_API_BASE` | MiniMax API base URL | `https://api.minimax.io/v1` |
| `MINIMAX_TIMEOUT_SEC` | MiniMax request timeout in seconds | unset |
| `USE_GPU_DECODER` | Native Metal decoder (`1`) vs Docker (`0`) | `0` (docker) |
| `LLAMACPP_GPU_LAYERS` | Number of layers to offload to GPU (`-1` for all) | `32` |

### Runtime Selection

Set `REFRAG_RUNTIME` explicitly to choose a decoder backend:

- `llamacpp`: Local llama.cpp server (requires `REFRAG_DECODER=1`)
- `openai`: OpenAI API (GPT-4.1, GPT-4.1-mini, o1, etc.)
- `glm`: ZhipuAI GLM models (GLM-4.5, GLM-4.6, GLM-4.7)
- `minimax`: MiniMax M2 API

No auto-detection is performed, to avoid surprise API calls. If `REFRAG_RUNTIME` is unset, it defaults to `llamacpp`.
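
For example, pointing the decoder at the hosted GLM backend (the key value is a placeholder):

```bash
# Hosted decoder via GLM -- key is a placeholder
REFRAG_RUNTIME=glm
GLM_API_KEY=your-key-here
GLM_MODEL=glm-4.6        # used for context_answer
GLM_MODEL_FAST=glm-4.5   # used for expand_query and other simple tasks
```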

## Git History & Commit Indexing

Settings for indexing git commit history and enabling commit-aware search.

| Name | Description | Default |
|------|-------------|---------|
| `REFRAG_COMMIT_DESCRIBE` | Enable commit lineage goals for indexing | `1` (enabled) |
| `COMMIT_VECTOR_SEARCH` | Enable vector search over commit messages | `0` (disabled) |
| `REMOTE_UPLOAD_GIT_MAX_COMMITS` | Max commits per upload bundle (`0` = no git history) | `500` |
| `GIT_HISTORY_PRUNE` | Prune old `git_message` points on manifest ingest | `1` (enabled) |
| `GIT_HISTORY_DELETE_MANIFEST` | Delete manifest files after successful ingest | `1` (enabled) |
| `GIT_HISTORY_MANIFEST_MAX_FILES` | Cap manifest files per `.remote-git` dir (`0` = unlimited) | `50` |

Note: Git history indexing stores commit messages and metadata as searchable points. Use the `search_commits_for` MCP tool to query.
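
To make commit messages vector-searchable as well, something along these lines:

```bash
# Turn on vector search over commit messages (off by default)
COMMIT_VECTOR_SEARCH=1
REMOTE_UPLOAD_GIT_MAX_COMMITS=500   # cap history per upload bundle; 0 disables git history
```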

## ReFRAG (Micro-Chunking & Retrieval)

| Name | Description | Default |
|------|-------------|---------|
| `REFRAG_MODE` | Enable micro-chunking and span budgeting | `1` (enabled) |
| `REFRAG_GATE_FIRST` | Enable mini-vector gating | `1` (enabled) |
| `REFRAG_CANDIDATES` | Candidates for gate-first filtering | `200` |
| `REFRAG_PSEUDO_DESCRIBE` | Enable LLM-based pseudo/tags generation during indexing | `0` (disabled) |
| `MICRO_BUDGET_TOKENS` | Token budget for `context_answer` | `5000` (GLM: 6000-8192) |
| `MICRO_OUT_MAX_SPANS` | Max spans returned per query | `8` (GLM: 24) |
| `MICRO_CHUNK_TOKENS` | Tokens per micro-chunk window | `16` |
| `MICRO_CHUNK_STRIDE` | Stride between windows | `8` |
| `MICRO_MERGE_LINES` | Lines to merge adjacent spans | `4` |
| `MICRO_TOKENS_PER_LINE` | Estimated tokens per line | `32` |

**LLM-based pseudo/tags (`REFRAG_PSEUDO_DESCRIBE`):**

When enabled, the indexer uses the configured decoder (via `REFRAG_RUNTIME`) to generate semantic descriptions and tags for each code chunk. This enriches the lexical vectors with natural-language terms, improving NL→code retrieval.

```bash
# Enable LLM pseudo/tags generation (requires a configured decoder)
REFRAG_PSEUDO_DESCRIBE=1
REFRAG_RUNTIME=glm  # or openai, minimax, llamacpp
```

Note: This significantly increases indexing time and API costs. Best used with batch concurrency (`PSEUDO_BATCH_CONCURRENCY=4`).

### Pseudo Backfill Worker

Deferred pseudo/tag generation runs asynchronously after initial indexing.

| Name | Description | Default |
|------|-------------|---------|
| `PSEUDO_BACKFILL_ENABLED` | Enable the async pseudo/tag backfill worker | `0` (disabled) |
| `PSEUDO_DEFER_TO_WORKER` | Skip inline pseudo generation and defer to the backfill worker | `0` (disabled) |
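
A plausible combination that keeps initial indexing fast and fills in pseudo/tags afterward:

```bash
# Defer pseudo/tag generation so initial indexing stays fast
PSEUDO_BACKFILL_ENABLED=1
PSEUDO_DEFER_TO_WORKER=1
PSEUDO_BATCH_CONCURRENCY=4   # parallel API calls while backfilling
```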

## Adaptive Span Sizing

Expands search hits to full symbol boundaries for better context.

| Name | Description | Default |
|------|-------------|---------|
| `ADAPTIVE_SPAN_SIZING` | Expand hits to encompassing symbol boundaries | `1` (enabled) |
| `DEBUG_ADAPTIVE_SPAN` | Enable debug logging for span expansion | `0` (disabled) |

## Context Answer Shaping

Controls output formatting for `context_answer` responses.

| Name | Description | Default |
|------|-------------|---------|
| `CTX_SUMMARY_CHARS` | Max chars for the summary (`0` = disabled) | `0` |
| `CTX_SNIPPET_CHARS` | Max chars per code snippet | `400` |
| `DEBUG_CONTEXT_ANSWER` | Enable debug logging for `context_answer` | `0` (disabled) |

## Mini Vector Gating

Compact 64-dim vectors for fast candidate filtering before the full dense search.

| Name | Description | Default |
|------|-------------|---------|
| `MINI_VECTOR_NAME` | Name of the mini vector index | `mini` |
| `MINI_VEC_DIM` | Dimension of mini vectors | `64` |
| `MINI_VEC_SEED` | Random projection seed (for reproducibility) | `1337` |
| `HYBRID_MINI_WEIGHT` | Weight of mini vectors in hybrid scoring | `0.5` |

## Pattern Vectors

Structural code pattern matching across languages. Disabled by default.

| Name | Description | Default |
|------|-------------|---------|
| `PATTERN_VECTORS` | Enable the `pattern_search` tool and pattern vector indexing | `0` (disabled) |

Enable:

```bash
# In .env or docker-compose
PATTERN_VECTORS=1
```

When enabled, the indexer extracts control-flow signatures (loops, branches, try/except, etc.) and stores them as pattern vectors. The `pattern_search` MCP tool then finds structurally similar code across languages; for example, a Python retry loop can match its Go or Rust equivalents.

Note: Enabling this requires reindexing to generate pattern vectors for existing files.

## Lexical Vector Settings

Controls the sparse lexical (keyword) vectors used for hybrid search.

| Name | Description | Default |
|------|-------------|---------|
| `LEX_VECTOR_NAME` | Name of the lexical vector in Qdrant | `lex` |
| `LEX_VECTOR_DIM` | Dimension of the lexical hash vector | `2048` |
| `LEX_MULTI_HASH` | Hash functions per token (more = better collision resistance) | `3` |
| `LEX_BIGRAMS` | Enable bigram hashing for phrase matching | `1` (enabled) |
| `LEX_BIGRAM_WEIGHT` | Weight of bigram entries relative to unigrams | `0.7` |
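
A rough sketch of how these settings interact, under the assumption that each token is hashed into `LEX_MULTI_HASH` buckets of a `LEX_VECTOR_DIM`-wide vector (the real implementation may differ in detail):

```bash
# Assumed mechanics (illustrative, not confirmed in source):
#   token "retry"       -> 3 hash functions -> 3 buckets of a 2048-dim vector, weight 1.0 each
#   bigram "retry loop" -> hashed the same way, written at LEX_BIGRAM_WEIGHT = 0.7
# More hash functions reduce collisions; bigrams add phrase sensitivity.
LEX_MULTI_HASH=3
LEX_BIGRAMS=1
LEX_BIGRAM_WEIGHT=0.7
```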

### Sparse Vector Settings (Experimental)

True sparse vectors for lossless lexical matching (no hash collisions).

| Name | Description | Default |
|------|-------------|---------|
| `LEX_SPARSE_MODE` | Use sparse lexical vectors instead of dense hash vectors | `0` (off) |
| `LEX_SPARSE_NAME` | Name of the sparse vector index in Qdrant | `lex_sparse` |

Note: Enabling `LEX_SPARSE_MODE` requires the collection to have a sparse vector index configured; use the `--recreate` flag when switching modes. If a sparse query fails or returns empty, the system automatically falls back to dense lexical vectors.

Note: Changing `LEX_VECTOR_DIM` requires recreating collections (`--recreate` flag). To use legacy (pre-v2) settings: `LEX_VECTOR_DIM=4096 LEX_MULTI_HASH=1 LEX_BIGRAMS=0`

## Ports

| Name | Description | Default |
|------|-------------|---------|
| `FASTMCP_PORT` | Memory MCP server port (SSE) | `8000` |
| `FASTMCP_INDEXER_PORT` | Indexer MCP server port (SSE) | `8001` |
| `FASTMCP_HTTP_PORT` | Memory RMCP host port mapping | `8002` |
| `FASTMCP_INDEXER_HTTP_PORT` | Indexer RMCP host port mapping | `8003` |
| `FASTMCP_HEALTH_PORT` | Health port (memory/indexer) | memory: `18000`; indexer: `18001` |

## Search & Expansion

| Name | Description | Default |
|------|-------------|---------|
| `HYBRID_EXPAND` | Enable heuristic multi-query expansion | `0` (off) |
| `LLM_EXPAND_MAX` | Max alternate queries to generate via LLM (`0` = disabled) | `0` |
| `EXPAND_MAX_TOKENS` | Max tokens for the LLM query-expansion response | `512` |
| `REPO_AUTO_FILTER` | Auto-detect and filter to the current repo in searches | `1` (enabled) |
| `HYBRID_IN_PROCESS` | Run hybrid search in-process (faster; falls back to subprocess) | `1` (enabled) |
| `RERANK_IN_PROCESS` | Run the reranker in-process (faster; falls back to subprocess) | `1` (enabled) |
| `PARALLEL_DENSE_QUERIES` | Enable parallel dense query execution | `1` (enabled) |
| `PARALLEL_DENSE_THRESHOLD` | Min queries to trigger parallelization | `4` |
| `HYBRID_SYMBOL_BOOST` | Score boost for exact symbol matches | `0.35` |
| `HYBRID_RECENCY_WEIGHT` | Weight for recently modified files | `0.1` |
| `HYBRID_PER_PATH` | Max results per file path | `2` |
| `HYBRID_SNIPPET_DISK_READ` | Allow snippet scoring to read file contents | `1` (enabled) |
| `PRF_ENABLED` | Enable pseudo-relevance feedback (refined second pass) | `1` (enabled) |
| `RERANK_EXPAND` | Expand candidates before reranking | `1` (enabled) |
| `REPO_SEARCH_DEFAULT_LIMIT` | Default result limit for `repo_search` | `10` |

Note: `REPO_AUTO_FILTER=0` disables automatic repo scoping, which is useful for benchmarks or cross-repo searches.

## Caching

| Name | Description | Default |
|------|-------------|---------|
| `MAX_EMBED_CACHE` | Max cached embeddings | `16384` |
| `HYBRID_RESULTS_CACHE` | Max cached search results | `128` |
| `HYBRID_RESULTS_CACHE_ENABLED` | Enable search result caching | `1` (enabled) |

## Semantic Expansion

Synonym/related-term expansion for improved recall on natural-language queries.

| Name | Description | Default |
|------|-------------|---------|
| `SEMANTIC_EXPANSION_ENABLED` | Enable semantic term expansion | `1` (enabled) |
| `SEMANTIC_EXPANSION_TOP_K` | Number of similar terms to consider | `5` |
| `SEMANTIC_EXPANSION_SIMILARITY_THRESHOLD` | Min similarity for expansion terms | `0.7` |
| `SEMANTIC_EXPANSION_MAX_TERMS` | Max expansion terms added per query | `3` |
| `SEMANTIC_EXPANSION_CACHE_SIZE` | Cache size for expansion lookups | `1000` |
| `SEMANTIC_EXPANSION_CACHE_TTL` | Cache TTL in seconds | `3600` |

## LLM Query Expansion

Query expansion uses the decoder infrastructure (set via `REFRAG_RUNTIME`):

- `openai`: Uses the OpenAI API when `REFRAG_RUNTIME=openai` and `OPENAI_API_KEY` is set
- `glm`: Uses the GLM API when `REFRAG_RUNTIME=glm` and `GLM_API_KEY` is set
- `minimax`: Uses the MiniMax API when `REFRAG_RUNTIME=minimax` and `MINIMAX_API_KEY` is set
- `llamacpp`: Uses local llama.cpp when `REFRAG_RUNTIME=llamacpp` and `REFRAG_DECODER=1`

Set `LLM_EXPAND_MAX=4` to enable LLM-assisted query expansion (generates up to 4 alternate phrasings). `EXPAND_MAX_TOKENS` controls the response-length budget for the LLM call.
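
Putting those together, for example (the API key is a placeholder):

```bash
# LLM-assisted query expansion via the OpenAI backend
REFRAG_RUNTIME=openai
OPENAI_API_KEY=sk-placeholder
LLM_EXPAND_MAX=4        # up to 4 alternate phrasings per query
EXPAND_MAX_TOKENS=512   # response budget for the expansion call
```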

## Filename Boost

The search engine can boost files whose paths match query terms, using tokenization that understands common naming conventions and abbreviations.

| Name | Description | Default |
|------|-------------|---------|
| `FNAME_BOOST` | Base score boost factor for path/query token matches | `0.15` |

Naming convention support:

- snake_case, camelCase, PascalCase, kebab-case, SCREAMING_CASE
- Acronyms: `XMLParser` → `xml`, `parser`; `HTTPClient` → `http`, `client`
- Prefixes stripped: `IUserService` → `user`, `service`; `_private` → `private`
- Dot notation: `com.company.auth` → `com`, `company`, `auth`

Abbreviation normalization:

- `auth` ↔ `authenticate`/`authentication`
- `config` ↔ `configuration`/`cfg`/`conf`
- `repo` ↔ `repository`, `util` ↔ `utility`, `impl` ↔ `implementation`, etc.

Scoring tiers:

- Exact token match: 1.0 × factor
- Normalized match (abbreviation/plural): 0.8 × factor
- Substring containment: 0.4 × factor
- Filename bonus: 1.5× multiplier for filename (vs directory) matches
- Common token penalty: 0.5× for tokens like "utils", "index", "main"

Example: the query "authenticate user handler" matching `auth/UserAuthHandler.ts`:

- "user" exact match in filename (1.0 × 1.5 = 1.5)
- "authenticate" → "auth" normalized (0.8 × 1.5 = 1.2)
- "handler" exact match in filename (1.0 × 1.5 = 1.5)
- Total: 4.2 × 0.15 = 0.63 boost

Set `FNAME_BOOST=0` to disable, or increase it (e.g., `0.25`) for stronger path weighting.

## info_request Tool

Simplified codebase retrieval with an optional explanation mode.

| Name | Description | Default |
|------|-------------|---------|
| `INFO_REQUEST_LIMIT` | Default result limit for `info_request` queries | `10` |
| `INFO_REQUEST_CONTEXT_LINES` | Context lines in snippets (richer than `repo_search`) | `5` |

## Output Formatting

### TOON (Token-Oriented Object Notation)

A compact output format that reduces token usage by 40-60%.

| Name | Description | Default |
|------|-------------|---------|
| `TOON_ENABLED` | Enable TOON format by default for all search output | `0` (disabled) |

Set `output_format="toon"` per call, or enable it globally via `TOON_ENABLED=1`.

## Memory Blending

| Name | Description | Default |
|------|-------------|---------|
| `MEMORY_SSE_ENABLED` | Enable SSE memory blending | `false` |
| `MEMORY_MCP_URL` | Memory MCP endpoint for blending | `http://mcp:8000/sse` |
| `MEMORY_MCP_TIMEOUT` | Timeout for memory queries | `6` |
| `MEMORY_AUTODETECT` | Auto-detect the memory collection | `1` |
| `MEMORY_COLLECTION_TTL_SECS` | Cache TTL for collection detection | `300` |
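
To try blending, something along these lines (the endpoint matches the table's default):

```bash
# Enable SSE memory blending against the default memory MCP endpoint
MEMORY_SSE_ENABLED=true
MEMORY_MCP_URL=http://mcp:8000/sse
MEMORY_AUTODETECT=1
```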

## Exclusions (.qdrantignore)

The indexer supports a `.qdrantignore` file at the repo root (similar to `.gitignore`).

Default exclusions (overridable):

- `/models`, `/node_modules`, `/dist`, `/build`
- `/.venv`, `/venv`, `/__pycache__`, `/.git`
- `*.onnx`, `*.bin`, `*.safetensors`, `tokenizer.json`, `*.whl`, `*.tar.gz`
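
A sample `.qdrantignore` (hypothetical contents, written in the `.gitignore`-like pattern style described above):

```bash
# .qdrantignore -- illustrative example
/third_party
/generated
*.min.js
*.lock
```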

Override via env or flags:

```bash
# Disable defaults
QDRANT_DEFAULT_EXCLUDES=0

# Custom ignore file
QDRANT_IGNORE_FILE=.myignore

# Additional excludes
QDRANT_EXCLUDES='tokenizer.json,*.onnx,/third_party'
```

CLI examples:

```bash
docker compose run --rm indexer --root /work --ignore-file .qdrantignore
docker compose run --rm indexer --root /work --no-default-excludes --exclude '/vendor' --exclude '*.bin'
```

## Scaling Recommendations

| Repo Size | Chunk Lines | Overlap | Batch Size |
|-----------|-------------|---------|------------|
| Small (<100 files) | 80-120 | 16-24 | 32-64 |
| Medium (100s-1k files) | 120-160 | ~20 | 64-128 |
| Large (1k+ files) | 120 (default) | 20 | 128+ |

For large monorepos, set `INDEX_PROGRESS_EVERY=200` for progress visibility.
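
For instance, a large monorepo might use values drawn from the table above:

```bash
# Large-repo tuning, following the scaling table
INDEX_CHUNK_LINES=120
INDEX_CHUNK_OVERLAP=20
INDEX_BATCH_SIZE=128
INDEX_PROGRESS_EVERY=200   # log progress every 200 files
```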