Configuration Reference
Complete environment variable reference for Context Engine.
On this page:
- Core Settings
- Embedding Models
- Indexing & Micro-Chunks
- Query Optimization
- Watcher Settings
- Reranker
- Learning Reranker
- Decoder (llama.cpp / OpenAI / GLM / MiniMax)
- Git History & Commit Indexing
- ReFRAG
- Pattern Search
- Lexical Vector Settings
- Ports
- Search & Expansion
- info_request Tool
- Output Formatting
- Memory Blending
- Exclusions (.qdrantignore)
- Scaling Recommendations
Core Settings
| Name | Description | Default |
|---|---|---|
| COLLECTION_NAME | Qdrant collection name (unified across all repos) | codebase |
| REPO_NAME | Logical repo tag stored in payload for filtering | auto-detect from git/folder |
| HOST_INDEX_PATH | Host path mounted at /work in containers | current repo (.) |
| QDRANT_URL | Qdrant base URL | container: http://qdrant:6333; local: http://localhost:6333 |
| MULTI_REPO_MODE | Enable multi-repo collections (each subdir gets own collection) | 0 (disabled) |
| LOG_LEVEL | Logging verbosity: DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO |
| CTXCE_AUTH_ENABLED | Enable API authentication (requires token header) | 0 (disabled) |
| CTXCE_AUTH_ADMIN_TOKEN | Admin token for authenticated requests | unset |
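For example, a minimal .env for a single shared collection with authentication turned on might look like this (all values are illustrative):

```
COLLECTION_NAME=codebase
REPO_NAME=my-service              # omit to auto-detect from git/folder
QDRANT_URL=http://localhost:6333  # local Qdrant
LOG_LEVEL=INFO
CTXCE_AUTH_ENABLED=1
CTXCE_AUTH_ADMIN_TOKEN=change-me  # placeholder token
```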
Tool Description Customization
Override default MCP tool descriptions (useful for agent tuning).
| Name | Description | Default |
|---|---|---|
| TOOL_STORE_DESCRIPTION | Custom description for memory_store tool | built-in |
| TOOL_FIND_DESCRIPTION | Custom description for memory_find tool | built-in |
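For instance, to nudge an agent toward consulting memory before acting, the descriptions could be overridden like this (the wording is purely illustrative):

```
TOOL_STORE_DESCRIPTION="Persist durable project decisions and conventions."
TOOL_FIND_DESCRIPTION="Look up prior decisions before proposing changes."
```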
Embedding Models
Context Engine supports multiple embedding models via the EMBEDDING_MODEL and EMBEDDING_PROVIDER settings.
Default (BGE-base)
The default configuration uses BAAI/bge-base-en-v1.5 via fastembed:
| Name | Description | Default |
|---|---|---|
| EMBEDDING_MODEL | Model name for dense embeddings | BAAI/bge-base-en-v1.5 |
| EMBEDDING_PROVIDER | Backend provider | fastembed |
| EMBEDDING_SEED | Seed for deterministic embeddings (used in benchmarks) | unset |
Qwen3-Embedding (Experimental)
Qwen3-Embedding-0.6B offers improved semantic understanding with instruction-aware encoding. Enable via feature flag:
| Name | Description | Default |
|---|---|---|
| QWEN3_EMBEDDING_ENABLED | Enable Qwen3 embedding support | 0 (disabled) |
| QWEN3_QUERY_INSTRUCTION | Add instruction prefix to search queries | 1 (enabled when Qwen3 active) |
| QWEN3_INSTRUCTION_TEXT | Custom instruction prefix | Instruct: Given a code search query, retrieve relevant code snippets\nQuery: |
Setup:
```
# In .env
QWEN3_EMBEDDING_ENABLED=1
EMBEDDING_MODEL=electroglyph/Qwen3-Embedding-0.6B-onnx-uint8
QWEN3_QUERY_INSTRUCTION=1
# Optional: customize instruction
# QWEN3_INSTRUCTION_TEXT=Instruct: Find code implementing this feature\nQuery:
```
Important: Switching embedding models requires a full reindex:
```
make reset-dev-dual  # Recreates collection and reindexes
```
Dimension comparison:
| Model | Dimensions | Notes |
|---|---|---|
| BGE-base-en-v1.5 | 768 | Default, well-tested |
| Qwen3-Embedding-0.6B | 1024 | Instruction-aware, experimental |
Indexing & Micro-Chunks
| Name | Description | Default |
|---|---|---|
| INDEX_MICRO_CHUNKS | Enable token-based micro-chunking | 0 (off) |
| MAX_MICRO_CHUNKS_PER_FILE | Cap micro-chunks per file | 200 |
| TOKENIZER_URL | HF tokenizer.json URL (for Make download) | n/a |
| TOKENIZER_PATH | Local path where tokenizer is saved (Make) | models/tokenizer.json |
| TOKENIZER_JSON | Runtime path for tokenizer (indexer) | models/tokenizer.json |
| USE_TREE_SITTER | Enable tree-sitter parsing (py/js/ts) | 1 (on) |
| INDEX_USE_ENHANCED_AST | Enable advanced AST-based semantic chunking | 1 (on) |
| INDEX_SEMANTIC_CHUNKS | Enable semantic chunking (preserve function/class boundaries) | 1 (on) |
| INDEX_CHUNK_LINES | Lines per chunk (non-micro mode) | 120 |
| INDEX_CHUNK_OVERLAP | Overlap lines between chunks | 20 |
| INDEX_BATCH_SIZE | Upsert batch size | 64 |
| INDEX_PROGRESS_EVERY | Log progress every N files | 200 |
| SMART_SYMBOL_REINDEXING | Reuse embeddings when only symbols change | 1 (enabled) |
| MAX_CHANGED_SYMBOLS_RATIO | Threshold for full reindex vs smart update | 0.6 |
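As a sketch, enabling micro-chunking with the default tokenizer paths (assumes tokenizer.json was already downloaded via Make):

```
INDEX_MICRO_CHUNKS=1
TOKENIZER_JSON=models/tokenizer.json  # runtime tokenizer path
MAX_MICRO_CHUNKS_PER_FILE=200         # cap per file
```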
Query Optimization
Dynamic HNSW_EF tuning and intelligent query routing for 2x faster simple queries.
| Name | Description | Default |
|---|---|---|
| QUERY_OPTIMIZER_ADAPTIVE | Enable adaptive EF optimization | 1 (on) |
| QUERY_OPTIMIZER_MIN_EF | Minimum EF value | 64 |
| QUERY_OPTIMIZER_MAX_EF | Maximum EF value | 512 |
| QUERY_OPTIMIZER_SIMPLE_THRESHOLD | Complexity threshold for simple queries | 0.3 |
| QUERY_OPTIMIZER_COMPLEX_THRESHOLD | Complexity threshold for complex queries | 0.7 |
| QUERY_OPTIMIZER_SIMPLE_FACTOR | EF multiplier for simple queries | 0.5 |
| QUERY_OPTIMIZER_SEMANTIC_FACTOR | EF multiplier for semantic queries | 1.0 |
| QUERY_OPTIMIZER_COMPLEX_FACTOR | EF multiplier for complex queries | 2.0 |
| QUERY_OPTIMIZER_DENSE_THRESHOLD | Complexity threshold for dense-only routing | 0.2 |
| QUERY_OPTIMIZER_COLLECTION_SIZE | Approximate collection size for scaling | 10000 |
| QDRANT_EF_SEARCH | Base HNSW_EF value (overridden by optimizer) | 128 |
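The optimizer presumably scales the QDRANT_EF_SEARCH base by the factor matching the query's complexity band and clamps the result to [MIN_EF, MAX_EF]; treating it that way, a larger collection might be tuned like this (values are illustrative):

```
QDRANT_EF_SEARCH=256                    # base EF before scaling
QUERY_OPTIMIZER_MIN_EF=128
QUERY_OPTIMIZER_MAX_EF=1024
QUERY_OPTIMIZER_COLLECTION_SIZE=500000  # approximate point count
```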
Watcher Settings
| Name | Description | Default |
|---|---|---|
| WATCH_DEBOUNCE_SECS | Debounce between FS events | 1.5 |
| INDEX_UPSERT_BATCH | Upsert batch size (watcher) | 128 |
| INDEX_UPSERT_RETRIES | Retry count | 5 |
| INDEX_UPSERT_BACKOFF | Seconds between retries | 0.5 |
| QDRANT_TIMEOUT | HTTP timeout seconds | watcher: 60; search: 20 |
| MCP_TOOL_TIMEOUT_SECS | Max duration for long-running MCP tools | 3600 |
Reranker
Cross-encoder reranking improves search quality by scoring query-document pairs directly. Context Engine supports two configuration methods:
FastEmbed Model (Recommended)
Set RERANKER_MODEL to use FastEmbed's auto-downloading cross-encoder models:
| Name | Description | Default |
|---|---|---|
| RERANKER_MODEL | FastEmbed reranker model name | unset |
| RERANKER_ENABLED | Enable reranker by default | 1 (enabled) |
Popular models:
- jinaai/jina-reranker-v2-base-multilingual: multilingual, good quality
- BAAI/bge-reranker-base: English-focused, fast
- Xenova/ms-marco-MiniLM-L-6-v2: lightweight, fast inference
Example:
```
RERANKER_MODEL=jinaai/jina-reranker-v2-base-multilingual
RERANKER_ENABLED=1
```
Manual ONNX Paths (Legacy)
For custom models or explicit control, set both ONNX path and tokenizer:
| Name | Description | Default |
|---|---|---|
| RERANKER_ONNX_PATH | Local ONNX cross-encoder model path | unset |
| RERANKER_TOKENIZER_PATH | Tokenizer path for reranker | unset |
| RERANKER_ENABLED | Enable reranker by default | 1 (enabled) |
Note: If both RERANKER_MODEL and RERANKER_ONNX_PATH are set, RERANKER_MODEL takes priority.
Reranker Tuning
| Name | Description | Default |
|---|---|---|
| RERANKER_TOPN | Candidates to retrieve before reranking | 50 |
| RERANKER_RETURN_M | Final results after reranking | 12 |
| RERANKER_TIMEOUT_MS | Rerank timeout in milliseconds | 2000 |
| RERANK_BLEND_WEIGHT | Ratio of rerank vs fusion score (0.0-1.0) | 0.6 |
| RERANK_TIMEOUT_FLOOR_MS | Min timeout to avoid cold-start failures | 1000 |
| POST_RERANK_SYMBOL_BOOST | Score boost for exact symbol matches after rerank | 1.0 |
| EMBEDDING_WARMUP | Warm up embedding model on startup | 0 (disabled) |
| RERANK_WARMUP | Warm up reranker model on startup | 0 (disabled) |
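For example, to push a wider candidate pool through the cross-encoder and lean more on its score (assuming RERANK_BLEND_WEIGHT blends linearly, i.e. 0.7 rerank / 0.3 fusion):

```
RERANKER_TOPN=100        # candidates into the reranker
RERANKER_RETURN_M=16     # final results returned
RERANK_BLEND_WEIGHT=0.7  # favor rerank score over fusion score
RERANK_WARMUP=1          # warm up at startup to avoid cold-start timeouts
```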
Learning Reranker
The learning reranker trains a lightweight neural network (TinyScorer) to improve search rankings over time. See Architecture for details.
This feature is optional and enabled by default. To disable:
```
# Disable learning scorer in search results
RERANK_LEARNING=0
# Disable event logging (no training data collected)
RERANK_EVENTS_ENABLED=0
# Or simply don't run the learning_worker container
```
Enable/Disable
| Name | Description | Default |
|---|---|---|
| RERANK_LEARNING | Enable learning scorer in search results | 1 (enabled) |
| RERANK_EVENTS_ENABLED | Enable event logging for training | 1 (enabled) |
| RERANK_EVENTS_SAMPLE_RATE | Fraction of events to log (0.0-1.0) | 0.33 |
Weight Management
| Name | Description | Default |
|---|---|---|
| RERANKER_WEIGHTS_DIR | Directory for learned weight files | /tmp/rerank_weights |
| RERANKER_WEIGHTS_RELOAD_INTERVAL | How often to check for new weights (seconds) | 60 |
| RERANKER_MAX_CHECKPOINTS | Number of weight versions to retain | 5 |
Learning Rate
| Name | Description | Default |
|---|---|---|
| RERANKER_LR_DECAY_STEPS | Updates between learning rate decay | 1000 |
| RERANKER_LR_DECAY_RATE | Decay multiplier (e.g., 0.95 = 5% reduction) | 0.95 |
| RERANKER_MIN_LR | Minimum learning rate floor | 0.0001 |
Event Logging
| Name | Description | Default |
|---|---|---|
| RERANK_EVENTS_DIR | Directory for search event logs | /tmp/rerank_events |
| RERANK_EVENTS_RETENTION_DAYS | Days to keep event files before cleanup | 7 |
Learning Worker
| Name | Description | Default |
|---|---|---|
| RERANK_LEARNING_BATCH_SIZE | Number of events per training batch | 32 |
| RERANK_LEARNING_POLL_INTERVAL | Seconds between checking for new events | 30 |
| RERANK_LEARNING_RATE | Initial learning rate for TinyScorer | 0.001 |
| RERANK_LLM_TEACHER | Enable LLM-teacher guided learning | 1 (enabled) |
| RERANK_LLM_SAMPLE_RATE | Fraction of queries to evaluate with LLM teacher | 1.0 |
| RERANK_VICREG_WEIGHT | Weight for VICReg consistency loss | 0.1 |
Decoder (llama.cpp / OpenAI / GLM / MiniMax)
| Name | Description | Default |
|---|---|---|
| REFRAG_DECODER | Enable decoder for context_answer (required for llamacpp) | 1 (enabled) |
| REFRAG_RUNTIME | Decoder backend: llamacpp, openai, glm, or minimax | llamacpp |
| LLAMACPP_URL | llama.cpp server endpoint | http://llamacpp:8080 or http://host.docker.internal:8081 |
| LLAMACPP_TIMEOUT_SEC | Decoder request timeout | 300 |
| DECODER_MAX_TOKENS | Max tokens for decoder responses | 4000 |
| REFRAG_DECODER_MODE | prompt or soft (soft requires patched llama.cpp) | prompt |
| OPENAI_API_KEY | API key for OpenAI provider | unset |
| OPENAI_MODEL | OpenAI model name | gpt-4.1-mini |
| OPENAI_API_BASE | OpenAI API base URL (supports Azure/compatible endpoints) | https://api.openai.com/v1 |
| GLM_API_KEY | API key for GLM provider | unset |
| GLM_MODEL | GLM model name (used for context_answer) | glm-4.6 |
| GLM_MODEL_FAST | GLM model for expand_query/simple tasks (higher concurrency) | glm-4.5 |
| GLM_TIMEOUT_SEC | GLM request timeout in seconds | unset |
| PSEUDO_BATCH_CONCURRENCY | Parallel API calls for pseudo-tag indexing (1=sequential, 4=4x speedup) | 1 |
| MINIMAX_API_KEY | API key for MiniMax M2 provider | unset |
| MINIMAX_MODEL | MiniMax model name | MiniMax-M2 |
| MINIMAX_API_BASE | MiniMax API base URL | https://api.minimax.io/v1 |
| MINIMAX_TIMEOUT_SEC | MiniMax request timeout in seconds | unset |
| USE_GPU_DECODER | Native Metal decoder (1) vs Docker (0) | 0 (docker) |
| LLAMACPP_GPU_LAYERS | Number of layers to offload to GPU, -1 for all | 32 |
Runtime Selection
Set REFRAG_RUNTIME explicitly to choose a decoder backend:
- llamacpp: Local llama.cpp server (requires REFRAG_DECODER=1)
- openai: OpenAI API (GPT-4.1, GPT-4.1-mini, o1, etc.)
- glm: ZhipuAI GLM models (GLM-4.5, GLM-4.6, GLM-4.7)
- minimax: MiniMax M2 API
No auto-detection is performed to avoid surprise API calls. If REFRAG_RUNTIME is unset, it defaults to llamacpp.
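For example, selecting the GLM backend (the key value is a placeholder):

```
REFRAG_RUNTIME=glm
GLM_API_KEY=sk-...      # placeholder
GLM_MODEL=glm-4.6       # used for context_answer
GLM_MODEL_FAST=glm-4.5  # used for expand_query and simple tasks
```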
Git History & Commit Indexing
Settings for indexing git commit history and enabling commit-aware search.
| Name | Description | Default |
|---|---|---|
| REFRAG_COMMIT_DESCRIBE | Enable commit lineage goals for indexing | 1 (enabled) |
| COMMIT_VECTOR_SEARCH | Enable vector search over commit messages | 0 (disabled) |
| REMOTE_UPLOAD_GIT_MAX_COMMITS | Max commits per upload bundle (0 = no git history) | 500 |
| GIT_HISTORY_PRUNE | Prune old git_message points on manifest ingest | 1 (enabled) |
| GIT_HISTORY_DELETE_MANIFEST | Delete manifest files after successful ingest | 1 (enabled) |
| GIT_HISTORY_MANIFEST_MAX_FILES | Cap manifest files per .remote-git dir (0 = unlimited) | 50 |
Note: Git history indexing stores commit messages and metadata as searchable points. Use the search_commits_for MCP tool to query them.
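A sketch of enabling commit-aware search on top of the defaults:

```
COMMIT_VECTOR_SEARCH=1             # vector search over commit messages
REMOTE_UPLOAD_GIT_MAX_COMMITS=500  # commits per upload bundle
```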
ReFRAG (Micro-Chunking & Retrieval)
| Name | Description | Default |
|---|---|---|
| REFRAG_MODE | Enable micro-chunking and span budgeting | 1 (enabled) |
| REFRAG_GATE_FIRST | Enable mini-vector gating | 1 (enabled) |
| REFRAG_CANDIDATES | Candidates for gate-first filtering | 200 |
| REFRAG_PSEUDO_DESCRIBE | Enable LLM-based pseudo/tags generation during indexing | 0 (disabled) |
| MICRO_BUDGET_TOKENS | Token budget for context_answer | 5000 (GLM: 6000-8192) |
| MICRO_OUT_MAX_SPANS | Max spans returned per query | 8 (GLM: 24) |
| MICRO_CHUNK_TOKENS | Tokens per micro-chunk window | 16 |
| MICRO_CHUNK_STRIDE | Stride between windows | 8 |
| MICRO_MERGE_LINES | Lines to merge adjacent spans | 4 |
| MICRO_TOKENS_PER_LINE | Estimated tokens per line | 32 |
LLM-Based Pseudo/Tags (REFRAG_PSEUDO_DESCRIBE):
When enabled, the indexer uses the configured decoder (via REFRAG_RUNTIME) to generate semantic descriptions and tags for each code chunk. This enriches the lexical vectors with natural language terms, improving NL→code retrieval.
```
# Enable LLM pseudo/tags generation (requires decoder configured)
REFRAG_PSEUDO_DESCRIBE=1
REFRAG_RUNTIME=glm  # or openai, minimax, llamacpp
```
Note: This significantly increases indexing time and API costs. Best used with batch concurrency (PSEUDO_BATCH_CONCURRENCY=4).
Pseudo Backfill Worker
Deferred pseudo/tag generation runs asynchronously after initial indexing.
| Name | Description | Default |
|---|---|---|
| PSEUDO_BACKFILL_ENABLED | Enable async pseudo/tag backfill worker | 0 (disabled) |
| PSEUDO_DEFER_TO_WORKER | Skip inline pseudo, defer to backfill worker | 0 (disabled) |
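For example, to keep initial indexing fast and fill in pseudo/tags asynchronously (a sketch; it assumes these flags compose with REFRAG_PSEUDO_DESCRIBE as described above):

```
REFRAG_PSEUDO_DESCRIBE=1   # pseudo/tags generation on
PSEUDO_DEFER_TO_WORKER=1   # skip inline generation during indexing
PSEUDO_BACKFILL_ENABLED=1  # backfill worker generates them later
```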
Adaptive Span Sizing
Expand search hits to full symbol boundaries for better context.
| Name | Description | Default |
|---|---|---|
| ADAPTIVE_SPAN_SIZING | Expand hits to encompassing symbol boundaries | 1 (enabled) |
| DEBUG_ADAPTIVE_SPAN | Enable debug logging for span expansion | 0 (disabled) |
Context Answer Shaping
Controls output formatting for context_answer responses.
| Name | Description | Default |
|---|---|---|
| CTX_SUMMARY_CHARS | Max chars for summary (0 = disabled) | 0 |
| CTX_SNIPPET_CHARS | Max chars per code snippet | 400 |
| DEBUG_CONTEXT_ANSWER | Enable debug logging for context_answer | 0 (disabled) |
Mini Vector Gating
Compact 64-dim vectors for fast candidate filtering before full dense search.
| Name | Description | Default |
|---|---|---|
| MINI_VECTOR_NAME | Name of mini vector index | mini |
| MINI_VEC_DIM | Dimension of mini vectors | 64 |
| MINI_VEC_SEED | Random projection seed (for reproducibility) | 1337 |
| HYBRID_MINI_WEIGHT | Weight of mini vectors in hybrid scoring | 0.5 |
Pattern Search
Structural code pattern matching across languages. Disabled by default.
| Name | Description | Default |
|---|---|---|
| PATTERN_VECTORS | Enable pattern_search tool and pattern vector indexing | 0 (disabled) |
Enable:
```
# In .env or docker-compose
PATTERN_VECTORS=1
```
When enabled, the indexer extracts control-flow signatures (loops, branches, try/except, etc.) and stores them as pattern vectors. The pattern_search MCP tool allows finding structurally similar code across languages—e.g., a Python retry loop can match Go/Rust equivalents.
Note: Enabling requires reindexing to generate pattern vectors for existing files.
Lexical Vector Settings
Controls the sparse lexical (keyword) vectors used for hybrid search.
| Name | Description | Default |
|---|---|---|
| LEX_VECTOR_NAME | Name of lexical vector in Qdrant | lex |
| LEX_VECTOR_DIM | Dimension of lexical hash vector | 2048 |
| LEX_MULTI_HASH | Hash functions per token (more = better collision resistance) | 3 |
| LEX_BIGRAMS | Enable bigram hashing for phrase matching | 1 (enabled) |
| LEX_BIGRAM_WEIGHT | Weight for bigram entries relative to unigrams | 0.7 |
Sparse Vector Settings (Experimental)
True sparse vectors for lossless lexical matching (no hash collisions).
| Name | Description | Default |
|---|---|---|
| LEX_SPARSE_MODE | Enable sparse lexical vectors instead of dense hash vectors | 0 (off) |
| LEX_SPARSE_NAME | Name of sparse vector index in Qdrant | lex_sparse |
Note: Enabling LEX_SPARSE_MODE requires the collection to have a sparse vector index configured. Use --recreate flag when switching modes. If sparse query fails or returns empty, the system automatically falls back to dense lexical vectors.
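A sketch of switching to sparse mode; the placement of --recreate mirrors the indexer CLI examples at the end of this page and is an assumption:

```
# In .env
LEX_SPARSE_MODE=1
# Then recreate the collection (flag placement is an assumption):
# docker compose run --rm indexer --root /work --recreate
```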
Note: Changing LEX_VECTOR_DIM requires recreating collections (--recreate flag).
To use legacy settings (pre-v2):

```
LEX_VECTOR_DIM=4096
LEX_MULTI_HASH=1
LEX_BIGRAMS=0
```
Ports
| Name | Description | Default |
|---|---|---|
| FASTMCP_PORT | Memory MCP server port (SSE) | 8000 |
| FASTMCP_INDEXER_PORT | Indexer MCP server port (SSE) | 8001 |
| FASTMCP_HTTP_PORT | Memory RMCP host port mapping | 8002 |
| FASTMCP_INDEXER_HTTP_PORT | Indexer RMCP host port mapping | 8003 |
| FASTMCP_HEALTH_PORT | Health port (memory/indexer) | memory: 18000; indexer: 18001 |
Search & Expansion
| Name | Description | Default |
|---|---|---|
| HYBRID_EXPAND | Enable heuristic multi-query expansion | 0 (off) |
| LLM_EXPAND_MAX | Max number of alternate queries to generate via LLM (0 = disabled) | 0 |
| EXPAND_MAX_TOKENS | Max tokens for LLM query expansion response | 512 |
| REPO_AUTO_FILTER | Auto-detect and filter to current repo in searches | 1 (enabled) |
| HYBRID_IN_PROCESS | Run hybrid search in-process (faster, falls back to subprocess) | 1 (enabled) |
| RERANK_IN_PROCESS | Run reranker in-process (faster, falls back to subprocess) | 1 (enabled) |
| PARALLEL_DENSE_QUERIES | Enable parallel dense query execution | 1 (enabled) |
| PARALLEL_DENSE_THRESHOLD | Min queries to trigger parallelization | 4 |
| HYBRID_SYMBOL_BOOST | Score boost for exact symbol matches | 0.35 |
| HYBRID_RECENCY_WEIGHT | Weight for recently modified files | 0.1 |
| HYBRID_PER_PATH | Max results per file path | 2 |
| HYBRID_SNIPPET_DISK_READ | Allow snippet scoring to read file contents | 1 (enabled) |
| PRF_ENABLED | Enable Pseudo-Relevance Feedback (refined second pass) | 1 (enabled) |
| RERANK_EXPAND | Expand candidates before reranking | 1 (enabled) |
| REPO_SEARCH_DEFAULT_LIMIT | Default result limit for repo_search | 10 |
Note: REPO_AUTO_FILTER=0 disables automatic repo scoping, useful for benchmarks or cross-repo searches.
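For example, a cross-repo benchmark run might disable repo scoping and widen expansion (illustrative values):

```
REPO_AUTO_FILTER=0            # search across all indexed repos
HYBRID_EXPAND=1               # heuristic multi-query expansion
REPO_SEARCH_DEFAULT_LIMIT=20  # more results per query
```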
Caching
| Name | Description | Default |
|---|---|---|
| MAX_EMBED_CACHE | Max cached embeddings | 16384 |
| HYBRID_RESULTS_CACHE | Max cached search results | 128 |
| HYBRID_RESULTS_CACHE_ENABLED | Enable search result caching | 1 (enabled) |
Semantic Expansion
Synonym/related term expansion for improved recall on natural language queries.
| Name | Description | Default |
|---|---|---|
| SEMANTIC_EXPANSION_ENABLED | Enable semantic term expansion | 1 (enabled) |
| SEMANTIC_EXPANSION_TOP_K | Number of similar terms to consider | 5 |
| SEMANTIC_EXPANSION_SIMILARITY_THRESHOLD | Min similarity for expansion terms | 0.7 |
| SEMANTIC_EXPANSION_MAX_TERMS | Max expansion terms added per query | 3 |
| SEMANTIC_EXPANSION_CACHE_SIZE | Cache size for expansion lookups | 1000 |
| SEMANTIC_EXPANSION_CACHE_TTL | Cache TTL in seconds | 3600 |
LLM Query Expansion
Query expansion uses the decoder infrastructure (set via REFRAG_RUNTIME):
- openai: Uses OpenAI API when REFRAG_RUNTIME=openai and OPENAI_API_KEY is set
- glm: Uses GLM API when REFRAG_RUNTIME=glm and GLM_API_KEY is set
- minimax: Uses MiniMax API when REFRAG_RUNTIME=minimax and MINIMAX_API_KEY is set
- llamacpp: Uses local llama.cpp when REFRAG_RUNTIME=llamacpp and REFRAG_DECODER=1
Set LLM_EXPAND_MAX=4 to enable LLM-assisted query expansion (generates up to 4 alternate phrasings). EXPAND_MAX_TOKENS controls the response length budget for the LLM call.
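For example, OpenAI-backed expansion (the key is a placeholder):

```
REFRAG_RUNTIME=openai
OPENAI_API_KEY=sk-...  # placeholder
LLM_EXPAND_MAX=4       # up to 4 alternate phrasings
EXPAND_MAX_TOKENS=512  # response length budget
```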
Filename Boost
The search engine can boost files whose paths match query terms, using path tokenization designed for real-world naming conventions.
| Name | Description | Default |
|---|---|---|
| FNAME_BOOST | Base score boost factor for path/query token matches | 0.15 |
Naming convention support:
- snake_case, camelCase, PascalCase, kebab-case, SCREAMING_CASE
- Acronyms: XMLParser → xml, parser; HTTPClient → http, client
- Prefixes stripped: IUserService → user, service; _private → private
- Dot notation: com.company.auth → com, company, auth
Abbreviation normalization:
- auth ↔ authenticate/authentication
- config ↔ configuration/cfg/conf
- repo ↔ repository, util ↔ utility, impl ↔ implementation, etc.
Scoring tiers:
- Exact token match: 1.0 × factor
- Normalized match (abbreviation/plural): 0.8 × factor
- Substring containment: 0.4 × factor
- Filename bonus: 1.5× multiplier for filename vs directory matches
- Common token penalty: 0.5× for tokens like "utils", "index", "main"
Example: Query "authenticate user handler" matching auth/UserAuthHandler.ts:
- "user" exact match in filename (1.0 × 1.5 = 1.5)
- "authenticate" → "auth" normalized (0.8 × 1.5 = 1.2)
- "handler" exact match in filename (1.0 × 1.5 = 1.5)
- Total: 4.2 × 0.15 = 0.63 boost
Set FNAME_BOOST=0 to disable, or increase (e.g., 0.25) for stronger path weighting.
info_request Tool
Simplified codebase retrieval with optional explanation mode.
| Name | Description | Default |
|---|---|---|
| INFO_REQUEST_LIMIT | Default result limit for info_request queries | 10 |
| INFO_REQUEST_CONTEXT_LINES | Context lines in snippets (richer than repo_search) | 5 |
Output Formatting
TOON (Token-Oriented Object Notation)
Compact output format that reduces token usage by 40-60%.
| Name | Description | Default |
|---|---|---|
| TOON_ENABLED | Enable TOON format by default for all search output | 0 (disabled) |
Set output_format="toon" per-call, or enable globally via TOON_ENABLED=1.
Memory Blending
| Name | Description | Default |
|---|---|---|
| MEMORY_SSE_ENABLED | Enable SSE memory blending | false |
| MEMORY_MCP_URL | Memory MCP endpoint for blending | http://mcp:8000/sse |
| MEMORY_MCP_TIMEOUT | Timeout for memory queries | 6 |
| MEMORY_AUTODETECT | Auto-detect memory collection | 1 |
| MEMORY_COLLECTION_TTL_SECS | Cache TTL for collection detection | 300 |
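A sketch of enabling blending against the default memory MCP endpoint:

```
MEMORY_SSE_ENABLED=true
MEMORY_MCP_URL=http://mcp:8000/sse
MEMORY_MCP_TIMEOUT=6  # seconds
```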
Exclusions (.qdrantignore)
The indexer supports a .qdrantignore file at the repo root (similar to .gitignore).
Default exclusions (overridable):
/models, /node_modules, /dist, /build, /.venv, /venv, /__pycache__, /.git, *.onnx, *.bin, *.safetensors, tokenizer.json, *.whl, *.tar.gz
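An example .qdrantignore, assuming the same pattern syntax as .gitignore (entries are illustrative):

```
/vendor
/generated
*.min.js
*.lock
```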
Override via env or flags:
```
# Disable defaults
QDRANT_DEFAULT_EXCLUDES=0
# Custom ignore file
QDRANT_IGNORE_FILE=.myignore
# Additional excludes
QDRANT_EXCLUDES='tokenizer.json,*.onnx,/third_party'
```
CLI examples:
```
docker compose run --rm indexer --root /work --ignore-file .qdrantignore
docker compose run --rm indexer --root /work --no-default-excludes --exclude '/vendor' --exclude '*.bin'
```
Scaling Recommendations
| Repo Size | Chunk Lines | Overlap | Batch Size |
|---|---|---|---|
| Small (<100 files) | 80-120 | 16-24 | 32-64 |
| Medium (100s-1k files) | 120-160 | ~20 | 64-128 |
| Large (1k+ files) | 120 (default) | 20 | 128+ |
For large monorepos, set INDEX_PROGRESS_EVERY=200 for visibility.
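For example, large-monorepo settings per the table above:

```
INDEX_CHUNK_LINES=120
INDEX_CHUNK_OVERLAP=20
INDEX_BATCH_SIZE=128
INDEX_PROGRESS_EVERY=200
```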