Harden web search and docs defaults
@@ -14,6 +14,13 @@ shell code.
| `CONTEXT_KIT_DATA_DIR` | `$HOME/.local/share/context-kit` | Persistent docs indexes and model cache |
| `CONTEXT_KIT_COMPOSE_PROJECT` | `context-kit` | Docker Compose project and network prefix |
| `CONTEXT_KIT_SEARXNG_PORT` | `8099` | Localhost SearXNG port |
| `CONTEXT_KIT_WEB_SEARCH_MAX_BYTES` | `52428800` | Max bytes `context-web-search` accepts and downloads per fetch |
| `CONTEXT_KIT_WEB_SEARCH_PROVIDER` | `searxng` | Default `search_web` provider; fallback order depends on this provider |
| `CONTEXT_KIT_WEB_SEARCH_HTTP_TIMEOUT` | `15000` | HTTP timeout in milliseconds for search providers |
| `CONTEXT_KIT_WEB_SEARCH_MAX_RESULTS` | `10` | Default search result count when clients omit `limit` |
| `CONTEXT_KIT_WEB_SEARCH_CHROME_PATH` | `/usr/bin/chromium` | Chromium path inside the web-search image for Bing fallback |
| `CONTEXT_KIT_WEB_SEARCH_BROWSER_USER_AGENT` | bundled Chrome/Linux UA | User agent for the Chromium-backed Bing fallback |
| `CONTEXT_KIT_WEB_SEARCH_MCP_COMPAT_MODE` | unset | Set to `legacy` for MCP clients with weak tool-schema parsers |
| `CONTEXT_KIT_DOCS_PORT` | `8776` | Localhost port for the long-lived docs-mcp HTTP service |
| `CONTEXT_KIT_DOCS_HTTP_URL` | `http://127.0.0.1:${CONTEXT_KIT_DOCS_PORT}/mcp` | URL emitted into HTTP MCP install snippets |
| `CONTEXT_KIT_DOCS_ALLOW_ORIGIN` | unset | Optional exact browser CORS origin(s) for docs-mcp, separated by spaces |
@@ -22,6 +29,8 @@ shell code.
| `CONTEXT_KIT_DOCS_MAX_GET_BYTES` | `75000` | Max bytes returned by docs retrieval |
| `CONTEXT_KIT_DOCS_EMBED_MODEL` | `BAAI/bge-small-en-v1.5` | SentenceTransformers embedding model |
| `CONTEXT_KIT_DOCS_PREINDEX` | `0` | Set to `1` to re-embed every source on container start |
| `CONTEXT_KIT_DOCS_LOCAL_SOURCES_DIR` | `${CONTEXT_KIT_DATA_DIR}/local-sources` | Machine-local llms.txt tree mounted read-only into docs-mcp |
| `CONTEXT_KIT_DOCS_LOCAL_SOURCES_PORT` | `8769` | Loopback port inside docs-mcp for serving local source files |
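
Some defaults in the table are derived from other variables, for example `CONTEXT_KIT_DOCS_HTTP_URL` from `CONTEXT_KIT_DOCS_PORT`. As a minimal sketch of how such a default could be resolved in POSIX shell (the `resolve_docs_http_url` helper is hypothetical and not part of Context Kit; only the variable names and default values come from the table):

```sh
# Hypothetical helper: resolves the docs-mcp URL as the table describes,
# falling back to the documented defaults when a variable is unset or empty.
resolve_docs_http_url() {
  port="${CONTEXT_KIT_DOCS_PORT:-8776}"
  printf '%s\n' "${CONTEXT_KIT_DOCS_HTTP_URL:-http://127.0.0.1:${port}/mcp}"
}

resolve_docs_http_url
```

With neither variable set, this prints `http://127.0.0.1:8776/mcp`. Exporting `CONTEXT_KIT_DOCS_PORT=9000` first changes the port in the derived URL, while an explicit `CONTEXT_KIT_DOCS_HTTP_URL` takes precedence over both.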

## TTL Guidance

@@ -66,3 +75,8 @@ CONTEXT_KIT_DOCS_SOURCES="config/sources.default.txt config/sources.js.txt"
```

Each source file is plain text. Blank lines and `#` comments are ignored.
Entries may be absolute source-profile paths for private machine-local config.
For local llms.txt files, place content under
`CONTEXT_KIT_DOCS_LOCAL_SOURCES_DIR` and reference it as
`http://127.0.0.1:8769/path/inside/local-sources.txt`; that loopback URL is
inside the docs-mcp container, not exposed on the host.
@@ -33,6 +33,37 @@ Build default images:
bin/context-kit build
```

## Fetch URL Says Max Download Bytes Is Too Big

If `fetch_url` fails before making a network request with an MCP validation error
like `Number must be less than or equal to 26214400`, rebuild the web-search MCP
image:

```sh
bin/context-kit build
```

Context Kit patches the upstream `mcp-web-search` schema so the accepted
`max_download_bytes` value matches `CONTEXT_KIT_WEB_SEARCH_MAX_BYTES`, which
defaults to `52428800`.
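
The two byte counts are easier to compare in mebibytes. A small sketch of the arithmetic (the `bytes_to_mib` helper is illustrative only, not a Context Kit command):

```sh
# Convert a byte count to whole mebibytes (1 MiB = 1048576 bytes).
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Documented default, used when the variable is unset or empty.
limit="${CONTEXT_KIT_WEB_SEARCH_MAX_BYTES:-52428800}"
echo "configured limit: $(bytes_to_mib "$limit") MiB"
echo "limit in the validation error: $(bytes_to_mib 26214400) MiB"
```

With the default, the configured limit is 50 MiB, while the limit quoted in the validation error is 25 MiB. That mismatch suggests the running image still carries the unpatched schema, which is why a rebuild is the fix.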

## Search Fallback and Chromium

`search_web` defaults to SearXNG. If SearXNG fails or returns no results, the
upstream fallback order is DuckDuckGo, then Bing. Bing uses Chromium through
Puppeteer, so `bin/context-kit doctor` checks that the configured Chromium path
exists inside the web-search image.
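
The fallback order can be pictured as a loop that moves to the next provider whenever one fails or returns nothing. The following is only a sketch of that idea: the `search_*` functions are hypothetical stand-ins, and the real logic lives in upstream `mcp-web-search`, not in shell.

```sh
# Hypothetical stand-ins for the real providers; each prints results on success.
search_searxng()    { return 1; }              # simulate a failing provider
search_duckduckgo() { echo "duckduckgo: $1"; }
search_bing()       { echo "bing: $1"; }

# Try each provider in order; skip any that fails or prints nothing.
search_with_fallback() {
  for provider in searxng duckduckgo bing; do
    results=$("search_${provider}" "$1") || continue
    [ -n "$results" ] || continue
    printf '%s\n' "$results"
    return 0
  done
  return 1
}

search_with_fallback "context kit"
```

Because the simulated SearXNG call fails, the query falls through to the DuckDuckGo stand-in. Bing is reached only when both earlier providers fail or return nothing.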

Context Kit carries a source-controlled Bing provider override in
`docker/web-search/overrides/bing.js` because the upstream 1.3.0 provider can
race result rendering and return no items even when Chromium sees Bing result
cards. The override waits for result cards and decodes current Bing redirect
URLs before handing results back to the upstream fallback registry.

`fetch_url` is different: in upstream `mcp-web-search` 1.3.0, `engine=browser` is
accepted but reserved for future support. It does not currently invoke Chromium;
URL fetching uses the HTTP extractor path.

## Docs Indexing Is Slow

The first run downloads an embedding model and embeds every configured docs