
MCP Server on Cloudflare Workers

I build MCP (Model Context Protocol) servers on Cloudflare Workers: stateless, served from Cloudflare's edge locations close to users, with no infrastructure to manage. The live example is vectorize-mcp-worker: hybrid search that combines BM25 keyword ranking with vector retrieval, cross-encoder reranking, and a Gemma 4 reflection layer, all running inside a single Worker with 100,302 indexed documents at a total cost of $5/month.

Key Takeaways
  • MCP (Model Context Protocol) servers handle tool-use requests from AI agents like Claude, and Cloudflare Workers suits them well: request handling is stateless, and bindings to storage and inference are built in
  • BM25 keyword search + vector search combined gives better recall than either method alone
  • Gemma 4 MoE (Mixture of Experts) runs on Workers AI — no external API call, no data leaving the edge
  • D1 (Cloudflare's relational SQL database) and KV (Key-Value storage) store structured data alongside the vector index in the same Worker
  • Live production example with 100k+ documents runs on Cloudflare's $5/month paid plan

100k+: documents in production index
$5/mo: total running cost
1 Worker: no external hops
Edge: deployed globally via Cloudflare

MCP servers on Workers

Stateless MCP (Model Context Protocol) servers deployed to Cloudflare Workers. There is no server to provision, no cold-start delay beyond the Workers runtime's own, and global distribution by default. The Worker binds directly to Vectorize, D1 (SQL database), KV (key-value storage), and Workers AI, with no extra API hops.
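To illustrate why a stateless Worker fits MCP: each tool request arrives as a self-contained JSON-RPC 2.0 message, so the handler can be a pure function of that message. The sketch below is not taken from vectorize-mcp-worker. The "search" tool and its schema are hypothetical, and a real Worker would wrap handleRpc in a fetch handler and call its bindings where the comment indicates.

```typescript
// Minimal JSON-RPC 2.0 dispatch for two MCP methods: tools/list and tools/call.
// The "search" tool is a hypothetical placeholder, not the real server's tool set.

interface RpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

const TOOLS = [
  {
    name: "search",
    description: "Hybrid keyword + vector search (illustrative only)",
    inputSchema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
];

function handleRpc(req: RpcRequest): RpcResponse {
  const base = { jsonrpc: "2.0" as const, id: req.id };
  switch (req.method) {
    case "tools/list":
      return { ...base, result: { tools: TOOLS } };
    case "tools/call": {
      if (req.params?.name !== "search") {
        // -32602 is the JSON-RPC "invalid params" code
        return { ...base, error: { code: -32602, message: "Unknown tool" } };
      }
      const query = String(req.params?.arguments?.query ?? "");
      // A real handler would query Vectorize / D1 here via Worker bindings.
      return {
        ...base,
        result: { content: [{ type: "text", text: `results for: ${query}` }] },
      };
    }
    default:
      // -32601 is the JSON-RPC "method not found" code
      return { ...base, error: { code: -32601, message: "Method not found" } };
  }
}
```

Because handleRpc keeps no state between calls, any Worker instance in any location can answer any request.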

Hybrid vector search

BM25 (a keyword ranking function) combined with dense vector retrieval via Cloudflare Vectorize. Cross-encoder reranking re-scores the fused results. Better recall than pure vector search, especially on short or ambiguous queries.
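One common way to fuse a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The page does not say which fusion method the Worker uses, so treat RRF here as an assumption chosen for illustration. The document IDs are made up, and a cross-encoder would then re-score the fused list.

```typescript
// Reciprocal rank fusion (RRF): merge several ranked ID lists into one.
// RRF is an assumed fusion method; the source does not name the one it uses.

function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // damping constant commonly used with RRF
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so the rank is index + 1
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical document IDs: "b" ranks near the top of both lists.
const bm25Ranking = ["a", "b", "c"];
const vectorRanking = ["b", "d", "a"];
const fused = reciprocalRankFusion([bm25Ranking, vectorRanking]);
// fused order: b, a, d, c
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on different scales.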

Knowledge reflection

After retrieval, a synthesis layer (Gemma 4 MoE via Workers AI) reads related chunks and generates a 3-sentence cross-document insight — stored back in the index. Connections across documents you didn't consciously notice surface on the next search.
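A reflection step along these lines could be sketched as follows. The prompt wording, the Chunk shape, and the AiLike interface are assumptions for illustration. AiLike only mirrors the general run(model, input) call shape of the Workers AI binding, and the model name is passed in rather than guessed.

```typescript
// Sketch of a reflection step: build a prompt from related chunks, then ask a
// model for a short cross-document insight. Prompt text and types are assumed.

interface Chunk {
  id: string;
  text: string;
}

interface AiLike {
  run(model: string, input: { prompt: string }): Promise<{ response?: string }>;
}

function buildReflectionPrompt(chunks: Chunk[]): string {
  const numbered = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.id}) ${chunk.text.trim()}`)
    .join("\n");
  return [
    "The following excerpts come from different documents.",
    numbered,
    "In exactly 3 sentences, state one insight that connects them.",
  ].join("\n\n");
}

async function reflect(ai: AiLike, model: string, chunks: Chunk[]): Promise<string> {
  const output = await ai.run(model, { prompt: buildReflectionPrompt(chunks) });
  // The caller would embed this text and upsert it into the index so that
  // later searches can retrieve the insight alongside the source chunks.
  return (output.response ?? "").trim();
}
```

Keeping the prompt builder separate from the model call makes the prompt easy to test without invoking inference.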

Daily cron ingestion

Cloudflare Cron Triggers run the ingestion pipeline on a schedule. New content is chunked, embedded, and indexed automatically. No manual re-indexing.
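The chunking stage of such a pipeline might look like the sketch below: fixed-size word windows with overlap, so text cut at a boundary still appears whole in a neighbouring chunk. The window sizes are illustrative defaults, not the production values. In a Worker, the scheduled handler that Cron Triggers invoke would call a function like this before embedding each chunk.

```typescript
// Fixed-size word chunking with overlap. Sizes are illustrative defaults,
// not the values used by the production pipeline.

function chunkWords(text: string, size = 200, overlap = 40): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Example with tiny windows so the overlap is visible:
const exampleChunks = chunkWords("a b c d e f g", 4, 2);
// exampleChunks: ["a b c d", "c d e f", "e f g"]
```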

D1 for structured data

Benchmark results, query logs, and structured metadata live in D1 alongside the vector index. A single Worker reads both — no cross-service latency for mixed queries.
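A mixed query of this kind usually means taking the IDs returned by the vector search and fetching their structured rows in one SQL statement. The sketch below builds that statement. The table and column names (documents, id, source, created_at) and the DB binding name are hypothetical.

```typescript
// After a vector search returns IDs, fetch their structured rows in one query.
// Table and column names are hypothetical placeholders.

interface PreparedQuery {
  sql: string;
  params: string[];
}

function buildMetadataQuery(ids: string[]): PreparedQuery {
  if (ids.length === 0) throw new Error("at least one id is required");
  // One "?" placeholder per ID keeps the values out of the SQL string.
  const placeholders = ids.map(() => "?").join(", ");
  return {
    sql: `SELECT id, source, created_at FROM documents WHERE id IN (${placeholders})`,
    params: ids,
  };
}

// With a D1 binding named DB (an assumed name), the query would run as:
//   const { sql, params } = buildMetadataQuery(matchIds);
//   const { results } = await env.DB.prepare(sql).bind(...params).all();
```

Because both the vector index and D1 are bindings on the same Worker, the two lookups happen in one request handler with no separate service call in between.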

Zero external dependencies

The entire pipeline — embed, retrieve, rerank, reflect — runs inside one Worker using native Cloudflare bindings. No OpenAI API calls. No external database. No data leaving the edge.

Two MCP servers in production — not demos.

edge-context-mode — context API for Claude Code sessions

An MCP server that prevents context window overflow by sandboxing tool execution and storing only concise summaries in the active context. Full outputs are persisted in D1 (SQLite) and retrieved on demand via ctx_get. Hybrid BM25 + semantic search via Vectorize. Seven tools: ctx_execute, ctx_annotate, ctx_search, ctx_history, ctx_reflect, ctx_get, ctx_purge. Runs locally via stdio or deployed as a Cloudflare Worker. Used in production on every Claude Code session that touches naija-vpn.com or dannwaneri.com.
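The core pattern, keeping a short summary in context and fetching the full output only on demand, can be sketched in a few lines. This is not the real ctx_execute or ctx_get code: a Map stands in for D1, and the function names and the first-line summary rule are invented for illustration.

```typescript
// Sketch of the "summary in context, full output on demand" pattern.
// A Map stands in for D1; names and the summary rule are illustrative only.

interface StoredOutput {
  id: string;
  summary: string;
}

const outputStore = new Map<string, string>();
let nextOutputId = 1;

function saveOutput(fullOutput: string, maxSummaryChars = 80): StoredOutput {
  const id = `out-${nextOutputId++}`;
  outputStore.set(id, fullOutput);
  const firstLine = fullOutput.split("\n")[0] ?? "";
  const head =
    firstLine.length > maxSummaryChars
      ? `${firstLine.slice(0, maxSummaryChars)}...`
      : firstLine;
  // Only { id, summary } goes back into the model's context window.
  return { id, summary: `${head} (${fullOutput.length} chars stored)` };
}

function getOutput(id: string): string | undefined {
  // Equivalent in spirit to fetching the persisted row by its ID.
  return outputStore.get(id);
}
```

The agent sees a few dozen characters per tool call instead of the whole output, and pays the full token cost only for the outputs it actually asks for.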

github.com/dannwaneri/edge-context-mode →

vectorize-mcp-worker — 100,302 documents, $5/month

The server indexes 45,053 tweets (11,835 bookmarks + 33,218 likes), enriched with Llama 4 Scout vision descriptions for 7,155 photo tweets. Daily cron syncs new content automatically. Total infrastructure cost: $5/month on Cloudflare's paid plan. No managed database. No separate embedding service.

Gemma 4 migration under a deprecation deadline

When Cloudflare deprecated the previous reflection model with 22 days' notice, the migration was a one-line env var change. The architecture was designed for model swaps: REFLECTION_MODEL is a runtime config, not a hardcoded string. Benchmark data comparing both models is in the repo. Read the full migration writeup →
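Reading the model name from the environment is what makes a swap a one-line change. A minimal sketch, assuming the variable arrives on the Worker's env object (for example from a [vars] entry in wrangler.toml). The fallback ID below is a placeholder, not a real Workers AI model name.

```typescript
// Resolve the reflection model from runtime config instead of a hardcoded string.
// FALLBACK_MODEL is a placeholder, not a real Workers AI model ID.

interface Env {
  REFLECTION_MODEL?: string;
}

const FALLBACK_MODEL = "@cf/example/placeholder-model";

function resolveReflectionModel(env: Env): string {
  const configured = env.REFLECTION_MODEL?.trim();
  // Empty or missing values fall back so a bad deploy does not break reflection.
  return configured ? configured : FALLBACK_MODEL;
}
```

Every call site asks resolveReflectionModel for the name, so changing the variable and redeploying switches the model everywhere at once.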

Benchmark endpoint

POST /benchmark runs both models in parallel against the same query, logs latency and response quality to D1, and returns side-by-side results. Built during the Gemma 4 migration to justify the model choice with real data instead of assumptions.
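The parallel part of such an endpoint can be sketched with Promise.all. ModelFn is a stand-in for a Workers AI call, the type names are invented here, and the D1 logging step is left as a comment because the real schema is not shown on this page.

```typescript
// Run several models against the same query in parallel and time each one.
// ModelFn stands in for a real inference call; names here are illustrative.

type ModelFn = (query: string) => Promise<string>;

interface BenchResult {
  model: string;
  latencyMs: number;
  response: string;
}

async function timeModel(model: string, run: ModelFn, query: string): Promise<BenchResult> {
  const started = Date.now();
  const response = await run(query);
  return { model, latencyMs: Date.now() - started, response };
}

async function benchmark(
  query: string,
  models: Array<{ name: string; run: ModelFn }>,
): Promise<BenchResult[]> {
  // map() starts every call immediately; Promise.all waits for all of them
  // and returns results in the same order as the input array.
  const results = await Promise.all(
    models.map((m) => timeModel(m.name, m.run, query)),
  );
  // A real endpoint would insert each result into D1 here before responding.
  return results;
}
```

Running both calls concurrently means the endpoint's total latency is close to that of the slower model rather than the sum of the two.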

Cloudflare Workers, Cloudflare Vectorize, Workers AI (Gemma 4 MoE), D1, Cron Triggers, BGE Small (embeddings), BGE Cross-Encoder (reranking), TypeScript, MCP Protocol

Is there a Cloudflare MCP server?

Yes. MCP (Model Context Protocol) defines how AI agents like Claude connect to external tools and data. Cloudflare Workers are a natural fit — stateless by design, globally distributed, and with native bindings to Vectorize (vector search), D1 (SQL), KV (key-value), and Workers AI (inference). vectorize-mcp-worker is a live open-source example: hybrid BM25 + vector search, cross-encoder reranking, and a Gemma 4 reflection layer, all running inside a single Worker. The repo is at github.com/dannwaneri/vectorize-mcp-worker.

Can I host an MCP server?

Yes, and Cloudflare Workers is a strong option for it. Workers are stateless HTTP endpoints that run in V8 isolates, which maps cleanly to MCP's request/response model. You get global edge deployment, native AI and database bindings, and no servers to manage. The free tier covers most development workloads; production use at scale fits comfortably on Cloudflare's $5/month paid plan.

How do I install an MCP server on Cloudflare?

Deploy via Wrangler: install it with npm install -g wrangler, authenticate with wrangler login, add the bindings your use case needs (Vectorize, D1, Workers AI) to wrangler.toml, then run wrangler deploy. The vectorize-mcp-worker repo includes a complete wrangler.toml and setup instructions for all bindings.

Are there any remote MCP servers?

Yes — MCP servers don't need to run locally. A Cloudflare Worker is a remote MCP server: it runs at Cloudflare's edge, responds to HTTP requests, and can be connected to any MCP-compatible client over the network. This is the architecture vectorize-mcp-worker uses — the server runs remotely on Cloudflare, the client (bookmark-cli) calls it from anywhere.

Need an MCP server built on Cloudflare?

I can design and deploy the full pipeline — vector ingestion, hybrid search, reranking, reflection layer, and cron-based updates — or integrate an MCP server into an existing Cloudflare Workers setup.
