Pushing Metrics from Cloudflare Workers to Prometheus via OTLP
Cloudflare Workers have built-in observability - you can enable tracing in wrangler.jsonc and even forward trace events via Logpush. But that gives you logs and traces, not Prometheus metrics. You can’t set up alerts on request error rates, track p99 latency over time, or monitor LLM costs in Grafana without getting the data into your own metrics stack.
The problem: Workers are stateless. Prometheus can’t scrape them because there’s no persistent process holding counters between requests. Each invocation starts fresh and dies after responding.
Prometheus 3.x has a built-in OTLP receiver. Workers have fetch() and waitUntil(). So: push metrics in OTLP format at the end of each request, after the response is already sent.
Why Not Pushgateway or a Durable Object?
I considered a few approaches before landing on direct OTLP push:
- Durable Object as aggregator - Workers push to a DO that accumulates counters, Prometheus scrapes it. Works, but adds cost per DO request and a coordination layer.
- Pushgateway - Designed for batch jobs, not per-request metrics. Also needs to be internet-reachable.
- Direct OTLP push - Workers POST metrics straight to Prometheus. No middleman, no extra infrastructure.
I went with the third option because it is the simplest: one fetch() call per invocation and nothing new to deploy or operate.
Enabling OTLP on Prometheus
Two flags are needed:
```
--web.enable-otlp-receiver
--enable-feature=otlp-deltatocumulative
```
The first exposes POST /api/v1/otlp/v1/metrics on the Prometheus port. The second is critical - without it, Prometheus rejects delta temporality metrics with “invalid temporality and type combination”. Since Workers send each request’s contribution as a delta (they can’t maintain running totals), Prometheus needs to convert deltas to cumulative counters server-side.
Exposing the Endpoint
Prometheus binds to localhost only. I put a reverse proxy in front with bearer token auth and a path rewrite so Workers can POST to https://otlp.example.com/ instead of the internal /api/v1/otlp/v1/metrics path.
In Caddy, this looks like:
```
@otlp host otlp.example.com
handle @otlp {
	@no_token not header Authorization "Bearer {env.OTLP_AUTH_TOKEN}"
	respond @no_token "" 401
	rewrite * /api/v1/otlp/v1/metrics
	reverse_proxy 127.0.0.1:9090
}
```
The token is shared between the proxy config and each Worker’s secrets.
The OTLP Client
The existing OpenTelemetry ecosystem doesn't fit Workers well. @opentelemetry/exporter-metrics-otlp-http uses Node's http module. @microlabs/otel-cf-workers does tracing, not metrics. The OTel SDK's PeriodicExportingMetricReader depends on a long-running background timer, which doesn't fit Workers' request-scoped execution model.
So I wrote a minimal client: @else42/cf-worker-otel. About 200 lines, zero dependencies. It constructs the OTLP JSON payload directly and POSTs via fetch().
```javascript
import { createMetrics } from "@else42/cf-worker-otel";

export default {
  async fetch(request, env, ctx) {
    const metrics = createMetrics({
      serviceName: "my-worker",
      endpoint: env.OTLP_ENDPOINT,
      token: env.OTLP_AUTH_TOKEN,
    });

    const start = Date.now();
    const response = await handleRequest(request);

    metrics.counter("http_requests_total", 1, {
      method: request.method,
      status: String(response.status),
    });
    metrics.histogram("http_request_duration_ms", Date.now() - start);

    ctx.waitUntil(metrics.flush());
    return response;
  },
};
```
Each createMetrics() call creates an isolated collector. You record counters, gauges, and histograms during request handling, then flush() serializes everything into a single OTLP JSON payload and POSTs it. The waitUntil() ensures the POST happens after the response is sent - zero latency impact on the user.
If the endpoint or token isn’t configured, flush() silently does nothing. Safe to instrument unconditionally.
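Under the hood, flush() sends standard OTLP JSON over HTTP. A rough sketch of what one counter delta looks like on the wire — field names follow the OTLP JSON encoding; the exact payload the library builds is my assumption based on its description:

```typescript
// One flushed counter delta in OTLP JSON form (sketch).
const payload = {
  resourceMetrics: [{
    resource: {
      attributes: [{ key: "service.name", value: { stringValue: "my-worker" } }],
    },
    scopeMetrics: [{
      metrics: [{
        name: "http_requests_total",
        sum: {
          aggregationTemporality: 1, // AGGREGATION_TEMPORALITY_DELTA
          isMonotonic: true,
          dataPoints: [{
            asDouble: 1,
            timeUnixNano: Date.now() + "000000", // ms → ns, as a string
            attributes: [{ key: "method", value: { stringValue: "GET" } }],
          }],
        },
      }],
    }],
  }],
};
```

The `aggregationTemporality: 1` marker is what triggers the rejection described above when the otlp-deltatocumulative feature is off.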
Delta Temporality
Traditional Prometheus metrics are cumulative - a counter goes up and never resets. OTLP supports both cumulative and delta temporality. Delta means “here’s what changed since last time.” For Workers, delta is the natural fit because each invocation only knows about its own request.
When Prometheus receives delta counters, the otlp-deltatocumulative feature converts them. It tracks the running total per time series and applies each incoming delta. The result is a normal cumulative counter that rate() and increase() work on as usual.
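Conceptually, the conversion is just a per-series running sum. A toy model of the idea — not Prometheus's actual implementation:

```typescript
// Toy delta-to-cumulative conversion: keep a running total per time
// series and fold each incoming delta into it.
const totals = new Map<string, number>();

function applyDelta(seriesKey: string, delta: number): number {
  const next = (totals.get(seriesKey) ?? 0) + delta;
  totals.set(seriesKey, next);
  return next; // the cumulative value stored as the sample
}
```

Each Worker invocation contributes one delta; the stored series then looks like any other cumulative counter.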
One subtlety: Prometheus needs at least two delta pushes before a time series appears in queries. The first push establishes the baseline, the second produces a data point.
Avoiding Cardinality Explosions
A common trap with request metrics is including the raw URL path as a label. Paths like /recipe/crispy-gochujang-chicken or /user/12345 create unbounded cardinality - every unique path becomes a separate time series, and Prometheus performance degrades fast.
For SvelteKit apps on Cloudflare, event.route.id gives you the route pattern (/recipe/[slug]) instead of the actual path. This is a finite set defined by your source tree, making it safe as a Prometheus label.
```javascript
metrics.counter("http_requests_total", 1, {
  method: event.request.method,
  status: String(response.status),
  route: event.route.id ?? "unknown",
});
```
For plain Workers without a framework, map paths to a fixed set of route categories yourself.
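One way to sketch that mapping — routeLabel and the patterns here are hypothetical; adapt them to your own routes:

```typescript
// Hypothetical helper: collapse raw paths into a bounded label set so
// every unique URL doesn't become its own time series.
function routeLabel(pathname: string): string {
  if (pathname === "/") return "/";
  if (pathname.startsWith("/api/")) return "/api/*";
  if (/^\/recipe\/[^/]+$/.test(pathname)) return "/recipe/[slug]";
  if (/^\/user\/\d+$/.test(pathname)) return "/user/[id]";
  return "other"; // catch-all keeps cardinality bounded
}
```

The catch-all branch matters: anything unrecognized (scanners, typos) collapses into a single series instead of minting new ones.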
SvelteKit Integration
SvelteKit on Cloudflare Workers has a clean hook system. Using sequence(), a metrics hook wraps the entire request lifecycle including middleware:
```javascript
import { createMetrics } from "@else42/cf-worker-otel";
import { sequence } from "@sveltejs/kit/hooks";

const metricsHandle = async ({ event, resolve }) => {
  const env = event.platform?.env;
  const metrics = createMetrics({
    serviceName: "my-app",
    endpoint: env?.OTLP_ENDPOINT,
    token: env?.OTLP_AUTH_TOKEN,
  });

  const start = Date.now();
  let status = "500";
  try {
    const response = await resolve(event);
    status = String(response.status);
    return response;
  } finally {
    metrics.counter("http_requests_total", 1, {
      method: event.request.method,
      status,
      route: event.route.id ?? "unknown",
    });
    metrics.histogram("http_request_duration_ms", Date.now() - start, {
      route: event.route.id ?? "unknown",
    });
    event.platform?.context?.waitUntil(metrics.flush());
  }
};

export const handle = sequence(metricsHandle, appHandle);
```
The try/finally ensures metrics are recorded even if resolve() throws. The response status defaults to 500 and gets overwritten on success.
Instrumenting Durable Objects
Durable Objects are a different beast - they’re stateful and long-lived. The interesting metrics come from their alarm handlers (periodic tasks). In my case, one DO polls the Hetzner API every 60 seconds, another imports auction data every 5 minutes.
```javascript
async alarm() {
  const metrics = createMetrics({
    serviceName: "my-worker",
    endpoint: this.env.OTLP_ENDPOINT,
    token: this.env.OTLP_AUTH_TOKEN,
  });

  const start = Date.now();
  let failed = false;
  try {
    await this.doWork();
  } catch (error) {
    failed = true;
  } finally {
    metrics.counter("do_alarm_total", 1, { do_class: "MyDO" });
    metrics.histogram("do_alarm_duration_ms", Date.now() - start, {
      do_class: "MyDO",
    });
    if (failed) {
      metrics.counter("do_alarm_errors_total", 1, { do_class: "MyDO" });
    }
    this.ctx.waitUntil(metrics.flush());
    await this.ctx.storage.setAlarm(Date.now() + this.intervalMs);
  }
}
```
Deeper Instrumentation: LLM Costs
For one project, I also track LLM API calls - duration, token usage, and cost per model and pipeline stage. The OpenRouter API returns usage data in the response body (usage.prompt_tokens, usage.completion_tokens) and cost in the x-openrouter-cost header.
```javascript
metrics.histogram("llm_duration_ms", result.elapsedMs, {
  model: result.model,
  stage: "recipe_generation",
});
metrics.counter("llm_tokens_total", result.usage.promptTokens, {
  model: result.model,
  direction: "input",
  stage: "recipe_generation",
});
metrics.counter("llm_cost_dollars", result.usage.cost, {
  model: result.model,
  stage: "recipe_generation",
});
```
This gives me Grafana dashboards showing cost per model over time, which stages are most expensive, and where latency spikes happen.
What You End Up With
After setting this up across a few Workers, I have Grafana dashboards showing:
- Request rate and error rate by route
- Latency percentiles (p50, p95, p99) over time
- Average latency per route
- DO alarm frequency and duration
- LLM cost accumulation per model
- Token usage breakdown (input vs output)
- Image generation success rates
All of this feeds into my homelab Prometheus instance, where it sits alongside the usual node exporters, network metrics, and service monitoring.
Two caveats worth knowing about:
Each flush() is an extra outbound subrequest per invocation, which counts against Cloudflare's per-invocation subrequest limits and whatever request metering your plan applies. For low-traffic Workers the overhead is negligible, but every request your Worker serves now carries one additional fetch.
It also means every Worker invocation makes an outbound fetch() to push metrics. A traffic spike means a proportional spike of OTLP pushes hitting my home server. For my current traffic this is fine, but for a high-traffic Worker, pre-aggregating in a Durable Object would make more sense - the DO batches metrics from many requests and flushes on a timer instead of per-invocation.
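The core of that batching DO is just a merge step. A sketch under my own naming, not part of the library:

```typescript
type Delta = { name: string; value: number; labels: Record<string, string> };

// Merge a per-request counter delta into a batch keyed by metric name
// plus label set; the DO would flush the batch to Prometheus on a timer.
// Note: keying on JSON.stringify assumes a consistent label key order.
function mergeDelta(batch: Map<string, Delta>, delta: Delta): void {
  const key = `${delta.name}|${JSON.stringify(delta.labels)}`;
  const existing = batch.get(key);
  if (existing) {
    existing.value += delta.value; // same series: accumulate
  } else {
    batch.set(key, { ...delta, labels: { ...delta.labels } });
  }
}
```

Many incoming requests then collapse into one OTLP push per flush interval instead of one per invocation.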
The Library
@else42/cf-worker-otel is MIT-licensed and on npm. It supports counters, gauges, and histograms with delta temporality. The entire thing is about 200 lines with zero runtime dependencies.