# Receiving OpenRouter Traces in Grafana Tempo
I already push metrics from Cloudflare Workers to Prometheus via OTLP. That gives me dashboards for request rates, latency, and LLM costs. But metrics only tell you aggregates - when something looks off, you want the full request context. That’s what traces are for.
OpenRouter has a broadcast feature that sends OTLP traces for every API request you make through their platform. You give it an endpoint URL and optional headers, and it POSTs resourceSpans payloads in standard OTLP JSON format. No custom receiver needed - any OTLP-compatible backend works.
I went with Grafana Tempo since it integrates with my existing Grafana instance and speaks OTLP natively.
## What OpenRouter Sends
Each LLM request produces a trace with a single span containing attributes that follow the OpenTelemetry GenAI semantic conventions:
- `gen_ai.request.model` / `gen_ai.response.model` - the model you asked for and the one that responded
- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.total_tokens`
- `gen_ai.usage.input_cost`, `gen_ai.usage.output_cost`, `gen_ai.usage.total_cost` - in USD
- `gen_ai.prompt` / `gen_ai.completion` - the actual request and response content
- `gen_ai.provider.name` - which provider fulfilled the request (Google AI Studio, Anthropic, etc.)
- `gen_ai.response.finish_reason` - stop, length, etc.
There’s also OpenRouter-specific metadata under `trace.metadata.openrouter.*`:
- `api_key_name` - which API key was used
- `provider_slug` - provider identifier
- `input_unit_price` / `output_unit_price` - per-token pricing
The span duration gives you end-to-end latency, including time-to-first-token and the full streaming time.
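The per-token prices under `trace.metadata.openrouter.*` should reproduce the reported cost attributes, which makes for a quick sanity check. A minimal sketch in Python - the attribute names match what the trace carries, but the numeric values here are invented examples, not real pricing:

```python
# Recompute an LLM request's cost from the span attributes OpenRouter sends.
# Attribute keys mirror the trace; the numbers are made-up example values.
span_attrs = {
    "gen_ai.usage.input_tokens": 1200,
    "gen_ai.usage.output_tokens": 300,
    "trace.metadata.openrouter.input_unit_price": 0.0000003,   # USD per input token
    "trace.metadata.openrouter.output_unit_price": 0.0000025,  # USD per output token
}

input_cost = (span_attrs["gen_ai.usage.input_tokens"]
              * span_attrs["trace.metadata.openrouter.input_unit_price"])
output_cost = (span_attrs["gen_ai.usage.output_tokens"]
               * span_attrs["trace.metadata.openrouter.output_unit_price"])
total_cost = input_cost + output_cost

print(f"input=${input_cost:.6f} output=${output_cost:.6f} total=${total_cost:.6f}")
```

If the recomputed total disagrees with `gen_ai.usage.total_cost` on a real span, either the pricing changed mid-flight or something upstream is misreporting.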
## Setting Up Tempo
NixOS has a services.tempo module. The config is a Nix attrset that maps directly to Tempo’s YAML config:
```nix
services.tempo = {
  enable = true;
  settings = {
    server = {
      http_listen_address = "127.0.0.1";
      http_listen_port = 3200;
      grpc_listen_address = "127.0.0.1";
      grpc_listen_port = 9097;
    };
    distributor.receivers = {
      otlp.protocols.http = {
        endpoint = "127.0.0.1:4318";
      };
    };
    storage.trace = {
      backend = "local";
      wal.path = "/var/lib/tempo/wal";
      local.path = "/var/lib/tempo/blocks";
    };
    compactor.compaction.block_retention = "720h"; # 30 days
  };
};
```
This gives you an OTLP HTTP receiver on port 4318 and the Tempo API on port 3200. Both bind to localhost only.
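For reference, the body that lands on the receiver's `/v1/traces` path is OTLP JSON with a `resourceSpans` envelope. A minimal hand-built example of that shape - the IDs, timestamps, and attribute values are fabricated for illustration, real payloads come from OpenRouter:

```python
import json
import time

# Build a minimal OTLP JSON trace payload: the same resourceSpans envelope
# OpenRouter broadcasts. All IDs and values here are fabricated.
now_ns = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "openrouter"}},
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": "0af7651916cd43dd8448eb211c80319c",
                "spanId": "b7ad6b7169203331",
                "name": "LLM Generation",
                "startTimeUnixNano": str(now_ns),
                "endTimeUnixNano": str(now_ns + 2_000_000_000),  # 2s span
                "attributes": [
                    {"key": "gen_ai.request.model",
                     "value": {"stringValue": "google/gemini-3-flash-preview"}},
                ],
            }],
        }],
    }],
}

body = json.dumps(payload)
```

Note that OTLP JSON encodes 64-bit timestamps as strings and wraps every attribute value in a typed object (`stringValue`, `intValue`, etc.) rather than using bare JSON scalars.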
## Generating Prometheus Metrics from Traces
Tempo can extract metrics from incoming spans and push them to Prometheus via remote write. This is useful because PromQL is much better for dashboards than TraceQL - you get rate(), histogram_quantile(), and all the usual aggregation functions.
```nix
metrics_generator = {
  storage = {
    path = "/var/lib/tempo/generator/wal";
    remote_write = [
      { url = "http://127.0.0.1:9090/api/v1/write"; }
    ];
  };
  processor.span_metrics = {
    dimensions = [
      "gen_ai.request.model"
      "gen_ai.provider.name"
      "trace.metadata.openrouter.api_key_name"
      "trace.metadata.openrouter.provider_slug"
    ];
  };
};
overrides.defaults.metrics_generator.processors = [
  "span-metrics"
];
```
The `dimensions` list controls which span attributes become Prometheus labels. With this config, Tempo produces metrics like:
- `traces_spanmetrics_calls_total{gen_ai_request_model="google/gemini-3-flash-preview", ...}` - request count
- `traces_spanmetrics_latency_bucket{...}` - latency histogram
- `traces_spanmetrics_size_total{...}` - span size
Prometheus needs `--web.enable-remote-write-receiver` for this to work. I already had the OTLP receiver enabled from the Workers metrics setup, so it was one extra flag:

```
--web.enable-remote-write-receiver
```
## Exposing the Endpoint
Same pattern as my OTLP metrics endpoint - Caddy reverse proxy with bearer token auth:
```caddyfile
@otlp_traces host otlp-traces.example.com
handle @otlp_traces {
    @no_token not header Authorization "Bearer {env.OTLP_TRACES_AUTH_TOKEN}"
    respond @no_token "" 401
    reverse_proxy 127.0.0.1:4318
}
```
No path rewrite needed here - OpenRouter POSTs directly to `/v1/traces`, the standard OTLP HTTP traces path that Tempo expects.
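To sanity-check the proxy configuration, you can model the request OpenRouter will effectively make. This sketch only constructs the request object and never sends anything over the network; the hostname and token are placeholders for your own values:

```python
import urllib.request

# Build (but do not send) the POST that would hit Caddy and be
# proxied through to Tempo's OTLP receiver on 127.0.0.1:4318.
# Hostname and token are placeholders.
token = "your-token"
req = urllib.request.Request(
    url="https://otlp-traces.example.com/v1/traces",
    data=b"{}",  # real payloads carry an OTLP resourceSpans envelope
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(req.get_method(), req.full_url)
```

Sending this with a wrong or missing token should come back 401 from the Caddy matcher; with the right token it reaches Tempo, which returns an OTLP response.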
The DNS record points to the same DMZ entry point as the metrics endpoint. The token is stored in sops and injected into Caddy’s environment.
## Configuring OpenRouter
In OpenRouter’s settings under Observability, enable broadcast and configure the OTel Collector:
- Endpoint: `https://otlp-traces.example.com`
- Headers: `{"Authorization": "Bearer <your-token>"}`
Hit “Test Connection” - you should see a trace with `rootTraceName: openrouter-connection-test` appear in Tempo. After that, every API request through OpenRouter produces a trace automatically.
## Building a Dashboard
With span metrics flowing into Prometheus, building a Grafana dashboard is straightforward. I added template variables for model and API key filtering, then built panels for:
Overview stats: total requests, request rate, p50/p99 latency - all from traces_spanmetrics_calls_total and traces_spanmetrics_latency_bucket.
Usage over time: stacked time series of requests by model and by provider. This shows which models are getting the most traffic and when.
```promql
sum by (gen_ai_request_model) (
  rate(traces_spanmetrics_calls_total{
    service="openrouter",
    span_name="LLM Generation"
  }[$__rate_interval])
)
```
Breakdowns: pie charts for model, provider, and API key distribution over the selected time range.
Recent traces: a Tempo search table showing the last 20 LLM requests with clickable trace IDs. Clicking through shows the full span with all attributes - including the actual prompt and completion text if you haven’t enabled privacy mode in OpenRouter.
## Cost Tracking
The cost data is in the trace attributes (`gen_ai.usage.total_cost`) but not in the span metrics - Tempo’s metrics generator only produces call counts, latency histograms, and size counters. It doesn’t extract arbitrary numeric attributes as metric values.
For cost aggregation, there are a few options:
- TraceQL metrics in Grafana - query Tempo directly for cost aggregation. Works but is slower than PromQL.
- Recording rules - periodically query Tempo and write results to Prometheus. More complex but gives you PromQL access.
- Application-side metrics - push cost as a Prometheus counter from the application itself, like I do for my Workers LLM instrumentation.
I already have per-request cost counters from the Workers side, so the trace data serves as a cross-check and gives me the full request/response context when investigating anomalies.
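The application-side option boils down to summing per-request cost into labeled counters before exposing them to Prometheus. A toy sketch of that aggregation with plain dicts - a real setup would use a Prometheus client library, and the request records below are invented:

```python
from collections import defaultdict

# Aggregate per-request USD cost into per-model totals: the shape a
# Prometheus counter with a `model` label would hold.
# These request records are made-up examples.
requests = [
    {"model": "google/gemini-3-flash-preview", "total_cost": 0.0011},
    {"model": "anthropic/claude-sonnet-4", "total_cost": 0.0240},
    {"model": "google/gemini-3-flash-preview", "total_cost": 0.0009},
]

cost_by_model: dict[str, float] = defaultdict(float)
for r in requests:
    cost_by_model[r["model"]] += r["total_cost"]

for model, cost in sorted(cost_by_model.items()):
    print(f"{model}: ${cost:.4f}")
```

Counters like this only ever increase, which is exactly what `rate()` and `increase()` in PromQL expect, so cost-per-hour panels fall out for free.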
## The Stack So Far
| Pillar | Source | Pipeline | Backend |
|---|---|---|---|
| Metrics | Cloudflare Workers | OTLP push | Prometheus |
| Traces | OpenRouter | OTLP broadcast | Tempo |
| Logs | systemd journal, network devices | Promtail, syslog | Loki |
All three feed into the same Grafana instance. Prometheus alerts on metric thresholds, Loki alerts on log patterns, and traces give the full context when something fires. The Tempo metrics generator bridges traces back into Prometheus, so I can use PromQL for trace-derived dashboards without learning a new query language.