# Receiving OpenRouter Traces in Grafana Tempo
I already push metrics from Cloudflare Workers to Prometheus via OTLP. That gives me dashboards for request rates, latency, and LLM costs. But metrics only tell you aggregates - when something looks off, you want the full request context. That’s what traces are for.
OpenRouter has a broadcast feature that sends OTLP traces for every API request you make through their platform. You give it an endpoint URL and optional headers, and it POSTs resourceSpans payloads in standard OTLP JSON format. No custom receiver needed - any OTLP-compatible backend works.
I went with Grafana Tempo since it integrates with my existing Grafana instance and speaks OTLP natively.
## What OpenRouter Sends
Each LLM request produces a trace with a single span containing attributes that follow the OpenTelemetry GenAI semantic conventions:
- `gen_ai.request.model` / `gen_ai.response.model` - the model you asked for and the one that responded
- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.total_tokens`
- `gen_ai.usage.input_cost`, `gen_ai.usage.output_cost`, `gen_ai.usage.total_cost` - in USD
- `gen_ai.prompt` / `gen_ai.completion` - the actual request and response content
- `gen_ai.provider.name` - which provider fulfilled the request (Google AI Studio, Anthropic, etc.)
- `gen_ai.response.finish_reason` - stop, length, etc.
There’s also OpenRouter-specific metadata under `trace.metadata.openrouter.*`:
- `api_key_name` - which API key was used
- `provider_slug` - provider identifier
- `input_unit_price` / `output_unit_price` - per-token pricing
The span duration gives you end-to-end latency, including time-to-first-token and the full streaming time.
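The per-token prices under `trace.metadata.openrouter.*` should reproduce the reported cost attributes, which makes for a quick sanity check. A minimal sketch in Python - the attribute names match what the trace carries, but the numeric values here are invented examples, not real pricing:

```python
# Recompute an LLM request's cost from the span attributes OpenRouter sends.
# Attribute keys mirror the trace; the numbers are made-up example values.
span_attrs = {
    "gen_ai.usage.input_tokens": 1200,
    "gen_ai.usage.output_tokens": 300,
    "trace.metadata.openrouter.input_unit_price": 0.0000003,   # USD per input token
    "trace.metadata.openrouter.output_unit_price": 0.0000025,  # USD per output token
}

input_cost = (span_attrs["gen_ai.usage.input_tokens"]
              * span_attrs["trace.metadata.openrouter.input_unit_price"])
output_cost = (span_attrs["gen_ai.usage.output_tokens"]
               * span_attrs["trace.metadata.openrouter.output_unit_price"])
total_cost = input_cost + output_cost

print(f"input=${input_cost:.6f} output=${output_cost:.6f} total=${total_cost:.6f}")
```

If the recomputed total disagrees with `gen_ai.usage.total_cost` on a real span, either the pricing changed mid-flight or something upstream is misreporting.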
## Setting Up Tempo
NixOS has a services.tempo module. The config is a Nix attrset that maps directly to Tempo’s YAML config:
```nix
services.tempo = {
  enable = true;
  settings = {
    server = {
      http_listen_address = "127.0.0.1";
      http_listen_port = 3200;
      grpc_listen_address = "127.0.0.1";
      grpc_listen_port = 9097;
    };
    distributor.receivers = {
      otlp.protocols.http = {
        endpoint = "127.0.0.1:4318";
      };
    };
    storage.trace = {
      backend = "local";
      wal.path = "/var/lib/tempo/wal";
      local.path = "/var/lib/tempo/blocks";
    };
    compactor.compaction.block_retention = "720h"; # 30 days
  };
};
```
This gives you an OTLP HTTP receiver on port 4318 and the Tempo API on port 3200. Both bind to localhost only.
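For reference, the body that lands on the receiver's `/v1/traces` path is OTLP JSON with a `resourceSpans` envelope. A minimal hand-built example of that shape - the IDs, timestamps, and attribute values are fabricated for illustration, real payloads come from OpenRouter:

```python
import json
import time

# Build a minimal OTLP JSON trace payload: the same resourceSpans envelope
# OpenRouter broadcasts. All IDs and values here are fabricated.
now_ns = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "openrouter"}},
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": "0af7651916cd43dd8448eb211c80319c",
                "spanId": "b7ad6b7169203331",
                "name": "LLM Generation",
                "startTimeUnixNano": str(now_ns),
                "endTimeUnixNano": str(now_ns + 2_000_000_000),  # 2s span
                "attributes": [
                    {"key": "gen_ai.request.model",
                     "value": {"stringValue": "google/gemini-3-flash-preview"}},
                ],
            }],
        }],
    }],
}

body = json.dumps(payload)
```

Note that OTLP JSON encodes 64-bit timestamps as strings and wraps every attribute value in a typed object (`stringValue`, `intValue`, etc.) rather than using bare JSON scalars.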
## Generating Prometheus Metrics from Traces
Tempo can extract metrics from incoming spans and push them to Prometheus via remote write. This is useful because PromQL is much better for dashboards than TraceQL - you get rate(), histogram_quantile(), and all the usual aggregation functions.
```nix
metrics_generator = {
  storage = {
    path = "/var/lib/tempo/generator/wal";
    remote_write = [
      { url = "http://127.0.0.1:9090/api/v1/write"; }
    ];
  };
  processor.span_metrics = {
    dimensions = [
      "gen_ai.request.model"
      "gen_ai.provider.name"
      "trace.metadata.openrouter.api_key_name"
      "trace.metadata.openrouter.provider_slug"
    ];
  };
};
overrides.defaults.metrics_generator.processors = [
  "span-metrics"
];
```
The `dimensions` list controls which span attributes become Prometheus labels. With this config, Tempo produces metrics like:
- `traces_spanmetrics_calls_total{gen_ai_request_model="google/gemini-3-flash-preview", ...}` - request count
- `traces_spanmetrics_latency_bucket{...}` - latency histogram
- `traces_spanmetrics_size_total{...}` - span size
Prometheus needs `--web.enable-remote-write-receiver` for this to work. I already had the OTLP receiver enabled from the Workers metrics setup, so it was one extra flag:

```
--web.enable-remote-write-receiver
```
## Exposing the Endpoint
Same pattern as my OTLP metrics endpoint - Caddy reverse proxy with bearer token auth:
```caddyfile
@otlp_traces host otlp-traces.example.com
handle @otlp_traces {
    @no_token not header Authorization "Bearer {env.OTLP_TRACES_AUTH_TOKEN}"
    respond @no_token "" 401
    reverse_proxy 127.0.0.1:4318
}
```
No path rewrite needed here - OpenRouter POSTs directly to `/v1/traces`, the standard OTLP HTTP traces path that Tempo expects.
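To sanity-check the proxy configuration, you can model the request OpenRouter will effectively make. This sketch only constructs the request object and never sends anything over the network; the hostname and token are placeholders for your own values:

```python
import urllib.request

# Build (but do not send) the POST that would hit Caddy and be
# proxied through to Tempo's OTLP receiver on 127.0.0.1:4318.
# Hostname and token are placeholders.
token = "your-token"
req = urllib.request.Request(
    url="https://otlp-traces.example.com/v1/traces",
    data=b"{}",  # real payloads carry an OTLP resourceSpans envelope
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(req.get_method(), req.full_url)
```

Sending this with a wrong or missing token should come back 401 from the Caddy matcher; with the right token it reaches Tempo, which returns an OTLP response.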
The DNS record points to the same DMZ entry point as the metrics endpoint. The token is stored in sops and injected into Caddy’s environment.
## Configuring OpenRouter
In OpenRouter’s settings under Observability, enable broadcast and configure the OTel Collector:
- Endpoint: `https://otlp-traces.example.com`
- Headers: `{"Authorization": "Bearer <your-token>"}`
Hit “Test Connection” - you should see a trace with `rootTraceName: openrouter-connection-test` appear in Tempo. After that, every API request through OpenRouter produces a trace automatically.
## Building a Dashboard
With span metrics flowing into Prometheus, building a Grafana dashboard is straightforward. I added template variables for model and API key filtering, then built panels for:
Overview stats: total requests, request rate, p50/p99 latency - all from traces_spanmetrics_calls_total and traces_spanmetrics_latency_bucket.
Usage over time: stacked time series of requests by model and by provider. This shows which models are getting the most traffic and when.
```promql
sum by (gen_ai_request_model) (
  rate(traces_spanmetrics_calls_total{
    service="openrouter",
    span_name="LLM Generation"
  }[$__rate_interval])
)
```
Breakdowns: pie charts for model, provider, and API key distribution over the selected time range.
Recent traces: a Tempo search table showing the last 20 LLM requests with clickable trace IDs. Clicking through shows the full span with all attributes - including the actual prompt and completion text if you haven’t enabled privacy mode in OpenRouter.
## Cost Tracking
The cost data is in the trace attributes (`gen_ai.usage.total_cost`) but not in the span metrics - Tempo’s metrics generator only produces call counts, latency histograms, and size counters. It doesn’t extract arbitrary numeric attributes as metric values.
For cost aggregation, there are a few options:
- TraceQL metrics in Grafana - query Tempo directly for cost aggregation. Works but is slower than PromQL.
- Recording rules - periodically query Tempo and write results to Prometheus. More complex but gives you PromQL access.
- Application-side metrics - push cost as a Prometheus counter from the application itself, like I do for my Workers LLM instrumentation.
I already have per-request cost counters from the Workers side, so the trace data serves as a cross-check and gives me the full request/response context when investigating anomalies.
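The application-side option boils down to summing per-request cost into labeled counters before exposing them to Prometheus. A toy sketch of that aggregation with plain dicts - a real setup would use a Prometheus client library, and the request records below are invented:

```python
from collections import defaultdict

# Aggregate per-request USD cost into per-model totals: the shape a
# Prometheus counter with a `model` label would hold.
# These request records are made-up examples.
requests = [
    {"model": "google/gemini-3-flash-preview", "total_cost": 0.0011},
    {"model": "anthropic/claude-sonnet-4", "total_cost": 0.0240},
    {"model": "google/gemini-3-flash-preview", "total_cost": 0.0009},
]

cost_by_model: dict[str, float] = defaultdict(float)
for r in requests:
    cost_by_model[r["model"]] += r["total_cost"]

for model, cost in sorted(cost_by_model.items()):
    print(f"{model}: ${cost:.4f}")
```

Counters like this only ever increase, which is exactly what `rate()` and `increase()` in PromQL expect, so cost-per-hour panels fall out for free.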
## The Stack So Far
| Pillar | Source | Pipeline | Backend |
|---|---|---|---|
| Metrics | Cloudflare Workers | OTLP push | Prometheus |
| Traces | OpenRouter | OTLP broadcast | Tempo |
| Logs | systemd journal, network devices | Promtail, syslog | Loki |
All three feed into the same Grafana instance. Prometheus alerts on metric thresholds, Loki alerts on log patterns, and traces give the full context when something fires. The Tempo metrics generator bridges traces back into Prometheus, so I can use PromQL for trace-derived dashboards without learning a new query language.