Observability¶
Why read this: request IDs for support tickets, optional one-line-per-request gateway logs for HTTP debugging, readiness (/api/readyz) behavior, JSON logging for aggregators, SLO-oriented alerting examples, and a recipe for OpenTelemetry later.
Request IDs¶
The gateway and optional FastAPI access logging use fluxlit.logging.context.REQUEST_ID_HEADER (X-Request-ID). The ID is stored in a contextvars.ContextVar for the duration of each request; see fluxlit.logging.
The gateway replaces X-Request-ID on the Streamlit upstream (HTTP and WebSocket) with that same resolved value so sidecar access logs can join to gateway and API lines.
Correlation path¶
flowchart LR
browser[Browser]
gw[FluxLit gateway]
api[FastAPI /api]
st[Streamlit sidecar]
browser -->|"X-Request-ID (optional)"| gw
gw -->|"X-Request-ID (authoritative)"| st
gw --> api
For Streamlit → /api calls made from Python, the call originates in a separate process, so it does not share the browser request's ID unless you pass the header explicitly (advanced: propagate it from browser context when your Streamlit version exposes it).
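Passing the header explicitly could look like the sketch below. `correlation_headers` is a hypothetical helper, and the URL in the comment is a placeholder for your own gateway origin and `api_mount_path`.

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # header name from the text above


def correlation_headers(request_id=None):
    """Return headers carrying a correlation ID, minting one if none is given."""
    return {REQUEST_ID_HEADER: request_id or uuid.uuid4().hex}


# Reuse an ID you already hold, or let the helper mint one and log it yourself.
headers = correlation_headers("ticket-42")
fresh = correlation_headers()

# Pass the dict to whichever HTTP client you use, for example:
#   httpx.get("http://127.0.0.1:8000/api/items", headers=headers)
```

Logging the minted ID on the Streamlit side before the call is what lets you join that line to the API's log line later.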
Structured gateway logs¶
When enable_gateway_access_log is True, the gateway emits one INFO log per request with extra fields:
fluxlit_dispatch: api or streamlit
http_method_or_type: HTTP method or websocket
path: ASGI path seen by the gateway
query: raw query string from the ASGI scope with sensitive keys redacted (default fluxlit_sid, plus url_session_query_param when set); see fluxlit.logging.redact, URL session continuity (no cookies), and URL sessions, query tokens, and email links (security)
With the default (False), the same line is logged at DEBUG only.
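To make the query redaction concrete, here is a minimal sketch of masking sensitive keys in a raw query string. It is an illustration only and does not reproduce `fluxlit.logging.redact`, whose signature is not shown here; `redact_query` and the `REDACTED` mask are assumptions.

```python
from urllib.parse import parse_qsl, urlencode

# "fluxlit_sid" is the default sensitive key named above; the rest is illustrative.
SENSITIVE_KEYS = frozenset({"fluxlit_sid"})


def redact_query(raw_query, sensitive=SENSITIVE_KEYS, mask="REDACTED"):
    """Replace values of sensitive keys while keeping key order and other pairs."""
    pairs = parse_qsl(raw_query, keep_blank_values=True)
    return urlencode([(k, mask if k in sensitive else v) for k, v in pairs])


redacted = redact_query("page=2&fluxlit_sid=s3cr3t&q=hello+world")
# redacted == "page=2&fluxlit_sid=REDACTED&q=hello+world"
```

Keeping the key while masking the value preserves the fact that a session parameter was present, which is often useful when debugging.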
When debug mode is on (FLUXLIT_DEBUG=1 or fluxlit dev / run / workbench --debug), gateway access logging and API request logging are enabled by default, the default log level moves to debug if it was still info, and the gateway emits an extra DEBUG line per request with the split between API path and Streamlit path. A redacted JSON snapshot is available at GET /__fluxlit/debug (disabled if that path would collide with api_mount_path); see Configuration, the Debug mode section in Troubleshooting, and how that interacts with query tokens in URL sessions, query tokens, and email links (security).
If you enable gateway INFO logs in production, combine them with your normal log pipeline (filters, aggregators) and scrub or avoid echoing sensitive headers. For copying header dicts into logs or debug output, use fluxlit.logging.redact. Broader secrets and rotation guidance: Secrets lifecycle.
Gateway log schema¶
The gateway access-log field catalog is exported as
fluxlit.logging.GATEWAY_ACCESS_LOG_FIELDS. Treat these fields as stable for
dashboards and alert routing:
| Field | Stability | Notes |
|---|---|---|
| | Stable | Present in the formatted message and JSON logs when emitted as |
| | Stable | |
| | Stable | HTTP method or |
| | Stable | Raw ASGI path; consider route normalization in your own log pipeline if paths include user IDs. |
| | Stable, redacted | Sensitive query values are redacted, including URL-session keys. |
Internal / debug-only (not part of the stable access-log contract): when debug mode is on, the gateway may emit additional DEBUG lines (for example path-split hints and dispatch diagnostics). Do not build alerts or SLO dashboards on those lines; they can change without a semver note in patch releases. The DEBUG line emitted when a Prometheus histogram observe fails is similarly diagnostic-only (see Metrics contract below).
JSON log lines (Loki / Datadog-style)¶
Use JsonLogFormatter so each log record is a single JSON object with at least time, level, logger, message, plus any attributes from logger.info(..., extra={...}) (for example request_id, fluxlit_dispatch, path from gateway access logs).
Suggested field conventions for log stacks:
| Field | Role |
|---|---|
| | Event timestamp (formatter output). |
| | Filter and split by component. |
| | Human-readable line. |
| | Join gateway, API, and upstream Streamlit lines when present. |
| | |
The base JSON field contract is exported as
fluxlit.logging.JSON_LOG_BASE_FIELDS.
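To show what "one JSON object per record, plus extra attributes" means in practice, here is a minimal stand-in formatter built on the standard library. `SketchJsonFormatter` is hypothetical and is not FluxLit's JsonLogFormatter; it only mirrors the base fields named above (time, level, logger, message).

```python
import io
import json
import logging

# Attributes every LogRecord carries; anything else arrived via extra={...}.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class SketchJsonFormatter(logging.Formatter):
    """Illustrative stand-in, not FluxLit's JsonLogFormatter."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy extra={...} attributes such as request_id or fluxlit_dispatch.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(SketchJsonFormatter())
log = logging.getLogger("sketch.gateway")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("GET /api/items", extra={"request_id": "abc-123", "fluxlit_dispatch": "api"})
line = json.loads(stream.getvalue())
```

The sketch ignores exception info and stack traces; a production formatter would serialize those as well.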
Attach the formatter to Uvicorn and your app loggers via logging.dictConfig (often from a small Python file referenced by LOGGING_CONFIG or equivalent in your process manager):
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "json": {
            "()": "fluxlit.logging.json_formatter.JsonLogFormatter",
        },
    },
    "handlers": {
        "default": {
            "class": "logging.StreamHandler",
            "formatter": "json",
        },
    },
    "loggers": {
        "uvicorn": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "uvicorn.error": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "uvicorn.access": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "fluxlit": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "fluxlit.gateway": {"handlers": ["default"], "level": "INFO", "propagate": False},
    },
    "root": {"handlers": ["default"], "level": "WARNING"},
}
Then call logging.config.dictConfig(LOGGING) during startup, or point Uvicorn at a module path that does the same. Tune levels per environment, and disable one of the two sources if emitting both Uvicorn access lines and fluxlit.gateway INFO lines is too chatty.
SLOs & alerting¶
FluxLit does not ship Prometheus rules or managed alerts; operators should define SLOs on the same probes documented in Deployment.
Liveness (GET /api/healthz): the inner API process responds; use it for restart-if-wedged checks. Example SLO: 99.9% of probes succeed over 30 days; alert on sustained probe failure (Pod restarts) rather than single blips.
Readiness (GET /api/readyz): when the Streamlit upstream is configured, a 503 means the sidecar is not accepting traffic the way the gateway expects. Example SLO: readyz success rate (or an error budget expressed as allowed 503 minutes per month). Fire burn alerts when the 5xx rate on readyz rises over a short window (e.g. 5m) while healthz stays green; that pattern isolates Streamlit/upstream issues.
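The arithmetic behind those example SLOs is short enough to sketch. The helper names are hypothetical; the burn-rate definition used here (observed error ratio divided by the allowed error ratio) is the common one and is an assumption about how you choose to alert.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed failure for an availability SLO over the window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1 - slo) * window_days * 24 * 60


def burn_rate(observed_error_ratio, slo):
    """How many times faster than 'exactly on budget' errors are arriving."""
    return observed_error_ratio / (1 - slo)


budget = error_budget_minutes(0.999)  # roughly 43.2 minutes per 30 days
rate = burn_rate(0.01, 0.999)         # 1% of readyz probes failing: about 10x burn
```

A burn rate near 1 means the budget lasts the whole window; a sustained rate of 10 would exhaust a 30-day budget in about three days.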
Example Kubernetes probes (paths depend on api_mount_path, default /api):
livenessProbe:
  httpGet:
    path: /api/healthz
    port: http
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /api/readyz
    port: http
  periodSeconds: 5
Example Prometheus alert sketch (adjust labels and job to your setup): fire when rate(http_requests_total{path="/api/readyz",status="503"}[5m]) is above a small threshold while healthz success stays high — indicating Streamlit or upstream misconfiguration rather than total process death.
Gateway Prometheus metrics (RED)¶
When FLUXLIT_ENABLE_GATEWAY_PROMETHEUS_METRICS=1 and prometheus-client is installed (pip install "fluxlit[metrics]" or include in your image), the gateway exposes GET on FLUXLIT_GATEWAY_PROMETHEUS_METRICS_PATH (default /__fluxlit/metrics) in Prometheus text format.
fluxlit_gateway_requests_total: labels dispatch (api vs streamlit) and method_kind (HTTP method or WEBSOCKET).
fluxlit_gateway_request_duration_seconds: histogram by dispatch (wall time for one gateway request; scrape path responses are excluded).
If the histogram observe step raises (for example a client or label mismatch), the gateway logs a DEBUG line on the fluxlit.gateway logger with the dispatch label and exception context, then continues serving the request (metrics for that hop may be missing).
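The fail-open behavior described above follows a familiar pattern: metrics recording is wrapped so that it can never fail the request. The sketch below shows that pattern with hypothetical names (`BrokenHistogram`, `observe_safely`, `serve`); it is not the gateway's code and does not use prometheus-client.

```python
import logging
import time

log = logging.getLogger("sketch.gateway")  # illustrative logger name


class BrokenHistogram:
    """Hypothetical metric object whose observe() always fails."""

    def observe(self, dispatch, seconds):
        raise ValueError("label mismatch")


def observe_safely(histogram, dispatch, seconds):
    """Record a duration; on failure log at DEBUG and return False instead of raising."""
    try:
        histogram.observe(dispatch, seconds)
        return True
    except Exception:
        log.debug("histogram observe failed for dispatch=%s", dispatch, exc_info=True)
        return False


def serve(handler, histogram, dispatch):
    """Time one request and return its response even if metrics recording fails."""
    start = time.perf_counter()
    response = handler()
    observe_safely(histogram, dispatch, time.perf_counter() - start)
    return response


response = serve(lambda: "200 OK", BrokenHistogram(), "api")
```

The trade-off is a silent gap in the histogram, so it is worth enabling DEBUG on that logger when dashboards look incomplete.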
The path must not be under your api_mount_path or it will shadow API routes (the runtime logs a warning and disables metrics). Secure the endpoint at your ingress (allow only Prometheus scrapers) or keep metrics disabled in untrusted networks.
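If you want to validate a custom metrics path in your own deployment checks, a segment-aware prefix test is enough. This is a hypothetical helper under the assumption that "under the mount" means equal to it or below it on a path-segment boundary; FluxLit's own check is not shown here.

```python
def is_under_mount(path, mount):
    """True when `path` equals `mount` or sits below it on a segment boundary."""
    mount = "/" + mount.strip("/")
    path = "/" + path.strip("/")
    if mount == "/":
        return True  # a root mount contains every path
    return path == mount or path.startswith(mount + "/")


checks = {
    "/__fluxlit/metrics": is_under_mount("/__fluxlit/metrics", "/api"),  # False
    "/api/metrics": is_under_mount("/api/metrics", "/api"),              # True
    "/apix/metrics": is_under_mount("/apix/metrics", "/api"),            # False
}
```

Comparing on `mount + "/"` rather than the bare prefix avoids flagging `/apix/...` as a collision with `/api`.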
USE-style saturation (CPU, memory, file descriptors) is not emitted by FluxLit core; scrape the node or cAdvisor / kube-state-metrics alongside these application counters.
Metrics contract¶
The current metric catalog is exported as
fluxlit.gateway.metrics.GATEWAY_PROMETHEUS_METRICS (each entry includes a
machine-readable stability field mirrored here):
| Metric | Type | Labels | Stability |
|---|---|---|---|
| | Counter | | Stable for 0.x dashboards. |
| | Histogram | | Stable name and labels; see bucket policy below. |
Label semantics (low cardinality by design):
dispatch: api or streamlit; see fluxlit.gateway.dispatch for when each applies.
method_kind: upper-case HTTP method (for example GET) or WEBSOCKET for WebSocket upgrades.
Histogram buckets: changing default bucket boundaries in FluxLit is treated as a semver-visible change (minor or major), because it affects histogram compatibility in Prometheus and Grafana heatmaps. Adding new buckets without removing old ones is less disruptive but should still be noted in the changelog.
FluxLit intentionally does not label by raw path, status code, exception type, user, tenant, or query string in core metrics. Add those in your own app metrics only after you have a cardinality budget.
Python logging filters¶
Use a logging.Filter to drop noisy loggers or scrub fields before logs reach stdout or a log aggregator (in addition to fluxlit.logging.redact for header maps).
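A minimal sketch of such a filter follows. `ScrubFilter` and the field names in `SENSITIVE_FIELDS` are illustrative assumptions; adapt them to the attributes your own code passes through `extra={...}`.

```python
import logging

SENSITIVE_FIELDS = ("authorization", "cookie")  # illustrative attribute names


class ScrubFilter(logging.Filter):
    """Drop records from noisy loggers and mask sensitive extra fields."""

    def __init__(self, drop_prefixes=()):
        super().__init__()
        self.drop_prefixes = tuple(drop_prefixes)

    def filter(self, record):
        if self.drop_prefixes and record.name.startswith(self.drop_prefixes):
            return False  # returning False discards the record
        for field in SENSITIVE_FIELDS:
            if hasattr(record, field):
                setattr(record, field, "REDACTED")
        return True


scrub = ScrubFilter(drop_prefixes=("uvicorn.access",))

kept = logging.LogRecord("fluxlit.gateway", logging.INFO, "sketch.py", 0, "request", (), None)
kept.authorization = "Bearer secret"
dropped = logging.LogRecord("uvicorn.access", logging.INFO, "sketch.py", 0, "GET /", (), None)

results = (scrub.filter(kept), kept.authorization, scrub.filter(dropped))
```

Attach the filter to a handler (`handler.addFilter(scrub)`) rather than to a logger when you want it to see every record that reaches the output, because logger-level filters are not applied to records propagated from child loggers.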
OpenTelemetry tracing hook recipe¶
FluxLit does not bundle OpenTelemetry. Keep OTel dependencies in your app image
and bridge the no-dependency FluxLit hook into your tracer. A runnable example is
available in examples/otel_tracing/:
python -m pip install -e .
python -m pip install -r examples/otel_tracing/requirements.txt
fluxlit run examples.otel_tracing.app:app --no-pidfile
Minimal hook shape:
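from contextlib import contextmanager

from fluxlit import set_trace_hook
from opentelemetry import trace

# Tracer from the globally configured provider; set the provider up in your app.
tracer = trace.get_tracer(__name__)


@contextmanager
def otel_trace_hook(name, attributes):
    # One span per hooked operation; attributes with a None value are skipped.
    with tracer.start_as_current_span(name) as span:
        for key, value in attributes.items():
            if value is not None:
                span.set_attribute(key, value)
        yield


set_trace_hook(otel_trace_hook)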
Gateway dispatch wraps each request in a span named fluxlit.gateway.request with
low-cardinality attributes including fluxlit.dispatch, http.method_or_type, url.path, and
request_id. The gateway → Streamlit HTTP hop adds a fluxlit.gateway.upstream_http span
with http.request.method, url.full, and fluxlit.request_id (experimental attribute
names; treat them as diagnostic until the 1.0 charter tightens them).
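Because the hook is just a context manager taking a name and an attribute mapping, you can try the shape without OpenTelemetry installed. The sketch below records span-like entries in memory; `timing_trace_hook` and `recorded` are hypothetical names, and the hook is exercised directly rather than through FluxLit.

```python
import time
from contextlib import contextmanager

recorded = []  # in-memory sink for the sketch; swap for your tracer or logger


@contextmanager
def timing_trace_hook(name, attributes):
    """Same (name, attributes) shape as the hook above, without OpenTelemetry."""
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded.append(
            {
                "name": name,
                "duration_s": time.perf_counter() - start,
                # Skip None values, mirroring the OTel hook above.
                "attributes": {k: v for k, v in attributes.items() if v is not None},
            }
        )


# Exercise the hook directly; in an app you would pass it to set_trace_hook.
with timing_trace_hook(
    "fluxlit.gateway.request", {"fluxlit.dispatch": "api", "request_id": None}
):
    time.sleep(0.01)
```

Recording in a `finally` block means the entry is written even when the wrapped work raises, which is usually what you want from a tracing hook.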
Because the browser hits a single port, ingress spans should
label whether work happened on the API (/api/...) or the Streamlit proxy path.
For a fuller deployment, also consider:
FastAPI: use opentelemetry-instrumentation-fastapi on app.api.
Outbound HTTP: instrument httpx if you propagate traces to upstreams beyond FluxLit’s hop.
Streamlit subprocess: treat it as its own service unless you add custom propagation.
Trace context (W3C traceparent)¶
The gateway already forwards X-Request-ID to Streamlit on proxied HTTP and WebSockets.
traceparent / tracestate: when present on the incoming browser request, they pass
through fluxlit.gateway.header_filter.filter_request_headers and are copied onto the
gateway → Streamlit HTTP hop (same as other non–hop-by-hop client headers). FluxLit does
not synthesize W3C trace IDs from request_id; inject them at your edge or in middleware
if you need OTel interoperability. For OpenTelemetry, map the FluxLit hook spans above
onto your tracer and align IDs with headers your mesh already emits.
Correlation limits (gateway vs Streamlit)¶
Gateway-centered IDs: X-Request-ID is set in the gateway ASGI process and forwarded to the Streamlit upstream. Gateway access logs and nginx can align on that header.
Streamlit subprocess: Streamlit page code runs in the child process. The parent’s contextvars.ContextVar used for get_request_id() in the gateway is not automatically visible inside arbitrary Streamlit callbacks. For server-side httpx calls from Streamlit to /api, use ApiClient with propagate_request_id=True only when your code has set a correlation id in that process (or pass headers explicitly). Do not assume the browser’s request id appears in Streamlit without your own propagation.
Readiness¶
GET /api/readyz (hidden from OpenAPI) probes the Streamlit sidecar when FLUXLIT_STREAMLIT_UPSTREAM is set; see fluxlit.health. The probe requires a 2xx response from GET on the upstream root (not merely “any HTTP answer”). For Kubernetes-style probe configuration and curl examples, see Deployment. If probes fail in production, see Troubleshooting.
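The "2xx, not merely any HTTP answer" rule can be reproduced in a few lines if you want to test an upstream by hand. This is a sketch with hypothetical names (`status_is_ready`, `probe_upstream`), not the code in fluxlit.health. Note that `urllib` follows redirects on its own, so a 3xx is judged by the final response.

```python
import http.client
import urllib.error
import urllib.request


def status_is_ready(status):
    """Only 2xx counts as ready; 3xx, 4xx, and 5xx do not."""
    return 200 <= status < 300


def probe_upstream(url, timeout=2.0):
    """GET the upstream root and report readiness; any error means not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_is_ready(resp.status)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers HTTP 4xx/5xx (HTTPError), refused connections, timeouts,
        # malformed responses, and malformed URLs.
        return False


# Example call; the address is a placeholder for your Streamlit upstream:
#   probe_upstream("http://127.0.0.1:8501/")
```

Treating every failure as "not ready" matches how a readiness probe should behave: the caller only needs a yes or no.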
See also¶
Secrets lifecycle — keep credentials out of log pipelines.
Production TLS and edge headers — align log and probe URLs with production TLS and proxy trust.