Observability

Why read this: request IDs for support tickets, optional one-line-per-request gateway logs for HTTP debugging, readiness (/api/readyz) behavior, JSON logging for aggregators, SLO-oriented alerting examples, and a recipe for OpenTelemetry later.

Request IDs

The gateway and optional FastAPI access logging use fluxlit.logging.context.REQUEST_ID_HEADER (X-Request-ID). The ID is stored in a contextvars.ContextVar for the duration of each request; see fluxlit.logging.

The gateway replaces X-Request-ID on the Streamlit upstream (HTTP and WebSocket) with that same resolved value so sidecar access logs can join to gateway and API lines.

Correlation path

flowchart LR
  browser[Browser]
  gw[FluxLit gateway]
  api[FastAPI /api]
  st[Streamlit sidecar]

  browser -->|"X-Request-ID (optional)"| gw
  gw -->|"X-Request-ID (authoritative)"| st
  gw --> api

For Streamlit → /api calls made from Python, the two processes do not share a request ID unless you pass the header explicitly (advanced: propagate it from the browser context when your Streamlit version exposes it).

Structured gateway logs

When enable_gateway_access_log is True, the gateway emits one INFO log per request, carrying the extra fields listed under Gateway log schema below. With the default (False), the same line is logged at DEBUG only.

When debug mode is on (FLUXLIT_DEBUG=1 or fluxlit dev / run / workbench --debug), gateway access logging and API request logging are enabled by default, the default log level moves to debug if it was still info, and the gateway emits an extra DEBUG line per request with the split between API path and Streamlit path. A redacted JSON snapshot is available at GET /__fluxlit/debug (disabled if that path would collide with api_mount_path); see Configuration, the Debug mode section in Troubleshooting, and how that interacts with query tokens in URL sessions, query tokens, and email links (security).

If you enable gateway INFO logs in production, combine them with your normal log pipeline (filters, aggregators) and scrub or avoid echoing sensitive headers. For copying header dicts into logs or debug output, use fluxlit.logging.redact. Broader secrets and rotation guidance: Secrets lifecycle.

Gateway log schema

The gateway access-log field catalog is exported as fluxlit.logging.GATEWAY_ACCESS_LOG_FIELDS. Treat these fields as stable for dashboards and alert routing:

| Field | Stability | Notes |
| --- | --- | --- |
| request_id | Stable | Present in the formatted message and JSON logs when emitted as extra; joins gateway/API/upstream evidence. |
| fluxlit_dispatch | Stable | api or streamlit; low cardinality and safe for dashboards. |
| http_method_or_type | Stable | HTTP method or websocket; safe for grouping. |
| path | Stable | Raw ASGI path; consider route normalization in your own log pipeline if paths include user IDs. |
| query | Stable, redacted | Sensitive query values are redacted, including URL-session keys. |

Internal / debug-only (not part of the stable access-log contract): when debug mode is on, the gateway may emit additional DEBUG lines (for example path-split hints and dispatch diagnostics). Do not build alerts or SLO dashboards on those lines; they can change without a semver note in patch releases. The DEBUG line emitted when a Prometheus histogram observe fails is similarly diagnostic-only (see Metrics contract below).

JSON log lines (Loki / Datadog-style)

Use JsonLogFormatter so each log record is a single JSON object with at least time, level, logger, message, plus any attributes from logger.info(..., extra={...}) (for example request_id, fluxlit_dispatch, path from gateway access logs).

Suggested field conventions for log stacks:

| Field | Role |
| --- | --- |
| time | Event timestamp (formatter output). |
| level / logger | Filter and split by component. |
| message | Human-readable line. |
| request_id | Join gateway, API, and upstream Streamlit lines when present. |
| fluxlit_dispatch | api vs streamlit for quick routing dashboards. |

The base JSON field contract is exported as fluxlit.logging.JSON_LOG_BASE_FIELDS.
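To make the shape of such a line concrete, here is a minimal stand-in formatter built on the standard library. It is a sketch only: `SketchJsonFormatter` and the `sketch.gateway` logger name are hypothetical, the log message is made up, and FluxLit's own JsonLogFormatter may differ in field order and timestamp format.

```python
import io
import json
import logging
from datetime import datetime, timezone

# Attributes every LogRecord carries; anything else arrived via extra={...}.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class SketchJsonFormatter(logging.Formatter):
    """Illustrative stand-in; not FluxLit's JsonLogFormatter."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over the caller-supplied extra fields (request_id, path, ...).
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


# Capture output in memory so the resulting line can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(SketchJsonFormatter())
logger = logging.getLogger("sketch.gateway")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "GET /api/items",
    extra={"request_id": "abc123", "fluxlit_dispatch": "api", "path": "/api/items"},
)
line = json.loads(stream.getvalue())
```

The parsed `line` holds the four base fields plus the three extras, which is the structure a Loki or Datadog pipeline would index.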

Attach JsonLogFormatter to Uvicorn and your app loggers via logging.dictConfig (often from a small Python file referenced by LOGGING_CONFIG or an equivalent setting in your process manager):

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "json": {
            "()": "fluxlit.logging.json_formatter.JsonLogFormatter",
        },
    },
    "handlers": {
        "default": {
            "class": "logging.StreamHandler",
            "formatter": "json",
        },
    },
    "loggers": {
        "uvicorn": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "uvicorn.error": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "uvicorn.access": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "fluxlit": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "fluxlit.gateway": {"handlers": ["default"], "level": "INFO", "propagate": False},
    },
    "root": {"handlers": ["default"], "level": "WARNING"},
}

Then call logging.config.dictConfig(LOGGING) during startup, or point Uvicorn at a module path that does the same. Tune levels per environment; if running both uvicorn.access and fluxlit.gateway at INFO is too chatty, raise one of them to WARNING so each request is not logged twice.

SLOs & alerting

FluxLit does not ship Prometheus rules or managed alerts; operators should define SLOs on the same probes documented in Deployment.

Liveness (GET /api/healthz): the inner API process responds; use it for restart-if-wedged checks. Example SLO: 99.9% of probes succeed over 30 days; alert on sustained probe failure (Pod restarts) rather than single blips.

Readiness (GET /api/readyz): when a Streamlit upstream is configured, 503 means the sidecar is not accepting traffic the way the gateway expects. Example SLO: readyz success rate (or an error budget expressed as allowed 503 minutes per month). Fire burn-rate alerts when the 5xx rate on readyz rises over a short window (e.g. 5m) while healthz stays green; that pattern isolates Streamlit/upstream issues.

Example Kubernetes probes (paths depend on api_mount_path, default /api):

livenessProbe:
  httpGet:
    path: /api/healthz
    port: http
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /api/readyz
    port: http
  periodSeconds: 5

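Example Prometheus alert sketch (adjust labels and job to your setup): fire when rate(http_requests_total{path="/api/readyz",status="503"}[5m]) is above a small threshold while healthz success stays high, which indicates Streamlit or upstream misconfiguration rather than total process death. Here http_requests_total stands for a metric from your ingress or probe exporter; FluxLit's own gateway metrics carry no path or status labels.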
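The "readyz failing while healthz stays green" pattern can also be expressed as plain logic, for example in a custom probe script. The sketch below is hypothetical throughout: the function names, the once-per-minute sampling, and the 0.5 threshold are illustrative choices, not FluxLit behavior.

```python
from typing import Iterable, Tuple


def classify_probe_pair(healthz_status: int, readyz_status: int) -> str:
    """Label one pair of probe results.

    "process"  -> liveness failed, so the API process itself is suspect
    "upstream" -> liveness is fine but readiness failed (Streamlit/upstream)
    "ok"       -> both probes returned 2xx
    """
    if not 200 <= healthz_status < 300:
        return "process"
    if not 200 <= readyz_status < 300:
        return "upstream"
    return "ok"


def upstream_failure_ratio(samples: Iterable[Tuple[int, int]]) -> float:
    """Fraction of (healthz, readyz) samples in a window labelled "upstream"."""
    labels = [classify_probe_pair(h, r) for h, r in samples]
    if not labels:
        return 0.0
    return labels.count("upstream") / len(labels)


# Hypothetical 5-minute window, sampled once per minute.
window = [(200, 200), (200, 503), (200, 503), (200, 200), (200, 503)]
ratio = upstream_failure_ratio(window)
should_alert = ratio > 0.5  # threshold chosen only for illustration
```

Keeping the two failure classes separate lets you route "process" to a restart policy and "upstream" to whoever owns the Streamlit sidecar.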

Gateway Prometheus metrics (RED)

When FLUXLIT_ENABLE_GATEWAY_PROMETHEUS_METRICS=1 and prometheus-client is installed (pip install "fluxlit[metrics]" or include in your image), the gateway exposes GET on FLUXLIT_GATEWAY_PROMETHEUS_METRICS_PATH (default /__fluxlit/metrics) in Prometheus text format.

  • fluxlit_gateway_requests_total — labels dispatch (api vs streamlit) and method_kind (HTTP method or WEBSOCKET).

  • fluxlit_gateway_request_duration_seconds — histogram by dispatch (wall time for one gateway request; scrape path responses are excluded).

If the histogram observe step raises (for example a client or label mismatch), the gateway logs a DEBUG line on the fluxlit.gateway logger with the dispatch label and exception context, then continues serving the request (metrics for that hop may be missing).

The path must not be under your api_mount_path or it will shadow API routes (the runtime logs a warning and disables metrics). Secure the endpoint at your ingress (allow only Prometheus scrapers) or keep metrics disabled in untrusted networks.

USE-style saturation (CPU, memory, file descriptors) is not emitted by FluxLit core; scrape the node or cAdvisor / kube-state-metrics alongside these application counters.

Metrics contract

The current metric catalog is exported as fluxlit.gateway.metrics.GATEWAY_PROMETHEUS_METRICS (each entry includes a machine-readable stability field mirrored here):

| Metric | Type | Labels | Stability |
| --- | --- | --- | --- |
| fluxlit_gateway_requests_total | Counter | dispatch, method_kind | Stable for 0.x dashboards. |
| fluxlit_gateway_request_duration_seconds | Histogram | dispatch | Stable name and labels; see bucket policy below. |

Label semantics (low cardinality by design):

  • dispatch: api or streamlit — see fluxlit.gateway.dispatch for when each applies.

  • method_kind: upper-case HTTP method (for example GET) or WEBSOCKET for WebSocket upgrades.

Histogram buckets: changing default bucket boundaries in FluxLit is treated as a semver-visible change (minor or major), because it affects histogram compatibility in Prometheus and Grafana heatmaps. Adding new buckets without removing old ones is less disruptive but should still be noted in the changelog.

FluxLit intentionally does not label by raw path, status code, exception type, user, tenant, or query string in core metrics. Add those in your own app metrics only after you have a cardinality budget.
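Because the label set is this small, a scrape can be summarized with a few lines of parsing. The sketch below sums fluxlit_gateway_requests_total by dispatch from Prometheus text-format output. The sample payload, its HELP text, and its values are made up for illustration, and `requests_by_dispatch` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict

# Made-up scrape output in Prometheus text format (values are illustrative).
SAMPLE = """\
# HELP fluxlit_gateway_requests_total Gateway requests.
# TYPE fluxlit_gateway_requests_total counter
fluxlit_gateway_requests_total{dispatch="api",method_kind="GET"} 12.0
fluxlit_gateway_requests_total{dispatch="api",method_kind="POST"} 3.0
fluxlit_gateway_requests_total{dispatch="streamlit",method_kind="GET"} 40.0
fluxlit_gateway_requests_total{dispatch="streamlit",method_kind="WEBSOCKET"} 2.0
"""

_SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)"
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_by_dispatch(text: str) -> Dict[str, float]:
    """Sum the request counter per dispatch label, ignoring other metrics."""
    totals = defaultdict(float)
    for raw in text.splitlines():
        match = _SAMPLE_LINE.match(raw)  # comment lines start with "#" and never match
        if not match or match["name"] != "fluxlit_gateway_requests_total":
            continue
        labels = dict(_LABEL.findall(match["labels"]))
        totals[labels.get("dispatch", "unknown")] += float(match["value"])
    return dict(totals)


totals = requests_by_dispatch(SAMPLE)
```

For anything beyond a quick check, a PromQL query such as sum by (dispatch) over the rate of the counter does the same aggregation server-side.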

Python logging filters

Use a logging.Filter to drop noisy loggers or scrub fields before logs reach stdout or a log aggregator (in addition to fluxlit.logging.redact for header maps).
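A minimal filter along those lines might look like the following. The class name, the noisy logger prefix, and the sensitive attribute names are placeholders to adapt; nothing here is part of FluxLit.

```python
import logging


class DropAndScrubFilter(logging.Filter):
    """Drop records from noisy loggers and mask sensitive extra fields."""

    NOISY_PREFIXES = ("uvicorn.access",)        # placeholder: loggers to silence
    SENSITIVE_ATTRS = ("authorization", "cookie")  # placeholder: extra keys to mask

    def filter(self, record: logging.LogRecord) -> bool:
        if record.name.startswith(self.NOISY_PREFIXES):
            return False  # returning False discards the record
        for attr in self.SENSITIVE_ATTRS:
            if hasattr(record, attr):
                setattr(record, attr, "[redacted]")
        return True


# Attach to a handler so the filter sees every record routed to that output.
handler = logging.StreamHandler()
handler.addFilter(DropAndScrubFilter())
```

Attaching the filter to the handler rather than to a logger matters: logger-level filters are not applied to records that propagate up from child loggers, while handler-level filters see everything the handler is asked to emit.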

OpenTelemetry tracing hook recipe

FluxLit does not bundle OpenTelemetry. Keep OTel dependencies in your app image and bridge the no-dependency FluxLit hook into your tracer. A runnable example is available in examples/otel_tracing/:

python -m pip install -e .
python -m pip install -r examples/otel_tracing/requirements.txt
fluxlit run examples.otel_tracing.app:app --no-pidfile

Minimal hook shape:

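from contextlib import contextmanager

from opentelemetry import trace

from fluxlit import set_trace_hook

# Tracer from the OpenTelemetry API; configure the SDK and exporter elsewhere in your app.
tracer = trace.get_tracer(__name__)


@contextmanager
def otel_trace_hook(name, attributes):
    # Open one span per hook call and copy over the non-None attributes.
    with tracer.start_as_current_span(name) as span:
        for key, value in attributes.items():
            if value is not None:
                span.set_attribute(key, value)
        yield


# Register the bridge once at startup.
set_trace_hook(otel_trace_hook)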

Gateway dispatch wraps each request in a span named fluxlit.gateway.request with low-cardinality attributes including fluxlit.dispatch, http.method_or_type, url.path, and request_id. The gateway → Streamlit HTTP hop adds fluxlit.gateway.upstream_http with http.request.method, url.full, and fluxlit.request_id (experimental attribute names; treat them as diagnostic until the 1.0 charter tightens them).

Because the browser hits a single port, ingress spans should label whether work happened on the API (/api/...) or the Streamlit proxy path.
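Before wiring in OpenTelemetry, the same hook shape can be tried with no dependencies at all. The sketch below assumes only what the minimal hook above shows: the hook is called with a span name and an attributes mapping and is used as a context manager. `timing_trace_hook` and `recorded_spans` are hypothetical names.

```python
import time
from contextlib import contextmanager

# Collected span records; a real app would log or export these instead.
recorded_spans = []


@contextmanager
def timing_trace_hook(name, attributes):
    """Time the wrapped block and record it, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded_spans.append(
            {
                "name": name,
                "duration_s": time.perf_counter() - start,
                "attributes": {k: v for k, v in attributes.items() if v is not None},
            }
        )


# Exercise the hook directly, the way a caller wrapping one request might.
with timing_trace_hook(
    "fluxlit.gateway.request",
    {"fluxlit.dispatch": "api", "request_id": None},
):
    time.sleep(0.01)
```

Registering it would follow the same pattern as the OpenTelemetry bridge, passing `timing_trace_hook` to set_trace_hook at startup.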

For a fuller deployment, also consider:

  1. FastAPI: use opentelemetry-instrumentation-fastapi on app.api.

  2. Outbound HTTP: instrument httpx if you propagate traces to upstreams beyond FluxLit’s hop.

  3. Streamlit subprocess: treat it as its own service unless you add custom propagation.

Trace context (W3C traceparent)

The gateway already forwards X-Request-ID to Streamlit on proxied HTTP and WebSockets.

traceparent / tracestate: when present on the incoming browser request, they pass through fluxlit.gateway.header_filter.filter_request_headers and are copied onto the gateway → Streamlit HTTP hop (same as other non–hop-by-hop client headers). FluxLit does not synthesize W3C trace IDs from request_id; inject them at your edge or in middleware if you need OTel interoperability. For OpenTelemetry, map the FluxLit hook spans above onto your tracer and align IDs with headers your mesh already emits.
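If you do inject the header yourself, the value has to follow the W3C Trace Context layout: version 00, a 32-hex-character trace ID, a 16-hex-character parent ID, and two hex characters of flags. The sketch below shows one way edge or middleware code could add a traceparent when the client did not send a valid one. `new_traceparent` and `ensure_traceparent` are hypothetical helpers, not FluxLit APIs.

```python
import re
import secrets
from typing import Dict

# Version 00 layout: 00-<trace-id: 32 hex>-<parent-id: 16 hex>-<flags: 2 hex>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def is_valid_traceparent(value: str) -> bool:
    """Check the layout and reject the all-zero IDs the spec forbids."""
    match = _TRACEPARENT.match(value)
    if not match:
        return False
    trace_id, parent_id, _flags = match.groups()
    return set(trace_id) != {"0"} and set(parent_id) != {"0"}


def new_traceparent(sampled: bool = True) -> str:
    """Build a fresh header value with random trace and parent IDs."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def ensure_traceparent(headers: Dict[str, str]) -> Dict[str, str]:
    """Keep a valid incoming traceparent; otherwise replace it with a new one."""
    existing = next((v for k, v in headers.items() if k.lower() == "traceparent"), None)
    if existing is not None and is_valid_traceparent(existing):
        return dict(headers)
    out = {k: v for k, v in headers.items() if k.lower() != "traceparent"}
    out["traceparent"] = new_traceparent()
    return out
```

In a deployment that already runs an OpenTelemetry SDK, its propagators generate and parse this header for you; the manual version is mainly useful at an edge that has no SDK.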

Correlation limits (gateway vs Streamlit)

  • Gateway-centered IDs: X-Request-ID is set in the gateway ASGI process and forwarded to the Streamlit upstream. Gateway access logs and nginx can align on that header.

  • Streamlit subprocess: Streamlit page code runs in the child process. The parent’s contextvars.ContextVar used for get_request_id() in the gateway is not automatically visible inside arbitrary Streamlit callbacks. For server-side httpx calls from Streamlit to /api, use ApiClient with propagate_request_id=True only when your code has set a correlation id in that process (or pass headers explicitly). Do not assume the browser’s request id appears in Streamlit without your own propagation.

Readiness

GET /api/readyz (hidden from OpenAPI) probes the Streamlit sidecar when FLUXLIT_STREAMLIT_UPSTREAM is set; see fluxlit.health. The probe requires a 2xx response from GET on the upstream root (not merely “any HTTP answer”). For Kubernetes-style probe configuration and curl examples, see Deployment. If probes fail in production, see Troubleshooting.

See also