feat(OBSERV-02): KPI analytics dashboard — active users, conversations & messages#1722
Draft
florian-muller wants to merge 13 commits into
Replace per-backend ad-hoc KPI wiring with a shared KpiObservabilityConfig in fred-core, covering log, Prometheus, and OpenSearch sinks independently. Add build_kpi_writer() factory so all three backends use identical sink stacking logic. Prod defaults enable Prometheus + OpenSearch; local dev enables log only.
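The sink stacking described above could look roughly like the following sketch. Only the names KpiObservabilityConfig and build_kpi_writer() come from the commit message; the field layout, the StackedKpiWriter class, the emit() signature, and the way sinks are passed in are assumptions made for illustration, not the actual fred-core API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Sink = Callable[[Event], None]


@dataclass
class SinkConfig:
    enabled: bool = False


@dataclass
class PrometheusSinkConfig(SinkConfig):
    port: int = 9900  # the commits mention port 9900 under the prometheus config


@dataclass
class KpiObservabilityConfig:
    # One independent section per sink (field layout is assumed).
    log: SinkConfig = field(default_factory=SinkConfig)
    prometheus: PrometheusSinkConfig = field(default_factory=PrometheusSinkConfig)
    opensearch: SinkConfig = field(default_factory=SinkConfig)


class StackedKpiWriter:
    """Fans each KPI event out to every enabled sink (hypothetical class)."""

    def __init__(self, sinks: Dict[str, Sink]) -> None:
        self.sinks = sinks

    def emit(self, name: str, value: float, dims: Optional[Dict[str, str]] = None) -> None:
        event: Event = {"name": name, "value": value, "dims": dict(dims or {})}
        for sink in self.sinks.values():
            sink(event)


def build_kpi_writer(config: KpiObservabilityConfig, available: Dict[str, Sink]) -> StackedKpiWriter:
    """Keep only the sinks whose config section is enabled (assumed signature)."""
    enabled = {
        name: sink
        for name, sink in available.items()
        if getattr(config, name).enabled
    }
    return StackedKpiWriter(enabled)


# Local-dev style config: log only. A prod-style config would enable
# prometheus and opensearch instead.
local_dev = KpiObservabilityConfig(log=SinkConfig(enabled=True))
log_events: List[Event] = []
search_events: List[Event] = []
writer = build_kpi_writer(
    local_dev,
    {"log": log_events.append, "prometheus": lambda e: None, "opensearch": search_events.append},
)
writer.emit("api.request_latency_ms", 12.5, {"actor_subtype": "anonymous"})
```

Because every backend would call the same factory with its own config, a disabled sink is never constructed into the stack, which is what keeps offline tests from reaching OpenSearch.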
…ntator
Add a shared FastAPI/Starlette middleware in fred-core that emits api.request_latency_ms for every request. Mounted in control-plane, knowledge-flow, and agent pods. Supports eager and lazy (lifespan-safe) KPIWriter resolution; skips health/readiness probes; tags unauthenticated calls as actor_subtype=anonymous. Removes prometheus_fastapi_instrumentator from knowledge-flow-backend. Also includes the KPI-ANALYTICS-RFC describing the full analytics architecture.
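A minimal sketch of such a middleware, written as a plain ASGI callable so it runs without FastAPI installed. The class name, the probe paths, the writer's emit() signature, and the scope key used to find the authenticated user are all hypothetical; only the metric name and the actor_subtype=anonymous tag come from the commit message.

```python
import asyncio
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical probe paths; the real skip list is not shown in this PR page.
SKIPPED_PATHS = {"/healthz", "/ready"}


class RequestLatencyMiddleware:
    """Times each HTTP request and emits api.request_latency_ms."""

    def __init__(
        self,
        app: Callable,
        writer: Optional[Any] = None,
        writer_factory: Optional[Callable[[], Any]] = None,
    ) -> None:
        self.app = app
        self._writer = writer  # eager resolution
        self._writer_factory = writer_factory  # lazy, lifespan-safe resolution

    def _resolve_writer(self) -> Optional[Any]:
        # Defer the factory call until the first measured request, so the
        # middleware can be mounted before the application context exists.
        if self._writer is None and self._writer_factory is not None:
            self._writer = self._writer_factory()
        return self._writer

    async def __call__(self, scope: Dict[str, Any], receive: Any, send: Any) -> None:
        if scope["type"] != "http" or scope["path"] in SKIPPED_PATHS:
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            dims: Dict[str, str] = {"path": scope["path"]}
            user_id = scope.get("user_id")  # hypothetical key set by an auth layer
            if user_id is None:
                dims["actor_subtype"] = "anonymous"
            else:
                dims["user_id"] = user_id
            writer = self._resolve_writer()
            if writer is not None:
                writer.emit("api.request_latency_ms", elapsed_ms, dims)
```

Emitting in a finally block means failed requests are still measured; whether the real middleware does the same is not stated in the commit message.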
- Fix ruff import-order in fred-core kpi/__init__.py and fred-runtime agent_app.py
- Fix ruff format in knowledge-flow tests/core/test_main_worker.py
- Add get_kpi_writer() and start_metrics_exporter() stubs to _FakeContainer
in control-plane test_main.py to match the new main.py composition root
- Update fred-runtime test configs to use the new KpiObservabilityConfig schema
(observability.kpi.{log,prometheus,opensearch}) instead of the removed
observability.metrics string and app.metrics_port fields; port 9900 is now
passed directly under prometheus config
- Disable opensearch KPI sink in all offline test fixtures (conftest.py) to
prevent OpenSearchKPIStore from attempting TCP connections under --disable-socket
- Add ObservabilityConfig with opensearch KPI disabled to knowledge-flow test
conftest so ingestion service tests no longer trigger socket blocks
…in test configs
- Fix ruff import-order in control-plane app/context.py and knowledge-flow tests/conftest.py
- Add offline KPI config (opensearch disabled) to test_history.py and test_openai_compat_router.py local config builders; these were missing the observability.kpi block, causing OpenSearchKPIStore to attempt TCP connections under --disable-socket
- Add /kpi/presets router to control-plane-backend with a preset system
- Add active_users_over_time preset: cardinality of user_id per time bucket, filtered to human actors on api.request_latency_ms
- Auto-select bucket interval (1s/1m/1h/1d) from since/until span, mirroring getPrecisionForRange() thresholds from frontend timeAxis.ts
- Use extended_bounds to fill empty buckets across the full range
- since/until are Python datetime params (ISO 8601); default to the last 30 days
- Add kpi/utils.py with resolve_interval() shared helper
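The interval selection and the aggregation it feeds can be sketched as follows. The span thresholds are illustrative assumptions (the real ones mirror getPrecisionForRange() in timeAxis.ts and are not shown here), as are the function signatures and the name and @timestamp field names; dims.actor_type and dims.user_id follow the test plan. The date_histogram, cardinality, and extended_bounds constructs are standard OpenSearch query DSL.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def resolve_interval(since: datetime, until: datetime) -> str:
    """Pick a bucket size from the requested span (thresholds are assumed)."""
    span = until - since
    if span <= timedelta(minutes=5):
        return "1s"
    if span <= timedelta(hours=6):
        return "1m"
    if span <= timedelta(days=7):
        return "1h"
    return "1d"


def build_active_users_query(since: datetime, until: datetime) -> Dict[str, Any]:
    """Distinct user_id per time bucket for human actors."""
    bounds = {"min": since.isoformat(), "max": until.isoformat()}
    return {
        "size": 0,  # only aggregations are needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": "api.request_latency_ms"}},  # field name assumed
                    {"term": {"dims.actor_type": "human"}},
                    {"range": {"@timestamp": {"gte": bounds["min"], "lte": bounds["max"]}}},
                ]
            }
        },
        "aggs": {
            "per_bucket": {
                "date_histogram": {
                    "field": "@timestamp",  # field name assumed
                    "fixed_interval": resolve_interval(since, until),
                    "min_doc_count": 0,
                    # Forces empty buckets to exist across the whole range.
                    "extended_bounds": bounds,
                },
                "aggs": {"active_users": {"cardinality": {"field": "dims.user_id"}}},
            }
        },
    }


# Default window from the commit message: the last 30 days.
until = datetime.now(timezone.utc)
query = build_active_users_query(until - timedelta(days=30), until)
```

With these assumed thresholds, the default 30-day window resolves to daily buckets.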
…ive_users_over_time)
- Add AnalyticsPage with active-users-over-time chart wired to the /kpi/presets/active_users_over_time endpoint
- Add TimeRangeSelector molecule (Grafana-style): preset list + custom since/until panel side-by-side in a dropdown
- Add DateTimeInput atom with 24h locale forcing and styled segments
- Add i18n keys (en/fr) under rework.analytics.*
- Extend MaterialIconType with refresh, schedule, edit_calendar, expand_less, expand_more
- Update OpenAPI types and RTK hook alias to active_users_over_time
- Sync document.documentElement.lang with i18n language for native datetime-local 24h formatting in Chrome
…eChart molecule
Introduce a common TimeSeriesResponse/TimeSeriesPoint contract for time-bucketed KPI presets. Replace the hand-rolled BarChart with a reusable TimeSeriesLineChart molecule (Recharts) that resolves design-system tokens at runtime. Move the refresh button to the page header next to the time range selector. Add a KPI README for future preset authors.
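As a rough illustration of the shared contract, the sketch below maps date_histogram buckets onto a rows/interval payload. The rows and interval keys match the response shape quoted in the test plan; the timestamp and value field names and the to_time_series() helper are assumptions, and the real models may carry additional fields.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TimeSeriesPoint:
    timestamp: str  # field name assumed
    value: float  # field name assumed


@dataclass
class TimeSeriesResponse:
    rows: List[TimeSeriesPoint]
    interval: str


def to_time_series(
    buckets: List[Dict[str, Any]], interval: str, metric: Optional[str] = None
) -> TimeSeriesResponse:
    """Convert date_histogram buckets into the common response type.

    With metric=None the bucket's doc_count is used (plain event counts);
    otherwise the named sub-aggregation's value is used (e.g. a cardinality).
    """
    rows = []
    for bucket in buckets:
        raw = bucket["doc_count"] if metric is None else bucket[metric]["value"]
        rows.append(TimeSeriesPoint(timestamp=bucket["key_as_string"], value=float(raw)))
    return TimeSeriesResponse(rows=rows, interval=interval)


# Buckets shaped like an OpenSearch date_histogram result (sample data only).
sample_buckets = [
    {"key_as_string": "2024-01-01T00:00:00.000Z", "doc_count": 40, "active_users": {"value": 3}},
    {"key_as_string": "2024-01-02T00:00:00.000Z", "doc_count": 0, "active_users": {"value": 0}},
]
payload = asdict(to_time_series(sample_buckets, "1d", metric="active_users"))
```

Keeping every time-bucketed preset on one response type is what lets a single chart component render all of them.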
Add ScalarResponse common type and unique_users_total preset (cardinality aggregation over the full time range). Display the result in a new KpiStatCard molecule — label top-left, number centred — placed to the left of the active users line chart in a responsive flex row that wraps on narrow viewports.
…hes on timeRange change
… dashboard
- Emit session.created_total at session creation (dims: team_id, scope_type, agent_instance_id, user_id)
- Fix agent.turn_completed actor from system to human so turns carry user attribution
- Add sessions_over_time and messages_over_time preset endpoints
- Add KpiSection/KpiRow layout molecule with compactFirst/compactLast support
- Wire two new charts and totals (summed client-side) into the analytics page
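The session-creation event from the first bullet might be emitted along these lines. The metric name and the dims come from the commit message; the writer interface, the record_session_created() helper, the actor payload shape, and deriving scope_type from the presence of team_id are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional


class RecordingWriter:
    """Hypothetical stand-in for the real KPI writer; it only stores events."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def emit(self, name: str, value: float, dims: Dict[str, str], actor: Dict[str, str]) -> None:
        self.events.append({"name": name, "value": value, "dims": dims, "actor": actor})


def record_session_created(
    writer: RecordingWriter,
    *,
    user_id: str,
    agent_instance_id: str,
    team_id: Optional[str] = None,
) -> None:
    # Assumption: a session with a team_id is team-scoped, otherwise personal.
    scope_type = "team" if team_id else "personal"
    dims = {
        "scope_type": scope_type,
        "agent_instance_id": agent_instance_id,
        "user_id": user_id,
    }
    if team_id:
        dims["team_id"] = team_id
    # A human actor (rather than system) is what carries user attribution.
    writer.emit("session.created_total", 1, dims, actor={"type": "human", "user_id": user_id})


kpi = RecordingWriter()
record_session_created(kpi, user_id="u-1", agent_instance_id="agent-7", team_id="team-a")
record_session_created(kpi, user_id="u-2", agent_instance_id="agent-7")
```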
Summary
- api.request_latency_ms: already shipped; drives the active users metrics
- Emit session.created_total at session creation with dims team_id, scope_type (personal/team), agent_instance_id, and user attribution via actor
- Fix agent.turn_completed and agent.turn_error_total actor from system to human so turns carry user_id for analytics
- GET /kpi/presets/sessions_over_time: new conversations per time bucket
- GET /kpi/presets/messages_over_time: agent turns per time bucket
- /admin/analytics: KpiSection + KpiRow reusable layout molecules (compactFirst/compactLast props)

Test plan
- GET /control-plane/v1/kpi/presets/sessions_over_time returns {"rows": [...], "interval": "1d", ...}
- GET /control-plane/v1/kpi/presets/messages_over_time returns the same shape
- session.created_total KPI event appears in OpenSearch with dims.user_id set
- agent.turn_completed event has dims.actor_type = "human" and dims.user_id set (was "system" before)
- /admin/analytics renders three rows: unique users, conversations, messages