Commit 4f5ceb8

docs: update PRD for Bishop State focus and add Gamma slide deck update prompt
- Reframe PRD from multi-institution prototype to live Bishop State deployment
- Update all 12 sections to reflect what was actually built (7 models, readiness engine, NLQ with prompt history, Vercel deployment, methodology page)
- Remove KCTCS as co-primary institution; retain as future onboarding target
- Add concrete delivered metrics (4,000 students, 83.9% high readiness, live URL)
- Add docs/gamma-update-prompt.md with ready-to-paste Gamma.app prompt for updating the slide deck

Closes #63
2 files changed

**AI-Powered Student Success Analytics – Product Requirements Document (PRD)**

**1. Overview**

This Product Requirements Document (PRD) defines the goals, requirements, and deliverables for the AI-Powered Student Success Analytics platform, developed by CodeBenders for the Datathon. The platform was built and deployed for **Bishop State Community College (BSCC)** — a historically Black community college serving ~4,000 students annually in Mobile, Alabama — and is designed to be extensible to other institutions using the Postsecondary Data Partnership (PDP) data standard.

The platform combines seven machine learning models, a rule-based readiness scoring engine, a natural-language query interface, and a live analytics dashboard to give advisors, faculty, and institutional leadership actionable, data-informed insights for improving student retention and success.

**2. Problem Statement**

Bishop State Community College faces challenges common to community colleges serving under-resourced student populations:

• Students are majority Black/African American (59%), with high rates of part-time enrollment (68%) and first-generation college attendance — populations for whom early intervention is most impactful.

• Existing data systems lack unified predictive capabilities. Advisors cannot quickly identify which students are at risk before academic difficulties compound.

• Gateway course bottlenecks — particularly in math and English — are a leading predictor of non-retention, but course-level risk is not surfaced in existing tools.

• Institutional reporting is slow and manual, limiting the ability to act on PDP data between annual submission cycles.
**3. Primary Users**

**Advisors** – Need early-warning insights and student-level risk indicators to prioritize caseloads.

**Institutional Researchers** – Need structured access to PDP + AR files + SIS data for analysis and federal reporting.

**Faculty** – Need course-level success indicators, gateway course insights, and readiness trends by cohort.

**Leadership** – Needs high-level retention, readiness, and enrollment metrics for resource planning and grant reporting.

**IT/Data Teams** – Need a streamlined, automated, validated data submission and ingestion workflow.

**4. Goals & Objectives**

1. Deliver a unified analytics dashboard integrating PDP, AR, and institutional data for Bishop State.

2. Provide seven predictive models covering retention, at-risk early warning, gateway math and English success, GPA risk, time-to-credential, and credential type outcomes.

3. Enable natural-language queries for fast, self-service analytics without SQL knowledge.

4. Surface a transparent, PDP-aligned readiness score for every student with human-readable explanations.

5. Improve student success metrics by enabling early, data-informed interventions.

**5. Scope**

IN SCOPE:

• Data ingestion pipeline (PDP → AR merge → institutional sources → Postgres/Supabase warehouse).

• Unified dashboard with NLQ (natural-language querying) and prompt history/audit trail.

• Seven predictive models: retention, at-risk early warning, gateway math success, gateway English success, GPA prediction, time-to-credential, credential type.

• Readiness index calculation (0.0–1.0 scale, PDP-aligned, rule-based with full traceability).

• Methodology page with research citations and worked examples.

• Live deployment to Vercel backed by hosted Supabase.

OUT OF SCOPE (for Datathon):

• Real-time pipelines beyond PDP/AR files.

• SIS integration requiring institutional credentials.

• GitHub Actions CI/CD (manual deploy script provided as interim solution).

**6. Institutional Requirements — Bishop State Community College**

• Role-based access to PDP dashboards for advisors, faculty, and leadership.

• A faculty-facing AI tool for chart generation and natural-language querying of student data.

• Course sequencing insights and identification of high-risk gateway courses.

• Readiness scoring that accounts for math placement level, enrollment intensity, and PDP momentum metrics.

• Transparent, explainable predictions that advisors can act on without data science expertise.

• FERPA-compliant data handling: no PII transmitted to LLM providers; student identifiers excluded from stored features.

**7. Functional Requirements**

FR1. Data Integration

• System must ingest PDP cohort and course files.

• System must ingest AR files and merge with PDP using unique student IDs.

• System must support mapping to institutional data schemas.
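The PDP + AR merge in FR1 can be sketched with pandas. The column names below (`student_guid`, `credential_sought`, etc.) are illustrative placeholders, not the actual PDP/AR file schemas, which vary by institution:

```python
import pandas as pd

# Hypothetical column names -- real PDP/AR schemas differ per institution.
pdp_cohort = pd.DataFrame({
    "student_guid": ["A1", "A2", "A3"],
    "cohort_term": ["FALL 2023"] * 3,
    "enrollment_intensity": ["FULL-TIME", "PART-TIME", "PART-TIME"],
})
ar_file = pd.DataFrame({
    "student_guid": ["A1", "A2"],
    "credential_sought": ["Associate", "Certificate"],
})

# Left-join AR attributes onto the PDP cohort by the shared student ID;
# indicator=True flags students missing from the AR file for validation.
merged = pdp_cohort.merge(ar_file, on="student_guid", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))  # 3 1
```

Keeping the `_merge` indicator makes the validation step explicit: any `left_only` student can be reported before the pipeline continues.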
FR2. Readiness Assessment

• System must compute a readiness score (0.0–1.0) composed of academic (40%), engagement (30%), and ML risk (30%) sub-scores.

• Score must be PDP-aligned, using the five PDP momentum metrics as inputs.

• Every score must be fully traceable to its input features (stored as JSONB, no PII).

• Score tier thresholds: High ≥ 0.65, Medium 0.40–0.64, Low < 0.40.
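FR2's weighted score and tier thresholds can be sketched as follows. Function names are hypothetical; the production engine additionally records the input features for traceability:

```python
def readiness_score(academic: float, engagement: float, ml_risk: float) -> float:
    """Weighted readiness index on a 0.0-1.0 scale (weights from FR2)."""
    for name, value in [("academic", academic), ("engagement", engagement),
                        ("ml_risk", ml_risk)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} sub-score must be in [0, 1]")
    # Academic 40% + engagement 30% + ML risk 30%
    return round(0.40 * academic + 0.30 * engagement + 0.30 * ml_risk, 3)

def readiness_tier(score: float) -> str:
    """Map a score onto the FR2 tier thresholds."""
    if score >= 0.65:
        return "High"
    if score >= 0.40:
        return "Medium"
    return "Low"

score = readiness_score(0.8, 0.7, 0.6)
print(score, readiness_tier(score))  # 0.71 High
```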
FR3. Predictive Analytics

• Seven predictive models: retention probability, at-risk alert level, gateway math success, gateway English success, first-semester GPA risk, time-to-credential, credential type.

• Models must provide calibrated probabilities, not just binary predictions.

• At-risk alerts must be consistent with retention probability (no contradictions).
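One common way to satisfy the calibrated-probability requirement is to wrap a classifier in scikit-learn's `CalibratedClassifierCV`. This sketch uses synthetic data and an arbitrary base model, not the project's trained models:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the merged student feature matrix (not real PDP data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Sigmoid (Platt) calibration via internal cross-validation, so
# predict_proba returns probabilities suitable for risk tiers.
model = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    method="sigmoid", cv=3,
)
model.fit(X, y)
proba = model.predict_proba(X[:5])[:, 1]  # retention-style probabilities
print(proba.round(2))
```

Deriving alert tiers from the same calibrated probability (rather than a separate model) is one way to guarantee FR3's no-contradiction rule.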
FR4. Dashboard Requirements

• KPI tiles: overall retention rate, at-risk count, average readiness score, enrollment counts.

• Charts: retention risk distribution, readiness distribution, at-risk breakdown.

• Student-level drill-down with all prediction columns visible.

• Filtering by cohort, term, demographic attributes, and credential type.

FR5. AI Querying

• NLQ interface must translate natural-language prompts into SQL and return visualizations.

• Supported query types: retention trends, readiness distributions, gateway course performance, demographic equity gaps.

• All queries must be logged to a prompt history panel (client) and server-side audit log (JSONL).

• Users must be able to re-run any prior query from the history panel.
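A server-side JSONL audit entry of the kind FR5 describes might look like the sketch below. The file path and field names are assumptions, not the deployed schema:

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("nlq_audit.jsonl")  # hypothetical path for illustration

def log_query(prompt: str, generated_sql: str, row_count: int) -> dict:
    """Append one NLQ event as a single JSON line (JSONL) for compliance export."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "prompt": prompt,      # the natural-language question only -- no PII
        "sql": generated_sql,
        "rows": row_count,
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

log_query("retention rate by cohort",
          "SELECT cohort, AVG(retained) FROM students GROUP BY cohort", 4)
entries = [json.loads(line) for line in AUDIT_LOG.read_text().splitlines()]
print(entries[-1]["rows"])  # 4
```

Because each event is one self-contained JSON line, the log can be exported or filtered with standard line-oriented tools during a compliance review.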
FR6. Role-Based Access

• Admin, Advisor, IR, Faculty, Leadership roles.

• Access rules must define PDP visibility, AR visibility, and student-level data controls.

FR7. Reporting & Methodology

• Methodology page must document the scoring formula, research citations (PDP, CCRC, CAPR), and worked examples showing end-to-end score calculations for both high- and low-readiness students.

• Server-side query audit log must be exportable for compliance review.

**8. Non-Functional Requirements**

NFR1. Performance – Dashboard responses must render within 2–4 seconds for typical queries.

NFR2. Security – PDP and AR files contain PII; encryption at rest + access control required. No student identifiers transmitted to LLM providers.

NFR3. Maintainability – Models must be retrainable as new cohorts are added. Re-running the pipeline upserts scores without duplicates.

NFR4. Usability – Dashboards and methodology page must be accessible to non-technical users (advisors, faculty).

NFR5. Auditability – All data transformations and NLQ queries traceable for compliance (federal/state reporting). Prompt history logged server-side.

**9. Data Pipeline Requirements**

• Must clean, validate, and conform PDP files to required schema.

• Must support merging PDP cohort + PDP course + AR files into a unified student-level dataset.

• Seven ML models trained and scored against the merged dataset in a single pipeline run.

• Rule-based readiness scoring run as a separate, re-runnable step after the ML pipeline.

• All outputs upserted to Postgres (Supabase) — no duplicates on re-run.

• Data refreshed by re-running `scripts/deploy.sh --with-data`.
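The no-duplicates-on-re-run requirement maps naturally onto Postgres's `INSERT ... ON CONFLICT ... DO UPDATE`. This self-contained demo uses SQLite, which accepts the same upsert form; the table shape is illustrative, not the actual warehouse schema:

```python
import sqlite3

# SQLite stands in for Supabase Postgres here; the ON CONFLICT syntax is shared.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student_scores (
    student_guid TEXT PRIMARY KEY,
    retention_probability REAL,
    readiness_score REAL)""")

UPSERT = """INSERT INTO student_scores
    (student_guid, retention_probability, readiness_score)
VALUES (?, ?, ?)
ON CONFLICT (student_guid) DO UPDATE SET
    retention_probability = excluded.retention_probability,
    readiness_score = excluded.readiness_score"""

# First pipeline run, then a re-run with refreshed scores -- no duplicate rows.
conn.execute(UPSERT, ("A1", 0.72, 0.68))
conn.execute(UPSERT, ("A1", 0.75, 0.70))
row_count = conn.execute("SELECT COUNT(*) FROM student_scores").fetchone()[0]
latest = conn.execute(
    "SELECT readiness_score FROM student_scores WHERE student_guid = 'A1'"
).fetchone()[0]
print(row_count, latest)  # 1 0.7
```

Keying the conflict clause on the student identifier is what makes the whole pipeline step idempotent: re-running it overwrites stale scores instead of inserting duplicates.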
**10. Success Metrics**

**4,000 Bishop State students** scored with retention probability, readiness level, and seven prediction columns.

**Live deployment** at Vercel, backed by hosted Supabase (US East region).

**Readiness engine** producing High (83.9%), Medium (16.1%) distributions with full PDP alignment.

**NLQ interface** with prompt history, re-run, and server-side audit logging.

**Methodology page** with research citations and worked examples for advisor trust and transparency.

**Seven ML models** trained with cross-validation and overfitting checks, performance metrics stored in database.

**11. Risks & Assumptions**

RISKS:

• AR and PDP data schemas vary by institution — onboarding additional institutions requires schema mapping work.

• Annual PDP submission cycles limit real-time insights between cohort years.

• Connection pooler configuration varies by Supabase region — must be verified per deployment.

ASSUMPTIONS:

• PDP + AR data is accessible and provided for Bishop State.

• Vercel serverless functions use the Supabase transaction pooler (port 6543), not the direct connection.

• Dashboard usage patterns will mirror reported advisor and faculty workflows.
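The pooler assumption amounts to choosing the right port when building the Postgres connection string. The host and credential values below are placeholders, not real Supabase settings:

```python
# Illustrative values -- actual Supabase project ref and credentials differ.
env = {
    "SUPABASE_DB_USER": "postgres.abcdefghijkl",
    "SUPABASE_DB_PASSWORD": "secret",
    "SUPABASE_DB_HOST": "aws-0-us-east-1.pooler.supabase.com",
}

def build_dsn(env: dict, pooled: bool = True) -> str:
    """Transaction pooler (6543) for serverless functions; direct (5432) otherwise."""
    port = 6543 if pooled else 5432
    return (f"postgresql://{env['SUPABASE_DB_USER']}:{env['SUPABASE_DB_PASSWORD']}"
            f"@{env['SUPABASE_DB_HOST']}:{port}/postgres")

print(build_dsn(env))
```

Serverless functions open and drop many short-lived connections, which is why the transaction pooler, not the direct connection, is assumed for the Vercel deployment.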
**12. Current Status & Next Steps**

DELIVERED:

• Full ML pipeline (7 models) trained and scored against 4,000 Bishop State students.

• Rule-based readiness engine (PDP-aligned) with audit logging.

• Live dashboard deployed to Vercel with Supabase backend.

• NLQ interface with prompt history and server-side audit trail.

• Methodology page with research citations, scoring formula, and worked examples.

NEXT STEPS:

• Set up GitHub Actions for automated Vercel deploy on push to `main` (interim: `scripts/deploy.sh`).

• Obtain `devcolor/codebenders-datathon` repo write access for CI/CD integration.

• Onboard additional institutions (University of Akron, KCTCS) using the same PDP-aligned pipeline.

• Add role-based access control (advisor vs. leadership vs. IR views).

• Explore scheduled pipeline re-runs for quarterly PDP refresh cycles.

docs/gamma-update-prompt.md

# Gamma.app Slide Deck Update Prompt

Paste the prompt below into Gamma.app to update the **AI-Powered Student Success Analytics** slide deck.

---

## Prompt

Update the existing "AI-Powered Student Success Analytics" slide deck for the CodeBenders Datathon submission. The platform has been fully built and deployed. Replace any draft/planned framing with current, delivered-state language. Apply the following changes slide by slide, then ensure visual consistency throughout.

---

### Overall Framing Changes

- The platform is no longer described as a multi-institution prototype. It is a **live, deployed system built for Bishop State Community College (BSCC)**, a historically Black community college in Mobile, Alabama (~4,000 students/year). It is designed to extend to other PDP institutions.
- Remove or de-emphasize KCTCS as a co-equal institution. References to University of Akron and KCTCS can appear as "future institution onboarding" examples only.
- Replace any "we will build" or "planned" language with "we built" and "delivered."

---

### Slide: Title / Cover

- Title: **AI-Powered Student Success Analytics**
- Subtitle: **Bishop State Community College × CodeBenders**
- Add: "Live at [your-vercel-url]"
- Tagline: *Turning PDP data into proactive student interventions*

---
### Slide: Problem Statement

Replace with:

**The Challenge at Bishop State Community College**

- 59% Black/African American student population — early intervention matters most for this community
- 68% part-time enrollment — students juggling work, family, and school need proactive outreach
- Gateway course bottlenecks in math and English are the #1 predictor of non-retention — but aren't surfaced in existing tools
- Advisors lack unified, predictive views of student risk before academic difficulties compound
- PDP reporting is manual and annual — no mid-cycle alerting capability

---

### Slide: Solution Overview

**What We Built**

A full-stack AI analytics platform with three layers:

1. **ML Pipeline** — 7 predictive models trained on 4,000 Bishop State students (retention, at-risk, gateway math/English success, GPA risk, time-to-credential, credential type)
2. **Readiness Engine** — PDP-aligned rule-based scoring (academic 40% + engagement 30% + ML risk 30%) with full traceability and human-readable explanations
3. **Live Dashboard** — Natural-language query interface, KPI tiles, retention risk charts, prompt history with re-run and audit trail

---

### Slide: Architecture / Tech Stack

Update the architecture diagram to reflect:

- **Data Layer:** Bishop State PDP cohort + AR files → Python ML pipeline → Postgres (Supabase, hosted, US East)
- **ML Layer:** XGBoost + Random Forest + Logistic Regression, 7 models, scikit-learn
- **Application Layer:** Next.js 16 + React 19 + TypeScript, deployed on Vercel
- **AI Features:** OpenAI-powered NLQ → SQL → Recharts visualizations
- **Audit:** Server-side JSONL query log, prompt history in localStorage

Stack badges: Python · XGBoost · scikit-learn · Next.js · Supabase · Vercel · OpenAI

---
### Slide: The 7 Predictive Models

| Model | Output | Algorithm |
|-------|--------|-----------|
| Retention Prediction | Probability + risk tier | Logistic Regression |
| At-Risk Early Warning | URGENT / HIGH / MODERATE / LOW | Composite rule engine |
| Gateway Math Success | Pass probability | XGBoost |
| Gateway English Success | Pass probability | XGBoost |
| First-Semester GPA Risk | Low GPA probability | XGBoost |
| Time-to-Credential | Predicted years to completion | Random Forest Regressor |
| Credential Type | Associate / Certificate / Bachelor | Random Forest Classifier |

Trained on 4,000 Bishop State students with cross-validation and overfitting checks.

---

### Slide: Readiness Score

**PDP-Aligned Readiness Index**

Formula:

> Readiness = (Academic × 40%) + (Engagement × 30%) + (ML Risk × 30%)

Tiers: 🟢 High ≥ 0.65 · 🟡 Medium 0.40–0.64 · 🔴 Low < 0.40

Current Bishop State distribution:

- High Readiness: 83.9% (3,355 students)
- Medium Readiness: 16.1% (645 students)

Grounded in: PDP momentum metrics, CCRC Multiple Measures research, Bird et al. (2021) transparency in predictive analytics.

Every score is fully traceable — no black box.

---
### Slide: Dashboard Features

**What Advisors & Leadership See**

- **KPI Tiles:** Overall retention rate, at-risk student count, average readiness score
- **Charts:** Retention risk distribution, readiness breakdown, at-risk alert levels
- **NLQ Query Interface:** Type a question in plain English → get a chart + data table
- **Prompt History:** Every query logged with timestamp, re-runnable in one click
- **Methodology Page:** Research citations, scoring formula, worked examples (Maria T. → 0.699 High; Jordan M. → 0.386 Low)

---

### Slide: FERPA & Transparency

**Built for Institutional Trust**

- No PII transmitted to any LLM provider — only aggregate behavioral metrics (GPA group, completion rate, placement level)
- Student GUIDs excluded from stored features
- Every readiness score traceable to its inputs
- Server-side audit log of all NLQ queries (JSONL)
- Methodology page publicly accessible for advisor onboarding

Complies with FERPA §99.31(a)(1) for legitimate educational interest use.

---

### Slide: Results & Impact

**Delivered for Bishop State**

- ✅ 4,000 students scored across 7 prediction dimensions
- ✅ Live dashboard deployed (Vercel + Supabase)
- ✅ NLQ interface with prompt history and audit trail
- ✅ PDP-aligned readiness engine with research citations
- ✅ Methodology page with worked examples for advisor transparency
- ✅ Deploy script for ongoing data refresh

---

### Slide: Next Steps / Roadmap

- **CI/CD:** GitHub Actions for automated Vercel deploy on `main` push
- **Multi-institution:** Onboard University of Akron and KCTCS using the same PDP-aligned pipeline
- **Role-based access:** Advisor vs. Leadership vs. IR views
- **Scheduled refresh:** Quarterly PDP pipeline re-runs
- **Enhanced NLQ:** Demographic equity gap queries, cohort comparison

---

### Design Notes for Gamma

- Keep the existing color scheme and layout style
- Use data callout cards for the key numbers (4,000 students, 7 models, 83.9% High Readiness)
- The architecture slide should use a left-to-right flow diagram: Data → ML Pipeline → Supabase → Next.js/Vercel → User
- The readiness score slide should visually show the three weighted components adding up to the final score
- Add Bishop State's colors (navy and gold) as accent colors where appropriate
