Commit 4f5ceb8

docs: update PRD for Bishop State focus and add Gamma slide deck update prompt
- Reframe PRD from multi-institution prototype to live Bishop State deployment
- Update all 12 sections to reflect what was actually built (7 models, readiness engine, NLQ with prompt history, Vercel deployment, methodology page)
- Remove KCTCS as co-primary institution; retain as future onboarding target
- Add concrete delivered metrics (4,000 students, 83.9% high readiness, live URL)
- Add docs/gamma-update-prompt.md with ready-to-paste Gamma.app prompt for updating the slide deck

Closes #63
2 files changed

**AI-Powered Student Success Analytics – Product Requirements Document (PRD)**

**1. Overview**

This Product Requirements Document (PRD) defines the goals, requirements, and deliverables for the AI-Powered Student Success Analytics platform, developed by CodeBenders for the Datathon. The platform was built and deployed for **Bishop State Community College (BSCC)** — a historically Black community college serving ~4,000 students annually in Mobile, Alabama — and is designed to be extensible to other institutions using the Postsecondary Data Partnership (PDP) data standard.

The platform combines seven machine learning models, a rule-based readiness scoring engine, a natural-language query interface, and a live analytics dashboard to give advisors, faculty, and institutional leadership actionable, data-informed insights for improving student retention and success.

**2. Problem Statement**

Bishop State Community College faces challenges common to community colleges serving under-resourced student populations:

• Students are majority Black/African American (59%), with high rates of part-time enrollment (68%) and first-generation college attendance — populations for whom early intervention is most impactful.

• Existing data systems lack unified predictive capabilities. Advisors cannot quickly identify which students are at risk before academic difficulties compound.

• Gateway course bottlenecks — particularly in math and English — are a leading predictor of non-retention, but course-level risk is not surfaced in existing tools.

• Institutional reporting is slow and manual, limiting the ability to act on PDP data between annual submission cycles.
**3. Primary Users**

**Advisors** – Need early-warning insights and student-level risk indicators to prioritize caseloads.

**Institutional Researchers** – Need structured access to PDP + AR files + SIS data for analysis and federal reporting.

**Faculty** – Need course-level success indicators, gateway course insights, and readiness trends by cohort.

**Leadership** – Needs high-level retention, readiness, and enrollment metrics for resource planning and grant reporting.

**IT/Data Teams** – Need a streamlined, automated, validated data submission and ingestion workflow.

**4. Goals & Objectives**

1. Deliver a unified analytics dashboard integrating PDP, AR, and institutional data for Bishop State.

2. Provide seven predictive models covering retention, at-risk early warning, gateway math and English success, GPA risk, time-to-credential, and credential type outcomes.

3. Enable natural-language queries for fast, self-service analytics without SQL knowledge.

4. Surface a transparent, PDP-aligned readiness score for every student with human-readable explanations.

5. Improve student success metrics by enabling early, data-informed interventions.

**5. Scope**

IN SCOPE:

• Data ingestion pipeline (PDP → AR merge → institutional sources → Postgres/Supabase warehouse).

• Unified dashboard with NLQ (natural-language querying) and prompt history/audit trail.

• Seven predictive models: retention, at-risk early warning, gateway math success, gateway English success, GPA prediction, time-to-credential, credential type.

• Readiness index calculation (0.0–1.0 scale, PDP-aligned, rule-based with full traceability).

• Methodology page with research citations and worked examples.

• Live deployment to Vercel backed by hosted Supabase.

OUT OF SCOPE (for Datathon):

• Real-time pipelines beyond PDP/AR files.

• SIS integration requiring institutional credentials.

• GitHub Actions CI/CD (manual deploy script provided as interim solution).

**6. Institutional Requirements — Bishop State Community College**

• Role-based access to PDP dashboards for advisors, faculty, and leadership.

• A faculty-facing AI tool for chart generation and natural-language querying of student data.

• Course sequencing insights and identification of high-risk gateway courses.

• Readiness scoring that accounts for math placement level, enrollment intensity, and PDP momentum metrics.

• Transparent, explainable predictions that advisors can act on without data science expertise.

• FERPA-compliant data handling: no PII transmitted to LLM providers; student identifiers excluded from stored features.

**7. Functional Requirements**

FR1. Data Integration

• System must ingest PDP cohort and course files.

• System must ingest AR files and merge with PDP using unique student IDs.

• System must support mapping to institutional data schemas.
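The PDP + AR merge in FR1 can be sketched with pandas. The column names below (`student_guid`, `credential_sought`, etc.) are illustrative placeholders, not the actual PDP/AR file schemas, which vary by institution:

```python
import pandas as pd

# Hypothetical column names -- real PDP/AR schemas differ per institution.
pdp_cohort = pd.DataFrame({
    "student_guid": ["A1", "A2", "A3"],
    "cohort_term": ["FALL 2023"] * 3,
    "enrollment_intensity": ["FULL-TIME", "PART-TIME", "PART-TIME"],
})
ar_file = pd.DataFrame({
    "student_guid": ["A1", "A2"],
    "credential_sought": ["Associate", "Certificate"],
})

# Left-join AR attributes onto the PDP cohort by the shared student ID;
# indicator=True flags students missing from the AR file for validation.
merged = pdp_cohort.merge(ar_file, on="student_guid", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))  # 3 1
```

Keeping the `_merge` indicator makes the validation step explicit: any `left_only` student can be reported before the pipeline continues.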
FR2. Readiness Assessment

• System must compute a readiness score (0.0–1.0) composed of academic (40%), engagement (30%), and ML risk (30%) sub-scores.

• Score must be PDP-aligned, using the five PDP momentum metrics as inputs.

• Every score must be fully traceable to its input features (stored as JSONB, no PII).

• Score tier thresholds: High ≥ 0.65, Medium 0.40–0.64, Low < 0.40.
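FR2's weighted score and tier thresholds can be sketched as follows. Function names are hypothetical; the production engine additionally records the input features for traceability:

```python
def readiness_score(academic: float, engagement: float, ml_risk: float) -> float:
    """Weighted readiness index on a 0.0-1.0 scale (weights from FR2)."""
    for name, value in [("academic", academic), ("engagement", engagement),
                        ("ml_risk", ml_risk)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} sub-score must be in [0, 1]")
    # Academic 40% + engagement 30% + ML risk 30%
    return round(0.40 * academic + 0.30 * engagement + 0.30 * ml_risk, 3)

def readiness_tier(score: float) -> str:
    """Map a score onto the FR2 tier thresholds."""
    if score >= 0.65:
        return "High"
    if score >= 0.40:
        return "Medium"
    return "Low"

score = readiness_score(0.8, 0.7, 0.6)
print(score, readiness_tier(score))  # 0.71 High
```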
FR3. Predictive Analytics

• Seven predictive models: retention probability, at-risk alert level, gateway math success, gateway English success, first-semester GPA risk, time-to-credential, credential type.

• Models must provide calibrated probabilities, not just binary predictions.

• At-risk alerts must be consistent with retention probability (no contradictions).
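One common way to satisfy the calibrated-probability requirement is to wrap a classifier in scikit-learn's `CalibratedClassifierCV`. This sketch uses synthetic data and an arbitrary base model, not the project's trained models:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the merged student feature matrix (not real PDP data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Sigmoid (Platt) calibration via internal cross-validation, so
# predict_proba returns probabilities suitable for risk tiers.
model = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    method="sigmoid", cv=3,
)
model.fit(X, y)
proba = model.predict_proba(X[:5])[:, 1]  # retention-style probabilities
print(proba.round(2))
```

Deriving alert tiers from the same calibrated probability (rather than a separate model) is one way to guarantee FR3's no-contradiction rule.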
FR4. Dashboard Requirements

• KPI tiles: overall retention rate, at-risk count, average readiness score, enrollment counts.

• Charts: retention risk distribution, readiness distribution, at-risk breakdown.

• Student-level drill-down with all prediction columns visible.

• Filtering by cohort, term, demographic attributes, and credential type.

FR5. AI Querying

• NLQ interface must translate natural-language prompts into SQL and return visualizations.

• Supported query types: retention trends, readiness distributions, gateway course performance, demographic equity gaps.

• All queries must be logged to a prompt history panel (client) and server-side audit log (JSONL).

• Users must be able to re-run any prior query from the history panel.
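A server-side JSONL audit entry of the kind FR5 describes might look like the sketch below. The file path and field names are assumptions, not the deployed schema:

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("nlq_audit.jsonl")  # hypothetical path for illustration

def log_query(prompt: str, generated_sql: str, row_count: int) -> dict:
    """Append one NLQ event as a single JSON line (JSONL) for compliance export."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "prompt": prompt,      # the natural-language question only -- no PII
        "sql": generated_sql,
        "rows": row_count,
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

log_query("retention rate by cohort",
          "SELECT cohort, AVG(retained) FROM students GROUP BY cohort", 4)
entries = [json.loads(line) for line in AUDIT_LOG.read_text().splitlines()]
print(entries[-1]["rows"])  # 4
```

Because each event is one self-contained JSON line, the log can be exported or filtered with standard line-oriented tools during a compliance review.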
FR6. Role-Based Access

• Admin, Advisor, IR, Faculty, Leadership roles.

• Access rules must define PDP visibility, AR visibility, and student-level data controls.

FR7. Reporting & Methodology

• Methodology page must document the scoring formula, research citations (PDP, CCRC, CAPR), and worked examples showing end-to-end score calculations for both high- and low-readiness students.

• Server-side query audit log must be exportable for compliance review.

**8. Non-Functional Requirements**

NFR1. Performance – Dashboard responses must render within 2–4 seconds for typical queries.

NFR2. Security – PDP and AR files contain PII; encryption at rest + access control required. No student identifiers transmitted to LLM providers.

NFR3. Maintainability – Models must be retrainable as new cohorts are added. Re-running the pipeline upserts scores without duplicates.

NFR4. Usability – Dashboards and methodology page must be accessible to non-technical users (advisors, faculty).

NFR5. Auditability – All data transformations and NLQ queries traceable for compliance (federal/state reporting). Prompt history logged server-side.

**9. Data Pipeline Requirements**

• Must clean, validate, and conform PDP files to required schema.

• Must support merging PDP cohort + PDP course + AR files into a unified student-level dataset.

• Seven ML models trained and scored against the merged dataset in a single pipeline run.

• Rule-based readiness scoring run as a separate, re-runnable step after the ML pipeline.

• All outputs upserted to Postgres (Supabase) — no duplicates on re-run.

• Data refreshed by re-running `scripts/deploy.sh --with-data`.
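The no-duplicates-on-re-run requirement maps naturally onto Postgres's `INSERT ... ON CONFLICT ... DO UPDATE`. This self-contained demo uses SQLite, which accepts the same upsert form; the table shape is illustrative, not the actual warehouse schema:

```python
import sqlite3

# SQLite stands in for Supabase Postgres here; the ON CONFLICT syntax is shared.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student_scores (
    student_guid TEXT PRIMARY KEY,
    retention_probability REAL,
    readiness_score REAL)""")

UPSERT = """INSERT INTO student_scores
    (student_guid, retention_probability, readiness_score)
VALUES (?, ?, ?)
ON CONFLICT (student_guid) DO UPDATE SET
    retention_probability = excluded.retention_probability,
    readiness_score = excluded.readiness_score"""

# First pipeline run, then a re-run with refreshed scores -- no duplicate rows.
conn.execute(UPSERT, ("A1", 0.72, 0.68))
conn.execute(UPSERT, ("A1", 0.75, 0.70))
row_count = conn.execute("SELECT COUNT(*) FROM student_scores").fetchone()[0]
latest = conn.execute(
    "SELECT readiness_score FROM student_scores WHERE student_guid = 'A1'"
).fetchone()[0]
print(row_count, latest)  # 1 0.7
```

Keying the conflict clause on the student identifier is what makes the whole pipeline step idempotent: re-running it overwrites stale scores instead of inserting duplicates.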
**10. Success Metrics**

**4,000 Bishop State students** scored with retention probability, readiness level, and seven prediction columns.

**Live deployment** at Vercel, backed by hosted Supabase (US East region).

**Readiness engine** producing High (83.9%), Medium (16.1%) distributions with full PDP alignment.

**NLQ interface** with prompt history, re-run, and server-side audit logging.

**Methodology page** with research citations and worked examples for advisor trust and transparency.

**Seven ML models** trained with cross-validation and overfitting checks, performance metrics stored in database.

**11. Risks & Assumptions**

RISKS:

• AR and PDP data schemas vary by institution — onboarding additional institutions requires schema mapping work.

• Annual PDP submission cycles limit real-time insights between cohort years.

• Connection pooler configuration varies by Supabase region — must be verified per deployment.

ASSUMPTIONS:

• PDP + AR data is accessible and provided for Bishop State.

• Vercel serverless functions use the Supabase transaction pooler (port 6543), not the direct connection.

• Dashboard usage patterns will mirror reported advisor and faculty workflows.
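The pooler assumption amounts to choosing the right port when building the Postgres connection string. The host and credential values below are placeholders, not real Supabase settings:

```python
# Illustrative values -- actual Supabase project ref and credentials differ.
env = {
    "SUPABASE_DB_USER": "postgres.abcdefghijkl",
    "SUPABASE_DB_PASSWORD": "secret",
    "SUPABASE_DB_HOST": "aws-0-us-east-1.pooler.supabase.com",
}

def build_dsn(env: dict, pooled: bool = True) -> str:
    """Transaction pooler (6543) for serverless functions; direct (5432) otherwise."""
    port = 6543 if pooled else 5432
    return (f"postgresql://{env['SUPABASE_DB_USER']}:{env['SUPABASE_DB_PASSWORD']}"
            f"@{env['SUPABASE_DB_HOST']}:{port}/postgres")

print(build_dsn(env))
```

Serverless functions open and drop many short-lived connections, which is why the transaction pooler, not the direct connection, is assumed for the Vercel deployment.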
**12. Current Status & Next Steps**

DELIVERED:

• Full ML pipeline (7 models) trained and scored against 4,000 Bishop State students.

• Rule-based readiness engine (PDP-aligned) with audit logging.

• Live dashboard deployed to Vercel with Supabase backend.

• NLQ interface with prompt history and server-side audit trail.

• Methodology page with research citations, scoring formula, and worked examples.

NEXT STEPS:

• Set up GitHub Actions for automated Vercel deploy on push to `main` (interim: `scripts/deploy.sh`).

• Obtain `devcolor/codebenders-datathon` repo write access for CI/CD integration.

• Onboard additional institutions (University of Akron, KCTCS) using the same PDP-aligned pipeline.

• Add role-based access control (advisor vs. leadership vs. IR views).

• Explore scheduled pipeline re-runs for quarterly PDP refresh cycles.

docs/gamma-update-prompt.md

# Gamma.app Slide Deck Update Prompt

Paste the prompt below into Gamma.app to update the **AI-Powered Student Success Analytics** slide deck.

---

## Prompt

Update the existing "AI-Powered Student Success Analytics" slide deck for the CodeBenders Datathon submission. The platform has been fully built and deployed. Replace any draft/planned framing with current, delivered-state language. Apply the following changes slide by slide, then ensure visual consistency throughout.

---

### Overall Framing Changes

- The platform is no longer described as a multi-institution prototype. It is a **live, deployed system built for Bishop State Community College (BSCC)**, a historically Black community college in Mobile, Alabama (~4,000 students/year). It is designed to extend to other PDP institutions.
- Remove or de-emphasize KCTCS as a co-equal institution. References to University of Akron and KCTCS can appear as "future institution onboarding" examples only.
- Replace any "we will build" or "planned" language with "we built" and "delivered."

---

### Slide: Title / Cover

- Title: **AI-Powered Student Success Analytics**
- Subtitle: **Bishop State Community College × CodeBenders**
- Add: "Live at [your-vercel-url]"
- Tagline: *Turning PDP data into proactive student interventions*

---
### Slide: Problem Statement

Replace with:

**The Challenge at Bishop State Community College**

- 59% Black/African American student population — early intervention matters most for this community
- 68% part-time enrollment — students juggling work, family, and school need proactive outreach
- Gateway course bottlenecks in math and English are the #1 predictor of non-retention — but aren't surfaced in existing tools
- Advisors lack unified, predictive views of student risk before academic difficulties compound
- PDP reporting is manual and annual — no mid-cycle alerting capability

---

### Slide: Solution Overview

**What We Built**

A full-stack AI analytics platform with three layers:

1. **ML Pipeline** — 7 predictive models trained on 4,000 Bishop State students (retention, at-risk, gateway math/English success, GPA risk, time-to-credential, credential type)
2. **Readiness Engine** — PDP-aligned rule-based scoring (academic 40% + engagement 30% + ML risk 30%) with full traceability and human-readable explanations
3. **Live Dashboard** — Natural-language query interface, KPI tiles, retention risk charts, prompt history with re-run and audit trail

---

### Slide: Architecture / Tech Stack

Update the architecture diagram to reflect:

- **Data Layer:** Bishop State PDP cohort + AR files → Python ML pipeline → Postgres (Supabase, hosted, US East)
- **ML Layer:** XGBoost + Random Forest + Logistic Regression, 7 models, scikit-learn
- **Application Layer:** Next.js 16 + React 19 + TypeScript, deployed on Vercel
- **AI Features:** OpenAI-powered NLQ → SQL → Recharts visualizations
- **Audit:** Server-side JSONL query log, prompt history in localStorage

Stack badges: Python · XGBoost · scikit-learn · Next.js · Supabase · Vercel · OpenAI

---
### Slide: The 7 Predictive Models

| Model | Output | Algorithm |
|-------|--------|-----------|
| Retention Prediction | Probability + risk tier | Logistic Regression |
| At-Risk Early Warning | URGENT / HIGH / MODERATE / LOW | Composite rule engine |
| Gateway Math Success | Pass probability | XGBoost |
| Gateway English Success | Pass probability | XGBoost |
| First-Semester GPA Risk | Low GPA probability | XGBoost |
| Time-to-Credential | Predicted years to completion | Random Forest Regressor |
| Credential Type | Associate / Certificate / Bachelor | Random Forest Classifier |

Trained on 4,000 Bishop State students with cross-validation and overfitting checks.

---

### Slide: Readiness Score

**PDP-Aligned Readiness Index**

Formula:

> Readiness = (Academic × 40%) + (Engagement × 30%) + (ML Risk × 30%)

Tiers: 🟢 High ≥ 0.65 · 🟡 Medium 0.40–0.64 · 🔴 Low < 0.40

Current Bishop State distribution:

- High Readiness: 83.9% (3,355 students)
- Medium Readiness: 16.1% (645 students)

Grounded in: PDP momentum metrics, CCRC Multiple Measures research, Bird et al. (2021) transparency in predictive analytics.

Every score is fully traceable — no black box.

---
### Slide: Dashboard Features

**What Advisors & Leadership See**

- **KPI Tiles:** Overall retention rate, at-risk student count, average readiness score
- **Charts:** Retention risk distribution, readiness breakdown, at-risk alert levels
- **NLQ Query Interface:** Type a question in plain English → get a chart + data table
- **Prompt History:** Every query logged with timestamp, re-runnable in one click
- **Methodology Page:** Research citations, scoring formula, worked examples (Maria T. → 0.699 High; Jordan M. → 0.386 Low)

---

### Slide: FERPA & Transparency

**Built for Institutional Trust**

- No PII transmitted to any LLM provider — only aggregate behavioral metrics (GPA group, completion rate, placement level)
- Student GUIDs excluded from stored features
- Every readiness score traceable to its inputs
- Server-side audit log of all NLQ queries (JSONL)
- Methodology page publicly accessible for advisor onboarding

Complies with FERPA §99.31(a)(1) for legitimate educational interest use.

---

### Slide: Results & Impact

**Delivered for Bishop State**

- ✅ 4,000 students scored across 7 prediction dimensions
- ✅ Live dashboard deployed (Vercel + Supabase)
- ✅ NLQ interface with prompt history and audit trail
- ✅ PDP-aligned readiness engine with research citations
- ✅ Methodology page with worked examples for advisor transparency
- ✅ Deploy script for ongoing data refresh

---

### Slide: Next Steps / Roadmap

- **CI/CD:** GitHub Actions for automated Vercel deploy on `main` push
- **Multi-institution:** Onboard University of Akron and KCTCS using the same PDP-aligned pipeline
- **Role-based access:** Advisor vs. Leadership vs. IR views
- **Scheduled refresh:** Quarterly PDP pipeline re-runs
- **Enhanced NLQ:** Demographic equity gap queries, cohort comparison

---

### Design Notes for Gamma

- Keep the existing color scheme and layout style
- Use data callout cards for the key numbers (4,000 students, 7 models, 83.9% High Readiness)
- The architecture slide should use a left-to-right flow diagram: Data → ML Pipeline → Supabase → Next.js/Vercel → User
- The readiness score slide should visually show the three weighted components adding up to the final score
- Add Bishop State's colors (navy and gold) as accent colors where appropriate
