|
1 | 1 |
|
2 | | -KCTCS ML PIPELINE - SUMMARY REPORT |
| 2 | +BISHOP STATE ML PIPELINE - SUMMARY REPORT |
3 | 3 | ================================================================================ |
4 | | -Generated: 2025-10-28 17:29:21 |
| 4 | +Generated: 2026-02-21 12:59:23 |
5 | 5 |
|
6 | 6 | DATASET OVERVIEW |
7 | 7 | -------------------------------------------------------------------------------- |
8 | | -Total Students: 32,800 |
9 | | -Total Course Records: 145,918 |
| 8 | +Total Students: 4,000 |
| 9 | +Total Course Records: 4,000 |
10 | 10 |
|
11 | 11 | MODEL PERFORMANCE SUMMARY |
12 | 12 | -------------------------------------------------------------------------------- |
13 | 13 |
|
14 | 14 | 1. RETENTION PREDICTION MODEL |
15 | 15 | Algorithm: XGBoost Classifier |
16 | | - Features Used: 31 |
| 16 | + Features Used: 23 |
17 | 17 | Test Set Performance: |
18 | | - - Accuracy: 0.5224 |
19 | | - - AUC-ROC: 0.5355 |
| 18 | + - Accuracy: 0.7238 |
| 19 | + - AUC-ROC: 0.6134 |
20 | 20 |
|
21 | 21 | Risk Distribution: |
22 | | - Critical Risk 242 ( 0.7%) |
23 | | - High Risk 15,755 ( 48.0%) |
24 | | - Moderate Risk 15,202 ( 46.3%) |
25 | | - Low Risk 1,601 ( 4.9%) |
| 22 | + Critical Risk 0 ( 0.0%) |
| 23 | + High Risk 82 ( 2.1%) |
| 24 | + Moderate Risk 2,195 ( 54.9%) |
| 25 | + Low Risk 1,723 ( 43.1%) |
26 | 26 |
|
27 | 27 | 2. EARLY WARNING SYSTEM |
28 | 28 | Algorithm: Composite Risk Score (Retention + Performance Metrics) |
29 | 29 | Approach: Aligned with retention predictions to eliminate contradictions |
30 | 30 | Alert Distribution: |
31 | | - URGENT 487 ( 1.5%) |
32 | | - HIGH 8,344 ( 25.4%) |
33 | | - MODERATE 19,823 ( 60.4%) |
34 | | - LOW 4,146 ( 12.6%) |
| 31 | + URGENT 0 ( 0.0%) |
| 32 | + HIGH 21 ( 0.5%) |
| 33 | + MODERATE 2,210 ( 55.2%) |
| 34 | + LOW 1,769 ( 44.2%) |
35 | 35 |
|
36 | 36 | 3. TIME TO CREDENTIAL PREDICTION |
37 | 37 | Algorithm: XGBoost Regressor |
38 | | - Mean Predicted Time: 4.29 years |
39 | | - Median Predicted Time: 4.39 years |
| 38 | + Mean Predicted Time: 2.97 years |
| 39 | + Median Predicted Time: 2.96 years |
40 | 40 |
|
41 | 41 | 4. CREDENTIAL TYPE PREDICTION |
42 | 42 | Algorithm: Random Forest Classifier |
43 | 43 | Predicted Distribution: |
44 | | - No Credential 32,735 ( 99.8%) |
45 | | - Associate 59 ( 0.2%) |
46 | | - Bachelor 6 ( 0.0%) |
| 44 | + No Credential 4,000 (100.0%) |
47 | 45 |
|
48 | | -5. COURSE SUCCESS (GPA) PREDICTION |
49 | | - Algorithm: Random Forest Regressor |
50 | | - Mean Predicted GPA: 2.06 |
| 46 | +5. GATEWAY MATH SUCCESS PREDICTION (NEW!) |
| 47 | + Algorithm: XGBoost Classifier |
| 48 | + Students with Gateway Math Data: 4,000 |
| 49 | + Average Pass Probability: 0.0% |
51 | 50 |
|
52 | | - Performance vs. Expected: |
53 | | - As Expected 32,800 (100.0%) |
| 51 | + Gateway Math Risk Distribution: |
| 52 | + High Risk 4,000 (100.0%) |
| 53 | + |
| 54 | +6. GATEWAY ENGLISH SUCCESS PREDICTION (NEW!) |
| 55 | + Algorithm: XGBoost Classifier |
| 56 | + Students with Gateway English Data: 4,000 |
| 57 | + Average Pass Probability: 0.0% |
| 58 | + |
| 59 | + Gateway English Risk Distribution: |
| 60 | + High Risk 4,000 (100.0%) |
| 61 | + |
| 62 | +7. FIRST-SEMESTER LOW GPA (<2.0) PREDICTION (NEW!) |
| 63 | + Algorithm: XGBoost Classifier |
| 64 | + Average Low GPA Probability: 13.1% |
| 65 | + Students Predicted Low GPA: 231 |
| 66 | + |
| 67 | + Academic Risk Level Distribution: |
| 68 | + Low Risk 3,078 ( 77.0%) |
| 69 | + Moderate Risk 597 ( 14.9%) |
| 70 | + High Risk 258 ( 6.5%) |
| 71 | + Critical Risk 67 ( 1.7%) |
54 | 72 |
|
55 | 73 | OUTPUT: DATABASE TABLES |
56 | 74 | -------------------------------------------------------------------------------- |
57 | 75 | 1. student_predictions (Table) |
58 | 76 | - Student-level data with all predictions |
59 | | - - 32,800 students |
60 | | - - 156 columns |
| 77 | + - 4,000 students |
| 78 | + - 164 columns |
61 | 79 |
|
62 | 80 | 2. course_predictions (Table) |
63 | 81 | - Course-level data with predictions |
64 | | - - 145,918 records |
65 | | - - 151 columns |
| 82 | + - 4,000 records |
| 83 | + - 159 columns |
66 | 84 |
|
67 | 85 | 3. ml_model_performance (Table) |
68 | 86 | - Model performance metrics |
@@ -90,9 +108,20 @@ Credential Type: |
90 | 108 | - predicted_credential_label (text label) |
91 | 109 | - prob_no_credential, prob_certificate, prob_associate, prob_bachelor |
92 | 110 |
|
93 | | -Course Success: |
94 | | - - predicted_gpa (0-4 scale) |
95 | | - - gpa_performance (Above/Below/As Expected) |
| 111 | +Gateway Math Success: |
| 112 | + - gateway_math_probability (0-1 scale) |
| 113 | + - gateway_math_prediction (0=Won't Pass, 1=Will Pass) |
| 114 | + - gateway_math_risk (High Risk/Moderate Risk/Likely Pass/Very Likely Pass) |
| 115 | + |
| 116 | +Gateway English Success: |
| 117 | + - gateway_english_probability (0-1 scale) |
| 118 | + - gateway_english_prediction (0=Won't Pass, 1=Will Pass) |
| 119 | + - gateway_english_risk (High Risk/Moderate Risk/Likely Pass/Very Likely Pass) |
| 120 | + |
| 121 | +First-Semester GPA < 2.0 Risk: |
| 122 | + - low_gpa_probability (0-1 scale) |
| 123 | + - low_gpa_prediction (0=Adequate GPA, 1=Low GPA) |
| 124 | + - academic_risk_level (Low Risk/Moderate Risk/High Risk/Critical Risk) |
96 | 125 |
|
97 | 126 | ================================================================================ |
98 | 127 | PIPELINE COMPLETE! |
|
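Review note: several distributions in the new report collapse to a single class. Gateway Math and Gateway English are both 100.0% High Risk with a 0.0% average pass probability, and Credential Type is 100.0% No Credential. A check run before the summary is written might catch this. The following is a minimal sketch, not part of the pipeline: the helper names `distribution` and `is_degenerate` and the 99.5% threshold are assumptions for illustration, and the counts are copied from the report above.

```python
from collections import Counter


def distribution(labels):
    """Return {label: (count, percent)} for a list of predicted labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


def is_degenerate(labels, threshold=99.5):
    """True if one label covers at least `threshold` percent of predictions.

    The 99.5 default is an assumed cutoff, not a value from the pipeline.
    An empty list is treated as degenerate because nothing was predicted.
    """
    if not labels:
        return True
    dist = distribution(labels)
    return max(pct for _, pct in dist.values()) >= threshold


# Counts copied from the Bishop State report.
gateway_math = ["High Risk"] * 4000
retention = (
    ["High Risk"] * 82
    + ["Moderate Risk"] * 2195
    + ["Low Risk"] * 1723
)

for name, labels in [("gateway_math_risk", gateway_math),
                     ("retention risk", retention)]:
    status = "DEGENERATE" if is_degenerate(labels) else "ok"
    print(f"{name}: {status}")
```

Under the same threshold, the earlier KCTCS Credential Type distribution (99.8% No Credential) would also be flagged, so the cutoff would need tuning for outcomes that are legitimately rare.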