The Model Card Template That Passes FDA Pre-Cert Review
FDA's Software Pre-Certification program requires AI transparency. Here's the model card template that gets medical device AI approved faster.
The FDA Submission That Got Rejected
Startup: "We're submitting our AI diagnostic tool for FDA Pre-Cert."
FDA Reviewer: "Provide documentation: training data, model architecture, evaluation metrics, clinical validation."
Startup: "We have a white paper..."
FDA: "We need structured documentation. Model card, data card, and clinical evaluation report. Resubmit in 6 months."
The Delay: 6 months of scrambling to create documentation that should've existed from day one.
What FDA Pre-Cert Requires (The Checklist)
Three Documents:
- Model Card: What the AI does, how it was trained, limitations
- Data Card: Where training data came from, bias testing, quality control
- Clinical Evaluation Report: Real-world validation, safety monitoring
Timeline:
- Without documentation: 12-18 months to approval
- With documentation: 6-9 months
Cost Savings: 6 months of engineering time, plus faster time to market
The FDA-Ready Model Card Template
Section 1: Intended Use
What FDA Wants:
- Medical condition/disease targeted
- Patient population (age, sex, comorbidities)
- Clinical setting (hospital, clinic, home use)
- User (physician, nurse, patient)
Example:
INTENDED USE
Medical Condition: Type 2 Diabetes screening
Patient Population: Adults 18-75, no prior diabetes diagnosis
Clinical Setting: Primary care clinic
Primary User: Primary care physician
Decision Support: AI flags high-risk patients for lab testing (HbA1c)
What NOT to Say: "General health screening" (too vague—FDA will reject)
Section 2: Model Architecture
What FDA Wants:
- Algorithm type (e.g., "Gradient boosting classifier")
- Input features (e.g., "Age, BMI, blood pressure, family history")
- Output (e.g., "Risk score 0-100, with threshold at 70 for high-risk")
Example:
MODEL ARCHITECTURE
Algorithm: XGBoost (gradient boosting decision trees)
Version: XGBoost 1.7.0
Inputs: 12 clinical features (age, BMI, systolic BP, fasting glucose, etc.)
Output: Diabetes risk score (0-100)
Threshold: Score ≥70 = High Risk (recommend HbA1c lab test)
Why This Matters: FDA needs to understand how the AI makes decisions (interpretability requirement).
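The documented decision rule above is simple enough to sketch directly. This is an illustrative snippet, not a real device codebase: the function name is hypothetical, but the 0-100 score range and the ≥70 threshold come straight from the example model card.

```python
# Minimal sketch of the documented decision rule (illustrative only;
# the score range and threshold mirror the example model card above).
RISK_THRESHOLD = 70  # Score >= 70 -> High Risk, recommend HbA1c lab test

def classify_risk(risk_score: float) -> str:
    """Map a 0-100 diabetes risk score to the documented output label."""
    if not 0 <= risk_score <= 100:
        raise ValueError("Risk score must be in the documented 0-100 range")
    return "High Risk" if risk_score >= RISK_THRESHOLD else "Low Risk"

print(classify_risk(72.5))  # High Risk
print(classify_risk(41.0))  # Low Risk
```

Documenting the rule at this level of precision is exactly what lets an FDA reviewer trace an output back to a decision.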
Section 3: Training Data
What FDA Wants:
- Source (where data came from)
- Volume (how many patients)
- Demographics (age, sex, race, ethnicity)
- Date range (when data was collected)
- Quality control (how you ensured data accuracy)
Example:
TRAINING DATA
Source: Electronic Health Records from [Hospital System], IRB-approved (Protocol #12345)
Volume: 50,000 patients (2018-2023)
Demographics:
- Age: Mean 52 (range 18-75), SD 14
- Sex: 52% female, 48% male
- Race: 60% White, 20% Black, 8% Asian, 12% other/multiracial
- Ethnicity: 85% Non-Hispanic, 15% Hispanic (reported separately from race, per OMB standards)
Data Quality:
- Missing data: <5% per feature (imputed using median)
- Outliers: Values >99th percentile reviewed by clinician, corrected or removed
De-Identification: HIPAA-compliant (dates shifted, names removed, rare diagnoses aggregated)
Red Flag: If training demographics don't match the intended-use population (here, US adults), FDA will ask how you tested for and mitigated bias.
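The two quality-control steps in the example (median imputation for the <5% missing values, 99th-percentile outlier flagging for clinician review) can be sketched in a few lines. This is a hedged illustration using only the standard library; function names are hypothetical, and a real pipeline would likely use pandas.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the column median, per the
    documented <5% missing-data rule."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def flag_outliers(values, pct=0.99):
    """Flag values above the 99th percentile for clinician review (the
    model card says they are corrected or removed, not auto-dropped)."""
    cutoff = sorted(values)[int(pct * (len(values) - 1))]
    return [v for v in values if v > cutoff]

bmi = [22.0, None, 30.5, 27.1]
print(impute_median(bmi))  # [22.0, 27.1, 30.5, 27.1]
```

Note the design choice: outliers are flagged for human review, not silently dropped. FDA reviewers look for exactly that kind of human-in-the-loop quality control.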
Section 4: Evaluation Metrics
What FDA Wants:
- Accuracy, sensitivity, specificity (clinical gold standards)
- Performance by demographic subgroup (fairness testing)
- Comparison to human clinicians (is AI better?)
- Clinical impact (does AI improve patient outcomes?)
Example:
EVALUATION METRICS
Test Set: 10,000 patients (held out, not used in training)
Overall Performance:
- Sensitivity (Recall): 87% (95% CI: 85-89%)
- Specificity: 82% (95% CI: 80-84%)
- AUC: 0.91
Subgroup Performance (Fairness Testing):
- Female: Sensitivity 88%, Specificity 83%
- Male: Sensitivity 86%, Specificity 81%
- White: Sensitivity 89%, Specificity 84%
- Black: Sensitivity 84%, Specificity 79% (within 5pp, acceptable)
Comparison to Physician:
- Physician sensitivity: 78% (AI +9pp improvement)
- Physician specificity: 85% (AI -3pp, acceptable trade-off)
Clinical Impact:
- Early detection: AI flags 12% more high-risk patients than physician alone
- Estimated prevented complications: 200 cases/year per 10,000 patients screened
Why This Matters: FDA cares about patient outcomes, not just model accuracy.
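Sensitivity, specificity, and the subgroup breakdown above all derive from the same confusion-matrix counts. A minimal sketch (hypothetical function names, assuming binary labels where 1 = high-risk):

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute the two FDA-expected metrics from binary labels
    (1 = condition present / high-risk flag, 0 = absent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def subgroup_report(y_true, y_pred, groups):
    """Fairness testing: the same metrics per demographic subgroup.
    Assumes every subgroup contains both positive and negative cases."""
    report = {}
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        report[g] = sensitivity_specificity(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return report
```

In practice you'd use scikit-learn and add confidence intervals, but the point stands: the subgroup table in your model card should be reproducible from held-out test data by code this simple.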
Section 5: Limitations and Warnings
What FDA Wants:
- Known failure modes (when AI is unreliable)
- Contraindications (when NOT to use AI)
- Required human oversight (physician must review)
Example:
LIMITATIONS
Known Failure Modes:
- Lower accuracy for patients with rare comorbidities (<1% of population)
- Not validated for patients under 18 or over 75
- Not validated for Type 1 Diabetes (only Type 2)
Contraindications:
- Do NOT use for patients with pre-existing diabetes diagnosis
- Do NOT use as sole diagnostic tool (lab confirmation required)
Required Human Oversight:
- Physician must review all high-risk flags before ordering lab tests
- AI is decision support, not autonomous diagnosis
- Physician retains final clinical decision authority
Why This Matters: FDA wants proof you're not overselling the AI's capabilities.
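Contraindications shouldn't live only in the documentation: a pre-screening guard can enforce them in code so the AI never emits a score the model card says is unsupported. A hedged sketch (hypothetical function, using the validated population from the example card):

```python
# Illustrative eligibility guard enforcing the documented contraindications
# (names are hypothetical, not from a real device codebase).
def eligible_for_screening(age: int, prior_diabetes_dx: bool) -> bool:
    """Return False for patients outside the validated population, so the
    model is never invoked where the model card says it is unvalidated."""
    if prior_diabetes_dx:      # contraindication: existing diabetes diagnosis
        return False
    if not 18 <= age <= 75:    # not validated under 18 or over 75
        return False
    return True
```

Showing FDA that limitations are enforced programmatically, not just disclosed, is strong evidence you aren't overselling the AI.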
Section 6: Post-Market Surveillance
What FDA Wants:
- How you'll monitor AI performance in production
- What triggers a safety alert (accuracy drop, adverse events)
- How often you'll retrain/update the model
Example:
POST-MARKET SURVEILLANCE
Monitoring Plan:
- Monthly accuracy tracking on production data (random sample of 500 patients)
- Alert trigger: Sensitivity drops below 80% OR specificity drops below 75%
- Physician feedback: Track overrides, false positives, false negatives
Safety Reporting:
- Adverse events (patient harm) reported to FDA within 30 days
- Quarterly summary report to FDA (performance metrics, user feedback)
Model Updates:
- Annual retraining with new data (subject to FDA review)
- Version control: All model versions documented, old versions archived
Why This Matters: FDA Pre-Cert assumes continuous improvement (not "set it and forget it").
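The alert triggers in the monitoring plan above translate directly into a check you can run against each monthly sample. A minimal sketch (hypothetical function name; the 80%/75% floors come from the example plan):

```python
# Floors from the example monitoring plan above.
SENSITIVITY_FLOOR = 0.80
SPECIFICITY_FLOOR = 0.75

def check_monthly_metrics(sensitivity: float, specificity: float) -> list:
    """Return alert messages when production metrics cross the documented
    floors; an empty list means the model stays in service."""
    alerts = []
    if sensitivity < SENSITIVITY_FLOOR:
        alerts.append(f"ALERT: sensitivity {sensitivity:.0%} below 80% floor")
    if specificity < SPECIFICITY_FLOOR:
        alerts.append(f"ALERT: specificity {specificity:.0%} below 75% floor")
    return alerts

print(check_monthly_metrics(0.87, 0.82))  # [] -- within spec
```

In a real deployment this would feed a pager or auto-disable flow, but the documentation point is the same: the triggers are explicit numbers, not "we'll keep an eye on it."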
Real Example: Diabetic Retinopathy Detection AI
Product: AI analyzes retinal images, flags diabetic retinopathy.
FDA Submission:
Intended Use: Screen diabetic patients for retinopathy in primary care settings (not ophthalmology clinics).
Model: Convolutional neural network (ResNet-50 architecture)
Training Data: 120,000 retinal images from 5 hospital systems (2015-2020)
Evaluation:
- Sensitivity: 92% (FDA target: >85%)
- Specificity: 88%
- Comparison: Ophthalmologist sensitivity 95% (AI -3pp, acceptable for screening)
Limitations:
- Not for patients with cataracts (image quality too poor)
- Requires human ophthalmologist to confirm positive findings
Post-Market:
- Monthly monitoring: Random sample of 1,000 images re-reviewed by ophthalmologist
- Alert: If AI sensitivity drops below 88%, auto-disable pending investigation
FDA Decision: Approved (6 months from submission to clearance).
Why It Worked: Documentation was complete upfront. No back-and-forth with FDA.
The Data Card (Companion to Model Card)
What FDA Wants (separate document):
- Data provenance: IRB approval, patient consent, HIPAA compliance
- Bias testing: Performance by race, sex, age, socioeconomic status
- Data retention: How long you keep training data, why
- Data security: Encryption, access controls, audit logs
Example Snippet:
DATA CARD
Provenance:
- Source: [Hospital System] EHR database
- IRB: Approved under Protocol #12345, waiver of consent (de-identified data)
- HIPAA: Compliant (Business Associate Agreement signed)
Bias Testing:
- Racial parity: Sensitivity within 5pp across racial groups
- Sex parity: Sensitivity within 3pp (female 88%, male 86%)
- Age: Lower sensitivity for patients >70 (79% vs. 87% for 40-60 age group)
→ Mitigation: Added warning for physicians treating elderly patients
Data Retention:
- Training data: Retained for 10 years (FDA device record requirement)
- Production data: De-identified logs retained for 3 years (monitoring)
Data Security:
- Encryption: AES-256 at rest, TLS 1.3 in transit
- Access: Role-based (PM, ML engineer, clinical validator—7 people total)
- Audit logs: Reviewed quarterly by compliance team
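The parity bounds in the bias-testing section reduce to one number: the largest gap in a metric across subgroups, in percentage points. A sketch of that check (hypothetical function name; the female/male values come from the example data card):

```python
def parity_gap(metric_by_group: dict) -> float:
    """Largest gap (in percentage points) for one metric across subgroups,
    e.g. sensitivity keyed by sex or race."""
    vals = metric_by_group.values()
    return round((max(vals) - min(vals)) * 100, 1)

sens_by_sex = {"female": 0.88, "male": 0.86}
assert parity_gap(sens_by_sex) == 2.0  # within the documented 3pp bound
```

When the gap exceeds your documented bound, the data card should record a mitigation (as the age example above does with its physician warning), not just the number.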
Checklist: Is Your Model Card FDA-Ready?
- Intended use (specific medical condition, patient population, clinical setting)
- Model architecture (algorithm, inputs, outputs, threshold)
- Training data (source, volume, demographics, quality control)
- Evaluation metrics (sensitivity, specificity, AUC, subgroup performance)
- Comparison to human clinician (is AI better/worse?)
- Clinical impact (does AI improve patient outcomes?)
- Limitations (failure modes, contraindications, required oversight)
- Post-market surveillance (monitoring plan, safety reporting, update schedule)
If any box is unchecked, FDA will request more documentation.
Common PM Mistakes
Mistake 1: Claiming "General Purpose" AI
- Reality: FDA requires narrow, well-defined medical use cases
- Fix: Specify exact condition, population, setting (not "health screening")
Mistake 2: No Bias Testing
- Reality: FDA will reject if you haven't tested performance across demographics
- Fix: Report sensitivity/specificity by race, sex, age (minimum)
Mistake 3: No Post-Market Plan
- Reality: FDA Pre-Cert assumes you'll monitor and update the AI
- Fix: Document monitoring frequency, alert triggers, update process
Alex Welcing is a Senior AI Product Manager in New York who writes FDA-ready model cards before submitting medical device AI. His regulatory approvals take 6 months, not 18, because documentation is a product requirement from day one.