Data Source Research — API Investigation Results
Date: 2026-03-07
1. FMP (Financial Modeling Prep) — Premium Plan
Auth: ?apikey=KEY or header apikey: KEY
Rate Limit: 750 req/min (Premium)
Base URL: https://financialmodelingprep.com/api/v3/ (some v4)
Relevant Endpoints
| Need | Endpoint | Key Fields |
|---|---|---|
| Biotech Universe | GET /stock-screener?sector=Healthcare&industry=Biotechnology&exchange=NASDAQ,NYSE | symbol, companyName, marketCap, sector, industry, country |
| Company Profile | GET /profile/{symbol} | ipoDate, sector, industry, mktCap, price, isActivelyTrading |
| Daily OHLCV | GET /historical-price-full/{symbol}?from=YYYY-MM-DD&to=YYYY-MM-DD | date, open, high, low, close, adjClose, volume |
| Balance Sheet (Q) | GET /balance-sheet-statement/{symbol}?period=quarter | cashAndCashEquivalents, shortTermInvestments, longTermInvestments, totalDebt, totalCurrentLiabilities |
| Income Statement (Q) | GET /income-statement/{symbol}?period=quarter | operatingIncome (≈ EBIT), netIncome, revenue |
| Cash Flow (Q) | GET /cash-flow-statement/{symbol}?period=quarter | operatingCashFlow, capitalExpenditure |
| Key Metrics | GET /key-metrics/{symbol}?period=quarter | cashPerShare, debtToEquity, currentRatio |
| SEC Filings | GET /sec_filings/{symbol}?type=S-1 | type, fillingDate, link (S-1, 424B for offerings) |
| Stock Peers | GET /stock_peers?symbol={symbol} | peersList |
| Delisted | GET /delisted-companies | symbol, delistedDate, exchange |
| Symbol Changes | GET /symbol_change | old/new symbol tracking |
| Share Float | GET /shares_float?symbol={symbol} | freeFloat, floatShares, outstandingShares |
| Price Change | GET /stock-price-change/{symbol} | 1D, 5D, 1M, 3M, 6M, ytd, 1Y, 3Y, 5Y, 10Y, max |
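The endpoints in the table all share the same base URL and `apikey` query parameter, so request URLs can be assembled by one small helper. The following is a minimal sketch, not FMP's client: the function name `fmp_url`, the ticker `XYZ`, and the key `DEMO_KEY` are placeholders introduced here for illustration, and the endpoint paths still need to be confirmed with real calls.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL as recorded in these notes (some endpoints may live under v4).
FMP_BASE = "https://financialmodelingprep.com/api/v3"


def fmp_url(path: str, api_key: str, params: Optional[dict] = None) -> str:
    """Build a full FMP request URL with the apikey query parameter appended.

    `params` is a plain dict so that keys such as "from" (a Python keyword)
    can be passed without trouble.
    """
    query = dict(params or {})
    query["apikey"] = api_key
    # safe="," keeps comma-separated values such as NASDAQ,NYSE unescaped.
    return f"{FMP_BASE}/{path.lstrip('/')}?{urlencode(query, safe=',')}"


# Hypothetical ticker and key, for illustration only.
prices_url = fmp_url(
    "historical-price-full/XYZ",
    "DEMO_KEY",
    {"from": "2024-01-01", "to": "2024-12-31"},
)
```

Keeping URL construction separate from the HTTP call makes it easy to unit-test the collectors without spending request quota.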
FMP Coverage Summary
- ✅ Universe screening (sector/industry filter)
- ✅ IPO date (via profile)
- ✅ Daily OHLCV (full history)
- ✅ Balance sheet / Income / Cash flow (quarterly)
- ✅ SEC filings list (S-1, 424B for dilution tracking)
- ✅ Share float / outstanding shares
- ✅ Delisted companies tracking
- ❌ Clinical trial pipeline data (NOT available)
- ❌ FDA PDUFA calendar (NOT available)
Rate Limit Strategy
- Premium: 750 req/min
- Batch where possible (some endpoints support comma-separated symbols)
- Historical backfill: throttle to ~500 req/min to be safe
- Daily updates: well within limits (~100-200 tickers × 3-4 endpoints = ~300-800 req per run, under two minutes at the throttled rate)
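One way to hold the backfill at roughly 500 req/min is to space requests evenly. The sketch below is an assumption about how the throttle could look, not an agreed design: the class `Throttle` and the helper `backfill_minutes` are hypothetical names, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable


class Throttle:
    """Space calls evenly so that at most `per_minute` requests start per minute."""

    def __init__(
        self,
        per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_ok = 0.0  # earliest time at which the next request may start

    def wait(self) -> None:
        """Block until the next request slot is available, then reserve it."""
        now = self._clock()
        if now < self._next_ok:
            self._sleep(self._next_ok - now)
            now = self._next_ok
        self._next_ok = now + self.interval


def backfill_minutes(tickers: int, endpoints: int, per_minute: int = 500) -> float:
    """Estimated minutes for one request per ticker per endpoint."""
    return tickers * endpoints / per_minute
```

Calling `throttle.wait()` before each request is enough; at 500 req/min the interval is 0.12 s, and the upper bound of 200 tickers × 4 endpoints works out to 1.6 minutes.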
2. ClinicalTrials.gov API v2 (FREE, no key)
Base URL: https://clinicaltrials.gov/api/v2/
Auth: None required
Rate Limit: Unspecified, be polite (1-2 req/sec)
Tested Endpoints
Sponsor Search
GET /studies?query.spons=Neumora+Therapeutics&pageSize=3
Actual Response Structure (verified):
{
  "studies": [{
    "protocolSection": {
      "identificationModule": {
        "nctId": "NCT06029426",
        "briefTitle": "Study to Evaluate...",
        "officialTitle": "A Phase 3...",
        "organization": { "fullName": "Neumora Therapeutics, Inc." }
      },
      "statusModule": {
        "overallStatus": "COMPLETED", // RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, TERMINATED, etc.
        "startDateStruct": { "date": "2023-09-20" },
        "primaryCompletionDateStruct": { "date": "2024-12-03" },
        "completionDateStruct": { "date": "2025-01-15" }
      },
      "conditionsModule": {
        "conditions": ["Major Depressive Disorder"]
      },
      "designModule": {
        "phases": ["PHASE3"],
        "designInfo": { "allocation": "RANDOMIZED" },
        "enrollmentInfo": { "count": 383, "type": "ACTUAL" }
      },
      "armsInterventionsModule": {
        "interventions": [{
          "type": "DRUG",
          "name": "NMRA-335140",
          "otherNames": ["Navacaprant"]
        }]
      }
    }
  }]
}
Key Fields for Our Pipeline
| Our Need | CT.gov Field |
|---|---|
| Trial ID | identificationModule.nctId |
| Title | identificationModule.briefTitle |
| Sponsor | sponsorCollaboratorsModule.leadSponsor.name |
| Phase | designModule.phases[] → PHASE1, PHASE2, PHASE3, PHASE4 |
| Status | statusModule.overallStatus |
| Indication | conditionsModule.conditions[] |
| Drug Name | armsInterventionsModule.interventions[].name |
| Start Date | statusModule.startDateStruct.date |
| Est. Completion | statusModule.completionDateStruct.date |
| Enrollment | designModule.enrollmentInfo.count |
| FDA Regulated | oversightModule.isFdaRegulatedDrug |
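The field mapping in the table can be applied to a single study record roughly as follows. This is a sketch under assumptions: `dig` and `flatten_study` are hypothetical helper names, the output column names are illustrative, and any module missing from a response (for example `oversightModule`) simply yields `None` or an empty list.

```python
from typing import Any


def dig(obj: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning `default` as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def flatten_study(study: dict) -> dict:
    """Map one CT.gov study record onto the flat fields listed in the table."""
    p = study.get("protocolSection", {})
    interventions = dig(p, "armsInterventionsModule", "interventions", default=[])
    return {
        "nct_id": dig(p, "identificationModule", "nctId"),
        "title": dig(p, "identificationModule", "briefTitle"),
        "sponsor": dig(p, "sponsorCollaboratorsModule", "leadSponsor", "name"),
        "phases": dig(p, "designModule", "phases", default=[]),
        "status": dig(p, "statusModule", "overallStatus"),
        "conditions": dig(p, "conditionsModule", "conditions", default=[]),
        "drug_names": [i.get("name") for i in interventions],
        "start_date": dig(p, "statusModule", "startDateStruct", "date"),
        "completion_date": dig(p, "statusModule", "completionDateStruct", "date"),
        "enrollment": dig(p, "designModule", "enrollmentInfo", "count"),
        "fda_regulated_drug": dig(p, "oversightModule", "isFdaRegulatedDrug"),
    }
```

Tolerating missing modules matters here because not every study carries every section, and a collector that raises on the first absent key would stall a whole sponsor page.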
Query Patterns
# By sponsor name
/studies?query.spons=Company+Name&pageSize=50
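# Filter by phase (confirm the parameter name against the live API; v2 also accepts an expression such as filter.advanced=AREA[Phase]PHASE2 OR AREA[Phase]PHASE3)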
/studies?query.spons=Company&filter.phase=PHASE2,PHASE3
# Filter by status
/studies?query.spons=Company&filter.overallStatus=RECRUITING,ACTIVE_NOT_RECRUITING
# Combine
/studies?query.spons=Company&filter.phase=PHASE2,PHASE3&filter.overallStatus=RECRUITING
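The query patterns above can be generated by a single builder so that collectors do not concatenate strings by hand. This is a sketch only: `studies_url` is a hypothetical name, and the parameter names (`query.spons`, `filter.phase`, `filter.overallStatus`, `pageSize`) are copied from these notes as-is rather than independently verified, so each should be checked against a live response before use.

```python
from typing import Sequence
from urllib.parse import urlencode

# Base URL as recorded in these notes.
CTGOV_BASE = "https://clinicaltrials.gov/api/v2"


def studies_url(
    sponsor: str,
    phases: Sequence[str] = (),
    statuses: Sequence[str] = (),
    page_size: int = 50,
) -> str:
    """Build a /studies URL for one sponsor with optional phase/status filters."""
    params = {"query.spons": sponsor, "pageSize": page_size}
    if phases:
        # Parameter name taken from the notes; unverified assumption.
        params["filter.phase"] = ",".join(phases)
    if statuses:
        params["filter.overallStatus"] = ",".join(statuses)
    # urlencode turns spaces into "+"; safe="," keeps value lists readable.
    return f"{CTGOV_BASE}/studies?{urlencode(params, safe=',')}"


url = studies_url("Neumora Therapeutics", page_size=3)
```

With no filters, the builder reproduces the sponsor-search request shown under Tested Endpoints.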
Challenge: Sponsor Name Matching
- CT.gov uses free-text sponsor names (e.g., “Neumora Therapeutics, Inc.”)
- Need fuzzy matching from FMP company names → CT.gov sponsor names
- Recommendation: build a mapping table, manually verify top candidates
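A first pass at the mapping table could normalize both names and score them with the standard library before anything goes to manual review. The sketch below is one possible approach, not a settled method: the suffix list, the 0.85 threshold, and the names `normalize` and `best_sponsor_match` are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

# Legal suffixes stripped before comparison (illustrative, not exhaustive).
_SUFFIXES = {"inc", "corp", "corporation", "ltd", "limited", "co", "plc", "llc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_sponsor_match(
    company: str, sponsors: Iterable[str], threshold: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return (sponsor, score) for the closest sponsor name, or None if below threshold."""
    target = normalize(company)
    scored = [
        (SequenceMatcher(None, target, normalize(s)).ratio(), s) for s in sponsors
    ]
    if not scored:
        return None
    score, sponsor = max(scored)
    return (sponsor, score) if score >= threshold else None


match = best_sponsor_match(
    "Neumora Therapeutics Inc",
    ["Neumora Therapeutics, Inc.", "Other Pharma LLC"],
)
```

Matches near the threshold are the ones worth verifying by hand; anything returning `None` would go into the manual queue.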
3. SEC EDGAR (FREE, no key)
Base URL: Various
Auth: None, but requires User-Agent header with email
Rate Limit: 10 req/sec
XBRL Company Facts API
GET https://data.sec.gov/api/xbrl/companyfacts/CIK{10-digit-CIK}.json
Response: Full financial fact taxonomy with all historical filings
- us-gaap fields include: Assets, Cash, Debt, Revenue, etc.
- Can extract quarterly financials directly
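Pulling one concept out of a company-facts response might look like the sketch below. It assumes the response nests values as `facts` → `us-gaap` → concept → `units` → unit → list of entries with `end`, `val`, and `form` keys, which should be confirmed on a real payload; the function names and the `biotech-research` User-Agent label are hypothetical.

```python
def company_facts_url(cik) -> str:
    """Zero-pad the CIK to 10 digits and build the company-facts URL."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def edgar_headers(contact_email: str) -> dict:
    """EDGAR expects a User-Agent that identifies the caller with an email."""
    return {"User-Agent": f"biotech-research {contact_email}"}


def quarterly_values(facts: dict, tag: str, unit: str = "USD") -> list:
    """Return sorted, de-duplicated (period end, value) pairs reported in 10-Q filings.

    Assumes the nesting facts -> us-gaap -> tag -> units -> unit -> [entries].
    """
    entries = (
        facts.get("facts", {})
        .get("us-gaap", {})
        .get(tag, {})
        .get("units", {})
        .get(unit, [])
    )
    # The same period can appear in several filings, so collapse via a set.
    return sorted({(e["end"], e["val"]) for e in entries if e.get("form") == "10-Q"})
```

Filtering on 10-Q alone likely omits fiscal year-end values, which are typically reported in the 10-K, so a production extractor would need to merge both forms.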
Full-Text Search
GET https://efts.sec.gov/LATEST/search-index?q=offering&forms=S-1&entity=Company+Name
- Search for specific filing types (S-1, 424B = equity offerings)
- Can track dilution events
Assessment
- ✅ Good backup for financial data
- ✅ Can track equity offerings (S-1, 424B filings)
- ⚠️ More complex to parse than FMP
- Recommendation: Use FMP for financials (cleaner), EDGAR for offering/dilution tracking as supplement
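For the dilution-tracking side, a filings list can be reduced to offering-related forms with a small filter. This sketch assumes each filing is a dict with the `type` and `fillingDate` keys listed for the FMP sec_filings endpoint (the spelling `fillingDate` is taken from these notes); treating `S-1/A` amendments as offering-related is an added assumption, and the function names are hypothetical.

```python
from typing import Iterable, List


def is_offering_form(form_type: str) -> bool:
    """True for S-1 registrations and 424B prospectus variants (424B1..424B5 etc.)."""
    form = form_type.strip().upper()
    # "S-1/A" (amended S-1) is included as an assumption, not from the notes.
    return form in {"S-1", "S-1/A"} or form.startswith("424B")


def offering_filings(filings: Iterable[dict]) -> List[dict]:
    """Keep offering-related filings, newest first by filing date string."""
    hits = [f for f in filings if is_offering_form(f.get("type", ""))]
    # ISO-style date strings sort correctly as plain text.
    return sorted(hits, key=lambda f: f.get("fillingDate", ""), reverse=True)
```

This only flags that an offering-related document exists; the offering size still has to be read from the filing itself.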
4. FDA / PDUFA Calendar
Sources Investigated
openFDA API
- https://api.fda.gov/drug/drugsfda.json: approved drugs database
- Only contains APPROVED drugs, NOT upcoming PDUFA dates
- ❌ Not useful for our catalyst calendar
FDA Website
- PDUFA dates published at fda.gov but no structured API
- Requires scraping or manual tracking
Best Options for PDUFA Dates
- BioPharmCatalyst.com — best free source, but requires scraping
- FMP SEC Filings — companies often disclose PDUFA dates in 10-K/10-Q filings
- Manual curation — for our filtered universe (~100-200 stocks), manageable
- News scraping — PR Newswire / GlobeNewsWire for FDA announcements
Recommendation
- Start with manual PDUFA tracking for filtered universe
- Build a simple pdufa_calendar table, update weekly
- Later: consider scraping BioPharmCatalyst or building news ingestion
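A manually curated `pdufa_calendar` table could be as small as the sketch below. The column names are assumptions, since no schema has been designed yet, and the standard-library `sqlite3` module is used purely as a stand-in so the example runs without extra dependencies; the DDL would need to be adapted when the real DuckDB schema is written. The helper names `add_event` and `upcoming` are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Illustrative schema; column names are assumptions, not an agreed design.
DDL = """
CREATE TABLE IF NOT EXISTS pdufa_calendar (
    symbol      TEXT NOT NULL,
    drug_name   TEXT NOT NULL,
    pdufa_date  DATE NOT NULL,
    source      TEXT,
    updated_at  DATE NOT NULL,
    PRIMARY KEY (symbol, drug_name, pdufa_date)
)
"""


def add_event(conn, symbol: str, drug_name: str, pdufa_date: date,
              source: str, updated_at: date) -> None:
    """Insert one manually curated PDUFA date, stored as ISO date strings."""
    conn.execute(
        "INSERT INTO pdufa_calendar VALUES (?, ?, ?, ?, ?)",
        (symbol, drug_name, pdufa_date.isoformat(), source, updated_at.isoformat()),
    )


def upcoming(conn, as_of: date, days: int = 90) -> list:
    """Return (symbol, drug_name, pdufa_date) rows falling within the next `days` days."""
    end = as_of + timedelta(days=days)
    cur = conn.execute(
        "SELECT symbol, drug_name, pdufa_date FROM pdufa_calendar "
        "WHERE pdufa_date BETWEEN ? AND ? ORDER BY pdufa_date",
        (as_of.isoformat(), end.isoformat()),
    )
    return cur.fetchall()
```

Recording `source` and `updated_at` alongside each date keeps the weekly manual update auditable, which helps when a PDUFA date slips and the row has to be revised.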
5. Data Source Summary
| Data Category | Primary Source | Backup Source | API Key Needed |
|---|---|---|---|
| Biotech Universe | FMP Screener | — | Yes (FMP) |
| Company Profile + IPO Date | FMP Profile | — | Yes (FMP) |
| Daily OHLCV | FMP Historical | — | Yes (FMP) |
| Balance Sheet / Financials | FMP Statements | SEC EDGAR XBRL | Yes (FMP) |
| SEC Filings (dilution) | FMP sec_filings | EDGAR EFTS | Yes (FMP) |
| Share Float | FMP shares_float | — | Yes (FMP) |
| Clinical Trials | ClinicalTrials.gov | — | No |
| FDA PDUFA Calendar | Manual + News | BioPharmCatalyst | No |
6. Gaps & Risks
- Sponsor name matching — FMP company name ≠ CT.gov sponsor name. Need mapping.
- PDUFA dates — No clean API. Manual effort required initially.
- Dilution events — FMP has SEC filings list, but parsing offering size requires reading filings.
- Historical clinical trial data — CT.gov has current state, limited historical snapshots.
- FMP downtime — Single source dependency for most financial data. EDGAR as fallback.
7. Next Steps
- Get FMP API key from Dan
- Test FMP endpoints with real calls (verify field names, response structure)
- Design DuckDB schema based on confirmed field structures
- Build collectors: FMP → universe/prices/financials, CT.gov → pipeline
- Build validation layer
- Set up daily cron jobs