Data Source Research — API Investigation Results

Date: 2026-03-07


1. FMP (Financial Modeling Prep) — Premium Plan

Auth: ?apikey=KEY or header apikey: KEY
Rate Limit: 750 req/min (Premium)
Base URL: https://financialmodelingprep.com/api/v3/ (some endpoints are v4)

Relevant Endpoints

Need | Endpoint | Key Fields
Biotech Universe | GET /stock-screener?sector=Healthcare&industry=Biotechnology&exchange=NASDAQ,NYSE | symbol, companyName, marketCap, sector, industry, country
Company Profile | GET /profile/{symbol} | ipoDate, sector, industry, mktCap, price, isActivelyTrading
Daily OHLCV | GET /historical-price-full/{symbol}?from=YYYY-MM-DD&to=YYYY-MM-DD | date, open, high, low, close, adjClose, volume
Balance Sheet (Q) | GET /balance-sheet-statement/{symbol}?period=quarter | cashAndCashEquivalents, shortTermInvestments, longTermInvestments, totalDebt, totalCurrentLiabilities
Income Statement (Q) | GET /income-statement/{symbol}?period=quarter | operatingIncome (≈ EBIT), netIncome, revenue
Cash Flow (Q) | GET /cash-flow-statement/{symbol}?period=quarter | operatingCashFlow, capitalExpenditure
Key Metrics | GET /key-metrics/{symbol}?period=quarter | cashPerShare, debtToEquity, currentRatio
SEC Filings | GET /sec_filings/{symbol}?type=S-1 | type, fillingDate, link (S-1, 424B for offerings)
Stock Peers | GET /stock_peers?symbol={symbol} | peersList
Delisted | GET /delisted-companies | symbol, delistedDate, exchange
Symbol Changes | GET /symbol_change | old/new symbol tracking
Share Float | GET /shares_float?symbol={symbol} | freeFloat, floatShares, outstandingShares
Price Change | GET /stock-price-change/{symbol} | 1D, 5D, 1M, 3M, 6M, ytd, 1Y, 3Y, 5Y, 10Y, max
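
A minimal sketch of how the endpoint paths in this table could be turned into full request URLs, assuming the query-string form of auth (?apikey=KEY). The helper name fmp_url, the ticker XYZ and the key DEMO_KEY are placeholders invented for illustration; nothing here has been run against the live API.

```python
from urllib.parse import urlencode

FMP_BASE_URL = "https://financialmodelingprep.com/api/v3"


def fmp_url(path: str, api_key: str, **params: str) -> str:
    """Build a full FMP v3 URL with the API key in the query string."""
    query = dict(params)
    query["apikey"] = api_key
    # safe="," keeps comma-separated lists (e.g. exchanges) unescaped.
    return f"{FMP_BASE_URL}/{path.lstrip('/')}?{urlencode(query, safe=',')}"


# Placeholder ticker and key, for illustration only.
balance_sheet_url = fmp_url("balance-sheet-statement/XYZ", "DEMO_KEY", period="quarter")
screener_url = fmp_url(
    "stock-screener",
    "DEMO_KEY",
    sector="Healthcare",
    industry="Biotechnology",
    exchange="NASDAQ,NYSE",
)
```

Keeping URL construction in one function means a later switch to header-based auth, or to a v4 base URL for specific endpoints, only touches one place.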

FMP Coverage Summary

  • ✅ Universe screening (sector/industry filter)
  • ✅ IPO date (via profile)
  • ✅ Daily OHLCV (full history)
  • ✅ Balance sheet / Income / Cash flow (quarterly)
  • ✅ SEC filings list (S-1, 424B for dilution tracking)
  • ✅ Share float / outstanding shares
  • ✅ Delisted companies tracking
  • ❌ Clinical trial pipeline data (NOT available)
  • ❌ FDA PDUFA calendar (NOT available)

Rate Limit Strategy

  • Premium: 750 req/min
  • Batch where possible (some endpoints support comma-separated symbols)
  • Historical backfill: throttle to ~500 req/min to be safe
  • Daily updates: well within limits (~100-200 tickers × 3-4 endpoints ≈ 300-800 req per run, roughly one minute's worth of the 750 req/min quota)
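
The backfill throttle could be sketched as below: a small helper that spaces requests evenly at the ~500 req/min target. The class name Throttle and the injectable clock/sleep arguments are assumptions made here (they keep the helper testable without real waiting); this is one possible approach, not a confirmed design.

```python
import time
from typing import Callable


class Throttle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


# Usage: call throttle.wait() immediately before each HTTP request.
throttle = Throttle(per_minute=500)
```

At 500 req/min the interval is 0.12 s, so the upper-end daily run of 200 tickers × 4 endpoints = 800 requests would take about 96 seconds.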

2. ClinicalTrials.gov API v2 (FREE, no key)

Base URL: https://clinicaltrials.gov/api/v2/
Auth: None required
Rate Limit: Unspecified, be polite (1-2 req/sec)

Tested Endpoints

GET /studies?query.spons=Neumora+Therapeutics&pageSize=3

Actual Response Structure (verified):

{
  "studies": [{
    "protocolSection": {
      "identificationModule": {
        "nctId": "NCT06029426",
        "briefTitle": "Study to Evaluate...",
        "officialTitle": "A Phase 3...",
        "organization": { "fullName": "Neumora Therapeutics, Inc." }
      },
      "statusModule": {
        "overallStatus": "COMPLETED",  // RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, TERMINATED, etc.
        "startDateStruct": { "date": "2023-09-20" },
        "primaryCompletionDateStruct": { "date": "2024-12-03" },
        "completionDateStruct": { "date": "2025-01-15" }
      },
      "conditionsModule": {
        "conditions": ["Major Depressive Disorder"]
      },
      "designModule": {
        "phases": ["PHASE3"],
        "designInfo": { "allocation": "RANDOMIZED" },
        "enrollmentInfo": { "count": 383, "type": "ACTUAL" }
      },
      "armsInterventionsModule": {
        "interventions": [{
          "type": "DRUG",
          "name": "NMRA-335140",
          "otherNames": ["Navacaprant"]
        }]
      }
    }
  }]
}

Key Fields for Our Pipeline

Our Need | CT.gov Field (path under studies[].protocolSection)
Trial ID | identificationModule.nctId
Title | identificationModule.briefTitle
Sponsor | sponsorCollaboratorsModule.leadSponsor.name
Phase | designModule.phases[] → PHASE1, PHASE2, PHASE3, PHASE4
Status | statusModule.overallStatus
Indication | conditionsModule.conditions[]
Drug Name | armsInterventionsModule.interventions[].name
Start Date | statusModule.startDateStruct.date
Est. Completion | statusModule.completionDateStruct.date
Enrollment | designModule.enrollmentInfo.count
FDA Regulated | oversightModule.isFdaRegulatedDrug
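
The field mapping above can be sketched as a flattening function over one entry of studies[]. The function names dig and flatten_study and the output keys are made up for illustration. Missing modules are treated as None or empty lists, on the assumption that not every study carries every module (the verified sample response, for instance, shows no sponsorCollaboratorsModule).

```python
from typing import Any, Optional


def dig(obj: Any, *keys: str) -> Optional[Any]:
    """Walk nested dicts, returning None as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def flatten_study(study: dict) -> dict:
    """Map one entry of studies[] to a flat record (output keys are hypothetical)."""
    p = study.get("protocolSection", {})
    interventions = dig(p, "armsInterventionsModule", "interventions") or []
    return {
        "nct_id": dig(p, "identificationModule", "nctId"),
        "title": dig(p, "identificationModule", "briefTitle"),
        "sponsor": dig(p, "sponsorCollaboratorsModule", "leadSponsor", "name"),
        "phases": dig(p, "designModule", "phases") or [],
        "status": dig(p, "statusModule", "overallStatus"),
        "conditions": dig(p, "conditionsModule", "conditions") or [],
        "drug_names": [i.get("name") for i in interventions],
        "start_date": dig(p, "statusModule", "startDateStruct", "date"),
        "completion_date": dig(p, "statusModule", "completionDateStruct", "date"),
        "enrollment": dig(p, "designModule", "enrollmentInfo", "count"),
        "fda_regulated_drug": dig(p, "oversightModule", "isFdaRegulatedDrug"),
    }
```

Phases and conditions stay as lists because a single trial can report more than one of each.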

Query Patterns

# By sponsor name
/studies?query.spons=Company+Name&pageSize=50

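# Filter by phase (to verify: filter.phase is not among the documented v2 parameters; the documented route is an expression in filter.advanced)
/studies?query.spons=Company&filter.advanced=AREA[Phase](PHASE2+OR+PHASE3)

# Filter by status
/studies?query.spons=Company&filter.overallStatus=RECRUITING,ACTIVE_NOT_RECRUITING

# Combine
/studies?query.spons=Company&filter.advanced=AREA[Phase](PHASE2+OR+PHASE3)&filter.overallStatus=RECRUITING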
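
A hedged sketch of a URL builder for the sponsor and status patterns. The function name studies_url is hypothetical. The pageToken parameter reflects my understanding that v2 pages results through a token returned with each page, which should be confirmed against a real response. Phase filtering is left out and could instead be applied client-side on designModule.phases.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

CTGOV_BASE_URL = "https://clinicaltrials.gov/api/v2"


def studies_url(
    sponsor: str,
    statuses: Iterable[str] = (),
    page_size: int = 50,
    page_token: Optional[str] = None,
) -> str:
    """Build a /studies URL filtered by sponsor and, optionally, overall status."""
    params = {"query.spons": sponsor, "pageSize": page_size}
    status_list = ",".join(statuses)
    if status_list:
        params["filter.overallStatus"] = status_list
    if page_token:
        # Assumed paging parameter; verify against the live API.
        params["pageToken"] = page_token
    # safe="," keeps the comma-separated status list unescaped.
    return f"{CTGOV_BASE_URL}/studies?{urlencode(params, safe=',')}"


# Reproduces the tested call from this section.
tested_url = studies_url("Neumora Therapeutics", page_size=3)
```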

Challenge: Sponsor Name Matching

  • CT.gov uses free-text sponsor names (e.g., “Neumora Therapeutics, Inc.”)
  • Need fuzzy matching from FMP company names → CT.gov sponsor names
  • Recommendation: build a mapping table, manually verify top candidates
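
One way to seed the mapping table is to normalize both names and score them with the standard library's difflib, then review the candidates by hand. This is a sketch only: the suffix list, the 0.85 cutoff and the function names are assumptions chosen for illustration and have not been tuned on real FMP or CT.gov data.

```python
import difflib
import re
from typing import Iterable, Optional, Tuple

# Hypothetical list of legal suffixes to ignore; extend as needed.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "ltd", "limited", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_sponsor_match(
    company: str, sponsors: Iterable[str], cutoff: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return (sponsor, score) for the closest sponsor name, or None below cutoff."""
    target = normalize(company)
    best: Optional[Tuple[str, float]] = None
    for sponsor in sponsors:
        score = difflib.SequenceMatcher(None, target, normalize(sponsor)).ratio()
        if score >= cutoff and (best is None or score > best[1]):
            best = (sponsor, score)
    return best
```

Matches near the cutoff are the ones worth verifying manually before they go into the mapping table.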

3. SEC EDGAR (FREE, no key)

Base URL: Various (data.sec.gov for XBRL, efts.sec.gov for full-text search)
Auth: None, but requires a User-Agent header that includes a contact email
Rate Limit: 10 req/sec

XBRL Company Facts API

GET https://data.sec.gov/api/xbrl/companyfacts/CIK{10-digit-CIK}.json

Response: Full financial fact taxonomy with all historical filings

  • us-gaap concepts cover assets, cash, debt, revenue, etc. (actual tag names are more specific, e.g. CashAndCashEquivalentsAtCarryingValue)
  • Can extract quarterly financials directly

EDGAR Full-Text Search (EFTS)

GET https://efts.sec.gov/LATEST/search-index?q=offering&forms=S-1&entity=Company+Name

  • Search for specific filing types (S-1, 424B = equity offerings)
  • Can track dilution events
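
A minimal sketch of the Company Facts plumbing: zero-padding the CIK to 10 digits, building the required User-Agent header, and pulling 10-Q data points for one us-gaap tag. The function names and the app name are hypothetical. The nesting facts → us-gaap → tag → units → USD and the per-point form key reflect my understanding of the companyfacts JSON and should be checked against a real response.

```python
from typing import Dict, List, Union


def company_facts_url(cik: Union[int, str]) -> str:
    """Build the Company Facts URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def edgar_headers(contact_email: str, app_name: str = "biotech-research") -> Dict[str, str]:
    """EDGAR expects a User-Agent identifying the caller, including an email."""
    return {"User-Agent": f"{app_name} {contact_email}"}


def quarterly_values(facts: dict, tag: str, unit: str = "USD") -> List[dict]:
    """Return the data points reported on Form 10-Q for one us-gaap tag.

    Assumed layout: facts["facts"]["us-gaap"][tag]["units"][unit] is a list
    of points, each carrying a "form" key.
    """
    points = (
        facts.get("facts", {})
        .get("us-gaap", {})
        .get(tag, {})
        .get("units", {})
        .get(unit, [])
    )
    return [p for p in points if p.get("form") == "10-Q"]
```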

Assessment

  • ✅ Good backup for financial data
  • ✅ Can track equity offerings (S-1, 424B filings)
  • ⚠️ More complex to parse than FMP
  • Recommendation: Use FMP for financials (cleaner), EDGAR for offering/dilution tracking as supplement

4. FDA / PDUFA Calendar

Sources Investigated

openFDA API

  • https://api.fda.gov/drug/drugsfda.json — approved drugs database
  • Only contains APPROVED drugs, NOT upcoming PDUFA dates
  • ❌ Not useful for our catalyst calendar

FDA Website

  • PDUFA dates published at fda.gov but no structured API
  • Requires scraping or manual tracking

Best Options for PDUFA Dates

  1. BioPharmCatalyst.com — best free source, but requires scraping
  2. FMP SEC Filings — companies often disclose PDUFA dates in 10-K/10-Q filings
  3. Manual curation — for our filtered universe (~100-200 stocks), manageable
  4. News scraping — PR Newswire / GlobeNewsWire for FDA announcements

Recommendation

  • Start with manual PDUFA tracking for filtered universe
  • Build a simple pdufa_calendar table, update weekly
  • Later: consider scraping BioPharmCatalyst or building news ingestion
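
A rough sketch of what the pdufa_calendar table and its weekly update could look like. It uses the standard library's sqlite3 as a stand-in so the example runs without extra dependencies. The column names are hypothetical, and whether this DDL ports unchanged to the target database is untested.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical columns; one row per (symbol, drug) pair.
DDL = """
CREATE TABLE IF NOT EXISTS pdufa_calendar (
    symbol      TEXT NOT NULL,
    drug_name   TEXT NOT NULL,
    pdufa_date  DATE NOT NULL,   -- ISO 8601, YYYY-MM-DD
    source      TEXT,            -- e.g. 'manual', 'press release'
    updated_at  DATE NOT NULL,
    PRIMARY KEY (symbol, drug_name)
)
"""


def upsert_pdufa(conn: sqlite3.Connection, symbol: str, drug_name: str,
                 pdufa_date: str, source: str, updated_at: str) -> None:
    """Insert a PDUFA date, replacing any existing row for the same drug."""
    conn.execute(
        "INSERT OR REPLACE INTO pdufa_calendar VALUES (?, ?, ?, ?, ?)",
        (symbol, drug_name, pdufa_date, source, updated_at),
    )


def upcoming(conn: sqlite3.Connection, as_of: str) -> List[Tuple[str, str, str]]:
    """List (symbol, drug_name, pdufa_date) on or after as_of, soonest first."""
    return conn.execute(
        "SELECT symbol, drug_name, pdufa_date FROM pdufa_calendar "
        "WHERE pdufa_date >= ? ORDER BY pdufa_date",
        (as_of,),
    ).fetchall()
```

Storing dates as ISO strings keeps the comparison and ordering correct even as plain text, and recording source makes a manual entry easy to tell apart from a later scraped one.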

5. Data Source Summary

Data Category | Primary Source | Backup Source | API Key Needed
Biotech Universe | FMP Screener | - | Yes (FMP)
Company Profile + IPO Date | FMP Profile | - | Yes (FMP)
Daily OHLCV | FMP Historical | - | Yes (FMP)
Balance Sheet / Financials | FMP Statements | SEC EDGAR XBRL | Yes (FMP)
SEC Filings (dilution) | FMP sec_filings | EDGAR EFTS | Yes (FMP)
Share Float | FMP shares_float | - | Yes (FMP)
Clinical Trials | ClinicalTrials.gov | - | No
FDA PDUFA Calendar | Manual + News | BioPharmCatalyst | No

6. Gaps & Risks

  1. Sponsor name matching — FMP company name ≠ CT.gov sponsor name. Need mapping.
  2. PDUFA dates — No clean API. Manual effort required initially.
  3. Dilution events — FMP has SEC filings list, but parsing offering size requires reading filings.
  4. Historical clinical trial data — CT.gov has current state, limited historical snapshots.
  5. FMP downtime — Single source dependency for most financial data. EDGAR as fallback.

7. Next Steps

  1. Get FMP API key from Dan
  2. Test FMP endpoints with real calls (verify field names, response structure)
  3. Design DuckDB schema based on confirmed field structures
  4. Build collectors: FMP → universe/prices/financials, CT.gov → pipeline
  5. Build validation layer
  6. Set up daily cron jobs