FDA AI Search: Making FDA-Authorized AI Devices Searchable

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Over 1,200 AI-enabled medical devices have received marketing authorization from the U.S. FDA, yet identifying devices suited to specific clinical needs remains challenging because the FDA’s databases contain only limited metadata and non-searchable summary PDFs. To address this gap, we developed FDA AI Search, a website that enables semantic querying of FDA-authorized AI-enabled devices. The backend includes an embedding-based retrieval system in which LLM-extracted features from authorization summaries are compared to user queries to find relevant matches. We present quantitative and qualitative evaluations that support the effectiveness of the retrieval algorithm compared to keyword-based methods. As FDA-authorized AI devices become increasingly prevalent and their use cases expand, we envision that the tool will assist healthcare providers in identifying devices aligned with their clinical needs and support developers in formulating novel AI applications.


💡 Research Summary

The paper introduces “FDA AI Search,” a web‑based platform that enables clinicians and developers to semantically query the over 1,200 AI‑enabled medical devices that have received marketing authorization from the U.S. Food and Drug Administration (FDA). The authors identify a critical gap: FDA’s public databases provide only sparse tabular metadata (e.g., company name, review panel) and non‑searchable PDF summaries, making it difficult to locate devices that match specific clinical needs. To fill this gap, they build a two‑stage backend pipeline.

In the first stage, the team scrapes the FDA AI device list (as of July 10, 2025) and downloads the associated authorization summary PDFs from the 510(k), De Novo, or PMA pathways. Each PDF is processed with the large language model Gemini‑2.5‑flash, which extracts five primary textual features: a concise summary, ten salient keywords, five relevant clinician questions, a two‑sentence thesis (purpose, methodology, science), and five key concepts. Additional derived features include a “search‑boost” term (company + device name + keywords) and three “query‑match” sentences that a clinician might use to find the device. After redundancy analysis, the final set of seven features (keywords, relevant questions, thesis, search‑boost, and three query‑match strings) is retained for embedding.
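The extracted feature set can be pictured as a small record per device. The sketch below is illustrative only: the field names, the `DeviceFeatures` class, and the `search_boost` helper are assumptions, not the paper's actual data model.

```python
from dataclasses import dataclass

@dataclass
class DeviceFeatures:
    """Textual features extracted from one authorization summary.

    Field names are illustrative; the paper describes the content
    (summary, 10 keywords, 5 clinician questions, 2-sentence thesis,
    5 key concepts) but not its exact schema.
    """
    summary: str
    keywords: list[str]            # ten salient keywords
    relevant_questions: list[str]  # five relevant clinician questions
    thesis: str                    # two-sentence purpose/methodology/science
    key_concepts: list[str]        # five key concepts

    def search_boost(self, company: str, device_name: str) -> str:
        # Derived "search-boost" term: company + device name + keywords.
        return " ".join([company, device_name, *self.keywords])
```

A usage example: `DeviceFeatures(...).search_boost("Acme", "LungAI")` concatenates the company, device name, and keyword list into one boost string for embedding.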

The second stage creates embeddings for each of these features using MedEmbed‑small‑v0.1, a medical‑domain transformer that outputs 384‑dimensional vectors. User queries are embedded with the same model. For a given query q and device d, the system computes a weighted sum of cosine similarities between the query embedding and each of the seven feature embeddings, using feature‑specific weights w_i. This semantic score is blended with a traditional BM25 lexical score (computed over a concatenation of the same textual fields) via a mixing parameter λ, yielding the final relevance score:

Score(q, d) = λ · ∑_{i=1}^{7} w_i · cos(e_q, e_{d,i}) + (1 − λ) · BM25(q, d).
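The scoring rule above can be sketched directly. This is a minimal stand-in, not the production system: the real backend uses 384-dimensional MedEmbed-small-v0.1 embeddings and a proper BM25 implementation, and the function names here are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(q_emb, feature_embs, weights, bm25_score, lam):
    """Score(q, d) = lam * sum_i w_i * cos(e_q, e_{d,i}) + (1 - lam) * BM25(q, d).

    q_emb:        embedding of the user query
    feature_embs: the seven per-device feature embeddings
    weights:      the feature-specific weights w_i
    bm25_score:   precomputed lexical score over the concatenated fields
    lam:          the semantic/lexical mixing parameter lambda
    """
    semantic = sum(w * cosine(q_emb, e) for w, e in zip(weights, feature_embs))
    return lam * semantic + (1 - lam) * bm25_score
```

At query time the system would compute this score against every device and sort descending; only the weights and λ are tuned, the embeddings are fixed.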

Hyperparameters w_i and λ are tuned on a simulated validation set of 50 devices. For each device, a synthetic query is generated by the Gemma 3n model (gemma3n:e2b) based on the device’s thesis and keywords. The optimization uses Optuna’s Bayesian TPE sampler to maximize mean Hit@5 across the 50 device–query pairs, with 20 % of the pairs randomly replaced each trial to avoid overfitting. After 1,830 trials, the best weight configuration is fixed; λ is then selected via a grid search over 21 values.
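One tuning trial, as described, scores a candidate weight configuration on the validation pairs after randomly swapping out 20 % of them. The sketch below illustrates that per-trial resampling with a dependency-free loop; `run_trial`, `rank_fn`, and the pair/pool structure are hypothetical names (the paper uses Optuna's TPE sampler to propose the weights themselves, which is omitted here).

```python
import random

def run_trial(pairs, pool, rank_fn, replace_frac=0.2, k=5, rng=random):
    """One tuning trial: swap out ~replace_frac of the device-query pairs,
    then return mean Hit@k over the perturbed set.

    pairs:   list of (query, ground_truth_device) validation pairs
    pool:    spare pairs to swap in (illustrative; the paper regenerates pairs)
    rank_fn: returns the rank of the ground-truth device for a query
    """
    n_swap = int(len(pairs) * replace_frac)
    perturbed = list(pairs)
    for idx in rng.sample(range(len(pairs)), n_swap):
        perturbed[idx] = rng.choice(pool)
    hits = [1.0 if rank_fn(q, d) <= k else 0.0 for q, d in perturbed]
    return sum(hits) / len(hits)
```

An outer optimizer (Optuna's TPE sampler in the paper) would call this once per proposed weight vector and keep the configuration with the highest mean Hit@5.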

Performance is evaluated in two ways. First, a quantitative benchmark uses the curated FDA AI CAD dataset from McNamara et al. (2024), containing 140 FDA‑cleared imaging AI devices. For each device, two query types are constructed: (a) disease‑only (e.g., “lung cancer”) and (b) disease + modality (e.g., “lung cancer CT”). The ranking of the ground‑truth device is recorded for three retrieval strategies: embedding‑only, BM25‑only, and the hybrid. Results show that the hybrid method achieves the lowest average rank (1.37 for disease‑only, 1.80 for disease + modality) and the highest hit rates (Hit@1 = 0.829, Hit@3 = 0.951, Hit@5 = 0.976). Embedding‑only performs slightly better than BM25 alone, confirming that semantic similarity adds value beyond lexical matching.
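The benchmark metrics reported above reduce to simple aggregates over the ground-truth rank of each device. As a sketch (the function name and dict layout are my own, not the paper's):

```python
def retrieval_metrics(ranks):
    """Aggregate benchmark metrics from the ground-truth rank of each device.

    ranks: one 1-based rank per benchmark device (position of the correct
           device in the returned list).
    """
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        "hit@1": sum(r <= 1 for r in ranks) / n,
        "hit@3": sum(r <= 3 for r in ranks) / n,
        "hit@5": sum(r <= 5 for r in ranks) / n,
    }
```

Applied to the 140-device CAD benchmark, these are exactly the numbers quoted: mean rank 1.37 / 1.80 and Hit@1/3/5 of 0.829 / 0.951 / 0.976 for the hybrid method.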

Second, inference speed is measured on the 22,552 synthetic queries generated during hyperparameter tuning. The hybrid system processes a query in an average of 0.38 seconds (SD = 0.11 s), demonstrating suitability for interactive use.

Qualitative examples illustrate practical benefits: a keyword search for “genitourinary” returns no results, whereas the semantic search surfaces relevant devices, highlighting the system’s ability to capture synonyms and contextual relevance. The user interface, built with React/Next.js and deployed on Vercel, offers both the semantic search and a fallback keyword lookup that scans concatenated fields (submission number, device name, thesis, keywords, concepts).
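The fallback keyword lookup described above amounts to a case-insensitive substring scan over the concatenated fields. A minimal sketch, assuming devices are stored as dicts with the listed field names (the actual frontend implements this in TypeScript):

```python
def keyword_lookup(devices, query):
    """Fallback lexical search: case-insensitive substring scan over the
    concatenated fields (submission number, device name, thesis, keywords,
    concepts). Field names here are assumptions."""
    q = query.lower()
    fields = ("submission_number", "device_name", "thesis", "keywords", "concepts")
    hits = []
    for dev in devices:
        haystack = " ".join(str(dev.get(f, "")) for f in fields).lower()
        if q in haystack:
            hits.append(dev)
    return hits
```

This also illustrates the limitation the qualitative example highlights: a term like "genitourinary" that never appears verbatim in any field returns nothing, while the semantic path can still surface related devices.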

The authors acknowledge limitations. LLM‑driven feature extraction can introduce hallucinations or bias, and the lack of a fully curated ground‑truth dataset hampers exhaustive evaluation. The tool is positioned as an assistive resource; errors are possible, and rigorous user studies are needed to assess clinical impact. Future work includes continuous updating of the FDA device list, iterative refinement of prompts and embedding models, and systematic collection of user feedback to improve relevance and trustworthiness.

In conclusion, “FDA AI Search” demonstrates that a hybrid semantic‑lexical retrieval pipeline, powered by domain‑specific embeddings and LLM‑generated features, can substantially improve discoverability of FDA‑authorized AI medical devices. By making these devices searchable in a clinically meaningful way, the platform promises to accelerate adoption of AI tools in healthcare and to aid developers in identifying unmet clinical niches for new AI solutions.

