Evaluation of YTEX and MetaMap for clinical concept recognition


We used MetaMap and YTEX as a basis for the construction of two separate systems to participate in the 2013 ShARe/CLEF eHealth Task 1 [9], the recognition of clinical concepts. No modifications were directly made to these systems, but output concepts were filtered using stop concepts, stop concept text and UMLS semantic type. Concept boundaries were also adjusted using a small collection of rules to increase precision on the strict task. Overall MetaMap had better performance than YTEX on the strict task, primarily due to a 20% performance improvement in precision. In the relaxed task YTEX had better performance in both precision and recall, giving it an overall F-score 4.6% higher than MetaMap on the test data. Our results also indicated a 1.3% higher accuracy for YTEX in UMLS CUI mapping.


💡 Research Summary

The paper presents a head‑to‑head evaluation of two publicly available clinical concept‑recognition tools, MetaMap and YTEX, in the context of the 2013 ShARe/CLEF eHealth Task 1. The authors built two parallel pipelines (named CORAL.1 and CORAL.2) that simply wrapped the original systems without any algorithmic modifications. After the raw output was generated, a common post‑processing stage was applied: high‑level “stop” concepts (e.g., generic “Disease” or “Injury” CUIs) and concepts containing the strings “mouse” or “mice” were removed; only concepts belonging to a predefined set of UMLS semantic types were retained; and a small rule‑based boundary‑expansion step was added to capture preceding abbreviations or modifiers (e.g., “LA”, “MCA”, “LV”).
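The post-processing stage described above can be sketched as follows. This is a minimal illustration, not the authors' code: the stop-CUI set, the retained semantic types, and the abbreviation list are assumptions chosen to mirror the examples in the text.

```python
# Hypothetical sketch of the shared post-processing filters.
# The concrete CUIs, semantic types, and abbreviations are illustrative.
STOP_CUIS = {"C0012634"}                   # e.g. the generic "Disease" concept
STOP_TEXT = ("mouse", "mice")              # stop-concept text fragments
KEPT_SEMTYPES = {"dsyn", "sosy", "inpo"}   # assumed disorder-related types
ABBREVS = ("LA", "MCA", "LV")              # modifiers that may precede a span

def keep(concept):
    """Apply the stop-concept, stop-text, and semantic-type filters."""
    if concept["cui"] in STOP_CUIS:
        return False
    if any(t in concept["text"].lower() for t in STOP_TEXT):
        return False
    return concept["semtype"] in KEPT_SEMTYPES

def expand_boundary(concept, sentence):
    """Extend a span leftwards to capture a preceding abbreviation."""
    start = concept["start"]
    prefix = sentence[:start].rstrip()
    for abbr in ABBREVS:
        if prefix.endswith(abbr):
            # Re-anchor the span to include the abbreviation and the space.
            return dict(concept, start=start - (len(abbr) + 1),
                        text=abbr + " " + concept["text"])
    return concept
```

For example, given the sentence "LV hypertrophy noted" and a detected span "hypertrophy", the boundary rule would re-anchor the concept to "LV hypertrophy".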

MetaMap was invoked via its UIMA annotator, limited to SNOMED CT and RxNorm vocabularies, and run with its built‑in word‑sense disambiguation server (‑y option). YTEX 0.8 was executed as a standalone system using its default configuration: a context window of ten tokens and the “intrinsic” semantic similarity metric. Both systems relied on the 2012 AB release of the UMLS.
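Although the paper drove MetaMap through its UIMA annotator, the equivalent command-line configuration might look roughly like this. This is a sketch only: the input/output paths are placeholders, and the 2012AB source abbreviations are an assumption.

```shell
# Sketch of an equivalent MetaMap invocation (paths are placeholders).
# -y enables the word-sense disambiguation server, as the paper states;
# -R restricts candidate sources (abbreviations assumed for 2012AB).
metamap -y -R SNOMEDCT,RXNORM input_notes.txt output.mmo
```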

Two data sources were used for evaluation. The first consisted of the official ShARe/CLEF training and test corpora, which contain manually annotated clinical problem mentions and their corresponding UMLS CUIs. The second source was a local collection of Patient Tracking List (PTL) notes from the University of Alabama at Birmingham. PTL documents are highly structured summaries; only the "Problem" section was considered, yielding 68 notes with 603 annotated entities (223 problems). Inter-annotator agreement on PTL was 0.919 (Cohen's κ).
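The agreement figure above is Cohen's κ, i.e. (p_o − p_e) / (1 − p_e), the chance-corrected agreement between the two annotators. A minimal sketch of the computation (the label lists are illustrative, not the PTL data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n       # p_o
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)    # p_e
    return (observed - expected) / (1 - expected)
```

For instance, two annotators agreeing on 3 of 4 binary labels with balanced marginals yield κ = 0.5, well below the 0.919 reported for PTL.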

Performance was measured on two tasks. Task 1a required detection of concept boundaries; results were reported in a strict mode (exact span match) and a relaxed mode (partial overlap allowed). In the strict mode on the test set, MetaMap achieved precision = 0.79, recall = 0.46, F-score = 0.60, whereas YTEX obtained precision = 0.58, recall = 0.45, F-score = 0.51. In the relaxed mode, YTEX outperformed MetaMap with precision = 0.94, recall = 0.60, F-score = 0.73 versus MetaMap's precision = 0.91, recall = 0.55, F-score = 0.69. Task 1b (CUI mapping) showed a modest accuracy advantage for YTEX over MetaMap on the ShARe/CLEF training data (0.42 vs. 0.41), and a larger gap in the opposite direction on PTL, where MetaMap's precision and recall were both around 80% compared with YTEX's 55% and 68% respectively.
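The F-scores above follow the standard F1 definition, the harmonic mean of precision and recall:

```python
def f_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

Rounding to two decimals reproduces, for example, the relaxed-mode figures (0.94/0.60 → 0.73 for YTEX, 0.91/0.55 → 0.69 for MetaMap); small discrepancies elsewhere may simply reflect rounding in the reported precision and recall.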

Error analysis revealed common failure modes for both systems. Neither tool supports discontinuous spans, leading to missed modifiers or preceding abbreviations. MetaMap tended to over‑extend spans by including prepositions, while YTEX often missed compound nouns, reducing its precision. Both systems struggled with abbreviations (e.g., “LA”, “LV”) and polysemous terms; examples included YTEX mis‑classifying “Dr.” as diabetic retinopathy and MetaMap interpreting the verb “call” as “c‑ALL” (a leukemia subtype). The PTL dataset’s strict annotation guideline (only the most specific concept allowed) penalized YTEX, which frequently generated broader, more generic concepts.

The discussion concludes that MetaMap is preferable when precise boundary detection is critical, whereas YTEX offers broader coverage and slightly higher performance on relaxed‑match tasks. The authors note that simple parameter tuning—such as increasing YTEX’s context window, experimenting with alternative similarity metrics, or leveraging MetaMap’s scoring information for better cutoff selection—could further improve results. They also mention that other open‑source options like the NCBO Annotator were considered but discarded due to installation complexity and service‑usage restrictions.

Overall, the study provides a practical comparison for researchers and clinicians seeking “off‑the‑shelf” concept‑mapping solutions: YTEX delivers higher recall and better handling of ambiguous text, while MetaMap delivers higher precision and more accurate span boundaries. Both systems benefit from lightweight post‑processing, and future work should focus on enhancing discontinuous span support, richer abbreviation dictionaries, and systematic hyper‑parameter optimization.

