Zero-shot large vision-language model prompting for automated bone identification in paleoradiology X-ray archives
Paleoradiology, the use of modern imaging technologies to study archaeological and anthropological remains, offers new windows on millennial-scale patterns of human health. Unfortunately, the radiographs collected during field campaigns are heterogeneous: bones are disarticulated, positioning is ad hoc, and laterality markers are often absent. In addition, factors such as age at death, age of the bone, sex, and imaging equipment introduce high variability. Content navigation, such as identifying the subset of images with a specific projection view, is therefore slow and difficult, making efficient triage a bottleneck for expert analysis. We report a zero-shot prompting strategy that leverages a state-of-the-art Large Vision-Language Model (LVLM) to automatically identify the main bone, projection view, and laterality in such images. Our pipeline converts raw DICOM files to bone-windowed PNGs, submits them to the LVLM with a carefully engineered prompt, and receives structured JSON outputs, which are extracted and formatted into a spreadsheet in preparation for validation. On a random sample of 100 images reviewed by an expert, board-certified paleoradiologist, the system achieved 92% main-bone accuracy, 80% projection-view accuracy, and 100% laterality accuracy, with low- or medium-confidence flags for ambiguous cases. These results suggest that LVLMs can substantially accelerate code-word development for large paleoradiology datasets, enabling efficient content navigation in future anthropology workflows.
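The structured-JSON step of the pipeline can be sketched in a few lines. The field names and the confidence-flagging rule below are illustrative assumptions based on the abstract (main bone, projection view, laterality, with low/medium confidence flagged for review), not the authors' actual schema:

```python
import json

# Hypothetical LVLM reply; the schema is an assumption for illustration.
raw_reply = """{
  "main_bone": "femur",
  "projection_view": "AP",
  "laterality": "left",
  "confidence": "high"
}"""

def parse_lvlm_reply(text):
    """Parse the model's JSON output and flag low/medium-confidence
    records so an expert can review the ambiguous cases."""
    record = json.loads(text)
    needs_review = record.get("confidence") in ("low", "medium")
    return record, needs_review

record, needs_review = parse_lvlm_reply(raw_reply)
```

In practice, one such record per image would be appended to a spreadsheet row for the validation pass described above.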
💡 Research Summary
This paper presents a zero‑shot prompting pipeline that leverages a state‑of‑the‑art Large Vision‑Language Model (LVLM) to automatically identify the main bone, projection view, and laterality in heterogeneous paleoradiology X‑ray archives. The authors begin by describing the challenges inherent to archaeological radiographs: bones are often disarticulated, positioning is ad hoc, laterality markers are missing, and variability arises from age at death, bone age, sex, and differing imaging equipment. Consequently, essential metadata such as bone type, view (AP, lateral, etc.), and side are not captured in DICOM headers, creating a bottleneck for researchers who must manually curate large collections.
A dataset of 8,423 DICOM files from eleven London excavation sites, provided by the Museum of London, served as the testbed. Pre‑processing was performed with a custom Python script: each DICOM was loaded via pydicom, bone‑windowing was applied using the VOI LUT derived from WindowWidth and WindowCenter, all images were forced to MONOCHROME2, and linearly rescaled to the 8‑bit range before export as PNG.
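The windowing-and-rescaling step can be illustrated with plain NumPy. This is a minimal sketch of applying a window center/width and mapping the result to 8 bits; the window values and the synthetic input below are assumptions for demonstration, not the study's actual parameters (the real pipeline reads WindowWidth/WindowCenter from each DICOM header via pydicom's VOI LUT utilities):

```python
import numpy as np

def apply_bone_window(pixels, center, width, invert=False):
    """Clip pixel values to [center - width/2, center + width/2] and
    linearly rescale the windowed range to 8-bit (0-255)."""
    lo = center - width / 2.0
    hi = center + width / 2.0
    windowed = np.clip(pixels.astype(np.float64), lo, hi)
    scaled = (windowed - lo) / (hi - lo) * 255.0
    if invert:
        # MONOCHROME1 stores inverted intensities; flip to MONOCHROME2.
        scaled = 255.0 - scaled
    return scaled.astype(np.uint8)

# Synthetic 16-bit image with an assumed window (center=2000, width=2000).
img = np.linspace(0, 4000, 16, dtype=np.uint16).reshape(4, 4)
out = apply_bone_window(img, center=2000, width=2000)
```

The resulting 8-bit array can then be saved as a PNG (e.g. with Pillow) for submission to the LVLM.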