
MediEval and Safety Fine-Tuning for Evaluating Large Language Models in Clinical Settings
Large Language Models (LLMs) are increasingly applied to medicine, yet their adoption is limited by concerns over reliability and safety. Existing evaluations either test factual medical knowledge in isolation or assess patient-level reasoning without