Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries

Reading time: 6 minutes

📝 Original Info

  • Title: Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries
  • ArXiv ID: 2512.15906
  • Date: 2025-12-17
  • Authors: Jonathan A. Handler, MD — 1) Keylog Solutions LLC, Northbrook, IL, USA (jhandler@gmail.com); 2) Clinical Intelligence and Advanced Data Lab, OSF HealthCare, Peoria, IL, USA (jonathan.a.handler@osfhealthcare.org)

📝 Abstract

Many large language models (LLMs) are trained on a massive body of knowledge present on the Internet. Darth Vecdor (DV) was designed to extract this knowledge into a structured, terminology-mapped, SQL database ("knowledge base" or "knowledge graph"). Knowledge graphs may be useful in many domains, including healthcare. Although one might query an LLM directly rather than a SQL-based knowledge graph, concerns such as cost, speed, safety, and confidence may arise, especially in high-volume operations. These may be mitigated when the information is pre-extracted from the LLM and becomes query-able through a standard database. However, the author found the need to address several issues. These included erroneous, off-topic, free-text, overly general, and inconsistent LLM responses, as well as allowing for multi-element responses. DV was built with features intended to mitigate these issues. To facilitate ease of use, and to allow for prompt engineering by those with domain expertise but little technical background, DV provides a simple, browser-based graphical user interface. DV has been released as free, open-source, extensible software, on an "as is" basis, without warranties or conditions of any kind, either express or implied. Users need to be cognizant of the potential risks and benefits of using DV and its outputs, and users are responsible for ensuring any use is safe and effective. DV should be assumed to have bugs, potentially very serious ones. However, the author hopes that appropriate use of current and future versions of DV and its outputs can help improve healthcare.


📄 Full Content

INTRODUCTION

Large language models (LLMs) have already had a significant impact in healthcare, and many more uses are reported in development and in planned implementation. Since LLMs are trained on huge volumes of data, LLMs are encoded with a significant swath of the knowledge present on the Internet and possibly other sources. Therefore, the author hypothesized that LLMs can be used as a source to populate knowledge graphs in a database (or "knowledge base") for various uses. For example, a knowledge graph of medications used to treat diseases might be used as part of a research effort in which a database that includes the knowledge graph along with patient data is queried to find which patients have potentially untreated diseases (i.e., no medication has been prescribed that treats that disease).

Querying a knowledge graph previously created through LLM queries, rather than just querying an LLM directly as needed, may have several potential advantages:

1. Cheaper
   a. Lower compute costs: In some cases, the cost of computation to query a knowledge graph may be dramatically lower than querying an LLM.
   b. Lower hardware costs and complexity: If the LLM that would have been used would be run on institutionally controlled servers, the costs, complexity, and management burden of the hardware stack required to achieve rapid responses may be prohibitive for many. The hardware costs and complexity needed for querying a knowledge graph in a database may be much lower in many cases than those needed to support many LLMs.
2. Faster
   a. Faster query speed: In many cases, querying a knowledge graph (e.g., via a vector database) may be orders of magnitude faster than querying an LLM.
   b. Facilitation of development and implementation: The people, processes, and technologies for building and implementing systems built on a knowledge graph (perhaps especially if implemented through a SQL database) may be more well-developed and readily available than systems using LLMs.
3. Safer
   a. Reduction of privacy and confidentiality risks: If the LLM that would have been used is a third party's commercial service, querying a knowledge graph running on institutionally controlled servers instead may reduce the risks of submitting potentially sensitive data to a commercial third-party service.
   b. Reduction of many business risks: If the LLM is controlled by a third party, then using a knowledge graph in operational use rather than directly querying the LLM may reduce many business risks, such as the third party deprecating the product, increasing pricing, or modifying functionality.
4. Surer
   a. More explainable responses: LLMs are often considered "black boxes," since the actual logic for producing a given output commonly cannot be provided in a format meaningful to humans. Although an LLM's population of the knowledge graph may be considered "black box," the downstream usage of that knowledge graph can often be more explainable since
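The untreated-disease example above can be sketched as a plain SQL query over the extracted knowledge graph. The schema and table names below are hypothetical for illustration; the paper describes a terminology-mapped SQL knowledge base but this exact layout is an assumption.

```python
import sqlite3

# Hypothetical minimal schema: one table of knowledge-graph edges
# ("medication treats disease") plus two patient-data tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE treats (medication TEXT, disease TEXT);           -- knowledge graph edges
CREATE TABLE diagnoses (patient_id INTEGER, disease TEXT);     -- patient data
CREATE TABLE prescriptions (patient_id INTEGER, medication TEXT);

INSERT INTO treats VALUES ('metformin', 'type 2 diabetes'),
                          ('lisinopril', 'hypertension');
INSERT INTO diagnoses VALUES (1, 'type 2 diabetes'), (2, 'hypertension');
INSERT INTO prescriptions VALUES (1, 'metformin');             -- patient 2 has none
""")

# Find patients with a diagnosed disease but no prescription that,
# according to the knowledge graph, treats that disease.
untreated = conn.execute("""
    SELECT d.patient_id, d.disease
    FROM diagnoses d
    WHERE NOT EXISTS (
        SELECT 1
        FROM prescriptions p
        JOIN treats t ON t.medication = p.medication
        WHERE p.patient_id = d.patient_id
          AND t.disease = d.disease
    )
""").fetchall()
print(untreated)  # patient 2's hypertension has no treating prescription
```

Once the graph has been pre-extracted from the LLM, this is an ordinary correlated subquery that a standard relational engine can run at high volume, which is the cost and speed argument the list above makes.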


Reference

This content is AI-processed based on open access ArXiv data.
