Archetypes for Representing Data about the Brazilian Public Hospital Information System and Outpatient High Complexity Procedures System

The Brazilian Ministry of Health has selected the openEHR model as a standard for electronic health record systems. This paper presents a set of archetypes to represent the main data from the Brazilian Public Hospital Information System and the High Complexity Procedures Module of the Brazilian public Outpatient Health Information System. The archetypes from the public openEHR Clinical Knowledge Manager (CKM) were examined to select those that could represent the data of these systems. For several concepts, it was necessary to specialize the CKM archetypes or design new ones. A total of 22 archetypes were used: 8 new, 5 specialized, and 9 reused from CKM. This set of archetypes can be used not only for information exchange but also for generating a large anonymized dataset for testing openEHR-based systems.

💡 Research Summary

The paper addresses the challenge of standardising the large and heterogeneous data generated by Brazil's public health information systems, specifically the Hospital Information System (HIS) and the High-Complexity Procedures module of the Outpatient Health Information System (abbreviated OHIP in this summary). The Brazilian Ministry of Health has selected the openEHR reference model as the standard for electronic health record (EHR) systems. The authors set out to create a coherent set of openEHR archetypes that can faithfully represent the core data elements of these two systems, thereby enabling interoperable data exchange, facilitating system development, and providing a basis for generating large, anonymised test datasets.

The methodology follows three logical steps. First, the authors performed a detailed domain analysis of the HIS and OHIP data specifications. HIS contains roughly thirty distinct entities covering patient demographics, admission and discharge events, bed allocation, clinical orders, laboratory results, and billing information. OHIP focuses on high‑complexity outpatient procedures (e.g., chemotherapy, advanced imaging) and includes procedure codes, execution timestamps, responsible clinicians, and cost‑recovery items. Both systems rely heavily on the SUS (Sistema Único de Saúde) coding scheme, which is not directly aligned with international terminologies such as ICD‑10 or SNOMED CT.

Second, the authors surveyed the openEHR Clinical Knowledge Manager (CKM) repository, which hosts hundreds of community‑maintained archetypes. They identified a subset of archetypes that could be reused without modification—primarily those representing generic clinical concepts such as Diagnosis, Medication Order, and Laboratory Test Result. Reusing these archetypes preserves compatibility with the broader openEHR ecosystem and reduces duplication of effort.

Third, for concepts that had no direct counterpart in CKM, the authors employed two complementary strategies: specialization and new design. Specialization involved extending existing CKM archetypes to incorporate Brazil‑specific attributes. For example, the generic “Procedure” archetype was specialised into a “High‑Complexity Procedure” archetype that adds fields for SUS procedure codes, reimbursement categories, and performing institution identifiers. New design was required for purely administrative data such as admission pathways, bed allocation logic, patient flow, and insurance classification. The authors created eight entirely new archetypes, the most notable being “Hospital Admission,” which captures admission date‑time, admission type (elective vs emergency), insurance status, and the responsible clinical team.
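The specialisation strategy can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' ADL definitions: the `Archetype` and `Element` classes are hypothetical, the specialised archetype identifier is invented for the example, and the element names simply echo the fields listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str
    rm_type: str  # openEHR data type name, e.g. DV_TEXT or DV_CODED_TEXT


@dataclass
class Archetype:
    archetype_id: str
    elements: List[Element] = field(default_factory=list)
    parent: Optional["Archetype"] = None  # set when this archetype specialises another

    def all_elements(self) -> List[Element]:
        """Return inherited elements followed by the locally added ones."""
        inherited = self.parent.all_elements() if self.parent else []
        return inherited + self.elements


# Generic parent archetype, reduced to a single element for brevity.
procedure = Archetype(
    "openEHR-EHR-ACTION.procedure.v1",
    [Element("Procedure name", "DV_TEXT")],
)

# Hypothetical specialisation; the identifier below is an assumption made for
# this sketch and is not taken from the paper.
high_complexity = Archetype(
    "openEHR-EHR-ACTION.procedure-high_complexity.v1",
    [
        Element("SUS procedure code", "DV_CODED_TEXT"),
        Element("Reimbursement category", "DV_CODED_TEXT"),
        Element("Performing institution identifier", "DV_IDENTIFIER"),
    ],
    parent=procedure,
)

for element in high_complexity.all_elements():
    print(f"{element.name}: {element.rm_type}")
```

The point of the sketch is that a specialised archetype keeps everything its parent defines and only adds the Brazil-specific fields, which is why data recorded against it stays readable by systems that know only the generic parent.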

In total, the study produced 22 archetypes: 9 directly reused from CKM, 5 specialised, and 8 newly created. All archetypes conform to the openEHR hierarchical structure (COMPOSITION → ENTRY → ELEMENT) and employ standard data types (DV_TEXT, DV_CODED_TEXT, DV_DATE_TIME, etc.). A key technical contribution is the handling of code‑mapping. The authors leveraged the openEHR CODE_PHRASE construct to allow a single data element to carry multiple terminology bindings (e.g., SUS code, ICD‑10, SNOMED CT). Terminology binding metadata is explicitly recorded, facilitating future integration with terminology servers and automated translation between national and international code sets.
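One way a single coded element can carry several code sets in the openEHR reference model is a DV_CODED_TEXT whose defining code is a CODE_PHRASE and whose `mappings` list holds further CODE_PHRASE targets. The Python sketch below mirrors that shape in simplified form. The class layout is a reduced assumption, the terminology identifiers are illustrative, and every code string is a placeholder rather than a real SUS or SNOMED CT code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CodePhrase:
    terminology_id: str
    code_string: str


@dataclass
class TermMapping:
    match: str          # '=' equivalent, '>' broader, '<' narrower, '?' unknown
    target: CodePhrase


@dataclass
class DvCodedText:
    value: str                      # human-readable rubric
    defining_code: CodePhrase       # primary code, here the national one
    mappings: List[TermMapping] = field(default_factory=list)

    def code_in(self, terminology_id: str) -> Optional[CodePhrase]:
        """Return the code for the requested terminology, or None if absent."""
        if self.defining_code.terminology_id == terminology_id:
            return self.defining_code
        for mapping in self.mappings:
            if mapping.target.terminology_id == terminology_id:
                return mapping.target
        return None


# All code strings below are placeholders, not real codes.
procedure_code = DvCodedText(
    value="Example high-complexity procedure",
    defining_code=CodePhrase("SUS", "PLACEHOLDER-1"),
    mappings=[TermMapping("=", CodePhrase("SNOMED-CT", "PLACEHOLDER-2"))],
)

print(procedure_code.code_in("SNOMED-CT"))
print(procedure_code.code_in("ICD10"))  # no ICD-10 mapping was recorded
```

Keeping the national code as the defining code and the international codes as mappings means the source data is stored unchanged, while a terminology server can later add or correct the mappings.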

Beyond modelling, the authors describe a practical use case: generating a large, anonymised dataset for system testing. Real-world records are extracted, transformed into openEHR COMPOSITION instances according to the defined archetypes, and then de-identified through hashing or random substitution of personal identifiers. The pipeline is intended to respect Brazil's LGPD (Lei Geral de Proteção de Dados) requirements while giving developers realistic data for performance benchmarking, interface validation, and training.
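The de-identification step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record layout and field names are hypothetical, and the sketch uses a keyed hash (HMAC-SHA256) rather than a plain hash, a substitution made here because a secret key makes it harder to recover identifiers by hashing guessed values.

```python
import hashlib
import hmac
import secrets

# Hypothetical names of the fields that identify a person.
IDENTIFYING_FIELDS = ("patient_name", "health_card_number")


def pseudonymise(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes) -> dict:
    """Return a copy of the record with identifying fields pseudonymised."""
    cleaned = dict(record)
    for field_name in IDENTIFYING_FIELDS:
        if field_name in cleaned:
            cleaned[field_name] = pseudonymise(str(cleaned[field_name]), key)
    return cleaned


# The key must be kept secret and out of the published dataset.
key = secrets.token_bytes(32)

# Invented example record; none of the values come from a real system.
record = {
    "patient_name": "Maria Example",
    "health_card_number": "000000000000000",
    "admission_type": "elective",
}

safe_record = deidentify(record, key)
print(safe_record["admission_type"])      # clinical content is left untouched
print(len(safe_record["patient_name"]))   # 64 hex characters
```

Because the same key always yields the same digest for a given identifier, records belonging to one patient stay linkable across the dataset. Whether that property is acceptable under a given privacy regime is a policy decision outside the scope of this sketch.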

The discussion highlights several implications. By providing a concrete archetype set, the paper reduces the barrier for health IT vendors to develop openEHR‑compliant solutions for Brazil, easing migration from legacy systems. The inclusion of both clinical and administrative domains ensures that the resulting EHR can support end‑to‑end workflows, from patient admission to billing. Moreover, the approach serves as a blueprint for other Latin American countries or low‑resource settings seeking to adopt openEHR: start with CKM reuse, specialise where national coding schemes exist, and design new archetypes for uniquely local processes.

In conclusion, the authors successfully translate the complex data structures of Brazil’s public hospital and high‑complexity outpatient systems into a compact, interoperable openEHR archetype suite. This work not only advances national health information standardisation but also contributes a reusable methodology for extending openEHR to accommodate country‑specific requirements, thereby promoting global health data interoperability.

📜 Original Paper Content