HepML, an XML-based format for describing simulated data in high energy physics

In this paper we describe the HepML format and a corresponding C++ library developed for keeping a complete description of parton-level events in a unified and flexible form. HepML tags contain enough information to understand what kind of physics the simulated events describe and how the events have been prepared. A HepML block can be included in event files in the LHEF format. The structure of the HepML block is described by means of several XML Schemas, which define the information the HepML block must contain and how that information should be arranged within the block. The library libhepml is a C++ library intended for parsing and serializing HepML tags and for representing the HepML block in computer memory. The library serves as an API for external software. For example, Matrix Element Monte Carlo event generators can use the library to prepare and write the header of an LHEF file in the form of HepML tags. In turn, Showering and Hadronization event generators can parse the HepML header and obtain the information in the form of C++ classes. libhepml can be used in C++, C, and Fortran programs. All necessary parts of HepML have been prepared, and we present the project to the HEP community.
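The serialization role described above (a Matrix Element generator turning in-memory objects into HepML tags for an LHEF header) can be sketched in plain C++ without any XML library. This is not libhepml's API: the types `GeneratorInfo`, `ProcessInfo`, `HepmlBlock`, the functions `xmlEscape` and `serialize`, and every tag and attribute name below are hypothetical placeholders, not taken from the HepML Schemas.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Escape the five XML special characters so free-text fields
// (e.g. a process string such as "p p > e+ e-") keep the output well-formed.
std::string xmlEscape(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical in-memory representation of part of a HepML-like block.
// Field and tag names are illustrative only.
struct GeneratorInfo {
    std::string name;
    std::string version;
};

struct ProcessInfo {
    std::string description;      // e.g. "p p > e+ e-"
    double crossSectionPb = 0.0;  // cross section in picobarns
};

struct HepmlBlock {
    GeneratorInfo generator;
    std::vector<ProcessInfo> processes;
};

// Serialize the in-memory block to XML text, one element per field.
std::string serialize(const HepmlBlock& b) {
    std::ostringstream os;
    os << "<hepml>\n"
       << "  <generator name=\"" << xmlEscape(b.generator.name)
       << "\" version=\"" << xmlEscape(b.generator.version) << "\"/>\n";
    for (const auto& p : b.processes) {
        os << "  <process>\n"
           << "    <description>" << xmlEscape(p.description) << "</description>\n"
           << "    <cross-section unit=\"pb\">" << p.crossSectionPb << "</cross-section>\n"
           << "  </process>\n";
    }
    os << "</hepml>\n";
    return os.str();
}
```

A generator would fill a `HepmlBlock`, call `serialize`, and write the result into the LHEF header. Escaping is not optional here: a process string such as `p p > e+ e-` contains `>`, which is emitted as `&gt;` so that the reading side's XML parser recovers the original text.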


💡 Research Summary

The paper introduces HepML, an XML‑based metadata format designed to provide a complete, machine‑readable description of parton‑level events in high‑energy physics, and a companion C++ library called libhepml that parses, serializes, and represents HepML blocks in memory. Traditional Les Houches Event Files (LHEF) contain the kinematic information of generated events but lack a standardized way to embed essential context such as the physics model, parameter values, parton distribution functions (PDFs), factorisation and renormalisation scales, and cut definitions. This missing information hampers reproducibility and creates friction when passing events from matrix‑element generators to showering and hadronisation programs.

HepML addresses this gap by defining a hierarchy of XML tags, each governed by an XML Schema (XSD). Core tags include <model> (theoretical model and its parameters), <process> (initial and final state particles, cross‑section information), <generator> (name, version, configuration options of the event generator), <cut> (kinematic cuts such as p_T or η thresholds), and <pdf> (PDF set identification, version, and scale). The schemas explicitly mark required versus optional elements, enforce data types (integers, floating‑point numbers, strings, arrays), and even specify units, thereby eliminating ambiguities that arise when different programs use slightly different conventions.
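To make the idea of required versus optional, typed elements concrete, here is a toy validator. A real HepML consumer would validate against the published XSD files with a schema-aware XML parser; the rule table, the element names used with it (e.g. `pdf-name`, `pdf-member`, `scale`), and the functions `hasType` and `validate` are hypothetical and only mimic a small part of what XML Schema expresses with `minOccurs`, `xs:integer`, or `xs:double`.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Types an element's text content may be required to have
// (a tiny subset of what an XML Schema can express).
enum class ValueType { String, Integer, Double };

// One rule of a hypothetical, hand-written "schema":
// element name, expected type, and whether the element must be present.
struct ElementRule {
    std::string name;
    ValueType type;
    bool required;
};

// True if the whole text parses as the requested type.
bool hasType(const std::string& text, ValueType type) {
    if (type == ValueType::String) return true;
    if (text.empty()) return false;
    char* end = nullptr;
    if (type == ValueType::Integer) {
        std::strtol(text.c_str(), &end, 10);
    } else {
        std::strtod(text.c_str(), &end);
    }
    return *end == '\0';  // reject trailing characters such as "13.6 TeV"
}

// Check a block's element texts (name -> text) against the rules;
// returns one message per violation, empty if the block is valid.
std::vector<std::string> validate(const std::map<std::string, std::string>& elements,
                                  const std::vector<ElementRule>& rules) {
    std::vector<std::string> errors;
    for (const auto& rule : rules) {
        auto it = elements.find(rule.name);
        if (it == elements.end()) {
            if (rule.required)
                errors.push_back("missing required element <" + rule.name + ">");
            continue;  // optional and absent: acceptable
        }
        if (!hasType(it->second, rule.type))
            errors.push_back("element <" + rule.name + "> has wrong type: '" + it->second + "'");
    }
    return errors;
}
```

With rules marking `pdf-name` (string) and `scale` (double) as required and `pdf-member` (integer) as optional, a block containing only `pdf-member` with text `x` yields three errors: two missing required elements and one type mismatch. The sketch rejects trailing characters, but it does not detect integer overflow.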

The HepML block is intended to be embedded directly into an LHEF file’s <header> section using a CDATA wrapper, e.g. `<![CDATA[ ... ]]>`.
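Assuming the CDATA embedding described above, both sides of the exchange can be sketched in a few lines: the writer wraps the HepML text in a CDATA section inside `<header>`, and the reader concatenates the CDATA contents back out. The function names `wrapInCdata`, `makeLhefHeader`, and `extractCdata` are hypothetical, not libhepml functions. The one subtlety is that a CDATA section cannot contain its own terminator `]]>`, so the sketch splits any occurrence across two adjacent sections.

```cpp
#include <cstddef>
#include <string>

// Wrap arbitrary text in CDATA. Because "]]>" would end the section early,
// each occurrence is split: "]]" closes one section and ">" starts the next.
std::string wrapInCdata(const std::string& text) {
    std::string out = "<![CDATA[";
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = text.find("]]>", pos);
        if (hit == std::string::npos) {
            out += text.substr(pos);
            break;
        }
        out += text.substr(pos, hit + 2 - pos);  // copy up to and including "]]"
        out += "]]><![CDATA[";                   // close and reopen the section
        pos = hit + 2;                           // continue from the ">"
    }
    out += "]]>";
    return out;
}

// Writer side: place the (already serialized) HepML text in an LHEF <header>.
std::string makeLhefHeader(const std::string& hepml) {
    return "<header>\n" + wrapInCdata(hepml) + "\n</header>\n";
}

// Reader side: concatenate the contents of every CDATA section in `xml`,
// which undoes the splitting performed by wrapInCdata.
std::string extractCdata(const std::string& xml) {
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";
    std::string out;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        pos += open.size();
        std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;  // unterminated section: stop
        out.append(xml, pos, end - pos);
        pos = end + close.size();
    }
    return out;
}
```

For example, `wrapInCdata("a]]>b")` produces `<![CDATA[a]]]]><![CDATA[>b]]>`, and `extractCdata` on that string returns `a]]>b` again. A production reader would instead let an XML parser report CDATA content; the hand-rolled scan here only illustrates the round trip.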

