Pippi - painless parsing, post-processing and plotting of posterior and likelihood samples
Interpreting samples from likelihood or posterior probability density functions is rarely as straightforward as it seems it should be. Producing publication-quality graphics of these distributions is often similarly painful. In this short note I describe pippi, a simple, publicly-available package for parsing and post-processing such samples, as well as generating high-quality PDF graphics of the results. Pippi is easily and extensively configurable and customisable, both in its options for parsing and post-processing samples, and in the visual aspects of the figures it produces. I illustrate some of these using an existing supersymmetric global fit, performed in the context of a gamma-ray search for dark matter. Pippi can be downloaded and followed at http://github.com/patscott/pippi .
💡 Research Summary
The paper addresses a common bottleneck in Bayesian and likelihood‑based analyses: the cumbersome process of turning large collections of posterior or likelihood samples into clear, publication‑ready visualizations. While many tools exist for the individual steps (reading data files, re‑weighting samples, drawing histograms), none combines all of the required functionality in a single, easily configurable workflow. To fill this gap the author introduces pippi, a lightweight, open‑source Python package that streamlines the parsing, post‑processing, and high‑quality plotting of sample sets.
Core functionality is divided into three layers.

The first layer is a flexible parser that automatically detects and reads a variety of common formats (CSV/TSV, HDF5, ROOT TTrees, etc.). It extracts not only the raw parameter values but also auxiliary metadata such as sample weights, log‑likelihood values, and parameter names. Lazy loading and chunked reading keep memory consumption low even for multi‑gigabyte files.

The second layer implements a modular post‑processing pipeline. Users can apply weight normalisation, transform parameters (e.g., log scale to linear), compute derived quantities, and perform marginalisation in one or two dimensions. All operations are driven by a simple YAML/JSON configuration file or by command‑line flags, enabling fully reproducible analyses.

The third layer is a Matplotlib‑based visualisation engine that produces vector‑PDF output suitable for journal submission. It supports 1‑D histograms, 2‑D contour plots, kernel‑density estimates, and "corner" (triangle) plots. Every subplot can be individually customised with LaTeX‑formatted axis labels, custom colour palettes, font sizes, legends, and tick formatting. Layouts are automatically arranged in grids or triangular matrices, and the final PDF is generated at a user‑specified resolution (typically ≥300 dpi).
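The key operation in the post‑processing layer, marginalisation of weighted samples onto one or two parameters, can be sketched as follows. This is a minimal illustration using NumPy; the function name and interface are assumptions for clarity, not pippi's actual API:

```python
import numpy as np

def marginalise_2d(x, y, weights, bins=50):
    """Marginalise weighted posterior samples onto two parameters.

    Accumulating each sample's weight in a 2-D histogram integrates
    out all remaining parameters automatically, because the weight is
    counted regardless of the sample's other coordinates.
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=weights)
    hist /= hist.sum()  # normalise to unit total posterior mass
    return hist, xedges, yedges

# Toy example: 10,000 samples from a correlated 2-D Gaussian posterior
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0.0, 0.0],
                                  [[1.0, 0.8], [0.8, 1.0]],
                                  size=10_000)
weights = np.ones(len(samples))  # an equally weighted chain

posterior, _, _ = marginalise_2d(samples[:, 0], samples[:, 1], weights)
print(posterior.sum())  # ~1.0 after normalisation
```

The same histogram-of-weights idea extends to the 1‑D case with `np.histogram`, and the resulting array is what a contour or histogram plot in the visualisation layer would consume.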
The author demonstrates pippi on an existing supersymmetric global fit that explores 19 model parameters with one million samples, derived from a gamma‑ray dark‑matter search. Using a single configuration file and a one‑line command, pippi parses the ROOT output, re‑weights the samples, computes several derived parameters (e.g., neutralino mass), marginalises over nuisance dimensions, and produces a suite of high‑resolution PDFs—including 1‑D marginal distributions, 2‑D confidence contours, and a full triangle plot—within roughly three minutes. The resulting figures are ready for publication without any further editing.
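Two of the steps in this workflow, computing a derived parameter from the raw chain columns and re‑weighting the samples against an updated constraint, can be illustrated with a small self‑contained sketch. The column names, the derived quantity, and the Gaussian constraint below are all hypothetical stand‑ins, not values or interfaces from the actual fit:

```python
import numpy as np

# Hypothetical raw chain columns: per-sample weight, log-likelihood,
# and two sampled model parameters (all names illustrative only).
rng = np.random.default_rng(1)
n = 5_000
chain = {
    "weight": np.ones(n),
    "loglike": -0.5 * rng.chisquare(2, size=n),
    "m1": rng.normal(500.0, 50.0, size=n),
    "mu": rng.normal(300.0, 30.0, size=n),
}

# A derived quantity computed from the sampled parameters (stands in
# for e.g. a particle mass derived from the model inputs).
derived = np.minimum(chain["m1"], np.abs(chain["mu"]))

# Re-weighting: multiply each weight by the ratio of the new
# likelihood to the one used during sampling. Here the new
# constraint is an assumed Gaussian on the derived quantity.
new_loglike = chain["loglike"] - 0.5 * ((derived - 280.0) / 40.0) ** 2
reweighted = chain["weight"] * np.exp(new_loglike - chain["loglike"])

# Posterior mean of the derived parameter under the new weights
mean = np.average(derived, weights=reweighted)
print(f"re-weighted posterior mean: {mean:.1f}")
```

Working in log-likelihood differences, as above, avoids numerical underflow when the likelihoods themselves are tiny; this is standard practice in importance re-weighting of chains.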
Beyond the core features, pippi is built with extensibility in mind. A plugin architecture allows users to add custom parsers, new statistical transformations, or entirely new plot types without modifying the core codebase. The repository (http://github.com/patscott/pippi) is released under the permissive MIT license, encouraging community contributions. Planned future enhancements include native parallel processing, interactive web‑based visualisation, and machine‑learning‑driven sample compression. In summary, pippi offers a comprehensive, reproducible, and highly customisable solution to the “painful” aspects of sample analysis and plotting, turning what is traditionally a manual, error‑prone workflow into an automated, publication‑grade pipeline.
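A plugin architecture of the kind described above is often implemented with a decorator-based registry. The sketch below is a generic illustration under that assumption; the decorator name, registry layout, and `CSVParser` class are hypothetical, not pippi's actual interface:

```python
import os

# Registry mapping file extensions to parser classes.
PARSERS = {}

def register_parser(extension):
    """Class decorator that registers a parser for a file extension,
    letting users add new formats without touching the core code."""
    def decorator(cls):
        PARSERS[extension] = cls
        return cls
    return decorator

@register_parser(".csv")
class CSVParser:
    def parse(self, path):
        # A real implementation would stream the file in chunks;
        # here we simply split comma-separated rows of floats.
        with open(path) as f:
            return [[float(v) for v in line.split(",")] for line in f]

def get_parser(path):
    """Look up and instantiate a parser based on the file extension."""
    ext = os.path.splitext(path)[1]
    try:
        return PARSERS[ext]()
    except KeyError:
        raise ValueError(f"no parser registered for {ext!r}")

print(type(get_parser("chain.csv")).__name__)  # prints "CSVParser"
```

Because registration happens at import time, a user-supplied module only needs to be imported for its parsers to become available, which is what makes this pattern attractive for extensible tools.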