MassChroQ: A versatile tool for mass spectrometry quantification
Recently, many software tools have been developed to perform quantification in LC-MS analyses. However, most of them are specific to either a quantification strategy (e.g. label-free or isotopic labelling) or a mass-spectrometry system (e.g. high or low resolution). In this context, we have developed MassChroQ, a versatile software tool that performs LC-MS data alignment and peptide quantification by peak area integration on extracted ion chromatograms. MassChroQ is suitable for quantification with or without labelling and is not limited to high resolution systems. Peptides of interest (for example all the identified peptides) can be determined automatically, or manually by providing targeted m/z and retention time values. It can handle large experiments that include protein or peptide fractionation (such as SDS-PAGE or 2D-LC). It is fully configurable. Every processing step is traceable, the output data are in open standard formats, and its modular design allows easy integration into proteomic pipelines. The output results are ready for use in statistical analyses. Evaluation of MassChroQ on complex label-free data obtained from low and high resolution mass spectrometers showed low CVs for technical reproducibility (1.4%) and high coefficients of correlation to protein quantity (0.98). MassChroQ is freely available under the GNU General Public Licence v3.0 at http://pappso.inra.fr/bioinfo/masschroq/.
💡 Research Summary
MassChroQ is a versatile, open‑source software package designed for quantitative LC‑MS proteomics that works equally well with high‑resolution (e.g., Orbitrap) and low‑resolution (e.g., LTQ ion trap) instruments and supports both label‑free and isotopic labeling strategies. The core quantification approach relies on extracted ion chromatograms (XICs). For low‑resolution data a 0.3 Th m/z extraction window is used, while high‑resolution data employ a 10 ppm window, allowing the same algorithm to accommodate differing mass accuracies.
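To make the two extraction windows concrete, here is a minimal Python sketch of XIC extraction. It is not MassChroQ code: the function names, the in-memory spectrum layout, the choice to treat the tolerance as a half-width on each side of the target m/z, and the choice to sum intensities inside the window are all assumptions made for illustration.

```python
from typing import List, Tuple

# One spectrum: (retention time in seconds, [(m/z, intensity), ...]).
# This in-memory layout is hypothetical, not a MassChroQ data structure.
Spectrum = Tuple[float, List[Tuple[float, float]]]


def mz_window(target_mz: float, tolerance: float, unit: str) -> Tuple[float, float]:
    """Return (low, high) m/z bounds, treating tolerance as a half-width (assumption)."""
    if unit == "ppm":
        half_width = target_mz * tolerance * 1e-6  # relative tolerance
    elif unit == "Th":
        half_width = tolerance  # absolute tolerance
    else:
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return target_mz - half_width, target_mz + half_width


def extract_xic(spectra: List[Spectrum], target_mz: float,
                tolerance: float, unit: str) -> List[Tuple[float, float]]:
    """Sum the intensities inside the m/z window for every MS scan."""
    low, high = mz_window(target_mz, tolerance, unit)
    return [
        (rt, sum((inten for mz, inten in peaks if low <= mz <= high), 0.0))
        for rt, peaks in spectra
    ]


# Toy data: two scans with a few centroided peaks each.
spectra = [
    (10.0, [(499.9, 5.0), (500.001, 100.0)]),
    (11.0, [(500.2, 50.0), (600.0, 7.0)]),
]
hr_xic = extract_xic(spectra, 500.0, 10, "ppm")   # narrow high-resolution window
lr_xic = extract_xic(spectra, 500.0, 0.3, "Th")   # wide low-resolution window
```

On this toy data the 10 ppm window keeps only the peak at m/z 500.001, while the 0.3 Th window also collects the neighbouring peaks at 499.9 and 500.2, which is why the same algorithm yields noisier traces on low-resolution data.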
Users can supply peptide targets either from MS/MS identification results or by providing explicit m/z‑retention time (RT) lists. In labeling experiments, the software accepts definitions of modified residues and corresponding mass shifts, enabling simultaneous quantification of multiple isotopic labels. After XIC extraction, a preprocessing pipeline applies an average filter followed by mathematical morphology operations: a closing step removes thin valleys while preserving peak maxima, and an opening step eliminates spurious spikes while preserving minima. Peak boundaries are detected on the closed profile; the original, unaltered XIC is then integrated between these boundaries to obtain the final area‑based quantitative value.
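The morphology-based detection described above can be sketched in a few lines of Python. This is an illustrative approximation, not the MassChroQ implementation: the averaging step is omitted, a single threshold is used, and the rule that a peak is kept only when the opened profile also exceeds the threshold is an assumption about how the two profiles might be combined. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def _sliding(values: Sequence[float], half_width: int,
             func: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply func over a centred window, truncated at the edges."""
    n = len(values)
    return [func(values[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def closing(values: Sequence[float], half_width: int) -> List[float]:
    """Dilation (max) then erosion (min): fills valleys narrower than the window."""
    return _sliding(_sliding(values, half_width, max), half_width, min)


def opening(values: Sequence[float], half_width: int) -> List[float]:
    """Erosion (min) then dilation (max): removes spikes narrower than the window."""
    return _sliding(_sliding(values, half_width, min), half_width, max)


def detect_peaks(xic: Sequence[float], half_width: int,
                 threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the closed profile exceeds threshold.

    A region is kept only if the opened profile also exceeds the threshold
    inside it, so isolated spikes are discarded (assumed combination rule).
    """
    closed = closing(xic, half_width)
    opened = opening(xic, half_width)
    peaks, start = [], None
    for i, value in enumerate(closed + [float("-inf")]):  # sentinel ends a trailing region
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if max(opened[start:i]) > threshold:
                peaks.append((start, i - 1))
            start = None
    return peaks


def integrate(rt: Sequence[float], xic: Sequence[float], start: int, end: int) -> float:
    """Trapezoidal area of the original, unfiltered XIC between two indices."""
    return sum((rt[i + 1] - rt[i]) * (xic[i] + xic[i + 1]) / 2.0
               for i in range(start, end))


# Toy XIC: a two-lobed peak with a thin valley, plus a one-scan spike.
rt = [float(i) for i in range(15)]
xic = [0.0, 0.0, 5.0, 9.0, 2.0, 8.0, 4.0, 0.0, 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]
peaks = detect_peaks(xic, half_width=1, threshold=1.0)
areas = [integrate(rt, xic, start, end) for start, end in peaks]
```

In this toy trace the closing fills the dip at index 4, so the two lobes are reported as one peak spanning indices 2 to 6, and the single-scan spike at index 11 is rejected because it vanishes in the opened profile. The area is computed on the raw values, so the filtering affects only the boundaries, not the reported quantity.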
Two alignment methods are implemented to correct RT drift between runs. The first, OBI‑Warp (Ordered Bijective Interpolated Warping), uses only MS survey scans to compute a global warping function. The second, an in‑house MS/MS‑based alignment, treats peptide identifications shared between a run and the reference run as landmarks; the RT differences (ΔMS/MS) are smoothed with average and median filters to damp outliers and point‑to‑point noise, then linearly interpolated across the chromatogram. After alignment, peak matching is performed within user‑defined groups of runs (e.g., same SCX fraction), and a quantitative value is assigned to a peptide only if its MS/MS RT falls inside the detected peak boundaries.
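The landmark-based alignment can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the MassChroQ algorithm: the function names, the dictionary inputs, the order of the two filters (median first, then mean), the shared window size, and the constant extrapolation outside the landmark range are all assumptions.

```python
from bisect import bisect_right
from statistics import mean, median
from typing import Callable, Dict, List, Sequence, Tuple


def _smooth(values: Sequence[float], half_width: int,
            func: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply func (mean or median) over a centred window, truncated at the edges."""
    n = len(values)
    return [func(values[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def build_correction(ref_rt: Dict[str, float], run_rt: Dict[str, float],
                     half_width: int = 1) -> Tuple[List[float], List[float]]:
    """Return landmark times (in the run) and smoothed RT deviations to the reference."""
    shared = sorted((run_rt[p], ref_rt[p] - run_rt[p]) for p in run_rt if p in ref_rt)
    if not shared:
        raise ValueError("no peptide is identified in both runs")
    times = [t for t, _ in shared]
    deltas = [d for _, d in shared]
    # Median filter first to damp outliers, then a mean filter (assumed order).
    deltas = _smooth(_smooth(deltas, half_width, median), half_width, mean)
    return times, deltas


def align(rt: float, times: Sequence[float], deltas: Sequence[float]) -> float:
    """Map a retention time of the run onto the reference time scale."""
    if rt <= times[0]:
        return rt + deltas[0]            # constant correction before the first landmark
    if rt >= times[-1]:
        return rt + deltas[-1]           # constant correction after the last landmark
    j = bisect_right(times, rt)          # times[j - 1] <= rt < times[j]
    frac = (rt - times[j - 1]) / (times[j] - times[j - 1])
    return rt + deltas[j - 1] + frac * (deltas[j] - deltas[j - 1])


# Toy example: the run elutes 10 s late at 110 s and 20 s late at 220 s.
ref = {"PEPTIDEA": 100.0, "PEPTIDEB": 200.0}
run = {"PEPTIDEA": 110.0, "PEPTIDEB": 220.0, "PEPTIDEC": 300.0}
times, deltas = build_correction(ref, run, half_width=0)  # no smoothing
aligned = align(165.0, times, deltas)  # halfway between the two landmarks
```

With smoothing disabled, a time halfway between the two landmarks receives the average of their corrections, so 165 s in the run maps to 150 s on the reference scale. PEPTIDEC is ignored because it has no counterpart in the reference run.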
Input files can be mzXML or mzML. Identification results from X!Tandem, or generic TSV/CSV files, are accepted. All processing parameters are defined in an XML configuration file (masschroqML), which can be generated automatically from a schema or edited manually. The software outputs results in TSV, gnumeric spreadsheet, or XML formats, facilitating direct import into statistical packages or proteomics repositories such as PROTICdb. XICs themselves can be exported for visual inspection.
Performance was evaluated on a benchmark set consisting of twelve LC‑MS runs (six low‑resolution, six high‑resolution) of a Saccharomyces cerevisiae whole‑cell digest spiked with bovine serum albumin (BSA) at six concentrations (4.5–1500 fmol). Across the combined data set, 5 831 unique peptide sequences were identified, corresponding to 556 proteins (0.3 % FDR). From these, 5 936 XICs were extracted from LR runs and 4 936 from HR runs. Reproducibility (presence in at least five of six replicates) was 97 % for HR peptides and 67 % for LR peptides, reflecting higher noise and complexity in the low‑resolution data.
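The reproducibility criterion used here (a peptide counts as reproducible when it is quantified in at least five of six replicates) is simple to compute. The sketch below assumes a hypothetical mapping from peptide to the set of runs in which it was quantified; the peptide and run names are invented for the example and are not taken from the benchmark.

```python
from typing import Dict, Set


def reproducible_fraction(quantified_in: Dict[str, Set[str]], min_runs: int) -> float:
    """Fraction of peptides quantified in at least min_runs runs."""
    if not quantified_in:
        raise ValueError("no peptides supplied")
    kept = sum(1 for runs in quantified_in.values() if len(runs) >= min_runs)
    return kept / len(quantified_in)


# Toy data: three peptides across six hypothetical replicate runs r1..r6.
quantified_in = {
    "PEPTIDEA": {"r1", "r2", "r3", "r4", "r5", "r6"},
    "PEPTIDEB": {"r1", "r2", "r3", "r4", "r5"},
    "PEPTIDEC": {"r1", "r2"},
}
fraction = reproducible_fraction(quantified_in, min_runs=5)
```

Here two of the three toy peptides pass the five-of-six threshold.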
Technical variation, expressed as the coefficient of variation (CV) of peptide intensities after log10 transformation and normalization, was 1.31 % for HR and 1.40 % for LR, indicating excellent precision. Correlation between peptide intensity and spiked BSA amount was 0.98 for both HR and LR data (with a few LR peptides deviating due to co‑eluting yeast peptides of similar m/z/RT). The mean intensity of peptides common to both platforms correlated at 0.89, demonstrating that the quantification pipeline yields comparable results across instrument types.
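Both statistics reported above are straightforward to reproduce on a table of peptide intensities. The following Python sketch shows one plausible way to compute them; the use of the sample standard deviation, the omission of the normalization step, and the toy numbers are assumptions, so it should not be read as the exact procedure behind the reported values.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cv_percent_log10(intensities: Sequence[float]) -> float:
    """CV (%) of log10-transformed intensities across replicates (sample std dev)."""
    logs = [math.log10(x) for x in intensities]
    return 100.0 * stdev(logs) / mean(logs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data (invented): one peptide measured in three technical replicates,
# and a spiked amount compared with the measured intensity on a log scale.
replicates = [1.00e6, 1.10e6, 0.95e6]
cv = cv_percent_log10(replicates)

log_amount = [1.0, 2.0, 3.0]
log_intensity = [5.1, 6.0, 7.2]
r = pearson(log_amount, log_intensity)
```

Because the CV is taken on log10 values, it is much smaller than the CV of the raw intensities would be, which should be kept in mind when comparing the 1.31 % and 1.40 % figures with CVs reported on a linear scale.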
Processing speed was modest: on a 2.93 GHz Linux machine, the full analysis of the 12 runs (≈6 GB total) and >5 000 XICs required about one hour, with the majority of time spent handling non‑centroid LR data.
MassChroQ is implemented in C++ with the Qt framework, runs on Linux and Windows, and is distributed under the GNU GPL v3 license. It is a command‑line tool that also provides a library for integration into larger proteomics pipelines such as TPP or OpenMS. Future development plans include support for SRM data, a graphical user interface for interactive parameter tuning with real‑time XIC visualization, and further optimization of computational performance.