Motivation: Antifungal resistance has become an increasing global concern in both clinical and environmental health. Detecting known resistance mutations directly from sequencing reads, especially in metagenomic samples, remains a major challenge. As fungal pathogens are often neglected compared with bacterial pathogens, most available tools are designed for bacterial taxa, whereas tools targeting fungi typically require assembled genomes. In metagenomic datasets, assembly-based strategies may result in substantial information loss due to genome fragmentation, low-abundance species, or incomplete recovery of resistance loci. Results: Here, we present FUNGAR, an open-source pipeline for the rapid identification of antifungal resistance genes and mutations directly from short-read data. FUNGAR employs translated alignments with DIAMOND and curated data from the FungAMR database to detect amino acid substitutions across all six open reading frames. The pipeline produces structured, reproducible reports linking detected variants to their associated antifungal drugs and can be easily customized for new species or databases.
The global emergence and spread of resistance to antifungal drugs represents an increasing threat to both human and environmental health (van Rhijn and Rhodes 2025). Cases of resistance to azoles, echinocandins, and polyenes are frequently reported, especially in clinically relevant genera such as Aspergillus, Candida, and Cryptococcus (Lockhart et al. 2023). The molecular mechanisms underlying such resistance phenotypes are, in most cases, well characterized and are often caused by mutations in specific genes, hereafter named antifungal resistance genes (AfRGs). These mutations are mostly related to alterations in drug binding sites, in cell wall components, and in nucleic acid synthesis or repair, as well as to biofilm production (Czajka et al. 2023; Lee et al. 2023).
Despite the wide availability and increasing ease of obtaining high-throughput sequencing data, identifying AfRGs in metagenomic samples is still challenging. As demonstrated for bacteria (Maguire et al. 2020; Abramova et al. 2024), assembly-based strategies may lead to loss of information due to fragmentation of low-abundance genomes, chimeric contigs, or incomplete recovery of target loci (Mirete et al. 2025). Most existing tools were designed for bacterial taxa, and they either focus on assembled single genomes (Bédard et al. 2025) or are not easily customizable for fungal species (Matsumura et al. 2025).
Here, we present FUNGAR, an open-source pipeline for the rapid identification of AfRGs and associated mutations in metagenomic samples.
FUNGAR is based on translated alignments with DIAMOND and curated resistance data from the FungAMR database (Bédard et al. 2025) to detect amino acid substitutions across the six open reading frames. The pipeline is easily customizable and provides reports linking detected variants to their associated antifungal drugs.
FUNGAR is a pipeline written in Bash and Python 3 and is designed for Unix-like operating systems (built and tested on Ubuntu 24.04.3 LTS). The only required input is a set of quality-filtered paired- or single-end FASTQ files. Reads are aligned against the protein sequences, and their respective mutations, described in the FungAMR database (Bédard et al. 2025) using the blastx mode of DIAMOND (Buchfink et al. 2021), which screens for alignments in all six possible open reading frames (ORFs). By default, the minimum length of a translated sequence (-orf) is set to 50 amino acids, which corresponds to a read length of 150 bp, for increased specificity. The user can also set custom values for the minimum query coverage (--min-query-cover), the minimum percent identity (--min-pident), and the genetic code used for query translation (-code). The overall pipeline flow is outlined in Figure 1.
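To make the six-frame search concrete, the following minimal Python sketch translates a read in the three forward and three reverse-complement frames using the standard genetic code. It is illustrative only: DIAMOND performs this translation internally, and the function names (`reverse_complement`, `translate`, `six_frames`) are hypothetical and are not part of FUNGAR.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(read):
    """Return the six translations of a read, keyed '+1'..'+3' and '-1'..'-3'."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frames("ATGGATTAA")
print(frames["+1"])  # MD*
print(frames["-1"])  # LIH
```

A 150 bp read yields at most 50 residues in the first frame of each strand and 49 in the other two, so a 50-residue minimum can only be met by a translation that spans the whole read without a stop codon.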
The alignment output files are then processed with the pandas library. For each alignment, the amino acids in the reference and query sequences are compared at the positions of the known mutations described in the FungAMR database (Bédard et al. 2025); detected variants are cross-referenced with their associated antifungal drugs, and the results are compiled into a structured CSV file. This output includes the sequencing read ID, mutation details (gene, position, reference/variant amino acids), and the corresponding drug resistance profile. A second CSV file summarizes the results by gene, position, reference/variant amino acids, drug resistance profile, and number of reads supporting each mutation.
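The position-wise comparison described above can be sketched as follows. This is a simplified illustration in plain Python rather than pandas, and it is not FUNGAR's actual code. It assumes each alignment record carries the aligned subject and translated query strings together with the 1-based subject start, using field names borrowed from DIAMOND's tabular output (`qseqid`, `sseqid`, `sstart`, `sseq`, `qseq_translated`). The helper names and the toy sequences are hypothetical.

```python
from collections import Counter


def residue_at(sseq, qseq, sstart, position):
    """Return (reference_aa, query_aa) at a 1-based subject position.

    Gap characters ('-') in the aligned subject do not consume subject
    positions. Returns None when the alignment does not cover the position.
    """
    pos = sstart - 1
    for s_aa, q_aa in zip(sseq, qseq):
        if s_aa == "-":
            continue
        pos += 1
        if pos == position:
            return s_aa, q_aa
    return None


def call_mutations(alignments, mutations):
    """Report reads whose aligned residue matches a known resistance variant."""
    hits = []
    for aln in alignments:
        for mut in mutations:
            if mut["gene"] != aln["sseqid"]:
                continue
            pair = residue_at(
                aln["sseq"], aln["qseq_translated"], aln["sstart"], mut["position"]
            )
            if pair is None:
                continue
            ref_aa, query_aa = pair
            if ref_aa == mut["reference"] and query_aa == mut["mutation"]:
                hits.append({"read": aln["qseqid"], **mut})
    return hits


def summarize(hits):
    """Count supporting reads per (gene, position, reference, mutation, drug)."""
    return Counter(
        (h["gene"], h["position"], h["reference"], h["mutation"], h["fungicide"])
        for h in hits
    )


# Toy data: the sequences are made up; only the D2E / D2F labels follow the text.
MUTATIONS = [
    {"gene": "DHFR", "position": 2, "reference": "D", "mutation": "E",
     "fungicide": "methotrexate"},
]
ALIGNMENTS = [
    {"qseqid": "read1", "sseqid": "DHFR", "sstart": 1, "sseq": "MDWQ", "qseq_translated": "MEWQ"},
    {"qseqid": "read2", "sseqid": "DHFR", "sstart": 1, "sseq": "MDWQ", "qseq_translated": "MFWQ"},
    {"qseqid": "read3", "sseqid": "DHFR", "sstart": 1, "sseq": "MDWQ", "qseq_translated": "MDWQ"},
    {"qseqid": "read4", "sseqid": "DHFR", "sstart": 3, "sseq": "WQ", "qseq_translated": "WQ"},
    {"qseqid": "read5", "sseqid": "DHFR", "sstart": 1, "sseq": "M-DWQ", "qseq_translated": "MAEWQ"},
]

hits = call_mutations(ALIGNMENTS, MUTATIONS)
summary = summarize(hits)
print([h["read"] for h in hits])  # ['read1', 'read5']
```

In this sketch a substitution that is not listed in the mutation table (read2), a wild-type residue (read3), and an alignment that does not cover the position (read4) are all ignored, so only listed variants are reported.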
The pipeline includes prebuilt DIAMOND databases for fungal species commonly found in environmental and clinical samples, which will be continuously updated. Users may also supply custom databases by specifying a directory (via the -d flag) containing two files: (1) a DIAMOND-formatted protein database (*.dmnd); and (2) a CSV file listing resistance mutations, with columns for gene, position, reference, mutation, and fungicide. This flexibility allows the pipeline to adapt to novel species or emerging resistance mechanisms.
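As an illustration of the custom mutation table, the sketch below parses and checks a CSV with the five columns named above. The column names come from the text; the exact header spelling, the validation rules (positive integer positions, single-letter amino acids), and the function name `load_mutation_table` are assumptions made for this example and may differ from what FUNGAR expects.

```python
import csv
import io

REQUIRED_COLUMNS = ["gene", "position", "reference", "mutation", "fungicide"]
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def load_mutation_table(handle):
    """Read a mutation table from an open text handle and validate each row."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        position = int(row["position"])
        if position < 1:
            raise ValueError(f"line {line_no}: position must be >= 1")
        for column in ("reference", "mutation"):
            # Assumption: only single-residue substitutions are listed.
            if row[column] not in AMINO_ACIDS:
                raise ValueError(f"line {line_no}: invalid {column} {row[column]!r}")
        rows.append({
            "gene": row["gene"],
            "position": position,
            "reference": row["reference"],
            "mutation": row["mutation"],
            "fungicide": row["fungicide"],
        })
    return rows


# Example table with one entry taken from the validation case in the text.
EXAMPLE_CSV = (
    "gene,position,reference,mutation,fungicide\n"
    "DHFR,2,D,E,methotrexate\n"
)
table = load_mutation_table(io.StringIO(EXAMPLE_CSV))
print(table[0]["gene"], table[0]["position"])  # DHFR 2
```

Checking the table before running the alignment step means a malformed custom database fails early with a line number, rather than silently producing an empty report.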
The functionality of FUNGAR was first evaluated with a manually built dataset of FASTQ files. Two mutations were introduced at the second residue of the dihydrofolate reductase (DHFR) protein sequence of Pneumocystis jirovecii (NCBI accession number ABB84736.1): (i) D2E, which confers resistance to methotrexate (Bédard et al. 2024); and (ii) D2F, a neutral mutation with no documented resistance phenotype. The amino acid sequences were then reverse translated with EMBOSS 6.6.0 backtranseq (Rice et al. 2000). Eight paired-end reads (150 bp) were constructed covering the first 300 nucleotides: the first four reads carried the mutations in the forward strand, while the remaining four carried them in the reverse-complement strand (Table S1). All base calls were assigned a Phred quality score of 30 to simulate high-quality sequencing data. The pipeline successfully detected only the D2E mutation described in the database. Furthermore, it was able to detect mutations in different ORFs, and in both forward