Running Freyja on other pathogens

This guide provides instructions for analyzing non-SARS-CoV-2 pathogens such as influenza or MPox using Freyja. The process is similar to SARS-CoV-2 analysis, but with some key differences.

Data Availability

Data for various pathogens can be found in the following repository: Freyja Barcodes

Folders are organized by pathogen, with each subfolder named after the date the barcode was generated, using the format YYYY-MM-DD. Barcode files are named barcode.csv, and reference genome files are named reference.fasta.
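Because the dated subfolders use the YYYY-MM-DD format, a plain lexicographic sort also orders them chronologically. The sketch below uses that to find the most recent barcode folder for a pathogen. The function name latest_barcode_dir and the paths in the usage comment are hypothetical and assume a local copy of the barcode repository.

```shell
# Print the most recent dated subfolder (YYYY-MM-DD) inside a pathogen folder.
# ISO-style dates sort lexicographically, so a plain sort finds the newest one.
latest_barcode_dir() {
    pathogen_dir=$1
    # Keep only entries whose names look like YYYY-MM-DD.
    ls -1 "$pathogen_dir" \
        | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' \
        | sort \
        | tail -n 1
}

# Hypothetical usage (folder names are assumptions, adjust to your copy):
# latest=$(latest_barcode_dir Freyja-barcodes/MPX)
# cp "Freyja-barcodes/MPX/$latest/barcode.csv" \
#    "Freyja-barcodes/MPX/$latest/reference.fasta" .
```

Entries that do not match the date pattern are ignored, so stray files in the pathogen folder do not affect the result.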

Note

Influenza barcodes are available upon request.

Required Files

To run the analysis for MPox, you will need the following files:

test.sorted.bam: Aligned, trimmed, and sorted BAM file
reference.fasta: Reference genome file
barcode.csv: Barcode file

Setting Up Output Directories

Since you will likely be working with multiple wastewater samples, it is advisable to create directories for storing output files:

mkdir variants_files depth_files demix_files

Analysis Steps

First, generate a variant file and a depth file with the following command:

freyja variants test.sorted.bam --ref reference.fasta --variants variants_files/test.tsv --depths depth_files/test.depth

Pass the reference genome file from the pathogen folder as the --ref argument. If the reference FASTA contains multiple reference genomes, specify the desired one with --refname [name-of-reference].
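To choose a value for --refname, it can help to list the sequence names in the reference FASTA first. The helper below is a minimal sketch. The function name list_ref_names is hypothetical, and it assumes the name is the first whitespace-delimited token on each header line, which may not match how every FASTA file is labeled.

```shell
# Print the sequence names in a FASTA file, one per line.
# Assumption: the name is the first whitespace-delimited token after '>'.
list_ref_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Usage: inspect the names, then pass the chosen one to --refname:
# list_ref_names reference.fasta
# freyja variants test.sorted.bam --ref reference.fasta \
#     --refname [name-of-reference] \
#     --variants variants_files/test.tsv --depths depth_files/test.depth
```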

Once the variant file is generated, proceed to the de-mixing step with the following command:

freyja demix variants_files/test.tsv depth_files/test.depth --barcodes barcode.csv --output demix_files/test.output

Pass the barcode file from the pathogen folder as the --barcodes argument.
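With several samples, the two commands above can be wrapped in a small shell function and called once per BAM file. This is a sketch only. The function name process_sample and the bam_files/ input folder are assumptions, and the freyja flags are the ones shown in this guide.

```shell
# Run 'freyja variants' and then 'freyja demix' for one BAM file.
# Arguments: <sorted BAM> <reference FASTA> <barcode CSV>
process_sample() {
    bam=$1
    ref=$2
    barcodes=$3
    # Derive the sample name, e.g. bam_files/test.sorted.bam -> test
    sample=$(basename "$bam" .sorted.bam)

    # Create the output directories if they do not exist yet.
    mkdir -p variants_files depth_files demix_files

    # Stop here if variant calling fails, so demix never runs on missing input.
    freyja variants "$bam" --ref "$ref" \
        --variants "variants_files/$sample.tsv" \
        --depths "depth_files/$sample.depth" || return 1

    freyja demix "variants_files/$sample.tsv" "depth_files/$sample.depth" \
        --barcodes "$barcodes" \
        --output "demix_files/$sample.output"
}

# Hypothetical usage: bam_files/ is an assumed folder of *.sorted.bam files.
# for bam in bam_files/*.sorted.bam; do
#     process_sample "$bam" reference.fasta barcode.csv || echo "failed: $bam" >&2
# done
```

Naming each output after its sample keeps the files in demix_files/ distinct, one per sample.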

Once you have run demix on multiple samples, you can aggregate all of the output files with the following command:

freyja aggregate demix_files/ --output bunch_of_files.tsv

The aggregated file can be opened in any standard TSV viewer (Excel, Numbers, LibreOffice Calc, etc.). The output should look similar to this:

          summarized                      lineages           abundances             resid               coverage
test.tsv  [('Other', 0.999999999530878)]  MPX-A.3 MPX-A.2.2  0.79798000 0.20202000  7.5952064496123075  99.94117915510955
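In the aggregated file, the lineages and abundances columns each hold a space-separated list in matching order. For a quick look on the command line, the sketch below pairs them up, printing one line per lineage. The function name list_lineage_abundances is hypothetical, and the code assumes the tab-separated column order shown above (sample, summarized, lineages, abundances, resid, coverage).

```shell
# Print one "sample<TAB>lineage<TAB>abundance" line per lineage found in an
# aggregated Freyja TSV. Assumes the column order: sample, summarized,
# lineages, abundances, resid, coverage.
list_lineage_abundances() {
    awk -F '\t' '
        NR == 1 { next }                   # skip the header row
        {
            n = split($3, lineage, " ")    # lineages are space-separated
            split($4, abundance, " ")      # abundances follow the same order
            for (i = 1; i <= n; i++)
                printf "%s\t%s\t%s\n", $1, lineage[i], abundance[i]
        }
    ' "$1"
}

# Usage:
# list_lineage_abundances bunch_of_files.tsv
```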