Running Freyja on other pathogens

This guide provides instructions for analyzing non-SARS-CoV-2 pathogens such as influenza or MPox using Freyja. The process is similar to SARS-CoV-2 analysis, but with some key differences.

Data Availability

Data for various pathogens can be found in the following repository: Freyja Barcodes

Folders are organized by pathogen, with each subfolder named after the date the barcode was generated, using the format YYYY-MM-DD. Barcode files are named barcode.csv, and reference genome files are named reference.fasta.

Note

Influenza barcodes are available upon request.

Required Files

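To perform these analyses for the MPox pathogen, you will need the barcode file (barcode.csv) and the reference genome file (reference.fasta) from the MPox folder of the repository described above.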

Setting Up Output Directories

Since you will likely be working with multiple wastewater samples, it is advisable to create directories for storing output files:

mkdir variants_files depth_files demix_files

Analysis Steps

The first step is to generate a variant file. Use the following command to perform this step:

freyja variants test.sorted.bam --ref reference.fasta --variants variants_files/test.tsv --depths depth_files/test.depth

Pass the reference genome file provided in the pathogen folder as the --ref argument. If the reference FASTA contains multiple reference genomes, specify the name of the desired one with --refname [name-of-reference].
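If it is unclear which names a multi-sequence reference contains, the FASTA headers can be listed first. The following is a minimal sketch, assuming that --refname expects the sequence identifier, i.e. the header text up to the first whitespace. list_reference_names is a hypothetical helper and not part of Freyja.

```shell
# Hypothetical helper: print one sequence identifier per line for a FASTA file.
list_reference_names() {
    # Keep header lines, drop the leading ">", keep the text up to the first whitespace.
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }'
}
```

For example, list_reference_names reference.fasta prints the candidate values to try with --refname.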

Once the variant file is generated, proceed to the de-mixing step with the following command:

freyja demix variants_files/test.tsv depth_files/test.depth --barcodes barcode.csv --output demix_files/test.output

Pass the barcode file provided in the pathogen folder as the --barcodes argument.
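With several samples, the two steps above can be wrapped in a loop. The following is a sketch only: it assumes that each sample is a file ending in .sorted.bam, that the sample name is the file name without that suffix, and that reference.fasta and barcode.csv sit in the working directory. The names REF, BARCODES, sample_name, and process_all are hypothetical; only the freyja flags shown above are used.

```shell
# Assumed locations of the files taken from the pathogen folder.
REF=${REF:-reference.fasta}
BARCODES=${BARCODES:-barcode.csv}

# Strip the directory and the ".sorted.bam" suffix: data/test.sorted.bam -> test
sample_name() {
    basename "$1" .sorted.bam
}

# Run the variants and demix steps for every *.sorted.bam in a directory
# (defaults to the current directory).
process_all() {
    bam_dir=${1:-.}
    mkdir -p variants_files depth_files demix_files
    for bam in "$bam_dir"/*.sorted.bam; do
        [ -e "$bam" ] || continue   # skip if the glob matched nothing
        name=$(sample_name "$bam")
        freyja variants "$bam" --ref "$REF" \
            --variants "variants_files/${name}.tsv" \
            --depths "depth_files/${name}.depth" || return 1
        freyja demix "variants_files/${name}.tsv" "depth_files/${name}.depth" \
            --barcodes "$BARCODES" \
            --output "demix_files/${name}.output" || return 1
    done
}
```

After pasting or sourcing these definitions, running process_all path/to/bams processes every sample and stops at the first failing command.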

Once you’ve run demix on a bunch of samples, you can aggregate all of the output files with the following command:

freyja aggregate demix_files/ --output bunch_of_files.tsv

From there, it’s easy to view the aggregated output file in any standard TSV viewer (Excel, Numbers, LibreOffice Calc, etc.). You should see something like this:

          summarized                      lineages           abundances             resid               coverage
test.tsv  [('Other', 0.999999999530878)]  MPX-A.3 MPX-A.2.2  0.79798000 0.20202000  7.5952064496123075  99.94117915510955
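The lineages and abundances columns each hold several space-separated values per sample, which can be awkward to read in a spreadsheet. The sketch below pairs them up, one lineage per line. It assumes that the aggregated file is tab-separated with the column order shown above (sample, summarized, lineages, abundances, resid, coverage). lineage_table is a hypothetical helper, not a Freyja command.

```shell
# Hypothetical helper: print "sample<TAB>lineage<TAB>abundance" for each lineage.
lineage_table() {
    awk -F'\t' 'NR > 1 {                 # skip the header row
        n = split($3, lin, " ")          # lineages, space-separated
        split($4, ab, " ")               # abundances, in the same order
        for (i = 1; i <= n; i++) print $1 "\t" lin[i] "\t" ab[i]
    }' "$1"
}
```

For example, lineage_table bunch_of_files.tsv lists every sample and lineage pair alongside its abundance.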