Running Freyja on other pathogens
This guide explains how to analyze non-SARS-CoV-2 pathogens, such as influenza or MPox, with Freyja. The workflow mirrors the SARS-CoV-2 analysis, except that you must supply the pathogen-specific reference genome and barcode file explicitly.
Data Availability
Data for various pathogens can be found in the following repository: Freyja Barcodes
Folders are organized by pathogen, with each subfolder named after the date the
barcode was generated, using the format YYYY-MM-DD. Barcode files are named
barcode.csv, and reference genome files are named reference.fasta.
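Because the subfolders are named YYYY-MM-DD, a plain lexicographic sort orders them chronologically. The following is a minimal sketch for picking the most recent one. The function name latest_barcode_dir, the local clone path, and the pathogen folder name in the usage comment are assumptions for illustration, not part of Freyja.

```shell
# Print the most recent dated subfolder (YYYY-MM-DD) inside a pathogen folder.
# ISO-style dates sort chronologically under a plain lexicographic sort.
latest_barcode_dir() {
    local pathogen_dir=$1
    # List subfolder names, keep only those that look like dates, take the last.
    for d in "$pathogen_dir"/*/; do
        basename "$d"
    done | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort | tail -n 1
}

# Hypothetical usage, assuming the repository was cloned to ./Freyja-barcodes
# and contains a pathogen folder named MPX:
#   latest=$(latest_barcode_dir Freyja-barcodes/MPX)
#   cp "Freyja-barcodes/MPX/$latest/barcode.csv" .
#   cp "Freyja-barcodes/MPX/$latest/reference.fasta" .
```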
Note
Influenza barcodes are available upon request.
Required Files
To perform these analyses, you will need the following files for the MPox pathogen:
- test.sorted.bam: Aligned, trimmed, and sorted BAM file 
- reference.fasta: Reference genome file 
- barcode.csv: Barcode file 
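Before starting, it can help to confirm that all three inputs are present and non-empty. This is a small sketch; check_inputs is a hypothetical helper, not a Freyja command, and it only checks that the files exist, not that their contents are valid.

```shell
# Report any input file that is missing or empty.
# Returns 0 when every file is present and non-empty, 1 otherwise.
check_inputs() {
    local missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_inputs test.sorted.bam reference.fasta barcode.csv && echo "inputs look complete"
```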
Setting Up Output Directories
Since you will likely be working with multiple wastewater samples, it is advisable to create directories for storing output files:
mkdir variants_files depth_files demix_files
Analysis Steps
First, generate a variant file and a depth file from the BAM file:
freyja variants test.sorted.bam --ref reference.fasta --variants variants_files/test.tsv --depths depth_files/test.depth
Pass the reference genome file provided in the pathogen folder as the --ref
argument. If the reference FASTA contains multiple reference genomes, specify
the name of the desired one with --refname [name-of-reference].
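To see which names a multi-sequence reference offers, you can list its FASTA headers. The sketch below assumes that the name expected by --refname is the first whitespace-delimited token of the header line, as is conventional for FASTA indexes; list_ref_names is a hypothetical helper.

```shell
# Print the sequence name of every record in a FASTA file.
# Header lines start with '>'; the name is the text up to the first whitespace.
list_ref_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Usage:
#   list_ref_names reference.fasta
```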
Once the variant file is generated, proceed to the de-mixing step with the following command:
freyja demix variants_files/test.tsv depth_files/test.depth --barcodes barcode.csv --output demix_files/test.output
Pass the barcode file provided in the pathogen folder as the --barcodes
argument.
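With many samples, the two steps above can be wrapped in a loop. The following is a sketch only: it assumes every input is named [sample].sorted.bam and sits in one directory, and it uses only the freyja options shown in this guide. The helpers run_cmd and process_samples and the DRY_RUN variable are hypothetical names introduced here; setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Execute a command, or just print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the variants and demix steps for every *.sorted.bam in a directory.
# Arguments: BAM directory, reference FASTA, barcode CSV.
process_samples() {
    local bam_dir=$1 ref=$2 barcodes=$3
    local bam sample
    mkdir -p variants_files depth_files demix_files
    for bam in "$bam_dir"/*.sorted.bam; do
        [ -e "$bam" ] || continue   # skip when the glob matches nothing
        sample=$(basename "$bam" .sorted.bam)
        run_cmd freyja variants "$bam" --ref "$ref" \
            --variants "variants_files/$sample.tsv" \
            --depths "depth_files/$sample.depth" || return 1
        run_cmd freyja demix "variants_files/$sample.tsv" "depth_files/$sample.depth" \
            --barcodes "$barcodes" \
            --output "demix_files/$sample.output" || return 1
    done
}

# Usage (bams/ is a hypothetical directory name):
#   DRY_RUN=1; process_samples bams reference.fasta barcode.csv   # preview
#   DRY_RUN=0; process_samples bams reference.fasta barcode.csv   # run
```

Stopping at the first failed command keeps a broken sample from silently producing an incomplete demix_files directory.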
Once you have run demix on multiple samples, you can aggregate all of the output files with the following command:
freyja aggregate demix_files/ --output bunch_of_files.tsv
You can then open the aggregated file in any standard TSV viewer (Excel, Numbers, LibreOffice Calc, etc.). You should see something like this:
            summarized                        lineages             abundances               resid                 coverage
test.tsv    [('Other', 0.999999999530878)]    MPX-A.3 MPX-A.2.2    0.79798000 0.20202000    7.5952064496123075    99.94117915510955
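In the aggregated file, the lineages and abundances columns each hold a space-separated list, with entries in matching order. The sketch below unpacks them into one line per lineage. It assumes the file is tab-separated, that the first column is the sample name, and that the header's first field is empty, as in the example output; lineage_table is a hypothetical helper.

```shell
# Print one "sample<TAB>lineage<TAB>abundance" line per lineage call.
lineage_table() {
    awk -F'\t' '
        NR == 1 {
            # Map each header name to its column number.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            # Split the paired space-separated lists and print them side by side.
            n = split($(col["lineages"]), lin, " ")
            split($(col["abundances"]), abu, " ")
            for (i = 1; i <= n; i++) printf "%s\t%s\t%s\n", $1, lin[i], abu[i]
        }
    ' "$1"
}

# Usage:
#   lineage_table bunch_of_files.tsv
```

For the example row, this prints two lines: MPX-A.3 with 0.79798000 and MPX-A.2.2 with 0.20202000, each prefixed by test.tsv.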