Running Freyja on other pathogens
This guide provides instructions for analyzing non-SARS-CoV-2 pathogens such as influenza or MPox using Freyja. The process is similar to SARS-CoV-2 analysis, but with some key differences.
Data Availability
Data for various pathogens can be found in the following repository: Freyja Barcodes
Folders are organized by pathogen, with each subfolder named after the date the barcode was generated, using the format YYYY-MM-DD. Barcode files are named barcode.csv, and reference genome files are named reference.fasta.
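Because the dated subfolders use the YYYY-MM-DD format, they sort chronologically as plain strings, so the newest barcode folder can be picked with a simple glob. The following is a minimal sketch, assuming you have a local copy of the barcode repository; the function name latest_barcode_dir and the directory path passed to it are hypothetical and not part of Freyja.

```shell
# Print the most recent YYYY-MM-DD subfolder of a pathogen directory.
# Assumption: a local copy of the barcode repository, laid out as described above.
latest_barcode_dir() {
    local pathogen_dir="$1"
    local latest="" d
    # Shell globs expand in sorted order, so the last match is the newest date.
    for d in "$pathogen_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
        [ -d "$d" ] || continue      # skips the literal pattern when nothing matches
        latest="${d%/}"
    done
    [ -n "$latest" ] || return 1     # no dated subfolder found
    printf '%s\n' "$latest"
}

# Hypothetical usage:
# dir="$(latest_barcode_dir path/to/pathogen-folder)"
# ls "$dir"/barcode.csv "$dir"/reference.fasta
```

The function returns a non-zero status when no dated subfolder exists, so a calling script can stop early instead of continuing with an empty path.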
Note
Influenza barcodes are available upon request.
Required Files
To perform these analyses, you will need the following files for the MPox pathogen:
test.sorted.bam: Aligned, trimmed, and sorted BAM file
reference.fasta: Reference genome file
barcode.csv: Barcode file
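Before starting a run, it can help to confirm that all three inputs are present and non-empty. The helper below is an illustrative sketch only; check_inputs is a hypothetical name, not a Freyja command.

```shell
# Report every listed file that is missing or empty.
# Returns non-zero if at least one file fails the check.
check_inputs() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the file names used in this guide:
# check_inputs test.sorted.bam reference.fasta barcode.csv || exit 1
```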
Setting Up Output Directories
Since you will likely be working with multiple wastewater samples, it is advisable to create directories for storing output files:
mkdir variants_files depth_files demix_files
Analysis Steps
The first step is to generate a variant file. Use the following command to perform this step:
freyja variants test.sorted.bam --ref reference.fasta --variants variants_files/test.tsv --depths depth_files/test.depth
Please note that you will be passing the reference genome file provided in the pathogen folder as the --ref argument. In cases where multiple reference genomes are present in the reference fasta, you can specify the name of the desired reference genome with --refname [name-of-reference].
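With several wastewater samples, the variants step can be wrapped in a loop. The sketch below assumes that every BAM file sits in one directory and ends in .sorted.bam, and that the output directories follow the layout from the setup step; the function name run_variants_all and the FREYJA override variable are hypothetical conveniences, not part of Freyja.

```shell
# Run `freyja variants` for every *.sorted.bam file in a directory.
# Assumption: sample names are the BAM file names without the .sorted.bam suffix.
# Set FREYJA=echo to print the commands instead of running them.
run_variants_all() {
    local bam_dir="$1" ref="$2"
    local bam sample
    mkdir -p variants_files depth_files
    for bam in "$bam_dir"/*.sorted.bam; do
        [ -e "$bam" ] || continue    # no matching BAM files
        sample="$(basename "$bam" .sorted.bam)"
        "${FREYJA:-freyja}" variants "$bam" \
            --ref "$ref" \
            --variants "variants_files/${sample}.tsv" \
            --depths "depth_files/${sample}.depth"
    done
}

# Hypothetical usage:
# run_variants_all path/to/bams reference.fasta
```

If the reference FASTA holds more than one genome, the --refname option described above would need to be added to the command inside the loop.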
Once the variant file is generated, proceed to the de-mixing step with the following command:
freyja demix variants_files/test.tsv depth_files/test.depth --barcodes barcode.csv --output demix_files/test.output
Please note that you will be passing the barcode file provided in the pathogen folder as the --barcodes argument.
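The de-mixing step can be looped over samples in the same way. This sketch assumes the variants_files/ and depth_files/ layout used in this guide, with matching sample names in both directories; run_demix_all and the FREYJA override variable are hypothetical names introduced for illustration.

```shell
# Run `freyja demix` for every variants file that has a matching depth file.
# Set FREYJA=echo to print the commands instead of running them.
run_demix_all() {
    local barcodes="$1"
    local variants sample depths
    mkdir -p demix_files
    for variants in variants_files/*.tsv; do
        [ -e "$variants" ] || continue    # no variants files yet
        sample="$(basename "$variants" .tsv)"
        depths="depth_files/${sample}.depth"
        if [ ! -e "$depths" ]; then
            echo "skipping ${sample}: no depth file" >&2
            continue
        fi
        "${FREYJA:-freyja}" demix "$variants" "$depths" \
            --barcodes "$barcodes" \
            --output "demix_files/${sample}.output"
    done
}

# Hypothetical usage:
# run_demix_all barcode.csv
```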
Once you have run demix on multiple samples, you can aggregate all of the output files using the following command:
freyja aggregate demix_files/ --output bunch_of_files.tsv
From there, it’s easy to view the output files in any standard TSV viewer (Excel, Numbers, LibreOffice Calc, etc.). You should see something like this:
            summarized                        lineages             abundances               resid                 coverage
test.tsv    [('Other', 0.999999999530878)]    MPX-A.3 MPX-A.2.2    0.79798000 0.20202000    7.5952064496123075    99.94117915510955
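For downstream work, it is often more convenient to have one row per lineage. The sketch below assumes the aggregated file is tab-separated with the sample name in the first column, followed by the columns in the order shown above, and that the lineages and abundances fields are space-separated lists of equal length. The function name expand_lineages is hypothetical.

```shell
# Print one "sample<TAB>lineage<TAB>abundance" row per lineage.
# Assumption: tab-separated columns ordered as
#   sample, summarized, lineages, abundances, resid, coverage
expand_lineages() {
    awk -F '\t' '
        NR == 1 { next }    # skip the header row
        {
            n = split($3, lin, " ")
            m = split($4, abu, " ")
            if (n != m) {
                # Lists that do not line up cannot be paired safely.
                printf "warning: %s has %d lineages but %d abundances\n", $1, n, m > "/dev/stderr"
                next
            }
            for (i = 1; i <= n; i++)
                printf "%s\t%s\t%s\n", $1, lin[i], abu[i]
        }
    ' "$1"
}

# Usage with the aggregated file from this guide:
# expand_lineages bunch_of_files.tsv
```

For the example row above, this would pair MPX-A.3 with 0.79798000 and MPX-A.2.2 with 0.20202000.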