Creating custom barcodes

Follow these steps to generate lineage‑specific barcodes with BarcodeForge.

  1. Install BarcodeForge

    conda install -c conda-forge -c bioconda barcodeforge
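
    The pipeline calls usher and matUtils (see the options in step 3), so after installing it can help to confirm that those executables are on your PATH. A minimal bash sketch; check_tools is a hypothetical helper name, and the tool names assume the standard UShER executables usher and matUtils:

    ```shell
    # Hypothetical helper: report any named executable that is not on PATH.
    # Returns non-zero if at least one tool is missing.
    check_tools() {
      local t missing=0
      for t in "$@"; do
        if ! command -v "$t" >/dev/null 2>&1; then
          echo "not found: $t" >&2
          missing=1
        fi
      done
      return "$missing"
    }
    ```

    For example, check_tools barcodeforge usher matUtils prints nothing and returns 0 when all three are available.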
    
  2. Prepare input files

    • Reference genome – FASTA file of the reference sequence.

    • Multiple‑sequence alignment – FASTA of all sequences to be barcoded.

    • Phylogenetic tree – Newick or Nexus file containing every lineage to be barcoded.

    • Lineage table – TSV mapping each sequence ID to its lineage (lineage<TAB>sequence_id).

    • Barcode prefix (optional) – not a file but a string, passed with --prefix, that is prepended to each lineage name (e.g. RSVa for RSV‑A).
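
    Every sequence ID in the lineage table should also appear in the alignment. The bash sketch below checks this before you launch the pipeline. It is a hypothetical helper, not part of BarcodeForge, and it assumes a tab-separated table with the sequence ID in column 2 (lineage<TAB>sequence_id) and FASTA headers whose ID is the first whitespace-delimited word after ">":

    ```shell
    # Hypothetical helper: list sequence IDs from the lineage table that have
    # no matching record in the alignment FASTA. Exits non-zero if any are missing.
    # Optional third argument: 1 to skip a header row in the lineage table.
    check_lineage_ids() {
      local lineages="$1" alignment="$2" skip_header="${3:-0}"
      awk -F'\t' -v skip="$skip_header" '
        # First file (alignment): remember every sequence ID.
        NR == FNR {
          if (/^>/) { split(substr($0, 2), w, " "); seen[w[1]] = 1 }
          next
        }
        # Second file (lineage table): optionally skip a header row.
        FNR == 1 && skip == 1 { next }
        NF < 2 { next }                      # ignore blank or short lines
        !($2 in seen) { print "missing from alignment: " $2; missing++ }
        END { exit (missing > 0) }
      ' "$alignment" "$lineages"
    }
    ```

    Usage: check_lineage_ids lineages.tsv aligned.fasta (add 1 as a third argument if the table starts with a header row).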

  3. Run BarcodeForge

    Basic syntax:

    barcodeforge barcode REFERENCE_GENOME ALIGNMENT TREE LINEAGES [OPTIONS]
    

    Common options:

    • --tree_format {newick,nexus} – tree file format (default: newick)

    • --usher-args "<args>" – extra flags passed to usher

    • --threads N – number of CPU cores (default: 1)

    • --matutils-overlap FLOAT – value for matUtils annotate --set-overlap (default: 0)

    • --prefix TEXT – prefix prepended to lineage names (default: empty)

    Note

    Run barcodeforge barcode --help to see every available flag.
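
    Setting --tree_format by hand is easy to forget when switching between Newick and Nexus trees. A minimal bash sketch that picks the value from the tree file's extension and assembles a command from the options listed above; tree_format_for and build_barcodeforge_cmd are hypothetical helper names, and the extension list is an assumption to adjust to your own naming:

    ```shell
    # Hypothetical helper: map a tree file's extension to a --tree_format value.
    tree_format_for() {
      local lower
      lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
      case "$lower" in
        *.nwk|*.newick) echo newick ;;
        *.nex|*.nexus)  echo nexus ;;
        *) echo "unrecognised tree extension: $1" >&2; return 1 ;;
      esac
    }

    # Hypothetical helper: print (not run) a barcodeforge command.
    # Arguments: reference, alignment, tree, lineage table, [threads, default 1].
    build_barcodeforge_cmd() {
      local ref="$1" aln="$2" tree="$3" lineages="$4" threads="${5:-1}" fmt
      fmt=$(tree_format_for "$tree") || return 1
      local cmd=(barcodeforge barcode "$ref" "$aln" "$tree" "$lineages"
                 --tree_format "$fmt" --threads "$threads")
      printf '%s\n' "${cmd[*]}"
    }
    ```

    Printing the command first lets you review it before a long run. Because "${cmd[*]}" joins the words with spaces, paths containing spaces lose their quoting in the printed form; to execute instead of print, replace the printf line with "${cmd[@]}".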

  4. Retrieve the output

    The pipeline writes results to the current directory:

    • barcodes.csv – barcode definitions for each lineage (named PREFIX-barcodes.csv when --prefix is set)

    • barcodes.html – the same barcodes in an interactive HTML format (named PREFIX-barcodes.html when --prefix is set)

    Note

    The barcodeforge_workdir folder contains intermediate files generated by the pipeline.
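
    The column layout of barcodes.csv is not described on this page. The bash sketch below assumes a Freyja-style matrix: a header row of mutation names (first cell left for the lineage column), then one row per lineage with the lineage name in the first column and 1 marking a mutation present in that lineage. Check this against your own file first; summarise_barcodes and lineage_mutations are hypothetical helper names:

    ```shell
    # Hypothetical helper: count lineages (rows) and mutations (columns),
    # assuming the lineage-by-mutation layout described above.
    summarise_barcodes() {
      awk -F',' '
        NR == 1 { nmut = NF - 1; next }   # header: first cell is the lineage column
        NF > 0  { nlin++ }                # every other non-empty row is one lineage
        END { printf "lineages: %d\nmutations: %d\n", nlin, nmut }
      ' "$1"
    }

    # Hypothetical helper: list the mutations marked 1 for a single lineage.
    lineage_mutations() {
      awk -F',' -v want="$2" '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
        ($1 "") == want { for (i = 2; i <= NF; i++) if ($i + 0 == 1) print name[i] }
      ' "$1"
    }
    ```

    Usage: summarise_barcodes barcodes.csv, then lineage_mutations barcodes.csv LINEAGE, where LINEAGE is a placeholder for a name from the first column.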

Worked example: RSV‑A

This example generates barcodes for the RSV‑A lineage tree using demo data from the BarcodeForge repository:

  1. Download demo data:

    mkdir data
    wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/tree.nwk -O data/tree.nwk
    wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/aligned.fasta -O data/aligned.fasta
    wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/reference.fasta -O data/reference.fasta
    wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/lineages.tsv -O data/lineages.tsv
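
    With -O, wget creates the output file before downloading, so a failed fetch can leave an empty file behind. A minimal bash check that all four demo files exist and are non-empty; check_demo_files is a hypothetical helper name, and the file list simply mirrors the commands above:

    ```shell
    # Hypothetical helper: confirm each demo file exists and is non-empty.
    # Takes the data directory as an optional argument (default: data).
    check_demo_files() {
      local dir="${1:-data}" f status=0
      for f in tree.nwk aligned.fasta reference.fasta lineages.tsv; do
        if [ ! -s "$dir/$f" ]; then
          echo "missing or empty: $dir/$f" >&2
          status=1
        fi
      done
      return "$status"
    }
    ```

    Run check_demo_files after the downloads; any file it reports should be fetched again.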
    
  2. Generate barcodes:

    barcodeforge barcode data/reference.fasta data/aligned.fasta data/tree.nwk data/lineages.tsv \
      --tree_format newick --threads 4 --prefix RSVa
    
  3. View the results

    Because the command sets --prefix RSVa, the output file names start with RSVa. All outputs are saved in the current directory:

    • RSVa-barcodes.csv – barcode definitions for each lineage

    • RSVa-barcodes.html – the same barcodes in an interactive HTML format

    • barcodeforge_workdir – intermediate files generated by the pipeline
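
    Since --prefix is prepended to lineage names, a quick way to confirm it took effect is to look for rows whose names lack it. A minimal bash sketch, assuming the barcode CSV has a header row followed by one row per lineage with the lineage name in the first comma-separated column; check_prefix is a hypothetical helper name:

    ```shell
    # Hypothetical check: print lineage names that do not start with the
    # expected prefix. Exits non-zero if any are found.
    check_prefix() {
      local csv="$1" prefix="$2"
      awk -F',' -v p="$prefix" '
        NR > 1 && index($1, p) != 1 { print $1; bad++ }   # skip header row
        END { exit (bad > 0) }
      ' "$csv"
    }
    ```

    For this example: check_prefix RSVa-barcodes.csv RSVa.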