Creating custom barcodes
-------------------------------------------------------------------------------

Follow these steps to generate lineage-specific barcodes with **BarcodeForge**.

1. Install dependencies

   - **Conda / Mamba** – install either one by following its official installation guide.
   - **BarcodeForge** – install from *Bioconda*:

     .. code-block:: bash

        conda install -c bioconda barcodeforge

2. Prepare input files

   - **Reference genome** – FASTA file of the reference sequence.
   - **Multiple-sequence alignment** – FASTA of all sequences to be barcoded.
   - **Phylogenetic tree** – Newick or Nexus file containing every lineage to be barcoded.
   - **Lineage table** – TSV mapping each sequence ID to its lineage (columns ``lineage`` and ``sequence_id``).
   - **Barcode prefix** *(optional)* – string to prepend to each barcode (e.g. ``RSVa`` for RSV-A).

3. Run BarcodeForge

   Basic syntax:

   .. code-block:: bash

      barcodeforge barcode REFERENCE_GENOME ALIGNMENT TREE LINEAGES [OPTIONS]

   Common options:

   - ``--tree_format {newick,nexus}`` – tree file format (default: ``newick``)
   - ``--usher-args ""`` – quoted string of extra flags passed to **usher**
   - ``--threads N`` – number of CPU cores (default: ``1``)
   - ``--matutils-overlap FLOAT`` – value for ``matUtils annotate --set-overlap`` (default: ``0``)
   - ``--prefix TEXT`` – prefix prepended to lineage names (default: empty)

   .. note::

      Pass ``--help`` to list every available flag.

4. Retrieve the output

   The pipeline writes results to the current directory:

   * ``barcodes.csv`` – barcode definitions for each lineage
   * ``barcodes.html`` – the same barcodes in an interactive HTML format

   .. note::

      The ``barcodeforge_workdir`` folder contains intermediate files generated by the pipeline.

Worked example: RSV-A
======================

The following example shows how to generate barcodes for the RSV-A lineage tree:

1. **Download demo data**:

   .. code-block:: bash

      mkdir -p data
      wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/tree.nwk -O data/tree.nwk
      wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/aligned.fasta -O data/aligned.fasta
      wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/reference.fasta -O data/reference.fasta
      wget https://raw.githubusercontent.com/andersen-lab/BarcodeForge/refs/heads/main/barcodeforge/assets/lineages.tsv -O data/lineages.tsv

   The commands fetch four files:

   - ``data/tree.nwk`` – RSV-A tree
   - ``data/aligned.fasta`` – RSV-A alignment
   - ``data/reference.fasta`` – RSV-A reference genome
   - ``data/lineages.tsv`` – RSV-A lineages per sample

2. **Generate barcodes**:

   .. code-block:: bash

      barcodeforge barcode \
          data/reference.fasta \
          data/aligned.fasta \
          data/tree.nwk \
          data/lineages.tsv \
          --tree_format newick \
          --threads 4 \
          --prefix RSVa

3. **View the results**

   The barcodes are saved in the current directory:

   * ``RSVa-barcodes.csv`` – barcode definitions for each lineage
   * ``RSVa-barcodes.html`` – the same barcodes in an interactive HTML format
   * ``barcodeforge_workdir`` – intermediate files generated by the pipeline
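
Because the lineage table maps sequence IDs to lineages, it can be worth confirming that every ID in the table also appears in the alignment before running ``barcodeforge barcode``. The sketch below is one way to do that. ``missing_ids`` is a hypothetical helper, not part of BarcodeForge. It assumes the TSV has a header row with the sequence ID in the second column, and that a FASTA record's ID is the header text up to the first whitespace. Adjust the column number if your table is laid out differently.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical pre-flight helper (not part of BarcodeForge).
   # Prints each sequence ID that is listed in the lineage table but has no
   # matching record in the alignment.
   # Assumptions: the TSV has one header row; the ID column defaults to 2;
   # a FASTA ID is the header text between ">" and the first space or tab.
   missing_ids() {
       local lineages_tsv="$1" alignment_fasta="$2" id_column="${3:-2}"
       awk -F'\t' -v col="$id_column" '
           # Tolerate Windows line endings in either file.
           { sub(/\r$/, "") }
           # First file (alignment): remember every record ID.
           FILENAME == ARGV[1] {
               if (substr($0, 1, 1) == ">") {
                   split(substr($0, 2), parts, /[ \t]/)
                   seen[parts[1]] = 1
               }
               next
           }
           # Second file (lineage table): skip the header, report unseen IDs.
           FNR > 1 && $col != "" && !($col in seen) { print $col }
       ' "$alignment_fasta" "$lineages_tsv"
   }

With the demo data, the check would be called as ``missing_ids data/lineages.tsv data/aligned.fasta``. Empty output means every listed ID was found in the alignment; any printed ID is worth resolving before generating barcodes.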