Enabled by data from GISAID.


Table 1.1: Key Mutations that define the strain

Gene | Nucleotide Mutations | Amino Acid Changes
ORF1ab | C3266T, T6953C, C5387A, 11288:11296 deletion | T1001I, I2230T, A1708D
S | A23062T, C23270A, A23402G, C23603A, C23708T, T24505G, G24913C | DEL69-70, DEL144Y, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H
N | GAT28279CTA, C28976T | D3L, S235F
ORF8 | G28047T, C27971T, A28110G | R52I, Q27_, Y73C

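
The nucleotide entries in Table 1.1 follow a reference-position-alternate pattern (for example, A23062T means A is replaced by T at position 23062). A minimal Python sketch of parsing that notation is given here; the `Substitution` type and `parse_substitution` helper are hypothetical names introduced for illustration, and positions are kept exactly as written because the table does not state its coordinate convention.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    ref: str  # reference base(s), e.g. "A" or "GAT"
    pos: int  # position exactly as written in the table
    alt: str  # replacement base(s), e.g. "T" or "CTA"


# One or more bases, a position, then one or more bases.
_PATTERN = re.compile(r"^([ACGT]+)(\d+)([ACGT]+)$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'A23062T'; reject anything else (e.g. deletions)."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return Substitution(ref, int(pos), alt)


# The S-gene row of Table 1.1.
spike = "A23062T, C23270A, A23402G, C23603A, C23708T, T24505G, G24913C"
parsed = [parse_substitution(item) for item in spike.split(",")]
```

Entries that are not simple substitutions, such as "11288:11296 deletion", raise a `ValueError` and would need separate handling.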
Figure 1.1: genetic distance (root-to-tip), a measure of evolutionary change, plotted for strain and non-strain (related) samples, excluding other well-known VOCs such as B.1.351.
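
As a rough illustration of the root-to-tip measure in Figure 1.1, the sketch below sums branch lengths along the path from the root to each tip. The tree representation, the `root_to_tip` function, and the toy branch lengths are all assumptions made for this example; they are not the data or tooling behind the figure.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation: each internal node maps to a list of
# (child_name, branch_length) pairs. Nodes without an entry are tips.
Tree = Dict[str, List[Tuple[str, float]]]


def root_to_tip(tree: Tree, root: str) -> Dict[str, float]:
    """Return the summed branch length from the root to every tip."""
    distances: Dict[str, float] = {}
    stack = [(root, 0.0)]
    while stack:
        node, dist = stack.pop()
        children = tree.get(node, [])
        if not children:
            distances[node] = dist  # reached a tip
        for child, length in children:
            stack.append((child, dist + length))
    return distances


# Made-up branch lengths (e.g. in number of mutations).
toy_tree: Tree = {
    "root": [("n1", 1.0), ("tipA", 2.0)],
    "n1": [("tipB", 3.0), ("tipC", 0.5)],
}
distances = root_to_tip(toy_tree, "root")
```

Plotting these distances against sample collection dates gives the kind of root-to-tip view the caption describes.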

State Prevalence


Figure 2.1: spatial (geographical) prevalence of the B.1.1.7 strain across California.
Figure 2.2: temporal (over time) prevalence of the strain across California.
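
A temporal prevalence curve of the kind shown in Figure 2.2 can be thought of as the share of sequenced samples per time bin that belong to the lineage. The sketch below assumes weekly bins (ISO weeks) and a hypothetical list of `(collection_date, lineage)` pairs; the actual binning and data used for the figure are not stated here, and the sample records are invented.

```python
from collections import defaultdict
from datetime import date


def weekly_prevalence(samples, lineage="B.1.1.7"):
    """Map (ISO year, ISO week) to the fraction of samples assigned to `lineage`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for collected, sample_lineage in samples:
        iso = collected.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if sample_lineage == lineage:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


# Invented records, for illustration only.
samples = [
    (date(2021, 1, 4), "B.1.1.7"),
    (date(2021, 1, 5), "B.1.2"),
    (date(2021, 1, 6), "B.1.2"),
    (date(2021, 1, 7), "B.1.429"),
    (date(2021, 1, 11), "B.1.1.7"),
    (date(2021, 1, 12), "B.1.2"),
]
prevalence_by_week = weekly_prevalence(samples)
```

The same grouping applies at the national and global levels; only the set of input samples changes.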

National Prevalence


Figure 3.1: the spatial (geographical) prevalence of the strain across the US.
Figure 3.2: the temporal (over time) prevalence of the strain across the US.

Global Prevalence


Figure 4.1: the spatial (geographical) prevalence of the strain across the world.
Figure 4.2: the temporal (over time) prevalence of the strain across the world.

Notes on Sampling


As Figure 3.2 indicates, the majority of B.1.1.7 genomes identified in the US to date were found through S-gene target failures (SGTF) in community-based diagnostic PCR testing. SGTF indicates a deletion mutation, one of several mutations that distinguish B.1.1.7 from other SARS-CoV-2 strains.
This tells us the lineage is present in the US. However, because selecting samples by SGTF is not an unbiased sampling approach, it does not indicate the true prevalence of the B.1.1.7 lineage in the US. Estimates of true prevalence in the US are discussed here.
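
To make the sampling-bias point concrete, the short sketch below compares the lineage's share among all cases with its share among SGTF-selected samples. Every number is hypothetical, and the example assumes that all B.1.1.7 cases show SGTF and that some other cases do as well; it is an arithmetic illustration, not an estimate.

```python
def prevalence(lineage_count: int, total_count: int) -> float:
    """Fraction of samples that belong to the lineage."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return lineage_count / total_count


# Hypothetical counts, chosen only to illustrate the effect.
total_cases = 1000      # all positive cases in some population
b117_cases = 20         # cases that are B.1.1.7 (assumed to all show SGTF)
other_sgtf_cases = 30   # non-B.1.1.7 cases that also show SGTF (assumed)

# Share of B.1.1.7 among all cases.
true_prevalence = prevalence(b117_cases, total_cases)

# Share of B.1.1.7 if only SGTF samples are sequenced.
sgtf_selected_prevalence = prevalence(b117_cases, b117_cases + other_sgtf_cases)
```

Under these made-up counts the lineage accounts for 2% of all cases but 40% of SGTF-selected genomes, which is why SGTF-selected sequences confirm presence without measuring prevalence.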

Figure 5: a simple illustration of how genomic surveillance of COVID-19 samples could give us an increasingly clear picture of how the virus is evolving and spreading. The pictures above are electron microscopy images of SARS-CoV-2 (credit: NIAID) that are "crappified" (salt & pepper noise) to varying degrees depending on the rate of COVID-19 sequencing at each location. As a reference, we include a clear picture on the right to indicate that a 5% genomic sampling rate would be an ideal (first) objective for observing statistically significant phenomena.
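
One way such an illustration could be produced is sketched below: salt & pepper noise is applied to a grayscale image, with the share of corrupted pixels shrinking as the sequencing rate approaches the 5% target. The linear mapping in `noise_fraction` is purely an assumption, as are the function names and the tiny list-of-lists image; the caption does not say how the noise level was actually derived.

```python
import random


def noise_fraction(sampling_rate: float, target_rate: float = 0.05) -> float:
    """Assumed linear mapping: no noise at or above the target rate, full noise at zero."""
    return max(0.0, 1.0 - sampling_rate / target_rate)


def salt_and_pepper(image, fraction, seed=0):
    """Return a copy of `image` with roughly `fraction` of pixels set to 0 or 255."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < fraction:
                row[x] = rng.choice((0, 255))  # pepper (0) or salt (255)
    return noisy


# A tiny uniform gray "image" standing in for a micrograph.
image = [[128] * 8 for _ in range(8)]
clear = salt_and_pepper(image, noise_fraction(0.05))  # at the 5% target
noisy = salt_and_pepper(image, noise_fraction(0.0))   # no sequencing at all
```

With this mapping, a location sequencing at the 5% target is rendered unchanged, while a location with no sequencing has every pixel replaced.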

Comments


Research laboratories across the US are encouraged to contribute to COVID-19 genomic sequencing efforts. More detailed information can be found here.

Specifically, when uploading genomes to GISAID or GenBank, please indicate whether the sample was identified via S-gene target failure (SGTF). This can be indicated under the fields "purpose_of_sequencing" (GISAID) or "Additional host information". This will help in identifying the true prevalence of the lineage across the country.

Acknowledgements

We would like to thank the GISAID Initiative and are grateful to all of the data contributors, i.e. the Authors, the Originating laboratories responsible for obtaining the specimens, and the Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.

Elbe, S., and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID's innovative contribution to global health. Global Challenges, 1:33-46.
DOI: 10.1002/gch2.1018 PMID: 31565258

Note: When using results from these analyses in your manuscript, ensure that you also acknowledge the Contributors of data, i.e. "We gratefully acknowledge all the Authors, the Originating laboratories responsible for obtaining the specimens, and the Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based."

Also, cite the following reference: Shu, Y., McCauley, J. (2017) GISAID: From vision to reality. EuroSurveillance, 22(13)
DOI: 10.2807/1560-7917.ES.2017.22.13.30494 PMCID: PMC5388101