Identifying mutations in SARS-CoV-2, especially functional variants, and their distribution in the population is crucial for successfully inoculating individuals against COVID-19, as well as for identifying and treating new cases. Therefore, Doğa Eskier, Dr. Gökhan Karakülah and Dr. Yavuz Oktay from IBG sought to characterize the distribution of, and the variation within, three major SARS-CoV-2 variants (B.1.1.7, B.1.351, P.1) in Turkey.
To do so, we downloaded the SARS-CoV-2 isolate genome sequences available on the GISAID (https://www.gisaid.org) EpiCoV database, applying the filters “Europe / Turkey” for location, “Human” for host, and selecting “complete”, “high coverage”, “low coverage excl”, and “collection date compl” to obtain high-quality genomes with complete metadata. This yielded 2934 genome sequences as of 4th of May, 2020. We aligned these isolate sequences against the NC_045512.2 reference genome obtained from NCBI’s Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512) using the MAFFT alignment software (https://mafft.cbrc.jp/alignment/software/) with the parameters suggested for aligning closely related viral genomes (https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html). Afterwards, we identified single nucleotide variations in the sequences relative to the reference genome using snp-sites (https://github.com/sanger-pathogens/snp-sites) and bcftools (http://www.htslib.org/doc/bcftools.html), and annotated the peptide changes resulting from these variations using ANNOVAR (https://annovar.openbioinformatics.org/en/latest/). We identified 5382 potential single nucleotide variants at 4010 sites, 2429 of which were nonsynonymous.
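To illustrate what identifying single nucleotide variations against a reference involves, the following is a minimal Python sketch that compares one aligned isolate sequence with the aligned reference. It is not the snp-sites or bcftools implementation; the function name `call_snvs` and the toy sequences are assumptions made for illustration only.

```python
from typing import List, Tuple


def call_snvs(ref_aln: str, iso_aln: str) -> List[Tuple[int, str, str]]:
    """Return (1-based reference position, reference base, isolate base)
    for every unambiguous substitution in a pairwise alignment."""
    if len(ref_aln) != len(iso_aln):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    snvs = []
    ref_pos = 0
    for ref_base, iso_base in zip(ref_aln.upper(), iso_aln.upper()):
        if ref_base == "-":
            # Insertion relative to the reference: no reference coordinate.
            continue
        ref_pos += 1
        # Gaps and ambiguous bases (e.g. N) in the isolate are skipped.
        if ref_base in valid and iso_base in valid and ref_base != iso_base:
            snvs.append((ref_pos, ref_base, iso_base))
    return snvs


# Toy alignment (hypothetical, not real SARS-CoV-2 sequence).
toy_ref = "ACGT-ACGTA"
toy_iso = "ACTTGACNTC"
print(call_snvs(toy_ref, toy_iso))  # [(3, 'G', 'T'), (9, 'A', 'C')]
```

Positions are counted along the reference only, so the reported coordinates stay comparable across isolates even when an isolate carries insertions.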
Having identified the variations and the resulting amino acid substitutions, we categorized the isolates using the S peptide substitutions listed by CoVariants (https://covariants.org/shared-mutations). Out of 2934 genome sequences, 1224 had the D614G and N501Y substitutions shared among the three variants of interest. Of these 1224, 335 (11.4% of all 2934 isolates) belonged to the B.1.1.7 variant that emerged in the UK, 10 (0.3%) belonged to the B.1.351 variant that emerged in South Africa, and 3 (0.1%) belonged to the P.1 variant first isolated from patients in Brazil.
The B.1.1.7 variant isolates displayed 28 additional nonsynonymous mutations in the S gene, with 10 (3.0%) instances of the A1020S amino acid substitution and 7 (2.1%) instances of the L5F substitution. The remaining 26 mutations were each observed in fewer than 5 isolates.
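The categorization step can be sketched as a set-membership test on each isolate's S substitutions. The marker sets below are an abbreviated assumption chosen for illustration; they are not the authors' exact criteria, and the full lists should be taken from CoVariants. The names `MARKERS`, `classify`, and `percent` are hypothetical.

```python
from typing import Dict, Optional, Set

# Abbreviated, illustrative marker sets (assumption, not the study's criteria).
MARKERS: Dict[str, Set[str]] = {
    "B.1.1.7": {"N501Y", "D614G", "P681H", "T716I"},
    "B.1.351": {"K417N", "E484K", "N501Y", "D614G"},
    "P.1": {"K417T", "E484K", "N501Y", "D614G"},
}


def classify(s_substitutions: Set[str]) -> Optional[str]:
    """Return the single variant whose markers are all present, else None."""
    hits = [name for name, markers in MARKERS.items()
            if markers <= s_substitutions]
    # Ambiguous or unmatched isolates are left unassigned.
    return hits[0] if len(hits) == 1 else None


def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


print(classify({"N501Y", "D614G", "P681H", "T716I", "A1020S"}))  # B.1.1.7
print(percent(335, 2934))  # 11.4
```

The `percent` call reproduces the reported 11.4% only when the denominator is the full set of 2934 isolates, which is how the percentages in the paragraph above are computed.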
The B.1.351 variant isolates displayed only one additional nonsynonymous mutation in the S gene, responsible for the L5F substitution in 2 of the 10 isolates.
The P.1 variant isolates displayed no nonsynonymous mutations in the S gene other than the globally observed ones.
We further examined the 10 most frequently observed nonsynonymous mutations outside of the S gene in each variant. Information on these mutations is available below.
Table 1: Top 10 B.1.1.7 nonsynonymous mutations
Variation | Number of Isolates | Percentage | ANNOVAR Annotation |
28881G>A | 335 | 100 | ORF9:YP_009724397.2:exon1:c.G608A:p.R203K |
28883G>C | 335 | 100 | ORF9:YP_009724397.2:exon1:c.G610C:p.G204R |
14408C>T | 334 | 99.7 | ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L |
28111A>G | 334 | 99.7 | ORF8:YP_009724396.1:exon1:c.A218G:p.Y73C |
28280G>C | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.G7C:p.D3H |
28281A>T | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.A8T:p.D3V |
28282T>A | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.T9A:p.D3E |
28977C>T | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.C704T:p.S235F |
6954T>C | 333 | 99.4 | ORF1ab:YP_009724389.1:exon1:c.T6689C:p.I2230T,ORF1a:YP_009725295.1:exon1:c.T6689C:p.I2230T |
5388C>A | 332 | 99.1 | ORF1ab:YP_009724389.1:exon1:c.C5123A:p.A1708D,ORF1a:YP_009725295.1:exon1:c.C5123A:p.A1708D |
Table 2: Top 10 B.1.351 nonsynonymous mutations
Variation | Number of Isolates | Percentage | ANNOVAR Annotation |
1059C>T | 10 | 100 | ORF1ab:YP_009724389.1:exon1:c.C794T:p.T265I,ORF1a:YP_009725295.1:exon1:c.C794T:p.T265I |
5230G>T | 10 | 100 | ORF1ab:YP_009724389.1:exon1:c.G4965T:p.K1655N,ORF1a:YP_009725295.1:exon1:c.G4965T:p.K1655N |
10323A>G | 10 | 100 | ORF1ab:YP_009724389.1:exon1:c.A10058G:p.K3353R,ORF1a:YP_009725295.1:exon1:c.A10058G:p.K3353R |
25563G>T | 10 | 100 | ORF3a:YP_009724391.1:exon1:c.G171T:p.Q57H |
25904C>T | 10 | 100 | ORF3a:YP_009724391.1:exon1:c.C512T:p.S171L |
26456C>T | 10 | 100 | ORF4:YP_009724392.1:exon1:c.C212T:p.P71L |
28887C>T | 10 | 100 | ORF9:YP_009724397.2:exon1:c.C614T:p.T205I |
14408C>T | 9 | 90 | ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L |
28310C>T | 9 | 90 | ORF9:YP_009724397.2:exon1:c.C37T:p.P13S |
21488A>G | 3 | 30 | ORF1ab:YP_009724389.1:exon2:c.A21224G:p.K7075R |
Table 3: Top 10 P.1 nonsynonymous mutations
Variation | Number of Isolates | Percentage | ANNOVAR Annotation |
3828C>T | 3 | 100 | ORF1ab:YP_009724389.1:exon1:c.C3563T:p.S1188L,ORF1a:YP_009725295.1:exon1:c.C3563T:p.S1188L |
14408C>T | 3 | 100 | ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L |
17259G>T | 3 | 100 | ORF1ab:YP_009724389.1:exon2:c.G16995T:p.E5665D |
26149T>C | 3 | 100 | ORF3a:YP_009724391.1:exon1:c.T757C:p.S253P |
28167G>A | 3 | 100 | ORF8:YP_009724396.1:exon1:c.G274A:p.E92K |
28512C>G | 3 | 100 | ORF9:YP_009724397.2:exon1:c.C239G:p.P80R |
28877A>T | 3 | 100 | ORF9:YP_009724397.2:exon1:c.A604T:p.S202C |
28878G>C | 3 | 100 | ORF9:YP_009724397.2:exon1:c.G605C:p.S202T |
28881G>A | 3 | 100 | ORF9:YP_009724397.2:exon1:c.G608A:p.R203K |
28883G>C | 3 | 100 | ORF9:YP_009724397.2:exon1:c.G610C:p.G204R |
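The ANNOVAR Annotation column in the tables above packs several colon-separated fields into one string, with a comma separating entries when a variation is annotated against more than one transcript. A minimal parsing sketch follows; the `Annotation` type and `parse_annovar` function are hypothetical helpers, and the field layout is inferred from the table rows rather than from the ANNOVAR documentation.

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    gene: str
    transcript: str
    exon: str
    cdna_change: str
    ref_aa: str
    aa_pos: int
    alt_aa: str


# Matches protein changes such as "p.R203K".
_PROTEIN = re.compile(r"p\.([A-Z])(\d+)([A-Z])")


def parse_annovar(field: str) -> List[Annotation]:
    """Split one annotation cell into its per-transcript entries."""
    parsed = []
    for entry in field.split(","):
        gene, transcript, exon, cdna, protein = entry.strip().split(":")
        match = _PROTEIN.fullmatch(protein)
        if match is None:
            raise ValueError(f"unrecognized protein change: {protein!r}")
        parsed.append(Annotation(gene, transcript, exon, cdna,
                                 match.group(1), int(match.group(2)),
                                 match.group(3)))
    return parsed


cell = ("ORF1ab:YP_009724389.1:exon1:c.T6689C:p.I2230T,"
        "ORF1a:YP_009725295.1:exon1:c.T6689C:p.I2230T")
for ann in parse_annovar(cell):
    print(ann.gene, f"{ann.ref_aa}{ann.aa_pos}{ann.alt_aa}")
```

Grouping the parsed entries by gene and residue position highlights rows such as the three ORF9 residue-3 entries in Table 1, where adjacent single nucleotide variations fall in the same codon and are annotated one at a time, so the listed substitutions may not reflect their combined effect.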
We also analyzed isolates that did not fit into any of these three variants but nevertheless carried the K417N mutation on the S protein, normally found in the B.1.351 variant, to identify potential novel co-occurring mutations in the S gene. Of all isolates, 465 (15.8%) had the K417N mutation but were not assigned to the B.1.1.7, B.1.351, or P.1 variants. Among them, we identified three recurring nonsynonymous variations that co-occurred with the K417N mutation at least twice and are not defining mutations of the other major variants: A653V, observed in 13 (2.9%) of these unassigned isolates; L5F, observed in 7 (1.5%); and L54F, observed in 5 (1.1%). In addition, the T716I mutation, normally unique to the B.1.1.7 variant, was observed in two of these isolates.
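The co-occurrence analysis described above amounts to counting, among isolates that carry an anchor substitution, how often every other substitution appears. The sketch below assumes a mapping from isolate identifier to its set of S substitutions; the function name `cooccurring`, the isolate identifiers, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def cooccurring(isolates: Dict[str, Set[str]], anchor: str,
                exclude: Iterable[str] = (),
                min_count: int = 2) -> Dict[str, int]:
    """Count substitutions seen alongside `anchor` in at least
    `min_count` isolates, ignoring those listed in `exclude`."""
    ignored = set(exclude) | {anchor}
    counts: Counter = Counter()
    for substitutions in isolates.values():
        if anchor in substitutions:
            counts.update(substitutions - ignored)
    return {sub: n for sub, n in counts.most_common() if n >= min_count}


# Toy data with made-up isolate identifiers.
toy = {
    "iso1": {"K417N", "A653V", "D614G"},
    "iso2": {"K417N", "A653V", "L5F"},
    "iso3": {"K417N", "L5F", "A653V"},
    "iso4": {"K417N", "L54F"},
    "iso5": {"A653V"},
}
print(cooccurring(toy, "K417N", exclude={"D614G"}))
```

Passing the defining mutations of the known variants through `exclude` keeps the result focused on substitutions that are not already explained by those variants.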
The aligned genomic sequences are available upon request.
The full VCF and ANNOVAR annotation files are available below:
The full S gene variation files are available below:
*The following researchers have contributed to the content above:
ESKİER, Doğa, KARAKÜLAH, Gökhan, PhD, OKTAY, Yavuz, PhD