WHAT IS THE DISTRIBUTION OF SARS-CoV-2 VARIANTS IN TURKEY?

Identifying mutations in SARS-CoV-2, especially functional variants, and their distribution in the population is crucial for successfully vaccinating individuals against COVID-19, as well as for identifying and treating new cases. Therefore, Doğa Eskier, Dr. Gökhan Karakülah and Dr. Yavuz Oktay from IBG set out to determine the distribution of three major SARS-CoV-2 variants (B.1.1.7, B.1.351 and P.1) in Turkey, along with the additional mutations these variants carry.

To do so, we downloaded the SARS-CoV-2 isolate genome sequences available in the GISAID (https://www.gisaid.org) EpiCoV database, applying the filters “Europe / Turkey” for location and “Human” for host, and selecting “complete”, “high coverage”, “low coverage excl” and “collection date compl” to obtain high-quality genomes with complete metadata. This yielded 2934 genome sequences as of 4 May 2021. We aligned these isolate sequences against the NC_045512.2 reference genome, obtained from NCBI’s Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512), using the MAFFT alignment software (https://mafft.cbrc.jp/alignment/software/) with the parameters suggested for aligning closely related viral genomes (https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html). We then identified single nucleotide variations relative to the reference genome using snp-sites (https://github.com/sanger-pathogens/snp-sites) and bcftools (http://www.htslib.org/doc/bcftools.html), and annotated the resulting amino acid changes using ANNOVAR (https://annovar.openbioinformatics.org/en/latest/). In total, we identified 5382 potential single nucleotide variants at 4010 sites, 2429 of which were nonsynonymous.
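The core idea behind the variant-calling step can be illustrated with a simplified sketch. The actual analysis used snp-sites and bcftools; the Python below is not that pipeline, only an illustration of the principle. It assumes each isolate has already been aligned to the reference and kept at the reference length (for example with MAFFT's `--keeplength` option, which we assume here), so that alignment column i is reference position i + 1. The function names `call_snvs` and `tally_snvs` are hypothetical.

```python
from collections import Counter


def call_snvs(reference: str, aligned: str) -> list:
    """Return single-nucleotide differences as 'POSREF>ALT' strings.

    Assumes `aligned` is already aligned to `reference` and has the same
    length, so alignment column i corresponds to reference position i + 1.
    """
    if len(reference) != len(aligned):
        raise ValueError("isolate must be aligned to the reference length")
    snvs = []
    for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), aligned.upper())):
        # Skip gaps ('-') and ambiguous bases such as 'N'; keep only A/C/G/T changes.
        if ref_base not in "ACGT" or alt_base not in "ACGT":
            continue
        if ref_base != alt_base:
            snvs.append(f"{i + 1}{ref_base}>{alt_base}")
    return snvs


def tally_snvs(reference: str, isolates: dict) -> Counter:
    """Count how many isolates carry each SNV."""
    counts = Counter()
    for sequence in isolates.values():
        counts.update(call_snvs(reference, sequence))
    return counts


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy sequence, not the real NC_045512.2 genome
    isolates = {"iso1": "ACGAACGT", "iso2": "ACNAACGA", "iso3": "AC-TACGT"}
    print(tally_snvs(reference, isolates))  # Counter({'4T>A': 2, '8T>A': 1})
```

The output labels follow the same "position, reference base > alternative base" notation used for the variations in the tables below (e.g. 28881G>A).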

Having identified the variations and the resulting amino acid substitutions, we classified the isolates using the S protein substitutions listed by CoVariants (https://covariants.org/shared-mutations). Of the 2934 genome sequences, 1224 carried both the D614G and N501Y substitutions shared by the three variants of interest. Of these 1224, 335 (11.4% of all isolates) belonged to the B.1.1.7 variant, which emerged in the UK; 10 (0.3%) belonged to the B.1.351 variant, which emerged in South Africa; and 3 (0.1%) belonged to the P.1 variant, first isolated from patients from Brazil.
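The classification logic described above can be sketched as a set-containment check. This is a minimal illustration, not the exact procedure used: the marker sets below are our assumption, based on commonly cited lineage-defining S substitutions (single-nucleotide changes only, so deletions such as H69-/V70- are omitted), and may differ from the CoVariants lists actually applied. Requiring every marker to be present is also a strict rule chosen for illustration; real data with sequencing dropouts may call for a more lenient threshold. As in the text, percentages are computed relative to all isolates, not only to those carrying D614G and N501Y.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Assumption: illustrative lineage-defining S substitutions (SNV-based only).
# These are not necessarily the exact lists taken from CoVariants.
VARIANT_MARKERS: Dict[str, Set[str]] = {
    "B.1.1.7": {"N501Y", "A570D", "D614G", "P681H", "T716I", "S982A", "D1118H"},
    "B.1.351": {"D80A", "D215G", "K417N", "E484K", "N501Y", "D614G", "A701V"},
    "P.1": {"L18F", "T20N", "P26S", "D138Y", "R190S", "K417T", "E484K",
            "N501Y", "D614G", "H655Y", "T1027I", "V1176F"},
}
SHARED = {"D614G", "N501Y"}  # substitutions shared by all three variants


def classify(s_substitutions: Set[str]) -> Optional[str]:
    """Return the first variant whose markers are all present, else None."""
    if not SHARED <= s_substitutions:
        return None  # prefilter: must carry both shared substitutions
    for variant, markers in VARIANT_MARKERS.items():
        if markers <= s_substitutions:
            return variant
    return None


def summarize(isolates: Dict[str, Set[str]]) -> Dict[str, Tuple[int, float]]:
    """Count isolates per variant; percentages are relative to ALL isolates."""
    total = len(isolates)
    counts = Counter(
        label for label in map(classify, isolates.values()) if label is not None
    )
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


if __name__ == "__main__":
    isolates = {
        "iso1": VARIANT_MARKERS["B.1.1.7"] | {"A1020S"},
        "iso2": set(VARIANT_MARKERS["B.1.351"]),
        "iso3": {"D614G"},
        "iso4": {"D614G", "N501Y"},
    }
    print(summarize(isolates))  # {'B.1.1.7': (1, 25.0), 'B.1.351': (1, 25.0)}
```

With the counts reported above, the same rounding gives `round(100 * 335 / 2934, 1) == 11.4` for B.1.1.7.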

The B.1.1.7 isolates carried 28 additional nonsynonymous mutations in the S gene, most frequently the A1020S amino acid substitution (10 isolates, 3.0%) and the L5F substitution (7 isolates, 2.1%). Each of the remaining 26 mutations was observed in fewer than 5 isolates.

The B.1.351 isolates carried only one additional nonsynonymous mutation in the S gene, causing the L5F substitution in 2 of the 10 isolates.

The P.1 variant isolates displayed no nonsynonymous mutations in the S gene other than the globally observed ones.
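Counting "additional" S mutations amounts to a set difference between each isolate's substitutions and its variant's defining markers, followed by a tally. The sketch below assumes each isolate is represented as a set of S substitutions; the function name `additional_mutations` and the shortened marker set in the example are hypothetical. Percentages are relative to the isolates of that variant, matching the 3.0% (10 of 335) and 2.1% (7 of 335) figures above.

```python
from collections import Counter
from typing import List, Set, Tuple


def additional_mutations(
    isolates: List[Set[str]], markers: Set[str]
) -> List[Tuple[str, int, float]]:
    """Rank S substitutions that are not among the variant's markers.

    Returns (substitution, isolate count, percentage of the variant's
    isolates), most frequent first.
    """
    counts = Counter()
    for substitutions in isolates:
        # Each isolate contributes at most once per substitution.
        counts.update(substitutions - markers)
    n = len(isolates)
    return [(mut, k, round(100 * k / n, 1)) for mut, k in counts.most_common()]


if __name__ == "__main__":
    markers = {"N501Y", "D614G"}  # hypothetical, shortened marker set
    isolates = [
        {"N501Y", "D614G", "A1020S"},
        {"N501Y", "D614G", "L5F"},
        {"N501Y", "D614G", "A1020S"},
        {"N501Y", "D614G"},
    ]
    ranked = additional_mutations(isolates, markers)
    print(ranked)  # [('A1020S', 2, 50.0), ('L5F', 1, 25.0)]
    # Mutations seen in fewer than 5 isolates, the threshold used in the text.
    rare = [mut for mut, k, _ in ranked if k < 5]
```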

We further examined the 10 most frequently observed nonsynonymous mutations outside the S gene in each variant. These mutations are listed in the tables below.

Table 1: Top 10 nonsynonymous mutations outside the S gene in B.1.1.7 isolates (n = 335)

Variation | Number of isolates | Percentage of isolates (%) | ANNOVAR annotation
28881G>A | 335 | 100 | ORF9:YP_009724397.2:exon1:c.G608A:p.R203K
28883G>C | 335 | 100 | ORF9:YP_009724397.2:exon1:c.G610C:p.G204R
14408C>T | 334 | 99.7 | ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L
28111A>G | 334 | 99.7 | ORF8:YP_009724396.1:exon1:c.A218G:p.Y73C
28280G>C | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.G7C:p.D3H
28281A>T | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.A8T:p.D3V
28282T>A | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.T9A:p.D3E
28977C>T | 334 | 99.7 | ORF9:YP_009724397.2:exon1:c.C704T:p.S235F
6954T>C | 333 | 99.4 | ORF1ab:YP_009724389.1:exon1:c.T6689C:p.I2230T,ORF1a:YP_009725295.1:exon1:c.T6689C:p.I2230T
5388C>A | 332 | 99.1 | ORF1ab:YP_009724389.1:exon1:c.C5123A:p.A1708D,ORF1a:YP_009725295.1:exon1:c.C5123A:p.A1708D

Table 2: Top 10 nonsynonymous mutations outside the S gene in B.1.351 isolates (n = 10)

Variation | Number of isolates | Percentage of isolates (%) | ANNOVAR annotation
1059C>T | 10 | 100 | ORF1ab:YP_009724389.1:exon1:c.C794T:p.T265I,ORF1a:YP_009725295.1:exon1:c.C794T:p.T265I
5230G>T | 10 | 100 | ORF1ab:YP_009724389.1:exon1:c.G4965T:p.K1655N,ORF1a:YP_009725295.1:exon1:c.G4965T:p.K1655N
10323A>G | 10 | 100 | ORF1ab:YP_009724389.1:exon1:c.A10058G:p.K3353R,ORF1a:YP_009725295.1:exon1:c.A10058G:p.K3353R
25563G>T | 10 | 100 | ORF3a:YP_009724391.1:exon1:c.G171T:p.Q57H
25904C>T | 10 | 100 | ORF3a:YP_009724391.1:exon1:c.C512T:p.S171L
26456C>T | 10 | 100 | ORF4:YP_009724392.1:exon1:c.C212T:p.P71L
28887C>T | 10 | 100 | ORF9:YP_009724397.2:exon1:c.C614T:p.T205I
14408C>T | 9 | 90 | ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L
28310C>T | 9 | 90 | ORF9:YP_009724397.2:exon1:c.C37T:p.P13S
21488A>G | 3 | 30 | ORF1ab:YP_009724389.1:exon2:c.A21224G:p.K7075R

Table 3: Top 10 nonsynonymous mutations outside the S gene in P.1 isolates (n = 3)

Variation | Number of isolates | Percentage of isolates (%) | ANNOVAR annotation
3828C>T | 3 | 100 | ORF1ab:YP_009724389.1:exon1:c.C3563T:p.S1188L,ORF1a:YP_009725295.1:exon1:c.C3563T:p.S1188L
14408C>T | 3 | 100 | ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L
17259G>T | 3 | 100 | ORF1ab:YP_009724389.1:exon2:c.G16995T:p.E5665D
26149T>C | 3 | 100 | ORF3a:YP_009724391.1:exon1:c.T757C:p.S253P
28167G>A | 3 | 100 | ORF8:YP_009724396.1:exon1:c.G274A:p.E92K
28512C>G | 3 | 100 | ORF9:YP_009724397.2:exon1:c.C239G:p.P80R
28877A>T | 3 | 100 | ORF9:YP_009724397.2:exon1:c.A604T:p.S202C
28878G>C | 3 | 100 | ORF9:YP_009724397.2:exon1:c.G605C:p.S202T
28881G>A | 3 | 100 | ORF9:YP_009724397.2:exon1:c.G608A:p.R203K
28883G>C | 3 | 100 | ORF9:YP_009724397.2:exon1:c.G610C:p.G204R
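Tables like these can be produced by parsing the ANNOVAR annotation strings, dropping S-gene changes and ranking the rest by isolate count. The sketch below is an illustration under stated assumptions, not the script used for the tables: it assumes the annotation format shown above (comma-separated `gene:protein:exon:cDNA:protein-change` entries), and it assumes `YP_009724390.1` is the protein accession of the S glycoprotein in NC_045512.2. The names `parse_annovar`, `top_non_s` and `S_PROTEIN_ID`, and the gene label `S` in the example, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: protein accession of the S (surface) glycoprotein in NC_045512.2.
S_PROTEIN_ID = "YP_009724390.1"


def parse_annovar(annotation: str) -> List[Tuple[str, str, str]]:
    """Split an ANNOVAR amino acid change string into (gene, protein ID, change).

    Expects comma-separated entries of the form
    gene:protein:exon:cDNA-change:protein-change, as in the tables above.
    """
    entries = []
    for entry in annotation.split(","):
        gene, protein_id, _exon, _cdna, change = entry.split(":")
        entries.append((gene, protein_id, change))
    return entries


def top_non_s(calls: Dict[str, Tuple[int, str]], n_isolates: int, top: int = 10):
    """Return the `top` most frequent variations outside the S gene.

    `calls` maps a variation such as '28881G>A' to (isolate count, annotation).
    """
    rows = []
    for variation, (count, annotation) in calls.items():
        if any(pid == S_PROTEIN_ID for _, pid, _ in parse_annovar(annotation)):
            continue  # skip S-gene changes, which were analyzed separately
        rows.append((variation, count, round(100 * count / n_isolates, 1), annotation))
    rows.sort(key=lambda row: row[1], reverse=True)  # stable: ties keep input order
    return rows[:top]


if __name__ == "__main__":
    calls = {
        "28881G>A": (3, "ORF9:YP_009724397.2:exon1:c.G608A:p.R203K"),
        "23063A>T": (3, "S:YP_009724390.1:exon1:c.A1501T:p.N501Y"),
        "14408C>T": (2, "ORF1ab:YP_009724389.1:exon2:c.C14144T:p.P4715L"),
    }
    for row in top_non_s(calls, n_isolates=3):
        print(row[:3])
    # ('28881G>A', 3, 100.0)
    # ('14408C>T', 2, 66.7)
```

Note that ANNOVAR annotates each SNV independently, so adjacent changes in the same codon (such as 28881G>A and 28883G>C) appear as separate rows.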

We also analyzed isolates that did not belong to any of these three variants but nevertheless carried the K417N substitution in the S protein, which is characteristic of the B.1.351 variant, to identify potential novel co-occurring mutations in the S gene. Of all isolates, 465 (15.8%) carried K417N but did not belong to the B.1.1.7, B.1.351 or P.1 variants. We identified three recurring nonsynonymous variations that co-occurred with K417N at least twice and are not defining mutations of the other major variants: A653V, observed in 13 (2.8%) of these unclassified isolates; L5F, observed in 7 (1.5%); and L54F, observed in 5 (1.1%). In addition, T716I, normally a defining mutation of the B.1.1.7 variant, was observed in two of these isolates.
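The co-occurrence analysis above can be sketched as a filter followed by a tally: keep isolates that carry K417N but were not assigned to a major variant, then count the other S substitutions they carry, ignoring an excluded set (for example, the defining mutations of the major variants) and keeping only those seen at least twice. This is a minimal sketch under those assumptions; the function name `k417n_cooccurring` and its parameters are hypothetical. Because `excluded` removes variant-defining substitutions, a marker such as T716I would have to be checked separately, as the text does.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def k417n_cooccurring(
    isolates: Dict[str, Set[str]],
    assigned: Set[str],
    excluded: Set[str],
    min_count: int = 2,
) -> List[Tuple[str, int, float]]:
    """Rank S substitutions co-occurring with K417N in unassigned isolates.

    isolates: isolate ID -> set of S substitutions
    assigned: IDs already classified as B.1.1.7, B.1.351 or P.1
    excluded: substitutions to ignore, e.g. defining mutations of major variants
    """
    unassigned = [
        subs for iid, subs in isolates.items()
        if iid not in assigned and "K417N" in subs
    ]
    counts = Counter()
    for subs in unassigned:
        counts.update(subs - excluded - {"K417N"})
    total = len(unassigned)
    # Percentages are relative to the unassigned K417N isolates only.
    return [
        (mut, k, round(100 * k / total, 1))
        for mut, k in counts.most_common()
        if k >= min_count
    ]


if __name__ == "__main__":
    isolates = {
        "a": {"K417N", "D614G", "A653V"},
        "b": {"K417N", "D614G", "A653V", "L5F"},
        "c": {"K417N", "D614G", "L5F"},
        "d": {"K417N", "L54F"},
        "e": {"K417N", "A653V"},  # already assigned below, so ignored
    }
    print(k417n_cooccurring(isolates, assigned={"e"}, excluded={"D614G"}))
    # [('A653V', 2, 50.0), ('L5F', 2, 50.0)]
```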

The aligned genomic sequences are available upon request.

The full VCF and ANNOVAR annotation files are available below:

The full S gene variation files are available below:

The following researchers contributed to the content above:

ESKİER, Doğa; KARAKÜLAH, Gökhan, PhD; OKTAY, Yavuz, PhD