A comprehensive look at the amino acid substitutions of SARS-CoV-2 isolates in Turkey

We estimated the full impact of the mutatome of 17 SARS-CoV-2 genomes isolated in Turkey, obtained from GISAID, on the tertiary structures of the coronavirus proteins. To this end, we employed multiple sequence alignment (MSA) to identify changes in the amino acid sequences of 10 SARS-CoV-2 proteins. The surface glycoprotein is excluded here because we investigated it in our previous post.
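As an illustration of how amino acid changes can be read off an alignment, here is a minimal sketch. It assumes a reference and an isolate protein sequence that are already aligned to the same length, with "-" for gaps and "X" for unidentified residues. The function name `call_substitutions` and the toy sequences are hypothetical; this is not the pipeline used for the results in this post.

```python
from typing import List, Tuple


def call_substitutions(ref_aligned: str, iso_aligned: str) -> Tuple[List[str], List[int]]:
    """Compare two aligned protein sequences column by column.

    Returns (substitutions, unidentified_positions). Substitutions are
    written as <reference residue><position><isolate residue>, e.g. "Q57H",
    with positions counted along the ungapped reference.
    """
    if len(ref_aligned) != len(iso_aligned):
        raise ValueError("aligned sequences must have the same length")

    substitutions: List[str] = []
    unidentified: List[int] = []
    pos = 0  # 1-based position in the ungapped reference

    for ref_aa, iso_aa in zip(ref_aligned, iso_aligned):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        pos += 1
        if iso_aa == "-":
            continue  # deletion in the isolate: not a substitution
        if iso_aa == "X":
            unidentified.append(pos)  # residue could not be determined
        elif iso_aa != ref_aa:
            substitutions.append(f"{ref_aa}{pos}{iso_aa}")

    return substitutions, unidentified


# Toy sequences for illustration only (not real SARS-CoV-2 data).
subs, unknown = call_substitutions("MKTQAL", "MKTHXL")
print(subs, unknown)
```

Counting positions along the ungapped reference keeps the labels comparable across isolates even when the alignment contains insertions.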

Our initial results are summarized as follows:

* Four proteins, namely, ORF6, ORF7a, ORF7b, and ORF10, do not have any observed mutations.

* Eleven isolates have a Q57H mutation in the ORF3a protein. Besides being present in 11/17 Turkish isolates, this mutation is also the third most frequent (mutated in 2,666 of 12,087 samples, about 22%) among 5,033 different non-synonymous mutations in SARS-CoV-2 genomes deposited at GISAID as of April 25 (http://cov-glue.cvr.gla.ac.uk/).

* One isolate (**) has an unidentified residue at position 7 and an L51H mutation in the envelope protein.

* One isolate (**) has a V10A and an A194V mutation, and two isolates have a V66L mutation in the membrane glycoprotein.

* One isolate has an S54L mutation, while two isolates have a Q72H mutation in the ORF8 protein.

* Two isolates have an S194L mutation, one isolate has an S202N mutation, and one isolate has R203K and G204R mutations in the nucleocapsid phosphoprotein.

* The ORF1a polyprotein has a large number of mutations, apparently due to its great length (4,405 amino acids). One isolate (**) has five unidentified residues and 26 mutations, the full list of which can be accessed here. The mutations and the remaining unidentified residues (denoted by “X” in the MSA) are summarized below.

A206T – 1 isolate

R207C – 1 isolate

V378I – 4 isolates

S911F – 1 isolate

T951I – 2 isolates

A1420V – 1 isolate

unidentified 1631-1639 – 1 isolate

unidentified 1640-1644 – 10 isolates

unidentified 1652-1653 – 1 isolate

Q2702H – 1 isolate

M2796I – 1 isolate

unidentified 3588 – 1 isolate

L3606F – 5 isolates

A3995S – 1 isolate

T4159I – 2 isolates

L4182F – 1 isolate
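The ORF1a summary above can also be tallied programmatically. The sketch below is illustrative only: it copies the list entries verbatim, skips the "unidentified" ranges, and counts the substitutions that remain. The helper name `parse_summary` and the regular expression are assumptions made for this example.

```python
import re

# Entries copied from the ORF1a summary above.
ORF1A_SUMMARY = """\
A206T – 1 isolate
R207C – 1 isolate
V378I – 4 isolates
S911F – 1 isolate
T951I – 2 isolates
A1420V – 1 isolate
unidentified 1631-1639 – 1 isolate
unidentified 1640-1644 – 10 isolates
unidentified 1652-1653 – 1 isolate
Q2702H – 1 isolate
M2796I – 1 isolate
unidentified 3588 – 1 isolate
L3606F – 5 isolates
A3995S – 1 isolate
T4159I – 2 isolates
L4182F – 1 isolate
"""

# <reference residue><position><new residue>, e.g. "L3606F"
SUBSTITUTION = re.compile(r"^[A-Z]\d+[A-Z]$")


def parse_summary(text: str) -> dict:
    """Map each substitution label to the number of isolates carrying it."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("unidentified"):
            continue  # skip blanks and unidentified-residue ranges
        label, count_part = line.split(" – ")
        if not SUBSTITUTION.match(label):
            raise ValueError(f"unexpected substitution label: {label!r}")
        counts[label] = int(count_part.split()[0])
    return counts


counts = parse_summary(ORF1A_SUMMARY)
most_common = max(counts, key=counts.get)
print(f"{len(counts)} distinct substitutions listed")
print(f"most frequent: {most_common} in {counts[most_common]} of 17 isolates")
```

Run on the list as given, this reports L3606F (5 isolates) as the most frequent ORF1a substitution, followed by V378I (4 isolates).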

** Note: Due to its large number of mutations (n=60) and non-consecutive unidentified nucleotides (n=11), the isolate hCoV-19/Turkey/6224-Ankara1034/2020|EPI_ISL_417413|2020-03-17 is marked separately wherever applicable. It has not been determined whether these genomic variants represent genuine mutations or sequencing artifacts; further experimental verification is needed.

The aligned nucleotide and peptide sequences can be obtained here.

Materials and methods can be obtained here.

The following researchers contributed to the research summarized above, listed alphabetically by surname:

ESKİER, Doğa
KARACA, Ezgi, PhD
KARAKÜLAH, Gökhan, PhD
OKTAY, Yavuz, PhD
PAVLOPOULOU, Athanasia, PhD