The Molecular Biology of Infection

An Introduction to the Nuts and Bolts of SARS-CoV-2. 

Introduction to Covid-19.

The causative agent of the respiratory infection known as Covid-19 is the SARS-CoV-2 virus (1). It is a positive-sense, single-stranded RNA betacoronavirus of the Coronaviridae family (2-3), and with a genome of around 29,900 bases (29.9 kilobases) it has one of the largest genomes of any RNA virus (4-5).

Upon entering the host cell, the viral RNA is translated into two polyproteins (pp1a and pp1ab), which are cleaved into 11 and 15 non-structural proteins (nsps) respectively; cleavage is mediated by nsp3 (papain-like protease) and nsp5 (3C-like protease) (5). Importantly, the genes for the four key structural proteins, the Spike (S), Membrane (M), Envelope (E) and Nucleocapsid (N) proteins, are grouped together at the 3’ end of the viral RNA strand (5-7). They are accompanied by the accessory proteins encoded by the ORF3a, ORF6, ORF7a, ORF7b, and ORF8 genes (6-7). Nsp12, which functions as an RNA-dependent RNA polymerase, assists along with other nsps in replication and transcription of the viral genome, and allows negative-sense subgenomic RNAs encoding only the structural and associated accessory proteins to be transcribed (5). This allows the structural proteins to be transcribed at far higher levels than the non-structural proteins.

Three-dimensional structures of numerous SARS-CoV-2 viral proteins have already been determined and are available as PDB entries. These include the majority of the Spike protein, Envelope protein, Nucleocapsid protein, main protease, papain-like protease, nsp1, nsp3, nsp7-nsp8 complex, nsp9 (RNA-replicase), nsp10-nsp16 complex (methyltransferase), nsp12 (RNA-dependent RNA polymerase), nsp15 (endoribonuclease), ORF3a, ORF7a, ORF8, and ORF9b (8). Multiple structures of the Spike protein in various conformational states (closed, open, double-open, and post-fusion) are also available (9). The Spike protein structures are of particular importance because the Spike is a primary site at which the host's immune response recognises the virus, and it is the focus of vaccine efforts (10).

 
Spike Protein

Open Spike protein, with glycan groups, 6XM4

Envelope Protein

NMR determined structure of the Envelope protein 7K3G

Nucleocapsid Protein

The structures of the N and C domains (6YI3 and 7CE0) of the Nucleocapsid protein

Main protease structure 7AQE

Nsp9 an RNA replicase, structure 6W9Q

Nsp15 (endoribonuclease) structure 7K0R

Papain-like protease structure 6WRH

Nsp10-nsp16 complex methyltransferase structure 6W4H

ORF3a (similar to the M protein), structure 6XDC

Nsp1 and nsp3, structures 7K7P and 6WEY left to right.

Nsp7-nsp8-nsp12 (RNA-dependent RNA polymerase), 7AAP

ORF7a and ORF8 structures 7CI3 and 7JTL (left to right)

Nsp7-nsp8 complex, dimer unit 6YHU

Nsp13 (helicase) structure 6XEZ

ORF9b structure 6Z4U
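
The PDB identifiers quoted in the captions above can be gathered into a small lookup table for anyone wishing to fetch the coordinate files. The following is a minimal sketch only: the grouping labels are informal names chosen for this example, and the download URL pattern is an assumption about the RCSB file server rather than something stated in this article. No network request is made; the code only builds the URLs.

```python
# PDB IDs are copied from the figure captions; the dictionary keys are
# informal labels chosen for this sketch, not official entry titles.
PDB_IDS = {
    "Spike (open, with glycans)": ["6XM4"],
    "Envelope": ["7K3G"],
    "Nucleocapsid (N and C domains)": ["6YI3", "7CE0"],
    "Main protease": ["7AQE"],
    "nsp9": ["6W9Q"],
    "nsp15": ["7K0R"],
    "Papain-like protease": ["6WRH"],
    "nsp10-nsp16": ["6W4H"],
    "ORF3a": ["6XDC"],
    "nsp1": ["7K7P"],
    "nsp3": ["6WEY"],
    "nsp7-nsp8-nsp12": ["7AAP"],
    "ORF7a": ["7CI3"],
    "ORF8": ["7JTL"],
    "nsp7-nsp8": ["6YHU"],
    "nsp13": ["6XEZ"],
    "ORF9b": ["6Z4U"],
}

# Assumption: URL pattern used by the RCSB file download service.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def is_valid_pdb_id(pdb_id: str) -> bool:
    """Check the classic four-character form: a digit, then three alphanumerics."""
    return len(pdb_id) == 4 and pdb_id[0].isdigit() and pdb_id.isalnum()


def download_url(pdb_id: str) -> str:
    """Build the (assumed) download URL for one PDB entry."""
    if not is_valid_pdb_id(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper())


def all_urls() -> dict:
    """Map every PDB ID listed above to its download URL."""
    return {
        pdb_id: download_url(pdb_id)
        for ids in PDB_IDS.values()
        for pdb_id in ids
    }


if __name__ == "__main__":
    for pdb_id, url in all_urls().items():
        print(pdb_id, url)
```

Keeping the identifiers in one table makes it easy to check that every structure shown in the figures is accounted for before any files are downloaded.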

The methods used to determine protein structures directly, namely X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy (cryo-EM), are labour intensive and have limitations, particularly for membrane-bound proteins (11). Some SARS-CoV-2 proteins remain elusive or only partially determined; the Membrane protein and the transmembrane domain of the Spike protein are relevant examples. Tools such as SWISS-MODEL, Modeller, and HHpred can be, and have been, used to create in silico homology models of these unresolved proteins and their domains. These programs use a protein's sequence together with data from known, experimentally determined PDB structures (12-15).
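
To illustrate the idea behind template-based (homology) modelling, the sketch below scores candidate templates by percent sequence identity to a target over a pre-computed alignment. This is a deliberately simplified stand-in: the tools named above use considerably more sophisticated search and scoring methods, and the function names and toy sequences here are hypothetical, invented purely for illustration.

```python
from typing import Dict, List, Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def rank_templates(
    target: str, templates: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Sort candidate templates by identity to the target, best first."""
    scored = [
        (name, percent_identity(target, seq)) for name, seq in templates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, already-aligned toy sequences (not real viral proteins).
    target = "MKT-LLVAG"
    candidates = {
        "template_A": "MKTALLVAG",
        "template_B": "MRT-LIVSG",
    }
    for name, identity in rank_templates(target, candidates):
        print(f"{name}: {identity:.1f}% identity")
```

A template with high identity to the target is generally a better starting point for a homology model; when no such template exists, as with the M protein, template-free prediction is needed instead.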

Top: Zhang Lab M protein model. Bottom: DeepMind's AlphaFold M protein prediction.

In the case of the M protein, which lacks homologs, multiple teams have applied novel ab initio techniques that use only the protein sequence and software that predicts protein folding, that is, predicting a structure without a known protein template. Using such in silico techniques, the Zhang Lab team have used C-I-TASSER software to create models for the entire proteome of SARS-CoV-2, including the M protein (16-17). DeepMind, the Google-owned company whose AlphaFold 2 AI programme recently won the 14th Critical Assessment of protein Structure Prediction (CASP14) by some margin (18), has released three-dimensional models for several SARS-CoV-2 proteins, including the M protein. The accuracy of this AlphaFold model was independently supported by the resolution of ORF3a using cryo-EM (19): these two SARS-CoV-2 proteins share some sequence homology, and both are membrane-bound dimers exhibiting structural similarity.

Beyond the RNA and protein components, enveloped viruses such as SARS-CoV-2 also possess a lipid bilayer, which makes up the majority of the virion's external surface area (20). To understand the composition of the viral envelope, some insight into the cellular biology of infection is required.

The Spike proteins of coronaviruses facilitate infection by recognising proteins present on the extracellular side of the host cell. The SARS-CoV-2 Spike protein recognises (fits the shape of) the human protein angiotensin-converting enzyme 2 (ACE2), a type I membrane protein responsible for controlling levels of signalling peptides that are chiefly involved in the cardiovascular system (21-22). The presence of ACE2 on the extracellular side of our cells' lipid bilayer makes it an ideal target as an entry receptor for SARS-CoV-2 (21).

Viral fusion facilitated via the S protein, ACE2, and TMPRSS2.

As the virion comes into close proximity with the host cell, a region on the Spike protein acts as a receptor binding domain (RBD) for ACE2. Some recent mutations, such as those found in the South African variant, have occurred at the RBD or help stabilise it (23).

Another host protein, transmembrane protease serine 2 (TMPRSS2), is also important in viral fusion. TMPRSS2 is a protease bound to the extracellular side of the host cell membrane. It processes the Spike protein, cleaving it between the S1 and S2 domains and allowing the Spike to more easily adopt its post-fusion conformation (24). These protein-protein interactions between the virus and host cell bring the two lipid membranes into close proximity until they fuse, much like two bubbles joining. The viral envelope fuses with and is incorporated into the cell's plasma membrane, allowing entry of the pathogenic RNA.

However, virion replication, budding, and maturation occur away from the plasma membrane, within the cell. This is an important distinction often omitted from lay explanations of infection. For coronaviruses, virion formation is predominantly centred around the intermediate compartment at the endoplasmic reticulum-Golgi interface inside the host cell (25). These cellular organelles are associated with protein trafficking and cellular secretion (26). The lipid composition of the SARS-CoV-2 virion envelope is defined by the subcellular membrane it budded from, that is, the composition of the endoplasmic-reticulum–Golgi intermediate compartment.

Cellular biology of the endoplasmic reticulum and Golgi apparatus; figure taken from Wikipedia.

With this background in the molecular biology of SARS-CoV-2 infection, knowing the genetic content, the structures of the proteome, and the location of viral envelope formation, we could undertake the task of determining the components for our macromolecular model of the virion.


References:

  1. WHO 2020, Naming the coronavirus disease (COVID-19) and the virus that causes it, WHO, viewed 21 April 2021, <https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it>.

  2. Zhu, N, Zhang, D, Wang, W, et al. 2020, ‘A Novel Coronavirus from Patients with Pneumonia in China, 2019’, N. Engl. J. Med., vol. 382, no. 8, pp. 727-733, doi: 10.1056/NEJMoa2001017.

  3. Gordon, DE, Jang, GM, Bouhaddou, M. et al. 2020 ‘A SARS-CoV-2 protein interaction map reveals targets for drug repurposing’. Nature, vol. 583, pp. 459–468, https://doi.org/10.1038/s41586-020-2286-9.

  4. Mei-Yue, W, Rong, Z, Li-Juan, G, Xue-Fei, G De-Ping, W, Ji-Min, C 2020, ‘SARS-CoV-2: Structure, Biology, and Structure-Based Therapeutics Development’, Frontiers in Cellular and Infection Microbiology, vol. 10, pp. 724, https://www.frontiersin.org/article/10.3389/fcimb.2020.587269.    

  5. Dongwan, K, Joo-Yeon, L, Jeong-Sun, Y, Jun, WK, Narry, KV, Hyeshik, C 2020, ‘The Architecture of SARS-CoV-2 Transcriptome’, Cell, vol. 181, no. 4, pp. 914-921.e10, https://doi.org/10.1016/j.cell.2020.04.011.

  6. Khailany, RA, Safdar, M, Ozaslan, M 2020, ‘Genomic characterization of a novel SARS-CoV-2’. Gene Rep. vol. 19, no. 100682, doi:10.1016/j.genrep.2020.100682.

  7. Naqvi, AAT, Fatima, K, Mohammad, T et al. 2020 ‘Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach’. Biochim Biophys Acta Mol Basis Dis. vol. 1866, no. 10, pp. 165878, doi:10.1016/j.bbadis.2020.165878.

  8. PDB-101 2021, COVID-19/SARS-CoV-2 Resources, PDB, viewed 21 April 2021, <https://www.rcsb.org/news?year=2020&article=5e74d55d2d410731e9944f52&feature=true>.

  9. Cai, Y, Zhang, J, Xiao, T, Peng, H, Sterling, SM, Walsh, RM, Rawson, S, Rits-Volloch, S, Chen, B 2020, ‘Distinct conformational states of SARS-CoV-2 spike protein’, Science, vol. 369, no. 6511, pp. 1586-1592, https://science.sciencemag.org/content/369/6511/1586/tab-pdf.

  10. Dai, L, Gao, GF 2021, ‘Viral targets for vaccines against COVID-19’. Nat Rev Immunol, vol. 21, pp. 73–82, https://doi.org/10.1038/s41577-020-00480-0.

  11. Rawson, S, Davies, S, Lippiat, JD, Muench, SP 2016 ‘The changing landscape of membrane protein structural biology through developments in electron microscopy’, Molecular Membrane Biology, vol. 33, no. 1-2, pp. 12-22, DOI: 10.1080/09687688.2016.1221533.

  12. Swiss-Model, Swiss-Model [Home], Swiss-Model, viewed 21 April 2021, <https://swissmodel.expasy.org/>.

  13. Waterhouse, A, Bertoni, M, Bienert, S, et al. 2018 ‘SWISS-MODEL: homology modelling of protein structures and complexes’. Nucleic Acids Res. vol. 46, no. W1, pp. W296-W303.

  14. Sali, A, Webb, B 2021, Modeller [Home], Modeller, viewed 21 April 2021, <https://salilab.org/modeller/>.

  15. Zimmermann, L, Stephens, A, Nam, SZ, Rau, D, Kübler, J, Lozajic, M, Gabler, F, Söding, J, Lupas, AN, Alva, V 2018, ‘A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core’. J. Mol. Biol. vol. 17, no. S0022-2836, pp. 30587-9.

  16. Zhang Lab, Covid-19, University of Michigan, viewed 21 April 2021, <https://zhanglab.ccmb.med.umich.edu/COVID-19/>.

  17. Huang, X, Zhang, C, Pearce, R, Omenn, GS, et al. 2021 ‘Identifying the Zoonotic Origin of SARS-CoV-2 by Modeling the Binding Affinity between the Spike Receptor-Binding Domain and Host ACE2’. Journal of Proteome Research, vol. 19, pp. 4844-4856.

  18. Callaway, E 2020, ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures, Nature [News], viewed 20 April 2021, <https://www.nature.com/articles/d41586-020-03348-4>.

  19. Kern, DM, Sorum, B, Mali, SS, et al. 2020, ‘Cryo-EM structure of the SARS-CoV-2 3a ion channel in lipid nanodiscs’. [preprint] bioRxiv 2020.06.17.156554; doi: https://doi.org/10.1101/2020.06.17.156554.

  20. Ono, A 2010, ‘Viruses and lipids’. Viruses. vol. 2, no. 5, pp. 1236-1238. doi:10.3390/v2051236.

  21. Zoufaly, A, Poglitsch, M, Aberle, JH, Hoepler, W 2020 ‘Human recombinant soluble ACE2 in severe COVID-19’, The Lancet Respiratory Medicine Case Report, vol. 8, no. 11, pp. 1154-1158.

  22. Samavati, L, Uhal, BD 2020, ‘ACE2, Much More Than Just a Receptor for SARS-COV-2’. Front. Cell. Infect. Microbiol., [perspective article 05 June 2020] https://doi.org/10.3389/fcimb.2020.00317.

  23. Xie, X, Liu, Y, Liu, J et al. 2021 ‘Neutralization of SARS-CoV-2 spike 69/70 deletion, E484K and N501Y variants by BNT162b2 vaccine-elicited sera’. Nat. Med., vol. 27, pp. 620–621, https://doi.org/10.1038/s41591-021-01270-4.

  24. Hoffmann, M, Kleine-Weber, H, Schroeder, S, Krüger, N 2020 ‘SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor’, Cell, vol. 181, no. 2, pp. 271-280, https://doi.org/10.1016/j.cell.2020.02.052.

  25. Saraste, J, Prydz, K 2021, ‘Assembly and Cellular Exit of Coronaviruses: Hijacking an Unconventional Secretory Pathway from the Pre-Golgi Intermediate Compartment via the Golgi Ribbon to the Extracellular Space’. Cells, vol. 10, no. 3, pp. 503, doi: 10.3390/cells10030503.

  26. Appenzeller-Herzog, C, Hauri, HP 2006, ‘The ER-Golgi intermediate compartment (ERGIC): in search of its identity and function’. J. Cell Sci., vol. 119, no. 11, pp. 2173–2183, doi:https://doi.org/10.1242/jcs.03019.

Title picture: Transmission electron micrograph of SARS-CoV-2 virus particles, isolated from a patient. Image captured and color-enhanced at the NIAID Integrated Research Facility (IRF) in Fort Detrick, Maryland. Credit: NIAID.