In silico analysis of SARS-CoV-2 spike glycoprotein and insights into antibody binding

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China in December 2019. Since then, COVID-19, the disease caused by SARS-CoV-2, has become a rapidly spreading pandemic that has reached most countries in the world. So far, there are no vaccines or therapeutics to fight this virus. Here, I present an in silico analysis of the virus spike glycoprotein (recently determined at atomic resolution) and provide insights into how antibodies against the 2002 virus SARS-CoV might be modified to neutralize SARS-CoV-2. I ran docking experiments with Rosetta Dock to determine which substitutions in the 80R and m396 antibodies might improve their binding to SARS-CoV-2, and used molecular visualization and analysis software, including UCSF Chimera and Rosetta Dock, as well as other bioinformatics tools, including SWISS-MODEL. Supercomputers, including Bridges Large, Stampede and Frontera, were used for macromolecular assemblies and large-scale analysis and visualization.


Introduction
COVID-19, the disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), presents with symptoms of fever, severe respiratory illness and pneumonia. As of April 15, 2020, SARS-CoV-2 has infected approximately 2,000,000 people, with more than 120,000 deaths. The virus is spreading exponentially worldwide and has been declared a public health emergency by the World Health Organization (WHO) (World Health Organization 2020). SARS-CoV-2 belongs to the genus Betacoronavirus and is closely related to several bat coronaviruses (Chen et al. 2020). It enters its target cell using a spike glycoprotein (Chand et al. 2017), the structure of which has recently been determined at atomic resolution (PDB ID: 6VSB) (Wrapp et al. 2020). The published structure shows that the spike protein has a receptor binding domain (RBD), which binds to the angiotensin-converting enzyme 2 (ACE2) receptor on host cells (Yan et al. 2020). The RBD can exist in two conformations, referred to as "up" (receptor-accessible) and "down" (receptor-inaccessible) (Wrapp et al. 2020). The in silico structural analysis showed that ACE2 and potential antibodies bind in the same location on the spike protein (Hwang et al. 2006, Sui et al. 2004). An antibody that blocks the ACE2 binding site in the RBD should thus be very effective in preventing viral spread. Two potent neutralizing human monoclonal antibodies, 80R and m396 (Prabakaran et al. 2006, Hwang et al. 2006), which bind to the RBD of SARS-CoV, the virus responsible for the 2002-2004 outbreak of severe acute respiratory syndrome, did not, however, bind to the RBD of SARS-CoV-2. Here, I report insights into sequence differences that affect the ability of 80R and m396 to bind to the SARS-CoV-2 RBD (Cuesta et al. 2010). Understanding why 80R and m396 do not bind to the SARS-CoV-2 spike protein could pave the way to engineering new antibodies that are effective against SARS-CoV-2. 
Mutated versions of the 80R and m396 antibodies could then be produced and administered as a therapeutic to fight the disease and prevent infection (Norman et al. 2019, Zhao et al. 2018). The ACE2 dimeric structure in complex with the SARS-CoV-2 RBD has also recently been determined at atomic resolution (PDB ID: 6M17) (Yan et al. 2020). I have now built a structural model of SARS-CoV-2 infection (Fig. 1), which provides further insights into antibody binding and should contribute to solving the complex problem of preventing viral spread (Pettersen et al. 2004). Rosetta Dock (Lyskov and Gray 2008, Lyskov et al. 2013, Sircar and Gray 2010, Weitzner et al. 2017) was used for docking experiments between the RBD of SARS-CoV-2 and the in silico-mutated 80R and m396 antibodies, following the described protocols. The PDBePISA server was used to analyze the structure energies and residue-residue interactions, while the Therapeutic Antibody Profiler server was used for developability studies (Dunbar et al. 2016).

Analysis of SARS-CoV-2 spike glycoprotein
The tertiary structure of the SARS-CoV spike protein bound to ACE2 (Song et al. 2018, Wan et al. 2020), obtained from RCSB (PDB ID: 6ACG), and the tertiary structure of the SARS-CoV RBD bound to 80R (PDB ID: 2GHW) were superimposed, and the primary sequences were aligned (Fig. 2). A structural model for the SARS-CoV-2 RBD was constructed using SWISS-MODEL and superimposed by sequence alignment and tertiary structures (Fig. 2). In the secondary structures of the spike proteins, helices are shown in red, strands in yellow and loops in green; ACE2 is brown and 80R is cyan. I found that the 80R antibody and ACE2 bind in the same location, which explains how the antibody prevents infection by competing with ACE2. The RBD in the structure of the SARS-CoV-2 spike protein (PDB ID: 6VSB) has many loops missing (data not shown). Because these loops are essential for binding to ACE2, the model was constructed to provide a complete structure for the docking experiments. Furthermore, the glycans in PDB 6VSB do not interfere with 80R binding, even at sites that are predicted to be glycosylated but carry no glycan in the deposited structure (Fig. 2).

Figure 1.
Structural model of SARS-CoV-2 infection. This structural model was built with UCSF Chimera using high-performance computers (Bridges Large and Frontera). The model shows 16 viruses, with the spike proteins shown in green (PDB ID: 6VSB) and an actual lipid bilayer membrane, with ACE2 dimers shown in magenta. All these structures are at atomic resolution. The length of the membrane is approximately 1 micrometer.

Introduction of in silico mutations into the 80R antibody
Sequence alignments between the SARS-CoV RBD and the SARS-CoV-2 RBD were built in UCSF Chimera. There are many sequence differences between SARS-CoV and SARS-CoV-2 in the 80R-RBD interface, which explains why the 80R antibody binds with high affinity to the spike protein of SARS-CoV but not to that of SARS-CoV-2. In the SARS-CoV-2 RBD, polar residues are replaced by neutral residues, which disrupts the binding interactions between 80R and the RBD. Insertion of a glycine residue at position 482 also twists one of the interacting loops, located at E484, which then clashes with the 80R antibody in the superimpositions. To avoid this clash and allow better antibody binding to the RBD in SARS-CoV-2, the 80R partner residue should be replaced by a different residue. I selected six alternative residues, introduced these one at a time, and ran docking experiments using Rosetta Dock (Lyskov and Gray 2008, Lyskov et al. 2013) to explore the new interactions (Fig. 3). Aromatic-aromatic interactions between residues that are within 8 Å of each other are very important for protein structure and protein-protein recognition (Lanzarotti et al. 2011). According to the structural analysis in Chimera, there are many aromatic residues in the RBD of SARS-CoV-2, and I therefore introduced aromatic residues into 80R to match them. My strategy was thus to increase aromatic-aromatic interactions, avoid the clash with E484 and maintain the solubility of the antibody, without introducing major changes in the 80R antibody. In support of my findings for the 80R antibody epitope, an epitope for another SARS-CoV-2 antibody (CR3022) was recently reported to contain many aromatic residues (Yuan et al. 2020).

Figure 2.
Structural analysis of SARS-CoV spike glycoprotein. In A, the SARS-CoV spike protein (PDB ID: 6ACG) is shown bound to ACE2 (brown) and the 80R antibody (cyan), superimposed on the same binding site. In B, the spike protein is shown bound only to the 80R antibody (PDB ID: 2GHW), with the structural model of the RBD of the SARS-CoV-2 spike protein (magenta) containing the missing loops. This homology model served as the basis for the docking experiments. C shows a spike colored by subunit, with its glycans. There are only two possible glycans in the RBD region, at 331 and 343, and neither of these sites affects 80R binding.
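As a rough illustration of the 8 Å aromatic-aromatic criterion cited in this section (Lanzarotti et al. 2011), the following Python sketch lists antibody-RBD ring pairs whose centroids lie within a cutoff. It is not the analysis performed in Chimera: the function names, the residue labels and the toy hexagon coordinates are hypothetical placeholders, and measuring the distance between ring centroids is an assumption made here for simplicity.

```python
import math


def hexagon(center, radius=1.39):
    """Toy six-membered ring in the xy-plane (hypothetical coordinates, in Å)."""
    cx, cy, cz = center
    return [
        (cx + radius * math.cos(math.radians(60 * k)),
         cy + radius * math.sin(math.radians(60 * k)),
         cz)
        for k in range(6)
    ]


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def aromatic_pairs(antibody_rings, rbd_rings, cutoff=8.0):
    """Return sorted (antibody label, RBD label, distance) for ring pairs within cutoff."""
    pairs = []
    for ab_label, ab_atoms in antibody_rings.items():
        ab_center = centroid(ab_atoms)
        for rbd_label, rbd_atoms in rbd_rings.items():
            d = math.dist(ab_center, centroid(rbd_atoms))
            if d <= cutoff:
                pairs.append((ab_label, rbd_label, round(d, 2)))
    return sorted(pairs)


# Placeholder rings: one antibody ring, one RBD ring 5 Å away, one 20 Å away.
antibody = {"AB_F1": hexagon((0.0, 0.0, 0.0))}
rbd = {"RBD_Y1": hexagon((5.0, 0.0, 0.0)), "RBD_F2": hexagon((20.0, 0.0, 0.0))}

print(aromatic_pairs(antibody, rbd))  # [('AB_F1', 'RBD_Y1', 5.0)]
```

In practice the ring atoms would be read from the docked model rather than generated, and a geometric filter on ring orientation might be added.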
I made a structural model of mutated 80R with SWISS-MODEL and positioned this model close to the RBD region. I used Rosetta Dock on the Stampede2 supercomputer to run docking experiments with the following mutations in 80R: R102F, S103F, R152F, S186F, T206G, S210F and N227F. These mutations were intended mainly to increase aromatic-aromatic interactions and to avoid contact with E484 (T206G) (Fig. 3). The following aromatic interactions were achieved: F102-F103-Y505, F186-F456, F152-F210-F486, and F227-Y449. These interactions increased the binding affinity between the two partners, as indicated by the funnel-type charts from the Rosetta Dock results (Fig. 4), which show how the structures converge to the lowest-energy state, compared with wild type. Furthermore, the PDBePISA server (Krissinel and Henrick 2007) was used to obtain the energies of the complexes, with satisfactory results for the mutated antibody (80Rm). Finally, the Therapeutic Antibody Profiler (TAP) tool (Dunbar et al. 2016) was used to analyze the developability of the mutated antibody, and it showed green flags for all five characteristics considered, which include hydrophobicity and charge.

Figure 3.
Docking interface between the modified 80R antibody and the RBD of the SARS-CoV-2 spike protein. The model shows the structural interface with the 80R antibody above and the RBD below. The seven substitutions in 80R are shown in magenta and RBD residues are shown in cyan. Notice how the substitutions in 80R allow new aromatic-aromatic interactions that improve binding to the RBD and are not present in wild-type 80R. E484 is shown pointing towards the beta strand of 80R, and a glycine substitution was therefore introduced to avoid clashes.
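The substitution labels used for 80R follow the usual wild-type residue, position, new residue notation. The short Python sketch below shows one way such labels could be parsed, checked against a sequence and applied in silico before model building. The helper names and the five-residue toy sequence are hypothetical; the 80R sequence itself is not reproduced here, and numbering is assumed to start at 1.

```python
import re

AROMATIC = set("FWY")  # Phe, Trp, Tyr
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'R102F' into (wild type, position, new residue)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_mutations(sequence, labels, first_position=1):
    """Return a copy of sequence with each substitution applied.

    Raises ValueError if a position is out of range or the wild-type
    residue in the label does not match the sequence.
    """
    residues = list(sequence)
    for label in labels:
        wild_type, position, new = parse_mutation(label)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{label}: found {residues[index]} at {position}")
        residues[index] = new
    return "".join(residues)


# Substitution list as given in the text for 80R.
MUTATIONS_80R = ["R102F", "S103F", "R152F", "S186F", "T206G", "S210F", "N227F"]

# Substitutions that replace a non-aromatic residue with an aromatic one.
aromatic_gain = [
    label for label in MUTATIONS_80R
    if parse_mutation(label)[2] in AROMATIC
    and parse_mutation(label)[0] not in AROMATIC
]
print(aromatic_gain)

# Toy sequence (hypothetical), used only to show the wild-type check.
print(apply_mutations("ARSTN", ["R2F", "N5F"]))  # AFSTF
```

Checking the wild-type residue before substituting guards against numbering offsets between a sequence file and the structure.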

Introduction of in silico mutations into the m396 antibody
Similarly, the m396 antibody, which neutralizes SARS-CoV by binding to its RBD, was structurally analyzed. In this case, the analysis suggested maximizing electrostatic interactions and adding one aromatic interaction. Five mutations were therefore introduced: T52F, I56E, N58K and Q61E in the heavy chain, and S94E in the light chain. A structure with these mutations was placed close to the SARS-CoV-2 RBD and docking experiments were run on Stampede2. The results (Fig. 5) show that these mutations improve binding to the level seen between SARS-CoV and the wild-type antibody (Fig. 6). The m396 mutant forms nine new salt bridges, which strengthen binding to the target SARS-CoV-2 RBD (data not shown). Energy analysis from the PDBePISA server shows that binding is improved significantly compared with wild-type m396 and the SARS-CoV-2 RBD. Furthermore, all developability flags are green except one, which is yellow by only a narrow margin; this can be addressed after wet-lab testing if it proves to be an issue.

Figure 4.
Docking energies and interface score charts. A shows the Rosetta Dock results. B shows the binding energies from the PDBePISA server, with good results for ΔG. C shows the developability of this antibody, with all flags green.
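To make the electrostatic rationale behind the m396 substitutions concrete, the Python sketch below tallies the change in formal side-chain charge for each substitution listed in the text. It is a bookkeeping aid only and does not predict salt bridges or binding energies. The charge table is a simplifying assumption (Asp and Glu as -1, Lys and Arg as +1, His treated as neutral), and the function names are hypothetical.

```python
# Assumed formal side-chain charges near neutral pH (His treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

# Substitutions as given in the text for m396, grouped by chain.
M396_MUTATIONS = {
    "heavy": ["T52F", "I56E", "N58K", "Q61E"],
    "light": ["S94E"],
}


def charge_change(label):
    """Formal charge of the new residue minus that of the wild-type residue."""
    wild_type, new = label[0], label[-1]
    return CHARGE.get(new, 0) - CHARGE.get(wild_type, 0)


def summarize(mutations):
    """Return (chain, label, charge change) for every substitution."""
    return [
        (chain, label, charge_change(label))
        for chain, labels in mutations.items()
        for label in labels
    ]


rows = summarize(M396_MUTATIONS)
for chain, label, delta in rows:
    print(f"{chain:5s} {label:5s} {delta:+d}")

# Number of substitutions that introduce a charged side chain, and net change.
newly_charged = sum(1 for _, label, _ in rows if label[-1] in CHARGE)
net = sum(delta for _, _, delta in rows)
print(newly_charged, net)
```

Under these assumptions, four of the five substitutions introduce a charged side chain and the remaining one (T52F) is the aromatic substitution, which matches the design described for m396.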

Figure 5.
Docking interface between the modified m396 antibody and the SARS-CoV-2 spike protein RBD. The m396 mutant is shown in magenta and the SARS-CoV-2 RBD in cyan. The five mutations introduce many electrostatic interactions between the partners, thereby strongly stabilizing the binding.

Figure 6.
m396 mutations docking results. A shows Rosetta Dock funnels for the original partners (SARS-CoV and m396), for SARS-CoV-2 and m396, and for SARS-CoV-2 and mutated m396. Notice how the binding is improved to the level of the original partners. B shows the ΔG energies; again, notice the improvement in binding when the mutations are introduced. Finally, C shows the developability flags, with only one warning, which is close to the green threshold.

Discussion
Docking experiments showed that appropriate amino acid substitutions in 80R and m396 should increase binding interactions between the antibodies and the SARS-CoV-2 RBD, thus providing new antibodies with sufficient affinity for the SARS-CoV-2 spike protein to neutralize the virus. These new antibodies should be expressed in vitro to study their solubility, stability, specificity and binding kinetics, and the results would be the basis for further mutations to correct some of these parameters or even improve the affinity. This methodology could be the basis for rapid and effective generation of neutralizing therapeutic antibodies against COVID-19. In silico analysis is a very useful tool that structural bioinformaticians can use to guide mutagenesis towards a goal, in this case better affinity for the RBD of the SARS-CoV-2 spike protein. The results obtained using this relatively new branch of science must be taken with caution, but the method is becoming increasingly successful with the rapid improvement in bioinformatics tools. This type of analysis was not possible a decade ago, when scientists had to conduct mutagenesis experiments in wet laboratories. It is now possible for many scientists to use a bioinformatics approach to shorten the time needed to find new therapeutics. Further in silico experiments, including molecular dynamics simulations, can be performed to analyze interactions in real time, and I plan to conduct these types of experiments in the future. Humanity is in a race to find therapeutics and/or vaccines against COVID-19, and my analysis and findings should help the scientific community to quickly discover novel therapeutics.