Computer Simulation of Biological Processes

Project Leader
André H. Juffer, Ph.D.
Biocenter Oulu and Faculty of Biochemistry and Molecular Medicine, University of Oulu

Background and Significance

Computer simulation techniques have become important tools to study the properties of biomolecules and their interactions with other molecules. These techniques are employed with the general objective of predicting and explaining physical quantities and phenomena, such as protein-ligand associations, reaction rate constants of enzyme-catalyzed reactions, acid dissociation constants of proteins, protein domain motion, protein folding and stability, and the like. The underlying computational model provides direct insight into the workings of these molecules, such that statements about their functional properties can be offered. Such knowledge is frequently relied upon, for instance, in the drug design industry to speed up the process of drug finding and drug target analysis.

Computer simulation and modeling are not limited to the study of proteins, but can be and are being applied to many different domains of science. This project is specifically concerned with the study of biological processes. Whereas bioinformatics is largely concerned with the generation and analysis of biological data, computer simulation and modeling are applied with the objective of explaining the behavior of a given biological process. Each simulation model typically deals with different time- and length scales. Chemical reactions at the active site of an enzyme occur in very short timescales (fs), protein dynamics generally take place in the range of ps to ns or even longer timescales, interfacial processes (e.g. protein diffusion at membrane surfaces) cover the ms to ms timescale, protein translation falls in the ms to s range, and the growth of tumors occurs over months to years. Each model must therefore involve different simulation and modeling methods.

Simulation and modeling methods rely on the application of techniques that are based in physics (theoretical biophysics), mathematics and chemistry. This project is predominantly concerned with the development and application of such techniques. These range from very detailed QM/MM (quantum mechanics/molecular mechanics) and dynamic models of proteins to multiple-scale approaches that combine, for instance, diffusion models with cellular automata (CA) in the case of brain tumor simulation.

Recent Progress

Comparison of non-sequential sets of protein residues
A methodology for performing sequence-free comparison of functional sites in protein structures is introduced. The method is based on a new notion of similarity among superimposed groups of amino acid residues that evaluates both geometry and physico-chemical properties. The method is specifically designed to handle disconnected and sparsely distributed sets of residues. A genetic algorithm is employed to find the superimposition of protein segments that maximizes their similarity. The method was evaluated by performing an all-to-all comparison on two separate sets of ligand-binding sites, comprising 47 protein-FAD (Flavin-Adenine Dinucleotide) and 64 protein-NAD (Nicotinamide-Adenine Dinucleotide) complexes, and comparing the results with those of an existing sequence-based structural alignment tool (TM-Align). The quality of the two methodologies is judged by the methods’ capacity to, among other, correctly predict the similarities in the protein-ligand contact patterns of each pair of binding sites. The results show that using a sequence-free method significantly improves over the sequence-based one, resulting in 23 significant binding-site homologies being detected by the new method but ignored by the sequence-based one. (Garma and Juffer, 2016, see also Figure 1).


Fig. 1. Correlation between the different similarity measures for all the comparisons on both the FAD and the NAD sets. The R2 values annotated on each panel refer to the value of the Pearson correlation coefficient for the linear regression on each pair of values. The Precision given on each plot refers to the fraction of observations that differ by less than 0.05 from the value on the regression curve.

Structure-based classification of FAD binding sites
A total of six different structural alignment tools (TM-Align, TriangleMatch, CLICK, ProBis, SiteEngine and GA-SI) were assessed for their ability to perform two particular tasks: (i) discriminating FAD (flavin adenine dinucleotide) from non-FAD binding sites, and (ii) performing an all-to-all comparison on a set of 883 FAD binding sites for the purpose of classifying them. For the first task, the consistency of each alignment method was evaluated, showing that every method is able to distinguish FAD and non-FAD binding sites with a high Matthews correlation coefficient. Additionally, GA-SI was found to provide alignments different from those of the other approaches. The results obtained for the second task revealed more significant differences among alignment methods, as reflected in the poor correlation of their results and highlighted clearly by the independent evaluation of the structural superimpositions generated by each method. The classification itself was performed using the combined results of all methods, using the best result found for each comparison of binding sites. A number of different clustering methods (Single-linkage, UPGMA, Complete-linkage, SPICKER and k-Means clustering) were also used. The groups of similar binding sites (proteins) or clusters generated by the best performing method were further analyzed in terms of local sequence identity, local structural similarity and conservation of analogous contacts with the FAD ligands. Each of the clusters was characterized by a unique set of structural features or patterns, demonstrating that the groups generated truly reflect the structural diversity of FAD binding sites. (Garma et al, 2016, See also Figure 2).



Fig. 2. A graph representation of the ligand cavity of chain A of 1frn. Each node in the graph represents a residue in the ligand cavity. An edge between two nodes exists whenever the corresponding residues are closer than 5 Å in the three-dimensional structure. The nodes are scaled in proportion to their number of connections with other nodes: TYR77, SER115, and VAL78 display the highest number of connections. The present work defines a pattern as an ensemble of residues that form a connected subgraph with a central node. In the given graph, the nodes from one of the existing patterns are highlighted in red. The corresponding central residue, TYR77, is labeled as C1. B: A cartoon representation of the backbone of chain A of 1frn. The Cα atoms of the residues in the ligand cavity are represented as spheres. The residues corresponding to the nodes on the left are colored in red.

Thiolase
This subproject is concerned with the importance of hydrogen bond networks (HBNs) for the efficient catalysis of the Zoogloea ramigera thiolase active site (Figure 4). This will be investigated by in silico modeling and simulation and structural enzymological approaches. These studies will benefit very much from the presence of a range of key structures of this thiolase, complexed with various substrates, substrate analogs and intermediates. Two HBNs will be studied, being related to the properties of oxyanion hole 1 (OAH1, stabilizing an enolate intermediate) and oxyanion hole 2 (OAH2, stabilizing a tetrahedral intermediate) of thiolase. The hydrogen bond donors (HBDs) of these oxyanion holes are from main chain, side chain or water molecules. In the HBN of OAH1 water molecules play a key role, whereas in OAH2 a peptide-mediated HBN is involved. This study capitalizes on the enormous amount of structural data already available on the thiolase system. In addition, the computational studies will be complemented and validated by structural enzymological studies (collaboration with Prof. Wierenga’s group, See also Figure 3).


Fig. 3. Evolution of the amount of buried water (black line) and the length of the water trail (red line) as a function of time. The top left inset displays the catalytic site of the thiolase, with S-acetylated cysteine and bound acetyl-CoA (green) in the oxyanion hole OAH2 formed by an histidine and a water molecule (red). The latter is part of a water trail.  The top right inset displays an example of a buried water analysis. The alpha shape surface (grey) defines an outer surface of the protein. Inside the surface, buried water molecules (blue cross) and the water trail (red) are detected by an in-house developed algorithm that analyses the trajectory generated from molecular dynamics simulations. The properties of the water trails may be affected by mutations altering the catalytic efficiency of thiolase.

Proton transfer events in coarse-grained molecular dynamics
Any theoretical molecular model for describing biological processes involving proteins should account for possible variations in the protein's charge distribution due to proton (dis)association by ionizable groups in the protein. In this project we propose a novel model for charge fluctuations in coarse-grained molecular dynamics (CGMD), a simulation methodology that has become one of the most important techniques for the study of proteins and membranes at large length and long time scales. Conventional CGMD is based on Newtonian mechanics. Like standard atomistic molecular dynamics (MD), it computes forces on CG particles or “beads” to generate a sequence of states (particle coordinates and velocities) as a function of time, from which numerous molecular, thermodynamic and time-dependent properties of the system can be calculated. However, charge fluctuations in CGMD are completely ignored. This subproject is aimed at correcting this serious omission (See also Figure 4).


Fig. 4. Simple simulation of a stochastic current I(t) (in blue) between two coarse-grained  particles. The x-axis represents time t. The time scale is not representative of the proton transfer process. The current is a consequence of a proton transfer or jump event, the occurrence of which is controlled by the state of the system, and results in new current or a change of direction of ongoing current (highlighted by vertical lines at the jump time). The current affects the charges Q(t) of the particles.

Future Goals

  1. Development of a novel method for the inclusion of proton binding in coarse-grained simulation of proteins.
  2. Deriving scientifically sound and probabilistic optimal principles for coarse-grain molecular systems for simulating protein adsorption at interfaces.

Publications 2016-

Garma L, Juffer AH. Comparison of non-sequential sets of protein residues. Comput Biol Chem 61:23-39, 2016.

Garma L, Medina M, Juffer AH. Structure-based classification of FAD binding sites: a comparative study of structural alignment tools. Proteins 4:1728–1747, 2016.

Pietilä I, Prunskaite-Hyyryläinen R, Kaisto S, Tika E, van Eerde AM, Salo AM, Garma L, Miinalainen I, Feitz WF, Bongers EMHF, Juffer AH, Knoers NVAM, Renkema KY, Myllyharju J, Vainio SJ. Wnt5a Deficiency Leads to Anomalies in Ureteric Tree Development, Tubular Epithelial Cell Organization and Basement Membrane Integrity Pointing to a Role in Kidney Collecting Duct Patterning. PLoS ONE 11:e0147171, 2016.

Research Group Members


Project Leader:
André H. Juffer, Ph.D. (Biocenter Oulu)

Ph.D. Students:
Pierre Leprovost (Biocenter Oulu)
Ioannis Beis (UO strategic funding targeted for BF operations)
Outi Lampela, M.Sc.
Leonardo Garma, M.Sc.

Undergraduate Students:
Ella Rukajärvi, B.Sc.

Interns:
Florian Borse, M.Sc.

Foreign Scientists, 5

Last updated: 10.5.2017