Monday 20 December 2021
Last weekend (10, 11 and 12 December 2021), the third edition of bitsxlaMarató took place: a hackathon organized by FIB, Hackers@UPC, LleidaHack, the BSC and the Escola Superior d'Infermeria del Mar. As its name suggests, the hackathon collaborates every year with the Marató de TV3 through donations from participants, sponsors and the entire university community of the UPC and beyond.
This edition focused on mental health, in line with the theme of this year's Marató. The objective was to create a space for collaboration between technology and health professionals so that, together, they could seek and develop solutions to the many challenges posed by mental health.
The organizers and the collaborating entities proposed a total of 4 challenges that each team could try to solve. You can find more information on their website.
The inLaber team
The inLaber team "Orenetes" (formed, in part, by Gonzalo Recio, Gerard Calvo and Jordi Cluet) won first place in the challenge "Interacció de proteïnes. Ens fiquem d’acord?" ("Protein interaction: can we agree?"), proposed by the BSC and the spin-off Nostrum Biodiscovery.
Schizophrenia, bipolar disorder and depression are mental illnesses that affect more than 25% of the population at some point in their lives. A link between these diseases and certain protein-protein interactions has now been established. However, there is no experimental structure for most of these interactions, which limits their study. This is why so-called docking programs are needed: given two proteins, they return a multitude of possible relative positions in which the two can interact. However, these programs are not, in general, able to rank the structures they produce by relevance. In addition, the metrics that different programs use to rank their best and worst predictions are not comparable with each other.
In this context, the challenge aimed to propose data analysis algorithms (clustering, distance analysis,…) to find a consensus among the predictions returned by different protein docking programs. The premise for this consensus was that the more often a given interaction between the two proteins is repeated across predictions, the more relevant that structure is. The aim was to increase the reliability of the predictions made by these programs.
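The consensus premise can be illustrated with a minimal sketch (not the team's code): summarize each predicted pose as a point, count how many other predictions fall within a distance cutoff, and rank poses by that count. The function name `consensus_ranking`, the cutoff value and the toy 3D points are all hypothetical.

```python
import math
from typing import List, Sequence


def consensus_ranking(poses: Sequence[Sequence[float]], cutoff: float) -> List[int]:
    """Rank pose indices by how many other poses lie within `cutoff`.

    A pose that many predictions 'repeat' (many close neighbours) ranks
    first, following the consensus premise.
    """
    counts = [
        sum(1 for j, q in enumerate(poses) if j != i and math.dist(p, q) <= cutoff)
        for i, p in enumerate(poses)
    ]
    # Highest neighbour count first; sorted() is stable, so ties keep input order.
    return sorted(range(len(poses)), key=lambda i: -counts[i])


# Toy example: three nearby poses and one isolated pose (hypothetical data).
poses = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (10.0, 10.0, 10.0)]
print(consensus_ranking(poses, cutoff=1.0))  # [0, 1, 2, 3]
```

The isolated pose ends up last because no other prediction agrees with it.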
Specifically, the challenge was to analyze sets of between 100,000 and 200,000 structures obtained for protein systems whose structure is known experimentally, so that the algorithm's predictions could then be validated against a reference structure.
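As an illustration of how a prediction might be checked against the reference structure, the sketch below computes the root-mean-square deviation (RMSD) between the ligand atoms of a predicted pose and those of the reference. RMSD is our assumption here (the challenge's exact validation metric is not stated), and the sketch assumes both atom lists are in the same order and in the frame where protein A is fixed (as in the definitions that follow), so no superposition step is needed.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists.

    Assumption: protein A is fixed, so ligand coordinates of a predicted
    pose are compared directly with the reference (no superposition).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("atom lists must be non-empty and of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


# Hypothetical two-atom ligand, predicted 2 units away along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(predicted, reference))  # 2.0
```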
First, some definitions to better understand the context of our solution:
- Each structure analyzed consists of an interaction between two proteins, A and B.
- A is always fixed in space.
- B (called the ligand) is a rigid protein (it does not change in size or shape) that can appear rotated and/or translated around A.
- Each position where we find B interacting with A (according to the corresponding docking algorithm) is called a pose.
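To make these definitions concrete, here is a minimal sketch, with hypothetical helper names, of how a pose places the rigid ligand B: every atom is rotated by the same rotation matrix and shifted by the same translation vector. For brevity it only builds rotations about the z axis; real docking poses use arbitrary 3D rotations.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Tuple[Point, Point, Point]


def rotation_z(theta: float) -> Matrix:
    """3x3 rotation matrix for an angle `theta` (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def apply_pose(atoms: Sequence[Point], rot: Matrix, shift: Point) -> List[Point]:
    """Place ligand B: rotate every atom by `rot`, then translate it by `shift`."""
    return [
        tuple(sum(rot[r][c] * atom[c] for c in range(3)) + shift[r] for r in range(3))
        for atom in atoms
    ]


ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
pose = apply_pose(ligand, rotation_z(math.pi / 2), (5.0, 0.0, 0.0))
# The ligand is rigid: distances between its atoms are the same in every pose.
print(math.dist(ligand[1], ligand[2]), math.dist(pose[1], pose[2]))
```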
What it does
In this project we introduce ELE (Essence Ligand Encoding), a pose clustering algorithm that encodes each ligand by its three most distant atoms. We show that, with ELE, the execution time of these consensus algorithms can be reduced by up to 99% while maintaining the same accuracy.
How it does it
The key to ELE lies in how each ligand is represented. Because we are only dealing with rigid-body ligands (which only rotate and move, but do not change shape or size), all the three-dimensional information about the ligand can be encoded by its position in space and its 3D rotation alone. Alternatively, since three non-collinear points are enough to fix the position and orientation of a rigid body, just three of the molecule's points (atoms) capture this information well. We chose the three atoms that are farthest apart from each other, so that they represent the ligand's three-dimensional placement as well as possible.
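The post does not spell out exactly how the "three most distant atoms" are chosen, so the following is one plausible greedy reading, not necessarily ELE's actual criterion: take the pair of atoms farthest apart, then add the atom with the largest summed distance to that pair. `farthest_three` is a hypothetical name, and the brute-force pair search is quadratic in the number of atoms, which is acceptable because, for a rigid ligand, it only has to run once.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def farthest_three(atoms: Sequence[Point]) -> Tuple[int, int, int]:
    """Greedily pick three mutually distant atoms; returns their indices.

    1. Take the pair of atoms that are farthest apart (brute force).
    2. Add the atom with the largest summed distance to that pair.
    """
    if len(atoms) < 3:
        raise ValueError("need at least three atoms")
    i, j = max(combinations(range(len(atoms)), 2),
               key=lambda pair: math.dist(atoms[pair[0]], atoms[pair[1]]))
    k = max((m for m in range(len(atoms)) if m not in (i, j)),
            key=lambda m: math.dist(atoms[m], atoms[i]) + math.dist(atoms[m], atoms[j]))
    return i, j, k


# Toy ligand with five atoms (hypothetical coordinates).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 4.0, 0.0), (5.0, 1.0, 0.0)]
print(farthest_three(atoms))  # (0, 2, 3)
```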
Using this encoding, we can drastically reduce the information needed to represent each interaction. Because the main protein (A) does not move, it adds no information, so we can ignore it. Furthermore, since the ligand (B) can be encoded by its three most distant atoms alone, all the information the clustering algorithms need reduces to the three coordinates (x, y, z) of each of these three atoms. We can therefore represent each interaction between two proteins with a vector of only 9 coordinates.
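The 9-coordinate encoding itself is then a simple flattening. In this sketch (hypothetical names, toy coordinates), `idx` is assumed to come from a selection step like the one described above, computed once on the ligand; the same three indices are then used for every pose.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def ele_encode(pose_atoms: Sequence[Point], idx: Tuple[int, int, int]) -> List[float]:
    """Encode one pose as a 9-vector: (x, y, z) of the three chosen atoms, in order.

    Because the ligand is rigid, the same three indices are valid for
    every pose of that ligand.
    """
    return [coord for i in idx for coord in pose_atoms[i]]


pose_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 4.0, 0.0), (5.0, 1.0, 0.0)]
vec = ele_encode(pose_atoms, (0, 2, 3))
print(vec)       # [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 5.0, 4.0, 0.0]
print(len(vec))  # 9
```

For a hypothetical ligand with 2,000 atoms, this shrinks each pose from 6,000 coordinates to 9. Note also that the Euclidean distance between two such 9-vectors is exactly √3 times the RMSD over the three selected atoms, so distance-based clustering on these vectors amounts to an RMSD criterion restricted to those atoms.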
Using the ELE trick, clustering algorithms have much less data to manage and are therefore much faster.
The next and final step was to apply clustering methods to group all the poses output by the different docking programs, and thus see in which positions the interaction between the central protein and the ligand is found most often.
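The clustering step can be sketched as follows: run K-Means on the 9-vectors and order the resulting clusters by size, so that the most populated cluster (the position that appears most often) comes first. This is a plain-Python illustration, not the team's implementation; the farthest-first seeding is our choice to keep the example deterministic (a library such as scikit-learn's `KMeans` would normally be used instead), and all names and data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-Means on ELE vectors; returns one cluster label per point."""
    pts = [list(p) for p in points]
    if not 0 < k <= len(pts):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-first seeding (our choice, for reproducibility).
    centers = [list(pts[0])]
    while len(centers) < k:
        far = max(pts, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in pts]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_clusters(labels: Sequence[int], top: int = 10) -> List[int]:
    """Cluster labels ordered by size, largest first (the consensus ranking)."""
    return [label for label, _ in Counter(labels).most_common(top)]


# Hypothetical ELE vectors: three poses in one region, two in another.
group_a = [[0.0] * 9, [0.1] + [0.0] * 8, [0.0, 0.1] + [0.0] * 7]
group_b = [[10.0] * 9, [10.1] + [10.0] * 8]
labels = kmeans(group_a + group_b, k=2)
print(labels)                 # [0, 0, 0, 1, 1]
print(rank_clusters(labels))  # [0, 1]
```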
We tried two different clustering algorithms, DBSCAN and K-Means, which gave us similar results. Below you can see an example of protein A (in the center) and the top 10 clusters obtained, with their ligands, as well as the best cluster obtained for this molecule.
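DBSCAN, the other algorithm mentioned, does not need the number of clusters in advance: points with at least `min_samples` neighbours within distance `eps` (counting themselves) seed clusters, and isolated poses are labelled as noise (`-1`). A minimal plain-Python sketch follows; the parameter names mirror scikit-learn's `sklearn.cluster.DBSCAN`, but the values, names and data are illustrative assumptions, not the settings used in the hackathon.

```python
import math
from collections import Counter, deque
from typing import List, Sequence

Vector = Sequence[float]
NOISE = -1
UNVISITED = -2


def dbscan(points: Sequence[Vector], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    n = len(points)
    # Neighbourhood of each point (brute force, includes the point itself).
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] != UNVISITED:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


group_a = [[0.0] * 9, [0.1] + [0.0] * 8, [0.0, 0.1] + [0.0] * 7]
group_b = [[10.0] * 9, [10.1] + [10.0] * 8]
outlier = [[50.0] * 9]
labels = dbscan(group_a + group_b + outlier, eps=1.0, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
# Cluster sizes, largest first, ignoring noise:
print(Counter(lab for lab in labels if lab != NOISE).most_common())  # [(0, 3), (1, 2)]
```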
Below are two tables comparing the consensus algorithm used until then with ELE (ours):
Time comparison between existing implementation and ELE
As the table shows, combining ELE with either of the two clustering algorithms gave excellent results compared with the implementation we started from: a 99% reduction in runtime. In addition, we were able to verify that the accuracy of the results was equal to or better than that of the previous algorithm.
In addition, as an extra, we created an interface for viewing interactions and proteins in a customizable way, which is very useful for understanding and analyzing the results:
For the InLab members who took part, the experience has been very positive. On the one hand, we have learned a lot, both about health (in this case, how to deal with large amounts of biological data) and about technology (hard as it may be to believe, you can learn a lot in less than 48 hours). On the other hand, we have been able to get to know the field of mental health and see how important it is for people's quality of life.
We are very pleased both with the result of our work, which we believe could really be used in future research, and with having done our bit for a charity event such as bitsxlaMarató. We encourage you to take part in future editions of this hackathon and in the many others organized around the world every year.