Introduction
Computing power has increased greatly over the last few decades due to advances in technology. Despite this increase, there are various applications whose requirements exceed the available computing power of smaller general purpose machines. To tackle this, specialised machines called high performance computers (HPC) are periodically constructed with the best available technology to provide a very large amount of concentrated compute power and so give the best possible answers for such demanding applications. The next generation of HPC, expected sometime after 2020, is called Exascale, a name that refers to the amount of computation available (on the order of 10^18 floating-point operations per second).
In the last decade, a new breed of user of very large machines has appeared: those concerned with Big Data. Big Data problems usually deal with less sophisticated models but with many more parameters, and try to choose the model parameters by analysing large amounts of data with relatively little associated computation. However, there are problems in this area for which the data are very expensive to generate. In such cases it becomes important to use more sophisticated models to squeeze as much knowledge as possible out of the data. Such problems sit at the juncture of HPC and Big Data: they have large data sets to analyse, yet should exploit more sophisticated models through computation to make the most of the available data.
The ExCAPE project is about how to tackle such problems. The core of the project concerns the maths and software involved and how they perform on HPC machines. However, advancing the state of the art is easier with a concrete problem to tackle. For this we take the chemogenomics problem, that of predicting the activity of compounds in the drug discovery phase of the pharmaceutical industry, which gives the project its name (Exascale Compound Activity Prediction Engines - ExCAPE). Building such predictive models belongs to the field of Machine Learning.
The overall objective of the project is to find methods and systems that can tackle large and complex machine learning problems, such as chemogenomics. This will require algorithms and software that make efficient use of the latest HPC machines. Creating these, along with preparing the data to give the system something to work on, is the main work of the project. The project is part of the H2020 European Initiative, the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020). The RHUL team contribute in the area of Uncertainty Quantification with their expertise in Conformal Prediction and Venn–Abers Prediction.
Reference:
Tom Ashby TechReport.
Technical reports
- Performance analysis of Mondrian Conformal Prediction for the top 10 targets in the ExCAPE dataset. Khuong An Nguyen; internal report, August 2018.
- Multi-target learning. Ilia Nouretdinov; internal report, August 2018.
- Inductive Venn–Abers prediction for regression. Ivan Petej; internal report, July 2018.
- Venn–Abers partial ordering method applied to ExCAPE datasets. Ivan Petej; internal report, July 2018.
- Prediction in bioinformatics applications by conformal predictors. Alex Gammerman; invited talk at ICPB 2016, Pattaya, Thailand.
- Applying Conformal Predictions on Public BioAssay Data. Paolo Toccaceli; poster presented at ICPB 2016, Pattaya, Thailand.
Publications
- Conformal Prediction of Biological Activity of Chemical Compounds. Paolo Toccaceli, Ilia Nouretdinov, Alexander Gammerman; Annals of Mathematics and Artificial Intelligence, p.1-19, 2017.
- Combination of Conformal Predictors for Classification. Paolo Toccaceli, Alexander Gammerman; Proceedings of Machine Learning Research, Vol.60, p.39-61, 2017.
- Conformal Predictors for Compound Activity Prediction. Paolo Toccaceli, Ilia Nouretdinov, Alexander Gammerman; 5th International Symposium on Conformal and Probabilistic Prediction with Applications, 2016.
Researchers
- Prof. Alex Gammerman, Principal Investigator
- Prof. Vladimir Vovk, Co-Investigator
- Prof. Zhiyuan Luo, Co-Investigator
- Dr. Lars Carlsson, Co-Investigator
- Dr. Khuong An Nguyen, Research Assistant
- Dr. Paolo Toccaceli, Research Assistant
- Dr. Ilia Nouretdinov, Research Assistant
- Dr. Ivan Petej, Research Assistant
Deliverable reports