Computing power has increased greatly over the last few decades due to advances in technology. Despite this increase, there are various applications whose requirements exceed the capacity of smaller general purpose machines. To tackle this, specialised machines, called high performance computers (HPC), are periodically constructed with the best available technology to provide a very large amount of concentrated compute power, giving the best possible answers for such demanding applications. The next generation of HPC, expected sometime after 2020, is called Exascale, so named because such machines will perform on the order of an exaflop, i.e. 10^18 floating-point operations per second.
In the last decade, a new breed of user of very large machines has appeared: those concerned with Big Data. Big Data problems usually deal with less sophisticated models but with many more parameters, and choose the model parameters by analysing large amounts of data with relatively little associated computation. However, there are problems in this area for which the data are very expensive to generate. In such cases it becomes important to use more sophisticated models in order to squeeze as much knowledge as possible out of the data. These problems sit at the juncture of HPC and Big Data: they have large data sets to analyse, yet should exploit more sophisticated models through computation to make the most of the available data.
The ExCAPE project is about how to tackle such problems. The core of the project concerns the mathematics and software involved, and how they perform on HPC machines. However, to advance the state of the art it helps to have a concrete problem to tackle. For this we take the chemogenomics problem: predicting the activity of compounds in the drug discovery phase of the pharmaceutical industry, which gives the project its name (Exascale Compound Activity Prediction Engines - ExCAPE). Building such predictive models belongs to the field of Machine Learning.
The overall objective of the project is to find methods and systems that can tackle large and complex machine learning problems, such as chemogenomics. This will require algorithms and software that make efficient use of the latest HPC machines. Creating these, along with preparing the data to give the system something to work on, is the main work of the project. The project is funded under Horizon 2020 (H2020), the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020). The RHUL team contribute in the area of Uncertainty Quantification with their expertise in Conformal Prediction and Venn-Abers Prediction.
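To give a flavour of Conformal Prediction, the sketch below shows split (inductive) conformal prediction for regression in its simplest possible form. This is an illustrative toy, not the project's implementation: the trivial "predict the training mean" model, the hand-written data, and the function name `split_conformal_interval` are all invented here for exposition. The key idea it demonstrates is real: a held-out calibration set of nonconformity scores (absolute residuals) yields a prediction interval with guaranteed coverage of at least 1 - alpha under exchangeability.

```python
# Minimal sketch of split (inductive) conformal prediction for regression.
# Assumptions: a trivial mean-predictor stands in for a real model, and the
# data below are made up purely for illustration.
import math

def split_conformal_interval(train_y, calib_y, alpha=0.1):
    """Return a prediction interval with coverage >= 1 - alpha.

    Nonconformity score: absolute residual |y - prediction|.
    """
    prediction = sum(train_y) / len(train_y)  # "fit" a trivial model
    scores = sorted(abs(y - prediction) for y in calib_y)
    # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    q = scores[min(k, n) - 1]
    return prediction - q, prediction + q

train = [1.0, 2.0, 3.0, 2.5, 1.5]                     # model-fitting split
calib = [2.2, 1.8, 2.6, 1.4, 2.0, 2.4, 1.6, 2.8]      # calibration split
lo, hi = split_conformal_interval(train, calib, alpha=0.2)
```

The interval width is driven entirely by the calibration residuals, so a poor underlying model yields valid but wide intervals; the coverage guarantee itself does not depend on the model being any good.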
Tom Ashby, Technical Report.