Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models

Reading time: 6 minutes

📝 Original Info

  • Title: Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models
  • ArXiv ID: 1706.06689
  • Date: 2018-08-15
  • Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker

📝 Abstract

In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google's Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed "Chemception", a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.
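Since Chemception's only input is an image of a 2D molecule drawing, the front of the network is ordinary image convolution. As a toy illustration of that idea, here is a single convolution pass in numpy over a hypothetical grayscale "drawing" (the 8×8 size and the hand-set edge kernel are illustrative only; the paper's actual image dimensions and learned filters are not reproduced here):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in most deep
    learning frameworks): slide the kernel over the image and take
    element-wise products summed over each window."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy grayscale "molecule drawing": a vertical line of "bond" pixels.
image = np.zeros((8, 8))
image[:, 4] = 1.0

# A vertical-edge kernel; in a CNN these weights are learned, not hand-set.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feat = conv2d(image, kernel)
print(feat.shape)  # (6, 6) feature map
```

The point is only that the model's input pipeline needs nothing chemistry-specific: a molecule drawing is just a pixel array, and the network learns its own filters from there.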

📄 Full Content

Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models

Garrett B. Goh,*,† Charles Siegel, Abhinav Vishnu,† Nathan O. Hodas,‡ Nathan Baker†

†High Performance Computing Group, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99354 ‡Data Science and Analytics, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99354

  • Corresponding Author: Garrett B. Goh
    Email: garrett.goh@pnnl.gov
    Keywords: Deep learning, deep neural networks, cheminformatics, artificial intelligence, computer vision, convolutional neural network, computational chemistry, QSAR, QSPR

Abstract

In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google’s Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed “Chemception”, a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.

Introduction

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual assessment and competition of image classification algorithms for computer vision applications. In 2012, deep learning algorithms were first introduced to this community by Hinton and co-workers,1 and their deep neural network (DNN) model, AlexNet, achieved a 16.4% top-5 error rate, far exceeding the 25-30% error rate of the state-of-the-art models of that time.1 Since then, DNN-based models have become the dominant algorithm used in computer vision, and human accuracy (less than 5% top-5 error) was achieved by 2015, approximately 3 years after the entry of deep learning into this community.2-3 More recently, deep learning has also begun to emerge in other fields, such as high-energy particle physics,4-5 astrophysics,6 and bioinformatics.7-8 In chemistry, a few notable recent achievements include DNN-based models winning the Merck Kaggle challenge for activity prediction in 2012 and the NIH Tox21 challenge for toxicity prediction in 2014. Since then, numerous research groups have demonstrated the impact of DNN-based models in predicting a wide range of properties, including activity,9-12 toxicity,13-14 reactivity,15-17 solubility,18 ADMET,19 docking,20 atomization energies, and other quantum properties.21-23 In recent reviews across various chemistry sub-fields, DNN-based models typically perform as well as or better than previous state-of-the-art models based on traditional machine learning algorithms such as support vector machines and random forests.24-25
Unlike other machine learning algorithms, including those used by past and current computational chemistry applications, deep learning distinguishes itself in its use of a hierarchical cascade of non-linear functions. This allows it to learn representations and extract the necessary features (conceptually similar to molecular descriptors and fingerprints in the context of chemistry) from its input data to predict the desired property of interest. This representation learning ability is the key capability that has enabled deep learning to make significant and transformative impacts in its “parent” field of computer vision. Prior to the introduction of deep learning, computer vision researchers invested substantial efforts in developing appropriate features;26 such expert-driven development has been mostly replaced by deep learning models that automatically develop their own set of internal features, and these have exceeded human-level accuracy in certain tasks.2-3 In their current state, deep learning algorithms are not artificial general intelligence or strong “AI” systems and, as such, cannot replace human creativity or intelligence in the scientific research process. Nevertheless, it is undeniable that deep learning has successfully demonstrated performance that is as good as, and at

…(Full text truncated)…
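The "hierarchical cascade of non-linear functions" discussed above is, in Inception-ResNet-style networks like Chemception, built from repeated blocks of non-linear layers joined by residual (skip) connections. A minimal numpy sketch of one such block, purely illustrative (layer sizes and random untrained weights are assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Element-wise non-linearity; stacking these is what lets the
    network compose hierarchical features."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two non-linear transforms plus a skip connection, y = x + F(x),
    the pattern that ResNet-style networks repeat many times."""
    h = relu(x @ w1)   # first learned feature transform
    f = relu(h @ w2)   # second, higher-level feature transform
    return x + f       # residual connection preserves the input signal

# A toy flattened feature vector and random (untrained) weights.
x = rng.normal(size=(1, 16))
w1 = rng.normal(size=(16, 16))
w2 = rng.normal(size=(16, 16))

y = residual_block(x, w1, w2)
print(y.shape)  # output keeps the input shape, so blocks stack deeply
```

Because each block's output has the same shape as its input, dozens of such blocks can be chained, which is how deep networks build progressively more abstract internal features without hand-engineered descriptors.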

Reference

This content is AI-processed based on ArXiv data.
