- ChemPredictor is a Streamlit Web Interface that uses an artificial neural network (ANN) trained using TensorFlow for quantitative structure-property relationship (QSPR) analysis of molecules to predict 9 thermodynamic properties.
- Users can either enter the common name of the compound or draw the molecule using an interactive sketch tool and obtain predicted properties in real time.
- The project aims to provide a user-friendly interface for chemists, researchers, and students to analyze and predict the properties of organic molecules.
- The integration of Google's AI language model, Gemini Pro, allows users to obtain additional information about the predicted properties or the compound itself. (API Key required)
-
Predict Molecular Properties: Utilize an ANN model to predict 9 thermodynamic properties of organic molecules based on their molecular structure.
-
User-Friendly Interface: Develop a Streamlit WebUI that allows users to input the common name, SMILES string, or draw the molecule for property prediction.
-
Gemini LLM Integration: Integrate Google's AI language model, Gemini Pro, to provide additional information about the predicted properties or the compound itself. (API Key required)
Table 1 - Predicted properties of the QM9 dataset
No. Property Unit Description 1 μ D Dipole moment 2 α a³ Isotropic polarizability 3 homo Ha Energy of HOMO 4 lumo Ha Energy of LUMO 5 gap Ha Gap (lumo − homo) 6 U Ha Internal energy at 298.15 K 7 H Ha Enthalpy at 298.15 K 8 G Ha Free energy at 298.15 K 9 Cv cal/mol K Heat capacity at 298.15 K - Dipole moment (µ): Measurement of polarity of a molecule.
- Electronic polarizability (α): Tendency of non-polar molecules to shift their electron clouds relative to their nuclei.
- Energy of HOMO: Energy of the highest occupied Molecular Orbital.
- Energy of LUMO: Energy of the lowest unoccupied Molecular Orbital.
- Band Gap Energy: Energy of LUMO – Energy of HOMO.
- Internal energy of atomization (U): Energy required to break a molecule into separate atoms.
- Enthalpy of atomization (H): Amount of enthalpy change when a compound's bonds are broken, and the component atoms are separated into single atoms.
- Free energy of atomization (G): Extra energy needed to break up a molecule into separate atoms.
- Heat capacity (Cv): Amount of heat required to increase the temperature of the molecule by one degree.
-
Gemini LLM Integration:
-
Multiple Input Options: ChemPredictor supports various input methods for user convenience.
- SMILES Input: Predict properties by entering the SMILES string of the compound directly.
- Common Name Input: Input the common name of the compound (e.g., "Aspirin") to predict its properties.
- CSV File Upload: Upload a CSV file containing SMILES strings to predict properties for multiple molecules.
- Molecule Drawing: Use an interactive drawing board to draw the molecule for property prediction.
-
Similar Molecules: Shows similar molecules based on the predicted properties, providing insights into chemical similarity.
-
3D Molecular Visualization: Shows molecular structures in three dimensions, providing an insight into the compound's spatial arrangement."
To run the ChemPredictor WebUI, follow these steps:
-
Clone the repository:
git clone https://github.com/CubeStar1/ChemPredictorv2.git cd ChemPredictorv2
-
Create a virtual environment:
python -m venv venv
-
Activate the virtual environment:
- On Windows:
.\venv\Scripts\activate
- On Unix or MacOS:
source venv/bin/activate
- On Windows:
-
Install dependencies:
pip install -r requirements.txt
-
Run the Streamlit app:
streamlit run app.py
-
Open the WebUI in your browser:
- Open your web browser and navigate to http://localhost:8501 to use the ChemPredictor WebUI.
-
Predict Molecular Properties:
Choose one of the following ways to predict molecular properties:
-
Enter Common Name : Input the common name of the compound (e.g., "Aspirin") to predict its properties.
-
Enter SMILES String:
- Enter the SMILES string of the compound in the provided input field.
-
Upload SMILES CSV File:
- Upload a CSV file containing a set of SMILES strings for bulk prediction.
-
Draw Molecule:
- Utilize the interactive drawing board to draw the molecule for prediction.
-
-
Click on the Predict Button:
- Once the input is provided (SMILES string, CSV file, or drawn molecule), click on the "Predict" button.
-
View Predicted Values:
- QM9 Dataset: The QM9 dataset is used for training the ANN model. It contains 133,885 stable small organic molecules with up to nine heavy atoms (C, N, O, F). The dataset includes 9 thermodynamic properties for each molecule.
- Libraries Used: NumPy, Pandas, Matplotlib
- Overview: Data analysis and exploration are performed using NumPy, Pandas, and Matplotlib to understand the dataset's structure and properties.
- Libraries Used: RDKit, Mordred Molecular Descriptors
- Overview: RDKit is used to convert the SMILES strings into molecular structures, and Mordred is used to calculate 1826 molecular descriptors for each molecule.
- Libraries Used: TensorFlow with Keras, Scikit-learn
- Overview: An ANN model is trained using TensorFlow with Keras to predict the 9 thermodynamic properties of organic molecules. The model is trained on the QM9 dataset and evaluated using Scikit-learn.
- Libraries Used: Streamlit
- Overview: Streamlit is used to make a user-friendly Web Interface. User input is accepted via the WebUI, preprocessed, and predictions are made using the deployed ANN model.
- Gemini LLM Integration: Google's AI language model, Gemini Pro, is integrated to provide additional information about the predicted properties or the compound itself. (API Key required)
project-root/
├── app.py # Streamlit WebUI
│
├── scripts/
│ ├── image_handling.py # Image handling functions
│ ├── predict_property # Property prediction functions
│ ├── project_overview_page.py # Project overview page
│ ├── utils.py # Project utility functions
│
├── utilities/
│ ├── assets/ # Images
│ ├── dataset/ # QM9 dataset
│ ├── descriptors/ # Mordred descriptor files
│ ├── models/ # Trained model files
│ ├── scalers/ # Pickled MinMaxScaler file
│
├── images/ # Image storage
│
├── requirements.txt
├── packages.txt
│
└── README.md
For a quick demo, you can also access the hosted version of ChemPredictor at https://chempredictor.streamlit.app/