Predicting Bio-Activity

To make a structure-based prediction of the bio-activity of molecules, a list of features is generated with a KNIME workflow. This list is used as input for a Support Vector Machine (SVM) predictor. In the script, the compounds contained in the training input file are used to train the predictor. The parameters of the predictor are tuned with GridSearchCV: the predictor is trained multiple times with different combinations of the available parameters, each combination is scored by cross-validation, and the best predictor is then used to predict the bio-activity.
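The parameter search described above can be sketched as follows. This is a minimal illustration and not the contents of SVM_GridSearch.py: the synthetic data, the parameter grid, and the choice of 3-fold cross-validation are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the feature table (assumption: binary activity labels).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
X_train, y_train = X[:90], y[:90]
X_test = X[90:]

# Hypothetical grid; the grid used by the actual script is not documented here.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": [0.01, 0.1],
}

# Every parameter combination is trained and scored with 3-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

# By default the best combination is refit on all training data,
# so the search object can be used directly for prediction.
predictions = search.predict(X_test)
print(search.best_params_)
```

After fitting, `search.best_params_` holds the selected combination and `search.best_score_` its mean cross-validation score.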

Feature Calculation

The KNIME workflow featureGeneration.knar receives a comma-separated csv file containing the SMILES and the predicted bio-activity of the molecules. It generates a list of features for each molecule and outputs a comma-separated file containing the activity, the SMILES structure, and the corresponding features of each molecule.
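As a rough illustration of how such a feature file could be split into labels and a feature matrix with pandas, consider the sketch below. The column names `activity`, `smiles`, `feat_1`, and `feat_2` are hypothetical; the actual header written by the workflow may differ.

```python
import io

import pandas as pd

# Hypothetical excerpt of a feature file: activity, SMILES, then feature columns.
csv_text = """activity,smiles,feat_1,feat_2
1,CCO,0.12,3.4
0,c1ccccc1,0.56,1.2
"""

table = pd.read_csv(io.StringIO(csv_text))

# The activity column is the label; the SMILES string is an identifier,
# not a numeric feature, so it is dropped together with the label.
y = table["activity"]
X = table.drop(columns=["activity", "smiles"])
print(X.shape)
```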

Classification

In order to run the program, one has to specify:

  • -train: Path of the input csv file generated by the KNIME workflow, containing the training molecules
  • -test: Path of the input csv file generated by the KNIME workflow, containing the molecules to be tested
  • -out: Destination path of the resulting prediction csv
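One way to accept these three arguments is with Python's `argparse`, sketched below. The function name `build_parser` is hypothetical, and marking all three options as required is an assumption; the actual script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for the -train, -test and -out options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Train an SVM on KNIME-generated features and predict bio-activity."
    )
    parser.add_argument("-train", required=True,
                        help="csv file from the KNIME workflow with the training molecules")
    parser.add_argument("-test", required=True,
                        help="csv file from the KNIME workflow with the molecules to be tested")
    parser.add_argument("-out", required=True,
                        help="destination path of the resulting prediction csv")
    return parser


# Parse an explicit argument list; a script would call parse_args() without arguments
# to read the command line instead.
args = build_parser().parse_args(
    ["-train", "trainingData_Features.csv",
     "-test", "testData_Features.csv",
     "-out", "SVM_GridSearch_res.csv"]
)
print(args.train, args.test, args.out)
```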

SVM Classifier

SVM_GridSearch.py -train trainingData_Features.csv -test testData_Features.csv -out SVM_GridSearch_res.csv

Built With

  • KNIME - Analytics Platform (3.7)
  • RDKit - Software Package to read and analyse SMILES data (3.4.0v)
  • Python - Python programming language (3.6)
  • scikit-learn - Software Package for Machine Learning (v0.20.1)
  • matplotlib - 2D Plotting Library (2.2.2)
  • pandas - Datastructures and Dataframes (v0.23.4)

Authors

  • Jennifer Bödker
  • Tobias Nietsch