rxnpca

This script is an attempt to use principal component analysis (PCA) to identify the key characteristics that distinguish 'efficient' (working) reactions from 'inefficient' (non-working) ones.

Description

For a random sample of chemical compounds, the script applies retrosynthetic transforms developed at NU to generate the space of all possible reactions. It then performs PCA on a predefined set of reaction descriptors.
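The PCA step can be illustrated with a minimal sketch. It assumes the reaction descriptors have already been collected into a numeric matrix with one row per reaction and one column per descriptor value. The function name `pca` and the use of NumPy's SVD are assumptions made for illustration, not necessarily how rxnpca.py does it. The sketch only centers the columns; whether the script also scales them is not stated here.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto their first principal components.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Center each descriptor column at zero mean.
    Xc = X - X.mean(axis=0)
    # The rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each reaction in the principal-component basis.
    scores = Xc @ components.T
    # Fraction of the total variance captured by each retained component.
    explained = (S ** 2) / (S ** 2).sum()
    return scores, components, explained[:n_components]


# Toy data: three "reactions" described by two perfectly correlated values.
scores, components, explained = pca([[1, 2], [2, 4], [3, 6]], n_components=1)
```

In this toy case the two columns are collinear, so a single component captures essentially all of the variance.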

Currently, about 100 molecular descriptors are available. For each reaction, the sum of the values of a given molecular descriptor is calculated separately for its reactants and for its products. Compounds whose H-depleted molecular graphs contain only a single atom (e.g. water, 'O') are excluded, as many topological descriptors are not defined for them.

Each such pair of sums is treated as the reaction descriptor corresponding to the given molecular descriptor.
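A minimal sketch of this calculation follows. The `Molecule` type, the `reaction_descriptor` function, and the toy descriptor (the heavy-atom count) are hypothetical names chosen for illustration. The actual script presumably relies on a cheminformatics toolkit and real topological descriptors.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Molecule(NamedTuple):
    """Hypothetical stand-in for a parsed compound."""
    smiles: str
    heavy_atoms: int  # number of atoms in the H-depleted molecular graph


def reaction_descriptor(
    reactants: Sequence[Molecule],
    products: Sequence[Molecule],
    descriptor: Callable[[Molecule], float],
) -> Tuple[float, float]:
    """Return (sum over reactants, sum over products) for one descriptor."""

    def total(molecules: Sequence[Molecule]) -> float:
        # Skip compounds whose H-depleted graph is a single atom (e.g. water).
        return sum(descriptor(m) for m in molecules if m.heavy_atoms > 1)

    return total(reactants), total(products)


# Toy example: esterification of acetic acid with ethanol.
acetic_acid = Molecule("CC(=O)O", 4)
ethanol = Molecule("CCO", 3)
ethyl_acetate = Molecule("CCOC(C)=O", 6)
water = Molecule("O", 1)

pair = reaction_descriptor(
    [acetic_acid, ethanol],
    [ethyl_acetate, water],
    descriptor=lambda m: m.heavy_atoms,  # toy descriptor, an assumption
)
```

Here water is left out of the product sum because its H-depleted graph consists of a single atom.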

Usage

To run the script, type (all arguments are currently optional):

rxnpca.py --size <size> --seed <seed> --selection-type <type>

where

size

Sample size, i.e. the number of chemicals for which reactions will be generated. Defaults to 1000.

seed

Seed for the pseudo-random number generator. Defaults to None, meaning that the current system time will be used.

type

Method for selecting among reactions that share the same product. Available choices are:
- all: all available reactions are selected,
- random: a random pair of published and unpublished reactions is selected for each product,
- popular: the pair of published and unpublished reactions with the highest popularity index is selected.

If absent, defaults to all.
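For illustration, the command-line interface described above could be declared with the standard `argparse` module roughly as follows. The option names, defaults, and choices come from this README. The function name `build_parser`, the attribute name `selection_type`, and the help strings are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(prog="rxnpca.py")
    parser.add_argument(
        "--size", type=int, default=1000,
        help="number of chemicals for which reactions will be generated",
    )
    parser.add_argument(
        "--seed", type=int, default=None,
        help="pseudo-random number generator seed (None: use system time)",
    )
    parser.add_argument(
        "--selection-type", dest="selection_type",
        choices=("all", "random", "popular"), default="all",
        help="how to select among reactions sharing the same product",
    )
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["--size", "500", "--selection-type", "random"])
defaults = build_parser().parse_args([])
```

With this declaration, omitting every option yields the documented defaults: a sample size of 1000, no fixed seed, and the all selection method.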
