-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mark atoms inside rings #12
base: main
Are you sure you want to change the base?
Conversation
src/header_generation.py
Outdated
if aidx in inner_atoms: | ||
# this atom cannot have substituents, so we substitute it with a query with a fixed degree | ||
atom = mol.GetAtomWithIdx(aidx) | ||
query_atom = Chem.AtomFromSmarts(f"[#4&D{atom.GetDegree()}]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason for #4
? I think dummy atoms/#0
would make sense but I remember you mentioning there being a reason for having specific atom types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that's because the &D2 part does not show up in CXSMILES (I need your help for this! It was working from C++), so I temporarily put #4 to make sure the atoms are correct... #0 works for me, #6 also works and maybe leads to slightly more readable SMILES. I saw in the code that there's a plan to use wildcards on the bonds too, that also makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D2
is a SMARTS feature, so it's not going to show up in output SMILES.
You could try generating CXSMARTS by calling Chem.MolToCXSmarts()
To get the github actions bot to show pictures of the new/updated SMILES, you have to run this script and rewrite |
sure, as soon as I can get the wildcards to show up in the SMILES... |
I think you could also create the 'gallery' using this script and attach it manually to the PR https://github.com/rdkit/molecular_templates/blob/main/src/update_gallery.py I'm not sure how great the rdkit image generation will be though with the SMARTS |
after talking with @rachelnwalker I reverted the changes to the header as @ricrogz suggested. Once this code is approved we can update the .smi file in a second review. Here's how the last three smiles look with the new code
|
@greglandrum @rachelnwalker the new template atoms are marked as [!#1], which is fine by my, but I think will match the atoms we mark as outside of a ring system here https://github.dev/rdkit/rdkit/blob/eaf544ab6f7df12dd0f738474e44f9792d97d971/Code/GraphMol/Depictor/EmbeddedFrag.cpp#L429-L433. Maybe [!#200] would be better? What do you think is the best solution in terms of speed/template readability? |
Yeah I think |
implemented [!#200] for every atom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but it would be helpful to see the gallery of template depictions if possible.
I've looked at a lot more and they all seem correct. The only thing is that centers with 3 substituents at 120° sometimes get marked as "internal" even if they could actually get a substitution: due to the symmetry of the system the average of the substituents falls exactly on the atom, so its direction is undetermined. I don't think that in general we would ever want a substitution there... maybe we can mark ALL centers with more than 2 bonds as [!#200&D3] so they never get substitutions. @rachelnwalker what do you think? |
The third example you have has an atom with three substituents that we probably want to be able to attach to, so I don't think we should mark all of the atoms. I think its OK to not get a substitution with atoms with 3 equally spaced neighbors, though. This lgtm though! In order to force the header update, I think we will need to make a second PR after this one (maybe add a template?) or otherwise trigger the GH action that generates the header and makes the PR |
sounds good, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One requested change
I would also suggest running yapf or black or some kind of python code formatter across this in order to cleanup the formatting and oddly indented comments. That's not a huge deal, but I do think it helps with maintainability to have consistently formatted code
@ZontaNicola just a quick ping in case you missed the notification of my review |
thanks, I'm aware of it. I should be able to get back to this case soon |
@greglandrum I ran yapf and removed the unused function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look good to me.
@rachelnwalker or @ricrogz will need to confirm, but I think we should not be modifying the template_smiles.h
directly in a PR
(I know you included the changes here in order to show what will happen, but I think that needs to be reverted?)
adding functions to mark atoms inside rings so that we don't match molecules that have substitutions to those atoms.
note that I'm doing something wrong, because the query does not show up in the smiles, but the atoms are correctly identified (i.e. the atomic number is changed to 4...)
PLEASE DO NOT MODIFY template_smiles.h DIRECTLY
Add your templates to the templates.smi file in this repository. Add one CXSMILES per line, and please do not insert any blank lines.
Commit your changes and create the PR. The CI workflow will run on the new templates and validate them, and once they are approved, they will be automatically added to the header.
For more information please check https://github.com/rdkit/molecular_templates/blob/main/README.md