Commit
Cleaned up tokenization and fingerprints code.
pschwllr committed Aug 12, 2021
1 parent ec0c662 commit 865880e
Showing 9 changed files with 62 additions and 33 deletions.
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -112,3 +112,7 @@ Our work was first presented in the NeurIPS 2019 workshop for [Machine Learning
```

RXNFP has been developed in a collaboration between IBM Research Europe and the [Reymond group](http://gdb.unibe.ch) at the University of Bern. The classification models are used on the [RXN for Chemistry](https://rxn.res.ibm.com) platform.

Our publication is part of the Nature Portfolio ["Synthesis and enabling technologies" collection](https://www.nature.com/collections/ijabjccjec) and was featured in a News & Views on [Transformers for future medicinal chemists](https://www.nature.com/articles/s42256-021-00299-x).

Moreover, the `rxnfp` code was reused to train new models on different data as described in [Reusability report: Learning the language of synthetic methods used in medicinal chemistry](https://www.nature.com/articles/s42256-021-00367-2).
2 changes: 2 additions & 0 deletions docs/index.html
@@ -180,6 +180,8 @@ <h2 id="Citation">Citation<a class="anchor-link" href="#Citation"> </a></h2><p>O
publisher={Nature Publishing Group}
}</code></pre>
<p>RXNFP has been developed in a collaboration between IBM Research Europe and the <a href="http://gdb.unibe.ch">Reymond group</a> at the University of Bern. The classification models are used on the <a href="https://rxn.res.ibm.com">RXN for Chemistry</a> platform.</p>
<p>Our publication is part of the Nature Portfolio <a href="https://www.nature.com/collections/ijabjccjec">"Synthesis and enabling technologies" collection</a> and was featured in a News &amp; Views on <a href="https://www.nature.com/articles/s42256-021-00299-x">Transformers for future medicinal chemists</a>.</p>
<p>Moreover, the <code>rxnfp</code> code was reused to train new models on different data as described in <a href="https://www.nature.com/articles/s42256-021-00367-2">Reusability report: Learning the language of synthetic methods used in medicinal chemistry</a>.</p>

</div>
</div>
17 changes: 17 additions & 0 deletions docs/tokenization.html
@@ -370,6 +370,23 @@ <h1 id="Examples">Examples<a class="anchor-link" href="#Examples"> </a></h1><p><
</div>
</div>

</div>
{% endraw %}

{% raw %}

<div class="cell border-box-sizing code_cell rendered">
<div class="input">

<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">vocab_list</span><span class="p">)</span> <span class="o">==</span> <span class="mi">591</span>
</pre></div>

</div>
</div>
</div>

</div>
{% endraw %}

6 changes: 3 additions & 3 deletions docs/transformer_fingerprints.html
@@ -72,7 +72,7 @@ <h2 id="RXNBERTFingerprintGenerator" class="doc_header"><code>class</code> <code


<div class="output_markdown rendered_html output_subarea ">
<h2 id="RXNBERTMinhashFingerprintGenerator" class="doc_header"><code>class</code> <code>RXNBERTMinhashFingerprintGenerator</code><a href="https://github.com/rxn4chemistry/rxnfp/tree/master/rxnfp/transformer_fingerprints.py#L79" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>RXNBERTMinhashFingerprintGenerator</code>(<strong><code>model</code></strong>:<code>BertModel</code>, <strong><code>tokenizer</code></strong>:<a href="/rxnfp/tokenization.html#SmilesTokenizer"><code>SmilesTokenizer</code></a>, <strong><code>permutations</code></strong>=<em><code>256</code></em>, <strong><code>seed</code></strong>=<em><code>42</code></em>, <strong><code>force_no_cuda</code></strong>=<em><code>False</code></em>) :: <a href="/rxnfp/core.html#FingerprintGenerator"><code>FingerprintGenerator</code></a></p>
<h2 id="RXNBERTMinhashFingerprintGenerator" class="doc_header"><code>class</code> <code>RXNBERTMinhashFingerprintGenerator</code><a href="https://github.com/rxn4chemistry/rxnfp/tree/master/rxnfp/transformer_fingerprints.py#L72" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>RXNBERTMinhashFingerprintGenerator</code>(<strong><code>model</code></strong>:<code>BertModel</code>, <strong><code>tokenizer</code></strong>:<a href="/rxnfp/tokenization.html#SmilesTokenizer"><code>SmilesTokenizer</code></a>, <strong><code>permutations</code></strong>=<em><code>256</code></em>, <strong><code>seed</code></strong>=<em><code>42</code></em>, <strong><code>force_no_cuda</code></strong>=<em><code>False</code></em>) :: <a href="/rxnfp/core.html#FingerprintGenerator"><code>FingerprintGenerator</code></a></p>
</blockquote>
<p>Generate RXNBERT fingerprints from reaction SMILES</p>

@@ -97,7 +97,7 @@ <h2 id="RXNBERTMinhashFingerprintGenerator" class="doc_header"><code>class</code


<div class="output_markdown rendered_html output_subarea ">
<h4 id="get_default_model_and_tokenizer" class="doc_header"><code>get_default_model_and_tokenizer</code><a href="https://github.com/rxn4chemistry/rxnfp/tree/master/rxnfp/transformer_fingerprints.py#L117" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>get_default_model_and_tokenizer</code>(<strong><code>model</code></strong>=<em><code>'bert_ft'</code></em>, <strong><code>force_no_cuda</code></strong>=<em><code>False</code></em>)</p>
<h4 id="get_default_model_and_tokenizer" class="doc_header"><code>get_default_model_and_tokenizer</code><a href="https://github.com/rxn4chemistry/rxnfp/tree/master/rxnfp/transformer_fingerprints.py#L110" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>get_default_model_and_tokenizer</code>(<strong><code>model</code></strong>=<em><code>'bert_ft'</code></em>, <strong><code>force_no_cuda</code></strong>=<em><code>False</code></em>)</p>
</blockquote>

</div>
@@ -121,7 +121,7 @@ <h4 id="get_default_model_and_tokenizer" class="doc_header"><code>get_default_mo


<div class="output_markdown rendered_html output_subarea ">
<h4 id="generate_fingerprints" class="doc_header"><code>generate_fingerprints</code><a href="https://github.com/rxn4chemistry/rxnfp/tree/master/rxnfp/transformer_fingerprints.py#L141" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>generate_fingerprints</code>(<strong><code>rxns</code></strong>:<code>List</code>[<code>str</code>], <strong><code>fingerprint_generator</code></strong>:<a href="/rxnfp/core.html#FingerprintGenerator"><code>FingerprintGenerator</code></a>, <strong><code>batch_size</code></strong>=<em><code>1</code></em>)</p>
<h4 id="generate_fingerprints" class="doc_header"><code>generate_fingerprints</code><a href="https://github.com/rxn4chemistry/rxnfp/tree/master/rxnfp/transformer_fingerprints.py#L134" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>generate_fingerprints</code>(<strong><code>rxns</code></strong>:<code>List</code>[<code>str</code>], <strong><code>fingerprint_generator</code></strong>:<a href="/rxnfp/core.html#FingerprintGenerator"><code>FingerprintGenerator</code></a>, <strong><code>batch_size</code></strong>=<em><code>1</code></em>)</p>
</blockquote>

</div>
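
For readers unfamiliar with the helper whose source link changes in the last hunk, here is a minimal sketch of how a function with the documented signature `generate_fingerprints(rxns, fingerprint_generator, batch_size=1)` could batch its input. This is not the code from `rxnfp/transformer_fingerprints.py`: the `FingerprintGenerator` stand-in, its `convert_batch` method, and `ToyHashFingerprintGenerator` are assumptions made for illustration, and the toy generator hashes strings instead of running a BERT model.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class FingerprintGenerator(ABC):
    """Stand-in base class; the `convert_batch` interface is an assumption."""

    @abstractmethod
    def convert_batch(self, rxn_smiles_batch: List[str]) -> List[List[float]]:
        """Return one fingerprint per reaction SMILES in the batch."""


class ToyHashFingerprintGenerator(FingerprintGenerator):
    """Hypothetical generator: maps each reaction SMILES to `size` floats in [0, 1]."""

    def __init__(self, size: int = 8):
        # A SHA-256 digest has 32 bytes, so at most 32 values are available.
        self.size = min(size, 32)

    def convert_batch(self, rxn_smiles_batch: List[str]) -> List[List[float]]:
        fingerprints = []
        for rxn in rxn_smiles_batch:
            digest = hashlib.sha256(rxn.encode("utf-8")).digest()
            fingerprints.append([byte / 255 for byte in digest[: self.size]])
        return fingerprints


def generate_fingerprints(
    rxns: List[str],
    fingerprint_generator: FingerprintGenerator,
    batch_size: int = 1,
) -> List[List[float]]:
    """Convert reactions in consecutive chunks of `batch_size`, preserving order."""
    fingerprints: List[List[float]] = []
    for start in range(0, len(rxns), batch_size):
        batch = rxns[start : start + batch_size]
        fingerprints.extend(fingerprint_generator.convert_batch(batch))
    return fingerprints


# Usage: three example reaction SMILES, processed two at a time.
example_rxns = ["CCO.CC(=O)O>>CC(=O)OCC", "C=C.Br>>CCBr", "CCO>>CC=O"]
example_fps = generate_fingerprints(
    example_rxns, ToyHashFingerprintGenerator(size=8), batch_size=2
)
```

In this sketch the batch size only changes how the work is chunked, so the output for a given input list is the same for any positive `batch_size`.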