\documentclass[11pt,a4paper]{article}
\usepackage{times}
\usepackage{fullpage}
\setlength{\parindent}{0cm}
\begin{document}
\newcommand{\comment}[1]{\textit{``\ldots #1''}\par\vspace{0.5em}}
\newcommand{\response}[1]{#1\vspace{1em}}
We thank the anonymous reviewers for their constructive
comments, which have significantly improved our manuscript.
We have updated the main text in several places and have
also included a supplemental document that provides a brief
description of terminology and a list of some historic milestones in
the field. The following sections provide a detailed response to
individual reviewer comments.
\vspace{2em}
\fbox{\textbf{Reviewer 1}}
\comment{when the authors state 'together with an even larger number
of annotations', they shold explain what are annotations and how
much is the large number;}
\response{We have updated the text on page 1 to be more explicit in
terms of what annotations are and provide a concrete example of the
sizes involved.}
\comment{For an overview I would expect a more comprehensive set of
references.}
\response{We agree with the reviewer that a more
comprehensive list of references would be appropriate and
useful. However, due to format limitations of the article, we have
been restricted to the use of 40 references. To alleviate this
problem, we have included a supplement that provides a more
extensive list of references, specifically including a number of
texts covering cheminformatics broadly.}
\comment{1) The connection between sections could be improved,
i.e. the authors should try to enhance the correlation that exists
between the topics described in each section and map them to a
global flow of cheminfomatics changes. 2) The authors state in the
beginning that the paper will be around the concept of risk
minimization in drug discovery, but I found that this topic is not
sufficiently explored. In many sections is not straightforward teh
connection between the challenges described and their impact on risk
minimization in drug discovery; 3) The paper misses some important
efforts and more detail on translational medicine and semantic web }
\response{The reviewer is correct in noting the disjointed character
of the original article. We have updated the text throughout to flow
smoothly. As part of this we have made more explicit how each topic
that is discussed plays a role in derisking various stages of the
drug discovery process. Finally, we have updated the text on page 4
to address the issue of linked data and the role that semantic
technologies can play in this area. We have, however, chosen not to
address translational medicine in the main article. Instead, we have
added a paragraph with references on this topic to the historic
supplement.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\fbox{\textbf{Reviewer 2}}
\comment{The authors mention limitations of SMILES implementations,
but do not mention canonicalization in this context, often using the
Morgan algorithm published in 1965 in J. Chem. Doc.}
\response{We have updated the text on page 3 to note the Morgan
algorithm in the context of chemical structure representation.}
\comment{The authors state that 3D information is lost when only
considering the molecular graph. However, they do not make it clear
that the 3D space is implicitly encoded in the molecular graph –
this is essential to cover properly.}
\response{We have updated the text on page 2 to note that 3D
information is implicit in 2D representations.}
\comment{ I entirely dispute that cheminformatics methods have been
'closely guarded secrets of companies...' since many methods have been
published in the public domain. The authors suggest that only in the
last decade has the field gained access to freely available
software, etc.}
\response{We agree that ``secret'' was probably not the best choice of
words. However, we do believe that much cheminformatics software and
data was proprietary until recently. We have updated the text on
pages 7 and 8 to expand on this and provide a more detailed
discussion.}
\comment{If this manuscript were to be published it would be
completely rewriting the history of cheminformatics}
\response{While we agree with the reviewer that some key milestones in
the history of cheminformatics were not included in the original
article, we believe that the addition of the supplemental history
addresses this problem. Furthermore, our original aim was \emph{not}
to provide a complete historical overview of the field. Others have
already provided such overviews. Our aim was to present a brief
overview of \emph{current topics} in cheminformatics, selected because we
believed that they represent ongoing challenges for the field and
are thus amenable to cross-disciplinary efforts.}
%%%%%%%%%%%%%%%%%%%%%%%
\fbox{\textbf{Reviewer 3}}
\comment{The manuscript provides a very specific overview on the
aspects of the open source community in cheminformatics. }
\response{As noted above in our response to Reviewer 2, the current
article is not a
comprehensive historical review of the field. We have included a
supplement that does provide a very brief overview of historic
milestones. Furthermore, we have updated the text to note commercial
vendors and solutions; but we still emphasise Open Source and Open
Access solutions as these allow readers to explore the problems and
topics we have discussed with a minimum of hindrances.}
\comment{The manuscript is poor in relevant citations with none of the
classical references provided, such as Morgan 1965, Weininger 1988,
Johnson and Maggiora 1990, Barnard 1993, Willett et al. 1998, etc.}
\response{The text has been updated to include the ``classic''
references. In addition, the supplemental history document lists a
number of milestones in cheminformatics together with their relevant references.}
\comment{Molecular docking is also not mentioned, but is regarded as
one of the key methods applied in cheminformatics. Docking should be
included, along with other molecular modelling methods such as
pharmacophores, shape searching, molecular dynamics, as a distinct
section.}
\response{These methods are mentioned on page 5 (column
1). However, we have chosen not to give these methods a more in-depth
discussion in their own section of the main document, for two
reasons. First, due to space limitations this would require us to
compress or drop other sections which we believe to be more
relevant. Second, we are of the view that docking, for example, is
not really a cheminformatics technique, though it makes extensive
use of cheminformatics methods. Instead, we place it under molecular
modeling. We realize that this is a somewhat subjective
``classification'' of methods. However, we do make explicit note of
these methods in the supplemental document, where we make a
distinction between cheminformatics and related subject areas such
as quantum mechanics and molecular modeling.}
\comment{Proteochemometric methods are mentioned a couple of times but
these are largely un-noticed by the community currently. I do not
think that it is necessarily appropriate to include these as
tried-and-tested methods in an over-arching review such as this.}
\response{It is true that proteochemometric methodologies
may not be as well known as, say, docking or pharmacophore modeling.
However, what is considered proteochemometric modeling is known in docking as
`interaction fingerprints', e.g. for kinase databases, and in the field of HIV the
term `phenotypic modeling' has a long tradition.
In line with multiple reviews and the books by
Qing Yan (Pharmacogenomics) and Kubinyi/M\"{u}ller (Chemogenomics),
we consider the term proteochemometrics worth mentioning, since it unifies the
(statistical) mining/QSAR techniques that explicitly include multiple ligand
activities and multiple receptor features.\\
Finally, we also agree with Reviewer 1 that proteochemometric modeling
plays an important role in translational medicine, e.g. HIV-HAART therapy.
A paragraph with references on translational medicine has been added to the
historic supplement.}
\comment{RDKit is not mentioned at all in this article and is widely
used as a cheminformatics toolkit in industry and academia and free
and open-source (although from commercial beginnings).}
\response{We agree that this toolkit should have been included. We
have updated the text on page 7 as well as Table 2 to take note of RDKit,
along with the newer Indigo toolkit.}
\comment{The authors could more appropriately cover the graph theory
aspects of molecular structures by correctly applying the
terminology to which computer scientists and cheminformaticians will
most certainly be aware}
\response{The text has been updated.}
\comment{The inclusion of ontologies is not really useful here as the
conclusions drawn are vague and do not really inform the current
state-of-the-art and usefulness of ontologies in cheminformatics.}
\response{The text has been updated on page 4 to make the discussion
of ontologies more focused and the use cases clearer, specifically
identifying challenges in cross-domain querying, classification
and integration within the context of whole-systems biology, and the
increasingly essential role that ontologies and semantic technologies are
playing in this matter.}
\comment{Section 2.2. Structure Enumeration: it is not clear what this
section is trying to convey to the reader \ldots}
\response{We have rewritten this section to simplify the language and
to be more explicit about the applications of, and challenges in, this
technique. We have also provided relevant references to key topics in
this area.}
\comment{p2, c1: I think the authors could benefit from a re-write of
the section detailing molecular structure representations. \ldots
Figure 1: what is meant by 'closer to reality' This needs to be
clarified. This figure and its caption requires a little more
consideration to appropriately convey the message, }
\response{We have updated the caption to remove the phrase ``closer to
reality'' since this is not the main point. Instead we have noted
that the representations contain differing amounts and types of
information that are equally valid, but suited for different
purposes. The text on page 2 has been updated to better represent
this view.}
\comment{p1, c2, l31: it is not clear to this reviewer what software
and algorithms have only recently been made available. Certainly,
the referenced book to which they refer contains a large number of
algorithms that have already been published in the primary
literature and frequently many decades ago. If the open source
community has provided true novelty then please articulate this,
otherwise cite the primary references that report these
discoveries.}
\response{The text has been updated to note that most of the
algorithms in cheminformatics have been known since the mid-20th
century, but that freely accessible toolkits implementing them, as
well as public and open benchmark data sets, are relatively recent.}
\comment{The paragraph in the second column discussing descriptors and
the proper description requires rewriting too since it provides a
very vague overview that is not particularly informative.}
\response{We have updated the discussion of descriptors to be more
focused.}
\comment{page 1, column 1, line 24: the authors need to provide a
reference for the assertion that cheminformatics is an older field,
particularly given the comments later in this manuscript.}
\response{We have removed the assertion that cheminformatics is older
than bioinformatics, primarily because there is no obvious
reference that supports this claim, but also because, depending on what one
considers bioinformatics and cheminformatics, either field can be
identified as the older. We feel that
claims about age do not strengthen the paper and have thus removed them.}
\comment{p1, c1, l29-37: I would dispute that until recently the
cheminformatics techniques have been closely guarded
secrets. J. Chem. Doc. was founded in 1961 and other techniques even
older, such as the Wiener index (1947), Wiswesser Line Notation
(1949), etc. Not to mention the actual foundations of what we call
cheminformatics back to the atomistic theory of the 19th century.}
\response{We have updated the text on page 1 to attenuate this
statement, stressing the fact that much of the data in
cheminformatics is proprietary, whereas only some of the techniques (such
as SMILES canonicalization) are. The statement that atomistic
theory laid the foundations of cheminformatics is certainly
true in a very broad sense. For that matter, the atomistic theory
laid the ground for computational chemistry in general. In the
supplemental history document, we make a distinction between
computational chemistry and cheminformatics.}
\comment{p1, c1, l32: I think the phrase miracle molecule could be
more usefully replaced with new drugs or new small molecule
therapeutics.}
\response{The text has been updated to use the term ``therapeutic
molecule''.}
\comment{p2, c2, l57: it might be worthwhile here clarifying in the text
precisely the properties that require satisfaction to deliver a small
molecule therapeutic, not `promising'. A drug must be safe and
efficacious ultimately; perhaps this should be mentioned first
followed by the typical pitfalls and how they are assessed?}
\response{The text has been updated to provide some examples of
properties that would characterize a therapeutically useful small molecule.}
\comment{P3, c1, l41: the authors need to cover the Morgan algorithm
(published in 1965) here and explain the canonicalization
process. The issues mentioned in different canonicalization
implementations providing different SMILES strings is also a
challenge with InChI codes with different softwares giving different
representation. Therefore, this is still not a solved problem as
suggested here. Could the authors clarify this in the text?}
\response{We have updated the text to reference the Morgan
algorithm. Regarding the issue of InChI codes, we believe that this
is not the case. Currently, there is only one implementation of the
InChI algorithm and even if alternative implementations were to be
developed, the InChI specification is publicly available. As a
result, there should be no differing InChI representations for a
given input structure.}
\comment{P3, c2: structure-based fingerprints referred to here sound
like Daylight-style fingerprints to this reviewer. Is this the case?
I would emphasise the distinction of structure-key and hash-key
fingerprints. My understanding is that Daylight-style fingerprints
enumerate paths of 7 in length, not 8, could the authors provide a
reference for this?}
\response{The section on fingerprints has been rewritten to be
clearer and to distinguish between the two broad classes of
fingerprints. We have also mentioned the use of fingerprints in
similarity searches and the Tanimoto coefficient.}
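
\response{As a brief illustration of the similarity measure mentioned
above (a sketch in notation introduced here, not taken from the
manuscript): for two binary fingerprints $A$ and $B$ with $a$ and $b$
bits set, respectively, and $c$ bits set in both, the Tanimoto
coefficient is
\[
T(A,B) = \frac{c}{a + b - c},
\]
which ranges from 0 (no bits in common) to 1 (identical bit
patterns). For example, $a = 6$, $b = 5$ and $c = 4$ give
$T = 4/7 \approx 0.57$.}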
\comment{P3, c2, l42: is this true given the lower complexity of
molecular graphs compared with other much larger and denser graphs?
I think a reference to a known paper that states the problem clearly
would be of benefit here.}
\response{While it is true that isomorphism tests on the small graphs
characteristic of small molecules are, in an absolute sense, not very
slow, there are a number of cases (polycyclic hydrocarbons, steroids,
etc.) that can take significantly longer. More generally, the fact
that subgraph isomorphism is NP-complete means that no useful
worst-case time guarantees
can be provided. Thus, when performing this test on \emph{millions} of
molecules, as in a database search, we may not be able to complete
the operation in reasonable time. In practice this is usually not the
case, but isomorphism algorithms still take much longer than fingerprint
screening. Hence performing isomorphism tests on every molecule in a large
database is impractical. We have included a reference that
explicitly discusses this problem.}
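
\response{The cost argument can be sketched as follows, assuming
structural-key or path-based fingerprints and using notation
introduced here for illustration. Let $F(G)$ denote the set of bits
set in the fingerprint of a molecular graph $G$. If a query $Q$ is a
subgraph of a database molecule $M$, then every fragment of $Q$ also
occurs in $M$, so
\[
Q \subseteq M \;\Longrightarrow\; F(Q) \subseteq F(M).
\]
A molecule with $F(Q) \not\subseteq F(M)$ can therefore be rejected
with a few bitwise operations, and the expensive isomorphism test is
needed only for the molecules that pass this screen. The converse does
not hold, so the screen can produce false positives but no false
negatives.}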
\comment{P4, c2, l17: citation needed on the size of chemistry space.}
\response{The appropriate reference has been added.}
\comment{P4, c2, l23: the isomorphism problem has already been
mentioned previously in this manuscript.}
\response{The text has been updated and simplified.}
\comment{P4, c2, l41: the GDB-13 database contains 970 million
molecules, which is nearly a billion, not a trillion.}
\response{The text has been updated to use the correct number.}
\comment{P5, c1, l29: the normal phrase used to describe this concept
is the similar property principle. The authors should also provide a
reference.}
\response{This portion of the text has been restructured to refer to
the similar property principle and to include the relevant
reference.}
\comment{P5, c1, l40: QSAR should be referenced to Hansch et al. Also
perhaps some discussion on what the authors mean by referring to
these approaches as traditional.}
\response{We have updated the text to include references to Hansch and
Free \& Wilson. Regarding the use of ``traditional'', we believe the
text explicitly explains why: QSAR as originally
defined considers only ligand features. However, we have rephrased it to use
``traditionally'', since one can argue that methods such as docking
and pharmacophores are also QSAR methods, but consider both ligand
and receptor.}
\comment{P5, c1, l41: I would dispute the assertion that these methods
"ignore reception interactions" since the aim is to identify a
correlation of biological response (e.g. pIC50) with chemical
structure. The biological response is an explicit measurement of
receptor interactions on the protein.}
\response{We have updated
the text to explicitly note that QSAR models do not usually take
into account \emph{receptor features}. However, we note that while
receptor information is implicit in the IC$_{50}$, the value also
includes other non-receptor related features such as
permeability. Furthermore, lack of receptor information in QSAR
models has been noted as the origin of activity cliffs (Guha and Van
Drie, \textit{J.~Chem.~Inf.~Model.}, \textbf{2008}, \textit{48},
1716--1728), thus supporting the statement that traditional QSAR
models do indeed miss important information on receptor-ligand
interactions.}
\comment{P5, c2, l22: I would mention naive Bayesian classifiers as well as
this is perhaps the most widely applied method in the field.}
\response{We have now included na\"{i}ve Bayes classifiers as well.}
\comment{P5, c2, l42: formal citation for one of Hopfinger's papers is
required here.}
\response{The relevant reference has been added.}
\comment{P5, c2, l58: this assertion is made with no evidence to back
it up. Why should multi-target models be more reliable? It is not
clear to me that they should.}
\response{We have rephrased this statement to be less dogmatic. While
it is true that, for the case of linear models, a multivariate
multiple regression is in general equivalent to multiple individual
regression models, one could argue that taking into account the
correlation structure between the multiple y's could lead to improved
models in scenarios where a molecule has affinities for multiple
related targets. We admit that the answer is not clear at this point
and hence suggest that this could be a topic for research.}
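
\response{To make the linear case concrete (a sketch under the
assumption of ordinary least squares, with notation introduced here):
let $X$ be the $n \times p$ descriptor matrix, with $X^{\top}X$
invertible, and let $Y = [y_1, \ldots, y_q]$ collect the activities
against $q$ targets. The multivariate estimate is
\[
\hat{B} = (X^{\top}X)^{-1}X^{\top}Y
        = \left[ (X^{\top}X)^{-1}X^{\top}y_1, \; \ldots, \;
                 (X^{\top}X)^{-1}X^{\top}y_q \right],
\]
so column $j$ of $\hat{B}$ coincides with the coefficients of the
single-target regression on $y_j$. Any gain from the correlation
between the $y_j$ must therefore come from methods that explicitly
couple the responses, rather than from fitting the same linear model
jointly.}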
\comment{P6, c1, l7: I think it is important for the authors to cover
ELNs by using actual recorded data. Do we know that many scientists
changed to ELNs `many years ago' as stated? Is there a reference for
this or data to back it up?}
\response{We have expanded our original wording, and we now cite two
references about the use and growth of ELNs in chemistry and
elsewhere.}
\comment{P6, c1, l7: the authors do not mention the leading players in
ELNs (e.g. CambridgeSoft, Accelrys, etc.) \ldots I think the
key software houses that develop and supply these ELN systems should
be mentioned. Furthermore, systems such as Reaxys should be
mentioned \ldots}
\response{We now mention CambridgeSoft, Accelrys and
Reaxys in the article, and provide a citation to a review of 35 ELNs
that was published in November 2011.}
\comment{P7, c2, l25: the authors should go back even further in the
history of cheminformatics here before DENDRAL. While this is an
excellent example of crossover, it is not by any means the founding
of our field. Aspects of mathematical chemistry should also be
mentioned in this article, which date back even further, such as
1894 with the publication of `The Principles of Mathematical
Chemistry' published by Helm.}
\response{As noted previously, we have included a supplemental
document that lists some of the milestones in the history of
cheminformatics. However, we note that the suggested reference by
Helm is not specifically related to cheminformatics. Rather, it is a
mathematical treatment of physical chemistry. Given that DENDRAL is
referenced in the context of structure enumeration, we feel that
including this reference in the main text would be of limited
relevance. We have instead
included it in the supplemental history.}
\comment{P7, c2, l51: why is chemically in inverted commas and not
algorithmically? I would prefer inverted commas for neither.}
\response{Quotes have been removed.}
\comment{P8, c1, l10: ELN's should be ELNs.}
\response{Corrected.}
\comment{P8, c1, l5: the users emphasise open source, open data but
this should be an article on all of cheminformatics not just a
particular subset.}
\response{We have restructured the section on
open source and open data. It is true that we still focus on open
source/open data, because this ensures that a reader will be able to
explore cheminformatics problems with a minimum of hindrances.}
\comment{P7, c1, l9: the authors make an interesting point here
regarding why cheminformatics software has historically been from
commercial suppliers. Given the scope of this manuscript it would be
interesting to expand on this discussion.}
\response{We have
restructured the Conclusions section to expand on the role of
cheminformatics in industry versus bioinformatics, and on why many
cheminformatics tools and much of the data have been commercial, in contrast
to the freely available tools and data in bioinformatics.}
\end{document}