diff --git a/DESCRIPTION b/DESCRIPTION
new file mode 100644
index 0000000..f1a69e1
--- /dev/null
+++ b/DESCRIPTION
@@ -0,0 +1,37 @@
+Package: ECOTOXr
+Type: Package
+Title: Download and Extract Data from US EPA's ECOTOX Database
+Version: 0.1.0
+Date: 2021-10-03
+Authors@R: c(person("Pepijn", "de Vries", role = c("aut", "cre", "dtc"),
+    email = "pepijn.devries@outlook.com"))
+Author:
+    Pepijn de Vries [aut, cre, dtc]
+Maintainer: Pepijn de Vries
+Description: The US EPA ECOTOX database is a freely available database
+    with a treasure of aquatic and terrestrial ecotoxicological data.
+    As the online search interface doesn't come with an API, this
+    package provides the means to easily access and search the database
+    in R. To this end, all raw tables are downloaded from the EPA website
+    and stored in a local SQLite database.
+Depends:
+    R (>= 3.5.0),
+    RSQLite
+Imports:
+    crayon,
+    dplyr,
+    rappdirs,
+    readr,
+    rvest,
+    stringr,
+    utils
+Suggests:
+    testthat (>= 3.0.0),
+    webchem
+URL: https://github.com/pepijn-devries/ECOTOXr
+BugReports: https://github.com/pepijn-devries/ECOTOXr/issues
+License: GPL (>= 3)
+Encoding: UTF-8
+LazyData: true
+RoxygenNote: 7.1.2
+Config/testthat/edition: 3
diff --git a/LICENSE.md b/LICENSE.md
new file mode 100644
index 0000000..9d7eed3
--- /dev/null
+++ b/LICENSE.md
@@ -0,0 +1,595 @@
+GNU General Public License
+==========================
+
+_Version 3, 29 June 2007_
+_Copyright © 2007 Free Software Foundation, Inc. <>_
+
+Everyone is permitted to copy and distribute verbatim copies of this license
+document, but changing it is not allowed.
+
+## Preamble
+
+The GNU General Public License is a free, copyleft license for software and other
+kinds of works.
+
+The licenses for most software and other practical works are designed to take away
+your freedom to share and change the works.
By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change all versions of a +program--to make sure it remains free software for all its users. We, the Free +Software Foundation, use the GNU General Public License for most of our software; it +applies also to any other work released this way by its authors. You can apply it to +your programs, too. + +When we speak of free software, we are referring to freedom, not price. Our General +Public Licenses are designed to make sure that you have the freedom to distribute +copies of free software (and charge for them if you wish), that you receive source +code or can get it if you want it, that you can change the software or use pieces of +it in new free programs, and that you know you can do these things. + +To protect your rights, we need to prevent others from denying you these rights or +asking you to surrender the rights. Therefore, you have certain responsibilities if +you distribute copies of the software, or if you modify it: responsibilities to +respect the freedom of others. + +For example, if you distribute copies of such a program, whether gratis or for a fee, +you must pass on to the recipients the same freedoms that you received. You must make +sure that they, too, receive or can get the source code. And you must show them these +terms so they know their rights. + +Developers that use the GNU GPL protect your rights with two steps: **(1)** assert +copyright on the software, and **(2)** offer you this License giving you legal permission +to copy, distribute and/or modify it. + +For the developers' and authors' protection, the GPL clearly explains that there is +no warranty for this free software. For both users' and authors' sake, the GPL +requires that modified versions be marked as changed, so that their problems will not +be attributed erroneously to authors of previous versions. 
+ +Some devices are designed to deny users access to install or run modified versions of +the software inside them, although the manufacturer can do so. This is fundamentally +incompatible with the aim of protecting users' freedom to change the software. The +systematic pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we have designed +this version of the GPL to prohibit the practice for those products. If such problems +arise substantially in other domains, we stand ready to extend this provision to +those domains in future versions of the GPL, as needed to protect the freedom of +users. + +Finally, every program is threatened constantly by software patents. States should +not allow patents to restrict development and use of software on general-purpose +computers, but in those that do, we wish to avoid the special danger that patents +applied to a free program could make it effectively proprietary. To prevent this, the +GPL assures that patents cannot be used to render the program non-free. + +The precise terms and conditions for copying, distribution and modification follow. + +## TERMS AND CONDITIONS + +### 0. Definitions + +“This License” refers to version 3 of the GNU General Public License. + +“Copyright” also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + +“The Program” refers to any copyrightable work licensed under this +License. Each licensee is addressed as “you”. “Licensees” and +“recipients” may be individuals or organizations. + +To “modify” a work means to copy from or adapt all or part of the work in +a fashion requiring copyright permission, other than the making of an exact copy. The +resulting work is called a “modified version” of the earlier work or a +work “based on” the earlier work. + +A “covered work” means either the unmodified Program or a work based on +the Program. 
+ +To “propagate” a work means to do anything with it that, without +permission, would make you directly or secondarily liable for infringement under +applicable copyright law, except executing it on a computer or modifying a private +copy. Propagation includes copying, distribution (with or without modification), +making available to the public, and in some countries other activities as well. + +To “convey” a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through a computer +network, with no transfer of a copy, is not conveying. + +An interactive user interface displays “Appropriate Legal Notices” to the +extent that it includes a convenient and prominently visible feature that **(1)** +displays an appropriate copyright notice, and **(2)** tells the user that there is no +warranty for the work (except to the extent that warranties are provided), that +licensees may convey the work under this License, and how to view a copy of this +License. If the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + +### 1. Source Code + +The “source code” for a work means the preferred form of the work for +making modifications to it. “Object code” means any non-source form of a +work. + +A “Standard Interface” means an interface that either is an official +standard defined by a recognized standards body, or, in the case of interfaces +specified for a particular programming language, one that is widely used among +developers working in that language. 
+ +The “System Libraries” of an executable work include anything, other than +the work as a whole, that **(a)** is included in the normal form of packaging a Major +Component, but which is not part of that Major Component, and **(b)** serves only to +enable use of the work with that Major Component, or to implement a Standard +Interface for which an implementation is available to the public in source code form. +A “Major Component”, in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system (if any) on which +the executable work runs, or a compiler used to produce the work, or an object code +interpreter used to run it. + +The “Corresponding Source” for a work in object code form means all the +source code needed to generate, install, and (for an executable work) run the object +code and to modify the work, including scripts to control those activities. However, +it does not include the work's System Libraries, or general-purpose tools or +generally available free programs which are used unmodified in performing those +activities but which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for the work, and +the source code for shared libraries and dynamically linked subprograms that the work +is specifically designed to require, such as by intimate data communication or +control flow between those subprograms and other parts of the work. + +The Corresponding Source need not include anything that users can regenerate +automatically from other parts of the Corresponding Source. + +The Corresponding Source for a work in source code form is that same work. + +### 2. Basic Permissions + +All rights granted under this License are granted for the term of copyright on the +Program, and are irrevocable provided the stated conditions are met. This License +explicitly affirms your unlimited permission to run the unmodified Program. 
The +output from running a covered work is covered by this License only if the output, +given its content, constitutes a covered work. This License acknowledges your rights +of fair use or other equivalent, as provided by copyright law. + +You may make, run and propagate covered works that you do not convey, without +conditions so long as your license otherwise remains in force. You may convey covered +works to others for the sole purpose of having them make modifications exclusively +for you, or provide you with facilities for running those works, provided that you +comply with the terms of this License in conveying all material for which you do not +control copyright. Those thus making or running the covered works for you must do so +exclusively on your behalf, under your direction and control, on terms that prohibit +them from making any copies of your copyrighted material outside their relationship +with you. + +Conveying under any other circumstances is permitted solely under the conditions +stated below. Sublicensing is not allowed; section 10 makes it unnecessary. + +### 3. Protecting Users' Legal Rights From Anti-Circumvention Law + +No covered work shall be deemed part of an effective technological measure under any +applicable law fulfilling obligations under article 11 of the WIPO copyright treaty +adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention +of such measures. + +When you convey a covered work, you waive any legal power to forbid circumvention of +technological measures to the extent such circumvention is effected by exercising +rights under this License with respect to the covered work, and you disclaim any +intention to limit operation or modification of the work as a means of enforcing, +against the work's users, your or third parties' legal rights to forbid circumvention +of technological measures. + +### 4. 
Conveying Verbatim Copies + +You may convey verbatim copies of the Program's source code as you receive it, in any +medium, provided that you conspicuously and appropriately publish on each copy an +appropriate copyright notice; keep intact all notices stating that this License and +any non-permissive terms added in accord with section 7 apply to the code; keep +intact all notices of the absence of any warranty; and give all recipients a copy of +this License along with the Program. + +You may charge any price or no price for each copy that you convey, and you may offer +support or warranty protection for a fee. + +### 5. Conveying Modified Source Versions + +You may convey a work based on the Program, or the modifications to produce it from +the Program, in the form of source code under the terms of section 4, provided that +you also meet all of these conditions: + +* **a)** The work must carry prominent notices stating that you modified it, and giving a +relevant date. +* **b)** The work must carry prominent notices stating that it is released under this +License and any conditions added under section 7. This requirement modifies the +requirement in section 4 to “keep intact all notices”. +* **c)** You must license the entire work, as a whole, under this License to anyone who +comes into possession of a copy. This License will therefore apply, along with any +applicable section 7 additional terms, to the whole of the work, and all its parts, +regardless of how they are packaged. This License gives no permission to license the +work in any other way, but it does not invalidate such permission if you have +separately received it. +* **d)** If the work has interactive user interfaces, each must display Appropriate Legal +Notices; however, if the Program has interactive interfaces that do not display +Appropriate Legal Notices, your work need not make them do so. 
+ +A compilation of a covered work with other separate and independent works, which are +not by their nature extensions of the covered work, and which are not combined with +it such as to form a larger program, in or on a volume of a storage or distribution +medium, is called an “aggregate” if the compilation and its resulting +copyright are not used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work in an aggregate +does not cause this License to apply to the other parts of the aggregate. + +### 6. Conveying Non-Source Forms + +You may convey a covered work in object code form under the terms of sections 4 and +5, provided that you also convey the machine-readable Corresponding Source under the +terms of this License, in one of these ways: + +* **a)** Convey the object code in, or embodied in, a physical product (including a +physical distribution medium), accompanied by the Corresponding Source fixed on a +durable physical medium customarily used for software interchange. +* **b)** Convey the object code in, or embodied in, a physical product (including a +physical distribution medium), accompanied by a written offer, valid for at least +three years and valid for as long as you offer spare parts or customer support for +that product model, to give anyone who possesses the object code either **(1)** a copy of +the Corresponding Source for all the software in the product that is covered by this +License, on a durable physical medium customarily used for software interchange, for +a price no more than your reasonable cost of physically performing this conveying of +source, or **(2)** access to copy the Corresponding Source from a network server at no +charge. +* **c)** Convey individual copies of the object code with a copy of the written offer to +provide the Corresponding Source. 
This alternative is allowed only occasionally and +noncommercially, and only if you received the object code with such an offer, in +accord with subsection 6b. +* **d)** Convey the object code by offering access from a designated place (gratis or for +a charge), and offer equivalent access to the Corresponding Source in the same way +through the same place at no further charge. You need not require recipients to copy +the Corresponding Source along with the object code. If the place to copy the object +code is a network server, the Corresponding Source may be on a different server +(operated by you or a third party) that supports equivalent copying facilities, +provided you maintain clear directions next to the object code saying where to find +the Corresponding Source. Regardless of what server hosts the Corresponding Source, +you remain obligated to ensure that it is available for as long as needed to satisfy +these requirements. +* **e)** Convey the object code using peer-to-peer transmission, provided you inform +other peers where the object code and Corresponding Source of the work are being +offered to the general public at no charge under subsection 6d. + +A separable portion of the object code, whose source code is excluded from the +Corresponding Source as a System Library, need not be included in conveying the +object code work. + +A “User Product” is either **(1)** a “consumer product”, which +means any tangible personal property which is normally used for personal, family, or +household purposes, or **(2)** anything designed or sold for incorporation into a +dwelling. In determining whether a product is a consumer product, doubtful cases +shall be resolved in favor of coverage. 
For a particular product received by a +particular user, “normally used” refers to a typical or common use of +that class of product, regardless of the status of the particular user or of the way +in which the particular user actually uses, or expects or is expected to use, the +product. A product is a consumer product regardless of whether the product has +substantial commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + +“Installation Information” for a User Product means any methods, +procedures, authorization keys, or other information required to install and execute +modified versions of a covered work in that User Product from a modified version of +its Corresponding Source. The information must suffice to ensure that the continued +functioning of the modified object code is in no case prevented or interfered with +solely because modification has been made. + +If you convey an object code work under this section in, or with, or specifically for +use in, a User Product, and the conveying occurs as part of a transaction in which +the right of possession and use of the User Product is transferred to the recipient +in perpetuity or for a fixed term (regardless of how the transaction is +characterized), the Corresponding Source conveyed under this section must be +accompanied by the Installation Information. But this requirement does not apply if +neither you nor any third party retains the ability to install modified object code +on the User Product (for example, the work has been installed in ROM). + +The requirement to provide Installation Information does not include a requirement to +continue to provide support service, warranty, or updates for a work that has been +modified or installed by the recipient, or for the User Product in which it has been +modified or installed. 
Access to a network may be denied when the modification itself +materially and adversely affects the operation of the network or violates the rules +and protocols for communication across the network. + +Corresponding Source conveyed, and Installation Information provided, in accord with +this section must be in a format that is publicly documented (and with an +implementation available to the public in source code form), and must require no +special password or key for unpacking, reading or copying. + +### 7. Additional Terms + +“Additional permissions” are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. Additional +permissions that are applicable to the entire Program shall be treated as though they +were included in this License, to the extent that they are valid under applicable +law. If additional permissions apply only to part of the Program, that part may be +used separately under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + +When you convey a copy of a covered work, you may at your option remove any +additional permissions from that copy, or from any part of it. (Additional +permissions may be written to require their own removal in certain cases when you +modify the work.) You may place additional permissions on material, added by you to a +covered work, for which you have or can give appropriate copyright permission. 
+ +Notwithstanding any other provision of this License, for material you add to a +covered work, you may (if authorized by the copyright holders of that material) +supplement the terms of this License with terms: + +* **a)** Disclaiming warranty or limiting liability differently from the terms of +sections 15 and 16 of this License; or +* **b)** Requiring preservation of specified reasonable legal notices or author +attributions in that material or in the Appropriate Legal Notices displayed by works +containing it; or +* **c)** Prohibiting misrepresentation of the origin of that material, or requiring that +modified versions of such material be marked in reasonable ways as different from the +original version; or +* **d)** Limiting the use for publicity purposes of names of licensors or authors of the +material; or +* **e)** Declining to grant rights under trademark law for use of some trade names, +trademarks, or service marks; or +* **f)** Requiring indemnification of licensors and authors of that material by anyone +who conveys the material (or modified versions of it) with contractual assumptions of +liability to the recipient, for any liability that these contractual assumptions +directly impose on those licensors and authors. + +All other non-permissive additional terms are considered “further +restrictions” within the meaning of section 10. If the Program as you received +it, or any part of it, contains a notice stating that it is governed by this License +along with a term that is a further restriction, you may remove that term. If a +license document contains a further restriction but permits relicensing or conveying +under this License, you may add to a covered work material governed by the terms of +that license document, provided that the further restriction does not survive such +relicensing or conveying. 
+ +If you add terms to a covered work in accord with this section, you must place, in +the relevant source files, a statement of the additional terms that apply to those +files, or a notice indicating where to find the applicable terms. + +Additional terms, permissive or non-permissive, may be stated in the form of a +separately written license, or stated as exceptions; the above requirements apply +either way. + +### 8. Termination + +You may not propagate or modify a covered work except as expressly provided under +this License. Any attempt otherwise to propagate or modify it is void, and will +automatically terminate your rights under this License (including any patent licenses +granted under the third paragraph of section 11). + +However, if you cease all violation of this License, then your license from a +particular copyright holder is reinstated **(a)** provisionally, unless and until the +copyright holder explicitly and finally terminates your license, and **(b)** permanently, +if the copyright holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + +Moreover, your license from a particular copyright holder is reinstated permanently +if the copyright holder notifies you of the violation by some reasonable means, this +is the first time you have received notice of violation of this License (for any +work) from that copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + +Termination of your rights under this section does not terminate the licenses of +parties who have received copies or rights from you under this License. If your +rights have been terminated and not permanently reinstated, you do not qualify to +receive new licenses for the same material under section 10. + +### 9. Acceptance Not Required for Having Copies + +You are not required to accept this License in order to receive or run a copy of the +Program. 
Ancillary propagation of a covered work occurring solely as a consequence of +using peer-to-peer transmission to receive a copy likewise does not require +acceptance. However, nothing other than this License grants you permission to +propagate or modify any covered work. These actions infringe copyright if you do not +accept this License. Therefore, by modifying or propagating a covered work, you +indicate your acceptance of this License to do so. + +### 10. Automatic Licensing of Downstream Recipients + +Each time you convey a covered work, the recipient automatically receives a license +from the original licensors, to run, modify and propagate that work, subject to this +License. You are not responsible for enforcing compliance by third parties with this +License. + +An “entity transaction” is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an organization, or +merging organizations. If propagation of a covered work results from an entity +transaction, each party to that transaction who receives a copy of the work also +receives whatever licenses to the work the party's predecessor in interest had or +could give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if the predecessor +has it or can get it with reasonable efforts. + +You may not impose any further restrictions on the exercise of the rights granted or +affirmed under this License. For example, you may not impose a license fee, royalty, +or other charge for exercise of rights granted under this License, and you may not +initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging +that any patent claim is infringed by making, using, selling, offering for sale, or +importing the Program or any portion of it. + +### 11. 
Patents + +A “contributor” is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The work thus +licensed is called the contributor's “contributor version”. + +A contributor's “essential patent claims” are all patent claims owned or +controlled by the contributor, whether already acquired or hereafter acquired, that +would be infringed by some manner, permitted by this License, of making, using, or +selling its contributor version, but do not include claims that would be infringed +only as a consequence of further modification of the contributor version. For +purposes of this definition, “control” includes the right to grant patent +sublicenses in a manner consistent with the requirements of this License. + +Each contributor grants you a non-exclusive, worldwide, royalty-free patent license +under the contributor's essential patent claims, to make, use, sell, offer for sale, +import and otherwise run, modify and propagate the contents of its contributor +version. + +In the following three paragraphs, a “patent license” is any express +agreement or commitment, however denominated, not to enforce a patent (such as an +express permission to practice a patent or covenant not to sue for patent +infringement). To “grant” such a patent license to a party means to make +such an agreement or commitment not to enforce a patent against the party. 
+ +If you convey a covered work, knowingly relying on a patent license, and the +Corresponding Source of the work is not available for anyone to copy, free of charge +and under the terms of this License, through a publicly available network server or +other readily accessible means, then you must either **(1)** cause the Corresponding +Source to be so available, or **(2)** arrange to deprive yourself of the benefit of the +patent license for this particular work, or **(3)** arrange, in a manner consistent with +the requirements of this License, to extend the patent license to downstream +recipients. “Knowingly relying” means you have actual knowledge that, but +for the patent license, your conveying the covered work in a country, or your +recipient's use of the covered work in a country, would infringe one or more +identifiable patents in that country that you have reason to believe are valid. + +If, pursuant to or in connection with a single transaction or arrangement, you +convey, or propagate by procuring conveyance of, a covered work, and grant a patent +license to some of the parties receiving the covered work authorizing them to use, +propagate, modify or convey a specific copy of the covered work, then the patent +license you grant is automatically extended to all recipients of the covered work and +works based on it. + +A patent license is “discriminatory” if it does not include within the +scope of its coverage, prohibits the exercise of, or is conditioned on the +non-exercise of one or more of the rights that are specifically granted under this +License. 
You may not convey a covered work if you are a party to an arrangement with +a third party that is in the business of distributing software, under which you make +payment to the third party based on the extent of your activity of conveying the +work, and under which the third party grants, to any of the parties who would receive +the covered work from you, a discriminatory patent license **(a)** in connection with +copies of the covered work conveyed by you (or copies made from those copies), or **(b)** +primarily for and in connection with specific products or compilations that contain +the covered work, unless you entered into that arrangement, or that patent license +was granted, prior to 28 March 2007. + +Nothing in this License shall be construed as excluding or limiting any implied +license or other defenses to infringement that may otherwise be available to you +under applicable patent law. + +### 12. No Surrender of Others' Freedom + +If conditions are imposed on you (whether by court order, agreement or otherwise) +that contradict the conditions of this License, they do not excuse you from the +conditions of this License. If you cannot convey a covered work so as to satisfy +simultaneously your obligations under this License and any other pertinent +obligations, then as a consequence you may not convey it at all. For example, if you +agree to terms that obligate you to collect a royalty for further conveying from +those to whom you convey the Program, the only way you could satisfy both those terms +and this License would be to refrain entirely from conveying the Program. + +### 13. Use with the GNU Affero General Public License + +Notwithstanding any other provision of this License, you have permission to link or +combine any covered work with a work licensed under version 3 of the GNU Affero +General Public License into a single combined work, and to convey the resulting work. 
+The terms of this License will continue to apply to the part which is the covered +work, but the special requirements of the GNU Affero General Public License, section +13, concerning interaction through a network will apply to the combination as such. + +### 14. Revised Versions of this License + +The Free Software Foundation may publish revised and/or new versions of the GNU +General Public License from time to time. Such new versions will be similar in spirit +to the present version, but may differ in detail to address new problems or concerns. + +Each version is given a distinguishing version number. If the Program specifies that +a certain numbered version of the GNU General Public License “or any later +version” applies to it, you have the option of following the terms and +conditions either of that numbered version or of any later version published by the +Free Software Foundation. If the Program does not specify a version number of the GNU +General Public License, you may choose any version ever published by the Free +Software Foundation. + +If the Program specifies that a proxy can decide which future versions of the GNU +General Public License can be used, that proxy's public statement of acceptance of a +version permanently authorizes you to choose that version for the Program. + +Later license versions may give you additional or different permissions. However, no +additional obligations are imposed on any author or copyright holder as a result of +your choosing to follow a later version. + +### 15. Disclaimer of Warranty + +THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. +EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER +EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
THE ENTIRE RISK AS TO THE +QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE +DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + +### 16. Limitation of Liability + +IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY +COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS +PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, +INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE +PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE +OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE +WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + +### 17. Interpretation of Sections 15 and 16 + +If the disclaimer of warranty and limitation of liability provided above cannot be +given local legal effect according to their terms, reviewing courts shall apply local +law that most closely approximates an absolute waiver of all civil liability in +connection with the Program, unless a warranty or assumption of liability accompanies +a copy of the Program in return for a fee. + +_END OF TERMS AND CONDITIONS_ + +## How to Apply These Terms to Your New Programs + +If you develop a new program, and you want it to be of the greatest possible use to +the public, the best way to achieve this is to make it free software which everyone +can redistribute and change under these terms. + +To do so, attach the following notices to the program. It is safest to attach them +to the start of each source file to most effectively state the exclusion of warranty; +and each file should have at least the “copyright” line and a pointer to +where the full notice is found. 
+ + + Copyright (C) + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + +Also add information on how to contact you by electronic and paper mail. + +If the program does terminal interaction, make it output a short notice like this +when it starts in an interactive mode: + + Copyright (C) + This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type 'show c' for details. + +The hypothetical commands `show w` and `show c` should show the appropriate parts of +the General Public License. Of course, your program's commands might be different; +for a GUI interface, you would use an “about box”. + +You should also get your employer (if you work as a programmer) or school, if any, to +sign a “copyright disclaimer” for the program, if necessary. For more +information on this, and how to apply and follow the GNU GPL, see +<>. + +The GNU General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may consider it +more useful to permit linking proprietary applications with the library. If this is +what you want to do, use the GNU Lesser General Public License instead of this +License. But first, please read +<>. 
diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..cd80a46 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,18 @@ +# Generated by roxygen2: do not edit by hand + +export(build_ecotox_sqlite) +export(check_ecotox_availability) +export(cite_ecotox) +export(dbConnectEcotox) +export(dbDisconnectEcotox) +export(download_ecotox_data) +export(get_ecotox_info) +export(get_ecotox_path) +export(get_ecotox_sqlite_file) +export(list_ecotox_fields) +export(search_ecotox) +export(search_query_ecotox) +importFrom(RSQLite,dbConnect) +importFrom(RSQLite,dbDisconnect) +importFrom(RSQLite,dbExecute) +importFrom(RSQLite,dbWriteTable) diff --git a/NEWS b/NEWS new file mode 100644 index 0000000..2e97db5 --- /dev/null +++ b/NEWS @@ -0,0 +1,8 @@ +ECOTOXr v0.1.0 (Release date: 2021-10-03) +============= + + * Initial release which can: + + * Download raw ECOTOX database tables from the EPA website + * Build an SQLite database from those files + * Search and extract data from the created local database diff --git a/R/ECOTOXr.r b/R/ECOTOXr.r new file mode 100644 index 0000000..2808b41 --- /dev/null +++ b/R/ECOTOXr.r @@ -0,0 +1,103 @@ +#' Package description +#' +#' Everything you need to know when you start using the ECOTOXr package. +#' +#' The ECOTOXr package provides the means to efficiently search, extract and analyse \href{https://www.epa.gov/}{US EPA} +#' \href{https://cfpub.epa.gov/ecotox/}{ECOTOX} data, with a focus on reproducible results. Although the package +#' creator/maintainer is confident in the quality of this software, it is the end user's sole responsibility to +#' assure the quality of his or her work while using this software. As per the provided license terms the package +#' maintainer is not liable for any damage resulting from its usage. That being said, below we present some tips +#' for generating reproducible results with this package. +#' +#' @section How do I get started?: +#' Installing this package is only the first step to get things started.
You need to perform the following steps +#' in order to use the package to its full capacity. +#' +#' \itemize{ +#' \item{ +#' First download a copy of the complete EPA database. This can be done by calling \code{\link{download_ecotox_data}}. +#' This may not always work on all machines as R does not always accept the website SSL certificate from the EPA. +#' In those cases the zipped archive with the database files can be downloaded manually with a different (more +#' forgiving) browser. The files from the zip archive can be extracted to a location of choice. +#' } +#' \item{ +#' Next, an SQLite database needs to be built from the downloaded files. This is done automatically when +#' you use \code{\link{download_ecotox_data}} in the previous step. When you have manually downloaded the files +#' you can call \code{\link{build_ecotox_sqlite}} to build the database locally. +#' } +#' \item{ +#' When the previous steps have been performed successfully, you can now search the database by calling +#' \code{\link{search_ecotox}}. You can also use \code{\link{dbConnectEcotox}} to open a connection to the +#' database. You can query the database using this connection and any of the methods provided from the +#' \link[DBI:DBI]{DBI} or \link[RSQLite:RSQLite]{RSQLite} packages. +#' } +#' } +#' +#' @section How do I obtain reproducible results?: +#' Each individual user is responsible for evaluating the reproducibility of his or her work. Although +#' this package offers instruments to achieve reproducibility, it is not guaranteed. In order to increase the +#' chances of generating reproducible results, one should adhere at least to the following rules: +#' \itemize{ +#' \item{ +#' Always use an official release from CRAN, and cite the version used in your analyses (\code{citation("ECOTOXr")}). +#' Different versions may produce different end results (although we will strive for backward compatibility).
+#' } +#' \item{ +#' Make sure you are working with a clean (unaltered) version of the database. When in doubt, download and build +#' a fresh copy of the database (\code{\link{download_ecotox_data}}). Also cite the (release) version of the downloaded +#' database (\code{\link{cite_ecotox}}), and the operating system on which the local database was built +#' (\code{\link{get_ecotox_info}}). Or, just make sure that you never modify the database (e.g., write data to it, delete +#' data from it, etc.). +#' } +#' \item{ +#' In order to avoid platform dependencies it is advised to only include non-accented alpha-numerical characters in +#' search terms. See also \link{search_ecotox} and \link{build_ecotox_sqlite}. +#' } +#' \item{ +#' When trying to reproduce database extractions from earlier database releases, filter out additions after +#' that specific release. This can be done by adding output fields 'tests.modified_date', 'tests.created_date' and +#' 'tests.published_date' to your search and comparing those with the release date of the database you are trying to +#' reproduce results from. +#' } +#' } +#' +#' @section Why isn't the database included in the package?: +#' This package doesn't come bundled with a copy of the database, which needs to be downloaded the first time the +#' package is used. Why is this? There are several reasons: +#' \itemize{ +#' \item{ +#' The database is maintained and updated by the \href{https://www.epa.gov/}{US EPA}. This process is and should be +#' outside the sphere of influence of the package maintainer. +#' } +#' \item{ +#' Packages on CRAN are not allowed to contain large amounts of data. Publication on CRAN is key to control +#' the quality of this package and therefore outweighs the convenience of having the data bundled with the package. +#' } +#' \item{ +#' The user has full control over the release version of the database that is being used.
+#' } +#' } +#' +#' @section Why doesn't this package search the online ECOTOX database?: +#' Although this is possible, there are several reasons why we opted for creating a local copy: +#' \itemize{ +#' \item{ +#' The user would be restricted to the search options provided on the website (\href{https://cfpub.epa.gov/ecotox/}{ECOTOX}). +#' } +#' \item{ +#' The online database doesn't come with an API that would allow for convenient interface. +#' } +#' \item{ +#' The user is not limited by an internet connection and its bandwidth. +#' } +#' \item{ +#' Not all database fields can be retrieved from the online interface. +#' } +#' } +#' @docType package +#' @name ECOTOXr +#' @author Pepijn de Vries +#' @references +#' Official US EPA ECOTOX website: +#' \url{https://cfpub.epa.gov/ecotox/} +NULL diff --git a/R/database_access.r b/R/database_access.r new file mode 100644 index 0000000..156fddd --- /dev/null +++ b/R/database_access.r @@ -0,0 +1,171 @@ +#' @rdname get_path +#' @name get_ecotox_sqlite_file +#' @export +get_ecotox_sqlite_file <- function(path = get_ecotox_path(), version) { + if (missing(version)) { + version <- NULL + } else { + if (length(version) != 1) stop("Argument 'version' should hold a single element!") + version <- as.Date(version, format = "%m_%d_%Y") + } + files <- attributes(.fail_on_missing(path))$files + results <- nrow(files) + files <- files[which(files$date == ifelse(is.null(version), max(files$date)[[1]], version)),] + if (results > 1 && is.null(version)) { + warning(sprintf("Multiple versions of the database found and not one specified. Using the most recent version (%s)", + format(files$date, "%Y-%m-%d"))) + } + return(file.path(files$path, files$database)) +} + +#' Open or close a connection to the local ECOTOX database +#' +#' Wrappers for \code{\link[RSQLite:SQLite]{dbConnect}} and \code{\link[RSQLite:SQLite]{dbDisconnect}} methods. +#' +#' Open or close a connection to the local ECOTOX database. 
These functions are only required when you want +#' to send custom queries to the database. For most searches the \code{\link{search_ecotox}} function +#' will be adequate. +#' +#' @param path A \code{character} string with the path to the location of the local database (default is +#' \code{\link{get_ecotox_path}()}). +#' @param version A \code{character} string referring to the release version of the database you wish to locate. +#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by +#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. +#' @param conn An open connection to the ECOTOX database that needs to be closed. +#' @param ... Arguments that are passed to \code{\link[RSQLite:SQLite]{dbConnect}} method +#' or \code{\link[RSQLite:SQLite]{dbDisconnect}} method. +#' @return A database connection in the form of a \code{\link[DBI]{DBIConnection-class}} object. +#' The object is tagged with: a time stamp; the package version used; and the +#' file path of the SQLite database used in the connection. These tags are added as attributes +#' to the object. +#' @rdname dbConnectEcotox +#' @name dbConnectEcotox +#' @examples +#' \dontrun{ +#' ## This will only work when a copy of the database exists: +#' con <- dbConnectEcotox() +#' +#' ## check if the connection works by listing the tables in the database: +#' dbListTables(con) +#' +#' ## Let's be a good boy/girl and close the connection to the database when we're done: +#' dbDisconnectEcotox(con) +#' } +#' @author Pepijn de Vries +#' @export +dbConnectEcotox <- function(path = get_ecotox_path(), version, ...) { + f <- get_ecotox_sqlite_file(path, version) + return(.add_tags(RSQLite::dbConnect(RSQLite::SQLite(), f, ...), f)) +} + +#' @rdname dbConnectEcotox +#' @name dbDisconnectEcotox +#' @export +dbDisconnectEcotox <- function(conn, ...) { + RSQLite::dbDisconnect(conn, ...) 
+} + +#' Cite the downloaded copy of the ECOTOX database +#' +#' Cite the downloaded copy of the ECOTOX database and this package for reproducible results. +#' +#' When you download a copy of the EPA ECOTOX database using \code{\link{download_ecotox_data}()}, a BibTeX file +#' is stored that registers the database release version and the access (= download) date. Use this function +#' to obtain a citation to that specific download. +#' +#' In order for others to reproduce your results, it is key to cite the data source as accurately as possible. +#' @param path A \code{character} string with the path to the location of the local database (default is +#' \code{\link{get_ecotox_path}()}). +#' @param version A \code{character} string referring to the release version of the database you wish to locate. +#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by +#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. +#' @return Returns a \code{vector} of \code{\link{bibentry}} objects, containing a reference to the downloaded database +#' and this package. +#' @rdname cite_ecotox +#' @name cite_ecotox +#' @examples +#' \dontrun{ +#' ## In order to cite the downloaded database and this package: +#' cite_ecotox() +#' } +#' @author Pepijn de Vries +#' @export +cite_ecotox <- function(path = get_ecotox_path(), version) { + db <- get_ecotox_sqlite_file(path, version) + bib <- gsub(".sqlite", "_cit.txt", db, fixed = T) + if (!file.exists(bib)) stop("No bibentry reference to database download found!") + result <- utils::readCitationFile(bib) + return(c(result, utils::citation("ECOTOXr"))) +} + +#' Get information on the local ECOTOX database when available +#' +#' Get information on how and when the local ECOTOX database was built. +#' +#' Get information on how and when the local ECOTOX database was built.
This information is retrieved +#' from the log-file that is (optionally) stored with the local database when calling \code{\link{download_ecotox_data}} +#' or \code{\link{build_ecotox_sqlite}}. +#' @param path A \code{character} string with the path to the location of the local database (default is +#' \code{\link{get_ecotox_path}()}). +#' @param version A \code{character} string referring to the release version of the database you wish to locate. +#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by +#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. +#' @return Returns a \code{vector} of \code{character}s, containing information on the selected local ECOTOX database. +#' @rdname get_ecotox_info +#' @name get_ecotox_info +#' @examples +#' \dontrun{ +#' ## Show info on the current database (only works when one is downloaded and built): +#' get_ecotox_info() +#' } +#' @author Pepijn de Vries +#' @export +get_ecotox_info <- function(path = get_ecotox_path(), version) { + default <- "No information available\n" + inf <- tryCatch({ + db <- get_ecotox_sqlite_file(path, version) + gsub(".sqlite", ".log", db, fixed = T) + }, error = function(e) return(default)) + if (file.exists(inf)) { + inf <- readLines(inf) + } else { + inf <- default + } + cat(paste(inf, collapse = "\n")) + return(invisible(inf)) +} + +#' List the field names that are available from the ECOTOX database +#' +#' List the field names (table headers) that are available from the ECOTOX database +#' +#' This can be useful when specifying a \code{\link{search_ecotox}}, to identify which fields +#' are available from the database, for searching and output. +#' @param which A \code{character} string that specifies which fields to return.
Can be any of: +#' '\code{default}': returns default output field names; '\code{all}': returns all fields; or +#' '\code{full}': returns all except fields from table 'dose_response_details'. +#' @param include_table A \code{logical} value indicating whether the table name should be included +#' as prefix. Default is \code{TRUE}. +#' @return Returns a \code{vector} of type \code{character} containing the field names from the ECOTOX database. +#' @rdname list_ecotox_fields +#' @name list_ecotox_fields +#' @examples +#' ## Fields that are included in search results by default: +#' list_ecotox_fields("default") +#' +#' ## All fields that are available from the ECOTOX database: +#' list_ecotox_fields("all") +#' +#' ## All except fields from the table 'dose_response_details' +#' ## that are available from the ECOTOX database: +#' list_ecotox_fields("full") +#' @author Pepijn de Vries +#' @export +list_ecotox_fields <- function(which = c("default", "full", "all"), include_table = TRUE) { + which <- match.arg(which) + result <- .db_specs$field_name + if (include_table) result <- paste(.db_specs$table, result, sep = ".") + if (which == "default") result <- result[.db_specs$default_output] + if (which == "full") result <- result[.db_specs$table != "dose_response_details"] + return(result) +} diff --git a/R/helpers.r b/R/helpers.r new file mode 100644 index 0000000..4e099e7 --- /dev/null +++ b/R/helpers.r @@ -0,0 +1,7 @@ +.add_tags <- function(x, sqlite) { + if (missing(sqlite)) sqlite <- attributes(x)$database_file + attributes(x)$date_created <- Sys.Date() + attributes(x)$created_with <- sprintf("Package ECOTOXr v%s", utils::packageVersion("ECOTOXr")) + attributes(x)$database_file <- sqlite + return(x) +} diff --git a/R/imports.r b/R/imports.r new file mode 100644 index 0000000..fb902e3 --- /dev/null +++ b/R/imports.r @@ -0,0 +1,12 @@ +.onAttach <- function(libname, pkgname){ + packageStartupMessage({ + if (check_ecotox_availability()) { + crayon::green("ECOTOX database file
located, you are ready to go!\n") + } else { + crayon::red("ECOTOX database file not present! Invoke download and database build using 'download_ecotox_data()'\n") + } + }) +} + +#' @importFrom RSQLite dbExecute dbConnect dbDisconnect dbWriteTable +NULL diff --git a/R/init.r b/R/init.r new file mode 100644 index 0000000..4f86867 --- /dev/null +++ b/R/init.r @@ -0,0 +1,328 @@ +#' Check whether an ECOTOX database exists locally +#' +#' Tests whether a local copy of the US EPA ECOTOX database exists in \code{\link{get_ecotox_path}}. +#' +#' When arguments are omitted, this function will look in the default directory (\code{\link{get_ecotox_path}}). +#' However, it is possible to build a database file elsewhere if necessary. +#' @param target A \code{character} string specifying the path where to look for the database file. +#' @return Returns a \code{logical} value indicating whether a copy of the database exists. It also returns +#' a \code{files} attribute that lists which copies of the database are found.
+#' @rdname check_ecotox_availability +#' @name check_ecotox_availability +#' @examples +#' check_ecotox_availability() +#' @author Pepijn de Vries +#' @export +check_ecotox_availability <- function(target = get_ecotox_path()) { + files <- list.files(target) + file_reg <- gregexpr("(?<=^ecotox_ascii_)(.*?)(?=\\.sqlite$)", files, perl = T) + file_reg <- regmatches(files, file_reg) + + files <- files[unlist(lapply(file_reg, length)) > 0] + file_reg <- unlist(file_reg[unlist(lapply(file_reg, length)) > 0]) + if (any(nchar(file_reg) > 0)) { + file_reg <- as.Date(file_reg, format = "%m_%d_%Y") + files <- files[!is.na(file_reg)] + file_reg <- file_reg[!is.na(file_reg)] + } else { + file_reg <- as.Date(NA)[-1] + target <- character(0) + } + result <- length(files) > 0 + attributes(result)$files <- data.frame(path = target, database = files, date = file_reg, + stringsAsFactors = F) + return(result) +} + +.fail_on_missing <- function(path = get_ecotox_path()) { + test <- check_ecotox_availability(path) + if (!test) { + stop("No local database located. Download data first by calling 'download_ecotox_data()'") + } else return(test) +} + +#' The local path to the ECOTOX database (directory or sqlite file) +#' +#' Obtain the local path to where the ECOTOX database is (or will be) placed. +#' +#' It can be useful to know where the database is located on your disk. This function +#' returns the location as provided by \code{\link[rappdirs]{app_dir}}. +#' +#' @param path When you have a copy of the database somewhere other than the default +#' directory (\code{\link{get_ecotox_path}()}), you can provide the path here. +#' @param version A \code{character} string referring to the release version of the database you wish to locate. +#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by +#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. 
+#' @return Returns a \code{character} string of the path. +#' \code{get_ecotox_path} will return the default directory of the database. +#' \code{get_ecotox_sqlite_file} will return the path to the sqlite file when it exists. +#' @rdname get_path +#' @name get_ecotox_path +#' @examples +#' get_ecotox_path() +#' +#' \dontrun{ +#' ## This will only work if a local database exists: +#' get_ecotox_sqlite_file() +#' } +#' @author Pepijn de Vries +#' @export +get_ecotox_path <- function() { + rappdirs::app_dir("ECOTOXr")$cache() +} + +#' Download and extract ECOTOX database files and compose database +#' +#' In order for this package to fully function, a local copy of the ECOTOX database needs to be built. +#' This function will download the required data and build the database. +#' +#' This function will attempt to find the latest download URL for the ECOTOX database from the EPA website. +#' When found it will attempt to download the zipped archive containing all required data. This data is then +#' extracted and a local copy of the database is built. +#' +#' @section Known issues: +#' On some machines this function fails to connect to the database download URL from the EPA website due to missing +#' SSL certificates. Unfortunately, there is no easy fix for this in this package. A workaround is to download and +#' unzip the file manually using a different machine or browser that is less strict with SSL certificates. You can +#' then call \code{\link{build_ecotox_sqlite}()} and point the \code{source} location to the manually extracted zip +#' archive. +#' +#' @param target Target directory where the files will be downloaded and the database compiled. Default is +#' \code{\link{get_ecotox_path}()}. +#' @param write_log A \code{logical} value indicating whether a log file should be written to the target path +#' after building the SQLite database. See \code{\link{build_ecotox_sqlite}()} for more details. Default is +#' \code{TRUE}.
+#' @param ask There are several steps in which files are (potentially) overwritten or deleted. In those cases +#' the user is asked on the command line what to do. Set this parameter to \code{FALSE} in order +#' to continue without warning and asking. +#' @return Returns \code{NULL} invisibly. +#' @rdname download_ecotox_data +#' @name download_ecotox_data +#' @examples +#' \dontrun{ +#' download_ecotox_data() +#' } +#' @author Pepijn de Vries +#' @export +download_ecotox_data <- function(target = get_ecotox_path(), write_log = TRUE, ask = TRUE) { + avail <- check_ecotox_availability(target) + if (avail && ask) { + cat(sprintf("A local database already exists (%s).", paste(attributes(avail)$files$database, collapse = ", "))) + prompt <- readline(prompt = "Do you wish to continue and potentially overwrite the existing database? (y/n) ") + if (!startsWith(toupper(prompt), "Y")) { + cat("Download aborted...\n") + return(invisible(NULL)) + } + } + if (!dir.exists(target)) dir.create(target, recursive = T) + ## Obtain download link from EPA website: + cat("Obtaining download link from EPA website... ") + con <- url("https://cfpub.epa.gov/ecotox/index.cfm") + link <- rvest::read_html(con) + link <- rvest::html_nodes(link, "a.ascii-link") + link <- rvest::html_attr(link, "href") + link <- link[!is.na(link) & endsWith(link, ".zip")] + dest_path <- file.path(target, utils::tail(unlist(strsplit(link, "/")), 1)) + closeAllConnections() + if (length(link) == 0) stop("Could not find ASCII download link...") + cat(crayon::green("Done\n")) + proceed.download <- T + if (file.exists(dest_path) && ask) { + prompt <- readline(prompt = sprintf("ECOTOX data is already present (%s), overwrite (y/n)?
", dest_path)) + proceed.download <- startsWith("Y", toupper(prompt)) + } + if (proceed.download) { + cat(sprintf("Start downloading ECOTOX data from %s...\n", link)) + con <- url(link, "rb") + dest <- file(gsub(".zip", ".incomplete.download", dest_path, fixed = T), "wb") + mb <- 0 + repeat { + read <- readBin(con, "raw", 1024*1024) ## download in 1MB chunks. + writeBin(read, dest) + mb <- mb + 1 + cat(sprintf("\r%i MB downloaded...", mb)) + if (length(read) == 0) break + } + closeAllConnections() + cat(crayon::green(" Done\n")) + } + file.rename(gsub(".zip", ".incomplete.download", dest_path, fixed = T), dest_path) + + ## create bib-file for later reference + con <- file(gsub(".zip", "_cit.txt", dest_path), "w+") + release <- as.Date(stringr::str_sub(link, -15, -1), format = "_%m_%d_%Y.zip") + writeLines(format(utils::bibentry( + "misc", + title = format(release, "US EPA ECOTOXicology Database System Version 5.0 release %Y-%m-%d"), + author = utils::person(family = "US EPA", role = "aut"), + year = format(release, "%Y"), + url = link, + howpublished = link, + note = format(Sys.Date(), "Accessed: %Y-%m-%d")), "R"), con) + close(con) + extr.path <- gsub(".zip", "", dest_path) + proceed.unzip <- T + if (dir.exists(extr.path)) { + test.files <- list.files(extr.path) + if (length(test.files) >= 12 && any(test.files == "chemical_carriers.txt") && ask) { + cat("EXtracted zip files already appear to exist.\n") + prompt <- readline(prompt = "Continue unzipping and overwriting these files (y/n)? ") + proceed.unzip <- startsWith("Y", toupper(prompt)) + } + } + if (proceed.unzip) { + cat("Extracting downloaded zip file... ") + utils::unzip(file.path(target, utils::tail(unlist(strsplit(link, "/")), 1)), exdir = target) + cat(crayon::green("Done\n")) + if (ask && + startsWith("Y", toupper(readline(prompt = "Done extracting zip file, remove it to save disk space (y/n)? ")))) { + cat("Trying to delete zip file... 
") + tryCatch({ + file.remove(file.path(target, utils::tail(unlist(strsplit(link, "/")), 1))) + cat(crayon::green("Done\n")) + }, error = function(e) { + cat(crayon::red("Failed to delete the file, continuing with next step")) + }) + } + } + cat("Start constructing SQLite database from downloaded tables...\n") + cat("Note that this may take some time...\n") + build_ecotox_sqlite(extr.path, target, write_log) + return(invisible(NULL)) +} + +#' Build an SQLite database from zip archived tables downloaded from EPA website +#' +#' This function is called automatically after \code{\link{download_ecotox_data}}. The database files can +#' also be downloaded manually from the \href{https://cfpub.epa.gov/ecotox/}{EPA website} from which a local +#' database can be build using this function. +#' +#' Raw data downloaded from the EPA website is in itself not very efficient to work with in R. The files are large +#' and would put a large strain on R when loading completely into the system's memory. Instead use this function +#' to build an SQLite database from the tables. That way, the data can be queried without having to load it all into +#' memory. +#' +#' EPA provides the raw table from the \href{https://cfpub.epa.gov/ecotox/}{ECOTOX database} as text files with +#' pipe-characters ('|') as table column separators. Although not documented, the tables appear not to contain comment +#' or quotation characters. There are records containing the reserved pipe-character that will confuse the table parser. +#' For these records, the pipe-character is replaced with a dash character ('-'). +#' +#' In addition, while reading the tables as text files, this package attempts to decode the text as UTF8. Unfortunately, +#' this process appears to be platform-dependent, and may therefore result in different end-results on different platforms. +#' This problem only seems to occur for characters that are listed as 'control characters' under UTF8. 
This will have +#' consequences for reproducibility, but only if you build search queries that look for such special characters. It is +#' therefore advised to stick to common (non-accented) alpha-numerical characters in your searches, for the sake of +#' reproducibility. +#' +#' @param source A \code{character} string pointing to the directory path where the text files with the raw +#' tables are located. These can be obtained by extracting the zip archive from \url{https://cfpub.epa.gov/ecotox/} +#' (look for 'Download ASCII Data'). +#' @param destination A \code{character} string representing the destination path for the SQLite file. By default +#' this is \code{\link{get_ecotox_path}()}. +#' @param write_log A \code{logical} value indicating whether a log file should be written in the destination path +#' after building the SQLite database. See \code{\link{build_ecotox_sqlite}()} for more details. Default is +#' \code{TRUE}. The log contains information on the source and destination path, the version of this package, +#' the creation date, and the operating system on which the database was created. +#' @return Returns \code{NULL} invisibly. +#' @rdname build_ecotox_sqlite +#' @name build_ecotox_sqlite +#' @examples +#' \dontrun{ +#' ## This example will only work properly if 'dir' points to an existing directory +#' ## with the raw tables from the ECOTOX database. This function will be called +#' ## automatically after a call to 'download_ecotox_data()'.
+#' test <- check_ecotox_availability() +#' if (test) { +#' files <- attributes(test)$files[1,] +#' dir <- gsub(".sqlite", "", files$database, fixed = T) +#' path <- files$path +#' if (dir.exists(file.path(path, dir))) { +#' build_ecotox_sqlite(source = file.path(path, dir), destination = get_ecotox_path()) +#' } +#' } +#' } +#' @author Pepijn de Vries +#' @export +build_ecotox_sqlite <- function(source, destination = get_ecotox_path(), write_log = TRUE) { + dbname <- paste0(basename(source), ".sqlite") + dbcon <- RSQLite::dbConnect(RSQLite::SQLite(), file.path(destination, dbname)) + + ## Loop the text file tables and add them to the sqlite database 1 by 1 + by(.db_specs, .db_specs$table, function(tab) { + cat(sprintf("Adding '%s' table to database:\n", tab$table[[1]])) + filename <- file.path(source, paste0(tab$table[[1]], ".txt")) + if (!file.exists(filename)) filename <- file.path(source, "validation", paste0(tab$table[[1]], ".txt")) + + ## Remove table from database if it already exists + RSQLite::dbExecute(dbcon, sprintf("DROP TABLE IF EXISTS [%s];", tab$table[[1]])) + + ## specify query to create the table in the sqlite database + foreign_keys <- tab[tab$foreign_key != "",, drop = F] + if (nrow(foreign_keys) > 0) { + foreign_keys <- apply(foreign_keys, 1, function(x) { + sprintf("\tFOREIGN KEY(%s) REFERENCES [%s]", x[["field_name"]], x[["foreign_key"]]) + }) + foreign_keys <- paste(foreign_keys, collapse = ",\n") + } else foreign_keys <- "" + query <- tab[,names(tab) %in% c("field_name", "data_type", "primary_key", "not_null")] + query[is.na(query)] <- "" + query <- apply(query, 1, paste, collapse = " ") + query <- paste(paste0("\t", trimws(query)), collapse = ",\n") + if (foreign_keys != "") query <- paste0(query, ",\n", foreign_keys) + query <- sprintf("CREATE TABLE [%s](\n%s\n);", tab$table[[1]], query) + RSQLite::dbExecute(dbcon, query) + + head <- NULL + lines.read <- 1 + ## Copy tables in 50000 line fragments to database, to avoid memory issues + 
frag.size <- 50000 + cat(sprintf("\r 0 lines (incl. header) added of '%s' added to database", tab$table[[1]])) + repeat { + if (is.null(head)) { + head <- iconv(readr::read_lines(filename, skip = 0, n_max = 1, progress = F), to = "UTF8", sub = "*") + } else { + testsize <- ifelse(lines.read == 1, frag.size - 1, frag.size) + body <- readr::read_lines(filename, skip = lines.read, n_max = testsize, progress = F) + body <- suppressWarnings({iconv(body, to = "UTF8", sub = "*")}) + ## Replace pipe-characters with dashes when they are between brackets "(" and ")". + ## These should not be interpreted as table separators and will mess up the read.table call + body <- stringr::str_replace_all(body, "(?<=\\().+?(?=\\))", function(x){ + if (grepl("[\\(/]", x)) return(x) ## there should not be another opening bracket or forward slash; in that case leave as is + gsub("[|]", "-", x) + }) + + lines.read <- lines.read + length(body) + + table.frag <- utils::read.table(text = c(head, body[1:1]), + sep = "|", header = T, quote = "", comment.char = "", + stringsAsFactors = F, strip.white = F) + + ## strip.white is set to F, as white spaces occur in primary keys! + table.frag <- utils::read.table(text = c(head, body), + sep = "|", header = T, quote = "", comment.char = "", + stringsAsFactors = F, strip.white = F) + + RSQLite::dbWriteTable(dbcon, tab$table[[1]], table.frag, append = T) + cat(sprintf("\r %i lines (incl. 
header) added of '%s' added to database", lines.read, tab$table[[1]])) + if (length(body) < testsize) break + } + } + cat(crayon::green(" Done\n")) + }) + RSQLite::dbDisconnect(dbcon) + if (write_log) { + logfile <- file.path(destination, paste0(basename(source), ".log")) + downloadinfo <- file.path(destination, paste0(basename(source), "_cit.txt")) + writeLines(text = sprintf( + "ECOTOXr SQLite log\n\nSource: %s\nDestination: %s\nDownload info: %s\nBuild with: %s\nBuild on: %s\nBuild date: %s", + source, + destination, + ifelse(file.exists(downloadinfo), downloadinfo, "Not available"), + paste0("ECOTOXr V", utils::packageVersion("ECOTOXr")), + paste(Sys.info()[c("sysname", "release")], collapse = " "), + format(Sys.Date(), "%Y-%m-%d") + ), + con = logfile) + } + return(invisible(NULL)) +} diff --git a/R/sysdata.rda b/R/sysdata.rda new file mode 100644 index 0000000..3c891c8 Binary files /dev/null and b/R/sysdata.rda differ diff --git a/R/wrappers.r b/R/wrappers.r new file mode 100644 index 0000000..1da0ba4 --- /dev/null +++ b/R/wrappers.r @@ -0,0 +1,294 @@ +#' Search and retrieve toxicity records from the database +#' +#' Create (and execute) an SQL search query based on basic search terms and options. This allows you to search +#' the database, without having to understand SQL. +#' +#' The ECOTOX database is stored locally as an SQLite file, which can be queried with SQL. These functions +#' allow you to automatically generate an SQL query and send it to the database, without having to understand +#' SQL. The function \code{search_query_ecotox} generates and returns the SQL query (which can be edited by +#' hand if desired). You can also directly call \code{search_ecotox}, this will first generate the query, +#' send it to the database and retrieve the result. +#' +#' +#' Although the generated query is not optimized for speed, it should be able to process most common searches +#' within an acceptable time. 
The time required for retrieving data from a search query depends on the complexity +#' of the query, the size of the query and the speed of your machine. Most queries should be completed within +#' seconds (or several minutes at most) on modern machines. If your search requires optimisation for speed, +#' you could try reordering the search fields. You can also edit the query generated with \code{search_query_ecotox} +#' by hand and retrieve the results with \code{\link[DBI]{dbGetQuery}}. +#' +#' Note that this package is actively maintained and this function may be revised in future versions. +#' In order to create reproducible results the user must always work with an official release from +#' CRAN and document the package and database version that are used to generate specific results (see also +#' \code{\link{cite_ecotox}()}). +#' @param search A named \code{list} containing the search terms. The names of the elements should refer to +#' the field (i.e. table header) in which the terms are searched. Use \code{\link{list_ecotox_fields}()} to +#' obtain a list of available field names. +#' +#' Each element in that list should contain another list with at least one element named 'terms'. This should +#' contain a \code{vector} of \code{character} strings with search terms. Optionally, a second element +#' named 'method' can be provided which should be set to either '\code{contains}' (default, when missing) or +#' '\code{exact}'. In the first case the query will match any record in the indicated field that contains +#' the search term. In case of '\code{exact}' it will only return exact matches. Note that searches are +#' not case sensitive, but are picky with special (accented) characters. While building the local database +#' (see \link{build_ecotox_sqlite}) such special characters may be treated differently on different +#' operating systems. For the sake of reproducibility, the user is advised to stick with non-accented +#' alpha-numeric characters. 
+#' +#' Search terms for a specific field (table header) will be combined with 'or', meaning that any record that +#' matches any of the terms is returned. For instance when 'latin_name' 'Daphnia magna' and 'Skeletonema costatum' +#' are searched, results for both species are returned. Search terms across fields (table headers) are combined with +#' 'and', which will narrow the search. For instance if 'chemical_name' 'benzene' is searched in combination +#' with 'latin_name' 'Daphnia magna', only tests where Daphnia magna are exposed to benzene are returned. +#' +#' When the search behaviour described above is not desirable, the user can either adjust the query manually, +#' or use this function to perform several separate searches and combine the results afterwards. +#' +#' Beware that some field names are ambiguous and occur in multiple tables (like 'cas_number' and 'code'). +#' When searching such fields, the search result may not be as expected. +#' @param output_fields A \code{vector} of \code{character} strings indicating which field names (table headers) +#' should be included in the output. By default \code{\link{list_ecotox_fields}("default")} is used. Use +#' \code{\link{list_ecotox_fields}("all")} to list all available fields. +#' @param group_by_results Ecological test results are generally the most informative element in the ECOTOX +#' database. Therefore, this search function returns a table with unique results in each row. +#' +#' However, some tables in the database (such as 'chemical_carriers' and 'dose_responses') have a one to many +#' relationship with test results. This means that multiple chemical carriers can be linked to a single test result; +#' similarly, multiple doses can also be linked to a single test result. +#' +#' By default the search results are grouped by test results. As a result not all doses or chemical carriers may +#' be displayed in the output. 
Set the \code{group_by_results} parameter to \code{FALSE} in order to force SQLite +#' to output all data (all carriers and doses). But beware that test results may be duplicated in those cases. +#' @param ... Arguments passed to \code{\link{dbConnectEcotox}}. You can use this when the database +#' is not located at the default path (\code{\link{get_ecotox_path}()}). +#' @return In case of \code{search_query_ecotox}, a \code{character} string containing an SQL +#' query is returned. This query is built based on the provided search terms and options. +#' +#' In case of \code{search_ecotox} a \code{data.frame} is returned based on the search query built with +#' \code{search_query_ecotox}. The \code{data.frame} is unmodified as returned by SQLite, meaning that all +#' fields are returned as \code{character}s (even where the field types are 'date' or 'numeric'). +#' +#' The results are tagged with: a time stamp; the package version used; and the +#' file path of the SQLite database used in the search (when applicable). These tags are added as attributes +#' to the output table or query. 
+#' @rdname search_ecotox +#' @name search_ecotox +#' @examples +#' \dontrun{ +#' ## let's find the ids of all ecotox tests on species +#' ## where latin names contain either of 2 specific genus names and +#' ## where they were exposed to the chemical benzene +#' if (check_ecotox_availability()) { +#' search <- +#' list( +#' latin_name = list( +#' terms = c("Skeletonema", "Daphnia"), +#' method = "contains" +#' ), +#' chemical_name = list( +#' terms = "benzene", +#' method = "exact" +#' ) +#' ) +#' ## numbers in result each represent a unique test id from the database +#' result <- search_ecotox(search) +#' query <- search_query_ecotox(search) +#' cat(query) +#' } else { +#' print("Sorry, you need to use 'download_ecotox_data()' first in order for this to work.") +#' } +#' } +#' @author Pepijn de Vries +#' @export +search_ecotox <- function(search, output_fields = list_ecotox_fields("default"), group_by_results = TRUE, ...) { + search <- search_query_ecotox(search, output_fields, group_by_results) + dbcon <- dbConnectEcotox(...) + query <- RSQLite::dbGetQuery(dbcon, search) + dbDisconnectEcotox(dbcon) + return(.add_tags(query, attributes(dbcon)$database_file)) +} + +#' @rdname search_ecotox +#' @name search_query_ecotox +#' @export +search_query_ecotox <- function(search, output_fields = list_ecotox_fields("default"), group_by_results = TRUE) { + ignored_fields <- !(output_fields %in% list_ecotox_fields("all")) + if (any(ignored_fields)) warning(sprintf("The following fields are unknown and ignored: %s.", + paste(output_fields[ignored_fields], collapse =", "))) + output_fields <- output_fields[!ignored_fields] + if (!any(grepl("^results.", output_fields))) { + warning("Output fields should contain at least 1 field from table 'results'. 
Adding 'test_id'.") + output_fields <- c("results.test_id", output_fields) + } + + ## identify key fields that are required for joining tables + db_links <- cbind(.db_specs, + do.call(rbind, lapply(strsplit(.db_specs$foreign_key, "\\(|\\)"), function(x) { + if (length(x) < 2) return(data.frame(foreign_table = "", foreign_field = "")) else + return(data.frame(foreign_table = x[1], foreign_field = x[2])) + }))) + db_links$is_key <- db_links$primary_key == "PRIMARY KEY" | db_links$foreign_key != "" | db_links$foreign_table != "" + key_output_fields <- .db_specs[db_links$is_key | paste(.db_specs$table, .db_specs$field_name, sep = ".") %in% output_fields,,drop = F] + output_fields <- .db_specs[paste(.db_specs$table, .db_specs$field_name, sep = ".") %in% output_fields,,drop = F] + + if (!is.list(search)) stop("Parameter 'search' needs to be a list!") + if (!all(unlist(lapply(search, is.list)))) stop("Each element of parameter 'search' should contain a list") + if (any(duplicated(names(search)))) stop("You have used duplicated search fields. 
Use each field only once in your search!") + search.tables <- do.call(rbind, lapply(names(search), function(fn) { + tables <- unique(.db_specs$table[.db_specs$field_name %in% fn]) + if (length(tables) == 0) stop(sprintf("Unknown search field: %s", fn)) + if (fn == "test_id") tables <- "tests" + x <- search[[fn]] + if (!all(names(x) %in% c("terms", "method"))) stop("Each search field can only contain two elements: 'terms' and 'method'.") + method <- match.arg(x[["method"]], c("contains", "exact")) + wildcard <- ifelse(method == "contains", "%", "") + collapse <- ifelse(method == "contains", " OR ", ", ") + prefix <- ifelse(method == "contains", sprintf("\"%s\" LIKE ", fn), "") + if (typeof(x$terms) != "character" || length(x$terms) == 0) stop("Provide at least 1 search term (type 'character')") + + terms <- paste(sprintf("%s\"%s%s%s\"", + prefix, + wildcard, + x$terms, + wildcard), + collapse = collapse) + if (method == "exact") { + terms <- sprintf("\"%s\" COLLATE NOCASE IN (%s)", fn, terms) + } + return (data.frame(table = tables, + terms = terms, + method = method)) + })) + search.tables <- rbind(search.tables, + data.frame(table = unique(c("results", "tests", with(output_fields, table[!(table %in% search.tables$table)]))), + terms = "", method = "")) + search.tables$id <- sprintf("search%03i", seq_len(nrow(search.tables))) + search.tables$select <- unlist(lapply(seq_len(nrow(search.tables)), function(i) { + out <- key_output_fields[key_output_fields$table == search.tables$table[i],,drop = F] + paste(paste(search.tables$id[i], sprintf("\"%s\"", out$field_name), sep = "."), collapse = ", ") + })) + search.tables$query <- + with(search.tables, + sprintf("SELECT %s FROM \"%s\" AS %s%s", + select, + table, + id, + ifelse(terms != "", sprintf(" WHERE %s", terms), "") + ) + ) + ## species and species_synonyms need to be combined before we can continue + if (any(search.tables$table == "species_synonyms")) { + sp_id <- search.tables$table == "species" + ss_id <- 
search.tables$table == "species_synonyms" + sp <- search.tables[sp_id,] + select <- gsub(sp$id, "syns", sp$select) + q <- search.tables$query[ss_id] + q <- sprintf("SELECT %s FROM species AS syns INNER JOIN (%s) USING(species_number)", + select, + q) + search.tables$query[sp_id] <- sprintf("SELECT * FROM (%s UNION ALL %s) AS spec", search.tables$query[sp_id], q) + search.tables$id[sp_id] <- "syns" + search.tables$select[sp_id] <- select + search.tables <- search.tables[!ss_id,] + } + search.tables$linked_to <- "" + search.tables$linked_by <- "" + search.tables$linked_from <- "" + j <- 1 + for (tab in search.tables$table[!(search.tables$table %in% c("results", "tests"))]) { + repeat { + i <- which(search.tables$table == tab) + links <- subset(db_links, (db_links$table == tab & db_links$foreign_table != "") | db_links$foreign_table == tab) + exclude <- c("chemical_carriers", "doses", "dose_responses", "dose_response_details") + exclude <- exclude[!(exclude %in% output_fields$table)] + links <- subset(links, !links$table %in% exclude) + inverselink <- subset(links, links$table == tab & links$field_name %in% c("test_id", "result_id")) + if (nrow(inverselink) > 0) { + search.tables$linked_to[i] <- inverselink$foreign_table + search.tables$linked_by[i] <- inverselink$foreign_field + search.tables$linked_from[i] <- inverselink$field_name + break + } else { + links <- links[1,] + if (links$table %in% c("results", "tests")) { + search.tables$linked_to[i] <- links$table + search.tables$linked_by[i] <- links$field_name + search.tables$linked_from[i] <- links$foreign_field + break + } + temp_sel <- db_links$field_name[db_links$table == links$table] + search.tables$select[i] <- gsub(search.tables$id[i], sprintf("target%03i", j), search.tables$select[i]) + search.tables$id[i] <- sprintf("target%03i", j) + search.tables$select[i] <- paste(c(search.tables$select[i], + sprintf("%s.\"%s\"", sprintf("source%03i", j), temp_sel)), collapse = ", ") + search.tables$query[i] <- + 
sprintf("SELECT %s FROM \"%s\" AS source%03i\nLEFT JOIN (%s) target%03i ON source%03i.%s = target%03i.\"%s\"", + search.tables$select[i], + links$table, + j, + search.tables$query[i], + j, j, + links$field_name, + j, + links$foreign_field) + if (tab == links$table) stop("Can't build an SQL query using these parameters.") + tab <- links$table + } + } + j <- j + 1 + } + tests.query.tabs <- subset(search.tables, search.tables$linked_to == "tests") + tests.query.withs <- paste0("WITH ", paste(sprintf("%s AS (\n%s\n)", tests.query.tabs$id, tests.query.tabs$query), collapse = ",\n")) + tests.query.select <- unique(sprintf("\"%s\"", key_output_fields$field_name[key_output_fields$table == "tests"])) + + results.query.tabs <- subset(search.tables, search.tables$table == "results") + results.query.where <- results.query.tabs$terms[results.query.tabs$terms != ""] + tests_from_results <- sprintf("tests.test_id IN (SELECT DISTINCT test_id FROM results WHERE %s)", results.query.where) + + tests.query.where <- subset(search.tables, search.tables$table == "tests" & search.tables$terms != "") + tests.query.where <- paste( + sprintf("(%s)", + c(if (length(tests_from_results) > 0) tests_from_results else NULL, + if (length(tests.query.where$terms) > 0) tests.query.where$terms else NULL, + with(subset(tests.query.tabs, tests.query.tabs$terms != ""), + sprintf("tests.\"%s\" IN (SELECT \"%s\" FROM \"%s\")", + linked_by, linked_from, id)))), + collapse = " AND ") + tests.query.tabs$extrawhere <- rep("", nrow(tests.query.tabs)) + tests.query.tabs$extrawhere <- sprintf(" WHERE \"%s\".\"%s\" IN (SELECT DISTINCT tests_agg.\"%s\" FROM tests_agg)", + tests.query.tabs$table, + tests.query.tabs$linked_from, + tests.query.tabs$linked_by) + tests.query <- sprintf("%s\nSELECT %s FROM tests%s%s\n", + tests.query.withs, + paste(tests.query.select, collapse = ", "), + ifelse(tests.query.where == "", "", " WHERE "), + tests.query.where) + + tests.query <- paste0("WITH tests_agg AS (", tests.query, + 
")\nSELECT * FROM tests_agg\n", + paste(sprintf("LEFT JOIN (SELECT * FROM \"%s\"%s) AS %s ON tests_agg.\"%s\" = %s.\"%s\"", + tests.query.tabs$table, + tests.query.tabs$extrawhere, + tests.query.tabs$id, + tests.query.tabs$linked_by, + tests.query.tabs$id, + tests.query.tabs$linked_from), collapse = "\n")) + results.query.where <- paste( + c(sprintf("(%s)", results.query.where), + "results.test_id IN (SELECT DISTINCT test_id FROM tests_agg)"), + collapse = " AND ") + results.query <- sprintf(paste0("WITH tests_agg AS (%s)\n", + "SELECT * FROM (SELECT DISTINCT * FROM results WHERE %s)\n", + "INNER JOIN (SELECT * FROM tests_agg) USING(test_id)%s"), + tests.query, + results.query.where, + ifelse(group_by_results, "GROUP BY result_id", "") + ) + + results.query <- sprintf("SELECT %s FROM\n(%s)", + paste(sprintf("\"%s\"", unique(output_fields$field_name)), collapse = ", "), + results.query + ) + return(.add_tags(results.query)) +} diff --git a/man/ECOTOXr.Rd b/man/ECOTOXr.Rd new file mode 100644 index 0000000..1601020 --- /dev/null +++ b/man/ECOTOXr.Rd @@ -0,0 +1,118 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/ECOTOXr.r +\docType{package} +\name{ECOTOXr} +\alias{ECOTOXr} +\title{Package description} +\description{ +Everything you need to know when you start using the ECOTOXr package. +} +\details{ +The ECOTOXr provides the means to efficiently search, extract and analyse \href{https://www.epa.gov/}{US EPA} +\href{https://cfpub.epa.gov/ecotox/}{ECOTOX} data, with a focus on reproducible results. Although the package +creator/maintainer is confident in the quality of this software, it is the end users sole responsibility to +assure the quality of his or her work while using this software. As per the provided license terms the package +maintainer is not liable for any damage resulting from its usage. That being said, below we present some tips +for generating reproducible results with this package. 
+} +\section{How do I get started?}{ + +Installing this package is only the first step to get things started. You need to perform the following steps +in order to use the package to its full capacity. + +\itemize{ +\item{ +First download a copy of the complete EPA database. This can be done by calling \code{\link{download_ecotox_data}}. +This may not work on all machines as R does not always accept the website SSL certificate from the EPA. +In those cases the zipped archive with the database files can be downloaded manually with a (more +forgiving) browser. The files from the zip archive can be extracted to a location of choice. +} +\item{ +Next, an SQLite database needs to be built from the downloaded files. This is done automatically if +you used \code{\link{download_ecotox_data}} in the previous step. When you have manually downloaded the files +you can call \code{\link{build_ecotox_sqlite}} to build the database locally. +} +\item{ +When the previous steps have been performed successfully, you can now search the database by calling +\code{\link{search_ecotox}}. You can also use \code{\link{dbConnectEcotox}} to open a connection to the +database. You can query the database using this connection and any of the methods provided by the +\link[DBI:DBI]{DBI} or \link[RSQLite:RSQLite]{RSQLite} packages. +} +} +} + +\section{How do I obtain reproducible results?}{ + +Each individual user is responsible for evaluating the reproducibility of his or her work. Although +this package offers instruments to achieve reproducibility, it is not guaranteed. In order to increase the +chances of generating reproducible results, one should adhere at least to the following rules: +\itemize{ +\item{ +Always use an official release from CRAN, and cite the version used in your analyses (\code{citation("ECOTOXr")}). +Different versions may produce different end results (although we will strive for backward compatibility). 
+} +\item{ +Make sure you are working with a clean (unaltered) version of the database. When in doubt, download and build +a fresh copy of the database (\code{\link{download_ecotox_data}}). Also cite the (release) version of the downloaded +database (\code{\link{cite_ecotox}}), and the operating system on which the local database was built +(\code{\link{get_ecotox_info}}). Or, just make sure that you never modify the database (e.g., write data to it, delete +data from it, etc.). +} +\item{ +In order to avoid platform dependencies it is advised to only include non-accented alpha-numerical characters in +search terms. See also \link{search_ecotox} and \link{build_ecotox_sqlite}. +} +\item{ +When trying to reproduce database extractions from earlier database releases, filter out additions after +that specific release. This can be done by adding output fields 'tests.modified_date', 'tests.created_date' and +'tests.published_date' to your search and comparing those with the release date of the database you are trying to +reproduce results from. +} +} +} + +\section{Why isn't the database included in the package?}{ + +This package doesn't come bundled with a copy of the database, which needs to be downloaded the first time the +package is used. Why is this? There are several reasons: +\itemize{ +\item{ +The database is maintained and updated by the \href{https://www.epa.gov/}{US EPA}. This process is and should be +outside the sphere of influence of the package maintainer. +} +\item{ +Packages on CRAN are not allowed to contain large amounts of data. Publication on CRAN is key to control +the quality of this package and therefore outweighs the convenience of having the data bundled with the package. +} +\item{ +The user has full control over the release version of the database that is being used. 
+} +} +} + +\section{Why doesn't this package search the online ECOTOX database?}{ + +Although this is possible, there are several reasons why we opted for creating a local copy: +\itemize{ +\item{ +The user would be restricted to the search options provided on the website (\href{https://cfpub.epa.gov/ecotox/}{ECOTOX}). +} +\item{ +The online database doesn't come with an API that would allow for a convenient interface. +} +\item{ +The user is not limited by an internet connection and its bandwidth. +} +\item{ +Not all database fields can be retrieved from the online interface. +} +} +} + +\references{ +Official US EPA ECOTOX website: +\url{https://cfpub.epa.gov/ecotox/} +} +\author{ +Pepijn de Vries +} diff --git a/man/build_ecotox_sqlite.Rd b/man/build_ecotox_sqlite.Rd new file mode 100644 index 0000000..5f61cf0 --- /dev/null +++ b/man/build_ecotox_sqlite.Rd @@ -0,0 +1,65 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/init.r +\name{build_ecotox_sqlite} +\alias{build_ecotox_sqlite} +\title{Build an SQLite database from zip archived tables downloaded from EPA website} +\usage{ +build_ecotox_sqlite(source, destination = get_ecotox_path(), write_log = TRUE) +} +\arguments{ +\item{source}{A \code{character} string pointing to the directory path where the text files with the raw +tables are located. These can be obtained by extracting the zip archive from \url{https://cfpub.epa.gov/ecotox/} +(look for 'Download ASCII Data').} + +\item{destination}{A \code{character} string representing the destination path for the SQLite file. By default +this is \code{\link{get_ecotox_path}()}.} + +\item{write_log}{A \code{logical} value indicating whether a log file should be written in the destination path +after building the SQLite database. Default is \code{TRUE}. The log contains information on the source and destination path, the version of this package, +the creation date, and the operating system on which the database was created.} +} +\value{ +Returns \code{NULL} invisibly. 
+} +\description{ +This function is called automatically after \code{\link{download_ecotox_data}}. The database files can +also be downloaded manually from the \href{https://cfpub.epa.gov/ecotox/}{EPA website}, from which a local +database can be built using this function. +} +\details{ +Raw data downloaded from the EPA website is in itself not very efficient to work with in R. The files are large +and would put a large strain on R when loaded completely into the system's memory. Instead use this function +to build an SQLite database from the tables. That way, the data can be queried without having to load it all into +memory. + +EPA provides the raw tables from the \href{https://cfpub.epa.gov/ecotox/}{ECOTOX database} as text files with +pipe-characters ('|') as table column separators. Although not documented, the tables appear not to contain comment +or quotation characters. There are records containing the reserved pipe-character that will confuse the table parser. +For these records, the pipe-character is replaced with a dash character ('-'). + +In addition, while reading the tables as text files, this package attempts to decode the text as UTF8. Unfortunately, +this process appears to be platform-dependent, and may therefore result in different end-results on different platforms. +This problem only seems to occur for characters that are listed as 'control characters' under UTF8. This will have +consequences for reproducibility, but only if you build search queries that look for such special characters. It is +therefore advised to stick to common (non-accented) alpha-numerical characters in your searches, for the sake of +reproducibility. +} +\examples{ +\dontrun{ +## This example will only work properly if 'dir' points to an existing directory +## with the raw tables from the ECOTOX database. This function will be called +## automatically after a call to 'download_ecotox_data()'. 
+test <- check_ecotox_availability() +if (test) { + files <- attributes(test)$files[1,] + dir <- gsub(".sqlite", "", files$database, fixed = T) + path <- files$path + if (dir.exists(file.path(path, dir))) { + build_ecotox_sqlite(source = file.path(path, dir), destination = get_ecotox_path()) + } +} +} +} +\author{ +Pepijn de Vries +} diff --git a/man/check_ecotox_availability.Rd b/man/check_ecotox_availability.Rd new file mode 100644 index 0000000..6abfc59 --- /dev/null +++ b/man/check_ecotox_availability.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/init.r +\name{check_ecotox_availability} +\alias{check_ecotox_availability} +\title{Check whether a ECOTOX database exists locally} +\usage{ +check_ecotox_availability(target = get_ecotox_path()) +} +\arguments{ +\item{target}{A \code{character} string specifying the path where to look for the database file.} +} +\value{ +Returns a \code{logical} value indicating whether a copy of the database exists. It also returns +a \code{files} attribute that lists which copies of the database are found. +} +\description{ +Tests whether a local copy of the US EPA ECOTOX database exists in \code{\link{get_ecotox_path}}. +} +\details{ +When arguments are omitted, this function will look in the default directory (\code{\link{get_ecotox_path}}). +However, it is possible to build a database file elsewhere if necessary. 
+} +\examples{ +check_ecotox_availability() +} +\author{ +Pepijn de Vries +} diff --git a/man/cite_ecotox.Rd b/man/cite_ecotox.Rd new file mode 100644 index 0000000..f663849 --- /dev/null +++ b/man/cite_ecotox.Rd @@ -0,0 +1,39 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/database_access.r +\name{cite_ecotox} +\alias{cite_ecotox} +\title{Cite the downloaded copy of the ECOTOX database} +\usage{ +cite_ecotox(path = get_ecotox_path(), version) +} +\arguments{ +\item{path}{A \code{character} string with the path to the location of the local database (default is +\code{\link{get_ecotox_path}()}).} + +\item{version}{A \code{character} string referring to the release version of the database you wish to locate. +It should have the same format as the date in the EPA download link, which is month, day, year, separated by +underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.} +} +\value{ +Returns a \code{vector} of \code{\link{bibentry}}'s, containing a reference to the downloaded database +and this package. +} +\description{ +Cite the downloaded copy of the ECOTOX database and this package for reproducible results. +} +\details{ +When you download a copy of the EPA ECOTOX database using \code{\link{download_ecotox_data}()}, a BibTex file +is stored that registers the database release version and the access (= download) date. Use this function +to obtain a citation to that specific download. + +In order for others to reproduce your results, it is key to cite the data source as accurately as possible. 
+} +\examples{ +\dontrun{ +## In order to cite downloaded database and this package: +cite_ecotox() +} +} +\author{ +Pepijn de Vries +} diff --git a/man/dbConnectEcotox.Rd b/man/dbConnectEcotox.Rd new file mode 100644 index 0000000..722d965 --- /dev/null +++ b/man/dbConnectEcotox.Rd @@ -0,0 +1,53 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/database_access.r +\name{dbConnectEcotox} +\alias{dbConnectEcotox} +\alias{dbDisconnectEcotox} +\title{Open or close a connection to the local ECOTOX database} +\usage{ +dbConnectEcotox(path = get_ecotox_path(), version, ...) + +dbDisconnectEcotox(conn, ...) +} +\arguments{ +\item{path}{A \code{character} string with the path to the location of the local database (default is +\code{\link{get_ecotox_path}()}).} + +\item{version}{A \code{character} string referring to the release version of the database you wish to locate. +It should have the same format as the date in the EPA download link, which is month, day, year, separated by +underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.} + +\item{...}{Arguments that are passed to \code{\link[RSQLite:SQLite]{dbConnect}} method +or \code{\link[RSQLite:SQLite]{dbDisconnect}} method.} + +\item{conn}{An open connection to the ECOTOX database that needs to be closed.} +} +\value{ +A database connection in the form of a \code{\link[DBI]{DBIConnection-class}} object. +The object is tagged with: a time stamp; the package version used; and the +file path of the SQLite database used in the connection. These tags are added as attributes +to the object. +} +\description{ +Wrappers for \code{\link[RSQLite:SQLite]{dbConnect}} and \code{\link[RSQLite:SQLite]{dbDisconnect}} methods. +} +\details{ +Open or close a connection to the local ECOTOX database. These functions are only required when you want +to send custom queries to the database. 
For most searches the \code{\link{search_ecotox}} function +will be adequate. +} +\examples{ +\dontrun{ +## This will only work when a copy of the database exists: +con <- dbConnectEcotox() + +## check if the connection works by listing the tables in the database: +dbListTables(con) + +## Let's be a good boy/girl and close the connection to the database when we're done: +dbDisconnectEcotox(con) +} +} +\author{ +Pepijn de Vries +} diff --git a/man/download_ecotox_data.Rd b/man/download_ecotox_data.Rd new file mode 100644 index 0000000..2ad2d4f --- /dev/null +++ b/man/download_ecotox_data.Rd @@ -0,0 +1,48 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/init.r +\name{download_ecotox_data} +\alias{download_ecotox_data} +\title{Download and extract ECOTOX database files and compose database} +\usage{ +download_ecotox_data(target = get_ecotox_path(), write_log = TRUE, ask = TRUE) +} +\arguments{ +\item{target}{Target directory where the files will be downloaded and the database compiled. Default is +\code{\link{get_ecotox_path}()}.} + +\item{write_log}{A \code{logical} value indicating whether a log file should be written to the target path. +Default is \code{TRUE}.} + +\item{ask}{There are several steps in which files are (potentially) overwritten or deleted. In those cases +the user is asked on the command line what to do. Set this parameter to \code{FALSE} in order +to continue without warning and asking.} +} +\value{ +Returns \code{NULL} invisibly. +} +\description{ +In order for this package to fully function, a local copy of the ECOTOX database needs to be built. +This function will download the required data and build the database. +} +\details{ +This function will attempt to find the latest download URL for the ECOTOX database from the EPA website. +When found it will attempt to download the zipped archive containing all required data. This data is then +extracted and a local copy of the database is built. 
+} +\section{Known issues}{ + +On some machines this function fails to connect to the database download URL from the EPA website due to missing +SSL certificates. Unfortunately, there is no easy fix for this in this package. A workaround is to download and +unzip the file manually using a different machine or browser that is less strict with SSL certificates. You can +then call \code{\link{build_ecotox_sqlite}()} and point the \code{source} location to the manually extracted zip +archive. +} + +\examples{ +\dontrun{ +download_ecotox_data() +} +} +\author{ +Pepijn de Vries +} diff --git a/man/get_ecotox_info.Rd b/man/get_ecotox_info.Rd new file mode 100644 index 0000000..9197ee9 --- /dev/null +++ b/man/get_ecotox_info.Rd @@ -0,0 +1,36 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/database_access.r +\name{get_ecotox_info} +\alias{get_ecotox_info} +\title{Get information on the local ECOTOX database when available} +\usage{ +get_ecotox_info(path = get_ecotox_path(), version) +} +\arguments{ +\item{path}{A \code{character} string with the path to the location of the local database (default is +\code{\link{get_ecotox_path}()}).} + +\item{version}{A \code{character} string referring to the release version of the database you wish to locate. +It should have the same format as the date in the EPA download link, which is month, day, year, separated by +underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.} +} +\value{ +Returns a \code{vector} of \code{character}s, containing information on the selected local ECOTOX database. +} +\description{ +Get information on how and when the local ECOTOX database was built. +} +\details{ +Get information on how and when the local ECOTOX database was built. This information is retrieved +from the log file that is (optionally) stored with the local database when calling \code{\link{download_ecotox_data}} +or \code{\link{build_ecotox_sqlite}}.
+} +\examples{ +\dontrun{ +## Show info on the current database (only works when one is downloaded and built): +get_ecotox_info() +} +} +\author{ +Pepijn de Vries +} diff --git a/man/get_path.Rd b/man/get_path.Rd new file mode 100644 index 0000000..407857c --- /dev/null +++ b/man/get_path.Rd @@ -0,0 +1,42 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/database_access.r, R/init.r +\name{get_ecotox_sqlite_file} +\alias{get_ecotox_sqlite_file} +\alias{get_ecotox_path} +\title{The local path to the ECOTOX database (directory or sqlite file)} +\usage{ +get_ecotox_sqlite_file(path = get_ecotox_path(), version) + +get_ecotox_path() +} +\arguments{ +\item{path}{When you have a copy of the database somewhere other than the default +directory (\code{\link{get_ecotox_path}()}), you can provide the path here.} + +\item{version}{A \code{character} string referring to the release version of the database you wish to locate. +It should have the same format as the date in the EPA download link, which is month, day, year, separated by +underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically.} +} +\value{ +Returns a \code{character} string of the path. +\code{get_ecotox_path} will return the default directory of the database. +\code{get_ecotox_sqlite_file} will return the path to the sqlite file when it exists. +} +\description{ +Obtain the local path to where the ECOTOX database is (or will be) placed. +} +\details{ +It can be useful to know where the database is located on your disk. This function +returns the location as provided by \code{\link[rappdirs]{app_dir}}.
+} +\examples{ +get_ecotox_path() + +\dontrun{ +## This will only work if a local database exists: +get_ecotox_sqlite_file() +} +} +\author{ +Pepijn de Vries +} diff --git a/man/list_ecotox_fields.Rd b/man/list_ecotox_fields.Rd new file mode 100644 index 0000000..05a65b4 --- /dev/null +++ b/man/list_ecotox_fields.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/database_access.r +\name{list_ecotox_fields} +\alias{list_ecotox_fields} +\title{List the field names that are available from the ECOTOX database} +\usage{ +list_ecotox_fields(which = c("default", "full", "all"), include_table = TRUE) +} +\arguments{ +\item{which}{A \code{character} string that specifies which fields to return. Can be any of: +'\code{default}': returns default output field names; '\code{all}': returns all fields; or +'\code{full}': returns all except fields from table 'dose_response_details'.} + +\item{include_table}{A \code{logical} value indicating whether the table name should be included +as prefix. Default is \code{TRUE}.} +} +\value{ +Returns a \code{vector} of type \code{character} containing the field names from the ECOTOX database. +} +\description{ +List the field names (table headers) that are available from the ECOTOX database +} +\details{ +This can be useful when specifying a \code{\link{search_ecotox}}, to identify which fields +are available from the database, for searching and output. 
+} +\examples{ +## Fields that are included in search results by default: +list_ecotox_fields("default") + +## All fields that are available from the ECOTOX database: +list_ecotox_fields("all") + +## All except fields from the table 'dose_response_details' +## that are available from the ECOTOX database: +list_ecotox_fields("full") +} +\author{ +Pepijn de Vries +} diff --git a/man/search_ecotox.Rd b/man/search_ecotox.Rd new file mode 100644 index 0000000..9749a83 --- /dev/null +++ b/man/search_ecotox.Rd @@ -0,0 +1,130 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wrappers.r +\name{search_ecotox} +\alias{search_ecotox} +\alias{search_query_ecotox} +\title{Search and retrieve toxicity records from the database} +\usage{ +search_ecotox( + search, + output_fields = list_ecotox_fields("default"), + group_by_results = TRUE, + ... +) + +search_query_ecotox( + search, + output_fields = list_ecotox_fields("default"), + group_by_results = TRUE +) +} +\arguments{ +\item{search}{A named \code{list} containing the search terms. The names of the elements should refer to +the field (i.e. table header) in which the terms are searched. Use \code{\link{list_ecotox_fields}()} to +obtain a list of available field names. + +Each element in that list should contain another list with at least one element named 'terms'. This should +contain a \code{vector} of \code{character} strings with search terms. Optionally, a second element +named 'method' can be provided which should be set to either '\code{contains}' (default, when missing) or +'\code{exact}'. In the first case the query will match any record in the indicated field that contains +the search term. In case of '\code{exact}' it will only return exact matches. Note that searches are +not case sensitive, but are picky with special (accented) characters.
While building the local database +(see \link{build_ecotox_sqlite}) such special characters may be treated differently on different +operating systems. For the sake of reproducibility, the user is advised to stick with non-accented +alpha-numeric characters. + +Search terms for a specific field (table header) will be combined with 'or', meaning that any record that +matches any of the terms is returned. For instance, when 'latin_name' 'Daphnia magna' and 'Skeletonema costatum' +are searched, results for both species are returned. Search terms across fields (table headers) are combined with +'and', which will narrow the search. For instance, if 'chemical_name' 'benzene' is searched in combination +with 'latin_name' 'Daphnia magna', only tests where Daphnia magna are exposed to benzene are returned. + +When the search behaviour described above is not desirable, the user can either adjust the query manually, +or use this function to perform several separate searches and combine the results afterwards. + +Beware that some field names are ambiguous and occur in multiple tables (like 'cas_number' and 'code'). +When searching such fields, the search result may not be as expected.} + +\item{output_fields}{A \code{vector} of \code{character} strings indicating which field names (table headers) +should be included in the output. By default \code{\link{list_ecotox_fields}("default")} is used. Use +\code{\link{list_ecotox_fields}("all")} to list all available fields.} + +\item{group_by_results}{Ecological test results are generally the most informative element in the ECOTOX +database. Therefore, this search function returns a table with unique results in each row. + +However, some tables in the database (such as 'chemical_carriers' and 'dose_responses') have a one-to-many +relationship with test results. This means that multiple chemical carriers can be linked to a single test result; +similarly, multiple doses can also be linked to a single test result.
+ +By default the search results are grouped by test results. As a result not all doses or chemical carriers may +be displayed in the output. Set the \code{group_by_results} parameter to \code{FALSE} in order to force SQLite +to output all data (all carriers and doses). But beware that test results may be duplicated in those cases.} + +\item{...}{Arguments passed to \code{\link{dbConnectEcotox}}. You can use this when the database +is not located at the default path (\code{\link{get_ecotox_path}()}).} +} +\value{ +In case of \code{search_query_ecotox}, a \code{character} string containing an SQL +query is returned. This query is built based on the provided search terms and options. + +In case of \code{search_ecotox} a \code{data.frame} is returned based on the search query built with +\code{search_query_ecotox}. The \code{data.frame} is unmodified as returned by SQLite, meaning that all +fields are returned as \code{character}s (even where the field types are 'date' or 'numeric'). + +The results are tagged with: a time stamp; the package version used; and the +file path of the SQLite database used in the search (when applicable). These tags are added as attributes +to the output table or query. +} +\description{ +Create (and execute) an SQL search query based on basic search terms and options. This allows you to search +the database, without having to understand SQL. +} +\details{ +The ECOTOX database is stored locally as an SQLite file, which can be queried with SQL. These functions +allow you to automatically generate an SQL query and send it to the database, without having to understand +SQL. The function \code{search_query_ecotox} generates and returns the SQL query (which can be edited by +hand if desired). You can also directly call \code{search_ecotox}, this will first generate the query, +send it to the database and retrieve the result. 
+ + +Although the generated query is not optimized for speed, it should be able to process most common searches +within an acceptable time. The time required for retrieving data from a search query depends on the complexity +of the query, the size of the query and the speed of your machine. Most queries should be completed within +seconds (or several minutes at most) on modern machines. If your search requires optimisation for speed, +you could try reordering the search fields. You can also edit the query generated with \code{search_query_ecotox} +by hand and retrieve the result with \code{\link[DBI]{dbGetQuery}}. + +Note that this package is actively maintained and this function may be revised in future versions. +In order to create reproducible results the user must: always work with an official release from +CRAN and document the package and database version that are used to generate specific results (see also +\code{\link{cite_ecotox}()}). +} +\examples{ +\dontrun{ +## let's find the ids of all ecotox tests on species +## where latin names contain either of 2 specific genus names and +## where they were exposed to the chemical benzene +if (check_ecotox_availability()) { + search <- + list( + latin_name = list( + terms = c("Skeletonema", "Daphnia"), + method = "contains" + ), + chemical_name = list( + terms = "benzene", + method = "exact" + ) + ) + ## numbers in result each represent a unique test id from the database + result <- search_ecotox(search) + query <- search_query_ecotox(search) + cat(query) +} else { + print("Sorry, you need to use 'download_ecotox_data()' first in order for this to work.") +} +} +} +\author{ +Pepijn de Vries +} diff --git a/tests/testthat.R b/tests/testthat.R new file mode 100644 index 0000000..94ea054 --- /dev/null +++ b/tests/testthat.R @@ -0,0 +1,4 @@ +library(testthat) +library(ECOTOXr) + +test_check("ECOTOXr") diff --git a/tests/testthat/test_that.r b/tests/testthat/test_that.r new file mode 100644 index 0000000..767f475 --- /dev/null
+++ b/tests/testthat/test_that.r @@ -0,0 +1,293 @@ +check_db <- function() { + if (!check_ecotox_availability()) { + skip("ECOTOX database not available") + } +} + +simple_search1 <- if (check_ecotox_availability()) { + suppressWarnings(search_ecotox( + list(latin_name = list(terms = "Daphnia magna"), chemical_name = list(terms = "benzene")), + c(list_ecotox_fields(), "results.result_id", "results.test_id", "tests.reference_number"))) +} else NULL + +simple_search2 <- if (check_ecotox_availability()) { + suppressWarnings({search_ecotox(list(test_id = list(terms = "1")))}) +} else NULL + +simple_search3 <- if (check_ecotox_availability()) { + suppressWarnings({search_ecotox(list(latin_name = list(terms = "perdix perdix"), test_cas = list(terms="1336363")), + c(list_ecotox_fields(), "results.result_id", "results.test_id", "tests.reference_number"))}) +} else NULL + +throws_errors <- function(expression) { + result <- F + tryCatch(expression, error = function(e) {result <<- T}, warning = function(w) {invisible(NULL)}) + result +} + +################################# +################################# +#### #### +#### TEST 01 #### +#### #### +################################# +################################# + + +test_that("All tables in the database are specified", { + check_db() + expect_true({ + dbcon <- suppressWarnings(dbConnectEcotox()) + test <- all(dbListTables(dbcon) %in% ECOTOXr:::.db_specs$table) + dbDisconnectEcotox(dbcon) + test + }) +}) + +################################# +################################# +#### #### +#### TEST 02 #### +#### #### +################################# +################################# + +test_that("All specified tables are in the database", { + check_db() + expect_true({ + dbcon <- suppressWarnings(dbConnectEcotox()) + test <- all(ECOTOXr:::.db_specs$table %in% dbListTables(dbcon)) + dbDisconnectEcotox(dbcon) + test + }) +}) + +################################# +################################# +#### #### +#### TEST 03 
#### +#### #### +################################# +################################# + +test_that("All fields in the database are specified", { + check_db() + expect_true({ + dbcon <- suppressWarnings(dbConnectEcotox()) + tables <- dbListTables(dbcon) + test <- all(unlist(lapply(tables, function(tab) { + all(dbListFields(dbcon, tab) %in% subset(ECOTOXr:::.db_specs, table == tab)$field_name) + }))) + dbDisconnectEcotox(dbcon) + test + }) +}) + +################################# +################################# +#### #### +#### TEST 04 #### +#### #### +################################# +################################# + +test_that("All specified fields are in the database", { + check_db() + expect_true({ + dbcon <- suppressWarnings(dbConnectEcotox()) + tables <- dbListTables(dbcon) + test <- all(unlist(lapply(tables, function(tab) { + all(subset(ECOTOXr:::.db_specs, table == tab)$field_name %in% dbListFields(dbcon, tab)) + }))) + dbDisconnectEcotox(dbcon) + test + }) +}) + +################################# +################################# +#### #### +#### TEST 05 #### +#### #### +################################# +################################# + +test_that("Getting the path to the ECOTOX database doesn't throw errors", { + expect_false(throws_errors({get_ecotox_path()})) +}) + +################################# +################################# +#### #### +#### TEST 06 #### +#### #### +################################# +################################# + +test_that("Getting SQLite file location doesn't throw errors", { + check_db() + expect_false(throws_errors({get_ecotox_sqlite_file()})) +}) + +################################# +################################# +#### #### +#### TEST 07 #### +#### #### +################################# +################################# + +test_that("A simple search results in expected table", { + check_db() + expect_true({ + ## Compare result with anticipated ids: + all( + simple_search1$test_id %in% + c("1020021", 
"1020022", "1020023", "1022155", "1031085", "1031086", "1031087", "1031088", "1031196", "1031197", + "1064409", "1064410", "1064411", "1072942", "1072943", "1072944", "1083684", "1083685", "1083686", "1098939", + "1098940", "1098941", "1098942", "1098943", "1098944", "1098945", "1098946", "1098947", "1098948", "1098949", + "1098950", "1125798", "1136665", "1136666", "1142641", "1152541", "1185661", "1185662", "1185663", "1187783", + "1189253", "1237724", "2113979", "2114101", "2194929") + ) + }) +}) + +################################# +################################# +#### #### +#### TEST 08 #### +#### #### +################################# +################################# + +test_that("A simple search results in unique result ids", { + check_db() + expect_true({ + ## Compare result with anticipated ids: + all(!duplicated(simple_search1$result_id)) + }) +}) + +################################# +################################# +#### #### +#### TEST 09 #### +#### #### +################################# +################################# + +test_that("A simple when there is a reference number there is a publication year.", { + check_db() + expect_false({ + any(is.na(simple_search1$publication_year) & !is.na(simple_search1$reference_number)) + }) +}) + +################################# +################################# +#### #### +#### TEST 10 #### +#### #### +################################# +################################# + +test_that("A simple search does not necessarily result in unique result ids when chemical carriers are added to output", { + check_db() + expect_false({ + results <- suppressWarnings( + search_ecotox( + list(test_id = list(terms = "1000260")), + c("tests.test_id", "results.result_id", "chemical_carriers.carrier_id"), + group_by_results = FALSE) + ) + ## Compare result with anticipated ids: + all(!duplicated(results$result_id)) + }) +}) + +################################# +################################# +#### #### +#### TEST 11 
#### +#### #### +################################# +################################# + +test_that("Default field names are fewer than all field names", { + expect_true({ + length(list_ecotox_fields("default")) < length(list_ecotox_fields("all")) + }) +}) + +################################# +################################# +#### #### +#### TEST 12 #### +#### #### +################################# +################################# + +test_that("A simple search query returns a single element of type character", { + check_db() + expect_true({ + search <- search_query_ecotox(list(test_id = list(terms = "1"))) + length(search) == 1 && typeof(search) == "character" + }) +}) + +################################# +################################# +#### #### +#### TEST 13 #### +#### #### +################################# +################################# + +test_that("A query doesn't mistakenly return a field name as value", { + check_db() + expect_false({ + (is.null(simple_search2$test_grade) || all(simple_search2$test_grade == "test_grade")) + }) +}) + +################################# +################################# +#### #### +#### TEST 14 #### +#### #### +################################# +################################# + +test_that("No duplicated results are returned when searching for test id", { + check_db() + expect_false({ + any(duplicated(simple_search2)) + }) +}) + +################################# +################################# +#### #### +#### TEST 15 #### +#### #### +################################# +################################# + +test_that("When multiple doses are linked to a result, no duplicates are returned", { + check_db() + expect_false({ + any(duplicated(simple_search3)) + }) +}) + +################################# +################################# +#### #### +#### TEST 16 #### +#### #### +################################# +################################# + +test_that("get_ecotox_info doesn't throw an error.", { +
expect_false({ throws_errors(get_ecotox_info()) }) +})