diff --git a/.Rbuildignore b/.Rbuildignore index 094bd58..47557b4 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -9,3 +9,5 @@ cran-comments.md ^revdep$ ^docs$ ^\.github$ +^vignettes/evergreenreviewgraphs\.Rmd\.orig$ +^vignettes/introducing-europepmc\.Rmd\.orig$ \ No newline at end of file diff --git a/vignettes/evergreenreviewgraphs.Rmd b/vignettes/evergreenreviewgraphs.Rmd index bd5b7e2..5c051bc 100644 --- a/vignettes/evergreenreviewgraphs.Rmd +++ b/vignettes/evergreenreviewgraphs.Rmd @@ -1,7 +1,7 @@ --- title: "Making trend graphs" author: "Najko Jahn" -date: "`r Sys.Date()`" +date: "2021-08-24" output: rmarkdown::html_vignette vignette: > %\VignetteEngine{knitr::rmarkdown} @@ -9,25 +9,7 @@ vignette: > \usepackage[utf8]{inputenc} --- -```{r, echo=FALSE} -knitr::opts_chunk$set( - comment = "#>", - collapse = TRUE, - warning = FALSE, - message = FALSE, - echo = TRUE, - fig.width = 7, - fig.height = 4 -) -options(scipen = 999, digits = 2) -knitr::knit_hooks$set(inline = function(x) { - if(is.numeric(x)){ - return(prettyNum(x, big.mark=",")) - }else{ - return(x) - } - }) -``` + Trend graphs in literature reviews show the development of concepts in scholarly communication. Some trend graphs, however, don't acknowledge that the number of scholarly publications is growing each year, but simply display the absolute number of hits they have found for a given concept. Noam Ross called these misleading graphs evergreen review graphs because of their enduring popularity in review papers. Examples can be found on Twitter under the Hashtag [#evergreenreviewgraph](https://twitter.com/hashtag/evergreenreviewgraph). @@ -38,9 +20,20 @@ This vignette guides you how to make proper trend graphs when reviewing Europe P We use `epmc_hits_trend()` function, which was firstly introduced in Maëlle Salmon's blog post about "How not to make an evergreen review graph"[^1]. The function takes a query in the Europe PMC search syntax[^2] and the period of years over which to perform the search as arguments, and returns a data-frame with year, total number of hits (`all_hits`) and number of hits for the query (`query_hits`). -```{r} + +```r library(europepmc) europepmc::epmc_hits_trend(query = "aspirin", period = 2010:2016) +#> # A tibble: 7 × 3 +#> year all_hits query_hits +#> +#> 1 2010 851021 7219 +#> 2 2011 904618 7888 +#> 3 2012 945899 9054 +#> 4 2013 1003845 10127 +#> 5 2014 1055435 10895 +#> 6 2015 1095859 11750 +#> 7 2016 1116157 12099 ``` By default, synonym search is disabled and only Medline/PubMed index is searched. @@ -53,11 +46,26 @@ By default, synonym search is disabled and only Medline/PubMed index is searched ### Use Case: Growth of Open Access Literature -There is a growing interest in knowing the proportion of open access to scholarly literature. Europe PMC allows searching for open access content with the [`OPEN_ACCESS:Y` parameter](https://europepmc.org/search?query=OPEN_ACCESS:Y&page=1&sortby=Relevance). At the moment, Europe PMC contains `r europepmc::epmc_hits("OPEN_ACCESS:Y")` open access full-texts. Let's see how they are relatively distributed over the period 1995 - 2016. +There is a growing interest in knowing the proportion of open access to scholarly literature. Europe PMC allows searching for open access content with the [`OPEN_ACCESS:Y` parameter](https://europepmc.org/search?query=OPEN_ACCESS:Y&page=1&sortby=Relevance). At the moment, Europe PMC contains 3,740,002 open access full-texts. Let's see how they are relatively distributed over the period 1995 - 2016. -```{r, fig.align='center'} + +```r tt_oa <- europepmc::epmc_hits_trend("OPEN_ACCESS:Y", period = 1995:2016, synonym = FALSE) tt_oa +#> # A tibble: 22 × 3 +#> year all_hits query_hits +#> +#> 1 1995 449064 3337 +#> 2 1996 458526 3508 +#> 3 1997 456744 3665 +#> 4 1998 474613 3836 +#> 5 1999 493745 3918 +#> 6 2000 532019 4328 +#> 7 2001 545674 5479 +#> 8 2002 561426 5894 +#> 9 2003 588572 7148 +#> 10 2004 628141 9795 +#> # … with 12 more rows # we use ggplot2 for plotting the graph library(ggplot2) ggplot(tt_oa, aes(year, query_hits / all_hits)) + @@ -67,6 +75,8 @@ ggplot(tt_oa, aes(year, query_hits / all_hits)) + ylab("Proportion of OA full-texts in Europe PMC") ``` +oa in europe pmc + Be careful with the interpretation of the slower growth in the last years because there are several ways how open access content is added to Europe PMC including the digitalization of back issues.[^3] [^3]: See section "Content Growth" in: McEntyre JR, Ananiadou S, Andrews S, et al. UKPMC: a full text article resource @@ -93,7 +103,8 @@ We only want to search reference lists. Because Europe PMC does not index refere Let's prepare the queries for links to the above mentioned code hosting services: -```{r} + +```r dvcs <- c("code.google.com", "github.com", "sourceforge.net", "bitbucket.org", "cran.r-project.org") # make queries including reference section @@ -102,7 +113,8 @@ dvcs_query <- paste0('REF:"', dvcs, '"') and get publications for which Europe PMC gives access to reference lists for normalizing the review graph. -```{r} + +```r library(dplyr) my_df <- purrr::map_df(dvcs_query, function(x) { # get number of publications with indexed reference lists @@ -115,6 +127,20 @@ my_df <- purrr::map_df(dvcs_query, function(x) { dplyr::select(year, all_hits, refs_hits, query_hits, query_id) }) my_df +#> # A tibble: 40 × 5 +#> year all_hits refs_hits query_hits query_id +#> +#> 1 2009 793068 555477 13 "REF:\"code.google.com\"" +#> 2 2010 851021 540514 40 "REF:\"code.google.com\"" +#> 3 2011 904618 603060 65 "REF:\"code.google.com\"" +#> 4 2012 945899 635448 92 "REF:\"code.google.com\"" +#> 5 2013 1003845 761512 135 "REF:\"code.google.com\"" +#> 6 2014 1055435 797039 140 "REF:\"code.google.com\"" +#> 7 2015 1095859 779562 117 "REF:\"code.google.com\"" +#> 8 2016 1116157 782630 65 "REF:\"code.google.com\"" +#> 9 2009 793068 555477 2 "REF:\"github.com\"" +#> 10 2010 851021 540514 10 "REF:\"github.com\"" +#> # … with 30 more rows ### total hits_summary <- my_df %>% @@ -122,15 +148,24 @@ hits_summary <- my_df %>% summarise(all = sum(query_hits)) %>% arrange(desc(all)) hits_summary +#> # A tibble: 5 × 2 +#> query_id all +#> +#> 1 "REF:\"cran.r-project.org\"" 8221 +#> 2 "REF:\"github.com\"" 1609 +#> 3 "REF:\"code.google.com\"" 667 +#> 4 "REF:\"sourceforge.net\"" 643 +#> 5 "REF:\"bitbucket.org\"" 94 ``` -The proportion of papers where Europe PMC was able to make the cited literature available was `r round(sum(my_df$refs_hits) / sum(my_df$all_hits), digits = 2) *100` for the period 2009-2016. There also seems to be a time-lag between indexing reference lists because the absolute number of publication was decreasing over the years. This is presumably because Europe PMC also includes delayed open access content, i.e. content which is not added immediately with the original publication.[^4] +The proportion of papers where Europe PMC was able to make the cited literature available was 70 for the period 2009-2016. There also seems to be a time-lag between indexing reference lists because the absolute number of publication was decreasing over the years. This is presumably because Europe PMC also includes delayed open access content, i.e. content which is not added immediately with the original publication.[^4] [^4]: Ebd. Now, let's make a proper review graph normalizing our query results with the number of publications with indexed references. -```{r} + +```r library(ggplot2) ggplot(my_df, aes(factor(year), query_hits / refs_hits, group = query_id, color = query_id)) + @@ -141,6 +176,8 @@ ggplot(my_df, aes(factor(year), query_hits / refs_hits, group = query_id, ylab("Proportion of articles in Europe PMC") ``` +literature links to software in europe pmc + #### Discussion and Conclusion Although this figure illustrates the relative popularity of citing code hosted by CRAN and GitHub in recent years, there are some limits that needs to be discussed. As said before, Europe PMC does not extract reference lists from every indexed publication. It furthermore remains open whether and to what extent software is cited outside the reference section, i.e. as footnote or in the acknowledgements. diff --git a/vignettes/evergreenreviewgraphs.Rmd.orig b/vignettes/evergreenreviewgraphs.Rmd.orig new file mode 100644 index 0000000..f36f6d6 --- /dev/null +++ b/vignettes/evergreenreviewgraphs.Rmd.orig @@ -0,0 +1,160 @@ +--- +title: "Making trend graphs" +author: "Najko Jahn" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteEngine{knitr::rmarkdown} + %\VignetteIndexEntry{Making trend graphs} + \usepackage[utf8]{inputenc} +--- + +```{r, echo=FALSE} +knitr::opts_chunk$set( + comment = "#>", + collapse = TRUE, + warning = FALSE, + message = FALSE, + echo = TRUE, + fig.width = 7, + fig.height = 4 +) +options(scipen = 999, digits = 2) +knitr::knit_hooks$set(inline = function(x) { + if(is.numeric(x)){ + return(prettyNum(x, big.mark=",")) + }else{ + return(x) + } + }) +``` + +Trend graphs in literature reviews show the development of concepts in scholarly communication. Some trend graphs, however, don't acknowledge that the number of scholarly publications is growing each year, but simply display the absolute number of hits they have found for a given concept. Noam Ross called these misleading graphs evergreen review graphs because of their enduring popularity in review papers. Examples can be found on Twitter under the Hashtag [#evergreenreviewgraph](https://twitter.com/hashtag/evergreenreviewgraph). + +This vignette guides you how to make proper trend graphs when reviewing Europe PMC literature. In these graphs, the number of hits found is divided by the total number of records indexed in Europe PMC for a given search query. + +## Preparing proper review graphs with `epmc_hits_trend()` + +We use `epmc_hits_trend()` function, which was firstly introduced in Maëlle Salmon's blog post about "How not to make an evergreen review graph"[^1]. The function takes a query in the Europe PMC search syntax[^2] and the period of years over which to perform the search as arguments, and returns a data-frame with year, total number of hits (`all_hits`) and number of hits for the query (`query_hits`). + + +```{r} +library(europepmc) +europepmc::epmc_hits_trend(query = "aspirin", period = 2010:2016) +``` + +By default, synonym search is disabled and only Medline/PubMed index is searched. + +[^1]: + +[^2]: Europe PMC Search Syntax: + +## Use Cases + +### Use Case: Growth of Open Access Literature + +There is a growing interest in knowing the proportion of open access to scholarly literature. Europe PMC allows searching for open access content with the [`OPEN_ACCESS:Y` parameter](https://europepmc.org/search?query=OPEN_ACCESS:Y&page=1&sortby=Relevance). At the moment, Europe PMC contains `r europepmc::epmc_hits("OPEN_ACCESS:Y")` open access full-texts. Let's see how they are relatively distributed over the period 1995 - 2016. + +```{r oa_pmc, fig.align="center", fig.path="../vignettes/", fig.alt="oa in europe pmc"} +tt_oa <- europepmc::epmc_hits_trend("OPEN_ACCESS:Y", period = 1995:2016, synonym = FALSE) +tt_oa +# we use ggplot2 for plotting the graph +library(ggplot2) +ggplot(tt_oa, aes(year, query_hits / all_hits)) + + geom_point() + + geom_line() + + xlab("Year published") + + ylab("Proportion of OA full-texts in Europe PMC") +``` + +Be careful with the interpretation of the slower growth in the last years because there are several ways how open access content is added to Europe PMC including the digitalization of back issues.[^3] + +[^3]: See section "Content Growth" in: McEntyre JR, Ananiadou S, Andrews S, et al. UKPMC: a full text article resource + for the life sciences. *Nucleic Acids Research*. 2011;39(Database):D58–D65. . + +### Use Case: Cited open source software in scholarly publications + +Another nice use case for trend graphs is to study how code and software repositories are cited in scientific literature. In recent years, it has become a good practice not only to re-use openly available software, but also to cite them. The FORCE11 Software Citation Working Group states: + +> In general, we believe that software should be cited on the same basis as any other research product such as a paper or book; that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers. [(doi:10.7717/peerj-cs.86)](https://doi.org/10.7717/peerj-cs.86) + +So let's see whether we can find evidence for this evolving practice by creating a proper review graph. As a start, we examine these four general purpose hosting services for version-controlled code: + +- [code.google.com](https://code.google.com/) +- [github.com](https://github.com/) +- [sourceforge.net](https://sourceforge.net/) +- [bitbucket.org](https://bitbucket.org/) + +and, of course, [CRAN](https://cran.r-project.org/), the R archive network. + +#### How to query Europe PMC? + +We only want to search reference lists. Because Europe PMC does not index references for its complete collection, we use `has_reflist:y` to restrict our search to those publications with reference lists. These literature sections can be searched with the `REF:` parameter. + +Let's prepare the queries for links to the above mentioned code hosting services: + +```{r} +dvcs <- c("code.google.com", "github.com", + "sourceforge.net", "bitbucket.org", "cran.r-project.org") +# make queries including reference section +dvcs_query <- paste0('REF:"', dvcs, '"') +``` + +and get publications for which Europe PMC gives access to reference lists for normalizing the review graph. + +```{r} +library(dplyr) +my_df <- purrr::map_df(dvcs_query, function(x) { + # get number of publications with indexed reference lists + refs_hits <- + europepmc::epmc_hits_trend("has_reflist:y", period = 2009:2016, synonym = FALSE)$query_hits + # get hit count querying for code repositories + europepmc::epmc_hits_trend(x, period = 2009:2016, synonym = FALSE) %>% + dplyr::mutate(query_id = x) %>% + dplyr::mutate(refs_hits = refs_hits) %>% + dplyr::select(year, all_hits, refs_hits, query_hits, query_id) +}) +my_df + +### total +hits_summary <- my_df %>% + group_by(query_id) %>% + summarise(all = sum(query_hits)) %>% + arrange(desc(all)) +hits_summary +``` + +The proportion of papers where Europe PMC was able to make the cited literature available was `r round(sum(my_df$refs_hits) / sum(my_df$all_hits), digits = 2) *100` for the period 2009-2016. There also seems to be a time-lag between indexing reference lists because the absolute number of publication was decreasing over the years. This is presumably because Europe PMC also includes delayed open access content, i.e. content which is not added immediately with the original publication.[^4] + +[^4]: Ebd. + +Now, let's make a proper review graph normalizing our query results with the number of publications with indexed references. + +```{r software_lit, fig.align="center", fig.path="../vignettes/", fig.alt="literature links to software in europe pmc"} +library(ggplot2) +ggplot(my_df, aes(factor(year), query_hits / refs_hits, group = query_id, + color = query_id)) + + geom_line(size = 1, alpha = 0.8) + + geom_point(size = 2) + + scale_color_brewer(name = "Query", palette = "Set1")+ + xlab("Year published") + + ylab("Proportion of articles in Europe PMC") +``` + +#### Discussion and Conclusion + +Although this figure illustrates the relative popularity of citing code hosted by CRAN and GitHub in recent years, there are some limits that needs to be discussed. As said before, Europe PMC does not extract reference lists from every indexed publication. It furthermore remains open whether and to what extent software is cited outside the reference section, i.e. as footnote or in the acknowledgements. + +Another problem of our query approach is that we did not consider that DOIs can also be used to cite software, a best-practice implemented by [Zenodo and GitHub](https://guides.github.com/activities/citable-code/) or the [The Journal of Open Source Software](https://joss.theoj.org/). + +Lastly, it actually remains unclear, which and what kind of software is cited how often. We could also not control if authors just cited the homepages and not a particular source code repository. One paper can also cite more than one code repository, which is also not represented in the trend graph. + +To conclude, a proper trend graph on the extent of software citation can only be the start for a more sophisticated approach that mines links to software repositories from scientific literature and fetches metadata about these code repositories from the hosting facilities. + +## Conclusion + +This vignette presented first steps on how to make trend graphs with `europepmc`. As our use-cases suggest, please carefully consider how you queried Europe PMC in the interpretation of your graph. Although trend graphs are a nice way to illustrate the development of certain concepts in scientific literature or recent trends in scholarly communication, they must be put in context in order to become meaningful. + +## Acknowledgements + +Big thanks to Maëlle Salmon for getting me started to write this vignette. diff --git a/vignettes/introducing-europepmc.Rmd b/vignettes/introducing-europepmc.Rmd index 0fbef96..55d373c 100644 --- a/vignettes/introducing-europepmc.Rmd +++ b/vignettes/introducing-europepmc.Rmd @@ -1,7 +1,7 @@ --- -title: "Introducing europepmc, an R interface to Europe PMC RESTful API" +title: "Overview" author: "Najko Jahn" -date: "`r Sys.Date()`" +date: "2021-08-24" output: rmarkdown::html_vignette vignette: > %\VignetteEngine{knitr::rmarkdown} @@ -10,14 +10,7 @@ vignette: > --- -```{r echo=FALSE} -knitr::opts_chunk$set( - comment = "#>", - collapse = TRUE, - warning = FALSE, - message = FALSE -) -``` + ## What is searched? @@ -40,21 +33,84 @@ In the following, some examples demonstrate how to search Europe PMC with R. `empc_search()` is the main function to query Europe PMC. It searches both metadata and fulltexts. -```{r} + +```r library(europepmc) europepmc::epmc_search('malaria') +#> # A tibble: 100 × 29 +#> id source pmid doi title authorString journalTitle issue journalVolume +#> +#> 1 34100426 MED 34100426 10.4… New … Lima MN, Ba… Neural Rege… 1 17 +#> 2 33341138 MED 33341138 10.1… Trip… Wang J, Xu … Lancet 10267 396 +#> 3 33341139 MED 33341139 10.1… Trip… van der Plu… Lancet 10267 396 +#> 4 33535760 MED 33535760 10.3… THE … Damiani E, … Acta Med Hi… 2 18 +#> 5 33530764 MED 33530764 10.1… Disc… Hoarau M, V… J Enzyme In… 1 36 +#> 6 33372863 MED 33372863 10.1… ATP2… Lamy A, Mac… Emerg Micro… 1 10 +#> 7 33594960 MED 33594960 10.1… Mana… Kambale-Kom… Hematology 1 26 +#> 8 34283002 MED 34283002 10.1… P… Alhassan AM… Pharm Biol 1 59 +#> 9 34184352 MED 34184352 10.1… Stru… Chhibber-Go… Protein Sci 9 30 +#> 10 34419123 MED 34419123 10.1… Burd… Dao F, Djon… Parasit Vec… 1 14 +#> # … with 90 more rows, and 20 more variables: pubYear , journalIssn , +#> # pageInfo , pubType , isOpenAccess , inEPMC , +#> # inPMC , hasPDF , hasBook , hasSuppl , +#> # citedByCount , hasReferences , hasTextMinedTerms , +#> # hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , pmcid , versionNumber ``` It is worth noting that Europe PMC expands queries with MeSH synonyms by default, a behavior which can be turned off with the `synonym` parameter. -```{r} + +```r europepmc::epmc_search('malaria', synonym = FALSE) +#> # A tibble: 100 × 29 +#> id source pmid doi title authorString journalTitle issue journalVolume +#> +#> 1 33341139 MED 33341139 10.1… Trip… van der Plu… Lancet 10267 396 +#> 2 33341138 MED 33341138 10.1… Trip… Wang J, Xu … Lancet 10267 396 +#> 3 34100426 MED 34100426 10.4… New … Lima MN, Ba… Neural Rege… 1 17 +#> 4 34184352 MED 34184352 10.1… Stru… Chhibber-Go… Protein Sci 9 30 +#> 5 34380494 MED 34380494 10.1… Publ… Heuschen AK… Malar J 1 20 +#> 6 33530764 MED 33530764 10.1… Disc… Hoarau M, V… J Enzyme In… 1 36 +#> 7 34399767 MED 34399767 10.1… Inve… Njau J, Sil… Malar J 1 20 +#> 8 PPR385006 PPR 10.2… Temp… Ingholt MM,… +#> 9 34419123 MED 34419123 10.1… Burd… Dao F, Djon… Parasit Vec… 1 14 +#> 10 34376219 MED 34376219 10.1… An a… Wanzira H, … BMC Health … 1 21 +#> # … with 90 more rows, and 20 more variables: pubYear , journalIssn , +#> # pageInfo , pubType , isOpenAccess , inEPMC , +#> # inPMC , hasPDF , hasBook , hasSuppl , +#> # citedByCount , hasReferences , hasTextMinedTerms , +#> # hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , pmcid , versionNumber ``` To get an exact match, use quotes as in the following example: -```{r} + +```r europepmc::epmc_search('"Human malaria parasites"') +#> # A tibble: 100 × 29 +#> id source pmid doi title authorString journalTitle pubYear journalIssn +#> +#> 1 34415329 MED 34415329 10.1… Func… Kimata-Arig… J Biochem 2021 "0021-924x… +#> 2 34087264 MED 34087264 10.1… Dive… Goh XT, Lim… Mol Biochem… 2021 "0166-6851… +#> 3 34400833 MED 34400833 10.1… A he… Tintó-Font … Nat Microbi… 2021 "2058-5276" +#> 4 33789941 MED 33789941 10.1… Addi… Kwon H, Sim… mSphere 2021 "2379-5042" +#> 5 34211355 MED 34211355 An E… Clark NF, T… Yale J Biol… 2021 "0044-0086… +#> 6 34362867 MED 34362867 10.4… High… Lai MY, Raf… Trop Biomed 2021 "0127-5720… +#> 7 33693917 MED 33693917 10.1… Non-… Antinori S,… J Travel Med 2021 "1195-1982… +#> 8 32470136 MED 32470136 10.1… C-te… Kimata-Arig… J Biochem 2020 "0021-924x… +#> 9 PPR353209 PPR 10.1… 5-me… Liu M, Guo … 2021 +#> 10 33797521 MED 33797521 10.4… Comp… Mat Salleh … Trop Biomed 2021 "0127-5720… +#> # … with 90 more rows, and 20 more variables: pubType , +#> # isOpenAccess , inEPMC , inPMC , hasPDF , hasBook , +#> # hasSuppl , citedByCount , hasReferences , +#> # hasTextMinedTerms , hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , journalVolume , pageInfo , +#> # issue , pmcid , versionNumber ``` ### Managing search results @@ -62,8 +118,29 @@ europepmc::epmc_search('"Human malaria parasites"') By default, 100 records are returned, but the number of results can be expanded or limited with the `limit` parameter. -```{r} + +```r europepmc::epmc_search('"Human malaria parasites"', limit = 10) +#> # A tibble: 10 × 28 +#> id source pmid doi title authorString journalTitle pubYear journalIssn +#> +#> 1 34415329 MED 34415329 10.1… Func… Kimata-Arig… J Biochem 2021 "0021-924x… +#> 2 34087264 MED 34087264 10.1… Dive… Goh XT, Lim… Mol Biochem… 2021 "0166-6851… +#> 3 34400833 MED 34400833 10.1… A he… Tintó-Font … Nat Microbi… 2021 "2058-5276" +#> 4 33789941 MED 33789941 10.1… Addi… Kwon H, Sim… mSphere 2021 "2379-5042" +#> 5 34211355 MED 34211355 An E… Clark NF, T… Yale J Biol… 2021 "0044-0086… +#> 6 34362867 MED 34362867 10.4… High… Lai MY, Raf… Trop Biomed 2021 "0127-5720… +#> 7 33693917 MED 33693917 10.1… Non-… Antinori S,… J Travel Med 2021 "1195-1982… +#> 8 32470136 MED 32470136 10.1… C-te… Kimata-Arig… J Biochem 2020 "0021-924x… +#> 9 PPR353209 PPR 10.1… 5-me… Liu M, Guo … 2021 +#> 10 33797521 MED 33797521 10.4… Comp… Mat Salleh … Trop Biomed 2021 "0127-5720… +#> # … with 19 more variables: pubType , isOpenAccess , inEPMC , +#> # inPMC , hasPDF , hasBook , hasSuppl , +#> # citedByCount , hasReferences , hasTextMinedTerms , +#> # hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , journalVolume , pageInfo , +#> # issue , pmcid ``` Results are sorted by relevance. Other options via the `sort` parameter are @@ -75,7 +152,8 @@ Results are sorted by relevance. Other options via the `sort` parameter are Sometimes, you would like to check, if articles are indexed in Europe PMC using DOI names, a widely used identifier for scholarly articles. Use `epmc_search_by_doi()` for this purpose. -```{r} + +```r my_dois <- c( "10.1159/000479962", "10.1002/sctm.17-0081", @@ -83,6 +161,19 @@ my_dois <- c( "10.1007/s12017-017-8447-9" ) europepmc::epmc_search_by_doi(doi = my_dois) +#> # A tibble: 4 × 28 +#> id source pmid doi title authorString journalTitle issue journalVolume +#> +#> 1 28957815 MED 28957815 10.1… Clin… Schnieder M… Eur Neurol 5-6 78 +#> 2 28941317 MED 28941317 10.1… Conc… Doeppner TR… Stem Cells … 11 6 +#> 3 29018132 MED 29018132 10.1… One-… Psychogios … Stroke 11 48 +#> 4 28623611 MED 28623611 10.1… Defe… Carboni E, … Neuromolecu… 2-3 19 +#> # … with 19 more variables: pubYear , journalIssn , pageInfo , +#> # pubType , isOpenAccess , inEPMC , inPMC , hasPDF , +#> # hasBook , hasSuppl , citedByCount , hasReferences , +#> # hasTextMinedTerms , hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , pmcid ``` ### Output options @@ -98,21 +189,83 @@ Please be aware that these lists can become very large. Use the Europe PMC query syntax to search by author names: -```{r} + +```r europepmc::epmc_search('AUTH:"Salmon Maelle"') +#> # A tibble: 10 × 28 +#> id source pmid doi title authorString journalTitle issue journalVolume +#> +#> 1 30378432 MED 30378432 10.1… When… Milà C, Sal… Environ Sci… 22 52 +#> 2 29778830 MED 29778830 10.1… Wear… Salmon M, M… Environ Int 117 +#> 3 29751338 MED 29751338 10.1… Use … Kumar MK, S… Environ Pol… 239 +#> 4 29330030 MED 29330030 10.1… Heal… Mueller N, … Prev Med 109 +#> 5 29626773 MED 29626773 10.1… Deve… Sanchez M, … Sci Total E… 634 +#> 6 29088243 MED 29088243 10.1… Time… Schumacher … PLoS One 10 12 +#> 7 28606699 MED 28606699 10.1… Inte… Tonne C, Sa… Int J Hyg E… 6 220 +#> 8 28708095 MED 28708095 10.3… Pred… Sanchez M, … Int J Envir… 7 14 +#> 9 27063588 MED 27063588 10.2… A sy… Salmon M, S… Euro Survei… 13 21 +#> 10 26250543 MED 26250543 10.1… Baye… Salmon M, S… Biom J 6 57 +#> # … with 19 more variables: pubYear , journalIssn , pageInfo , +#> # pubType , isOpenAccess , inEPMC , inPMC , hasPDF , +#> # hasBook , hasSuppl , citedByCount , hasReferences , +#> # hasTextMinedTerms , hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , pmcid ``` [Europe PMC Advanced Search](https://europepmc.org/advancesearch) has a auto-suggest field for author names if you feel unsure how the name you are searching for is indexed in Europe PMC. Using the Boolean `OR` operator allows searching for more than one spelling variant: -```{r} + +```r q <- 'AUTH:"PÜHLER Alfred" OR AUTH:"Pühler Alfred Prof. Dr." OR AUTH:"Puhler A"' europepmc::epmc_search(q, limit = 1000) +#> # A tibble: 590 × 29 +#> id source pmid pmcid doi title authorString journalTitle journalVolume +#> +#> 1 34367203 MED 34367203 PMC8… 10.3… ExoS… Geiger O, S… Front Plant… 12 +#> 2 34361893 MED 34361893 PMC8… 10.3… Indi… Hassa J, Kl… Microorgani… 9 +#> 3 34040261 MED 34040261 PMC8… 10.1… Swar… Warnat-Herr… Nature 594 +#> 4 33589928 MED 33589928 10.1… Impl… Mayer G, Mü… Brief Bioin… +#> 5 33643369 MED 33643369 PMC7… 10.3… Exop… Castellani … Front Plant… 12 +#> 6 33441124 MED 33441124 PMC7… 10.1… Dise… Aschenbrenn… Genome Med 13 +#> 7 PPR264825 PPR 10.2… The … Droste J, O… +#> 8 33220679 MED 33220679 10.1… Glob… Nilsson JF,… FEMS Microb… 97 +#> 9 33348776 MED 33348776 PMC7… 10.3… The … Maus I, Tub… Microorgani… 8 +#> 10 33296687 MED 33296687 PMC7… 10.1… Long… Bernardes J… Immunity 53 +#> # … with 580 more rows, and 20 more variables: pubYear , +#> # journalIssn , pageInfo , pubType , isOpenAccess , +#> # inEPMC , inPMC , hasPDF , hasBook , hasSuppl , +#> # citedByCount , hasReferences , hasTextMinedTerms , +#> # hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , issue , versionNumber ``` There is a considerable overlap between common names. The integration of ORCID, a persistent author identifier, allows unambiguous search for personal publications in Europe PMC. For example, here's how to search for publications written by Bernd Weisshaar (ORCID: ) sorted by the number of times cited in descending order: -```{r} + +```r europepmc::epmc_search('AUTHORID:"0000-0002-7635-3473"', limit = 200, sort = "cited") +#> # A tibble: 150 × 28 +#> id source pmid doi title authorString journalTitle issue journalVolume +#> +#> 1 21873998 MED 21873998 10.1… The … Wang X, Wan… Nat Genet 10 43 +#> 2 20674465 MED 20674465 10.1… MYB … Dubos C, St… Trends Plan… 10 15 +#> 3 11597504 MED 11597504 10.1… The … Stracke R, … Curr Opin P… 5 4 +#> 4 11906833 MED 11906833 10.1… bZIP… Jakoby M, W… Trends Plan… 3 7 +#> 5 14756321 MED 14756321 10.1… An A… Rosso MG, L… Plant Mol B… 1-2 53 +#> 6 12679534 MED 12679534 10.1… The … Heim MA, Ja… Mol Biol Ev… 5 20 +#> 7 11080161 MED 11080161 10.1… Tran… Jin H, Comi… EMBO J 22 19 +#> 8 15361138 MED 15361138 10.1… Comp… Zimmermann … Plant J 1 40 +#> 9 15255866 MED 15255866 10.1… TT2,… Baudry A, H… Plant J 3 39 +#> 10 17419845 MED 17419845 10.1… Diff… Stracke R, … Plant J 4 50 +#> # … with 140 more rows, and 19 more variables: pubYear , +#> # journalIssn , pageInfo , pubType , isOpenAccess , +#> # inEPMC , inPMC , hasPDF , hasBook , hasSuppl , +#> # citedByCount , hasReferences , hasTextMinedTerms , +#> # hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , pmcid ``` #### Annotations @@ -121,21 +274,78 @@ Europe PMC provides text-mined annotations contained in abstracts and open acces These automatically identified concepts and term can be retrieved at the article-level: -```{r} + +```r europepmc::epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601")) +#> # A tibble: 774 × 13 +#> source ext_id pmcid prefix exact postfix name uri id type section +#> +#> 1 MED 28585529 PMC5467160 "tive… Beta… " allo… Beta… http… http… Clin… Title … +#> 2 MED 28585529 PMC5467160 "nomi… genes ".\nRa… gene http… http… Sequ… Title … +#> 3 MED 28585529 PMC5467160 "nomi… genes " is o… gene http… http… Sequ… Abstra… +#> 4 MED 28585529 PMC5467160 " One… genes " are … gene http… http… Sequ… Abstra… +#> 5 MED 28585529 PMC5467160 " ide… beet " (Bet… Beta… http… http… Clin… Abstra… +#> 6 MED 28585529 PMC5467160 "ify … Beta… " ssp.… Beta… http… http… Clin… Abstra… +#> 7 MED 28585529 PMC5467160 "ulga… gene " Rz2 … gene http… http… Sequ… Abstra… +#> 8 MED 28585529 PMC5467160 "e ge… geno… " sequ… geno… http… http… Sequ… Abstra… +#> 9 MED 28585529 PMC5467160 "eque… beet ". Our… Beta… http… http… Clin… Abstra… +#> 10 MED 28585529 PMC5467160 "disc… genes " rele… gene http… http… Sequ… Abstra… +#> # … with 764 more rows, and 2 more variables: provider , subType ``` To obtain a list of articles where Europe PMC has text-minded annotations, either subset the resulting data.frame -```{r} + +```r tt <- epmc_search("malaria") tt[tt$hasTextMinedTerms == "Y" | tt$hasTMAccessionNumbers == "Y",] +#> # A tibble: 94 × 29 +#> id source pmid doi title authorString journalTitle issue journalVolume +#> +#> 1 34100426 MED 34100426 10.4… New … Lima MN, Ba… Neural Rege… 1 17 +#> 2 33535760 MED 33535760 10.3… THE … Damiani E, … Acta Med Hi… 2 18 +#> 3 33530764 MED 33530764 10.1… Disc… Hoarau M, V… J Enzyme In… 1 36 +#> 4 33372863 MED 33372863 10.1… ATP2… Lamy A, Mac… Emerg Micro… 1 10 +#> 5 33594960 MED 33594960 10.1… Mana… Kambale-Kom… Hematology 1 26 +#> 6 34283002 MED 34283002 10.1… P… Alhassan AM… Pharm Biol 1 59 +#> 7 34184352 MED 34184352 10.1… Stru… Chhibber-Go… Protein Sci 9 30 +#> 8 34362867 MED 34362867 10.4… High… Lai MY, Raf… Trop Biomed 3 38 +#> 9 34399767 MED 34399767 10.1… Inve… Njau J, Sil… Malar J 1 20 +#> 10 PPR385006 PPR 10.2… Temp… Ingholt MM,… +#> # … with 84 more rows, and 20 more variables: pubYear , journalIssn , +#> # pageInfo , pubType , isOpenAccess , inEPMC , +#> # inPMC , hasPDF , hasBook , hasSuppl , +#> # citedByCount , hasReferences , hasTextMinedTerms , +#> # hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate , pmcid , versionNumber ``` or expand the query choosing an annotation type or provider from the [Europe PMC Advanced Search](https://europepmc.org/advancesearch) query builder. -```{r} + +```r epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Europe PMC")') +#> # A tibble: 100 × 28 +#> id source pmid pmcid doi title authorString journalTitle issue +#> +#> 1 31782768 MED 31782768 PMC79… 10.1… Incre… Jongo SA, Ch… Clin Infect… 11 +#> 2 31808816 MED 31808816 PMC76… 10.1… Retin… Villaverde C… J Pediatric… 5 +#> 3 30989220 MED 30989220 PMC73… 10.1… Clini… Enane LA, Su… J Pediatric… 3 +#> 4 31300826 MED 31300826 PMC72… 10.1… Black… Opoka RO, Wa… Clin Infect… 11 +#> 5 31807752 MED 31807752 10.1… Malar… Marcombe S, … J Med Entom… 3 +#> 6 31505001 MED 31505001 10.1… Acute… Oshomah-Bell… J Trop Pedi… 2 +#> 7 31687768 MED 31687768 10.1… Evalu… Ferdinand DY… Trans R Soc… 3 +#> 8 31693130 MED 31693130 PMC71… 10.1… Reduc… Kingston HWF… J Infect Dis 9 +#> 9 31679146 MED 31679146 10.1… A Sys… Thiengsusuk … Eur J Drug … 2 +#> 10 30852586 MED 30852586 10.1… An Ex… Woodford J, … J Infect Dis 6 +#> # … with 90 more rows, and 19 more variables: journalVolume , +#> # pubYear , journalIssn , pageInfo , pubType , +#> # isOpenAccess , inEPMC , inPMC , hasPDF , hasBook , +#> # hasSuppl , citedByCount , hasReferences , +#> # hasTextMinedTerms , hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate ``` #### Data integrations @@ -143,8 +353,29 @@ epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Euro Another nice feature of Europe PMC is to search for cross-references between Europe PMC to other databases. For instance, to get publications cited by entries in the [Protein Data bank in Europe](https://www.ebi.ac.uk/pdbe/node/1) published 2016: -```{r} + +```r europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016') +#> # A tibble: 100 × 28 +#> id source pmid pmcid doi title authorString journalTitle issue +#> +#> 1 27989121 MED 27989121 PMC58… 10.1… Short… Lin J, Pozha… Biochemistry 2 +#> 2 27815281 MED 27815281 PMC52… 10.1… Struc… Wakamatsu T,… Appl Enviro… 2 +#> 3 28035004 MED 28035004 PMC53… 10.1… Struc… Waz S, Nakam… J Biol Chem 7 +#> 4 28030602 MED 28030602 PMC51… 10.1… Struc… Christensen … PLoS One 12 +#> 5 28066558 MED 28066558 PMC51… 10.1… Struc… Gai Z, Wang … Cell Discov +#> 6 28024149 MED 28024149 PMC53… 10.1… Cryst… Kuk AC, Mash… Nat Struct … 2 +#> 7 28031486 MED 28031486 PMC52… 10.1… Struc… Sevrioukova … Proc Natl A… 3 +#> 8 28011634 MED 28011634 PMC53… 10.1… Struc… Levdikov VM,… J Biol Chem 7 +#> 9 28009010 MED 28009010 PMC51… 10.1… Struc… Zhao H, Wei … Sci Rep +#> 10 28197319 MED 28197319 PMC53… 10.1… Struc… Johannes JW,… ACS Med Che… 2 +#> # … with 90 more rows, and 19 more variables: journalVolume , +#> # pubYear , journalIssn , pageInfo , pubType , +#> # isOpenAccess , inEPMC , inPMC , hasPDF , hasBook , +#> # hasSuppl , citedByCount , hasReferences , +#> # hasTextMinedTerms , hasDbCrossReferences , hasLabsLinks , +#> # hasTMAccessionNumbers , firstIndexDate , +#> # firstPublicationDate ``` The following sources are supported @@ -165,14 +396,48 @@ To retrieve metadata about these external database links, use `europepmc_epmc_db Europe PMC let us also obtain citation metadata and reference sections. For retrieving citation metadata per article, use -```{r} + +```r europepmc::epmc_citations("9338777", limit = 500) +#> # A tibble: 233 × 11 +#> id source citationType title authorString journalAbbrevia… pubYear volume +#> +#> 1 33353… MED review-arti… Xeno… Galow AM, G… Int J Mol Sci 2020 21 +#> 2 31565… MED research-ar… Regu… Chung HC, N… J Vet Sci 2019 20 +#> 3 30230… MED research su… Bioe… Legallais C… Adv Healthc Mat… 2018 7 +#> 4 30264… MED research su… Porc… Fiebig U, F… Xenotransplanta… 2018 25 +#> 5 29756… MED historical … Infe… Weiss RA. Xenotransplanta… 2018 25 +#> 6 29642… MED research su… Trac… Kawasaki J,… Viruses 2018 10 +#> 7 28768… MED research su… Pres… Kawasaki J,… J Virol 2017 91 +#> 8 28437… MED research su… Thre… Colon-Moran… Virology 2017 507 +#> 9 28054… MED research su… Anti… Inoue Y, Yo… Ann Biomed Eng 2017 45 +#> 10 27832… MED research-ar… Tran… Kim N, Choi… PLoS One 2016 11 +#> # … with 223 more rows, and 3 more variables: issue , citedByCount , +#> # pageInfo ``` For reference section from an article: -```{r} + +```r europepmc::epmc_refs("28632490", limit = 200) +#> # A tibble: 169 × 19 +#> id source citationType title authorString journalAbbrevia… issue pubYear +#> +#> 1 12002480 MED JOURNAL ART… Tric… Adolfsson-E… Chemosphere 9-10 2002 +#> 2 18795164 MED JOURNAL ART… In v… Ahn KC, Zha… Environ Health … 9 2008 +#> 3 18556606 MED JOURNAL ART… Effe… Aiello AE, … Am J Public Hea… 8 2008 +#> 4 17683018 MED JOURNAL ART… Cons… Aiello AE, … Clin Infect Dis 2007 +#> 5 15273108 MED JOURNAL ART… Rela… Aiello AE, … Antimicrob Agen… 8 2004 +#> 6 18207219 MED JOURNAL ART… The … Allmyr M, H… Sci Total Envir… 1 2008 +#> 7 17007908 MED JOURNAL ART… Tric… Allmyr M, A… Sci Total Envir… 1 2006 +#> 8 26948762 MED JOURNAL ART… Pres… Alvarez-Riv… J Chromatogr A 2016 +#> 9 23192912 MED JOURNAL ART… Expo… Anderson SE… Toxicol Sci 1 2012 +#> 10 25837385 MED JOURNAL ART… Obse… Vladar EK, … Methods Cell Bi… 2015 +#> # … with 159 more rows, and 11 more variables: volume , pageInfo , +#> # citedOrder , match , essn , issn , +#> # publicationTitle , publisherLoc , publisherName , +#> # externalLink , doi ``` #### Fulltext access @@ -181,7 +446,13 @@ Europe PMC gives not only access to metadata, but also to full-texts. Adding `AN Fulltext as xml document can accessed via the PMID or the PubMed Central ID (PMCID): -```{r} + +```r europepmc::epmc_ftxt("PMC3257301") +#> {xml_document} +#>
+#> [1] \n \n PLoS ... +#> [2] \n \n Introduction\n

Atmosphe ... +#> [3] \n \n

We would like to thank Dr. C. Gourlay and Dr. T. ... ``` diff --git a/vignettes/introducing-europepmc.Rmd.orig b/vignettes/introducing-europepmc.Rmd.orig new file mode 100644 index 0000000..035f698 --- /dev/null +++ b/vignettes/introducing-europepmc.Rmd.orig @@ -0,0 +1,187 @@ +--- +title: "Overview" +author: "Najko Jahn" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteEngine{knitr::rmarkdown} + %\VignetteIndexEntry{Overview} + \usepackage[utf8]{inputenc} +--- + + +```{r echo=FALSE} +knitr::opts_chunk$set( + comment = "#>", + collapse = TRUE, + warning = FALSE, + message = FALSE +) +``` + +## What is searched? + +[Europe PMC](https://europepmc.org/) is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other literature and patent sources. + +For more background on Europe PMC, see: + + + +Levchenko, M., Gou, Y., Graef, F., Hamelers, A., Huang, Z., Ide-Smith, M., … McEntyre, J. (2017). Europe PMC in 2017. Nucleic Acids Research, 46(D1), D1254–D1260. + +## How to search Europe PMC with R? + +This client supports the [Europe PMC search syntax](https://europepmc.org/Help#SSR). If you are unfamiliar with searching Europe PMC, check out the [Europe PMC query builder](https://europepmc.org/advancesearch), a very nice tool that helps you to build queries. To make use of Europe PMC queries in R, copy & paste the search string to the search functions of this package. + +In the following, some examples demonstrate how to search Europe PMC with R. + +### Free search + +`empc_search()` is the main function to query Europe PMC. It searches both metadata and fulltexts. + + +```{r} +library(europepmc) +europepmc::epmc_search('malaria') +``` + +It is worth noting that Europe PMC expands queries with MeSH synonyms by default, a behavior which can be turned off with the `synonym` parameter. + +```{r} +europepmc::epmc_search('malaria', synonym = FALSE) +``` + +To get an exact match, use quotes as in the following example: + +```{r} +europepmc::epmc_search('"Human malaria parasites"') +``` + +### Managing search results + +By default, 100 records are returned, but the number of results can be expanded or limited with the `limit` parameter. + + +```{r} +europepmc::epmc_search('"Human malaria parasites"', limit = 10) +``` + +Results are sorted by relevance. Other options via the `sort` parameter are + +- `sort = 'cited'` by the number of citation, descending from the most cited publication +- `sort = 'date'` by date published starting with the most recent publication + +### Search by DOIs + +Sometimes, you would like to check, if articles are indexed in Europe PMC using DOI names, a widely used identifier for scholarly articles. Use `epmc_search_by_doi()` for this purpose. + +```{r} +my_dois <- c( + "10.1159/000479962", + "10.1002/sctm.17-0081", + "10.1161/strokeaha.117.018077", + "10.1007/s12017-017-8447-9" + ) +europepmc::epmc_search_by_doi(doi = my_dois) +``` + +### Output options + +By default, a non-nested data frame printed as tibble is returned. +Other formats are `output = "id_list"` returning a list of IDs and sources, +and output = "'raw'"" for getting full metadata as list. +Please be aware that these lists can become very large. + +### More advanced options to search Europe PMC + +#### Author search + +Use the Europe PMC query syntax to search by author names: + +```{r} +europepmc::epmc_search('AUTH:"Salmon Maelle"') +``` + +[Europe PMC Advanced Search](https://europepmc.org/advancesearch) has a auto-suggest field for author names if you feel unsure how the name you are searching for is indexed in Europe PMC. Using the Boolean `OR` operator allows searching for more than one spelling variant: + +```{r} +q <- 'AUTH:"PÜHLER Alfred" OR AUTH:"Pühler Alfred Prof. Dr." OR AUTH:"Puhler A"' +europepmc::epmc_search(q, limit = 1000) +``` + +There is a considerable overlap between common names. The integration of ORCID, a persistent author identifier, allows unambiguous search for personal publications in Europe PMC. For example, here's how to search for publications written by Bernd Weisshaar (ORCID: ) sorted by the number of times cited in descending order: + +```{r} +europepmc::epmc_search('AUTHORID:"0000-0002-7635-3473"', limit = 200, sort = "cited") +``` + +#### Annotations + +Europe PMC provides text-mined annotations contained in abstracts and open access full-text articles. + +These automatically identified concepts and term can be retrieved at the article-level: + +```{r} +europepmc::epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601")) +``` + +To obtain a list of articles where Europe PMC has text-minded annotations, either subset the resulting data.frame + +```{r} +tt <- epmc_search("malaria") +tt[tt$hasTextMinedTerms == "Y" | tt$hasTMAccessionNumbers == "Y",] +``` + +or expand the query choosing an annotation type or provider from the [Europe PMC Advanced Search](https://europepmc.org/advancesearch) query builder. + +```{r} +epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Europe PMC")') +``` + +#### Data integrations + +Another nice feature of Europe PMC is to search for cross-references between Europe PMC to other databases. For instance, to get publications cited by +entries in the [Protein Data bank in Europe](https://www.ebi.ac.uk/pdbe/node/1) published 2016: + +```{r} +europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016') +``` + +The following sources are supported + +- **CHEBI** a database and ontology of chemical entities of biological interest +- **CHEMBL** a database of bioactive drug-like small molecules +- **EMBL** now ENA, provides a comprehensive record of the world's nucleotide sequencing information +- **INTACT** provides a freely available, open source database system and analysis tools for molecular interaction data +- **INTERPRO** provides functional analysis of proteins by classifying them into families and predicting domains and important sites +- **OMIM** a comprehensive and authoritative compendium of human genes and genetic phenotypes +- **PDB** European resource for the collection, organisation and dissemination of data on biological macromolecular structures +- **UNIPROT** comprehensive and freely accessible resource of protein sequence and functional information +- **PRIDE** PRIDE Archive - proteomics data repository + +To retrieve metadata about these external database links, use `europepmc_epmc_db()`. + +#### Citations and reference sections + +Europe PMC let us also obtain citation metadata and reference sections. For retrieving citation metadata per article, use + +```{r} +europepmc::epmc_citations("9338777", limit = 500) +``` + +For reference section from an article: + +```{r} +europepmc::epmc_refs("28632490", limit = 200) +``` + +#### Fulltext access + +Europe PMC gives not only access to metadata, but also to full-texts. Adding `AND (OPEN_ACCESS:y)` to your search query, returns only those articles where Europe PMC has also the fulltext. + +Fulltext as xml document can accessed via the PMID or the PubMed Central ID (PMCID): + +```{r} +europepmc::epmc_ftxt("PMC3257301") +``` + diff --git a/vignettes/oa_pmc-1.png b/vignettes/oa_pmc-1.png new file mode 100644 index 0000000..7258e91 Binary files /dev/null and b/vignettes/oa_pmc-1.png differ diff --git a/vignettes/software_lit-1.png b/vignettes/software_lit-1.png new file mode 100644 index 0000000..fe894f0 Binary files /dev/null and b/vignettes/software_lit-1.png differ