diff --git a/README.Rmd b/README.Rmd
index 536ae43..9fa7a28 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -23,7 +23,7 @@ knitr::opts_chunk$set(
 * Crossref's API issue tracker: https://gitlab.com/crossref/issues
 * Crossref metadata search API: https://www.crossref.org/labs/crossref-metadata-search/
 * Crossref DOI Content Negotiation: https://citation.crosscite.org/docs.html
-* Crossref Text and Data Mining (TDM) Services: https://tdmsupport.crossref.org/
+* Crossref Text and Data Mining (TDM) Services: https://www.crossref.org/education/retrieve-metadata/rest-api/text-and-data-mining/
 
 ## Installation
 
@@ -45,7 +45,7 @@ Load `rcrossref`
 library('rcrossref')
 ```
 
-### Register for the Polite Pool
+## Register for the Polite Pool
 
 If you are intending to access Crossref regularly you will want to send your email address with your queries. This has the advantage that queries are placed in the polite pool of servers. Including your email address is good practice as described in the Crossref documentation under Good manners (https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service). The second advantage is that Crossref can contact you if there is a problem with a query.
 
@@ -59,138 +59,9 @@ Save the file and restart your R session
 
 To stop sharing your email when using rcrossref simply delete it from your .Renviron file.
 
-## Citation search
-
-Use CrossRef's DOI Content Negotiation (https://citation.crosscite.org/docs.html) service, where you can get citations back in various formats, including `apa`
-
-```{r}
-cr_cn(dois = "10.1126/science.169.3946.635", format = "text", style = "apa")
-```
-
-`bibtex`
-
-```{r}
-cat(cr_cn(dois = "10.1126/science.169.3946.635", format = "bibtex"))
-```
-
-## Citation count
-
-Citation count, using OpenURL
-
-```{r}
-cr_citation_count(doi = "10.1371/journal.pone.0042793")
-```
-
-## Search Crossref metadata API
-
-The following functions all use the CrossRef API https://github.com/CrossRef/rest-api-doc#readme
-
-### Look up funder information
-
-```{r}
-cr_funders(query = "NSF")
-```
-
-### Check the DOI minting agency
-
-```{r}
-cr_agency(dois = '10.13039/100000001')
-```
-
-### Search works (i.e., articles)
-
-```{r}
-cr_works(filter = c(has_orcid = TRUE, from_pub_date = '2004-04-04'), limit = 1)
-```
-
-### Search journals
-
-```{r}
-cr_journals(issn = c('1803-2427','2326-4225'))
-```
-
-### Search license information
-
-```{r}
-cr_licenses(query = 'elsevier')
-```
-
-### Search based on DOI prefixes
-
-```{r}
-cr_prefixes(prefixes = c('10.1016','10.1371','10.1023','10.4176','10.1093'))
-```
-
-### Search CrossRef members
-
-```{r}
-cr_members(query = 'ecology', limit = 5)
-```
-
-### Get N random DOIs
-
-`cr_r()` uses the function `cr_works()` internally.
-
-```{r}
-cr_r()
-```
-
-You can pass in the number of DOIs you want back (default is 10)
-
-```{r}
-cr_r(2)
-```
-
-## Get full text
-
-Publishers can optionally provide links in the metadata they provide to Crossref for full text of the work, but that data is often missing. Find out more about it at https://support.crossref.org/hc/en-us/articles/215750183-Crossref-Text-and-Data-Mining-Services
-
-Get some DOIs for articles that provide full text, and that have `CC-BY 3.0` licenses (i.e., more likely to actually be open)
-
-```{r}
-out <-
-  cr_works(filter = list(has_full_text = TRUE,
-    license_url = "http://creativecommons.org/licenses/by/3.0/"))
-(dois <- out$data$doi)
-```
-
-From the output of `cr_works` we can get full text links if we know where to look:
-
-```{r}
-do.call("rbind", out$data$link)
-```
-
-From there, you can grab your full text, but because most links require
-authentication, enter another package: `crminer`.
-
-You'll need package `crminer` for the rest of the work.
-
-Onc we have DOIs, get URLs to full text content
-
-```{r eval=FALSE}
-if (!requireNamespace("crminer")) {
-  install.packages("crminer")
-}
-```
-
-```{r}
-library(crminer)
-(links <- crm_links("10.1155/2014/128505"))
-```
-
-Then use those URLs to get full text
-
-```{r eval=FALSE}
-crm_pdf(links)
-#> /Users/sckott/Library/Caches/R/crminer/128505.pdf
-#> Pages: 1
-#> No. characters: 1565
-#> Created: 2014-09-15
-```
-
-See also fulltext (https://github.com/ropensci/fulltext) for getting scholarly text
-for text mining.
+## Documentation
+See https://docs.ropensci.org/rcrossref/ to get started
 
 ## Meta
 
diff --git a/README.md b/README.md
index d5de2a8..3153064 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ rcrossref: R interface to CrossRef APIs
 * Crossref's API issue tracker: https://gitlab.com/crossref/issues
 * Crossref metadata search API: https://www.crossref.org/labs/crossref-metadata-search/
 * Crossref DOI Content Negotiation: https://citation.crosscite.org/docs.html
-* Crossref Text and Data Mining (TDM) Services: https://tdmsupport.crossref.org/
+* Crossref Text and Data Mining (TDM) Services: https://www.crossref.org/education/retrieve-metadata/rest-api/text-and-data-mining/
 
 ## Installation
 
@@ -41,7 +41,7 @@ Load `rcrossref`
 library('rcrossref')
 ```
 
-### Register for the Polite Pool
+## Register for the Polite Pool
 
 If you are intending to access Crossref regularly you will want to send your email address with your queries. This has the advantage that queries are placed in the polite pool of servers. Including your email address is good practice as described in the Crossref documentation under Good manners (https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service). The second advantage is that Crossref can contact you if there is a problem with a query.
 
@@ -55,399 +55,9 @@ Save the file and restart your R session
 
 To stop sharing your email when using rcrossref simply delete it from your .Renviron file.
 
-## Citation search
-
-Use CrossRef's DOI Content Negotiation (https://citation.crosscite.org/docs.html) service, where you can get citations back in various formats, including `apa`
-
-
-```r
-cr_cn(dois = "10.1126/science.169.3946.635", format = "text", style = "apa")
-#> [1] "Frank, H. S. (1970). The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance. Science, 169(3946), 635–641. doi:10.1126/science.169.3946.635"
-```
-
-`bibtex`
-
-
-```r
-cat(cr_cn(dois = "10.1126/science.169.3946.635", format = "bibtex"))
-#> @article{Frank_1970,
-#> doi = {10.1126/science.169.3946.635},
-#> url = {https://doi.org/10.1126%2Fscience.169.3946.635},
-#> year = 1970,
-#> month = {aug},
-#> publisher = {American Association for the Advancement of Science ({AAAS})},
-#> volume = {169},
-#> number = {3946},
-#> pages = {635--641},
-#> author = {H. S. Frank},
-#> title = {The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance},
-#> journal = {Science}
-#> }
-```
-
-## Citation count
-
-Citation count, using OpenURL
-
-
-```r
-cr_citation_count(doi = "10.1371/journal.pone.0042793")
-#> doi count
-#> 1 10.1371/journal.pone.0042793 40
-```
-
-## Search Crossref metadata API
-
-The following functions all use the CrossRef API https://github.com/CrossRef/rest-api-doc#readme
-
-### Look up funder information
-
-
-```r
-cr_funders(query = "NSF")
-#> $meta
-#> total_results search_terms start_index items_per_page
-#> 1 22 NSF 0 20
-#>
-#> $data
-#> # A tibble: 20 x 6
-#> id name alt.names uri tokens location
-#>
-#> 1 50110… National Strok… NSF http://dx… national, stro…
-#> 2 50110… National Scien… NSF, National Sci… http://dx… national, scie…
-#> 3 10000… National Sleep… NSF http://dx… national, slee… United …
-#> 4 50110… Norsk Sykeplei… NSF, Norwegian Nu… http://dx… norsk, sykeple…
-#> 5 10000… National Scien… USA NSF, NSF, US … http://dx… national, scie… United …
-#> 6 10000… Center for Hie… CHM, NSF, Univers… http://dx… center, for, h… United …
-#> 7 10001… Arkansas NSF E… Arkansas EPSCoR P… http://dx… arkansas, nsf,… United …
-#> 8 10001… Kansas NSF EPS… KNE, NSF EPSCoR http://dx… kansas, nsf, e… United …
-#> 9 50110… Natural Scienc… Anhui Provincial … http://dx… natural, scien… China
-#> 10 10000… Statens Naturv… Danish National S… http://dx… statens, natur…
-#> 11 10000… Office of the … NSF Office of the… http://dx… office, of, th… United …
-#> 12 50110… National Natur… Natural Science F… http://dx… national, natu… China
-#> 13 10001… Nick Simons Fo… NSF, The Nick Sim… http://dx… nick, simons, … United …
-#> 14 10001… BioXFEL Scienc… National Science … http://dx… bioxfel, scien… United …
-#> 15 10000… Division of In… IOS, NSF Division… http://dx… division, of, … United …
-#> 16 50110… NSFC-Henan Joi… NSFC-Henan Provin… http://dx… nsfc, henan, j… China
-#> 17 50110… National Natur… NSFC-Guangdong Jo… http://dx… national, natu… China
-#> 18 50110… Data Center of… Data Center of Ma… http://dx… data, center, … China
-#> 19 50110… National Natur… NSFC-Yunnan Joint… http://dx… national, natu… China
-#> 20 50110… National Natur… NSFC-Shandong Joi… http://dx… national, natu… China
-#>
-#> $facets
-#> NULL
-```
-
-### Check the DOI minting agency
-
-
-```r
-cr_agency(dois = '10.13039/100000001')
-#> $DOI
-#> [1] "10.13039/100000001"
-#>
-#> $agency
-#> $agency$id
-#> [1] "crossref"
-#>
-#> $agency$label
-#> [1] "Crossref"
-```
-
-### Search works (i.e., articles)
-
-
-```r
-cr_works(filter = c(has_orcid = TRUE, from_pub_date = '2004-04-04'), limit = 1)
-#> $meta
-#> total_results search_terms start_index items_per_page
-#> 1 4727973 NA 0 1
-#>
-#> $data
-#> # A tibble: 1 x 34
-#> container.title created deposited published.online doi indexed issn issue
-#>
-#> 1 Chemical Commu… 2018-1… 2019-11-… 2019 10.1… 2020-0… 1359… 6
-#> # … with 26 more variables: issued , member , page ,
-#> # prefix , publisher , score , source ,
-#> # reference.count , references.count ,
-#> # is.referenced.by.count , subject , title , type ,
-#> # update.policy , url , volume , abstract ,
-#> # language , short.container.title , assertion ,
-#> # author , funder , link , content_domain ,
-#> # license , reference
-#>
-#> $facets
-#> NULL
-```
-
-### Search journals
-
-
-```r
-cr_journals(issn = c('1803-2427','2326-4225'))
-#> $data
-#> # A tibble: 2 x 53
-#> title publisher issn last_status_che… deposits_abstra… deposits_orcids…
-#>
-#> 1 Jour… "De Gruy… 1805… 2020-09-30 TRUE FALSE
-#> 2 Jour… "America… 2326… 2020-09-29 FALSE FALSE
-#> # … with 47 more variables: deposits ,
-#> # deposits_affiliations_backfile ,
-#> # deposits_update_policies_backfile ,
-#> # deposits_similarity_checking_backfile ,
-#> # deposits_award_numbers_current ,
-#> # deposits_resource_links_current , deposits_articles ,
-#> # deposits_affiliations_current , deposits_funders_current ,
-#> # deposits_references_backfile , deposits_abstracts_backfile ,
-#> # deposits_licenses_backfile , deposits_award_numbers_backfile ,
-#> # deposits_open_references_backfile ,
-#> # deposits_open_references_current , deposits_references_current ,
-#> # deposits_resource_links_backfile , deposits_orcids_backfile ,
-#> # deposits_funders_backfile , deposits_update_policies_current ,
-#> # deposits_similarity_checking_current ,
-#> # deposits_licenses_current , affiliations_current ,
-#> # similarity_checking_current , funders_backfile ,
-#> # licenses_backfile , funders_current ,
-#> # affiliations_backfile , resource_links_backfile ,
-#> # orcids_backfile , update_policies_current ,
-#> # open_references_backfile , orcids_current ,
-#> # similarity_checking_backfile , references_backfile ,
-#> # award_numbers_backfile , update_policies_backfile ,
-#> # licenses_current , award_numbers_current ,
-#> # abstracts_backfile , resource_links_current ,
-#> # abstracts_current , open_references_current ,
-#> # references_current , total_dois , current_dois ,
-#> # backfile_dois
-#>
-#> $facets
-#> NULL
-```
-
-### Search license information
-
-
-```r
-cr_licenses(query = 'elsevier')
-#> $meta
-#> total_results search_terms start_index items_per_page
-#> 1 39 elsevier 0 20
-#>
-#> $data
-#> # A tibble: 39 x 2
-#> URL work.count
-#>
-#> 1 http://aspb.org/publications/aspb-journals/open-articles 1
-#> 2 http://creativecommons.org/licenses/by-nc-nd/3.0/ 11
-#> 3 http://creativecommons.org/licenses/by-nc-nd/4.0/ 16
-#> 4 http://creativecommons.org/licenses/by-nc/4.0/ 4
-#> 5 http://creativecommons.org/licenses/by/2.0 2
-#> 6 http://creativecommons.org/licenses/by/3.0/ 1
-#> 7 http://creativecommons.org/licenses/by/3.0/igo/ 1
-#> 8 http://creativecommons.org/licenses/by/4.0 9
-#> 9 http://creativecommons.org/licenses/by/4.0/ 16
-#> 10 http://doi.wiley.com/10.1002/tdm_license_1 136
-#> # … with 29 more rows
-```
-
-### Search based on DOI prefixes
-
-
-```r
-cr_prefixes(prefixes = c('10.1016','10.1371','10.1023','10.4176','10.1093'))
-#> $meta
-#> NULL
-#>
-#> $data
-#> member name
-#> 1 http://id.crossref.org/member/78 Elsevier BV
-#> 2 http://id.crossref.org/member/340 Public Library of Science (PLoS)
-#> 3 http://id.crossref.org/member/297 Springer Science and Business Media LLC
-#> 4 http://id.crossref.org/member/1989 Co-Action Publishing
-#> 5 http://id.crossref.org/member/286 Oxford University Press (OUP)
-#> prefix
-#> 1 http://id.crossref.org/prefix/10.1016
-#> 2 http://id.crossref.org/prefix/10.1371
-#> 3 http://id.crossref.org/prefix/10.1023
-#> 4 http://id.crossref.org/prefix/10.4176
-#> 5 http://id.crossref.org/prefix/10.1093
-#>
-#> $facets
-#> list()
-```
-
-### Search CrossRef members
-
-
-```r
-cr_members(query = 'ecology', limit = 5)
-#> $meta
-#> total_results search_terms start_index items_per_page
-#> 1 23 ecology 0 5
-#>
-#> $data
-#> # A tibble: 5 x 56
-#> id primary_name location last_status_che… total.dois current.dois
-#>
-#> 1 1950 Journal of … Suite 8… 2020-09-22 0 0
-#> 2 2899 Association… P.O. Bo… 2020-09-23 0 0
-#> 3 4302 Immediate S… Dept. o… 2020-09-23 6 0
-#> 4 7052 Chinese Jou… Flat C … 2020-09-25 1372 251
-#> 5 2467 Ideas in Ec… Prins W… 2020-09-23 0 0
-#> # … with 50 more variables: backfile.dois , prefixes ,
-#> # coverge.affiliations.current ,
-#> # coverge.similarity.checking.current , coverge.funders.backfile ,
-#> # coverge.licenses.backfile , coverge.funders.current ,
-#> # coverge.affiliations.backfile , coverge.resource.links.backfile ,
-#> # coverge.orcids.backfile , coverge.update.policies.current ,
-#> # coverge.open.references.backfile , coverge.orcids.current ,
-#> # coverge.similarity.checking.backfile ,
-#> # coverge.references.backfile , coverge.award.numbers.backfile ,
-#> # coverge.update.policies.backfile , coverge.licenses.current ,
-#> # coverge.award.numbers.current , coverge.abstracts.backfile ,
-#> # coverge.resource.links.current , coverge.abstracts.current ,
-#> # coverge.open.references.current , coverge.references.current ,
-#> # flags.deposits.abstracts.current ,
-#> # flags.deposits.orcids.current , flags.deposits ,
-#> # flags.deposits.affiliations.backfile ,
-#> # flags.deposits.update.policies.backfile ,
-#> # flags.deposits.similarity.checking.backfile ,
-#> # flags.deposits.award.numbers.current ,
-#> # flags.deposits.resource.links.current , flags.deposits.articles ,
-#> # flags.deposits.affiliations.current ,
-#> # flags.deposits.funders.current ,
-#> # flags.deposits.references.backfile ,
-#> # flags.deposits.abstracts.backfile ,
-#> # flags.deposits.licenses.backfile ,
-#> # flags.deposits.award.numbers.backfile ,
-#> # flags.deposits.open.references.backfile ,
-#> # flags.deposits.open.references.current ,
-#> # flags.deposits.references.current ,
-#> # flags.deposits.resource.links.backfile ,
-#> # flags.deposits.orcids.backfile ,
-#> # flags.deposits.funders.backfile ,
-#> # flags.deposits.update.policies.current ,
-#> # flags.deposits.similarity.checking.current ,
-#> # flags.deposits.licenses.current , names , tokens
-#>
-#> $facets
-#> NULL
-```
-
-### Get N random DOIs
-
-`cr_r()` uses the function `cr_works()` internally.
-
-
-```r
-cr_r()
-#> [1] "10.17855/jlas.2016.02.35.2.217"
-#> [2] "10.1016/0003-4916(80)90392-9"
-#> [3] "10.1177/0047117809359041"
-#> [4] "10.2305/iucn.uk.2019-2.rlts.t55284a18361901.en"
-#> [5] "10.1243/03093247v183173"
-#> [6] "10.1094/pd-71-0832"
-#> [7] "10.1111/j.1600-051x.1985.tb01384.x"
-#> [8] "10.1109/isie.1996.551024"
-#> [9] "10.1080/00224545.1995.9713968"
-#> [10] "10.1016/j.jmmm.2010.11.007"
-```
-
-You can pass in the number of DOIs you want back (default is 10)
-
-
-```r
-cr_r(2)
-#> [1] "10.1055/b-0034-81110" "10.2514/6.2006-1127"
-```
-
-## Get full text
-
-Publishers can optionally provide links in the metadata they provide to Crossref for full text of the work, but that data is often missing. Find out more about it at https://support.crossref.org/hc/en-us/articles/215750183-Crossref-Text-and-Data-Mining-Services
-
-Get some DOIs for articles that provide full text, and that have `CC-BY 3.0` licenses (i.e., more likely to actually be open)
-
-
-```r
-out <-
-  cr_works(filter = list(has_full_text = TRUE,
-    license_url = "http://creativecommons.org/licenses/by/3.0/"))
-(dois <- out$data$doi)
-#> [1] "10.1016/s0370-2693(01)01461-7" "10.1016/s0370-2693(01)01505-2"
-#> [3] "10.1016/s0370-2693(01)01497-6" "10.1016/s0370-2693(01)01503-9"
-#> [5] "10.1016/s0370-2693(01)01486-1" "10.1016/s0370-2693(02)01156-5"
-#> [7] "10.1016/s0370-2693(02)01181-4" "10.1016/s0370-2693(01)01471-x"
-#> [9] "10.1016/s0370-2693(01)01467-8" "10.1016/s0370-2693(02)01166-8"
-#> [11] "10.1016/s0370-2693(02)01174-7" "10.1016/s0370-2693(02)01179-6"
-#> [13] "10.1016/s0370-2693(01)01473-3" "10.1016/s0370-2693(01)01518-0"
-#> [15] "10.1016/s0370-2693(01)01512-x" "10.1016/s0370-2693(01)01500-3"
-#> [17] "10.1016/s0370-2693(01)01487-3" "10.1016/s0370-2693(01)01431-9"
-#> [19] "10.1016/s0370-2693(01)01469-1" "10.1016/s0370-2693(01)01450-2"
-```
-
-From the output of `cr_works` we can get full text links if we know where to look:
-
-
-```r
-do.call("rbind", out$data$link)
-#> # A tibble: 40 x 4
-#> URL content.type content.version intended.applica…
-#>
-#> 1 https://api.elsevier.com/cont… text/xml vor text-mining
-#> 2 https://api.elsevier.com/cont… text/plain vor text-mining
-#> 3 https://api.elsevier.com/cont… text/xml vor text-mining
-#> 4 https://api.elsevier.com/cont… text/plain vor text-mining
-#> 5 https://api.elsevier.com/cont… text/xml vor text-mining
-#> 6 https://api.elsevier.com/cont… text/plain vor text-mining
-#> 7 https://api.elsevier.com/cont… text/xml vor text-mining
-#> 8 https://api.elsevier.com/cont… text/plain vor text-mining
-#> 9 https://api.elsevier.com/cont… text/xml vor text-mining
-#> 10 https://api.elsevier.com/cont… text/plain vor text-mining
-#> # … with 30 more rows
-```
-
-From there, you can grab your full text, but because most links require
-authentication, enter another package: `crminer`.
-
-You'll need package `crminer` for the rest of the work.
-
-Onc we have DOIs, get URLs to full text content
-
-
-```r
-if (!requireNamespace("crminer")) {
-  install.packages("crminer")
-}
-```
-
-
-```r
-library(crminer)
-(links <- crm_links("10.1155/2014/128505"))
-#> $pdf
-#> http://downloads.hindawi.com/archive/2014/128505.pdf
-#>
-#> $xml
-#> http://downloads.hindawi.com/archive/2014/128505.xml
-#>
-#> $unspecified
-#> http://downloads.hindawi.com/archive/2014/128505.pdf
-```
-
-Then use those URLs to get full text
-
-
-```r
-crm_pdf(links)
-#> /Users/sckott/Library/Caches/R/crminer/128505.pdf
-#> Pages: 1
-#> No. characters: 1565
-#> Created: 2014-09-15
-```
-
-See also fulltext (https://github.com/ropensci/fulltext) for getting scholarly text
-for text mining.
+## Documentation
+See https://docs.ropensci.org/rcrossref/ to get started
 
 ## Meta
 
diff --git a/vignettes/rcrossref.Rmd b/vignettes/rcrossref.Rmd
index 42098e8..0c4c852 100644
--- a/vignettes/rcrossref.Rmd
+++ b/vignettes/rcrossref.Rmd
@@ -1,7 +1,7 @@
 ---
 title: rcrossref introduction
 author: Scott Chamberlain
-date: "2020-10-01"
+date: "2020-10-02"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{rcrossref introduction}
@@ -186,7 +186,7 @@ cr_agency(dois = '10.13039/100000001')
 cr_works(filter=c(has_orcid=TRUE, from_pub_date='2004-04-04'), limit=1)
 #> $meta
 #> total_results search_terms start_index items_per_page
-#> 1 4734523 NA 0 1
+#> 1 4738988 NA 0 1
 #>
 #> $data
 #> # A tibble: 1 x 34
@@ -314,11 +314,11 @@ cr_members(query='ecology', limit = 5)
 #> # A tibble: 5 x 56
 #> id primary_name location last_status_che… total.dois current.dois
 #>
-#> 1 1950 Journal of … Suite 8… 2020-10-01 0 0
-#> 2 2899 Association… P.O. Bo… 2020-10-01 0 0
-#> 3 4302 Immediate S… Dept. o… 2020-09-23 6 0
-#> 4 7052 Chinese Jou… Flat C … 2020-09-25 1372 251
-#> 5 2467 Ideas in Ec… Prins W… 2020-10-01 0 0
+#> 1 1950 Journal of … Suite 8… 2020-10-02 0 0
+#> 2 2899 Association… P.O. Bo… 2020-10-02 0 0
+#> 3 4302 Immediate S… Dept. o… 2020-10-02 6 0
+#> 4 7052 Chinese Jou… Flat C … 2020-10-02 1372 251
+#> 5 2467 Ideas in Ec… Prins W… 2020-10-02 0 0
 #> # … with 50 more variables: backfile.dois , prefixes ,
 #> # coverge.affiliations.current ,
 #> # coverge.similarity.checking.current , coverge.funders.backfile ,
@@ -366,11 +366,11 @@ cr_members(query='ecology', limit = 5)
 
 ```r
 cr_r()
-#> [1] "10.1523/jneurosci.4206-04.2005" "10.1353/mln.0.0211"
-#> [3] "10.1017/s042482010010617x" "10.1017/cbo9781139583817.010"
-#> [5] "10.4324/9781003014645" "10.1371/journal.pcbi.1000711.s001"
-#> [7] "10.1021/acs.analchem.5b02366" "10.2307/1959816"
-#> [9] "10.1109/tuffc.2018.2872727" "10.3133/ofr7915"
+#> [1] "10.1093/oseo/instance.00086433" "10.1093/jaoac/49.5.915"
+#> [3] "10.14814/phy2.13993" "10.2210/pdb5rnw/pdb"
+#> [5] "10.1002/ajp.1350150207" "10.1111/j.1532-950x.2012.01075.x"
+#> [7] "10.1021/ma00060a006" "10.1371/journal.pone.0045474.s001"
+#> [9] "10.1086/447083" "10.17352/jbm.000021"
 ```
 
 You can pass in the number of DOIs you want back (default is 10)
@@ -378,5 +378,92 @@ You can pass in the number of DOIs you want back (default is 10)
 
 ```r
 cr_r(2)
-#> [1] "10.1108/medar-06-2015-0032" "10.1111/j.1540-6288.1989.tb00341.x"
+#> [1] "10.1525/ncl.1983.38.2.99p0363h" "10.1177/1087057109350114"
 ```
+
+## Get full text
+
+Publishers can optionally provide links in the metadata they provide to Crossref for full text of the work, but that data is often missing. Find out more about it at https://support.crossref.org/hc/en-us/articles/215750183-Crossref-Text-and-Data-Mining-Services
+
+Get some DOIs for articles that provide full text, and that have `CC-BY 3.0` licenses (i.e., more likely to actually be open)
+
+
+```r
+out <-
+  cr_works(filter = list(has_full_text = TRUE,
+    license_url = "http://creativecommons.org/licenses/by/3.0/"))
+(dois <- out$data$doi)
+#> [1] "10.1016/s0370-2693(01)01461-7" "10.1016/s0370-2693(01)01505-2"
+#> [3] "10.1016/s0370-2693(01)01497-6" "10.1016/s0370-2693(01)01503-9"
+#> [5] "10.1016/s0370-2693(01)01486-1" "10.1016/s0370-2693(02)01156-5"
+#> [7] "10.1016/s0370-2693(02)01181-4" "10.1016/s0370-2693(01)01471-x"
+#> [9] "10.1016/s0370-2693(01)01467-8" "10.1016/s0370-2693(02)01166-8"
+#> [11] "10.1016/s0370-2693(02)01174-7" "10.1016/s0370-2693(02)01179-6"
+#> [13] "10.1016/s0370-2693(01)01473-3" "10.1016/s0370-2693(01)01518-0"
+#> [15] "10.1016/s0370-2693(01)01512-x" "10.1016/s0370-2693(01)01500-3"
+#> [17] "10.1016/s0370-2693(01)01487-3" "10.1016/s0370-2693(01)01431-9"
+#> [19] "10.1016/s0370-2693(01)01469-1" "10.1016/s0370-2693(01)01450-2"
+```
+
+From the output of `cr_works` we can get full text links if we know where to look:
+
+
+```r
+do.call("rbind", out$data$link)
+#> # A tibble: 40 x 4
+#> URL content.type content.version intended.applica…
+#>
+#> 1 https://api.elsevier.com/cont… text/xml vor text-mining
+#> 2 https://api.elsevier.com/cont… text/plain vor text-mining
+#> 3 https://api.elsevier.com/cont… text/xml vor text-mining
+#> 4 https://api.elsevier.com/cont… text/plain vor text-mining
+#> 5 https://api.elsevier.com/cont… text/xml vor text-mining
+#> 6 https://api.elsevier.com/cont… text/plain vor text-mining
+#> 7 https://api.elsevier.com/cont… text/xml vor text-mining
+#> 8 https://api.elsevier.com/cont… text/plain vor text-mining
+#> 9 https://api.elsevier.com/cont… text/xml vor text-mining
+#> 10 https://api.elsevier.com/cont… text/plain vor text-mining
+#> # … with 30 more rows
+```
+
+From there, you can grab your full text, but because most links require
+authentication, enter another package: `crminer`.
+
+You'll need package `crminer` for the rest of the work.
+
+Once we have DOIs, get URLs to full text content
+
+
+```r
+if (!requireNamespace("crminer")) {
+  install.packages("crminer")
+}
+```
+
+
+```r
+library(crminer)
+(links <- crm_links("10.1155/2014/128505"))
+#> $pdf
+#> http://downloads.hindawi.com/archive/2014/128505.pdf
+#>
+#> $xml
+#> http://downloads.hindawi.com/archive/2014/128505.xml
+#>
+#> $unspecified
+#> http://downloads.hindawi.com/archive/2014/128505.pdf
+```
+
+Then use those URLs to get full text
+
+
+```r
+crm_pdf(links)
+#> /Users/sckott/Library/Caches/R/crminer/128505.pdf
+#> Pages: 1
+#> No. characters: 1565
+#> Created: 2014-09-15
+```
+
+See also fulltext (https://github.com/ropensci/fulltext) for getting scholarly text
+for text mining.
diff --git a/vignettes/rcrossref.Rmd.og b/vignettes/rcrossref.Rmd.og
index b4c08e6..f7d5418 100644
--- a/vignettes/rcrossref.Rmd.og
+++ b/vignettes/rcrossref.Rmd.og
@@ -129,3 +129,53 @@ You can pass in the number of DOIs you want back (default is 10)
 ```{r}
 cr_r(2)
 ```
+
+## Get full text
+
+Publishers can optionally provide links in the metadata they provide to Crossref for full text of the work, but that data is often missing. Find out more about it at https://support.crossref.org/hc/en-us/articles/215750183-Crossref-Text-and-Data-Mining-Services
+
+Get some DOIs for articles that provide full text, and that have `CC-BY 3.0` licenses (i.e., more likely to actually be open)
+
+```{r}
+out <-
+  cr_works(filter = list(has_full_text = TRUE,
+    license_url = "http://creativecommons.org/licenses/by/3.0/"))
+(dois <- out$data$doi)
+```
+
+From the output of `cr_works` we can get full text links if we know where to look:
+
+```{r}
+do.call("rbind", out$data$link)
+```
+
+From there, you can grab your full text, but because most links require
+authentication, enter another package: `crminer`.
+
+You'll need package `crminer` for the rest of the work.
+
+Once we have DOIs, get URLs to full text content
+
+```{r eval=FALSE}
+if (!requireNamespace("crminer")) {
+  install.packages("crminer")
+}
+```
+
+```{r}
+library(crminer)
+(links <- crm_links("10.1155/2014/128505"))
+```
+
+Then use those URLs to get full text
+
+```{r eval=FALSE}
+crm_pdf(links)
+#> /Users/sckott/Library/Caches/R/crminer/128505.pdf
+#> Pages: 1
+#> No. characters: 1565
+#> Created: 2014-09-15
+```
+
+See also fulltext (https://github.com/ropensci/fulltext) for getting scholarly text
+for text mining.