-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path001_kb_rp_pretest.Rmd
217 lines (177 loc) · 9.76 KB
/
001_kb_rp_pretest.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
---
title: "Reprint Authors per Country"
output: github_document
bibliography: pubs.bib
---
```{r, echo = FALSE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
warning = FALSE,
message = FALSE,
echo = TRUE
)
```
```{r setup, echo = FALSE}
require(tidyverse)
require(RJDBC)
require(rJava)
.jinit()
jdbcDriver <-
JDBC(driverClass = "oracle.jdbc.OracleDriver", classPath = "../inst/jdbc_driver/ojdbc8.jar")
jdbcConnection <-
dbConnect(
jdbcDriver,
"jdbc:oracle:thin:@//biblio-p-db01:1521/bibliodb01.fiz.karlsruhe",
Sys.getenv("kb_user"),
Sys.getenv("kb_pwd")
)
```
## Motivation
Country information about corresponding authors play a crucial role in open access funding [@Schimmer_2015]. The Web of Science contains data about this author role including affiliations, making it a valuable source to determine the productivity of corresponding authors per country and journal.
The aim of this work is to better understand how countries of affiliation from corresponding authors are represented in the WoS-KB, and how they can be retrieved from it. Such an analysis will yield valuable insights about how to calculate the number of scholarly articles per country and journal from the WoS-KB database using the corresponding author addresses only.
## Potential issues
1. A corresponding author lists at least two different countries
2. More than one corresponding author is listed
3. Corresponding authors work in different countries
<!-- why is issue of ex. 1 (multiple affiliations with differing author roles) not mentioned here?
because I was not aware of this issue before. Let's wait and see how the table will be changed after the KB realised this data consistency issue.
Maybe add something like
4. Multiple country affiliations of the same author are listed with differing author roles
The first three issues are of a more conceptual nature, all leading to a situation where one article is associated with multiple countries of affiliation. One has to decide upon a way of handling them which has to be considered at the time of interpretation of the results. Two options for the mentioned issues are whole counting or fractional counting. For whole counting, each article is counted fully for any country of affiliation of any listed corresponding author, so that the number of articles in the single categories do not necessarily add up to the total number of articles any more. Fractional counting assigns a fraction of one article count to each corresponding author, leading to possible non-integer total article counts. The second issue is not much of a problem in our setting, if both authors are situated in the same country, since we are interested in the country of affiliation associated with an article. If, however, the corresponding authors' countries of affiliation differ as in issue 3, we again have to decide upon a counting scheme: do we count each country only once per article or multiple times if multiple corresponding authors work there? The fourth issue is rather a problem of data consistency. -->
## Example 1
Potential issues: A corresponding author lists at least two different countries, e.g., `UT=000400754000138`. Additionally, only one affiliation is tagged as `RP` in `wos_b_2019.items_author_institutions`.
```{sql connection=jdbcConnection}
select distinct wos_b_2019.institutions.countrycode,
wos_b_2019.items_authors_institutions.type,
wos_b_2019.items_authors_institutions.fk_authors,
wos_b_2019.items.ut_eid,
wos_b_2019.items.pubyear
from wos_b_2019.items
inner join
wos_b_2019.items_authors_institutions
on wos_b_2019.items_authors_institutions.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.institutions
on wos_b_2019.institutions.pk_institutions = wos_b_2019.items_authors_institutions.fk_institutions
where wos_b_2019.items.ut_eid = '000400754000138'
order by wos_b_2019.items_authors_institutions.fk_authors
```
## Example 2
Potential issues: More than one corresponding author is listed. For `UT_EID=000372645900002`, the Web of Science lists seven corresponding authors and four reprint adresses. Furthermore, as in Example 1, only one affiliation is tagged as `RP` in `wos_b_2019.items_author_institutions`.
```{sql connection=jdbcConnection, max.print = NA}
select distinct wos_b_2019.institutions.countrycode,
wos_b_2019.items_authors_institutions.type,
wos_b_2019.items_authors_institutions.fk_authors,
wos_b_2019.items.ut_eid,
wos_b_2019.items.pubyear
from wos_b_2019.items
inner join
wos_b_2019.items_authors_institutions
on wos_b_2019.items_authors_institutions.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.institutions
on wos_b_2019.institutions.pk_institutions = wos_b_2019.items_authors_institutions.fk_institutions
where wos_b_2019.items.ut_eid = '000372645900002'
order by wos_b_2019.items_authors_institutions.fk_authors
```
## Example 3
Corresponding authors work in different countries. For `UT_EID=000372645900002`, the Web of Science lists two corresponding authors, one affilated with a Spanish, the other with a Portuguese institution. As in Examples 1 and 2, only one affiliation is tagged with RP in `wos_b_2019.items_author_institutions`. Moreover, checking the WoS raw data reveals that the RP-author `15145104` is not affiliated with an Portuguese institution at all.
```{sql connection=jdbcConnection}
select distinct wos_b_2019.institutions.countrycode,
wos_b_2019.items_authors_institutions.type,
wos_b_2019.items_authors_institutions.fk_authors,
wos_b_2019.items.ut_eid,
wos_b_2019.items.pubyear
from wos_b_2019.items
inner join
wos_b_2019.items_authors_institutions
on wos_b_2019.items_authors_institutions.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.institutions
on wos_b_2019.institutions.pk_institutions = wos_b_2019.items_authors_institutions.fk_institutions
where wos_b_2019.items.ut_eid = '000389110200022'
order by wos_b_2019.items_authors_institutions.fk_authors
```
## SQL strategy to circumvent these issues
### Steps
1. Get all `FK_AUTHORS` tagged as `RP`, and store the result as `rp_author_table`
2. From authors listed in `rp_author_table` obtain all countries of affiliation and store them in `rp_countries_table`
3. Aggregate `rp_countries_table`
#### 1. Get all `FK_AUTHORS` tagged as `RP`, and store the result as `rp_author_table`
```{sql connection=jdbcConnection, output.var="rp_authors"}
select
distinct wos_b_2019.items_authors_institutions.fk_items,
wos_b_2019.items_authors_institutions.fk_authors
from
wos_b_2019.items
inner join
wos_b_2019.databasecollection
on wos_b_2019.databasecollection.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.items_authors_institutions
on wos_b_2019.items_authors_institutions.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.institutions
on wos_b_2019.institutions.pk_institutions = wos_b_2019.items_authors_institutions.fk_institutions
where
wos_b_2019.databasecollection.edition_value in ('WOS.SCI', 'WOS.SSCI', 'WOS.AHCI')
and wos_b_2019.items.doctype in ('Article', 'Review')
and wos_b_2019.items.pubyear in (2014, 2015, 2016, 2017)
and wos_b_2019.items.ut_eid in ('000389110200022', '000372645900002', '000400754000138')
and wos_b_2019.items_authors_institutions.type = 'RP'
```
Store in personal table space
```{r}
dbWriteTable(conn = jdbcConnection,
name = "rp_author_table",
value = rp_authors)
```
#### From authors listed in `rp_author_table` obtain all countries of affilation and store them in `rp_countries_table`
```{sql connection=jdbcConnection, output.var="rp_countries"}
select
countrycode,
wos_b_2019.items_authors_institutions.type,
wos_b_2019.items_authors_institutions.fk_institutions,
wos_b_2019.items_authors_institutions.fk_authors,
wos_b_2019.issues.issn,
ut_eid,
wos_b_2019.items.pubyear
from
wos_b_2019.items
inner join
wos_b_2019.issues
on wos_b_2019.issues.pk_issues = wos_b_2019.items.fk_issues
inner join
wos_b_2019.items_authors_institutions
on wos_b_2019.items_authors_institutions.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.institutions
on wos_b_2019.institutions.pk_institutions = wos_b_2019.items_authors_institutions.fk_institutions
inner join
rp_author_table
on (
rp_author_table.fk_items = wos_b_2019.items_authors_institutions.fk_items
and rp_author_table.fk_authors = wos_b_2019.items_authors_institutions.fk_authors
)
```
Store in personal table space
```{r}
dbWriteTable(conn = jdbcConnection,
name = "rp_countries_table",
value = rp_countries)
```
#### Aggregate `rp_countries_table`
Counting method: country of affilation is only counted once per article.
```{sql connection=jdbcConnection}
select
countrycode,
issn,
count(distinct(ut_eid)) as publications
from
rp_countries_table
group by
countrycode,
issn
order by issn
```