-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
640eab3
commit ea5f5f6
Showing
5 changed files
with
316 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
--- | ||
layout: post | ||
author: Christoph Broschinski | ||
author_lnk: https://github.com/cbroschinski | ||
title: CSC updates collected APC data for several Finnish institutions | ||
date: 2024-10-18 10:00:00 | ||
summary: | ||
categories: [general, openAPC] | ||
comments: true | ||
--- | ||
|
||
|
||
```{r, echo = FALSE} | ||
knitr::opts_knit$set(base.url = "/") | ||
knitr::opts_chunk$set( | ||
comment = "#>", | ||
collapse = TRUE, | ||
warning = FALSE, | ||
message = FALSE, | ||
echo = FALSE, | ||
fig.width = 9, | ||
fig.height = 6 | ||
) | ||
options(scipen = 1, digits = 2) | ||
knitr::knit_hooks$set(inline = function(x) { | ||
prettyNum(x, big.mark=",") | ||
}) | ||
``` | ||
|
||
The [CSC – IT Center for Science](https://www.csc.fi/en/home) has updated its data, providing more APC data for Finnish HEIs. | ||
|
||
The CSC is a non-profit government enterprise which provides IT support and services to Finnish academia and research institutes. It operates [VIRTA](https://wiki.eduuni.fi/display/cscvirtajtp/VIRTA+in+English), a centralized service tasked with collecting publication data from HEIs. VIRTA may also optionally collect cost data on OA publishing (APCs and BPCs), this data is made available to OpenAPC where permission was granted by the data providers. | ||
|
||
Contact Person for the CSC is [Ville Hämäläinen](mailto:[email protected]). | ||
|
||
## Cost data | ||
|
||
```{r, cache.lazy = TRUE} | ||
#' Download APC spreadsheet from github which requires to Curl installed | ||
download_apc <- function(path = NULL, dir = "tmp", file = "apc_de.csv"){ | ||
if(is.null(path)) { | ||
path <- c("https://raw.githubusercontent.com/OpenAPC/openapc-de/v4.121.9-7-6/data/apc_de.csv") | ||
} | ||
dir.create(dir) | ||
download.file(url = path, destfile = paste(dir, file, sep = "/"), method = "curl") | ||
read.csv(paste(dir, file, sep = "/"), header = T,sep =",") | ||
} | ||
my.apc <- download_apc(c("https://raw.githubusercontent.com/OpenAPC/openapc-de/refs/tags/v4.133.3-2-2/data/csc/journals_enriched.csv")) | ||
my.apc <- my.apc[my.apc$institution != "",] | ||
my.apc <- droplevels(my.apc) | ||
``` | ||
|
||
|
||
The updated data set provided by the CSC covers publication fees for `r format(nrow(my.apc), big.mark =",")` articles published between 2013 and 2024, replacing the previous CSC data. Total expenditure amounts to `r sum(my.apc$euro)`€ and the average fee is `r sum(my.apc$euro)/nrow(my.apc)`€. | ||
|
||
|
||
```{r} | ||
d_frame = data.frame(table(my.apc$publisher, dnn="Publisher")) | ||
d_frame = d_frame[with(d_frame, order(-Freq, Publisher)), ] | ||
my.apc$publisher <- factor(my.apc$publisher, levels = d_frame$Publisher) | ||
df.summary <-cbind(tapply(my.apc$euro, my.apc$publisher, length), | ||
tapply(my.apc$euro, my.apc$publisher, sum), | ||
tapply(my.apc$euro, my.apc$publisher, mean)) | ||
colnames(df.summary) <- c("Articles", "Fees paid in EURO", "Mean Fee paid") | ||
knitr::kable(as.data.frame(df.summary), digits = 2) | ||
``` | ||
|
||
## Overview | ||
|
||
A detailed analysis of the contributed data provides the following overview: | ||
|
||
### Fees paid per publisher (in EURO) | ||
|
||
```{r tree_csc_2024_10_18_full} | ||
tt <- aggregate(my.apc$euro, by = list(my.apc$publisher), sum) | ||
colnames(tt) <- c("Publisher", "Euro") | ||
treemap::treemap(tt, index = c("Publisher"), vSize = "Euro", palette = "Paired") | ||
``` | ||
|
||
### Average costs per year (in EURO) | ||
|
||
```{r box_csc_2024_10_18_year_full, echo=FALSE, message = FALSE} | ||
require(ggplot2) | ||
q <- ggplot(my.apc, aes(factor(period), euro)) + geom_boxplot() + geom_point() | ||
q <- q + ylab("Fees paid (in EURO)") + theme(legend.position="top") + theme_bw(base_size = 18) | ||
q + xlab("Funding period") + ylab("APC") | ||
``` | ||
|
||
### Average costs per publisher (in EURO) | ||
|
||
```{r box_csc_2024_10_18_full, echo = FALSE, message = FALSE} | ||
require(ggplot2) | ||
require(tidyverse) | ||
my.apc <- my.apc %>% | ||
mutate(publisher = str_replace(publisher, ".+\\((\\w+)\\)", "\\1")) | ||
d_frame = data.frame(table(my.apc$publisher, dnn="Publisher")) | ||
d_frame = d_frame[with(d_frame, order(-Freq, Publisher)), ] | ||
publishers = as.character(d_frame$Publisher[d_frame$Freq > 20]) | ||
my.apc_reduced = my.apc[my.apc$publisher %in% publishers,] | ||
q <- ggplot(my.apc_reduced, aes(publisher, euro)) + geom_boxplot() + geom_point() | ||
q <- q + ylab("Fees paid (in EURO)") + theme(legend.position="top") + theme_bw(base_size = 18) + coord_flip() | ||
q + xlab("Publisher (> 20 articles)") + ylab("APC") | ||
``` |
Oops, something went wrong.