Skip to content

Commit

Permalink
post: csc update 09/2024
Browse files Browse the repository at this point in the history
  • Loading branch information
cbroschinski committed Oct 18, 2024
1 parent 640eab3 commit ea5f5f6
Show file tree
Hide file tree
Showing 5 changed files with 316 additions and 0 deletions.
112 changes: 112 additions & 0 deletions Rmd/2024-10-18-csc.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
---
layout: post
author: Christoph Broschinski
author_lnk: https://github.com/cbroschinski
title: CSC updates collected APC data for several Finnish institutions
date: 2024-10-18 10:00:00
summary:
categories: [general, openAPC]
comments: true
---


```{r, echo = FALSE}
knitr::opts_knit$set(base.url = "/")
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
warning = FALSE,
message = FALSE,
echo = FALSE,
fig.width = 9,
fig.height = 6
)
options(scipen = 1, digits = 2)
knitr::knit_hooks$set(inline = function(x) {
prettyNum(x, big.mark=",")
})
```

The [CSC – IT Center for Science](https://www.csc.fi/en/home) has updated its data, providing more APC data for Finnish HEIs.

The CSC is a non-profit government enterprise which provides IT support and services to Finnish academia and research institutes. It operates [VIRTA](https://wiki.eduuni.fi/display/cscvirtajtp/VIRTA+in+English), a centralized service tasked with collecting publication data from HEIs. VIRTA may also optionally collect cost data on OA publishing (APCs and BPCs), this data is made available to OpenAPC where permission was granted by the data providers.

Contact Person for the CSC is [Ville Hämäläinen](mailto:[email protected]).

## Cost data

```{r, cache.lazy = TRUE}
#' Download APC spreadsheet from github which requires to Curl installed
download_apc <- function(path = NULL, dir = "tmp", file = "apc_de.csv"){
if(is.null(path)) {
path <- c("https://raw.githubusercontent.com/OpenAPC/openapc-de/v4.121.9-7-6/data/apc_de.csv")
}
dir.create(dir)
download.file(url = path, destfile = paste(dir, file, sep = "/"), method = "curl")
read.csv(paste(dir, file, sep = "/"), header = T,sep =",")
}
my.apc <- download_apc(c("https://raw.githubusercontent.com/OpenAPC/openapc-de/refs/tags/v4.133.3-2-2/data/csc/journals_enriched.csv"))
my.apc <- my.apc[my.apc$institution != "",]
my.apc <- droplevels(my.apc)
```


The updated data set provided by the CSC covers publication fees for `r format(nrow(my.apc), big.mark =",")` articles published between 2013 and 2024, replacing the previous CSC data. Total expenditure amounts to `r sum(my.apc$euro)`€ and the average fee is `r sum(my.apc$euro)/nrow(my.apc)`€.


```{r}
d_frame = data.frame(table(my.apc$publisher, dnn="Publisher"))
d_frame = d_frame[with(d_frame, order(-Freq, Publisher)), ]
my.apc$publisher <- factor(my.apc$publisher, levels = d_frame$Publisher)
df.summary <-cbind(tapply(my.apc$euro, my.apc$publisher, length),
tapply(my.apc$euro, my.apc$publisher, sum),
tapply(my.apc$euro, my.apc$publisher, mean))
colnames(df.summary) <- c("Articles", "Fees paid in EURO", "Mean Fee paid")
knitr::kable(as.data.frame(df.summary), digits = 2)
```

## Overview

A detailed analysis of the contributed data provides the following overview:

### Fees paid per publisher (in EURO)

```{r tree_csc_2024_10_18_full}
tt <- aggregate(my.apc$euro, by = list(my.apc$publisher), sum)
colnames(tt) <- c("Publisher", "Euro")
treemap::treemap(tt, index = c("Publisher"), vSize = "Euro", palette = "Paired")
```

### Average costs per year (in EURO)

```{r box_csc_2024_10_18_year_full, echo=FALSE, message = FALSE}
require(ggplot2)
q <- ggplot(my.apc, aes(factor(period), euro)) + geom_boxplot() + geom_point()
q <- q + ylab("Fees paid (in EURO)") + theme(legend.position="top") + theme_bw(base_size = 18)
q + xlab("Funding period") + ylab("APC")
```

### Average costs per publisher (in EURO)

```{r box_csc_2024_10_18_full, echo = FALSE, message = FALSE}
require(ggplot2)
require(tidyverse)
my.apc <- my.apc %>%
mutate(publisher = str_replace(publisher, ".+\\((\\w+)\\)", "\\1"))
d_frame = data.frame(table(my.apc$publisher, dnn="Publisher"))
d_frame = d_frame[with(d_frame, order(-Freq, Publisher)), ]
publishers = as.character(d_frame$Publisher[d_frame$Freq > 20])
my.apc_reduced = my.apc[my.apc$publisher %in% publishers,]
q <- ggplot(my.apc_reduced, aes(publisher, euro)) + geom_boxplot() + geom_point()
q <- q + ylab("Fees paid (in EURO)") + theme(legend.position="top") + theme_bw(base_size = 18) + coord_flip()
q + xlab("Publisher (> 20 articles)") + ylab("APC")
```
Loading

0 comments on commit ea5f5f6

Please sign in to comment.