Mapping Collective Intelligence research

This research project aims to create a large and open evidence base on Collective Intelligence (CI) research and its intersection with AI.

The work in this repository is organised as a Metaflow pipeline with the following steps:

1. Create a PostgreSQL database and the required tables, as shown in the ER diagram. If they already exist, initialisation is skipped.
2. Collect papers from Microsoft Academic Graph (MAG) based on Fields of Study (FoS). The pickled responses are stored locally in data/raw/.
3. Parse the MAG API responses into the PostgreSQL database.
4. Collect the level of each Field of Study in MAG's hierarchy.
5. Tag papers as CI and AI+CI. This method could be modified to divide a dataset into core and control groups.
6. Geocode author affiliations using the Google Places API.
7. Tag journals as open access based on a seed list.
8. Classify affiliations by type (industry, non-industry) based on a seed list.
9. Process the data for the exploratory data analysis (EDA). This involves changing data types, and merging and grouping tables.
10. Run the EDA of the CI research landscape. This produces Altair plots and stores them in reports/figures as HTML pages (some of them are interactive):
   - Annual publication increase (base year: 2000)
   - Annual sum of citations
   - Publications by industry and non-industry affiliations
   - International collaborations: % of cross-country teams in CI, AI+CI
   - Industry-academia collaborations: % in CI, AI+CI
   - Adoption of open access by CI, AI+CI
   - Field of study comparison for CI, AI+CI. Plots are produced for levels 1, 2 and 3 of the MAG hierarchy, as well as for a pre-selected list of Fields of Study.
   - Annual publications in conferences and journals
   - Number of annual publications in CI, AI+CI
11. Collect metadata (publication date, title, abstract, etc.) for papers referenced by CI papers. The pickled responses are stored locally in data/interim/.
12. Calculate annual research diversity from the papers' Fields of Study using the Shannon diversity index. This produces an Altair plot, stored in reports/figures as an HTML page.
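Step 1 skips initialisation when the tables already exist. A minimal sketch of that idempotent pattern is below. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL so it runs anywhere (PostgreSQL accepts the same `CREATE TABLE IF NOT EXISTS` clause), and the three tables are a hypothetical, heavily simplified schema, not the one in the project's ER diagram.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only;
# the real tables are those shown in the project's ER diagram.
SCHEMA = [
    """CREATE TABLE IF NOT EXISTS paper (
           id INTEGER PRIMARY KEY,
           title TEXT,
           year INTEGER,
           citations INTEGER)""",
    """CREATE TABLE IF NOT EXISTS field_of_study (
           id INTEGER PRIMARY KEY,
           name TEXT,
           level INTEGER)""",
    """CREATE TABLE IF NOT EXISTS paper_field_of_study (
           paper_id INTEGER REFERENCES paper(id),
           field_of_study_id INTEGER REFERENCES field_of_study(id),
           PRIMARY KEY (paper_id, field_of_study_id))""",
]


def init_db(conn: sqlite3.Connection) -> None:
    """Create every table unless it already exists; safe to call repeatedly."""
    with conn:  # commits the transaction on success
        for statement in SCHEMA:
            conn.execute(statement)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    init_db(connection)
    init_db(connection)  # second call is a no-op rather than an error
```

Because every statement carries `IF NOT EXISTS`, rerunning the pipeline does not fail on an existing database, which matches the behaviour described for step 1.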
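Steps 2 and 11 keep the raw API responses as pickles on disk (data/raw/ and data/interim/), so later steps can be rerun without querying MAG again. The sketch below shows one way to do that with the standard library, assuming one pickle file per query; the `label` naming scheme and the `.pickle` extension are hypothetical, not taken from the project.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


def cache_response(response: Any, label: str, directory: Path) -> Path:
    """Pickle one API response to <directory>/<label>.pickle and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{label}.pickle"
    with path.open("wb") as fh:
        pickle.dump(response, fh)
    return path


def load_response(label: str, directory: Path) -> Optional[Any]:
    """Return a previously cached response, or None if it was never stored."""
    path = directory / f"{label}.pickle"
    if not path.exists():
        return None
    with path.open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Demo in a throwaway directory instead of the real data/raw/.
    with tempfile.TemporaryDirectory() as tmp:
        raw_dir = Path(tmp) / "raw"
        cache_response({"entities": []}, "collective_intelligence", raw_dir)
        print(load_response("collective_intelligence", raw_dir))  # prints {'entities': []}
```

Checking `load_response` before calling the API is what makes a rerun cheap: a query is only sent when its pickle is missing.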
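Step 5 splits the collected papers into CI and AI+CI. The exact rule is not spelled out here, so the following is only a sketch under an assumed rule: every collected paper is CI, and it is additionally AI+CI if any of its Fields of Study appears in an AI seed set. `AI_FIELDS` is a hypothetical list, not the project's.

```python
from typing import Iterable

# Hypothetical AI seed set for illustration; not the project's actual list.
AI_FIELDS = {"artificial intelligence", "machine learning", "deep learning"}


def tag_paper(fields_of_study: Iterable[str]) -> str:
    """Tag a CI paper as 'AI+CI' if any of its FoS is an AI field, else 'CI'."""
    normalised = {fos.strip().lower() for fos in fields_of_study}
    return "AI+CI" if normalised & AI_FIELDS else "CI"


if __name__ == "__main__":
    print(tag_paper(["Collective intelligence", "Machine learning"]))  # prints AI+CI
    print(tag_paper(["Crowdsourcing"]))  # prints CI
```

Swapping `AI_FIELDS` for a different seed set is one way the same function could divide a dataset into core and control groups, as step 5 suggests.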
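For step 6, a possible shape of a geocoding request is sketched below using the Google Places "Find Place from Text" endpoint, which takes `input`, `inputtype`, `fields` and `key` query parameters and returns a JSON body with `status` and a `candidates` list. It is an assumption that the pipeline uses this particular endpoint; the sketch only builds the URL and parses a response dict, so it runs without network access or an API key.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

FIND_PLACE_URL = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"


def find_place_url(affiliation: str, api_key: str) -> str:
    """Build a Find Place request URL for one affiliation name."""
    params = {
        "input": affiliation,
        "inputtype": "textquery",
        "fields": "formatted_address,geometry,name,place_id",
        "key": api_key,
    }
    return f"{FIND_PLACE_URL}?{urlencode(params)}"


def top_location(response: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first candidate, or None if nothing matched."""
    candidates = response.get("candidates") or []
    if response.get("status") != "OK" or not candidates:
        return None
    location = candidates[0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

In use, the URL would be fetched (for example with `urllib.request.urlopen` and `json.load`) and the decoded body passed to `top_location`; keeping the two apart makes the parsing logic testable offline.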
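Steps 7 and 8 both label entities against a seed list. A minimal sketch of that kind of matching follows, assuming exact (case-insensitive) matching for journal names and keyword matching for affiliation names. Both seed lists are hypothetical placeholders, and the real matching rules may differ.

```python
import re

# Hypothetical seed lists for illustration only.
OPEN_ACCESS_JOURNALS = {"plos one", "scientific reports"}
INDUSTRY_KEYWORDS = {"google", "microsoft", "ibm", "inc", "ltd"}


def is_open_access(journal_name: str) -> bool:
    """True if the normalised journal name is in the open access seed list."""
    return journal_name.strip().lower() in OPEN_ACCESS_JOURNALS


def affiliation_type(affiliation_name: str) -> str:
    """Label an affiliation 'industry' if any word matches an industry keyword."""
    words = set(re.findall(r"[a-z]+", affiliation_name.lower()))
    return "industry" if words & INDUSTRY_KEYWORDS else "non-industry"


if __name__ == "__main__":
    print(is_open_access("PLOS ONE"))  # prints True
    print(affiliation_type("Microsoft Research"))  # prints industry
```

Matching on whole words rather than substrings avoids false positives such as "inc" inside "Lincoln".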
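Two of the EDA plots in step 10 are annual aggregates: publication increase against base year 2000 and the annual sum of citations. The sketch below assumes "increase" means the percentage change in yearly publication counts relative to 2000; that definition and the `(year, citations)` input shape are assumptions, not taken from the project.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def growth_vs_base(years: Iterable[int], base_year: int = 2000) -> Dict[int, float]:
    """Percentage change in publications per year relative to base_year."""
    counts = Counter(years)
    base = counts.get(base_year)
    if not base:
        raise ValueError(f"no publications in base year {base_year}")
    return {
        year: 100 * (count - base) / base
        for year, count in sorted(counts.items())
        if year >= base_year
    }


def annual_citations(papers: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Sum citations per publication year from (year, citations) pairs."""
    totals: Dict[int, int] = defaultdict(int)
    for year, citations in papers:
        totals[year] += citations
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(growth_vs_base([2000, 2000, 2001, 2001, 2001, 2001, 2002]))
    # prints {2000: 0.0, 2001: 100.0, 2002: -50.0}
```

Expressing growth relative to a fixed base year lets CI and AI+CI curves be compared on one axis even though their absolute volumes differ.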
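The collaboration plots in step 10 report the share of papers whose author team spans several countries, or both industry and non-industry affiliations. A sketch under assumed definitions follows: a paper counts as international if its affiliations cover more than one country, and as industry-academia if they include both types. The paper dict layout is hypothetical.

```python
from typing import Callable, Dict, List

Paper = Dict[str, List[Dict[str, str]]]  # hypothetical: {"affiliations": [{"country": ..., "type": ...}]}


def share_of_papers(papers: List[Paper], predicate: Callable[[Paper], bool]) -> float:
    """Percentage of papers for which predicate(paper) is True."""
    if not papers:
        return 0.0
    return 100 * sum(1 for paper in papers if predicate(paper)) / len(papers)


def is_international(paper: Paper) -> bool:
    """Affiliations located in more than one country."""
    countries = {a["country"] for a in paper["affiliations"] if a.get("country")}
    return len(countries) > 1


def is_industry_academia(paper: Paper) -> bool:
    """At least one industry and one non-industry affiliation."""
    types = {a["type"] for a in paper["affiliations"]}
    return {"industry", "non-industry"} <= types
```

Computing the share separately for the CI and AI+CI subsets gives the two percentages each plot compares.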
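Step 12 uses the Shannon diversity index, H = -Σ p_i ln p_i, where p_i is the share of Field of Study i in a given year. The sketch below assumes each paper-FoS assignment counts once towards that year's distribution; the project may weight or filter Fields of Study differently (for example by hierarchy level).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def shannon_index(counts: Counter) -> float:
    """Shannon diversity H = -sum(p * ln p) over the category proportions."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts.values() if c > 0)


def annual_diversity(papers: Iterable[Tuple[int, List[str]]]) -> Dict[int, float]:
    """Shannon index of Field of Study counts for each publication year."""
    by_year: Dict[int, Counter] = defaultdict(Counter)
    for year, fields in papers:
        by_year[year].update(fields)
    return {year: shannon_index(counts) for year, counts in sorted(by_year.items())}
```

H is 0 when a year's papers all share one Field of Study and grows with both the number of fields and how evenly papers are spread across them; for two equally frequent fields it equals ln 2.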

Notes

You can use the same pipeline to query MAG with a conference or journal name, as described in Orion's docs.
All parameters are stored in model_config.yaml, except for the parameters of the Altair plots (e.g. width and height), which are hardcoded.

How to rerun the data collection and analysis

Clone the repository.

$ git clone https://github.com/nestauk/ci_mapping

Change your working directory to ci_mapping/ and, in an Anaconda environment, install the requirements.

$ pip install -r requirements.txt

Obtain access to the Microsoft Academic Knowledge and Google Places APIs.
Create a .env file and add your secrets, using .env.example as a template.
Run the Metaflow pipeline.

$ python ci_mapping/run_pipeline.py --no-pylint run

The project assumes you have a working PostgreSQL distribution installed and running locally.
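The .env file holds the API secrets as KEY=VALUE lines. Python projects commonly load such files with the python-dotenv package (`load_dotenv()`); as a dependency-free sketch of what that involves, the reader below handles comments, blank lines, an optional `export` prefix and quoted values. The variable names in the usage are hypothetical; the real ones are listed in .env.example.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    """Read a .env file and export its values without overriding existing ones."""
    values = parse_env(Path(path).read_text())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    print(parse_env('# secrets\nMAG_KEY=abc\nexport GOOGLE_KEY="x y"\n'))
    # prints {'MAG_KEY': 'abc', 'GOOGLE_KEY': 'x y'}
```

Using `os.environ.setdefault` means variables already set in the shell take precedence over the file, which is also python-dotenv's default behaviour.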

Project based on the Nesta cookiecutter data science project template.
