This Bachelor's thesis examined how social science ideas evolve and are represented in newspaper media. Drawing on Hallett's (2019) framework, it investigated how these ideas move through three career stages in the media: as objects of interest, as interpretants, and as credibility signals. The primary goal was to automate and scale up the empirical analysis of these ideas using Natural Language Processing (NLP).
The objective was to create an information retrieval system capable of identifying newspaper articles that mention specific public ideas. The system used NLP to classify each mention of an idea into its career stage automatically. Four different classification models were tested to improve the accuracy and efficiency of this document classification step.
The project involved:
- Developing an information retrieval system tailored for social scientists.
- Implementing and testing four classification models.
- Training and evaluating these models using a high-quality dataset derived from 'The Bell Curve: Intelligence and Class Structure in American Life' by Richard J. Herrnstein and Charles Murray.
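The two steps listed above, retrieval followed by stage classification, can be sketched roughly as follows. This is only an illustrative outline, not the system built in the thesis. The four models actually tested are not named here, so a small multinomial naive Bayes classifier stands in as an assumed placeholder. The function names, stage labels, and all example sentences are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical label names for the three career stages in Hallett's framework.
STAGES = ("object_of_interest", "interpretant", "credibility_signal")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def retrieve(articles, idea_terms):
    """Keep articles that mention any surface form of the idea (case-insensitive)."""
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in idea_terms]
    return [a for a in articles if any(p.search(a) for p in patterns)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed stand-in model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; a real dataset would be hand-labelled articles.
train = [
    ("A new book, The Bell Curve, argues that intelligence shapes class.", STAGES[0]),
    ("Reviewers debate the claims made in The Bell Curve this week.", STAGES[0]),
    ("Seen through The Bell Curve, the welfare debate looks different.", STAGES[1]),
    ("The funding gap is best understood through The Bell Curve.", STAGES[1]),
    ("Murray, co-author of The Bell Curve, commented on the tax plan.", STAGES[2]),
    ("The panel included Murray, author of The Bell Curve, as a speaker.", STAGES[2]),
]
model = NaiveBayes().fit([t for t, _ in train], [s for _, s in train])

# Invented example articles to run through retrieval and classification.
articles = [
    "Critics revisit The Bell Curve two decades on.",
    "City council approves new transit budget.",
    "Murray, co-author of the bell curve, joined the panel.",
]
hits = retrieve(articles, ["The Bell Curve"])
predictions = {a: model.predict(a) for a in hits}
```

Here `hits` contains only the articles that mention the idea, and `predictions` maps each of them to one of the three stage labels. With so little training data the predicted labels carry no weight; the sketch only shows how the retrieval and classification steps fit together.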
The broader aim was to gain a deeper understanding of how social science ideas progress through the news. The system was also intended to support social scientists by automatically retrieving relevant articles and keeping their analysis of how these ideas are represented, and how they evolve, up to date.
This thesis contributed an NLP-based approach to studying how social science ideas spread and evolve in the media. By automating retrieval and classification with NLP and machine learning, it offered a method for efficient analysis at a large scale.