This repository contains simple Python scripts designed to help historians prepare data for quantitative analysis and visualization. Please see the following links for further explanations and contextualisation.
- "Doing Digital History with Python" series on the IEG DH Lab blog, hosted by the Institute of European History in Mainz, Germany
- Python-related post on my Island Studies research blog
- Teaching materials concerning distant reading shared on GitHub
In many cases, no data samples are provided, either because the data are owned by other institutions such as archives or because they come in standardised, reproducible formats (e.g. RDF files exported from publicly available norm data services). Also, this repository is maintained by a historian who can only dedicate a limited amount of time to coding. If you cannot make sense of the code posted here and require additional information, do not hesitate to contact me.
All code is shared for reuse and I am also grateful for suggestions to amend it. As I am normally working with very messy historical data, I tend to prefer redundant code that permits me a lot of intermediate exception handling and print statements in order to capture all special cases. Users working on well-structured and ontologically more consistent data may want to streamline my code samples more, especially if they are working with big data and need efficient script performance.
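The deliberately redundant style described above can be illustrated with a minimal sketch. It is not taken from any script in this repository: the sample values, the list of date formats, and the function name `parse_date` are assumptions made purely for illustration. The idea is to try several formats in turn, catch each failure, and print what happened to every record so that special cases become visible.

```python
from datetime import datetime

# Hypothetical sample of messy date strings, as they might appear in a transcribed source.
raw_dates = ["1848-03-13", "13.03.1848", "1848", "", "ca. 1850"]

# Formats are tried in this order; extend the list as new special cases turn up.
KNOWN_FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%Y"]


def parse_date(value):
    """Return a datetime for the first matching format, or None if nothing fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # This format did not match; fall through to the next one.
            continue
    return None


parsed = []
unparsed = []
for position, value in enumerate(raw_dates):
    result = parse_date(value.strip())
    if result is None:
        # Report the problem instead of stopping, so the whole file gets checked.
        print(f"Row {position}: could not parse {value!r}")
        unparsed.append(value)
    else:
        print(f"Row {position}: {value!r} -> {result.date()}")
        parsed.append(result)

print(f"{len(parsed)} values parsed, {len(unparsed)} left for manual review")
```

On clean, consistent data the loop over formats and the per-row print statements could be dropped in favour of a single conversion step, which is the kind of streamlining suggested above.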
Screen captures of current script performances are shared on my "Digital History" YouTube channel.
The list "Python for digital history" contains environments and packages which I, based on my own experience, especially recommend for specific DH tasks.
For further information, you can read the documentation for individual packages on PyPI. Please keep in mind that Python, like any other software, is constantly being updated and that all packages listed are meant to be used with the latest release, Python 3. It is also advisable to check out DH-related repositories on GitHub, the leading open source code-sharing platform, follow discussions in coding forums such as Stack Overflow, or read professionally curated blogs and peer-reviewed online journals such as “The Programming Historian”. As you will see in the table, several popular packages are already included in the latest release of Python and/or come with the Anaconda platform.
If you have not installed Python yet, we recommend installing Anaconda, which includes Python, to make sure that you are not creating potentially conflicting environments. Anaconda is an easy-to-use environment, especially for people with little knowledge of programming. In order to use a particular Python package, or a module from that package, in your script, you need to write an “import” or “from … import” statement at the beginning. The “import” statement in Python is explained in the official documentation.
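As a brief illustration of the import forms mentioned above, the following sketch uses only modules from the Python standard library, so it should run without installing anything. The sample records and variable names are invented for this example.

```python
# Import whole modules; their contents are then accessed via the module name.
import csv
import io

# Import a single class from a module so it can be used without the module prefix.
from collections import Counter

# Import a module under a shorter alias.
import statistics as stats

# Hypothetical sample data standing in for a CSV file of persons.
sample = "name,occupation,age\nAnna,weaver,23\nJohann,merchant,41\nMaria,weaver,35\n"

# csv.DictReader turns each row into a dictionary keyed by the header line.
rows = list(csv.DictReader(io.StringIO(sample)))

# Count how often each occupation occurs.
occupation_counts = Counter(row["occupation"] for row in rows)

# Compute the mean age; values read from CSV are strings, so convert them first.
mean_age = stats.mean(int(row["age"]) for row in rows)

print(occupation_counts)
print(mean_age)
```

Third-party packages are imported in the same way once they are installed, for example through Anaconda.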
Whatever your research goals are, do not hesitate to contact the global community of Python users and developers in one of the many Python forums or via Twitter. For coding women, for example, there is the fantastic global network @pyladies. You will always find help.