Access to data for workshops and extended tests of MDAnalysis.
Data sets are stored at external stable URLs (e.g., on figshare, zenodo, or DataDryad) and this package provides a simple interface to download, cache, and access data sets.
To use it, install the package with pip
pip install --upgrade MDAnalysisData
or install with conda
conda install --channel conda-forge mdanalysisdata
Import the datasets and access your data set of choice:
from MDAnalysisData import datasets
adk = datasets.fetch_adk_equilibrium()
The returned object contains attributes with the paths to topology and trajectory files so that you can use it directly with, for instance, MDAnalysis:
import MDAnalysis as mda
u = mda.Universe(adk.topology, adk.trajectory)
The metadata object also contains a DESCR attribute with a description of the data set, including relevant citations:
print(adk.DESCR)
Data are stored locally in the data directory ~/MDAnalysis_data (i.e., in the user's home directory). This location can be changed by setting the environment variable MDANALYSIS_DATA, for instance:
export MDANALYSIS_DATA=/tmp/MDAnalysis_data
The location of the data directory can be obtained with
MDAnalysisData.base.get_data_home()
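The lookup order described above (environment variable first, then the default in the home directory) can be illustrated with a minimal sketch. This is not the package's own code: the helper name resolve_data_home and its environ parameter are hypothetical and only mirror the behaviour documented here. Unlike a real cache, it does not create the directory.

```python
import os


def resolve_data_home(environ=None):
    """Return the data directory path (hypothetical helper, not the package API).

    The MDANALYSIS_DATA environment variable takes precedence; otherwise
    the documented default ~/MDAnalysis_data is used.
    """
    # Allow a mapping to be passed in so the lookup can be tried out
    # without touching the real process environment.
    environ = os.environ if environ is None else environ
    location = environ.get("MDANALYSIS_DATA", os.path.join("~", "MDAnalysis_data"))
    # Expand a leading "~" to the user's home directory.
    return os.path.expanduser(location)


if __name__ == "__main__":
    print(resolve_data_home())
```

For everyday use, MDAnalysisData.base.get_data_home() remains the function to call; the sketch only makes the precedence explicit.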
If the data directory is removed then data are downloaded again. Data file integrity is checked with a SHA256 checksum when the file is downloaded.
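To show what a SHA256 integrity check of a downloaded file involves, here is a minimal sketch that uses only the Python standard library. The helpers sha256_of_file and verify_download are hypothetical names chosen for illustration; the package's internal implementation may differ.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large trajectory files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_checksum):
    """Raise IOError if the file does not match the expected checksum."""
    actual = sha256_of_file(path)
    if actual != expected_checksum:
        raise IOError(
            "{} has SHA256 checksum {}, expected {}; the file may be corrupted".format(
                path, actual, expected_checksum
            )
        )
    return path
```

A caller would pass the path of the freshly downloaded file together with the checksum recorded for that data set, and discard the file if an error is raised.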
The data directory can be wiped with the function
MDAnalysisData.base.clear_data_home()
Please add new datasets to MDAnalysisData. See Contributing new datasets for details, but in short:
- raise an issue in the issue tracker describing what you want to add; this issue will become the focal point for discussions where the developers can easily give advice
- deposit data in an archive under an Open Data compatible license (CC0 or CC-BY preferred)
- write accessor code in MDAnalysisData
This package is modelled after sklearn.datasets. It uses code from sklearn.datasets (under the BSD 3-clause license).
No data are included; please see the DESCR attribute for each data set for authorship, citation, and license information for the data.