This summer I am working as part of the Aspuru-Guzik Research Lab of the Harvard Clean Energy Project (CEP). The lab has run simulations of millions of candidate molecules that could be used for more efficient solar cells and other clean energy technologies. However, much of the output data is scattered across different storage systems and directories, and the current MongoDB database system is somewhat outdated.
My responsibility for this summer is to consolidate the data from the simulations the CEP has run so far and to modernize the database. I also have the secondary goal of creating interesting visualizations of this data that would be useful to the CEP and/or presentable to a layperson or a scientist in a tangential field.
This is the repository housing my files for this project. I will take care to keep my login credentials and other sensitive data out of it.
I have kept a log of my progress on this project here: Project Log
UPDATE: (June 9, 2017) Unfortunately I had some unexpected personal issues come up in the last couple of days and so today is my last day in the lab :(
I was still able to make a catalog of the data records stored in the existing CEP MongoDB databases, which should be very helpful for consolidating the old CEP data into one place.
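For anyone who wants to rebuild or extend that catalog, here is a minimal sketch of one way to list what a MongoDB server holds. It is not the script used in this repository: the function name `catalog_records` and the row layout are hypothetical, and it only assumes a client object exposing `list_database_names()`, `list_collection_names()`, and `estimated_document_count()`, as a `pymongo.MongoClient` does in recent PyMongo versions.

```python
from typing import Any, Dict, List, Sequence


def catalog_records(
    client: Any,
    skip_dbs: Sequence[str] = ("admin", "config", "local"),
) -> List[Dict[str, Any]]:
    """Return one row per collection: database, collection, approximate size.

    `client` is assumed to behave like a pymongo.MongoClient.
    `skip_dbs` lists MongoDB's internal databases, which are left out.
    """
    rows = []
    for db_name in sorted(client.list_database_names()):
        if db_name in skip_dbs:
            continue
        db = client[db_name]
        for coll_name in sorted(db.list_collection_names()):
            # estimated_document_count() reads collection metadata, so it is
            # fast but approximate; it does not scan the collection.
            count = db[coll_name].estimated_document_count()
            rows.append(
                {"database": db_name, "collection": coll_name, "documents": count}
            )
    return rows
```

With PyMongo installed, passing in a connected `pymongo.MongoClient` should give a list of rows that can be printed or written to a CSV file. The connection details are deliberately left out so that no credentials end up in this repository.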
If you are continuing my work or building on my scripts, the best place to start is the Species Enumeration README.