-
Notifications
You must be signed in to change notification settings - Fork 0
Home
This Wiki currently acts as a blog for Team Zissou in the course Project in Computer Systems (Project IT).
(Due to splitting up into groups we do not have any detailed blog-post-data from the first couple of weeks)
The first week(s) was mainly spent on reading, discussing and understanding the Graphicionado paper and implementation. We focused on how it should be implemented using the distributed shared memory system ArgoDSM.
In short, additional work included: installing ArgoDSM locally, testing the Argo API locally, running Graph500 (with MPI), requesting access to the Jason cluster, setting up a basic skeleton code for Graphicionado (based on the pseudo-code in the article).
(might fill in some here later)
We spent most of the day reading through articles and code related to GraphX, Giraph and GraphMat. We decided to have the weekly meeting with Stefanos and Magnus on Thursday afternoon (13-15).
After the weekly meeting we sat up some goals for the following week: create a design document, have a runnable sequential implementation of Graphicionado with BFS. We immediately started a design document where we present our choices of how data will be stored in memory. We also made good progress with the sequentially implementation of Graphicionado.
The main portion of the day was spent working on the sequential Graphicionado implementation. We are still lacking a proper graph generator, but our program can read graph data from files (as long as they follow a specific format). Some work was done on creating a settings configuration file, which can be used to limit the number of iterations and set the active vertices.
Today we worked on generating graphs using GTGraph, parsing the data to our program and running our Graphicionado BFS implementation on the graph.
Yesterday we noticed that the BFS algorithm did not perform the actions we expected. After some testing on really small graphs we found several errors in our code, so we spent several hours finding solutions to them. We still have one part hardcoded for BFS where we set all vertex properties to a high VertexProperty value, except for the starting node which starts with VertexProperty value 0. We also started writing some code that will test the Graphicionado runs automatically (it currently includes some tests to confirm the data organization in memory).
One group member made sure to generate several graphs of different sizes using GTGraph, so now we can run the Graphicionado implementation on graphs from 5 vertices up to over 10 000 vertices.
During the implementation of PageRank we noticed some bugs in our code that was not obvious when running BFS and SSSP. For example, the generated graphs sets the vertex ID's from 1 up to the number of vertices in the graph, while we have assumed the ID's start from 0 (which we use to index several arrays). This gave some buggy behaviour when linking vertex ID's to edges.
We fixed the buggy indexing from last week by shifting the vertex ID's to start from 0 instead of 1. The graph analysis with BFS and SSSP still worked after the fix, so we continued trying to implement PageRank. The PageRank algorithm is a bit trickier to fully understand, so we are not sure if we have implemented it correctly at this moment.
Some of the group members started working on the Graphicionado parallelization, which appears to be tricky.
Some refactoring was performed in the sequential program. Most of the group members read up on how the Graphicionado parallelization should be performed, and how the sequential version needs to be changed.
Some group members have worked on parallelizing the sequential implementation, where we took two different approaches. One with dynamic vectors and one with two-dimensional arrays. It is still not entirely clear how the people behind the Graphicionado report implemented the parallelization.
We had a short meeting with Magnus where he showed how we can submit jobs to Tintin (UPPMAX). He submitted our sequential implementation of Graphicionado as a job for two nodes, which worked fine. This guide might come in handy: http://www.uppmax.uu.se/support/user-guides/slurm-user-guide/
After the last Friday meeting we had some guidelines that we wrote down in a separate document (https://docs.google.com/document/d/1wvg3D8VrWTSQ-PsHyfDHxCzA56EFLQoFlYyIutptn2A). During the week we have worked on implementing the parallel version of Graphicionado (using two dimensional arrays), running our code on Tintin using the devel nodes and we have started implementing graph slicing.
We had some issues running the parallel Graphicionado code on Tintin due to bad allocs, which was most likely caused by not having enough memory for the Argo nodes.