Skip to content

nitot/graphee

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Graphee 📈

Graphee is a light-weight code written in C++ to compute graph properties, such as: PageRank, TrustRank, harmonic centrality, etc...

What's the purpose of Graphee?

In order to improve search engine results, one has compute some values from the graph of the web. Actually, some famous works are known like the PageRank or the TrustRank algorithms. They were booth indispensable to make their search engines pertinent and spam proof.

However, with time the graph of the web has become wider and wider. It clearly contains more than a dozen billion vertices and at least ten times more edges. Some, has chosen to persist in full-RAM computations and using distributed framework as Hadoop or Spark. While, some other approaches as GraphChi or m-flash are using serialization methods, but both limited by uint32_t for the vertex ids.

Graphee's innovations

Thus, we propose to develop software able to handle uint64 and still performing calculations on a single laptop (sometimes with an external hard-drive for wide graphs).

The graph is converted into an adjacency matrix and then saved with the fully-tested (since 70's) Compressed Sparse Row (CSR) matrix format. The serialization is done by dividing vectors into slices, and matrices into blocks, with sizes not exceeding the RAM-limit of the computer.

However, nowadays the graph of the web is so wide that one cannot restart calculations from scratch. The preferred solution would be using delta files, saving only differences between two graph snapshots. But the CSR format is unfriendly with insertions and deletions. So we propose to implement the Dynamic CSR matrix format, which presents high speedups for the matrix modifications.

Also a binding of Graphee with Python is under development !

Efficiency

Technically we use pthread and OpenMP technologies for the parallel calculations on CPUs. The full support or CUDA standards is one of the major properties to be included soon !

Examples & functionalities

NOTE: For now on, the code contains all the needed tools for the PageRank. But I need to write a small script to DL few test data. I'm on my way...

Contributing

Please first read CONRTIBUTING.md and propose what you want or you can fix or add functionalities detailed within TODO.md.

For any questions, comments, or collaborations, please use: n.martin [at] qwantresearch [dot] com.

References

About

Billion-vertex graph on a laptop

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published