On shared-memory multi-core machines, the layout of data in memory, the dependencies between computation threads, and the overhead of multithreading all affect program efficiency. In such systems, adding more processors increases contention on shared data structures in memory, which can degrade performance if these issues are not addressed. We designed a multi-core scalable algorithm for computing the so-called tree edit distance. The tree edit distance measures the similarity of tree-structured data and is defined as the minimum-cost sequence of edit operations (deletion, insertion, rename) that transforms one tree into the other. Our implementation takes advantage of multi-core systems, addressing the issues above, to reduce the execution time of tree edit distance computation based on the Zhang-Shasha algorithm [1].
- clang++ -O3 -std=c++11 TreeSim.cpp -lpthread -o mcted
- ./mcted -h
To compute the distance between two trees, all pairs of subtrees rooted at key-root nodes must be compared. The computation for a key-root pair may depend on the results for pairs rooted in its descendant subtrees. Therefore, at each step of the computation we identify the key-root pairs that can be processed independently and distribute them over the available parallel processing units.
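For context, the key-roots in the Zhang-Shasha algorithm are, for each distinct leftmost leaf, the highest-numbered node in a left-to-right postorder numbering. A minimal sketch of how they can be collected (names and layout assumed for illustration, not taken from TreeSim.cpp):

    #include <vector>

    // l[i] = postorder index of the leftmost leaf under node i (0-based,
    // left-to-right postorder). Returns the key-roots in decreasing order.
    std::vector<int> keyRoots(const std::vector<int>& l) {
        int n = static_cast<int>(l.size());
        std::vector<bool> claimed(n, false);
        std::vector<int> roots;
        // Scanning from the root downwards, the first node seen for each
        // leftmost leaf is that leaf's key-root.
        for (int i = n - 1; i >= 0; --i) {
            if (!claimed[l[i]]) {
                claimed[l[i]] = true;
                roots.push_back(i);
            }
        }
        return roots;
    }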
Dependencies between key-root nodes
At each step, to identify independent jobs we use a data structure that maintains the dependencies between the subtrees rooted at the key-root nodes. Each key-root pair depends on all pairs of its descendant key-roots and can only be computed after all of those pairs are done. A concurrent pool manages the resulting independent sub-problems: at any instant it contains every sub-problem whose dependencies are satisfied and which can therefore be executed in parallel.
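One simple way to realize this bookkeeping (a rough sketch with assumed names, not the project's actual data structure) is to give every key-root pair an atomic counter of unfinished dependencies and to release it into the pool once that counter reaches zero:

    #include <atomic>
    #include <vector>

    struct PairTask {
        int i, j;                           // key-root indices in the two trees
        std::atomic<int> pending{0};        // unfinished descendant pairs
        std::vector<PairTask*> dependents;  // pairs waiting on this one
    };

    // Called by the thread that just finished computing a pair: every pair
    // that waited on it loses one obligation, and a pair whose last
    // dependency is now done becomes an independent job.
    template <typename Pool>
    void onPairFinished(PairTask& done, Pool& pool) {
        for (PairTask* p : done.dependents) {
            if (p->pending.fetch_sub(1) == 1)
                pool.push(p);
        }
    }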
In the parallel algorithm, all threads may try to access the pool of independent jobs at the same time; if the pool is not designed carefully, this contention can hurt performance. To prevent races between concurrently executing threads we need a synchronization mechanism. Implementing the pool as a single queue would be simple, but then every thread that modifies the head or tail element has to do so serially. Instead, we employ the idea of distributed queues as a concurrent pool [3].
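A much-simplified stand-in for such a pool (illustrative only; the queues in [3] additionally relax ordering guarantees to scale further) keeps one small queue per thread, so threads normally touch disjoint queues and only compete when their own queue runs dry:

    #include <deque>
    #include <mutex>
    #include <vector>

    template <typename T>
    class DistributedPool {
        struct Slot {
            std::mutex m;
            std::deque<T> q;
        };
        std::vector<Slot> slots;   // one queue per worker thread
    public:
        explicit DistributedPool(unsigned nthreads) : slots(nthreads) {}

        void push(unsigned tid, T item) {
            std::lock_guard<std::mutex> lk(slots[tid].m);
            slots[tid].q.push_back(std::move(item));
        }

        // Try the caller's own queue first, then sweep the others.
        bool pop(unsigned tid, T& out) {
            for (unsigned k = 0; k < slots.size(); ++k) {
                Slot& s = slots[(tid + k) % slots.size()];
                std::lock_guard<std::mutex> lk(s.m);
                if (!s.q.empty()) {
                    out = std::move(s.q.front());
                    s.q.pop_front();
                    return true;
                }
            }
            return false;   // pool currently empty
        }
    };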
Thread pool
Different jobs in the pool involve different amounts of work, depending on the sizes of the input subtrees. Because concurrently executing threads pay a synchronization overhead, for small jobs that overhead outweighs the gain from parallel execution.
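One common way to account for this (a hedged sketch; the threshold and names are assumptions for illustration, not measurements from this project) is to compute small key-root pairs inline in the discovering thread and only push sufficiently large ones into the shared pool:

    #include <utility>

    constexpr long kSmallJobCutoff = 64;   // tuning parameter, machine dependent

    // sizeA and sizeB are the subtree sizes of the key-root pair; small
    // products are not worth the pool's synchronization traffic.
    template <typename Task, typename Pool, typename Compute>
    void dispatch(Task task, long sizeA, long sizeB, unsigned tid,
                  Pool& pool, Compute computeInline) {
        if (sizeA * sizeB < kSmallJobCutoff)
            computeInline(std::move(task));    // run immediately in this thread
        else
            pool.push(tid, std::move(task));   // large enough to share
    }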
Results
[1] Zhang, Kaizhong, and Dennis Shasha. "Simple fast algorithms for the editing distance between trees and related problems." SIAM Journal on Computing 18.6 (1989): 1245-1262.
[2] tree-edit-distance.dbresearch.uni-salzburg.at
[3] Haas et al. "Distributed queues in shared memory: multicore performance and scalability through quantitative relaxation." Proceedings of the ACM International Conference on Computing Frontiers. ACM, 2013.
Alireza S. Abyaneh (All rights reserved).