AMDMproject

Algorithmic methods for data mining course project.

Before you run

Install the requirements listed in the requirements.txt file. If you have pip you can run.

pip install -r requirements.txt

How to use

python main.py <data_set> <alg_num> <dec_alg> <n_ki> <nt>

Where:
<data_set> = Data set (test, ptest or comp)
<alg_num> = Algorithm number (1-4)
<dec_alg> = decomposition algorithm (eigsh or lobpcg)
<n_ki> = unumber of k-means++ iterations (1 ->)
<nt> = number of threads to parallelize the k-means++ iterations (1 ->)

Algorithm param <alg_num>

Takes values:
1 = unnormalized spectral partitioning with k-means
2 = normalized spectral partitioning with k-means
3 = unnormalized sparse spectral partitioning with custom nk-means++
4 = normalized sparse spectral partitioning with custom nk-means++

normalized and unnormalized refers to how the laplacian and eigenvalue decomposition is calculated.

nk-means++ is a custom k-means++ algorithm which runs k-means n times and returns the clustering with the best objective function.

Examples

Either 2 or all five parameters have to be specified.

If only two parameters a specified k-means++ will be run 10 times with 4 threads.

Run competition files with algorithm 4

python main.py comp 4

Run test files with algorithm 3

python main.py test 3

Run private test files with algorithm 2

python main.py ptest 2

Run single file with algorithm 3

Graph has to be in the graphs directory

python main.py ca-AstroPh.output 3

Run competition files with algorithm 4 with 10 k-means++ iterations parallelized to 10 threads.

python main.py comp 4 10 10

Reproduce best results

Depending on the parameters and the fact the best graph partition is found semi-randomly, the best results might not be reproduced every time. So to reproduce the outputs for the competition run the following script.

python run_best.py <data_set>

Where:
<data_set> = Data set (test or comp)

Example:

python run_best.py comp

Best results

AstroPh: custom sparse algo1 - 40.667
Seed: 95504
CondMat: custom sparse algo1 - 124
Seed: 31882
HepPh: custom sparse algo1 - 14.25
Seed: 748819
HepTh: custom sparse algo1 - 5.6
Seed: 380854
GrQc: custom sparse algo2 - 0.075
Seed: 834113
Oregon: custom sparse algo2 - 2.242
Seed: 952417
soc-Epinions: custom sparse algo2 - 1.571
Seed: 987761
web-NotreDame: custom sparse algo2 - 0.262
Seed: 361760
roadNet-CA: custom sparse algo2 - 0.379
Seed: 826647

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

AMDMproject

Before you run

How to use

Algorithm param <alg_num>

Examples

Run competition files with algorithm 4

Run test files with algorithm 3

Run private test files with algorithm 2

Run single file with algorithm 3

Run competition files with algorithm 4 with 10 k-means++ iterations parallelized to 10 threads.

Reproduce best results

Best results

Files

README.md

Latest commit

History

README.md

File metadata and controls

AMDMproject

Before you run

How to use

Algorithm param <alg_num>

Examples

Run competition files with algorithm 4

Run test files with algorithm 3

Run private test files with algorithm 2

Run single file with algorithm 3

Run competition files with algorithm 4 with 10 k-means++ iterations parallelized to 10 threads.

Reproduce best results

Best results