Wikipedia Matrix (benchmark)

Extracting Wikipedia tables into CSV files (basic skeleton for testing/benchmarking solutions). Once the repository is cloned:

cd wikimatrix 
mvn test

We provide 300+ Wikipedia URLs, and the challenge is to:

  • integrate the extractors' code (HTML and Wikitext)
  • extract as many relevant tables as possible
  • serialize the results into CSV files (within output/html and output/wikitext)

More details can be found in BenchTest.java. We expect to be able to launch mvn test and find the results in the output folder.
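For illustration, below is a minimal sketch of how an extractor could be plugged in and its results serialized to CSV. The TableExtractor interface, the CsvSerializer class, and the file-naming scheme are assumptions made for this sketch, not the actual API defined in BenchTest.java, and proper CSV quoting/escaping is omitted for brevity.

import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical contract for an extractor: given a Wikipedia page URL, return every
// relevant table as a list of rows, each row being a list of cell values.
interface TableExtractor {
    List<List<List<String>>> extractTables(String wikipediaUrl) throws IOException;
}

// Hypothetical serializer: writes extracted tables to CSV files under the chosen
// output directory (e.g. output/html or output/wikitext).
final class CsvSerializer {

    static void writeTable(List<List<String>> table, Path csvFile) throws IOException {
        Files.createDirectories(csvFile.getParent());
        try (Writer out = Files.newBufferedWriter(csvFile)) {
            for (List<String> row : table) {
                // Note: cells containing commas or quotes would need escaping in a real solution.
                out.write(String.join(",", row));
                out.write(System.lineSeparator());
            }
        }
    }

    // Sketch of the benchmark loop: run both extractors over each URL and
    // serialize every table they find into output/html and output/wikitext.
    static void runBenchmark(List<String> urls, TableExtractor html, TableExtractor wikitext) throws IOException {
        for (String url : urls) {
            String page = url.substring(url.lastIndexOf('/') + 1);
            writeAll(html.extractTables(url), Paths.get("output", "html"), page);
            writeAll(wikitext.extractTables(url), Paths.get("output", "wikitext"), page);
        }
    }

    private static void writeAll(List<List<List<String>>> tables, Path dir, String page) throws IOException {
        for (int i = 0; i < tables.size(); i++) {
            writeTable(tables.get(i), dir.resolve(page + "_" + i + ".csv"));
        }
    }
}

Under these assumptions, an HTML-based and a Wikitext-based implementation of TableExtractor would each produce one CSV file per extracted table, named after the page and the table index, under their respective output subfolder.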
