
# LOMI - Enrich Linked Open Data (DBpedia) with Microdata

This project was created as part of a one-year project at the University of Mannheim. It builds on the work of Common Crawl, a project that provides an "open repository of web crawl data that can be accessed and analyzed by anyone". The Web Data Commons project uses this data to extract Schema.org data in N-Quads format.
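For orientation: an N-Quads line holds a subject, a predicate, an object and a fourth term naming the graph, which in Web Data Commons dumps identifies the page the data was extracted from. The following is a minimal, hypothetical tokenizer (`NQuadParser` is not a class in this repository) that splits such a line into its four terms. It handles IRIs, blank nodes and literals with escapes, language tags or datatypes, but it is a simplified sketch, not a full implementation of the W3C N-Quads grammar.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of LOMI): splits one N-Quads line into its four terms.
 * Simplified sketch; it does not validate the full W3C N-Quads grammar.
 */
public class NQuadParser {

    /** Subject, predicate, object and graph, each kept in its N-Quads surface form. */
    public record NQuad(String subject, String predicate, String object, String graph) {}

    /** Returns the index just past the '>' that closes an IRI starting at {@code from}. */
    private static int closeIri(String line, int from) {
        int end = line.indexOf('>', from);
        if (end < 0) throw new IllegalArgumentException("Unterminated IRI: " + line);
        return end + 1;
    }

    public static NQuad parse(String line) {
        List<String> terms = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i < n) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '.') break;                        // statement terminator
            int start = i;
            if (c == '<') {                             // IRI such as <http://schema.org/name>
                i = closeIri(line, i);
            } else if (c == '"') {                      // literal, skipping backslash escapes
                i++;
                while (i < n && line.charAt(i) != '"') {
                    i += (line.charAt(i) == '\\') ? 2 : 1;
                }
                if (i >= n) throw new IllegalArgumentException("Unterminated literal: " + line);
                i++;                                    // step past the closing quote
                if (i < n && line.charAt(i) == '@') {   // language tag, e.g. @en or @de-AT
                    i++;
                    while (i < n && (Character.isLetterOrDigit(line.charAt(i)) || line.charAt(i) == '-')) i++;
                } else if (line.startsWith("^^<", i)) { // datatype IRI
                    i = closeIri(line, i + 2);
                }
            } else {                                    // blank node such as _:node1
                while (i < n && !Character.isWhitespace(line.charAt(i))) i++;
            }
            terms.add(line.substring(start, i));
        }
        // Web Data Commons lines carry a graph term, so exactly four terms are expected here
        if (terms.size() != 4) {
            throw new IllegalArgumentException("Expected 4 terms but found " + terms.size() + ": " + line);
        }
        return new NQuad(terms.get(0), terms.get(1), terms.get(2), terms.get(3));
    }
}
```

Keeping each term in its surface form (angle brackets, quotes and tags included) lets later phases decide how to interpret IRIs and literals without the tokenizer losing information.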

The main startup classes are located in the package `com.maximilian_boehm.lod.main`. You should assign at least 6 GB of heap to the JVM (for example with `-Xmx6g`) to get results. The program is divided into three phases:

1. The deduper (`A0_Deduper.java`) finds instances with multiple occurrences and reduces them to a single occurrence.
2. The transformer (`A1_Transformer.java`) transforms the instances from the Schema.org vocabulary to the DBpedia ontology.
3. The instance matcher (`A2_InstanceMatcher.java`) finds corresponding matches between data from the web and DBpedia.
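The three phases can be sketched roughly as follows. This is a hypothetical illustration, not the code in `A0_Deduper`, `A1_Transformer` or `A2_InstanceMatcher`: the `Instance` record, the prefixed names, the two small mapping tables and the token-based Jaccard similarity with a threshold are all assumptions chosen only to show the shape of each phase.

```java
import java.util.*;

/** Hypothetical three-phase sketch; names, mappings and threshold are illustrative, not LOMI's code. */
public class PipelineSketch {

    /** An entity: its class plus property -> value pairs (prefixed names such as "schema:name"). */
    public record Instance(String type, Map<String, String> props) {}

    // Phase 1 (dedupe): records compare by value and Map.equals compares entries,
    // so a LinkedHashSet drops exact duplicates while keeping first-seen order.
    public static List<Instance> dedupe(List<Instance> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Phase 2 (transform): assumed, illustrative Schema.org -> DBpedia mapping tables.
    private static final Map<String, String> CLASS_MAP = Map.of(
            "schema:Person", "dbo:Person",
            "schema:Place", "dbo:Place");
    private static final Map<String, String> PROP_MAP = Map.of(
            "schema:name", "rdfs:label",
            "schema:birthDate", "dbo:birthDate");

    public static Instance transform(Instance in) {
        Map<String, String> out = new TreeMap<>();
        in.props().forEach((k, v) -> {
            String mapped = PROP_MAP.get(k);
            if (mapped != null) out.put(mapped, v);     // properties without a mapping are dropped
        });
        return new Instance(CLASS_MAP.getOrDefault(in.type(), in.type()), out);
    }

    // Phase 3 (match): compare labels as lower-cased word-token sets.
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Highest-scoring DBpedia candidate of the same type at or above the threshold (first one wins ties). */
    public static Optional<Instance> match(Instance web, List<Instance> dbpedia, double threshold) {
        String label = web.props().get("rdfs:label");
        if (label == null) return Optional.empty();
        Instance best = null;
        double bestScore = 0.0;
        for (Instance cand : dbpedia) {
            String candLabel = cand.props().get("rdfs:label");
            if (candLabel == null || !cand.type().equals(web.type())) continue;
            double score = jaccard(tokens(label), tokens(candLabel));
            if (score >= threshold && (best == null || score > bestScore)) {
                best = cand;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Exact-duplicate removal is the simplest possible deduplication; web data may need looser keys, and a real matcher would likely combine several similarity measures rather than a single label comparison.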

See also my blog post for further explanations.
