xml-parsing

What is this

This repo comes from a school project. We were assigned to create a program that, using an XML parser of our choosing, would process Wikipedia XML snapshots and retrieve statistical information from them. There are two projects: one developed in Java and another in C. They perform the same task but were evaluated on different programming styles and mindsets. Generally speaking, the C version is about twice as fast on average, while the Java version is more readable and easier to extend.

What does it do

Run the main program ('program' in c_project and 'run.sh' in java_project) and pass it any number of XML files. As is, the currently implemented interface code displays all the available functionality:

Total number of articles
Number of unique articles
Number of unique revisions
Top 10 contributors
Top 20 largest articles (by number of bytes)
Top N articles with the most words
List of articles with a certain prefix
Given an article ID, retrieve the title
Given an article ID and a revision ID, retrieve the timestamp
Given a user ID, retrieve the contributor name
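
The README does not say which parser either version uses, so the following is only a rough sketch of how the first three counts and the title lookup might be computed. It assumes the JDK's built-in StAX streaming parser (javax.xml.stream) and the usual MediaWiki export layout, where each page element holds a title, an id, and one or more revision elements with their own id. The class name SnapshotStats and its fields are hypothetical and are not taken from the repository.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not the repository's actual implementation.
class SnapshotStats {
    long totalArticles = 0;                            // every <page> seen, duplicates included
    final Map<Long, String> titles = new HashMap<>();  // article id -> title (one entry per unique article)
    final Set<Long> revisions = new HashSet<>();       // distinct revision ids

    // Streams one snapshot and folds its pages into the running statistics.
    void load(InputStream in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            Deque<String> open = new ArrayDeque<>();   // names of the currently open elements
            StringBuilder text = new StringBuilder();  // character data of the current leaf element
            String title = null;                       // title of the page being read
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        open.push(reader.getLocalName());
                        text.setLength(0);
                        if (reader.getLocalName().equals("page")) totalArticles++;
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String name = open.pop();
                        String parent = open.peek();   // null once the root element closes
                        if ("page".equals(parent) && name.equals("title")) {
                            title = text.toString();
                        } else if ("page".equals(parent) && name.equals("id")) {
                            titles.put(Long.parseLong(text.toString().trim()), title);
                        } else if ("revision".equals(parent) && name.equals("id")) {
                            // <id> under <contributor> has a different parent and is skipped
                            revisions.add(Long.parseLong(text.toString().trim()));
                        }
                        break;
                    }
                    default:
                        break;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML snapshot", e);
        }
    }

    public static void main(String[] args) throws Exception {
        SnapshotStats stats = new SnapshotStats();
        for (String file : args) {                     // any number of snapshot files
            try (InputStream in = Files.newInputStream(Paths.get(file))) {
                stats.load(in);
            }
        }
        System.out.println("Total articles:   " + stats.totalArticles);
        System.out.println("Unique articles:  " + stats.titles.size());
        System.out.println("Unique revisions: " + stats.revisions.size());
    }
}
```

Because the parser is streaming, only the id-to-title map and the revision id set stay in memory, which matters for large snapshots. In this sketch, titles.get(id) answers the "given an article ID, retrieve the title" query; the remaining queries would need extra maps keyed in a similar way.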

paulo-caldas/wikipedia-parser
