IR2018Project1

Scholarly Documents Parse and Search

Authors: 刘泽辰, 罗璨, 杨鸿瑞, 徐源
Address: CUFE http://ie.cufe.edu.cn/

About our project:

For details, see project1说明.txt (the project description file) or http://miner.picp.net/members/Yan/teaching/IR2016Spring.html

Download 1000 papers in PDF format: http://miner.picp.net/members/Yan/teaching/download/IR2016Spring/IR/oriPDFs.zip

About our files:

task1: contains a .py file that extracts full text and metadata from scholarly documents and saves the results as XML files.
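As a rough illustration of the metadata step, the sketch below reads a TEI XML document of the kind GROBID produces and pulls out the title and author names. It is not the script in task1: the function name `extract_metadata` and the sample document are made up for illustration, and the code assumes the standard TEI namespace and header layout.

```python
import xml.etree.ElementTree as ET

# TEI namespace, assumed to match the XML produced in task1.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed sample; real output is much larger.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">A Sample Paper</title></titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename type="first">Ada</forename><surname>Lovelace</surname></persName></author>
            <author><persName><forename type="first">Alan</forename><surname>Turing</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_metadata(tei_xml):
    """Return the title and author names found in a TEI header."""
    root = ET.fromstring(tei_xml)

    title_el = root.find(".//tei:titleStmt/tei:title", NS)
    title = (title_el.text or "").strip() if title_el is not None else ""

    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", NS):
        # Join forename(s) and surname in document order.
        parts = [el.text.strip() for el in pers if el.text and el.text.strip()]
        authors.append(" ".join(parts))

    return {"title": title, "authors": authors}
```

Calling `extract_metadata(SAMPLE_TEI)` returns a dictionary with a `title` string and an `authors` list, which could then be written out or passed to the indexing step.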

task2: contains four Java classes, which build the inverted index from the information extracted in task1 (`IndexBuild.java`) and search different fields with queries (`IndexSearch.java`; the search results are printed to the console immediately after it runs).
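To make the idea of a fielded inverted index concrete, here is a minimal toy sketch in Python. It is a stand-in for illustration only and does not reflect how the Java classes in task2 or Lucene work internally; the class name `FieldedIndex`, the field names, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    def __init__(self):
        # Maps (field, term) to the set of document ids containing that term.
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index every term of every field of one document."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return sorted ids of documents whose field contains every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            docs = self.postings.get((field, term), set())
            result = set(docs) if result is None else result & docs
        return sorted(result)


index = FieldedIndex()
index.add(1, {"title": "Neural Ranking Models", "author": "Ada Lovelace"})
index.add(2, {"title": "Ranking with Inverted Indexes", "author": "Alan Turing"})
```

With these two documents, `index.search("title", "ranking")` matches both, while `index.search("author", "lovelace")` matches only document 1, since each field has its own postings.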

task3: contains a Maven project implementing a simple web search interface for displaying search results. After a query is submitted, this small search engine shows the user a list of results, along with links to the authors' personal information.
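The result page described above can be pictured with the following small Python sketch, which turns a list of hits into an HTML list with one link per author. This is only an illustration and not the JSP code in task3: the function name `render_results`, the shape of a hit, and the `/author?name=...` link pattern are assumptions.

```python
from html import escape
from urllib.parse import quote


def render_results(hits):
    """Render hits as an ordered HTML list with a link for each author."""
    items = []
    for hit in hits:
        # "/author" is a hypothetical path for the author information page.
        author_links = ", ".join(
            '<a href="/author?name={}">{}</a>'.format(quote(name), escape(name))
            for name in hit["authors"]
        )
        items.append(
            "<li><b>{}</b><br>{}</li>".format(escape(hit["title"]), author_links)
        )
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```

Titles and names are escaped before being placed in the markup, so characters such as `<` in a paper title do not break the page.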

More resources `for reference only`:

About GROBID:

https://grobid.readthedocs.io/en/latest/
http://url.cn/5ao9EeN

About Lucene:

http://url.cn/5Kxt2h2

About the front end, built with HTML5 + CSS + JS:

http://www.w3school.com.cn/html/index.asp
http://www.w3school.com.cn/css/index.asp
http://www.w3school.com.cn/js/index.asp

About Maven:

https://www.cnblogs.com/eagle6688/p/7838224.html

About the website back end, implemented with Tomcat + J2EE + JSP:

https://blog.csdn.net/cyz1151148946/article/details/76691976/
