IR2018Project1

Scholarly Documents Parse and Search

Authors: 刘泽辰, 罗璨, 杨鸿瑞, 徐源
Address: CUFE http://ie.cufe.edu.cn/

About our project:

See project1说明.txt (the project description) or http://miner.picp.net/members/Yan/teaching/IR2016Spring.html

Download 1000 papers in PDF format: http://miner.picp.net/members/Yan/teaching/download/IR2016Spring/IR/oriPDFs.zip

About our files:

task1: contains a .py file that extracts full text and metadata from scholarly documents and saves the results in XML files.
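
As a rough illustration of what can be done with the task1 output, the sketch below reads the title, authors and abstract from one XML file. It assumes the files are TEI XML as produced by GROBID (listed in the resources below); `SAMPLE_TEI` and `parse_metadata` are hypothetical names for this example and are not taken from the project's .py file.

```python
import xml.etree.ElementTree as ET

# GROBID writes TEI XML; its elements live in this namespace.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed stand-in for one task1 output file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">A Sample Paper</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename type="first">Ada</forename><surname>Lovelace</surname></persName></author>
        <author><persName><forename type="first">Alan</forename><surname>Turing</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
    <profileDesc><abstract><p>We parse scholarly PDFs.</p></abstract></profileDesc>
  </teiHeader>
</TEI>"""


def parse_metadata(tei_xml):
    """Return title, author names and abstract from a TEI XML string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI)

    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", TEI):
        parts = [
            pers.findtext("tei:forename", default="", namespaces=TEI),
            pers.findtext("tei:surname", default="", namespaces=TEI),
        ]
        authors.append(" ".join(p for p in parts if p))

    # Join all abstract paragraphs into one string.
    abstract = " ".join(
        "".join(p.itertext()).strip()
        for p in root.findall(".//tei:profileDesc/tei:abstract//tei:p", TEI)
    )
    return {"title": title.strip(), "authors": authors, "abstract": abstract}


meta = parse_metadata(SAMPLE_TEI)
print(meta["title"], meta["authors"])
```

For a real file, the same function could be called on the file's contents read as a string; fields missing from the XML come back as empty values rather than raising.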

task2: contains four Java classes that build the inverted index from the information extracted in task1 (`IndexBuild.java`) and search different fields with queries (`IndexSearch.java`). The search results are printed to the console as soon as the search is run.
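
To make the idea of a fielded inverted index concrete, here is a minimal pure-Python sketch. It is not the project's Java code and does not use the Lucene API; `FieldedIndex`, `tokenize` and the sample documents are hypothetical, and the matching is a plain boolean AND with no ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    def __init__(self):
        # (field, term) -> set of ids of documents with that term in that field
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index one document given as a {field name: text} mapping."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return sorted ids of documents whose field contains every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self.postings.get((field, term), set())
        return sorted(result)


index = FieldedIndex()
index.add("p1", {"title": "Neural Ranking Models", "author": "Ada Lovelace"})
index.add("p2", {"title": "Ranking with Inverted Indexes", "author": "Alan Turing"})

print(index.search("title", "ranking"))           # ['p1', 'p2']
print(index.search("title", "inverted ranking"))  # ['p2']
print(index.search("author", "ranking"))          # []
```

Keying the postings by `(field, term)` keeps a title search from matching author names, which mirrors the "search different fields" behaviour described above.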

task3: contains a Maven project implementing a simple web search interface that shows the search results. After a query is submitted, this small search engine returns a list of results along with links to the authors' personal information.

More resources `for reference only`:

About GROBID:

https://grobid.readthedocs.io/en/latest/
http://url.cn/5ao9EeN

About Lucene:

http://url.cn/5Kxt2h2

About the front end, built with HTML5+CSS+JS:

http://www.w3school.com.cn/html/index.asp
http://www.w3school.com.cn/css/index.asp
http://www.w3school.com.cn/js/index.asp

About Maven:

https://www.cnblogs.com/eagle6688/p/7838224.html

About the website back end, implemented with Tomcat+J2EE+JSP:

https://blog.csdn.net/cyz1151148946/article/details/76691976/
