IR2018Project1

Scholarly Documents Parse and Search

Authors: 刘泽辰 罗璨 杨鸿瑞 徐源
Address: CUFE http://ie.cufe.edu.cn/


About our project:



For details, see project1说明.txt (the project 1 description file) or http://miner.picp.net/members/Yan/teaching/IR2016Spring.html

Download the 1000 papers in PDF format: http://miner.picp.net/members/Yan/teaching/download/IR2016Spring/IR/oriPDFs.zip


About our files:



task1: contains a .py file that extracts the full text and metadata from scholarly documents and saves the results as XML files.
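
As an illustration of what working with the task1 output might look like, here is a minimal sketch that reads title, authors, abstract and body text from a TEI XML document, the format GROBID produces. It is not the project's script: the sample document, the `extract_fields` function and the returned field names are hypothetical, and it assumes the task1 XML files follow GROBID's TEI layout.

```python
import xml.etree.ElementTree as ET

# TEI namespace; GROBID's XML output uses it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed TEI document standing in for one task1 result file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Paper</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename>Ada</forename><surname>Lovelace</surname></persName></author>
        <author><persName><forename>Alan</forename><surname>Turing</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
    <profileDesc><abstract><p>Short abstract text.</p></abstract></profileDesc>
  </teiHeader>
  <text><body><div><p>First paragraph of the body.</p></div></body></text>
</TEI>"""


def extract_fields(tei_xml):
    """Return a dict of metadata and full text taken from a TEI XML string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)

    authors = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        authors.append(" ".join(part for part in (forename, surname) if part))

    # Join the text of every paragraph under <abstract> and under <body>.
    abstract = " ".join(
        "".join(p.itertext()).strip()
        for p in root.iterfind(".//tei:abstract//tei:p", TEI_NS)
    )
    fulltext = " ".join(
        "".join(p.itertext()).strip()
        for p in root.iterfind(".//tei:body//tei:p", TEI_NS)
    )
    return {"title": title, "authors": authors, "abstract": abstract, "fulltext": fulltext}


if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEI))
```

Keeping the fields separate at this stage makes it straightforward to index each one on its own in task2.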

task2: contains four Java classes that build the inverted index from the information extracted in task1 ('IndexBuild.java') and search different fields with queries ('IndexSearch.java'; the search results are printed to the console as soon as it runs).
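
To make the idea of a fielded inverted index concrete, here is a toy sketch in plain Python. It is not the code in task2 and it does not use Lucene's API; the `FieldedIndex` class, its methods and the sample documents are hypothetical. It only shows the principle: each (field, term) pair maps to the set of documents containing it, and a query on one field returns the documents that contain all of its terms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    """Toy inverted index keyed by (field, term)."""

    def __init__(self):
        # (field, term) -> set of document ids containing that term in that field
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index one document given as a {field name: text} dict."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return sorted ids of documents whose field contains every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            docs = self.postings.get((field, term), set())
            result = set(docs) if result is None else result & docs
        return sorted(result)


if __name__ == "__main__":
    index = FieldedIndex()
    index.add(1, {"title": "Building an inverted index", "author": "Ada Lovelace"})
    index.add(2, {"title": "Index compression", "author": "Alan Turing"})
    print(index.search("title", "inverted index"))
    print(index.search("author", "turing"))
```

A real search library adds ranking, analyzers and on-disk storage on top of this structure; the sketch leaves all of that out.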

task3: contains a Maven project implementing a simple web search interface that displays the search results. After a query is submitted, this small search engine shows the user a list of results along with links to the authors' personal information.


More resources (for reference only):



About GROBID:

https://grobid.readthedocs.io/en/latest/
http://url.cn/5ao9EeN



About Lucene:


http://url.cn/5Kxt2h2

About the front end, built with HTML5+CSS+JS:

http://www.w3school.com.cn/html/index.asp
http://www.w3school.com.cn/css/index.asp
http://www.w3school.com.cn/js/index.asp



About Maven:


https://www.cnblogs.com/eagle6688/p/7838224.html

About the website back end, implemented with Tomcat+J2EE+JSP:


https://blog.csdn.net/cyz1151148946/article/details/76691976/