ZechenLiu/IR2018Project1


IR2018Project1

Scholarly Documents Parse and Search

Authors: 刘泽辰 罗璨 杨鸿瑞 徐源
Address: CUFE http://ie.cufe.edu.cn/


About our project:



For the assignment description, see project1说明.txt (the project 1 instructions file) or http://miner.picp.net/members/Yan/teaching/IR2016Spring.html

Download 1000 papers in PDF format: http://miner.picp.net/members/Yan/teaching/download/IR2016Spring/IR/oriPDFs.zip


About our files:



task1: contains a .py file that extracts the full text and metadata from scholarly documents and saves the results as XML files.
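As a rough illustration of the task1 step, the sketch below pulls the title, authors, and abstract out of a TEI XML document of the kind GROBID produces and writes them into a small XML record. It is not the script in task1: the sample input, the function names `extract_metadata` and `to_record_xml`, and the output element names (`paper`, `authors`, `author`) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny hand-written TEI sample standing in for real extraction output (assumption).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Paper</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename>Ada</forename><surname>Lovelace</surname></persName></author>
        <author><persName><forename>Alan</forename><surname>Turing</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
    <profileDesc><abstract><p>We study sample papers.</p></abstract></profileDesc>
  </teiHeader>
</TEI>"""


def extract_metadata(tei_xml):
    """Return title, author names, and abstract text from a TEI string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        parts = [
            pers.findtext("tei:forename", default="", namespaces=TEI_NS),
            pers.findtext("tei:surname", default="", namespaces=TEI_NS),
        ]
        authors.append(" ".join(p for p in parts if p))
    abstract_el = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    abstract = ""
    if abstract_el is not None:
        # Join all nested text and collapse runs of whitespace.
        abstract = " ".join("".join(abstract_el.itertext()).split())
    return {"title": title.strip(), "authors": authors, "abstract": abstract}


def to_record_xml(meta):
    """Serialize the metadata into a simplified XML record (hypothetical layout)."""
    paper = ET.Element("paper")
    ET.SubElement(paper, "title").text = meta["title"]
    authors = ET.SubElement(paper, "authors")
    for name in meta["authors"]:
        ET.SubElement(authors, "author").text = name
    ET.SubElement(paper, "abstract").text = meta["abstract"]
    return ET.tostring(paper, encoding="unicode")
```

Calling `to_record_xml(extract_metadata(SAMPLE_TEI))` returns one record as a string, which could then be written to a per-paper XML file.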

task2: contains four Java classes that build the inverted index from the information extracted in task1 ('IndexBuild.java') and search different fields with queries ('IndexSearch.java'; the search results are printed to the console as soon as it is run).
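To make the idea of a fielded inverted index concrete, here is a minimal plain-Python sketch. It is not the code in 'IndexBuild.java' or 'IndexSearch.java' and it does not use Lucene; the function names, the sample documents, and the AND-only query semantics are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map (field, term) to the set of document ids containing that term in that field."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in tokenize(text):
                index[(field, term)].add(doc_id)
    return index


def search(index, field, query):
    """Return sorted ids of documents whose `field` contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get((field, term), set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical documents standing in for the XML records from task1.
DOCS = {
    1: {"title": "Neural ranking models", "author": "Ada Lovelace"},
    2: {"title": "Ranking with inverted indexes", "author": "Alan Turing"},
}

INDEX = build_index(DOCS)
```

With this data, `search(INDEX, "title", "ranking")` matches both documents, while `search(INDEX, "author", "ranking")` matches none, because each field keeps its own postings.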

task3: contains a Maven project implementing a simple web search interface that displays the search results. After a query is submitted, this small search engine returns a list of results together with links to the authors' personal information.
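The result page described for task3 can be pictured with the short sketch below, which turns a list of hits into an HTML list with one link per author. The actual project is a Maven web application, so this Python function is only an illustration; `render_results`, the dictionary keys, and the markup are assumptions.

```python
from html import escape


def render_results(query, results):
    """Build an HTML fragment listing each paper with links to its authors."""
    items = []
    for paper in results:
        author_links = ", ".join(
            '<a href="{}">{}</a>'.format(escape(a["url"], quote=True), escape(a["name"]))
            for a in paper["authors"]
        )
        items.append(
            "<li><strong>{}</strong> ({})</li>".format(escape(paper["title"]), author_links)
        )
    # Escape the query too, since it comes straight from the user.
    heading = "<h2>Results for {}</h2>".format(escape(query))
    return heading + "<ol>" + "".join(items) + "</ol>"
```

Escaping every user-supplied or extracted string before it is placed in the page keeps stray `<` or `&` characters in titles and queries from breaking the markup.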


More resources (for reference only):



About GROBID:

https://grobid.readthedocs.io/en/latest/
http://url.cn/5ao9EeN



About Lucene:


http://url.cn/5Kxt2h2

About the front end, built with HTML5+CSS+JS:

http://www.w3school.com.cn/html/index.asp
http://www.w3school.com.cn/css/index.asp
http://www.w3school.com.cn/js/index.asp



About Maven:


https://www.cnblogs.com/eagle6688/p/7838224.html

About the website back end, implemented with Tomcat+J2EE+JSP:


https://blog.csdn.net/cyz1151148946/article/details/76691976/

