Scholarly Documents Parse and Search
Authors: 刘泽辰 罗璨 杨鸿瑞 徐源
Address: CUFE http://ie.cufe.edu.cn/
For details of the project, refer to project1说明.txt (the project description) or http://miner.picp.net/members/Yan/teaching/IR2016Spring.html
Download the 1000 papers (PDF format) from: http://miner.picp.net/members/Yan/teaching/download/IR2016Spring/IR/oriPDFs.zip
task1: contains a .py file that extracts the full text and metadata from the scholarly documents and saves the results as XML files.
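As a minimal sketch of the saving step, the snippet below writes one paper's metadata and full text to an XML file using Python's standard xml.etree.ElementTree. It assumes the text has already been extracted from the PDF. The function names and the element names (paper, title, authors, abstract, fulltext) are hypothetical and are not necessarily the schema produced by the task1 script.

```python
import xml.etree.ElementTree as ET


def build_paper_element(title, authors, abstract, fulltext):
    """Build a <paper> element holding one paper's metadata and full text.

    The element names are hypothetical; the real task1 script may use others.
    """
    paper = ET.Element("paper")
    ET.SubElement(paper, "title").text = title
    authors_el = ET.SubElement(paper, "authors")
    for name in authors:
        # One <author> child per author name, in the original order.
        ET.SubElement(authors_el, "author").text = name
    ET.SubElement(paper, "abstract").text = abstract
    ET.SubElement(paper, "fulltext").text = fulltext
    return paper


def save_paper(path, title, authors, abstract, fulltext):
    """Write one paper to `path` as a UTF-8 XML file with an XML declaration."""
    tree = ET.ElementTree(build_paper_element(title, authors, abstract, fulltext))
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, save_paper("paper_0001.xml", title, authors, abstract, fulltext) would write one file per paper; the file name is only illustrative. ElementTree escapes special characters such as & automatically, which matters for text pulled out of PDFs.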
task2: contains four Java classes that build an inverted index from the information extracted in task1 ('IndexBuild.java') and search different fields with queries ('IndexSearch.java'; the search results are printed to the console as soon as it runs).
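The idea behind building and searching a fielded inverted index can be sketched in a few lines of Python. This is a toy illustration only, not the project's Java code: the class and method names are hypothetical, tokenization is a simple lower-case alphanumeric split, and a multi-word query is assumed to mean an AND over its terms within the chosen field.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedInvertedIndex:
    """A toy in-memory inverted index with one posting map per field (hypothetical)."""

    def __init__(self):
        # field -> term -> set of document ids containing that term
        self.postings = defaultdict(lambda: defaultdict(set))

    def add_document(self, doc_id, fields):
        """Index each field of one document, e.g. {"title": ..., "abstract": ...}."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[field][term].add(doc_id)

    def search(self, field, query):
        """Return sorted ids of documents whose `field` contains every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids creating empty entries for unknown fields or terms.
        field_postings = self.postings.get(field, {})
        result = set(field_postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= field_postings.get(term, set())  # AND: intersect postings
        return sorted(result)
```

Keeping a separate posting map per field is what lets a query target only titles, only abstracts, and so on, which is the "search different fields" behaviour described for IndexSearch.java.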
task3: contains a Maven project implementing a simple web search interface that displays the search results. After a query is submitted, the small search engine returns a list of results, along with links to the authors' personal information.
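The core of the result page, turning a list of hits into an HTML list with one link per author, might look like the following Python sketch. The actual interface is a Maven (Java) web project, so this only illustrates the idea; the shape of the results dictionaries and the AUTHOR_URL pattern (pointing at example.org) are hypothetical placeholders.

```python
from html import escape
from urllib.parse import quote_plus

# Hypothetical author-profile URL pattern; the real interface may link elsewhere.
AUTHOR_URL = "https://example.org/author?name={}"


def render_results(query, results):
    """Render search hits as an HTML fragment.

    `results` is assumed to be a list of dicts with "title" and "authors" keys.
    """
    lines = [f"<p>{len(results)} result(s) for <b>{escape(query)}</b></p>", "<ol>"]
    for hit in results:
        # Each author name becomes a link to its (hypothetical) profile page.
        author_links = ", ".join(
            f'<a href="{escape(AUTHOR_URL.format(quote_plus(name)))}">{escape(name)}</a>'
            for name in hit["authors"]
        )
        lines.append(f"  <li>{escape(hit['title'])}<br>{author_links}</li>")
    lines.append("</ol>")
    return "\n".join(lines)
```

All user-supplied text goes through html.escape, so a query or title containing < or & cannot break the page markup.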
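References:
https://grobid.readthedocs.io/en/latest/ (GROBID documentation)
http://url.cn/5ao9EeN
http://url.cn/5Kxt2h2
http://www.w3school.com.cn/html/index.asp (W3School HTML tutorial)
http://www.w3school.com.cn/css/index.asp (W3School CSS tutorial)
http://www.w3school.com.cn/js/index.asp (W3School JavaScript tutorial)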
https://www.cnblogs.com/eagle6688/p/7838224.html
https://blog.csdn.net/cyz1151148946/article/details/76691976/