Releases: LSmyrnaios/PublicationsRetriever
1.2
1.1
Release notes:
- Extract more fulltext urls.
- Avoid crawling/scraping the pages which have a "restricted" status for their full-text.
- Add support for compressed page content.
- Improve the assignment of "wasUrlValid" and "couldRetry" status for each result.
- Many bug fixes and performance improvements.
- Various quality-of-life improvements.
Full Changelog: v.1.0-stable...1.1
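As an illustration of the "compressed page content" item above, here is a minimal sketch of how a gzip-encoded page body could be decoded before link extraction. It is not the project's actual code: the class `CompressedContentSketch` and its methods are hypothetical names, and it assumes that only the `gzip` value of the `Content-Encoding` header is handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not taken from PublicationsRetriever.
final class CompressedContentSketch {

    // Wraps the body stream in a gzip decoder when the Content-Encoding header says so.
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body; // Absent or unrecognized encoding: pass the bytes through unchanged.
    }

    // Reads the whole (possibly compressed) page body as UTF-8 text.
    static String readPage(InputStream body, String contentEncoding) {
        try (InputStream in = decode(body, contentEncoding)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper used only to build gzip-compressed sample input for trying the sketch.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Calling `readPage` with the output of `gzip(...)` and the encoding `"gzip"` returns the original text, while a `null` encoding leaves the body untouched.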
v.1.0-stable
Release notes:
- Added a new intelligent algorithm to extract the docUrls faster.
- Added multi-thread support.
- Added support for extracting dataset-urls.
- Improved domain-blocking and path-blocking algorithms.
- Improved metaDocUrls-extraction.
- Auto-detect the contentType from the response-body, if it's not provided.
- Improve Document-mime-type detection.
- Added handling of "special" domains which need custom url-string manipulation in order to retrieve the docUrls.
- Improved error handling and added handling of the "SIGINT" signal.
- Many bug fixes and performance improvements.
- Improved statistics.
- Added a bash-script to run the example written in the README.
- Added / Improved tests.
- Enhanced logging.
- Updated dependencies.
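To illustrate the "auto-detect the contentType from the response-body" item above, the following sketch falls back to inspecting the first bytes of the body when no `Content-Type` header is available. It is an assumption about how such detection can work, not the project's implementation: the class `ContentTypeSniffSketch`, the method `detect`, and the choice of checking only the PDF magic number and an HTML prefix are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch, not taken from PublicationsRetriever.
final class ContentTypeSniffSketch {

    // PDF files begin with the ASCII signature "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns the header value when present, otherwise guesses from the body's first bytes.
    static String detect(String headerContentType, byte[] bodyPrefix) {
        if (headerContentType != null && !headerContentType.trim().isEmpty()) {
            return headerContentType.trim();
        }
        byte[] prefix = (bodyPrefix == null) ? new byte[0] : bodyPrefix;
        if (startsWith(prefix, PDF_MAGIC)) {
            return "application/pdf";
        }
        // ISO-8859-1 maps every byte to a char, so arbitrary binary input cannot fail to decode.
        String text = new String(prefix, StandardCharsets.ISO_8859_1).trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype html") || text.startsWith("<html")) {
            return "text/html";
        }
        return "application/octet-stream"; // Unknown content: generic binary type.
    }

    // True when 'data' begins with all the bytes of 'magic'.
    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a provided header always wins over the body check, which keeps the body inspection as a fallback only.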
v0.3-beta
Release notes:
- New command-line-arguments system.
- Bug fixes and optimizations.
- Code refactored.
- Simplified code-testing (make use of JUnit5).
- Added run-examples.
- Updated dependencies.
Compile as described in the README.
Run with: ``java -jar doc_urls_retriever-0.3.jar arg1:'-downloadDocFiles' arg2:'-firstDocFileNum' arg3:'NUM' arg4:'-docFilesStorage' arg5:'storageDir' < stdIn:'inputJsonFile' > stdOut:'outputJsonFile'``
v0.2-beta
This release aims to improve performance and stability, as well as to introduce a new feature.
This new feature is the ability to not only find the docUrls (as before), but also to download the full-text documents directly.
Changes:
- Option to download and store the full-text-documents is now available.
- Improved harvesting speed and overall stability.
-> Use a custom crawling engine for increased space & time efficiency.
-> Optimize pretty much everything.
-> Bug fixes all around.
- Heavily improved the M.L.A. in every aspect.
- Updated dependencies.
v0.1-beta
Initial pre-release.