
Releases: LSmyrnaios/PublicationsRetriever

1.2

08 Nov 11:02
26da14c

Release notes:

  • Extract more fulltext urls.
  • Refactor checks for unwanted urls and apply them in more places.
  • Add integration with Jenkins CI and Nexus Maven Repository.
  • Many bug fixes and performance improvements.
  • Various quality-of-life improvements.

Full Changelog: 1.1...1.2

1.1

24 Jan 14:05

Release notes:

  • Extract more fulltext urls.
  • Avoid crawling/scraping pages whose full-text has a "restricted" status.
  • Add support for compressed page content.
  • Improve the assignment of "wasUrlValid" and "couldRetry" status for each result.
  • Many bug fixes and performance improvements.
  • Various quality-of-life improvements.

Full Changelog: v.1.0-stable...1.1

v.1.0-stable

29 May 11:34

Release notes:

  • Added a new intelligent algorithm to extract the docUrls faster.
  • Added multi-thread support.
  • Added support for extracting dataset-urls.
  • Improved domain-blocking and path-blocking algorithms.
  • Improved metaDocUrls-extraction.
  • Auto-detect the contentType from the response-body, if it's not provided.
  • Improved document MIME-type detection.
  • Added handling of "special" domains which need custom url-string manipulation in order to retrieve the docUrls.
  • Improved error handling and added handling of the "SIGINT" signal.
  • Many bug fixes and performance improvements.
  • Improved statistics.
  • Added a bash-script to run the example written in the README.
  • Added / Improved tests.
  • Enhanced logging.
  • Updated dependencies.

v0.3-beta

07 Mar 16:27

Release notes:

  • New command-line-arguments system.
  • Bug fixes and optimizations.
  • Code refactored.
  • Simplified code testing (using JUnit 5).
  • Added run-examples.
  • Updated dependencies.

Compile as described in the README.
Run with: ``java -jar doc_urls_retriever-0.3.jar arg1:'-downloadDocFiles' arg2:'-firstDocFileNum' arg3:'NUM' arg4:'-docFilesStorage' arg5:'storageDir' < stdIn:'inputJsonFile' > stdOut:'outputJsonFile'``

v0.2-beta

06 Jul 15:33

This release aims to improve performance and stability and to introduce a new feature.
The new feature is the ability to download the full-text documents directly, in addition to finding their docUrls (which was the only option until now).

Changes:

  • Option to download and store the full-text-documents is now available.
  • Improved harvesting speed and overall stability.
    -> Use custom crawling engine for increased space & time efficiency.
    -> Optimize pretty much everything.
    -> Bug fixes all around.
  • Heavily improved the M.L.A. in every aspect.
  • Updated dependencies.

v0.1-beta

03 Mar 02:48
Pre-release

Initial pre-release.