Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduces secondary sort by date #790

Merged
merged 1 commit into from
Feb 13, 2024
Merged

Conversation

JPrevost
Copy link
Member

@JPrevost JPrevost commented Feb 13, 2024

Suggested queries to understand this change better:

Run this one in the PR build and dev1 pipeline playgrounds to confirm this sort change does not affect relevance ranked searches:

{
  search(searchterm: "train station", from: "0", index: "all-current") {
    hits
    records {
      score
      timdexRecordId
      title
      dates {
        kind
        value
      }
    }
  }
}

Run this one in the PR build and dev1 pipeline playgrounds to confirm this sort change does switches to date based sorting for searches in which there is no relevance difference (aka spatial searches with no additional arguments):

{
  search(
    geodistance: {latitude: 42.361145, longitude: -71.057083, distance: "2mi"}
    from: "0"
    index: "all-current"
  ) {
    hits
    records {
      timdexRecordId
      title
      score
      dates {
        kind
        value
      }
    }
  }
}

Why are these changes being introduced:

  • spatial searches with no other arguments do not sort by relevance because all results are equal 1.0 score matches
  • introducing a secondary sort by date desc will allow more consistent and predictable serach result ordering (previously local and deployed indexes were not seeing the same result ordering)

Relevant ticket(s):

How does this address that need:

  • explicitly adds the sorting by score as the primary sort. This is the default behavior, but we need to introduce it explicitly in order to add the secondary sort
  • adds secondary sort by dates.value.as_date descending

Document any side effects to this change:

  • There should be none as any scored results will still take precedent. This has been confirmed by running the same searches locally against dev1 with this change and comparing them the results prior to this change

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod
  • ANDI or Wave has been run in accordance to
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

NO

@jazairi jazairi self-assigned this Feb 13, 2024
@jazairi
Copy link
Contributor

jazairi commented Feb 13, 2024

@JPrevost I confirmed that this change doesn't affect results that are ranked by relevance, and I confirmed that the sorting does seem to sort by date with the second sample query provided.

One question I have is how OpenSearch handles multiple dates with this kind of sort. It seems like it just sorts on the first date available? I think that's fine for now, but post-MVP, we may want to consider refining to certain date types if possible, assuming stakeholders have a preference. For records with multiple dates that span a wide range, it's not always clear to me which should be sorted on. This record is a decent example of this kind of ambiguity (I'm sure there are much weirder ones out there).

@JPrevost
Copy link
Member Author

@jazairi I think it sorts the dates within the record by whatever our sort request was (in this case DESC) and uses that as "the data" for the record. I've paged through a bit and it feels like the records are in date order desc by that logic.

It's possible for GIS data we could identify a date type that is most useful (coverage vs creation), but that would not likely remain constant across sources so I'm not sure there's a better "date order" for a generic use case than this. That said, I absolutely agree we should put more time post-MVP into confirming if this is a useful order and adjusting if not.

Why are these changes being introduced:

* spatial searches with no other arguments do not sort by relevance
  because all results are equal 1.0 score matches
* introducing a secondary sort by date desc will allow more consistent
  and predictable serach result ordering (previously local and deployed
  indexes were not seeing the same result ordering)

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/GDT-185

How does this address that need:

* explicitly adds the sorting by score as the primary sort. This is the
  default behavior, but we need to introduce it explicitly in order to
  add the secondary sort
* adds secondary sort by `dates.value.as_date` descending

Document any side effects to this change:

* There should be none as any scored results will still take precedent.
  This has been confirmed by running the same searches locally against
  dev1 with this change and comparing them the results prior to this
  change
@JPrevost JPrevost force-pushed the gdt-185-sort_by_score_then_date branch from a613ace to e1cdcbd Compare February 13, 2024 20:15
@JPrevost JPrevost temporarily deployed to timdex-pr-790 February 13, 2024 20:15 Inactive
@jazairi
Copy link
Contributor

jazairi commented Feb 13, 2024

Okay, that sort behavior is consistent with what I'm seeing. And yeah, I agree that finding a generically useful date type is going to be tricky...

@JPrevost JPrevost merged commit 3c9db10 into main Feb 13, 2024
4 checks passed
@JPrevost JPrevost deleted the gdt-185-sort_by_score_then_date branch February 13, 2024 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants