Z39.50 resultset sort should support numeric sorting #37

Closed
minusdavid opened this issue Jun 20, 2022 · 14 comments

@minusdavid

It looks like the "z39.50" ZOOM sort strategy only supports lexicographic sorting.

I was able to get numeric sorting working with the "type7" sort strategy according to "3.2.1. Zebra Extension Embedded Sort Attribute (type 7)" on https://software.indexdata.com/zebra/doc/querymodel-zebra.html#querymodel-zebra-attr-sorting, but I need to use the ZOOM::ResultSet::sort method from https://metacpan.org/pod/ZOOM#sort() to do the sorting so that won't work...

Am I missing something or is this just a flaw in the "z39.50" sort strategy?

@minusdavid
Author

We worked around the problem by zero padding the integers going into the sort register so that the lexicographic sort will "do the right thing" anyway.

But it would be great if the yaz sort spec could take a numeric flag so that we wouldn't have to do that workaround.

@MikeTaylor
Contributor

MikeTaylor commented Jun 23, 2022 via email

@minusdavid
Author

I've tried with both ZOOM::ResultSet::sort() and yaz-client's sort() functions. Haven't tried the ZOOM::Query::sortby function. That's interesting. We typically use ZOOM::Query::PQF or ZOOM::Query::CCL2RPN and then sort the result set. I quite like the idea of ZOOM::Query::sortby instead as that should be more efficient. Koha's already used ZOOM::ResultSet::sort() for many years, so that's what I need to work with, but interesting...

In any case, the sort() we do is "1=12 >i" or "1=12 <i".

unix:/var/run/koha/kohadev/bibliosocket
base biblios
querytype prefix
find e
sort 1=12 <i

Even though 1=12 is an integer (in Koha), it sorts lexicographically. We've worked around it by zero-padding that integer so that the lexicographic sort will work.

--

Actually, I think ZOOM::Query::sortby wouldn't work for us either because the sort spec still can't define a numeric sort when using the PQF attributes which I think is the only option with the ZOOM module.

I was able to get these PQF searches working but they were hand-made from scratch:

Ascending order:
find @or @attr 1=1016 e @attr 7=1 @attr 1=Local-Number @attr 4=109 0

Descending order:
find @or @attr 1=1016 e @attr 7=2 @attr 1=Local-Number @attr 4=109 0
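The two hand-made queries above follow one pattern, so they could be generated rather than typed. A minimal Python sketch, assuming the layout shown above; the function name `with_type7_sort` is made up for illustration, and it only builds the PQF string without talking to Zebra:

```python
def with_type7_sort(pqf_query, sort_index, descending=False, numeric=True):
    """Wrap a PQF query in a Zebra embedded (type 7) sort clause.

    Hypothetical helper: it only concatenates strings in the same shape
    as the hand-made queries in this thread.
    """
    direction = 2 if descending else 1        # 7=1 ascending, 7=2 descending
    parts = ["@or", pqf_query, f"@attr 7={direction}", f"@attr 1={sort_index}"]
    if numeric:
        parts.append("@attr 4=109")           # structure attribute 109: numeric
    parts.append("0")                         # trailing term, 0 as in the queries above
    return " ".join(parts)


ascending = with_type7_sort("@attr 1=1016 e", "Local-Number")
descending = with_type7_sort("@attr 1=1016 e", "Local-Number", descending=True)
print(ascending)
print(descending)
```

The printed strings are what would go after `find` in yaz-client.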

@MikeTaylor
Contributor

MikeTaylor commented Oct 11, 2022 via email

@minusdavid
Author

My apologies. As far as I can tell, I've described exactly what I'm attempting above in detail. What part is unclear?

  • I perform a search in Zebra using either yaz-client or the Perl ZOOM modules.
  • I send a "sort" command using yaz-client's "sort" or ZOOM::ResultSet->sort
  • Instead of doing a descending title sort like "1=4 >i", we're trying to do an ascending local number sort using "1=12 <i"
  • The local number is always a positive integer
  • Instead of sorting 1,2,3,4,5,6,7,8,9,10 it will sort 1,10,2,3,4,5,6,7,8,9

It makes sense that a lot of Zebra/YAZ sorting would be lexicographic string sorting, but there are use cases where we want to sort numerically instead.

Our workaround was to zero pad the local number so that it sorts like 01,02,03,04,05,06,07,08,09,10
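To illustrate the difference outside Zebra, here is a small Python sketch. It only mimics plain string comparison and is not a claim about what Zebra does internally; the variable names are arbitrary:

```python
# Local numbers as they would appear in a string-based sort register.
ids = [str(n) for n in range(1, 11)]

lexicographic = sorted(ids)                    # plain string comparison
numeric = sorted(ids, key=int)                 # the order we actually want

# The workaround: pad every value to the same width before indexing,
# so that string order and numeric order coincide.
width = max(len(s) for s in ids)
padded = sorted(s.zfill(width) for s in ids)

print(lexicographic)
print(numeric)
print(padded)
```

Note that the padding width has to cover the largest value that will ever be indexed, otherwise the order breaks again once a longer number shows up.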

I was thinking it would be useful if the sort spec had a flag like "n" to denote a numeric sort so that we wouldn't have to use the zero padding workaround.

@MikeTaylor
Contributor

My apologies for letting this get dropped on the floor.

You are quite correct that yaz-client's sort command, ZOOM::ResultSet->sort and ZOOM::Query::sortby all use the "1=4 >i 1=21 >s" sort-specification syntax, and that there is no way in that syntax to express that you want numeric rather than lexicographic sorting.

You are also correct that the obvious way to fix this would be to support the use of n among the other flags (i, s, <, >). It would be perfectly reasonable to file an issue at https://github.com/indexdata/yaz/issues requesting this enhancement.

In the meantime, of course, your zero-padding workaround is a functional if inelegant way to get the behaviour you want.

Or you can find a way to use one of the other two ways of expressing sorting within YAZ. (Yes, it's a shame there are three different ways -- this is a historical accident, and I don't think it could really have been avoided.) These options are:

  • @attr 7=1 @attr 1=Local-Number @attr 4=109 0 in your PQF query, as you noted
  • sortby Local-Number/cql.number in a CQL query, though that would require you to rewrite the main part of the query from PQF to CQL and I have not tested that it does what we expect it to.

Sorry that it's taken so long to reply, and that the reply is not really satisfactory.

@MikeTaylor
Contributor

CC @adamdickmeiss

@adamdickmeiss
Contributor

adamdickmeiss commented Oct 12, 2022

You must specify @attr 4=109 in the query. See the example in test/api/test_sort1.c.

@MikeTaylor
Contributor

Right, but @minusdavid's question is about how to communicate the equivalent of @attr 4=109 in a YAZ sort specification of the "1=4 >i 1=21 >s" kind.

@adamdickmeiss
Contributor

There is no numeric sort flag in Z39.50 sort. But you can pass multiple attribute pairs, as in 1=4,4=109 <

@MikeTaylor
Contributor

Ah, perfect!

So, @minusdavid, it seems you can likely get what you want using 1=12,4=109 >i.

Please let us know if it works.

(@adamdickmeiss, we should mention this in the YAZ docs.)
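For anyone assembling these sort specifications in code, a minimal Python sketch of the multiple-attribute-pair form follows. The helper name `sort_spec` and its parameters are hypothetical; it only formats the string, using the `<`/`>` and `i`/`s` flags as they appear in this thread, and the result would still be passed to yaz-client's sort command or ZOOM::ResultSet->sort as before:

```python
def sort_spec(attrs, ascending=True, case_insensitive=True):
    """Format one sort key, e.g. [(1, 12), (4, 109)] -> "1=12,4=109 <i".

    Hypothetical helper: attrs is a list of (attribute type, value) pairs,
    joined with commas as suggested above.
    """
    pairs = ",".join(f"{attr_type}={value}" for attr_type, value in attrs)
    flags = ("<" if ascending else ">") + ("i" if case_insensitive else "s")
    return f"{pairs} {flags}"


# Use attribute 1=12 (local number) with structure 4=109 (numeric), ascending.
numeric_local_number = sort_spec([(1, 12), (4, 109)])

# Several keys are separated by spaces, as in "1=4 >i 1=21 >s".
title_then_subject = " ".join([
    sort_spec([(1, 4)], ascending=False),
    sort_spec([(1, 21)], ascending=False, case_insensitive=False),
])

print(numeric_local_number)
print(title_then_subject)
```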

@minusdavid
Author

Legends!

Thanks very much, Adam and Mike. That's done it!

I had no idea it was possible to pass multiple attribute pairs. Including examples of that in the following would be great:
https://metacpan.org/pod/ZOOM
https://software.indexdata.com/yaz/doc/yaz-client.html#sortspec

@minusdavid
Author

I was wondering if @adamdickmeiss had any thoughts on #35 as well? In databases with 1,000,000+ records, we've had to stop using Zebra facets, because stop words like "the" in the original search query cause the response time to blow out from 2 seconds to 60+ seconds.

@adamdickmeiss
Contributor

> Legends!
>
> Thanks very much, Adam and Mike. That's done it!
>
> I had no idea it was possible to pass multiple attribute pairs. Including examples of that in the following would be great:

indexdata/yaz@94e8f9a
