Cannot filter by parentID in lucene query with GroupedOr #6316

Closed
jaandrews opened this issue Sep 9, 2019 · 8 comments

Comments

@jaandrews
Contributor

I'm trying to make a Lucene query that searches for child nodes beneath specific pages, with the option of additional filtering (see [here](https://github.com/Shazwazza/Examine/wiki/Grouped-Operations)). Unfortunately, this does not return any results, even though the query looks correct.

Previously I had been using multiple ParentId calls, which works fine until other conditions are involved. I need the result to match at least one of the parent IDs, which requires grouping.

Reproduction

Bug summary

A GroupedOr Examine query fails to find any matches in the ExternalIndex, even though the same query done with the ParentId helper does return results. The two generate different queries, "parentID:1234" vs. "(parentID:[1234 TO 1234])", though I'm not sure why the structure used by the ParentId helper is needed, as the former query looks fine to me.

Specifics

Umbraco: 8.1.3

Steps to reproduce

  1. Create a node.
  2. Create a node underneath that node.
  3. Do a search of the ExternalIndex with only GroupedOr(new[] { "parentID" }, new[] { "{parentNodeId}" }).
  4. Check results.

Expected result

The search result includes the one child node of the first node created.

Actual result

No matches were found.

@Shazwazza
Contributor

ParentId is indexed as numeric data. Fields of a numeric type must be queried with a range query or with the Lucene numeric APIs. This is why the generated query for the parent ID is a range query when you use the Examine APIs. Examine does not support numeric data fields in methods like GroupedOr.

Examine 1.0 supports nested query operations, so you should be able to accomplish what you need without using the Grouped methods.

I haven't had time to update the Examine docs for v1.0, so unfortunately I can't point you to any docs, but I can point you to some code: https://github.com/Shazwazza/Examine/blob/master/src/Examine.Test/Search/FluentApiTests.cs#L1810

Also note that when you use the Examine APIs to query a field, like Field("parentID", 123), it will detect the value type being passed in and use a numeric query on that field. There are lots of examples in those FluentApiTests, even some on ParentID: https://github.com/Shazwazza/Examine/blob/master/src/Examine.Test/Search/FluentApiTests.cs#L53, https://github.com/Shazwazza/Examine/blob/master/src/Examine.Test/Search/FluentApiTests.cs#L191, https://github.com/Shazwazza/Examine/blob/master/src/Examine.Test/Search/FluentApiTests.cs#L144

@jaandrews
Contributor Author

So just to be clear, I have to generate a raw query that looks like the following?

+(parentID:[1234 TO 1234] parentID:[1235 TO 1235])

This is why I was using GroupedOr: I need the query to match one of a set of ids, and I don't think I can use Field because of this requirement. It seems I will need to use the NativeQuery helper, though I have had issues in the past where it modifies my queries (mostly moving around where the parentheses are). I still need to test whether that happens for this particular query. Will post an update once I've done so.

@jaandrews
Contributor Author

I'm still running into issues making this query. I'm able to generate a raw query with the form

+(parentID:[1178 TO 1178])

which is then compiled to the below query when run through the NativeQuery helper

(+parentID:[1178 TO 1178])

For queries with multiple ids, it looks like

+(parentID:[1100 TO 1100] parentID:[1257 TO 1257] parentID:[1259 TO 1259])

and

(+(parentID:[1100 TO 1100] parentID:[1257 TO 1257] parentID:[1259 TO 1259]))

Both of the generated queries look like they should work. Unfortunately I get 0 results with these queries. It's not a problem of data missing from the index, as I do get results when I run the query with the ParentId helper, so I'm at a bit of a loss as to what's going wrong.
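
For reference, the multi-id raw query shown above could be assembled roughly like this. It is only a minimal sketch of the string format: RawQuerySketch and BuildParentIdQuery are hypothetical names, not part of Examine or Umbraco, and the sketch says nothing about whether the resulting query matches numerically indexed fields.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RawQuerySketch
{
    // Hypothetical helper (not an Examine API): builds one required group
    // containing a single-value range clause per parent id, e.g.
    // +(parentID:[1100 TO 1100] parentID:[1257 TO 1257])
    public static string BuildParentIdQuery(IEnumerable<int> parentIds)
    {
        // Each id becomes an inclusive range whose lower and upper bound are the id itself.
        var clauses = parentIds.Select(id => $"parentID:[{id} TO {id}]");

        // Clauses inside the parentheses are optional relative to each other;
        // the leading "+" marks the group as a whole as required.
        return "+(" + string.Join(" ", clauses) + ")";
    }
}
```

Calling RawQuerySketch.BuildParentIdQuery(new[] { 1100, 1257, 1259 }) yields the same string as the three-id raw query above.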

@Shazwazza
Contributor

Hi,

So just to be clear, I have to generate a raw query that looks like the following?

No, that's not what I mentioned above. I said

Examine 1.0 supports nested query operations, so you should be able to accomplish what you need without using the Grouped methods.

The examples I pointed you to use nested query operations, such as the .Group(g => syntax, which is strongly typed.

That said, you can also try just using raw queries. Can you post the code that you are actually using? I'd suggest you use Luke to debug your queries and index to see what is going on; you need version 1.0.1 from here: https://code.google.com/archive/p/luke/downloads. Also, there's both inclusive and exclusive syntax for range queries, see https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Range%20Searches

You can also debug the output of the strongly typed syntax by calling ToString() on the generated IQuery, which will show you the compiled raw query being used.

@jaandrews
Contributor Author

jaandrews commented Sep 18, 2019

You are correct that the examples you gave will work for the query I posted. The reason I was using a raw query was that I was also using weights in the queries I wrote. I tested without the weights, and your suggestion did work when I left those out.

I am running into one last issue with this approach, though. I have two groups in the query that both need to be matched for an entry to be included in the results. Below is my current code.

if (!_examineManager.TryGetIndex(config.ExamineIndex, out var index)) {
    throw new ArgumentException($"No index found with name \"{config.ExamineIndex}\"");
}
var searcher = index.GetSearcher();
var criteria = searcher.CreateQuery(defaultOperation: BooleanOperation.Or);
IBooleanOperation result = null;
// Group 1: match any of the parent ids (parentID is numeric, so range queries are used).
if (config.LimitToParents != null && config.LimitToParents.Any()) {
    result = criteria.Group(group => {
        var first = config.LimitToParents.First();
        var innerQuery = group.RangeQuery<int>(new[] { "parentID" }, first, first);
        foreach (var parentId in config.LimitToParents.Skip(1)) {
            innerQuery = innerQuery.Or().RangeQuery<int>(new[] { "parentID" }, parentId, parentId);
        }
        return innerQuery;
    });
}
// Match any of the allowed document type aliases.
if (config.AllowedContentTypes != null && config.AllowedContentTypes.Any()) {
    foreach (var type in config.AllowedContentTypes) {
        if (result == null) {
            result = criteria.NodeTypeAlias(type);
        }
        else {
            result = result.Or().NodeTypeAlias(type);
        }
    }
}
// Group 2: additional filters, intended to be required alongside the clauses above.
// Note: && (not &) so that Any() is skipped when Filters is null.
if (config.Filters != null && config.Filters.Any()) {
    var start = result != null ? result.And() : criteria;
    result = start.Group(group => {
        var first = config.Filters.First();
        var innerQuery = BuildGroup(group, first, config.UseAndCriteria);
        foreach (var filter in config.Filters.Skip(1)) {
            innerQuery = BuildGroup(innerQuery, filter, config.UseAndCriteria);
        }
        return innerQuery;
    });
}
return result.Execute();

LimitToParents is just a list of parent ids, while AllowedContentTypes is a list of document type aliases. My current problem is that the line

var start = result != null ? result.And() : criteria;

isn't working as expected. The resulting query doesn't make both groups required, so the results include data that only matches one of the groups when I want every result to match both. I tried adding "+" in front of both groups in Luke. Here's an example query that returns data.

((parentID:[1100 TO 1100]) (parentID:[1257 TO 1257]) (parentID:[1259 TO 1259])) (_categories:\"umb document/a3bf27740d0741528824ca6e2cefd81b\")

and one that does not where I added "+" in front of both groups.

+((parentID:[1100 TO 1100]) (parentID:[1257 TO 1257]) (parentID:[1259 TO 1259])) +(_categories:\"umb document/a3bf27740d0741528824ca6e2cefd81b\")

Here is a screenshot of the results I get in Luke from the former query.
[Screenshot: lucene-result]

My aim is to filter out the industry node, as it doesn't match any of the parent ids. Also, for some reason I don't get any results in Luke when I run the following query

((parentID:[1100 TO 1100]) (parentID:[1257 TO 1257]) (parentID:[1259 TO 1259]))

though Umbraco does return the expected results for that particular query (it might just be a configuration difference; I didn't see anything obvious beyond the analyzer configuration).

Any ideas?

@jaandrews
Contributor Author

Forgot to mention that this code is just running against the built-in ExternalIndex.

@Shazwazza
Contributor

Sorry for the delay. There are 2 issues here, and I figured out your first issue, where raw range queries don't work with numerical values; see Shazwazza/Examine#133. I have a working prototype of that one, but it still needs more fixing. Let's keep that discussion on the Examine tracker, since this has nothing to do with Umbraco's code.

As for your other issue, if you could post it as a separate question on the Examine repo, I can see if I can find time to help.

And yes, in Luke you cannot run range queries on numerically stored values. It's the same problem: Luke will not be able to turn those into numerical range queries.

@jaandrews
Contributor Author

K. I've created a new issue report for the issue I've run into with the "And()" method.
