Slop Query - Term Order sometimes changes result set #2580

Closed
inzanez opened this issue Feb 19, 2025 · 3 comments

Comments

@inzanez

inzanez commented Feb 19, 2025

Describe the bug

  • Queried "term1 term2"~2 and compared the result to that of "term2 term1"~2.
  • The two queries returned different result sets.
  • Expected: both queries return the same result set.

Which version of tantivy are you using?
Latest master, but older commits show the same behavior

To Reproduce

Based on the phrase_prefix_search example:

use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, Index, IndexWriter, ReloadPolicy, Result};
use tempfile::TempDir;

fn main() -> Result<()> {
    let index_path = TempDir::new()?;

    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    let title = schema.get_field("title").unwrap();
    let body = schema.get_field("body").unwrap();

    let index = Index::create_in_dir(&index_path, schema)?;

    let mut index_writer: IndexWriter = index.writer(50_000_000)?;

    index_writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body => "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone \
            eighty-four days now without taking a fish.",
    ))?;

    index_writer.add_document(doc!(
    title => "Of Mice and Men",
    body => "A few miles south of Soledad, the Salinas River drops in close to the hillside \
            bank and runs deep and green. The water is warm too, for it has slipped twinkling \
            over the yellow sands in the sunlight before reaching the narrow pool. On one \
            side of the river the golden foothill slopes curve up to the strong and rocky \
            Gabilan Mountains, but on the valley side the water is lined with trees—willows \
            fresh and green with every spring, carrying in their lower leaf junctures the \
            debris of the winter’s flooding; and sycamores with mottled, white, recumbent \
            limbs and branches that arch over the pool"
    ))?;

    // A multivalued field just needs to be repeated.
    index_writer.add_document(doc!(
    title => "Frankenstein",
    title => "The Modern Prometheus",
    body => "You will rejoice to hear that no disaster has accompanied the commencement of an \
             enterprise which you have regarded with such evil forebodings.  I arrived here \
             yesterday, and my first task is to assure my dear sister of my welfare and \
             increasing confidence in the success of my undertaking."
    ))?;

    index_writer.commit()?;

    let reader = index
        .reader_builder()
        .reload_policy(ReloadPolicy::OnCommitWithDelay)
        .try_into()?;

    let searcher = reader.searcher();

    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    let query = query_parser.parse_query("\"the has\"~3")?;
    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;

    let query = query_parser.parse_query("\"has the\"~3")?;
    let top_docs2 = searcher.search(&query, &TopDocs::with_limit(10))?;
    assert_eq!(top_docs.len(), top_docs2.len());

    Ok(())
}

I could provide larger data sets where the difference is a lot more obvious, if required.

@fulmicoton
Collaborator

That is the specification though.

@inzanez
Author

inzanez commented Feb 20, 2025

@fulmicoton
Could you point me to that spec? I am trying to understand what is happening here, but nothing I can readily find seems to explain this behavior.

@inzanez
Author

inzanez commented Feb 20, 2025

@fulmicoton OK, I think I found the details: transposition. I will look into it further, but I think that explains what I am seeing :-)
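
For anyone landing here with the same question, here is a minimal sketch of how transposition can make a slop query order-sensitive. It assumes a Lucene-style model in which the second term of a two-term phrase is expected right after the first, and the slop is how far it has to move to get there. The helper names (`positions`, `pair_slop`, `min_slop`) are hypothetical, it only covers two-term phrases, and it is not tantivy's actual implementation.

```rust
/// Token indices at which `term` occurs in `doc`.
fn positions(doc: &[&str], term: &str) -> Vec<u32> {
    doc.iter()
        .enumerate()
        .filter(|(_, token)| **token == term)
        .map(|(i, _)| i as u32)
        .collect()
}

/// Slop needed for one pair of occurrences of the phrase "first second".
/// Assumption: `second` is expected at `first_pos + 1`, and the cost is the
/// distance between where it is expected and where it actually is.
fn pair_slop(first_pos: u32, second_pos: u32) -> u32 {
    let expected = i64::from(first_pos) + 1;
    (i64::from(second_pos) - expected).unsigned_abs() as u32
}

/// Smallest slop over all occurrence pairs, or `None` if a term is missing.
fn min_slop(doc: &[&str], first: &str, second: &str) -> Option<u32> {
    let first_positions = positions(doc, first);
    let second_positions = positions(doc, second);
    let mut best: Option<u32> = None;
    for &p1 in &first_positions {
        for &p2 in &second_positions {
            let cost = pair_slop(p1, p2);
            best = Some(best.map_or(cost, |b| b.min(cost)));
        }
    }
    best
}

fn main() {
    // Fragment of the "Of Mice and Men" body from the reproduction above.
    let doc = [
        "it", "has", "slipped", "twinkling", "over", "the", "yellow", "sands",
    ];
    // In order: "the" sits 3 positions after where "has the" expects it.
    println!("has the -> {:?}", min_slop(&doc, "has", "the"));
    // Reversed: "has" sits 5 positions before where "the has" expects it.
    println!("the has -> {:?}", min_slop(&doc, "the", "has"));
}
```

Under this model, "has the"~3 matches the fragment (required slop 3) while "the has"~3 does not (required slop 5), and swapping two adjacent terms costs 2 rather than 0. That is the kind of asymmetry the reproduction runs into.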
