Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add document querying #237

Merged
merged 1 commit into from
Nov 10, 2023
Merged

Add document querying #237

merged 1 commit into from
Nov 10, 2023

Conversation

Kiyoshika
Copy link
Member

@Kiyoshika Kiyoshika commented Nov 7, 2023

This implements a Query builder to pass to DocumentHandle.query() method to traverse the tree and return all node ID(s) that match the predicate. This also supports a FindFirst and FindAll search type. The searching of the tree terminates early on FindFirst.

I need to add some notes in the actual query.rs file explaining more of its usage, but given a Query instance, a node must match ALL conditions provided in the query (i.e., each condition is treated as an AND condition) in order to be returned.

I also want to add a simple extension DocumentHandle.query_multiple (or something along those lines) which would take a Vec<Query> list of predicates and check if a node matches at least one in order to be returned (to support OR-like conditionals.)

@Kiyoshika
Copy link
Member Author

I believe I have everything ready now for the first iteration of document querying. Squashed all commits during development into one.

@Kiyoshika Kiyoshika marked this pull request as ready for review November 9, 2023 23:26
@Kiyoshika Kiyoshika requested a review from a team November 9, 2023 23:28
let root_id = self.get().get_root().id;
node_stack.push(root_id);

while !node_stack.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while let Some(current_node_id) = node_stack.pop() {
  // ...
}

Copy link
Member Author

@Kiyoshika Kiyoshika Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented this suggestion, also moved self.get() outside the loop which probably has minimal overhead but it didn't really make sense to be inside anyways

let query = Query::new().contains_attribute("style").find_all();
let found_ids = doc.query(&query).unwrap();
assert_eq!(found_ids.len(), 3);
assert_eq!(found_ids, [div_id_2, div_id_3, p_id_4]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR: eventually we might be able to simplify these tests and omit the explicit tree construction:

let input = r#"
<div>
  <div id="myid" style="somestyle">
    <p title="hey">
      <p>
        <div style="otherstyle" id="otherid">
          <p>
            <p title="yo" style="cat">
"#;
let doc = Document::parse_str(input).unwrap();

let query = Query::new().contains_attribute("style").find_all();
let found_ids = doc.query(&query).unwrap();
assert_eq!(found_ids, [div_id_2, div_id_3, p_id_4]);
// ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice. Manually constructing the document is pretty annoying but it’s what we have for now

src/html5/parser.rs Show resolved Hide resolved
@Kiyoshika Kiyoshika merged commit 2755f41 into main Nov 10, 2023
4 checks passed
@Kiyoshika Kiyoshika deleted the feat/tree-traversal branch November 10, 2023 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants