Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: binary search tree for attributes #9

Merged
merged 7 commits into from
Feb 11, 2025
Merged

feat: binary search tree for attributes #9

merged 7 commits into from
Feb 11, 2025

Conversation

HideBa
Copy link
Owner

@HideBa HideBa commented Feb 1, 2025

This change is Reviewable

Summary by CodeRabbit

  • New Features

    • Added an optional command-line option to specify attribute indices during data serialization, enabling advanced filtering and more customizable feature queries.
    • Enhanced feature querying capabilities with improved attribute-based filtering, offering more accurate and efficient data retrieval.
    • Introduced new methods for querying attributes based on specified conditions.
    • Added support for attribute indexing in the header writing process.
    • Refined spatial selection parameters for HTTP-based queries to improve feature selection.
    • Introduced new functionality for reading and processing CityJSON data.
    • Added benchmarking for reading FCB files with and without attribute indexing.
    • Implemented a new struct for managing attribute indices and updated the header and feature writing processes accordingly.
  • Chores

    • Streamlined logging behavior and updated build configurations for a cleaner and more efficient development experience.

Copy link

coderabbitai bot commented Feb 1, 2025

Walkthrough

This pull request integrates comprehensive attribute indexing capabilities across multiple components. A new AttributeIndex struct is added to the FlatBuffers schema, with corresponding updates in header generation. New modules and traits for byte serialization, sorting, and querying have been introduced in the bst package. The CLI now accepts an optional attribute index parameter, and both writer and reader modules in fcb_core have been extended to build, serialize, and query attribute indices. Several tests and configuration files are updated to validate and support these changes.

Changes

File(s) Change Summary
src/fbs/header.fbs
src/rust/fcb_core/src/fb/header_generated.rs
Added new struct AttributeIndex and integrated an attribute_index field into Header. Updated field offsets, verification methods, and deprecated old constants.
src/rust/Cargo.toml
src/rust/bst/Cargo.toml
src/rust/fcb_core/Cargo.toml
Added new workspace member "bst" and introduced multiple dependencies (bincode, chrono, ordered-float, once_cell, local bst path) to enhance functionality.
src/rust/bst/src/byte_serializable.rs
src/rust/bst/src/lib.rs
src/rust/bst/src/query.rs
src/rust/bst/src/sorted_index.rs
Introduced ByteSerializable trait, query module with Query, QueryCondition, MultiIndex, and implemented a SortedIndex with binary search and serialization support.
src/rust/cli/src/main.rs Updated the Ser command variant and serialize signature to accept an optional attr_index argument for attribute indexing.
src/rust/fcb_core/src/bin/read.rs
src/rust/fcb_core/src/bin/write.rs
src/rust/fcb_core/src/reader/attr_query.rs
src/rust/fcb_core/src/reader/mod.rs
src/rust/fcb_core/src/writer/attr_index.rs
src/rust/fcb_core/src/writer/attribute.rs
src/rust/fcb_core/src/writer/feature_writer.rs
src/rust/fcb_core/src/writer/header_writer.rs
src/rust/fcb_core/src/writer/mod.rs
src/rust/fcb_core/src/writer/serializer.rs
Extended writer and reader modules to support attribute index building, serialization, and queries. New methods and structures (e.g., select_attr_query, AttributeFeatureOffset, AttributeIndexInfo) added, along with updates to header options and feature offset management.
src/rust/fcb_core/tests/attr_index.rs
src/rust/fcb_core/tests/e2e.rs
src/rust/fcb_core/tests/http.rs
src/rust/fcb_core/tests/read.rs
New and updated tests to validate attribute indexing, query operations, updated bounding box parameters, and in-memory CityJSON processing.
src/rust/makefile
src/rust/wasm/src/lib.rs
Updated Makefile to reference package fcb_wasm instead of wasm, and removed logging initialization from the wasm module.
src/rust/fcb_core/src/fb/feature_generated.rs Deprecated several constant definitions and consolidated derive attributes, streamlining verification logic in feature generation.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant C as CLI (Ser Command)
    participant FW as FcbWriter
    participant HW as HeaderWriter
    participant S as Serializer

    U->>C: Invoke serialize with attr_index option
    C->>FW: Pass input file, output file, and attr_index list
    FW->>HW: Build header_options (including attribute_indices)
    HW->>S: Call to_fcb_header with attribute_indices_info
    S->>HW: Return header with AttributeIndex info
    HW->>FW: Provide header buffer
    FW->>FW: Write attribute index buffer & sorted feature buffer
    FW->>U: Output serialized FCB file
Loading
sequenceDiagram
    participant U as User
    participant FR as FcbReader
    participant MQ as MultiIndex
    participant Q as Query Engine

    U->>FR: Request attribute query with conditions
    FR->>FR: Read header with attribute_index information
    FR->>MQ: Build MultiIndex from attribute index entries
    MQ->>Q: Process Query with conditions (equality, range, etc.)
    Q->>MQ: Return matching offsets
    MQ->>FR: Provide query result (FeatureIter)
    FR->>U: Deliver matching features
Loading

Poem

I’m a little rabbit with ears so keen,
Hoping over fields of code so green.
With attribute indexes now in place,
My data hops with elegance and grace.
Carrots and bytes, a joyous mix—
In our code garden, everything clicks!
🥕🐇 Happy coding under the springtime fix!


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

deepsource-io bot commented Feb 1, 2025

Here's the code health analysis summary for commits b151642..0e6bb00. View details on DeepSource ↗.

Analysis Summary

AnalyzerStatusSummaryLink
DeepSource Rust LogoRust❌ Failure
❗ 46 occurences introduced
🎯 16 occurences resolved
View Check ↗

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

@HideBa HideBa marked this pull request as ready for review February 9, 2025 19:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (37)
src/rust/fcb_core/tests/read.rs (4)

16-22: Consider more explicit error context for file reading.
Currently, a simple "?" error propagation is used (adequate in a test), but for clarity, you could add more context to the error message (e.g., which file failed).


24-33: Consider using named constants for repeated attribute identifiers.
The hard-coded strings (e.g., "b3_h_dak_50p") might be better extracted as constants or configuration values for maintainability.


34-44: Document or make the node size configurable.
The choice of node_size=16 may warrant a comment explaining the rationale or a configuration option for different datasets.


46-48: Potential performance enhancement for feature insertion.
If there's a large number of features, consider adding a bulk insertion method to minimize repeated calls.

src/rust/bst/src/query.rs (5)

5-14: Consider extending operator support.
The current operators cover basic comparisons. You may later introduce “Between” or “In” for more advanced querying.


16-25: Typed keys could improve clarity.
Storing keys as bytes is flexible, but type-safe keys can reduce parse runtime errors and clarify usage.


27-31: Potential for OR/NOT conditions.
Right now, conditions are AND-combined. If you need OR or complex filtering in the future, consider a more expressive query structure.


57-105: Optimize candidate set intersection.
Currently, all offsets are collected before intersecting. For large datasets, consider sorting conditions by expected match size or using streaming intersection to improve performance.


114-123: Review necessity of final sorting.
Sorting the final intersection is O(n log n). If order is not strictly required, you could skip or make it optional.

src/rust/bst/src/sorted_index.rs (5)

9-13: Consider memory strategies for storing offsets.
If many entries share the same key, you might investigate more advanced data structures to reduce memory usage.


27-39: Sorting approach could be documented more clearly.
You're using a simple comparison-based sort. If stable ordering among equals matters, consider a stable sort (e.g., sort_by with stable sort logic) or mention if duplicates are expected.


57-100: Half-open range queries may require extra clarity.
The interval [lower, upper) might confuse callers if they expect inclusive upper bounds. Documenting or exposing an inclusive-range variant is helpful.


102-161: Serialization versioning or checksums recommended.
Consider embedding version fields or checksums to detect corruption or incompatible formats.


163-189: Guard against deserialization failures for AnyIndex.
When converting bytes to type T, parse errors could occur. Currently, these may silently fail. Consider stronger error handling or a fallback logic.

src/rust/bst/src/byte_serializable.rs (2)

112-119: Avoid panicking on invalid UTF-8 for String.

The current approach uses String::from_utf8(...).unwrap(), which panics if the byte slice is not valid UTF-8. Consider returning a Result or handling the error gracefully.

 fn from_bytes(bytes: &[u8]) -> Self {
-    String::from_utf8(bytes.to_vec()).unwrap()
+    // A safer approach returning a default or an error:
+    String::from_utf8(bytes.to_vec()).unwrap_or_default()
 }

4-11: Return a Result from from_bytes to handle potential conversion errors.

All from_bytes methods currently return the type itself, possibly panicking or producing incorrect data on invalid inputs. Consider changing the trait to return a Result to gracefully handle malformed byte slices.

Also applies to: 117-118, 215-231

src/rust/fcb_core/src/reader/attr_query.rs (2)

19-135: Efficient attribute query implementation.

The approach of selectively deserializing only indexes used in the query is optimal. You might consider partial skipping of any irrelevant index bytes to save I/O if the file layout permits.


138-263: Avoid code duplication across query methods.

select_attr_query_seq and select_attr_query share much logic. Consider extracting a shared internal function that both can call to reduce repetition and potential maintenance overhead.

src/rust/bst/src/lib.rs (3)

20-70: Consider extracting record definitions into a helper function.
Repeatedly defining the same structure for test data (ID, city, height, year) can be error-prone and clutter tests. Extracting a helper function to generate these records would make tests more readable and maintainable.


74-120: Reduce boilerplate in building various index entries.
Inserting data into each index (ID, city, height, year) follows a similar pattern of checking for an existing key-value and appending offsets or creating a new entry. Consider extracting a reusable function that handles this routine logic and reduces repetition.


289-349: Potential improvement for error handling in serialization.
All serialization calls use “unwrap()” without handling potential I/O or deserialization errors. Consider replacing “unwrap()” with error propagation (e.g., “? operator”) or custom error handling to maintain robust error reporting.

src/rust/fcb_core/src/writer/attr_index.rs (1)

310-312: Avoid using println! for logging.
Using “println!” in a library context may cause undesired output in client applications. Consider using a logging crate (e.g., log or tracing) or returning an error if needed.

src/rust/fcb_core/src/reader/mod.rs (1)

501-505: Consider unifying item_filter and item_attr_filter logic.
If both filters exist, the iteration logic might become fragmented. Coordinating feature offsets (R-Tree filter) and attribute offsets (item_attr_filter) can get complicated. Consider a unified filter approach or a consistent fallback order.

src/rust/fcb_core/src/writer/attribute.rs (2)

8-8: Consider clarifying the doc comment further.
While the comment helpfully explains the High-Level concept of the schema, you might also enumerate or briefly mention the tuple structure (index, type) within the comment for quick reference.


202-216: Check performance implications for large datasets.
This function iterates over city objects to build index entries. If the number of objects is very large, consider potential optimizations (e.g., streamed or parallel processing). Otherwise, this is a neat approach to generate typed index entries.

src/rust/fcb_core/tests/http.rs (1)

8-12: Document the significance of these coordinates.

The hardcoded coordinates appear to be specific geographic coordinates. Consider adding a comment explaining what area these coordinates represent and why these specific values were chosen for the test.

+    // Coordinates representing <specific area>
     let minx = 84227.77;
     let miny = 445377.33;
     let maxx = 85323.23;
     let maxy = 446334.69;
src/rust/fcb_core/src/bin/write.rs (1)

27-33: Make attribute indices configurable.

The attribute names are currently hardcoded. Consider making these configurable through command-line arguments or a configuration file for better flexibility.

Example approach:

use clap::{App, Arg};

// In main()
let matches = App::new("FCB Writer")
    .arg(Arg::new("attr-indices")
        .long("attr-indices")
        .help("Comma-separated list of attributes to index")
        .takes_value(true))
    .get_matches();

let attr_indices = matches.value_of("attr-indices")
    .map(|s| s.split(',').map(String::from).collect::<Vec<_>>());
src/rust/fcb_core/src/writer/header_writer.rs (2)

19-20: Improve documentation for attribute indices.

The documentation comment for attribute_indices_info could be more descriptive, explaining the purpose and structure of the attribute indices.

-    /// Attribute indices
+    /// Information about indexed attributes, including their names and corresponding feature offsets.
+    /// This is used to build a binary search tree for efficient attribute-based queries.
     pub(super) attribute_indices_info: Option<Vec<AttributeIndexInfo>>,

30-31: Add detailed documentation for attribute indices option.

The documentation for attribute_indices in HeaderWriterOptions should explain its purpose and usage.

-    /// Attribute indices
+    /// List of attribute names to be indexed for efficient querying.
+    /// When provided, binary search trees will be built for these attributes.
     pub attribute_indices: Option<Vec<String>>,
src/rust/fcb_core/src/writer/feature_writer.rs (1)

27-32: Add documentation for AttributeFeatureOffset struct.

The new struct lacks documentation explaining its purpose and fields.

+/// Stores information about attribute offsets within a feature.
+///
+/// This struct is used to track the position and size of attribute data
+/// within the serialized feature, along with any index entries for
+/// efficient attribute-based querying.
 #[derive(Clone, PartialEq, Debug)]
 pub(super) struct AttributeFeatureOffset {
+    /// The offset of the attribute data within the serialized feature
     pub(super) offset: usize,
+    /// The size of the attribute data in bytes
     pub(super) size: usize,
+    /// Index entries for attributes that need to be indexed
     pub(super) index_entries: Vec<AttributeIndexEntry>,
 }
src/rust/fcb_core/tests/attr_index.rs (3)

46-46: Consider parameterizing the test with different attribute combinations.

The test currently uses hardcoded attribute names. Consider parameterizing the test to cover different combinations of attributes.

-let attr_indices = vec!["b3_h_dak_50p".to_string(), "identificatie".to_string()];
+#[test]
+fn test_attr_index_with_params() -> Result<()> {
+    for attrs in vec![
+        vec!["b3_h_dak_50p"],
+        vec!["identificatie"],
+        vec!["b3_h_dak_50p", "identificatie"],
+    ] {
+        let attr_indices = attrs.into_iter().map(String::from).collect::<Vec<_>>();
+        // ... rest of the test logic
+    }
+    Ok(())
+}

81-88: Consider adding early return for performance.

The loop continues checking features even after reaching feat_count. Consider adding an early return.

 while let Ok(Some(feat_buf)) = reader.next() {
     let feature = feat_buf.cur_cj_feature()?;
     deserialized_features.push(feature);
     feat_num += 1;
-    if feat_num >= feat_count {
-        break;
-    }
+    if feat_num == feat_count {
+        return Ok(());
+    }
 }

91-107: Simplify attribute verification logic.

The nested loops and multiple flag variables make the code harder to follow. Consider using any or all combinators.

-let mut contains_b3_h_dak_50p = false;
-let mut contains_identificatie = false;
-for co in feature.city_objects.values() {
-    if co.attributes.is_some() {
-        let attrs = co.attributes.as_ref().unwrap();
-        if let Some(b3_h_dak_50p) = attrs.get("b3_h_dak_50p") {
-            if b3_h_dak_50p.as_f64().unwrap() > 2.0 {
-                contains_b3_h_dak_50p = true;
-            }
-        }
-        if let Some(identificatie) = attrs.get("identificatie") {
-            if identificatie.as_str().unwrap() == "NL.IMBAG.Pand.0503100000012869" {
-                contains_identificatie = true;
-            }
-        }
-    }
-}
+let contains_b3_h_dak_50p = feature.city_objects.values().any(|co| {
+    co.attributes.as_ref().and_then(|attrs| {
+        attrs.get("b3_h_dak_50p").and_then(|v| v.as_f64())
+    }).map_or(false, |v| v > 2.0)
+});
+
+let contains_identificatie = feature.city_objects.values().any(|co| {
+    co.attributes.as_ref().and_then(|attrs| {
+        attrs.get("identificatie").and_then(|v| v.as_str())
+    }).map_or(false, |v| v == "NL.IMBAG.Pand.0503100000012869")
+});
src/fbs/header.fbs (1)

63-66: Consider adding documentation comments.

The new AttributeIndex struct lacks documentation explaining its purpose and field meanings.

+// Represents an index entry for an attribute in the binary search tree
 struct AttributeIndex {
+  // Zero-based index of the attribute in the attributes array
   index: ushort;
+  // Length of the attribute data in bytes
   length: uint;
 }
src/rust/cli/src/main.rs (1)

34-36: Add example usage in help text.

Consider adding an example in the help text to show the expected format of the attribute list.

 /// Comma-separated list of attributes to create index for
+/// Example: --attr-index "height,width,depth"
 #[arg(long)]
 attr_index: Option<String>,
src/rust/fcb_core/tests/e2e.rs (2)

115-149: Remove debug print statements.

These debug print statements should be removed as they're marked with "remove these lines later".

-            // ===============remove these lines later=================
-            println!(
-                "is CityObject same? {:?}",
-                orig_co == des_feat.city_objects.get(id).unwrap()
-            );
-
-            println!(
-                "is attribute same======? {:?}",
-                orig_co.attributes == des_feat.city_objects.get(id).unwrap().attributes
-            );
-            if orig_co.attributes != des_feat.city_objects.get(id).unwrap().attributes {
-                println!("  attributes======:");
-
-                let orig_attrs = orig_co.attributes.as_ref().unwrap();
-                let des_attrs = des_feat
-                    .city_objects
-                    .get(id)
-                    .unwrap()
-                    .attributes
-                    .as_ref()
-                    .unwrap();
-                if orig_attrs.is_object() && des_attrs.is_object() {
-                    for (key, value) in orig_attrs.as_object().unwrap() {
-                        let des_value = des_attrs.get(key);
-                        if des_value.is_none() {
-                            println!("  key not found: {:?}", key);
-                        } else if value != des_value.unwrap() {
-                            println!("  key: {:?}", key);
-                            println!("    original: {:?}", value);
-                            println!("    deserialized: {:?}", des_value.unwrap());
-                        }
-                    }
-                }
-            }
-            // ===============remove these lines later=================

181-186: Address TODO comment.

There's a TODO comment about implementing attributes that should be addressed.

Would you like me to help implement the attribute comparison logic mentioned in the TODO comment?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b151642 and 40b4e96.

📒 Files selected for processing (28)
  • src/fbs/header.fbs (1 hunks)
  • src/rust/Cargo.toml (2 hunks)
  • src/rust/bst/Cargo.toml (1 hunks)
  • src/rust/bst/src/byte_serializable.rs (1 hunks)
  • src/rust/bst/src/lib.rs (1 hunks)
  • src/rust/bst/src/query.rs (1 hunks)
  • src/rust/bst/src/sorted_index.rs (1 hunks)
  • src/rust/cli/src/main.rs (4 hunks)
  • src/rust/fcb_core/Cargo.toml (1 hunks)
  • src/rust/fcb_core/src/bin/read.rs (0 hunks)
  • src/rust/fcb_core/src/bin/write.rs (1 hunks)
  • src/rust/fcb_core/src/fb/feature_generated.rs (1 hunks)
  • src/rust/fcb_core/src/fb/header_generated.rs (10 hunks)
  • src/rust/fcb_core/src/http_reader/mod.rs (0 hunks)
  • src/rust/fcb_core/src/reader/attr_query.rs (1 hunks)
  • src/rust/fcb_core/src/reader/mod.rs (14 hunks)
  • src/rust/fcb_core/src/writer/attr_index.rs (1 hunks)
  • src/rust/fcb_core/src/writer/attribute.rs (2 hunks)
  • src/rust/fcb_core/src/writer/feature_writer.rs (5 hunks)
  • src/rust/fcb_core/src/writer/header_writer.rs (6 hunks)
  • src/rust/fcb_core/src/writer/mod.rs (8 hunks)
  • src/rust/fcb_core/src/writer/serializer.rs (4 hunks)
  • src/rust/fcb_core/tests/attr_index.rs (1 hunks)
  • src/rust/fcb_core/tests/e2e.rs (1 hunks)
  • src/rust/fcb_core/tests/http.rs (1 hunks)
  • src/rust/fcb_core/tests/read.rs (1 hunks)
  • src/rust/makefile (1 hunks)
  • src/rust/wasm/src/lib.rs (0 hunks)
💤 Files with no reviewable changes (3)
  • src/rust/fcb_core/src/bin/read.rs
  • src/rust/fcb_core/src/http_reader/mod.rs
  • src/rust/wasm/src/lib.rs
✅ Files skipped from review due to trivial changes (1)
  • src/rust/bst/Cargo.toml
🔇 Additional comments (35)
src/rust/fcb_core/tests/read.rs (3)

2-6: Imports look appropriate and comprehensive.
No issues observed with the newly introduced imports from fcb_core and standard I/O libraries.


50-52: Memory buffer usage is solid.
Writing to and then seeking in a memory buffer is a neat approach for in-memory testing. No issues found.


59-59: Bounding box query integration looks correct.
The method call to select and test bounding boxes is consistent with your approach.

src/rust/bst/src/query.rs (1)

33-36: MultiIndex definition is straightforward.
No issues noted; the design seems minimal and reusable.

src/rust/bst/src/sorted_index.rs (3)

5-6: Clear offset alias usage.
Using a type alias for offsets keeps code readable and flexible.


16-19: SortedIndex structure is well-organized.
Maintains a simple vector of KeyValue pairs, which is straightforward and efficient for typical use cases.


42-55: SearchableIndex trait is clear.
The trait-based approach allows for flexible query strategies. No issues noted here.

src/rust/fcb_core/src/writer/mod.rs (2)

3-41: Good integration of attribute indexing.

The addition of the attribute_index_entries HashMap, alongside tracking each feature’s offsets, is a clear and maintainable approach to building an index.


94-109: Validate final sorted offsets for features and attribute indices.

Because the code reorders features based on the Hilbert curve, ensure that node.offset usage matches the correct feature in feat_offsets. Also confirm that the final assignment of each attribute index entry matches the sorted user data.

Would you like a script to confirm offset correctness by comparing the final sorted_feature_buf layout with the recorded offsets?

Also applies to: 110-114, 145-149, 175-198, 200-220, 222-224, 225-235, 239-245

src/rust/bst/src/lib.rs (1)

18-70: Good test coverage for date-based queries.
The tests for date comparisons (e.g., year <= 2000-01-01, year between 1950 and 2010) offer strong coverage for chronological edge cases. It's great to see these tests ensuring correct filtering logic.

src/rust/fcb_core/src/reader/mod.rs (3)

120-130: Double-check rtree_index vs. attributes offset usage.
Here, “rtree_index: index_size” is set to the sum of both R-Tree and attribute indexes, while “attributes: self.attr_index_size()” only accounts for attribute index size. Ensure that the final offsets align with the intended meaning of each field for consistent file positioning.


164-180: Confirm correctness of skipping the attribute index within select_bbox_seq.
The code discards “attr_index_size” bytes before constructing FeatureOffset. Make sure the order of skipping R-Tree vs. attribute index is consistent across different methods, especially if you plan to combine bounding-box and attribute filters.


348-357: Ensure consistent offset updates when seeking with item_attr_filter.
In the Seekable iterator, when “item_attr_filter” is present, you jump to “self.feature_offset.total_size() + item_offset” on every iteration, then set “self.cur_pos = item_offset.” Verify these offset calculations won't skip or overlap any features, and that they align with how index data is laid out immediately after R-Tree data.

src/rust/fcb_core/src/writer/attribute.rs (2)

3-4: New imports seem appropriately utilized.
These imports (chrono::NaiveDateTime and cjseq::CityJSONFeature) appear necessary for the newly introduced functions dealing with date-time parsing and city JSON features.


189-200: Robust typed attribute handling.
The new AttributeIndexEntry enum covers the majority of data types used in this schema. Its typed approach is clear and maintainable. Just confirm that deliberately unsupported types (like Byte, UByte, etc.) are handled elsewhere or gracefully reported.

src/rust/fcb_core/src/writer/serializer.rs (5)

10-10: Importing AttributeIndex and related structs is valid.
These imports appear to integrate the new attribute indexing into the serializer mechanism seamlessly.


23-27: Struct name and fields are concise.
AttributeIndexInfo (index, length) is well-named. Confirm that “length” consistently refers to the same concept (e.g., byte length, item count, etc.) throughout the codebase.


43-43: Function signature extension for attribute indices.
Allowing an Option<&[AttributeIndexInfo]> provides flexibility and avoids overhead when no indices are needed. This approach is a good design choice.


50-60: Conditional creation of attribute index vector.
The logic for constructing the FlatBuffer vector only when attribute_indices_info is Some is straightforward and avoids unnecessary allocations.


119-119: Passing the new attribute_index to Header.
This ensures the attribute index information is properly serialized in the header. The change is consistent with the extended Header struct usage.

src/rust/fcb_core/src/fb/header_generated.rs (8)

157-157: Added Default derive for Vector.
Deriving Default is helpful for initialization, especially in scenarios that require zeroed or uninitialized placeholders. No issues here.


477-591: New AttributeIndex struct for FlatBuffers integration.
This struct adds first-class representation for attribute index data (index, length) in the FlatBuffer schema. Implementation looks solid.


1127-1145: Added VT_ATTRIBUTE_INDEX constants.
These offsets properly integrate the new attribute index field into the header’s FlatBuffer layout.


1268-1279: New getter method for attribute_index.
Exposing Option<flatbuffers::Vector<'a, AttributeIndex>> is consistent with how other optional fields are handled.


1486-1491: Verification step for attribute_index.
Ensures that any stored AttributeIndex data is structurally sound in the FlatBuffer. Helps prevent invalid data usage downstream.


1584-1584: Extending HeaderArgs with attribute_index.
Adding attribute_index to this argument struct further cements the field’s integration into the header.


1666-1674: Builder method to set attribute_index.
This method aligns well with FlatBuffers’ builder pattern, making it easy to attach the new field.


1830-1830: Expanded debug output with attribute_index.
Including attribute_index in debug statements aids traceability during debugging or logging.

src/rust/cli/src/main.rs (1)

98-103: 🏁 Script executed:

#!/bin/bash
# Description: Check if there's any validation of attribute names against schema
ast-grep --pattern 'fn validate_attribute_names($_, $_) {
  $$$
}'

Length of output: 68

src/rust/fcb_core/src/fb/feature_generated.rs (2)

10-59: LGTM! Appropriate deprecation notices for enum constants.

The deprecation notices for ENUM_MIN_CITY_OBJECT_TYPE, ENUM_MAX_CITY_OBJECT_TYPE, and ENUM_VALUES_CITY_OBJECT_TYPE are well-documented with clear migration guidance to use associated constants.


505-505: LGTM! Improved derive attribute consolidation.

The consolidation of derive attributes for the Vertex struct improves code readability.

src/rust/Cargo.toml (2)

7-7: LGTM! Added bst to workspace members.

The addition of "bst" to workspace members is consistent with the PR objectives for implementing a binary search tree.


31-34: LGTM! Added appropriate dependencies for binary search tree implementation.

The new dependencies are well-chosen for the implementation:

  • bincode: For efficient binary serialization
  • chrono: For date/time handling
  • ordered-float: For floating-point comparisons
  • once_cell: For lazy static initialization
src/rust/makefile (1)

10-10: LGTM! Updated clippy command with correct package name.

The clippy command now correctly uses fcb_wasm instead of wasm, maintaining consistency with the project's package naming.

src/rust/fcb_core/Cargo.toml (1)

31-33: LGTM! Added required dependencies for binary search tree functionality.

The new dependencies are appropriately configured:

  • bst: Local dependency for binary search tree implementation
  • ordered-float and chrono: Using workspace-level versions for consistency

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/rust/fcb_core/tests/read.rs (1)

83-91: Improve bbox intersection logic.

The current implementation might have edge cases where features are incorrectly included/excluded. Consider using a more robust intersection test.

-        // MEMO: it retrieves all features which has intersection with the bbox
-        if x_s.iter().any(|x| *x >= minx as i64)
-            || y_s.iter().any(|y| *y >= miny as i64)
-            || x_s.iter().any(|x| *x <= maxx as i64)
-            || y_s.iter().any(|y| *y <= maxy as i64)
+        // Check if any vertex is inside the bbox or if the feature's bounds intersect the bbox
+        let feat_min_x = x_s.iter().min().copied().unwrap_or(0);
+        let feat_max_x = x_s.iter().max().copied().unwrap_or(0);
+        let feat_min_y = y_s.iter().min().copied().unwrap_or(0);
+        let feat_max_y = y_s.iter().max().copied().unwrap_or(0);
+        if !(feat_max_x < minx as i64 || feat_min_x > maxx as i64 ||
+             feat_max_y < miny as i64 || feat_min_y > maxy as i64)
🧹 Nitpick comments (6)
src/rust/fcb_core/tests/read.rs (2)

24-31: Consider extracting attribute schema building into a helper function.

The attribute schema building logic is duplicated in both test functions. Extract it into a helper function to improve code reusability and maintainability.

+fn build_attr_schema(cj_seq: &CityJSONSeq) -> AttributeSchema {
+    let mut attr_schema = AttributeSchema::new();
+    for feature in cj_seq.features.iter() {
+        for (_, co) in feature.city_objects.iter() {
+            if let Some(attributes) = &co.attributes {
+                attr_schema.add_attributes(attributes);
+            }
+        }
+    }
+    attr_schema
+}

105-113: Add Debug trait implementation for NonSeekableCursor.

Implementing Debug would help with debugging and error messages.

+    #[derive(Debug)]
     struct NonSeekableCursor<T: AsRef<[u8]>> {
         inner: Cursor<T>,
     }
src/rust/fcb_core/tests/attr_index.rs (1)

92-110: Extract feature validation into a helper function.

The feature validation logic is duplicated and could be made more reusable.

+fn validate_feature(feature: &CityJSONFeature) -> Result<()> {
+    let mut contains_b3_h_dak_50p = false;
+    let mut contains_identificatie = false;
+    for co in feature.city_objects.values() {
+        if let Some(attrs) = &co.attributes {
+            if let Some(b3_h_dak_50p) = attrs.get("b3_h_dak_50p") {
+                if b3_h_dak_50p.as_f64().unwrap() > 2.0 {
+                    contains_b3_h_dak_50p = true;
+                }
+            }
+            if let Some(identificatie) = attrs.get("identificatie") {
+                if identificatie.as_str().unwrap() == "NL.IMBAG.Pand.0503100000012869" {
+                    contains_identificatie = true;
+                }
+            }
+        }
+    }
+    assert!(contains_b3_h_dak_50p);
+    assert!(contains_identificatie);
+    Ok(())
+}
src/rust/fcb_core/src/reader/attr_query.rs (1)

92-117: Consider using a builder pattern for query construction.

The query construction logic could be made more maintainable using a builder pattern.

+struct QueryBuilder {
+    conditions: Vec<QueryCondition>,
+}
+
+impl QueryBuilder {
+    fn new() -> Self {
+        Self {
+            conditions: Vec::new(),
+        }
+    }
+
+    fn add_condition(mut self, name: String, operator: Operator, value: ByteSerializableValue) -> Self {
+        self.conditions.push(QueryCondition {
+            field: name,
+            operator,
+            key: value.to_bytes(),
+        });
+        self
+    }
+
+    fn build(self) -> Query {
+        Query {
+            conditions: self.conditions,
+        }
+    }
+}
src/rust/fcb_core/src/reader/mod.rs (2)

50-56: Add documentation for FeatureOffset struct.

The struct lacks documentation explaining its purpose and the meaning of each field.

+/// Represents the offsets of different sections in the FCB file format.
+///
+/// # Fields
+///
+/// * `magic_bytes` - Size of the magic bytes section (typically 8 bytes)
+/// * `header` - Size of the header section
+/// * `rtree_index` - Size of the R-tree index section
+/// * `attributes` - Size of the attributes section
 #[doc(hidden)]
 pub(super) struct FeatureOffset {
     magic_bytes: u64,
     header: u64,
     rtree_index: u64,
     attributes: u64,
 }

299-328: Consider using a state machine pattern for feature iteration.

The feature iteration logic could be made clearer using a proper state machine pattern.

+#[derive(Debug, PartialEq, Eq)]
+enum IterationState {
+    Init,
+    ReadFirstFeatureSize,
+    Reading {
+        current_offset: u64,
+        feature_number: usize,
+    },
+    Finished,
+}
+
+impl IterationState {
+    fn advance(&mut self, new_offset: u64) {
+        if let Self::Reading { current_offset, feature_number } = self {
+            *current_offset = new_offset;
+            *feature_number += 1;
+        }
+    }
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40b4e96 and d4b21f0.

📒 Files selected for processing (4)
  • src/rust/fcb_core/src/reader/attr_query.rs (1 hunks)
  • src/rust/fcb_core/src/reader/mod.rs (15 hunks)
  • src/rust/fcb_core/tests/attr_index.rs (1 hunks)
  • src/rust/fcb_core/tests/read.rs (2 hunks)

Comment on lines 49 to 86
match col.type_() {
ColumnType::Int => {
let index = SortedIndex::<i32>::deserialize(&mut buffer.as_slice())?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
ColumnType::Long => {
let index = SortedIndex::<i64>::deserialize(&mut buffer.as_slice())?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
ColumnType::Float => {
let index = SortedIndex::<OrderedFloat<f32>>::deserialize(
&mut buffer.as_slice(),
)?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
ColumnType::Double => {
let index = SortedIndex::<OrderedFloat<f64>>::deserialize(
&mut buffer.as_slice(),
)?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
ColumnType::String => {
let index = SortedIndex::<String>::deserialize(&mut buffer.as_slice())?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
ColumnType::Bool => {
let index = SortedIndex::<bool>::deserialize(&mut buffer.as_slice())?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
ColumnType::DateTime => {
let index =
SortedIndex::<DateTime<Utc>>::deserialize(&mut buffer.as_slice())?;
multi_index.add_index(col.name().to_string(), Box::new(index));
}
_ => {
return Err(anyhow!("unsupported column type"));
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Extract column type deserialization into a separate function.

The column type deserialization logic is duplicated between seekable and non-seekable implementations. Consider extracting it into a reusable function.

+fn deserialize_column_index(
+    col_type: ColumnType,
+    buffer: &[u8],
+    col_name: String,
+    multi_index: &mut MultiIndex,
+) -> Result<()> {
+    match col_type {
+        ColumnType::Int => {
+            let index = SortedIndex::<i32>::deserialize(buffer)?;
+            multi_index.add_index(col_name, Box::new(index));
+        }
+        ColumnType::Long => {
+            let index = SortedIndex::<i64>::deserialize(buffer)?;
+            multi_index.add_index(col_name, Box::new(index));
+        }
+        // ... other cases ...
+        _ => return Err(anyhow!("unsupported column type")),
+    }
+    Ok(())
+}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
src/rust/fcb_core/src/reader/mod.rs (2)

26-49: Consider documenting the new fields in FeatureIter.

While most fields have documentation comments, the new fields item_attr_filter, feature_offset, and total_feat_count lack documentation explaining their purpose.

Add documentation comments for the new fields:

    /// Selected features or None if no bbox filter
    item_filter: Option<Vec<packed_rtree::SearchResultItem>>,
+   /// Selected attributes or None if no attribute filter
    item_attr_filter: Option<Vec<ValueOffset>>,
    /// Number of selected features (None for undefined feature count)
    count: Option<usize>,
    /// Current feature number
    feat_no: usize,
    /// File offset within feature section
    cur_pos: u64,
    /// Reading state
    state: State,
    /// Whether or not the underlying reader is Seek
    seekable_marker: PhantomData<S>,
+   /// Offsets to different sections in the file
    feature_offset: FeatureOffset,
+   /// Total number of features in the file
    total_feat_count: u64,

52-57: Consider making FeatureOffset fields private.

The fields in FeatureOffset are currently public by default. Since this is a pub(super) struct, consider making the fields private and providing accessor methods if needed.

 pub(super) struct FeatureOffset {
-    magic_bytes: u64,
-    header: u64,
-    rtree_index: u64,
-    attributes: u64,
+    magic_bytes: u64,
+    header: u64,
+    rtree_index: u64,
+    attributes: u64,
 }
src/rust/fcb_core/src/bin/read_cj.rs (2)

8-8: Add return type documentation.

The function's return type Result<()> should be documented to clarify what errors might be returned.

-fn read_cj(input: &str) -> Result<()> {
+/// Reads a CityJSON file and extracts specific attributes from features.
+/// 
+/// # Errors
+/// Returns an error if:
+/// - The file cannot be opened or read
+/// - The JSON parsing fails
+/// - The data is not a valid CityJSONSeq
+fn read_cj(input: &str) -> Result<()> {

26-30: Extract attribute names as constants.

The attribute names "documentnummer" and "identificatie" are hardcoded. Consider extracting them as constants for better maintainability.

+const DOCUMENT_NUMBER_ATTR: &str = "documentnummer";
+const IDENTIFICATION_ATTR: &str = "identificatie";
+
 fn read_cj(input: &str) -> Result<()> {
     // ...
     if let Some(attributes) = &co.attributes {
         if let Some(document_number) =
-            attributes.get("documentnummer").and_then(|v| v.as_str())
+            attributes.get(DOCUMENT_NUMBER_ATTR).and_then(|v| v.as_str())
         {
             document_nummers.push(document_number.to_string());
         }
         if let Some(identification) =
-            attributes.get("identificatie").and_then(|v| v.as_str())
+            attributes.get(IDENTIFICATION_ATTR).and_then(|v| v.as_str())
         {
             identifications.push(identification.to_string());
         }
     }

Also applies to: 31-35

src/rust/fcb_core/benches/read_attr.rs (2)

52-63: Use constants for query parameters.

The attribute name and values in the query are hardcoded. Use the same constants as defined above for consistency.

     let query: AttrQuery = vec![
         (
-            "b3_h_dak_50p".to_string(),
+            ATTR_NAME.to_string(),
             Operator::Gt,
-            ByteSerializableValue::F64(OrderedFloat(20.0)),
+            ByteSerializableValue::F64(OrderedFloat(MIN_VALUE)),
         ),
         (
-            "b3_h_dak_50p".to_string(),
+            ATTR_NAME.to_string(),
             Operator::Lt,
-            ByteSerializableValue::F64(OrderedFloat(50.0)),
+            ByteSerializableValue::F64(OrderedFloat(MAX_VALUE)),
         ),
     ];

104-137: Add documentation for benchmark parameters.

The benchmark function lacks documentation about its parameters and what it's measuring.

+/// Benchmarks FCB file reading with and without attribute indexing.
+///
+/// # Parameters
+/// - `c`: Criterion instance for benchmarking
+///
+/// The benchmark compares:
+/// - Reading without attribute index: Full scan of features
+/// - Reading with attribute index: Using indexed attributes for faster access
+///
+/// Results are printed in a table format showing mean execution times.
 pub fn read_benchmark(c: &mut Criterion) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4b21f0 and 21f61fc.

📒 Files selected for processing (6)
  • src/rust/fcb_core/Cargo.toml (2 hunks)
  • src/rust/fcb_core/benches/read_attr.rs (1 hunks)
  • src/rust/fcb_core/src/bin/read_cj.rs (1 hunks)
  • src/rust/fcb_core/src/reader/attr_query.rs (1 hunks)
  • src/rust/fcb_core/src/reader/mod.rs (15 hunks)
  • src/rust/makefile (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/rust/makefile
  • src/rust/fcb_core/src/reader/attr_query.rs
🔇 Additional comments (3)
src/rust/fcb_core/src/reader/mod.rs (2)

3-3: LGTM! Module organization looks good.

The new imports and module declarations are well-organized:

  • bst::ValueOffset for attribute indexing
  • New attr_query module with proper visibility

Also applies to: 16-16, 18-18


317-328: Verify attribute filtering logic in both iterator implementations.

The attribute filtering logic differs between seekable and non-seekable iterators:

  • Non-seekable: Uses relative offsets and skips bytes
  • Seekable: Uses absolute offsets with seek_from_start

Please verify that both implementations handle edge cases correctly:

  1. When attributes are filtered but no spatial filtering is applied
  2. When both spatial and attribute filters are applied
  3. When neither filter is applied

Also applies to: 363-372

❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Search for test cases covering these scenarios
rg -A 10 "test.*attr.*filter" 

Length of output: 31


Action Required: Manual Verification of Attribute Filtering Edge Cases

The automated search did not yield any test cases verifying the attribute filtering logic for both seekable and non-seekable iterator implementations. This makes it difficult to conclusively assess the correctness of handling the following edge cases:

  • When attributes are filtered but no spatial filtering is applied
  • When both spatial and attribute filters are applied
  • When neither filter is applied

Please manually verify these cases with appropriate tests or additional runtime checks to ensure consistency between the two implementations.

src/rust/fcb_core/Cargo.toml (1)

31-33: LGTM! Dependencies and targets are well-structured.

The new dependencies and targets align well with the PR's objectives. The workspace-level version management for ordered-float and chrono ensures consistency across the project.

Also applies to: 64-66, 72-74

Comment on lines 281 to 287
fn attr_index_size(&self) -> u64 {
let header = self.buffer.header();
header
.attribute_index()
.map(|attr_index| attr_index.iter().map(|ai| ai.length()).sum::<u32>())
.unwrap_or(0) as u64
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding error handling for malformed attribute indices.

The attr_index_size method assumes the attribute index data is well-formed. Consider adding validation to handle potential corruption in the flatbuffer data.

 fn attr_index_size(&self) -> u64 {
     let header = self.buffer.header();
-    header
-        .attribute_index()
-        .map(|attr_index| attr_index.iter().map(|ai| ai.length()).sum::<u32>())
-        .unwrap_or(0) as u64
+    header
+        .attribute_index()
+        .map(|attr_index| {
+            attr_index
+                .iter()
+                .try_fold(0u32, |acc, ai| {
+                    let len = ai.length();
+                    if len > u32::MAX - acc {
+                        Err(anyhow!("Attribute index size overflow"))
+                    } else {
+                        Ok(acc + len)
+                    }
+                })
+                .unwrap_or(0)
+        })
+        .unwrap_or(0) as u64
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn attr_index_size(&self) -> u64 {
let header = self.buffer.header();
header
.attribute_index()
.map(|attr_index| attr_index.iter().map(|ai| ai.length()).sum::<u32>())
.unwrap_or(0) as u64
}
fn attr_index_size(&self) -> u64 {
let header = self.buffer.header();
header
.attribute_index()
.map(|attr_index| {
attr_index
.iter()
.try_fold(0u32, |acc, ai| {
let len = ai.length();
if len > u32::MAX - acc {
Err(anyhow!("Attribute index size overflow"))
} else {
Ok(acc + len)
}
})
.unwrap_or(0)
})
.unwrap_or(0) as u64
}

Comment on lines +26 to +34
if let Some(b3_h_dak_50p) = attributes.get("b3_h_dak_50p") {
if b3_h_dak_50p.as_f64().unwrap() > 20.0
&& b3_h_dak_50p.as_f64().unwrap() < 50.0
{
// println!("b3_h_dak_50p: {:?}", b3_h_dak_50p);
target_feat_num += 1;
continue;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Extract duplicate feature counting logic.

The feature counting logic is duplicated between read_fcb_without_attr_index and read_fcb_with_attr_index. Consider extracting it into a shared function.

+const ATTR_NAME: &str = "b3_h_dak_50p";
+const MIN_VALUE: f64 = 20.0;
+const MAX_VALUE: f64 = 50.0;
+
+fn count_matching_features(attributes: &serde_json::Map<String, serde_json::Value>) -> bool {
+    if let Some(value) = attributes.get(ATTR_NAME) {
+        if let Some(num) = value.as_f64() {
+            return num > MIN_VALUE && num < MAX_VALUE;
+        }
+    }
+    false
+}
+
 // In both functions, replace the duplicate logic with:
 if let Some(attributes) = &co.attributes {
-    if let Some(b3_h_dak_50p) = attributes.get("b3_h_dak_50p") {
-        if b3_h_dak_50p.as_f64().unwrap() > 20.0
-            && b3_h_dak_50p.as_f64().unwrap() < 50.0
-        {
-            target_feat_num += 1;
-            continue;
-        }
+    if count_matching_features(attributes) {
+        target_feat_num += 1;
+        continue;
     }
 }

Also applies to: 74-82

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/rust/fcb_core/src/writer/attribute.rs (1)

69-69: ⚠️ Potential issue

Add proper error handling for unwrap() calls.

Multiple unwrap() calls could lead to panics if the values are null or of unexpected types.

Apply this diff to add proper error handling:

-            size_of::<u32>() + colval.as_str().unwrap().len()
+            size_of::<u32>() + colval.as_str().unwrap_or_default().len()
-            size_of::<u32>() + colval.as_str().unwrap().len(), //TODO: check if this is correct
+            size_of::<u32>() + colval.as_str().unwrap_or_default().len() //TODO: check if this is correct

Also applies to: 75-75

♻️ Duplicate comments (1)
src/rust/fcb_core/src/writer/attr_index.rs (1)

63-185: 🛠️ Refactor suggestion

Reduce code duplication in match arms.

The match arms follow an identical pattern with only the types and variant names changing. This duplication can be reduced using a macro or helper function.

Here's a suggested approach using a macro:

macro_rules! handle_attribute_type {
    ($schema_index:expr, $attribute_entries:expr, $type:ty, $variant:pat => $val:expr) => {
        build_index_generic::<$type, _>($schema_index, $attribute_entries, |entry| {
            if let $variant = entry {
                if index == $schema_index {
                    Some($val)
                } else {
                    None
                }
            } else {
                None
            }
        })
    };
}

// Usage in match:
match *coltype {
    ColumnType::Bool => handle_attribute_type!(
        *schema_index,
        attribute_entries,
        bool,
        AttributeIndexEntry::Bool { index, val } => *val
    ),
    ColumnType::Int => handle_attribute_type!(
        *schema_index,
        attribute_entries,
        i32,
        AttributeIndexEntry::Int { index, val } => *val
    ),
    // ... similar for other types
}
🧹 Nitpick comments (7)
src/rust/fcb_core/src/writer/attribute.rs (3)

45-45: Consider improving numeric type guessing.

The TODO comment indicates uncertainty about defaulting to ULong. Consider implementing range-based type inference or accepting type hints from the schema.

Would you like me to help implement a more accurate type guessing mechanism that considers value ranges?


235-236: Replace println! with proper logging.

Using println! for error handling is not ideal for production code. Consider using a proper logging framework.

Apply this diff to use proper logging:

-                println!("Attribute {} not found in schema", attr);
+                log::warn!("Attribute {} not found in schema", attr);
-                                eprintln!("Failed to parse DateTime: {}", e);
+                                log::error!("Failed to parse DateTime: {}", e);
-                    println!("Attribute {} is not supported for indexing", attr);
+                    log::warn!("Attribute {} is not supported for indexing", attr);

Also, consider adding the log crate to your dependencies:

 [dependencies]
+log = "0.4"

Also applies to: 307-311, 317-318


377-461: Consider adding more test cases.

While the current test cases are good, consider adding tests for:

  • DateTime parsing
  • Binary data handling
  • Invalid/malformed attribute values
  • Edge cases for numeric types

Would you like me to help implement these additional test cases?

src/rust/fcb_core/src/reader/attr_query.rs (2)

107-108: Improve error messages with more context.

The error messages for missing attribute index and columns could be more descriptive to help with debugging:

-            .ok_or_else(|| anyhow!("attribute index not found"))?;
+            .ok_or_else(|| anyhow!("attribute index not found in header"))?;
-            .ok_or_else(|| anyhow!("no columns found in header"))?;
+            .ok_or_else(|| anyhow!("no columns found in header schema"))?;

Also applies to: 110-111


164-165: Extract error handling into constants for consistency.

The error handling is duplicated between select_attr_query and select_attr_query_seq. Consider extracting the error messages into constants:

+const ERR_ATTRIBUTE_INDEX_NOT_FOUND: &str = "attribute index not found in header";
+const ERR_NO_COLUMNS_FOUND: &str = "no columns found in header schema";

-            .ok_or_else(|| anyhow::anyhow!("attribute index not found"))?;
+            .ok_or_else(|| anyhow::anyhow!(ERR_ATTRIBUTE_INDEX_NOT_FOUND))?;
-            .ok_or_else(|| anyhow::anyhow!("no columns found in header"))?
+            .ok_or_else(|| anyhow::anyhow!(ERR_NO_COLUMNS_FOUND))?

Also applies to: 167-169

src/rust/fcb_core/src/reader/mod.rs (1)

329-340: Add comments explaining the attribute filtering logic.

The attribute filtering logic in both seekable and non-seekable implementations would benefit from comments explaining:

  • Why we need to check if we haven't reached the attribute offset yet
  • The difference in seeking strategies between seekable and non-seekable implementations

Also applies to: 375-384

src/rust/fcb_core/src/writer/attr_index.rs (1)

15-23: Add documentation for the generic function.

Consider adding documentation to explain:

  • Purpose of the function
  • Generic type constraints
  • Parameters and return value
  • Example usage
+/// Builds a sorted index for attribute values of type T.
+/// 
+/// # Type Parameters
+/// * `T` - The attribute value type that must implement `Ord + Clone + ByteSerializable`
+/// * `F` - A closure type that extracts Option<T> from AttributeIndexEntry
+/// 
+/// # Arguments
+/// * `schema_index` - The index of the attribute in the schema
+/// * `attribute_entries` - Map of feature offsets to their attribute entries
+/// * `extract` - Closure that extracts the typed value from an AttributeIndexEntry
+/// 
+/// # Returns
+/// * `Option<(Vec<u8>, AttributeIndexInfo)>` - The serialized index and its metadata
 fn build_index_generic<T, F>(
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21f61fc and 0e6bb00.

📒 Files selected for processing (6)
  • src/rust/bst/src/byte_serializable.rs (1 hunks)
  • src/rust/fcb_core/src/bin/read_cj.rs (1 hunks)
  • src/rust/fcb_core/src/reader/attr_query.rs (1 hunks)
  • src/rust/fcb_core/src/reader/mod.rs (15 hunks)
  • src/rust/fcb_core/src/writer/attr_index.rs (1 hunks)
  • src/rust/fcb_core/src/writer/attribute.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/rust/fcb_core/src/bin/read_cj.rs
  • src/rust/bst/src/byte_serializable.rs
🔇 Additional comments (10)
src/rust/fcb_core/src/writer/attribute.rs (3)

1-9: LGTM! Clean imports and type definitions.

The new imports and type definitions are well-organized and properly documented.


189-200: LGTM! Well-structured attribute index entry enum.

The enum is well-designed with appropriate derive macros and covers all necessary attribute types.


202-216: LGTM! Clear and focused implementation.

The function has a single responsibility and effectively processes CityJSON features for indexing.

src/rust/fcb_core/src/reader/attr_query.rs (3)

18-67: Extract column type deserialization into a separate function.

The column type deserialization logic is duplicated between different column types. Consider extracting it into a reusable function to improve maintainability and reduce code duplication.


69-87: LGTM!

The function is well-structured, handles all possible variants of ByteSerializableValue, and follows a consistent pattern.


89-100: LGTM!

The function is concise, focused, and efficiently maps query conditions.

src/rust/fcb_core/src/reader/mod.rs (3)

52-57: LGTM!

The FeatureOffset struct is well-organized and provides a clean abstraction for managing file offsets. The total_size method is simple and correct.

Also applies to: 303-305


515-517: Consider handling potential overflow in features_count.

The unchecked conversion from u64 to usize could cause issues on 32-bit platforms.


281-299: LGTM!

The attr_index_size method is well-implemented with proper overflow checking and error handling.

src/rust/fcb_core/src/writer/attr_index.rs (1)

1-13: LGTM! Well-organized imports.

The imports are logically organized and all are actively used in the implementation.

@HideBa HideBa merged commit 84fc275 into main Feb 11, 2025
4 of 5 checks passed
@HideBa HideBa deleted the bst branch February 11, 2025 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant