-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ffi: Redesign Deserializer
API to deserialize key-value pair IR streams one IR unit at a time (fixes #539).
#549
Conversation
WalkthroughThe changes introduce a new header file Changes
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used📓 Learnings (1)
🔇 Additional comments (7)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (9)
components/core/src/clp/ffi/ir_stream/IrUnitType.hpp (1)
7-15
: LGTM: Enum class definition is well-structured.The
IrUnitType
enum class is well-defined with appropriate naming conventions and underlying type. The enum values cover the main types of IR units as described in the PR objectives.A minor suggestion for improvement:
Consider adding a brief comment for each enum value to describe its purpose. This can enhance code readability and maintainability. For example:
enum class IrUnitType : uint8_t { LogEvent = 0, // Represents a log event in the IR stream SchemaTreeNodeInsertion, // Indicates the insertion of a schema tree node UtcOffsetChange, // Signifies a change in UTC offset EndOfStream, // Marks the end of the IR stream };components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (5)
18-20
: Typo in function documentationThere is a minor typographical error in the documentation comment for
get_ir_unit_type_from_tag
. The word "of" is unnecessary in the sentence: "The IR unit type of indicated by the given tag."
17-21
: Add parameter description fortag
In the documentation comment for
get_ir_unit_type_from_tag
, consider adding a description for thetag
parameter to enhance clarity and maintain consistency with other function documentation.
Line range hint
23-33
: Add parameter descriptions forreader
andtag
In the documentation comment for
deserialize_ir_unit_schema_tree_node_insertion
, please consider adding descriptions for thereader
andtag
parameters to improve clarity and maintain consistency.
Line range hint
37-44
: Add parameter description forreader
In the documentation comment for
deserialize_ir_unit_utc_offset_change
, consider adding a description for thereader
parameter to enhance clarity and maintain consistency.
Line range hint
47-62
: Add parameter descriptions forreader
andtag
In the documentation comment for
deserialize_ir_unit_kv_pair_log_event
, please consider adding descriptions for thereader
andtag
parameters to improve clarity and maintain consistency.components/core/src/clp/ffi/ir_stream/Deserializer.hpp (2)
34-34
: Provide a description for the template parameterIrUnitHandler
The documentation tag
@tparam IrUnitHandler
lacks a description. Providing a brief explanation helps users understand the role of the template parameter.Apply this diff to add a description:
- * @tparam IrUnitHandler + * @tparam IrUnitHandler The type of the IR unit handler that defines user-defined operations on deserialized IR units.
44-44
: Add a description for parameterir_unit_handler
The
@param
documentation forir_unit_handler
is missing a description. Including a description enhances the clarity of the documentation.Apply this diff to include the description:
- * @param ir_unit_handler + * @param ir_unit_handler An instance of the IR unit handler to process each deserialized IR unit.components/core/tests/test-ir_encoding_methods.cpp (1)
123-125
: Nitpick: Adjustconst
qualifier placement in return typeFor consistency with common C++ style, place the
const
qualifier before the type in the return value.Apply this diff:
- [[nodiscard]] auto get_deserialized_log_events() const -> vector<KeyValuePairLogEvent> const& { + [[nodiscard]] auto get_deserialized_log_events() const -> const vector<KeyValuePairLogEvent>& { return m_deserialized_log_events; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- components/core/CMakeLists.txt (1 hunks)
- components/core/src/clp/ffi/ir_stream/Deserializer.cpp (0 hunks)
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/IrUnitType.hpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1 hunks)
- components/core/tests/test-ir_encoding_methods.cpp (4 hunks)
💤 Files with no reviewable changes (1)
- components/core/src/clp/ffi/ir_stream/Deserializer.cpp
🧰 Additional context used
🪛 cppcheck
components/core/tests/test-ir_encoding_methods.cpp
[performance] 110-110: Function parameter 'schema_tree_node_locator' should be passed by const reference.
(passedByValue)
🔇 Additional comments (8)
components/core/src/clp/ffi/ir_stream/IrUnitType.hpp (3)
1-5
: LGTM: Header guards and include statement are correct.The header guards follow the expected naming convention, and the inclusion of is appropriate for using the uint8_t type.
6-6
: LGTM: Namespace declaration is correct.The namespace
clp::ffi::ir_stream
is consistent with the file path and follows the expected naming convention.
16-18
: LGTM: Proper file closure.The namespace is correctly closed with a helpful comment, and the header guard is properly terminated. These practices contribute to code readability and maintainability.
components/core/CMakeLists.txt (1)
336-336
: LGTM! Consider adding specific unit tests for IrUnitType.The addition of
src/clp/ffi/ir_stream/IrUnitType.hpp
to theSOURCE_FILES_unitTest
list is appropriate and aligns with the PR objectives. This ensures that the new IR unit type definitions are included in the unit test compilation process.To ensure comprehensive test coverage for the new
IrUnitType
functionality, please verify if any additional test files need to be added. You can run the following script to check for existing test files related toIrUnitType
:If no specific test files or test cases for
IrUnitType
are found, consider adding them to ensure proper coverage of the new functionality.✅ Verification successful
Tests for IrUnitType already exist and adequately cover the necessary functionality. No additional tests are required.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for existing test files related to IrUnitType # Search for test files that might be related to IrUnitType echo "Searching for existing test files related to IrUnitType:" fd -e cpp -e hpp test.*IrUnitType tests/ # Check if there are any test cases using IrUnitType in existing test files echo "Checking for test cases using IrUnitType in existing test files:" rg "IrUnitType" tests/Length of output: 470
Script:
#!/bin/bash # Description: Locate test files related to IrUnitType across the repository echo "Searching for test files related to IrUnitType using common naming conventions:" # Common test file patterns test_patterns=("test_*" "*_test" "*Test" "*Spec") for pattern in "${test_patterns[@]}"; do echo "Looking for files matching pattern: $pattern" fd --type f --glob "$pattern*" --exclude-dir={node_modules,dist,build} . done echo "Searching within identified test files for usage of IrUnitType:" # Find all test files based on common patterns test_files=$(fd --type f --glob "test_*" --glob "*_test*" --glob "*Test*" --glob "*Spec*" --exclude-dir={node_modules,dist,build}) # Search for IrUnitType in the identified test files rg "IrUnitType" $test_filesLength of output: 4965
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
25-25
: Include of "IrUnitType.hpp" is appropriateThe addition of
#include "IrUnitType.hpp"
ensures that theIrUnitType
definitions are available, which is necessary for the new function implementations.components/core/tests/test-ir_encoding_methods.cpp (3)
95-98
: Looks good!The implementation of
handle_log_event
correctly moves thelog_event
into the vector, ensuring efficient memory usage.
1193-1208
: Validation of deserialized data is thoroughThe test correctly compares the number of serialized and deserialized log events and validates the content, ensuring data integrity.
1177-1192
: Verify exception handling for deserializer creationEnsure that any potential errors during
Deserializer
creation are appropriately handled, even thoughhas_error()
is checked. Consider adding detailed error messages for better troubleshooting.Would you like assistance in enhancing error handling for the deserializer creation?
switch (ir_unit_type) { | ||
case IrUnitType::LogEvent: { | ||
auto result{ | ||
deserialize_ir_unit_kv_pair_log_event(reader, tag, m_schema_tree, m_utc_offset) | ||
}; | ||
if (result.has_error()) { | ||
return result.error(); | ||
} | ||
|
||
if (auto const err{m_ir_unit_handler.handle_log_event(std::move(result.value()))}; | ||
IRErrorCode::IRErrorCode_Success != err) | ||
{ | ||
return ir_error_code_to_errc(err); | ||
} | ||
break; | ||
} | ||
|
||
case IrUnitType::SchemaTreeNodeInsertion: { | ||
std::string key_name; | ||
auto const result{deserialize_ir_unit_schema_tree_node_insertion(reader, tag, key_name) | ||
}; | ||
if (result.has_error()) { | ||
return result.error(); | ||
} | ||
|
||
auto const node_locator{result.value()}; | ||
if (m_schema_tree->has_node(node_locator)) { | ||
return std::errc::protocol_error; | ||
} | ||
|
||
if (auto const err{m_ir_unit_handler.handle_schema_tree_node_insertion(node_locator)}; | ||
IRErrorCode::IRErrorCode_Success != err) | ||
{ | ||
return ir_error_code_to_errc(err); | ||
} | ||
|
||
std::ignore = m_schema_tree->insert_node(node_locator); | ||
break; | ||
} | ||
|
||
case IrUnitType::UtcOffsetChange: { | ||
auto const result{deserialize_ir_unit_utc_offset_change(reader)}; | ||
if (result.has_error()) { | ||
return result.error(); | ||
} | ||
|
||
auto const new_utc_offset{result.value()}; | ||
if (auto const err{ | ||
m_ir_unit_handler.handle_utc_offset_change(m_utc_offset, new_utc_offset) | ||
}; | ||
IRErrorCode::IRErrorCode_Success != err) | ||
{ | ||
return ir_error_code_to_errc(err); | ||
} | ||
|
||
m_utc_offset = new_utc_offset; | ||
break; | ||
} | ||
|
||
case IrUnitType::EndOfStream: { | ||
if (auto const err{m_ir_unit_handler.handle_end_of_stream()}; | ||
IRErrorCode::IRErrorCode_Success != err) | ||
{ | ||
return ir_error_code_to_errc(err); | ||
} | ||
m_is_complete = true; | ||
break; | ||
} | ||
|
||
default: | ||
return std::errc::protocol_not_supported; | ||
} | ||
|
||
return ir_unit_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor deserialize_next_ir_unit
method to improve readability
The deserialize_next_ir_unit
method contains a lengthy switch statement handling multiple IrUnitType
cases. Refactoring each case into separate private methods would enhance readability and maintainability.
Consider refactoring as follows:
-
Extract the logic for each
IrUnitType
case into dedicated private methods, such asprocess_log_event
,process_schema_tree_node_insertion
,process_utc_offset_change
, andprocess_end_of_stream
. -
The updated
deserialize_next_ir_unit
method would look cleaner:
template <IrUnitHandlerInterface IrUnitHandler>
requires(std::move_constructible<IrUnitType>)
auto Deserializer<IrUnitHandler>::deserialize_next_ir_unit(ReaderInterface& reader)
-> OUTCOME_V2_NAMESPACE::std_result<IrUnitType> {
if (is_stream_completed()) {
return std::errc::operation_not_permitted;
}
encoded_tag_t tag{};
if (IRErrorCode::IRErrorCode_Success != deserialize_tag(reader, tag)) {
return std::errc::no_message_available;
}
auto const ir_unit_type{get_ir_unit_type_from_tag(tag)};
switch (ir_unit_type) {
case IrUnitType::LogEvent:
return process_log_event(reader, tag);
case IrUnitType::SchemaTreeNodeInsertion:
return process_schema_tree_node_insertion(reader, tag);
case IrUnitType::UtcOffsetChange:
return process_utc_offset_change(reader);
case IrUnitType::EndOfStream:
return process_end_of_stream();
default:
return std::errc::protocol_not_supported;
}
}
And implement each process_*
method accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current size of this switch statement is still manageable. We've already made handle_xxx
and deserialize_xxx
to make the code readable. What do you think? @kirkrodrigues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass clang-tidy's complexity check, I'm okay with this.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)
18-23
: LGTM: Well-defined function for IR unit type determinationThe new function
get_ir_unit_type_from_tag
is well-designed and aligns with the PR objectives. The use ofstd::optional
and the[[nodiscard]]
attribute are appropriate.Consider adding a brief explanation of what constitutes an invalid tag in the documentation for clarity.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (2)
51-52
: Enhanced public interface with new and updated methodsThe changes to the public methods align well with the new design of the
Deserializer
class. The updatedcreate
method, newdeserialize_next_ir_unit
method, and additional utility methods improve the class's functionality and usability.One minor suggestion:
Consider adding a brief explanation of what "stream completion" means in the documentation for the
is_stream_completed
method. For example:/** - * @return Whether the stream has completed. A stream is considered completed if an - * end-of-stream IR unit has already been deserialized. + * @return Whether the stream has completed. A stream is considered completed if an + * end-of-stream IR unit has already been deserialized, indicating that no more IR units + * are available for processing. */Also applies to: 99-100, 106-112
125-166
: Well-implemented create and deserialize_next_ir_unit methodsThe implementations of both
create
anddeserialize_next_ir_unit
methods are thorough and handle various scenarios appropriately. The error handling and delegation to specific IR unit handlers indeserialize_next_ir_unit
are particularly well-structured.One minor suggestion for the
create
method:Consider using a more specific error code when the version check fails. For example:
if (std::string_view{static_cast<char const*>(cProtocol::Metadata::BetaVersionValue)} != version) { - return std::errc::protocol_not_supported; + return std::errc::version_not_supported; // If this error code exists, or create a custom one }This would provide more precise information about the nature of the error.
Also applies to: 168-261
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2 hunks)
- components/core/tests/test-ir_encoding_methods.cpp (4 hunks)
🧰 Additional context used
📓 Learnings (4)
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (2)
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:149-151 Timestamp: 2024-10-01T08:02:43.352Z Learning: Exceptions are disabled when calling `nlohmann::json::parse` by passing `nullptr` and `false` as the second and third arguments.
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210 Timestamp: 2024-10-01T07:59:15.290Z Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:464-480 Timestamp: 2024-10-01T07:58:47.126Z Learning: In the function `get_ir_unit_type_from_tag()`, any tag that isn't explicitly categorized should be assumed to be a log event. Adding extra warnings or checks is unnecessary and could be costly since a log event IR unit may start with many possible tags. Unsupported tags will be handled during the deserialization process.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:464-480 Timestamp: 2024-10-01T07:58:47.126Z Learning: In the function `get_ir_unit_type_from_tag()`, any tag that isn't explicitly categorized should be assumed to be a log event. Adding extra warnings or checks is unnecessary and could be costly since a log event IR unit may start with many possible tags. Unsupported tags will be handled during the deserialization process.
components/core/tests/test-ir_encoding_methods.cpp (3)
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186 Timestamp: 2024-10-01T07:59:11.208Z Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/tests/test-ir_encoding_methods.cpp:110-110 Timestamp: 2024-10-01T08:00:48.347Z Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/tests/test-ir_encoding_methods.cpp:100-106 Timestamp: 2024-10-01T08:00:17.257Z Learning: In the `IrUnitHandler` class, the method `handle_utc_offset_change` must be a member function to implement `IrUnitHandlerInterface`.
🪛 cppcheck
components/core/tests/test-ir_encoding_methods.cpp
[performance] 108-108: Function parameter 'schema_tree_node_locator' should be passed by const reference.
(passedByValue)
🔇 Additional comments (9)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (3)
5-5
: LGTM: Appropriate inclusion of headerThe addition of the header is necessary and correctly placed for the new function that returns std::optional.
15-15
: LGTM: Proper inclusion of IrUnitType.hppThe inclusion of "IrUnitType.hpp" is necessary for the new function and is correctly placed in the file.
Line range hint
1-80
: Overall assessment: Changes align well with PR objectivesThe modifications to this file, including the new function
get_ir_unit_type_from_tag
and necessary header inclusions, effectively support the redesign ofDeserializer
APIs as outlined in the PR objectives. The changes provide a clear mechanism for determining IR unit types from tags, which is crucial for consuming the IR stream by IR units.The existing function declarations remain unchanged, maintaining compatibility while allowing for the enhanced functionality introduced by the new additions.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (3)
4-5
: New includes added to support enhanced functionalityThe additional includes appear to be appropriate for the new features implemented in the
Deserializer
class. They include necessary standard library headers and project-specific headers that align with the class's expanded functionality.Also applies to: 7-10, 19-24
36-37
: Improved flexibility and type safety with templated IrUnitHandlerThe addition of the
IrUnitHandler
template parameter with theIrUnitHandlerInterface
concept constraint enhances the flexibility of theDeserializer
class while maintaining type safety. Thestd::move_constructible<IrUnitType>
constraint is a good practice to ensure efficient handling ofIrUnitType
objects.
116-116
: Private members and constructor updated to support new functionalityThe changes to the private members and constructor are well-aligned with the new design of the
Deserializer
class. The addition ofm_ir_unit_handler
andm_is_complete
members, along with the updated constructor, properly support the new functionality introduced in the public methods.Also applies to: 121-122
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)
25-25
: Include statement for "IrUnitType.hpp" is appropriateThe inclusion of
"IrUnitType.hpp"
is necessary for the newIrUnitType
definitions utilized within this file.
173-177
: Declaration ofis_log_event_ir_unit_tag
is correctThe function
is_log_event_ir_unit_tag
is appropriately declared to determine valid leading tags for log event IR units.
468-478
: Implementation ofis_log_event_ir_unit_tag
is accurateThe function correctly checks for valid leading tags of a log event IR unit, ensuring proper identification of log events.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Show resolved
Hide resolved
Co-authored-by: davidlion <[email protected]>
Deserializer
APIs to deserialize key-value pair IR stream by IR units (fixes #539).Deserializer
API to deserialize key-value pair IR streams one IR unit at a time (fixes #539).
…eams one IR unit at a time (fixes y-scope#539). (y-scope#549)
…eams one IR unit at a time (fixes y-scope#539). (y-scope#549)
Description
This PR redesigns
Deserializer
APIs as described in #539. It involves the following major changes:Deserializer
only maintains the state of the stream; user-defined operations, such as the operations on the deserialized log events, are given through an instance that implementsIrUnitHandlerInterface
(introduced in ffi: AddIrUnitHandlerInterface
to perform user-defined handling for deserialized IR units. #540). The type of the IR unit handler is taken as a template parameter ofDeserializer
.This PR also re-implements the unit tests using the new deserializer APIs with a sample IR unit handler to verify the functionalities.
Validation performed
Summary by CodeRabbit
New Features
IrUnitHandler
for improved log event handling and deserialization processes.Bug Fixes
Deserializer
class, streamlining the deserialization logic.Chores
IrUnitHandler
.