This repository has been archived by the owner on May 3, 2024. It is now read-only.

[APM-823] Multi Delimiter #580

Merged
merged 31 commits into from
Jan 16, 2024

Conversation

maierlars
Add support for multi delimiter analyzer.

ToDo:

  • Implement the generic case.
  • Write more tests.
  • Add deserialization and registration code.

@maierlars maierlars self-assigned this Dec 4, 2023
Contributor

@MBkkt MBkkt left a comment

I noticed a few things. Also, the code style for new code is different; it is specified in the .clang-tidy file.

}

auto find_next_delim() {
return std::search(this->data_.begin(), this->data_.end(), bytes_.begin(),
Contributor

Comment on lines 109 to 110
return std::find_if(data_.begin(), data_.end(),
[&](auto c) { return bytes_[c]; });
Contributor

The input string can contain bytes that aren't ASCII :(
I think it's better to check for that with a condition (instead of making bytes_ bigger).
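
A minimal sketch of the guarded lookup suggested above, not the code from this PR. `DelimTable`, `make_table`, and this free-standing `find_next_delim` are hypothetical names, and the 128-entry table size is an assumption standing in for `bytes_`:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for bytes_: one flag per ASCII value (0..127).
using DelimTable = std::array<bool, 128>;

// Marks every ASCII delimiter byte in the table; non-ASCII bytes are skipped.
inline DelimTable make_table(std::string_view delims) {
  DelimTable table{};
  for (char c : delims) {
    const auto b = static_cast<unsigned char>(c);
    if (b < table.size()) {
      table[b] = true;
    }
  }
  return table;
}

// Returns the offset of the first delimiter byte, or data.size() if none.
// The range check runs before the table is indexed, so bytes >= 0x80
// (parts of multi-byte UTF-8 characters) never reach table[b].
inline std::size_t find_next_delim(std::string_view data,
                                   const DelimTable& table) {
  const auto it = std::find_if(data.begin(), data.end(), [&](char c) {
    const auto b = static_cast<unsigned char>(c);
    return b < table.size() && table[b];
  });
  return static_cast<std::size_t>(it - data.begin());
}
```

The cast to `unsigned char` matters on platforms where `char` is signed: without it, a byte such as `0xC3` would become a negative index.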

Author

I don't understand?

Contributor

CHAR_MAX is commonly 127, so bytes_ handles only non-negative (ASCII) chars in that case.
That's OK for the delimiters (because a valid UTF-8 string consisting of a single byte contains only ASCII),
but it's invalid for arbitrary bytes of UTF-8 input, because every byte of a multi-byte character has the high bit set (0b1*******).
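
To illustrate the point about the high bit, here is a small sketch under the assumption of a 128-entry table. The helper names `is_non_ascii` and `count_non_ascii` are hypothetical and do not appear in the PR:

```cpp
#include <cstddef>
#include <string_view>

// True if the byte has its high bit set (0b1xxxxxxx), i.e. it is not ASCII.
// Every byte of a multi-byte UTF-8 character satisfies this.
constexpr bool is_non_ascii(char c) {
  return (static_cast<unsigned char>(c) & 0x80u) != 0;
}

// Counts the bytes that could not safely index a 128-entry table.
constexpr std::size_t count_non_ascii(std::string_view s) {
  std::size_t n = 0;
  for (char c : s) {
    n += is_non_ascii(c) ? 1 : 0;
  }
  return n;
}
```

For example, the two-byte sequence `0xC3 0xA9` and the three-byte sequence `0xE2 0x82 0xAC` consist entirely of such bytes, while plain ASCII text contains none.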

Author

ah I see.

@maierlars maierlars marked this pull request as ready for review December 11, 2023 10:58
Contributor

@MBkkt MBkkt left a comment

Please fix the code style.

@gnusi gnusi merged commit 28629c0 into master Jan 16, 2024
2 of 3 checks passed
@gnusi gnusi deleted the feature/apm-823-multi-delimiter branch January 16, 2024 12:20

5 participants