-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
async support v2 #417
async support v2 #417
Conversation
Response to #314 (comment) Actually, I'll plan to merge similar traits So it is good time for you to investigate that surface and how it can be integrated with |
Okay, I've moved a lot of stuff around to hopefully separate some things better. I've also implemented async support on the existing
|
1852c70 is a pretty big change which is hard to review. Could you split this commit into several? At least I can suggest extract introducing builder pattern into a separate commit. Also, can you rebase your work and squash up all fixup changes as much as possible before the finishing work (for example, you've changed Also, some changes seems to a little complicated to me. Actually I was thinking about something like (signatures approximate, just to illustrate the idea): struct SliceReader<'i>(&'i [u8]);
struct IoReader<R: BufRead>(R);
struct AsyncIoReader<R: AsyncBufRead>(R);
/// The existing Reader struct
pub struct Reader<R> {
reader: R,
// ...
}
impl<R> Reader<R> {
// common parsing code, setup code
}
impl<'i> Reader<SliceReader<'i>> {
pub fn read_event(&mut self) { /* ... */ }
pub fn read_event_async(&mut self) { /* ... */ }
}
impl<R: BufRead> Reader<IoReader<R>> {
pub fn read_event_into(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}
impl<R: AsyncBufRead> Reader<AsyncIoReader<R>> {
pub fn read_event_into_async(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}
////////////////////////////////////////////////////////////////////////////
/// The struct that I would introduce in my namespaces PR
pub struct NsReader<R> {
reader: Reader<R>,
// ...
}
impl<R> NsReader<R> {
// common parsing code, setup code
}
impl<'i> NsReader<SliceReader<'i>> {
pub fn read_event(&mut self) { /* ... */ }
pub fn read_event_async(&mut self) { /* ... */ }
}
impl<R: BufRead> NsReader<IoReader<R>> {
pub fn read_event_into(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}
impl<R: AsyncBufRead> NsReader<AsyncIoReader<R>> {
pub fn read_event_into_async(&mut self, buf: &mut Vec<u8>) { /* ... */ }
} |
Okay, so you'd prefer having the |
ad259c9
to
208d801
Compare
@Mingun I think I've reimplemented this now the way you suggested and with minimal unrelated changes. The builder pattern can still be added again in a later PR. I also didn't separate the namespaced read methods in this PR but it should be relatively easy to add. |
Codecov Report
@@ Coverage Diff @@
## master #417 +/- ##
==========================================
+ Coverage 49.58% 50.48% +0.90%
==========================================
Files 22 26 +4
Lines 13935 14429 +494
==========================================
+ Hits 6909 7284 +375
- Misses 7026 7145 +119
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big change, so now I made only a quick overview. I'll plan to made a second review, probably after my remarks will be addressed.
Fist of all, this variant looks much more simpler and better that a previous attempt. For the maintenance purposes I prefer that each commit was as smaller as possible and focusing on the one thing. Your f9cdf27 commit actually does the much more that just splitting reader. I suggest to split it for a several commits:
- the one that changes usage of buffered methods in tests and examples into borrowing ones
- the one that introduces a helper macros which conditionally includes
await
andasync
keywords - the one that moves code to the separate files
- the one that actually implements
async
reader. The adding tests commit then probably can be included in that commit - any other that you'll decide to made to keep the review burden low
Please ensure, that after each commit code in the buildable state by running commands that CI run and additionally cargo doc --all-features
. If some tests will temporary fail, please include the list of failed tests to the commit message (just copy output of cargo test
after "failures:")
Try to keep copies of code low. Use helper functions / traits / macros if needed. For example, the original read_bang_element
probably contains a bug, because read
are always >=1
but there is read == 0
check, I do not want to copy it over the whole project.
Also add an entry to the Changelog.md.
It also will be a great to run benchmarks for the async code and compare results with the sync code, so it will be a plus if you add these benchmarks.
@dralley, I also would to hear your opinion
src/reader.rs
Outdated
|
||
#[cfg(feature = "async")] | ||
pub use self::async_reader::AsyncReader; | ||
pub use self::{io_reader::IoReader, slice_reader::SliceReader}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not group imports
pub use self::{io_reader::IoReader, slice_reader::SliceReader}; | |
pub use self::io_reader::IoReader; | |
pub use self::slice_reader::SliceReader; |
/// loop { | ||
/// match reader.read_event_into(&mut buf) { | ||
/// match reader.read_event() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea to use borrowing reading in tests and examples, but I also want to show the buffering variant somewhere, so the users know, how to use it.
Besides, this is a quite isolated change and I prefer to have it as a separate commit. This will make the history much more clean and allow to focus on really important things when doing commit-by-commit review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with this, the README does have an example, but it would be helpful for the documentation to have one readily available as well.
src/reader.rs
Outdated
use crate::errors::Error; | ||
use crate::reader::{BangType, XmlSource}; | ||
use crate::reader::{BangType}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use crate::reader::{BangType}; | |
use crate::reader::BangType; |
and in other similar places
src/reader/async_reader.rs
Outdated
use tokio::{ | ||
fs::File, | ||
io::{self, AsyncBufRead, AsyncBufReadExt, AsyncRead, BufReader}, | ||
}; | ||
|
||
use crate::{ | ||
events::{BytesText, Event}, | ||
name::{QName, ResolveResult}, | ||
Error, Result, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not group imports
use tokio::{ | |
fs::File, | |
io::{self, AsyncBufRead, AsyncBufReadExt, AsyncRead, BufReader}, | |
}; | |
use crate::{ | |
events::{BytesText, Event}, | |
name::{QName, ResolveResult}, | |
Error, Result, | |
}; | |
use tokio::fs::File; | |
use tokio::io::{self, AsyncBufRead, AsyncBufReadExt, AsyncRead, BufReader}; | |
use crate::events::{BytesText, Event}; | |
use crate::name::{QName, ResolveResult}; | |
use crate::{Error, Result}; |
Also, could you add a brief documentation for a module?
src/reader/async_reader.rs
Outdated
let available = match self.fill_buf().await { | ||
Ok(n) if n.is_empty() => break, | ||
Ok(n) => n, | ||
Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this ref
required?
Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue, | |
Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, |
src/reader/io_reader.rs
Outdated
use std::{ | ||
fs::File, | ||
io::{self, BufRead, BufReader, Read}, | ||
ops::{Deref, DerefMut}, | ||
path::Path, | ||
}; | ||
|
||
use crate::{ | ||
events::{BytesText, Event}, | ||
name::{QName, ResolveResult}, | ||
Error, Result, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not group imports
use std::{ | |
fs::File, | |
io::{self, BufRead, BufReader, Read}, | |
ops::{Deref, DerefMut}, | |
path::Path, | |
}; | |
use crate::{ | |
events::{BytesText, Event}, | |
name::{QName, ResolveResult}, | |
Error, Result, | |
}; | |
use std::fs::File; | |
use std::io::{self, BufRead, BufReader, Read}; | |
use std::ops::{Deref, DerefMut}; | |
use std::path::Path; | |
use crate::events::{BytesText, Event}; | |
use crate::name::{QName, ResolveResult}; | |
use crate::{Error, Result}; |
Also, could you add a brief documentation for a module?
src/reader/slice_reader.rs
Outdated
use crate::{ | ||
events::{BytesText, Event}, | ||
name::{QName, ResolveResult}, | ||
Error, Result, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not group imports
use crate::{ | |
events::{BytesText, Event}, | |
name::{QName, ResolveResult}, | |
Error, Result, | |
}; | |
use crate::events::{BytesText, Event}; | |
use crate::name::{QName, ResolveResult}; | |
use crate::{Error, Result}; |
Also, could you add a brief documentation for a module?
tests/async_test.rs
Outdated
let mut buf = Vec::new(); | ||
let mut count = 0; | ||
loop { | ||
match reader.read_event_into(&mut buf).await.unwrap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that function was read_event_into_async
, because a reader usually will not see a reader
type
#[tokio::test] | ||
async fn test_sample() { | ||
let path = PathBuf::from(env!("CARGO_MANIFEST_DIR")); | ||
let mut reader = Reader::from_file_async(path.join("tests/documents/sample_rss.xml")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use include_bytes!
or include_str!
unless this test tests the from_file_async
method.
Cargo.toml
Outdated
@@ -94,6 +97,9 @@ serialize = ["serde"] | |||
## Enables support for recognizing all [HTML 5 entities](https://dev.w3.org/html5/html-author/charref) | |||
escape-html = [] | |||
|
|||
## Enable support for asynchronous reading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add more details here? References to corresponding builder methods would be enough
3c2b0a6
to
6b35382
Compare
Okay, I think I have addressed all your concerns now. Feel free to mark your discussions as resolved where it applies and keep them open otherwise. I've added some docs, updated examples and split the commits as best as I could. |
9df6f20
to
2af8199
Compare
Rebased onto current master to resolve conflicts. |
@999eagle Apologies for all the breakages! Thank you for splitting apart the commits - I actually suggest to go a step further and break off anything in this PR which is independent of the fundamental changes (e.g. maybe commits #1 and #4) into a separate PR that can be merged easily. Likewise if there are fundamental changes which aren't tied directly to the async-ones, we can possibly review and merge that separately. The motivation is just to reduce the amount of the unmerged work and prevent merge conflicts. There are other fundamental changes being worked on simultaneously right now (see #158) and reconciling the massive patch sets together will not be a pleasant experience unless it is done in smaller pieces. |
Hi @dralley, while those two commits are somewhat independent of the other changes, they don't really do anything useful on their own. The first commit is nothing but a requirement for testing the changes of the third commit, which in turn requires an example for explicitly buffered access for files as most examples just read strings or byte slices and thus don't have buffered reading anymore. |
@999eagle Yes, that makes perfect sense, and it would be helpful. |
src/reader/slice_reader.rs
Outdated
} | ||
} | ||
|
||
/// Private functions for a [`Reader`] based on an [`IoReader`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo I believe?
/// Private functions for a [`Reader`] based on an [`IoReader`]. | |
/// Private functions for a [`Reader`] based on a [`SliceReader`]. |
Almost done reviewing the first 4 commits (which as you said will be split into the new PR), and it is looking fine so far. I really appreciate all the work you've put in to get this cleaned up. |
This also changes the test cases in the `reader::test::check` macro to allow for reader-specific tests.
This is relevant: https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-generics.html Anything that comes of that initiative would be months or years down the road, but it might be worthwhile to provide feedback and examples early in the process. |
As discussed in #314, this is a new attempt to add support for
async
reading.This PR is still WIP and open for discussion on implementation details.