Refactor: extract bencode tokenizer #14
Split parser logic into two types:
- Tokenizer: returns bencoded tokens.
- Generator: iterates over bencoded tokens to generate the JSON.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #14 +/- ##
===========================================
- Coverage 99.23% 99.15% -0.08%
===========================================
Files 11 12 +1
Lines 2749 2610 -139
===========================================
- Hits 2728 2588 -140
- Misses 21 22 +1
Remove the writer without affecting other parts of the code.
ACK ec6cc56
@josecelano Looks good. Just wondering, should we really tolerate line breaks in bencode input?
Hi @da2ce7, I think I only added it to tolerate the line break at the end of the bencode value, because it makes it more flexible to run the application like this: `echo "4:spam" | cargo run`. If we don't tolerate line breaks, you can only use this: `printf "4:spam" | cargo run`. I don't like it either. Maybe we can "clean" the input stream only in the main app.
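A minimal sketch of what "cleaning" the input in the main app could look like, assuming the input has already been read into a byte buffer. The helper name `strip_trailing_newline` is hypothetical and not part of this PR:

```rust
/// Hypothetical helper (not from the PR): drops a single trailing line break
/// ("\n" or "\r\n") so that `echo "4:spam"` and `printf "4:spam"` produce the
/// same bytes for the parser.
fn strip_trailing_newline(input: &[u8]) -> &[u8] {
    match input.strip_suffix(b"\n") {
        // Also drop a preceding "\r" so Windows-style line endings are handled.
        Some(rest) => rest.strip_suffix(b"\r").unwrap_or(rest),
        // No trailing line break: leave the input untouched.
        None => input,
    }
}
```

Only one line break is removed, and only at the very end of the input, so the tokenizer itself could stay strict. This works for input that is buffered in full; a streaming reader would need a different approach.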
I've opened an issue: #19
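This refactoring changes the current implementation to extract the tokenizer. It splits parser logic into two types:
- Tokenizer: returns bencoded tokens.
- Generator: iterates over bencoded tokens to generate the JSON.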
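To illustrate the split, here is a simplified sketch. It assumes an in-memory byte slice, uses recursion, and skips integer validation and JSON string escaping. The `Token` variants, `next_token`, and `generate_json` are hypothetical names, not the ones used in this PR:

```rust
/// Hypothetical token type; the real one in the PR may differ.
#[derive(Debug, PartialEq)]
enum Token {
    Integer(Vec<u8>),
    String(Vec<u8>),
    BeginList,
    BeginDict,
    End,
}

/// Tokenizer: turns bencoded bytes into tokens and knows nothing about JSON.
struct Tokenizer<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Tokenizer<'a> {
    fn new(input: &'a [u8]) -> Self {
        Self { input, pos: 0 }
    }

    /// Returns the next token, or `None` at end of input or on malformed input.
    fn next_token(&mut self) -> Option<Token> {
        match *self.input.get(self.pos)? {
            b'i' => {
                let end = self.find(b'e')?;
                let digits = self.input[self.pos + 1..end].to_vec();
                self.pos = end + 1;
                Some(Token::Integer(digits))
            }
            b'l' => { self.pos += 1; Some(Token::BeginList) }
            b'd' => { self.pos += 1; Some(Token::BeginDict) }
            b'e' => { self.pos += 1; Some(Token::End) }
            b'0'..=b'9' => {
                let colon = self.find(b':')?;
                let len: usize = std::str::from_utf8(&self.input[self.pos..colon])
                    .ok()?
                    .parse()
                    .ok()?;
                let start = colon + 1;
                let end = start.checked_add(len)?;
                let bytes = self.input.get(start..end)?.to_vec();
                self.pos = end;
                Some(Token::String(bytes))
            }
            _ => None,
        }
    }

    /// Index of the next occurrence of `target` at or after the current position.
    fn find(&self, target: u8) -> Option<usize> {
        self.input[self.pos..]
            .iter()
            .position(|&b| b == target)
            .map(|i| self.pos + i)
    }
}

/// Generator: consumes tokens and produces JSON text.
fn generate_json(t: &mut Tokenizer<'_>) -> Option<String> {
    let token = t.next_token()?;
    write_value(token, t)
}

fn write_value(token: Token, t: &mut Tokenizer<'_>) -> Option<String> {
    match token {
        Token::Integer(digits) => String::from_utf8(digits).ok(),
        Token::String(bytes) => Some(format!("\"{}\"", String::from_utf8(bytes).ok()?)),
        Token::BeginList => {
            let mut items = Vec::new();
            loop {
                match t.next_token()? {
                    Token::End => break,
                    tok => items.push(write_value(tok, t)?),
                }
            }
            Some(format!("[{}]", items.join(",")))
        }
        Token::BeginDict => {
            let mut items = Vec::new();
            loop {
                // Dictionary keys must be bencoded strings.
                let key = match t.next_token()? {
                    Token::End => break,
                    Token::String(k) => String::from_utf8(k).ok()?,
                    _ => return None,
                };
                let value_token = t.next_token()?;
                items.push(format!("\"{}\":{}", key, write_value(value_token, t)?));
            }
            Some(format!("{{{}}}", items.join(",")))
        }
        Token::End => None,
    }
}
```

With this shape, `generate_json(&mut Tokenizer::new(b"d3:bari42e3:fool4:spamee"))` would yield `{"bar":42,"foo":["spam"]}`. The point is only the boundary: the tokenizer never mentions JSON, and the generator never inspects raw bytes.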
NOTES
SUBTASKS
- Remove the `Writer` from the tokenizer. It's not needed.
PERFORMANCE
In the current version, bencoded strings are buffered in memory before anything is written to the output (because we need the whole string to check whether it is valid UTF-8). In this PR, bencoded integers are also buffered in memory because the whole integer value is a single token. This should not be a problem since integers are short, unlike strings.
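As a rough illustration of why an integer token has to be buffered, here is a sketch that reads one from any `std::io::Read` source. The function name `read_integer_token` is hypothetical, and it assumes the leading `i` has already been consumed:

```rust
use std::io::{self, Read};

/// Hypothetical helper (not from the PR). Assumes the leading `i` was already
/// consumed and buffers every byte up to, but not including, the closing `e`.
fn read_integer_token<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut digits = Vec::new();
    for byte in reader.by_ref().bytes() {
        match byte? {
            // The closing delimiter ends the token; it is not part of the value.
            b'e' => return Ok(digits),
            // The whole value is kept in memory until the token is complete.
            b => digits.push(b),
        }
    }
    Err(io::Error::new(
        io::ErrorKind::UnexpectedEof,
        "unterminated integer",
    ))
}
```

The buffer grows with the number of digits, so a cap on its length could be added if very long integers ever became a concern.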
FUTURE PRs
We could:
- Implement the `Iterator` trait for the tokenizer.
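A possible shape for that, sketched with a reduced, hypothetical `Token` type (integers and list delimiters only) and a tokenizer over any byte iterator. None of these names or signatures come from the PR:

```rust
/// Reduced, hypothetical token type for illustration only.
#[derive(Debug, PartialEq)]
enum Token {
    Integer(Vec<u8>),
    BeginList,
    End,
}

/// Hypothetical tokenizer wrapping any iterator of bytes.
struct Tokenizer<I: Iterator<Item = u8>> {
    bytes: I,
}

impl<I: Iterator<Item = u8>> Iterator for Tokenizer<I> {
    // Each item is either a token or an error message.
    type Item = Result<Token, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // `None` from the underlying iterator means a clean end of input.
        let token = match self.bytes.next()? {
            b'l' => Token::BeginList,
            b'e' => Token::End,
            b'i' => {
                let mut digits = Vec::new();
                loop {
                    match self.bytes.next() {
                        Some(b'e') => break,
                        Some(b) => digits.push(b),
                        None => return Some(Err("unterminated integer".to_string())),
                    }
                }
                Token::Integer(digits)
            }
            other => return Some(Err(format!("unexpected byte {other:#04x}"))),
        };
        Some(Ok(token))
    }
}
```

The generator could then drive the tokenizer with a plain `for token in tokenizer` loop, and callers would get adapters such as `collect` and `take_while` for free.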