Improve logging of dataset cleaning #7

Open
Lena-Jurkschat opened this issue Dec 15, 2022 · 0 comments
Labels
invalid (This doesn't seem right), Nice-to-have

Comments

@Lena-Jurkschat

The log tracks either removed samples or removed bytes, which I find somewhat misleading.

I guess the final size in bytes should change as well.
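A minimal sketch of how sample and byte statistics could be derived from the same before/after data, so that both always move together. Everything here is hypothetical and not taken from the project's code: the names `text_size`, `filter_stats`, `format_bytes` and `log_filter_stats` are made up, and it assumes each sample is a plain string whose size is its UTF-8 length. It also assumes that part of the confusion comes from rounding small removals to `0.0000 GB`, so sizes are printed in an adaptive unit instead.

```python
import logging
from typing import Dict, Sequence

logger = logging.getLogger(__name__)


def text_size(docs: Sequence[str]) -> int:
    """Total size of all documents in bytes (UTF-8 encoded)."""
    return sum(len(doc.encode("utf-8")) for doc in docs)


def format_bytes(num_bytes: int) -> str:
    """Render a byte count in the largest unit (up to GB) that fits."""
    units = ["B", "KB", "MB", "GB"]
    value = float(num_bytes)
    index = 0
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {units[index]}"


def filter_stats(before: Sequence[str], after: Sequence[str]) -> Dict[str, float]:
    """Compute sample and byte statistics from the same before/after data."""
    n_before, n_after = len(before), len(after)
    b_before, b_after = text_size(before), text_size(after)
    return {
        "initial_samples": n_before,
        "removed_samples": n_before - n_after,
        "removed_samples_pct": 100 * (n_before - n_after) / n_before if n_before else 0.0,
        "final_samples": n_after,
        "initial_bytes": b_before,
        "removed_bytes": b_before - b_after,
        "removed_bytes_pct": 100 * (b_before - b_after) / b_before if b_before else 0.0,
        "final_bytes": b_after,
    }


def log_filter_stats(name: str, before: Sequence[str], after: Sequence[str]) -> Dict[str, float]:
    """Log both sample and byte statistics for one applied filter."""
    stats = filter_stats(before, after)
    logger.info("Applied filter: %s", name)
    logger.info("     Initial number of samples: %d samples", stats["initial_samples"])
    logger.info("     Removed samples: %d samples", stats["removed_samples"])
    logger.info("     Removed percentage: %.2f %%", stats["removed_samples_pct"])
    logger.info("     Final number of samples: %d samples", stats["final_samples"])
    logger.info("     Initial size: %s", format_bytes(stats["initial_bytes"]))
    logger.info("     Removed size: %s", format_bytes(stats["removed_bytes"]))
    logger.info("     Removed percentage in bytes: %.2f %%", stats["removed_bytes_pct"])
    logger.info("     Final size: %s", format_bytes(stats["final_bytes"]))
    return stats
```

With something like this, a filter that drops only empty or whitespace-only documents would still report a removed size of a few bytes (or exactly `0.00 B`) instead of a rounded `0.0000 GB`, which may make it clearer why the final size barely changes in the log excerpt below.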
```
12/15/2022 15:28:29 - INFO - __main__ - Applied filter: filter_remove_empty_docs
12/15/2022 15:28:29 - INFO - __main__ -      Initial number of samples: 4194 samples
12/15/2022 15:28:29 - INFO - __main__ -      Removed samples: 33 samples
12/15/2022 15:28:29 - INFO - __main__ -      Removed percentage: 0.79 %
12/15/2022 15:28:29 - INFO - __main__ -      Final number of samples: 4161 samples
12/15/2022 15:28:29 - INFO - __main__ -      Initial size in bytes: 0.2003 GB
12/15/2022 15:28:29 - INFO - __main__ -      Removed bytes: 0.0000 GB
12/15/2022 15:28:29 - INFO - __main__ -      Removed percentage in bytes: 0.00 %
12/15/2022 15:28:29 - INFO - __main__ -      Final size in bytes: 0.2003 GB
```
Lena-Jurkschat added the invalid (This doesn't seem right) and Nice-to-have labels Dec 15, 2022