improve feed converters to normalized form #148
Replies: 2 comments 1 reply
-
I'm unsure whether it can be done concurrently, due to the row dependencies needed to resolve the correct event order. In the past, a different logic was used; it may help in finding a better approach: https://github.com/nkaz001/hftbacktest/blob/599a4e0e3aca91e98c8207be638a216ebd232dbe/hftbacktest/data/validation.py. I also think there are areas we can optimize even within the same logic: the overhead likely comes from parsing the CSV, creating the structured array, and possibly the merge sort. In any case, please feel free to submit a PR that includes a performance comparison and ensures the outcome remains the same.
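As a rough illustration of where that parsing overhead could be cut, here is a minimal sketch (not the project's actual converter; the column layout, field names, and `EVENT_DTYPE` below are assumptions) that parses the whole CSV in one vectorized call and builds the structured array in a single allocation, instead of line by line:

```python
import numpy as np

# Assumed record layout; the real converter's dtype may differ.
EVENT_DTYPE = np.dtype([
    ('ev', 'i8'),        # event type flag
    ('exch_ts', 'i8'),   # exchange timestamp (ns)
    ('local_ts', 'i8'),  # local receipt timestamp (ns)
    ('px', 'f8'),        # price
    ('qty', 'f8'),       # quantity
])

def convert_batch(csv_path):
    # np.loadtxt parses the entire file at once, avoiding per-line
    # Python overhead; assumes a headerless CSV with five numeric columns.
    raw = np.loadtxt(csv_path, delimiter=',', dtype='f8', ndmin=2)
    out = np.empty(len(raw), dtype=EVENT_DTYPE)
    out['ev'] = raw[:, 0].astype('i8')
    out['exch_ts'] = raw[:, 1].astype('i8')
    out['local_ts'] = raw[:, 2].astype('i8')
    out['px'] = raw[:, 3]
    out['qty'] = raw[:, 4]
    # The order-dependent validation/merge step still runs sequentially
    # afterwards; only the parse and array construction are batched here.
    return out
```

With the parse batched, the remaining sequential pass only has to walk an already-typed array, which is typically much cheaper than per-line string handling.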
-
There is the Discord server: https://discord.gg/K9pDsWrPEe
-
Firstly, wonderful work! I'm hoping to contribute to the repository and was wondering whether you have any thoughts on optimizing the data normalization utils. Currently, it seems that the data util `binancefutures.convert` reads and processes one line at a time. Is it possible to optimize this with batch reads, and/or by reading a batch and processing each line concurrently? I understand that you can only know the full state at the next timestamp by knowing the state at the previous timestamp, but could we not do a partial evaluation first and then perform an aggregation step that updates the state of the whole batch, along with outlier handling? (A sketch of this idea follows below.) I ask because the convert function currently takes a long time, and I'd like to invest my time in the right place. Please let me know if there's a possibility of optimization in this context. Thank you!
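A hypothetical sketch of that partial-then-aggregate idea, assuming the per-row state update composes associatively (a cumulative sum stands in here for the real per-level update; `chunked_running_state` is an invented name, not an hftbacktest API):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_cumsum(chunk):
    # Per-chunk work can run concurrently: each chunk computes its
    # local running state independently of the other chunks.
    return np.cumsum(chunk)

def chunked_running_state(values, n_chunks=4):
    values = np.asarray(values, dtype='f8')
    chunks = np.array_split(values, n_chunks)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_cumsum, chunks))
    # Cheap sequential aggregation step: carry each chunk's final
    # state forward as an offset into the next chunk.
    offset = 0.0
    for p in partials:
        p += offset
        offset = p[-1]
    return np.concatenate(partials)
```

Note that this only decomposes cleanly when the update is associative; order-dependent logic such as outlier handling against the previous valid row may still force a sequential pass over the chunk boundaries.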