[Proposal] Improve Parquet Writer for Better Performance and Maintainability #46

Open
hash-data opened this issue Jan 3, 2025 · 3 comments
Assignees
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@hash-data
Collaborator

hash-data commented Jan 3, 2025

Current Implementation:
Olake currently uses the Fraugster Parquet library, which has the following limitations:

  • Inefficient memory usage: The library uses more memory than the write workload should need.
  • Concurrency issues: It is not well suited to concurrent writes, which limits throughput in multi-threaded environments.
  • Maintenance concerns: The library is not actively maintained, so bugs and compatibility problems are unlikely to be fixed upstream.

Proposed Improvement:
Switch to a Parquet writer that addresses these concerns, focusing on:

  • Improved memory efficiency: Lower memory consumption while writing.
  • Enhanced concurrency support: Better handling of concurrent writes.
  • Active maintenance: Long-term reliability and compatibility with updates.

One candidate is parquet-go, which is actively maintained and optimized for both memory usage and concurrency.
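
To make the library swap and the concurrency goal more concrete, here is a minimal sketch using only the Go standard library. It is not Olake's code and it does not call any Parquet library: `Record`, `RowWriter`, `memWriter`, and `WriteConcurrently` are all hypothetical names introduced for illustration. The idea is to hide the writer behind a small interface so the backing library can be replaced, and to funnel batches from several producers through one goroutine, assuming the underlying writer should only be touched by one goroutine at a time.

```go
package main

import (
	"fmt"
	"sync"
)

// Record is a hypothetical row type; the real record type in Olake will differ.
type Record struct {
	ID   int
	Name string
}

// RowWriter is a hypothetical seam that hides the concrete Parquet library,
// so the backing implementation can be swapped without touching callers.
type RowWriter interface {
	WriteRows(rows []Record) error
	Close() error
}

// memWriter is an in-memory stand-in for a real Parquet-backed RowWriter.
type memWriter struct {
	rows   []Record
	closed bool
}

func (m *memWriter) WriteRows(rows []Record) error {
	if m.closed {
		return fmt.Errorf("write after close")
	}
	m.rows = append(m.rows, rows...)
	return nil
}

func (m *memWriter) Close() error {
	m.closed = true
	return nil
}

// WriteConcurrently lets each producer slice its rows into batches in its own
// goroutine, while only the calling goroutine ever touches the writer.
func WriteConcurrently(w RowWriter, producers [][]Record, batchSize int) error {
	if batchSize < 1 {
		return fmt.Errorf("batchSize must be at least 1, got %d", batchSize)
	}

	batches := make(chan []Record)
	var wg sync.WaitGroup
	for _, p := range producers {
		wg.Add(1)
		go func(rows []Record) {
			defer wg.Done()
			for start := 0; start < len(rows); start += batchSize {
				end := start + batchSize
				if end > len(rows) {
					end = len(rows)
				}
				batches <- rows[start:end]
			}
		}(p)
	}

	// Close the channel once every producer has sent its last batch.
	go func() {
		wg.Wait()
		close(batches)
	}()

	var firstErr error
	for b := range batches {
		if firstErr != nil {
			continue // keep draining so producers are never left blocked
		}
		if err := w.WriteRows(b); err != nil {
			firstErr = err
		}
	}
	if err := w.Close(); err != nil && firstErr == nil {
		firstErr = err
	}
	return firstErr
}
```

With this shape, a replacement library would only need a new type that satisfies `RowWriter`. Whether a single-writer funnel like this is the right fit, or whether the chosen library offers its own concurrency support, still needs to be checked against that library's documentation.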

Benefits of the Proposal:

  • Reduced resource consumption
  • Improved performance in multi-threaded use cases
  • Better maintainability and future-proofing of the project
@hash-data changed the title from "Improve Parquet Writer for Better Performance and Maintainability" to "[Proposal] Improve Parquet Writer for Better Performance and Maintainability" on Jan 3, 2025
@hash-data added the enhancement, good first issue, and help wanted labels on Jan 7, 2025
@sebzz2k2

sebzz2k2 commented Jan 11, 2025

Hey @hash-data,
I would like to work on this if no one else is currently working on it.

@hash-data
Collaborator Author

Hi @sebzz2k2, sure, you can start working on it. Let me know if you need any help.

@hash-data
Collaborator Author

@sebzz2k2 quick check-in: any update on this?
Do you need any help?

Projects
Status: In progress
Development

No branches or pull requests

2 participants