Welcome to the Apache Hudi community! We appreciate your interest in contributing to this open-source data lake platform. This guide will walk you through the process of making your first contribution.
If you are new to the project, we recommend starting with issues listed in https://github.com/apache/hudi-rs/contribute.
Testing and reporting bugs are also valuable contributions. Please follow the issue template when filing bug reports.
All issues tagged for a release can be found on the corresponding milestone page; see https://github.com/apache/hudi-rs/milestones.
Features, bugs, and p0 issues that target the next release can be found in the project view. Pull requests are not tracked in the project view; instead, they are linked to the corresponding issues.
- Install Rust, e.g. as described here
- Have a compatible Python version installed (check python/pyproject.toml for the current requirement)
Most of the time, use the dev commands defined in the Makefile.
To set up a Python virtual environment, run:
make setup-venv
Note: This runs the python3 command to set up the virtual environment in venv/.
Activate the virtual environment, for example by running:
source venv/bin/activate
Once the virtual environment is activated, build the project for development by running:
make develop
This installs the hudi package, built from your local repo, into the virtual environment.
For Rust:
# For all tests
make test-rust
# or
cargo test --workspace
# For all tests in a crate / package
cargo test -p hudi-core
# For a specific test case
cargo test -p hudi-core table::tests::hudi_table_get_schema
For Python:
# For all tests
make test-python
# or
pytest -s python/tests
# For a specific test case
pytest python/tests/test_table_read.py -s -k "test_read_table_has_correct_schema"
Run the command below and fix any issues it reports:
make format check test
When submitting a pull request, please follow these guidelines:
- Title Format: The pull request title must follow the format outlined in the conventional commits spec. This is a standardized format for commit messages, and it also allows us to auto-generate change logs and release notes. Since only the main branch requires this format, and we always squash commits and then merge the PR, incremental commits' messages do not need to conform to it.
- Line Count: A general guideline is to keep the PR's diff, i.e., max(added lines, deleted lines), under 1000 lines. Keeping PRs concise makes it easier for reviewers to thoroughly examine changes without experiencing fatigue. If your changes exceed this limit, consider breaking them down into smaller, logical PRs that address specific aspects of the feature or bug fix.
- Coverage Requirements: All new features and bug fixes must include appropriate unit tests to ensure functionality and prevent regressions. Tests should cover both typical use cases and edge cases. Ensure that new tests pass locally before submitting the PR.
- Code Comments: Properly designed APIs and code should be self-explanatory and make in-code comments redundant. Where complex logic or a non-obvious implementation is unavoidable, add comments that explain the code's purpose and behavior.
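The title and line-count guidelines above can be checked locally before opening a PR. The sketch below is illustrative only and is not the check the project runs: the regular expression is a simplified reading of the conventional commits pattern `type(scope)!: description` that accepts any alphabetic type, and `diff_size` expects text in the tab-separated format produced by `git diff --numstat`. Both function names are hypothetical.

```python
import re

# Simplified pattern: type, optional (scope), optional "!", then ": description".
# Assumption: any alphabetic type is accepted; the project may restrict the set.
TITLE_PATTERN = re.compile(
    r"^(?P<type>[A-Za-z]+)"
    r"(?:\((?P<scope>[^()\r\n]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title matches the simplified conventional commits shape."""
    return TITLE_PATTERN.match(title) is not None


def diff_size(numstat_output: str) -> int:
    """Return max(added lines, deleted lines) from `git diff --numstat` text."""
    added = deleted = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        if parts[0] == "-" or parts[1] == "-":
            continue  # binary files report "-" instead of line counts
        added += int(parts[0])
        deleted += int(parts[1])
    return max(added, deleted)
```

For example, you could pass the output of `git diff --numstat main...HEAD` to `diff_size` and compare the result with the 1000-line guideline; the base branch name here is an assumption about your local setup.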
We use codecov to generate code coverage reports and enforce the code coverage rate. See codecov.yml for the configuration.
To learn more before contributing, please explore Hudi's documentation.
We expect all community members to follow our Code of Conduct.