diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 0000000..32c3675 --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1,2 @@ +# root directory +* @skydoorkai @adamantboy @hxdtest @nash635 diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 0000000..04ce096 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,39 @@ +--- +name: Bug report +about: Create a report to help us improve +title: '' +labels: report +assignees: '' + +--- + +**Describe the bug** +A clear and concise description of what the bug is. + +**To Reproduce** +Steps to reproduce the unexpected case: +1. What kind of training? [e.g. FDDP] +2. The command used? [e.g. dlrover-run xxxxx xxxx] +3. When and where? +4. See error + +**Logs or Screenshots** +Logs (necessary) or screenshots to help explain your problem. + +**Expected behavior** +A clear and concise description of what you expected to happen. + +**APP Info (please complete the following information):** + - DLRover: [e.g. 0.3.8] + - Torch [e.g. 2.1.2] + +**ENV Info (please complete the following information):** + - Platform: [e.g. ubuntu xxx] + - Python: [e.g. 3.8.1] + - GRPC [e.g. 1.5.x] + +**HARDWARE Info (please complete the following information):** + - Device: [e.g. GPU A100 / NPU Ascend 910] + +**Additional context** +Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 0000000..bbcbbe7 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,20 @@ +--- +name: Feature request +about: Suggest an idea for this project +title: '' +labels: '' +assignees: '' + +--- + +**Is your feature request related to a problem? Please describe.** +A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 
+ +**Describe the solution you'd like** +A clear and concise description of what you want to happen. + +**Describe alternatives you've considered** +A clear and concise description of any alternative solutions or features you've considered. + +**Additional context** +Add any other context or screenshots about the feature request here. diff --git a/.github/ISSUE_TEMPLATE/question.md b/.github/ISSUE_TEMPLATE/question.md new file mode 100644 index 0000000..62fcf82 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/question.md @@ -0,0 +1,10 @@ +--- +name: Question +about: For questions. +title: '' +labels: question +assignees: '' + +--- + + diff --git a/.github/actions/atorch-pre-commit/action.yml b/.github/actions/atorch-pre-commit/action.yml new file mode 100644 index 0000000..c942caf --- /dev/null +++ b/.github/actions/atorch-pre-commit/action.yml @@ -0,0 +1,10 @@ +--- +name: atorch-pre-commit +description: run pre-commit to check the atorch code +runs: + using: 'docker' + image: "easydl/atorch:aci" + args: + - "/bin/bash" + - "-c" + - "sh dev/scripts/pre-commit.sh" diff --git a/.github/actions/atorch-python-test/action.yml b/.github/actions/atorch-python-test/action.yml new file mode 100644 index 0000000..eea3eb7 --- /dev/null +++ b/.github/actions/atorch-python-test/action.yml @@ -0,0 +1,13 @@ +--- +name: atorch-python-test +description: run pytest to execute the Python test cases of atorch +runs: + using: 'docker' + image: "registry.cn-hangzhou.aliyuncs.com/atorch/atorch-open-20240430:pt210" + args: + - "/bin/bash" + - "-c" + - "pip install dlrover[torch]==0.4.0 --no-deps \ +&& echo -e 'import math\ninf = math.inf\nnan = math.nan\nstring_classes = \ +(str, bytes)' > /opt/conda/lib/python3.8/site-packages/torch/_six.py \ +&& PYTHONPATH=. 
pytest atorch/tests/common_tests" diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 0000000..8c8d451 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,15 @@ +### What changes were proposed in this pull request? + +Please describe the changes you have made or proposed in this pull request. + +### Why are the changes needed? + +Explain the purpose or motivation behind these changes. What problem are you trying to solve? + +### Does this PR introduce any user-facing change? + +Specify whether this pull request introduces any changes that users will directly interact with or notice. + +### How was this patch tested? + +Detail the testing process you have undertaken to ensure the changes in this pull request are valid and working as intended. diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml new file mode 100644 index 0000000..52cdaad --- /dev/null +++ b/.github/workflows/main.yml @@ -0,0 +1,26 @@ +--- +name: CI + +on: + pull_request: + workflow_dispatch: + push: + branches: [master] + +jobs: + python-test: + runs-on: self-hosted + steps: + # This step checks out a copy of your repository. + - uses: actions/checkout@v3 + with: + clean: false + # This step references the directory that contains the action. + - uses: ./.github/actions/atorch-python-test + pre-commit: + runs-on: ubuntu-latest + steps: + # This step checks out a copy of your repository. + - uses: actions/checkout@v3 + # This step references the directory that contains the action. 
+ - uses: ./.github/actions/atorch-pre-commit diff --git a/README.md b/README.md index 4b09a7b..7f2a74c 100644 --- a/README.md +++ b/README.md @@ -7,8 +7,8 @@ - [![GitHub Repo stars](https://img.shields.io/github/stars/intelligent-machine-learning/dlrover?style=social)](https://github.com/intelligent-machine-learning/dlrover/stargazers) - [![Build](https://github.com/intelligent-machine-learning/dlrover/actions/workflows/main.yml/badge.svg)](https://github.com/intelligent-machine-learning/dlrover/actions/workflows/main.yml) + [![GitHub Repo stars](https://img.shields.io/github/stars/intelligent-machine-learning/atorch?style=social)](https://github.com/intelligent-machine-learning/atorch/stargazers) + [![Build](https://github.com/intelligent-machine-learning/atorch/actions/workflows/main.yml/badge.svg)](https://github.com/intelligent-machine-learning/atorch/actions/workflows/main.yml) [![PyPI Status Badge](https://badge.fury.io/py/atorch.svg)](https://pypi.org/project/atorch/) @@ -74,8 +74,8 @@ pip install atorch ``` # clone repository -git clone https://github.com/intelligent-machine-learning/dlrover.git -cd dlrover/atorch +git clone https://github.com/intelligent-machine-learning/atorch.git +cd atorch # build package, optional set version. bash dev/scripts/build.sh [version] # install the created package in dist directory. Note that if version is set, file name is different. 
@@ -90,7 +90,7 @@ pip install dist/atorch-0.1.0.dev0-py3-none-any.whl - To run [auto_accelerate examples](examples/auto_accelerate): ``` -cd dlrover/atorch/examples/auto_accelerate +cd atorch/examples/auto_accelerate # Single process train python train.py --model_type toy # Distributed train diff --git a/setup.py.tpl b/setup.py.tpl index bf67b40..52a245e 100644 --- a/setup.py.tpl +++ b/setup.py.tpl @@ -204,7 +204,7 @@ setup( " large-scale pretraining and finetuning of LLMs with over 100 billion parameters and" " thousands of advanced GPUs.", author="Ant Group", - url="https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch", + url="https://github.com/intelligent-machine-learning/atorch", python_requires=">=3.8", packages=find_packages(exclude=["*test*", "benchmarks*"]), install_requires=required_deps,