Upstream Accelerate #1741

Draft: wants to merge 45 commits into base: main

Changes from all commits (45 commits, all by IlyasMoutawwakil):
a66f9f5  init (Feb 4, 2025)
8058e80  style (Feb 4, 2025)
ee4b974  change local accelerate name to avoid import issues (Feb 4, 2025)
7abfff3  local FP8ContextWrapper (Feb 4, 2025)
6abfb7d  local extract_model_from_parallel (Feb 4, 2025)
c49b0a2  local accelerate (Feb 4, 2025)
37ff6b0  accelerator state (Feb 4, 2025)
246f836  fix simple trainer tests (Feb 5, 2025)
731d392  pass fast distributed tests (Feb 5, 2025)
88e12e0  deepspeed model parallel (Feb 5, 2025)
4b58645  Merge branch 'main' into upstream-accelerate (Feb 5, 2025)
a9a9a15  Merge branch 'main' into upstream-accelerate (Feb 6, 2025)
17b8291  merge (Feb 6, 2025)
b8fc49f  test (Feb 10, 2025)
a0f11d0  pass fast distributed tests (Feb 10, 2025)
e1b32f8  test sloz (Feb 10, 2025)
cd513b5  remove force_autocast (Feb 10, 2025)
abdc734  do fp8 conversion in trainer (Feb 11, 2025)
7f7dedc  fix (Feb 11, 2025)
a3951e5  test (Feb 15, 2025)
c32b701  Merge branch 'main' into upstream-accelerate (Feb 15, 2025)
4240e55  test (Feb 18, 2025)
c0eaa4b  cancel (Feb 18, 2025)
bfd23ea  fix (Feb 18, 2025)
a8523c8  test (Feb 18, 2025)
a30ab5f  path (Feb 18, 2025)
356d898  test (Feb 18, 2025)
b653727  use runner's HABANA_VISIBLE_DEVICES (Feb 18, 2025)
beeeffa  don't use the same networks as host (Feb 18, 2025)
04fbe19  remove (Feb 18, 2025)
6f2fc7e  sanitize (Feb 18, 2025)
6a84721  now run accelerate (Feb 18, 2025)
145f062  exit after tests (Feb 18, 2025)
98a1ab1  clean hf_home (Feb 18, 2025)
6ff0ddb  test (Feb 18, 2025)
3ad3e51  shm (Feb 18, 2025)
017f899  HABANA_VISIBLE_MODULES (Feb 18, 2025)
1e2d3f1  fix (Feb 18, 2025)
e89782b  abandon running accelerate tests in CI (Feb 18, 2025)
e689bc4  test new DOCKER_HABANA_VISIBLE_DEVICES (Feb 18, 2025)
5ff3160  fix (Feb 18, 2025)
245fcda  Merge branch 'main' into upstream-accelerate (Feb 18, 2025)
62c64fd  fix (Feb 18, 2025)
ed6cc03  test (Feb 18, 2025)
50c783f  ACCELERATE_BYPASS_DEVICE_MAP for now (Feb 18, 2025)
243 changes: 122 additions & 121 deletions .github/workflows/slow_tests.yml

Large diffs are not rendered by default.

312 changes: 155 additions & 157 deletions .github/workflows/slow_tests_gaudi2.yml

Large diffs are not rendered by default.

37 changes: 37 additions & 0 deletions .github/workflows/test_accelerate_gaudi2.yml
@@ -0,0 +1,37 @@
name: Accelerate integration tests

on:
  workflow_dispatch:
  pull_request:
    branches: [main]
  push:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  accelerate:
    name: Test Accelerate integration
    runs-on: [self-hosted, linux, x64, gaudi2, fast]

    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Pull image
        run: |
          docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
      - name: Run tests
        run: |
          docker run \
            --rm \
            --ipc=host \
            --runtime=habana \
            --cap-add=sys_nice \
            -v $PWD:/root/workspace \
            --workdir=/root/workspace \
            -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
            -e HABANA_VISIBLE_DEVICES=$DOCKER_HABANA_VISIBLE_DEVICES \
            vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest \
            /bin/bash tests/ci/accelerate.sh
5 changes: 2 additions & 3 deletions examples/stable-diffusion/training/textual_inversion.py
@@ -32,6 +32,7 @@
 import torch.nn.functional as F
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import ProjectConfiguration
 from diffusers import (
@@ -51,7 +52,6 @@
 from transformers import CLIPTextModel, CLIPTokenizer
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
 from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline
 from optimum.habana.utils import set_seed
 
@@ -588,12 +588,11 @@ def main():
 
     gaudi_config = GaudiConfig.from_pretrained(args.gaudi_config_name)
 
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         mixed_precision="bf16" if gaudi_config.use_torch_autocast or args.bf16 else "no",
         log_with=args.report_to,
         project_config=accelerator_project_config,
-        force_autocast=gaudi_config.use_torch_autocast or args.bf16,
     )
 
     if args.report_to == "wandb":
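Note: the same constructor swap repeats in the training scripts below. As a self-contained sketch of the new pattern (the values here are placeholders, not the script's actual arguments), the upstream Accelerator is built with mixed_precision only, since the Gaudi-specific force_autocast argument no longer exists:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

# Placeholder stand-ins for the script's `args` / `gaudi_config` values (illustration only).
use_bf16 = True  # corresponds to `gaudi_config.use_torch_autocast or args.bf16`
project_config = ProjectConfiguration(project_dir="./sd-output", logging_dir="./sd-output/logs")

accelerator = Accelerator(
    gradient_accumulation_steps=4,
    mixed_precision="bf16" if use_bf16 else "no",
    log_with=None,  # the real scripts pass e.g. "tensorboard" or "wandb" via --report_to
    project_config=project_config,
    # force_autocast is gone: autocast behaviour is now driven entirely by mixed_precision.
)
print(accelerator.device, accelerator.mixed_precision)
```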
@@ -31,6 +31,7 @@
 import torch.nn.functional as F
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import ProjectConfiguration
 from diffusers import (
@@ -48,7 +49,6 @@
 from tqdm.auto import tqdm
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
 from optimum.habana.diffusers import (
     GaudiStableDiffusionXLPipeline,
 )
@@ -576,12 +576,11 @@ def main():
 
     gaudi_config = GaudiConfig.from_pretrained(args.gaudi_config_name)
 
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         mixed_precision="bf16" if gaudi_config.use_torch_autocast or args.bf16 else "no",
         log_with=args.report_to,
         project_config=accelerator_project_config,
-        force_autocast=gaudi_config.use_torch_autocast or args.bf16,
     )
 
     if args.report_to == "wandb":
5 changes: 2 additions & 3 deletions examples/stable-diffusion/training/train_controlnet.py
@@ -35,6 +35,7 @@
 import torch.nn.functional as F
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import ProjectConfiguration
 from datasets import load_dataset
@@ -54,7 +55,6 @@
 from transformers import AutoTokenizer, PretrainedConfig
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
 from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionControlNetPipeline
 from optimum.habana.utils import set_seed
 
@@ -765,12 +765,11 @@ def main(args):
     # Set autocast to True for --bf16
     if args.bf16:
         gaudi_config.use_torch_autocast = True
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         mixed_precision="bf16" if gaudi_config.use_torch_autocast else "no",
         log_with=args.report_to,
         project_config=accelerator_project_config,
-        force_autocast=gaudi_config.use_torch_autocast,
     )
 
     # Make one log on every process with the configuration for debugging.
9 changes: 4 additions & 5 deletions examples/stable-diffusion/training/train_dreambooth.py
@@ -39,8 +39,10 @@
 import torch.nn.functional as F
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import DistributedDataParallelKwargs
+from accelerate.utils.dataclasses import DistributedType
 from diffusers import (
     AutoencoderKL,
     DDPMScheduler,
@@ -60,8 +62,6 @@
 from transformers import AutoTokenizer, PretrainedConfig
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
-from optimum.habana.accelerate.utils.dataclasses import GaudiDistributedType
 from optimum.habana.diffusers import GaudiStableDiffusionPipeline
 from optimum.habana.transformers.trainer import _is_peft_model
 from optimum.habana.utils import set_seed
@@ -834,12 +834,11 @@ def main(args):
 
     gaudi_config = GaudiConfig.from_pretrained(args.gaudi_config_name)
     gaudi_config.use_torch_autocast = gaudi_config.use_torch_autocast or args.mixed_precision == "bf16"
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         mixed_precision=args.mixed_precision,
         log_with=args.report_to,
         project_dir=logging_dir,
-        force_autocast=gaudi_config.use_torch_autocast,
     )
     if args.report_to == "wandb":
         import wandb
@@ -1088,7 +1087,7 @@ def unwrap_model(model, training=False):
         if not training:
             return model
         else:
-            if accelerator.distributed_type == GaudiDistributedType.MULTI_HPU:
+            if accelerator.distributed_type == DistributedType.MULTI_HPU:
                 kwargs = {}
                 kwargs["gradient_as_bucket_view"] = True
                 accelerator.ddp_handler = DistributedDataParallelKwargs(**kwargs)
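Note: the distributed-type checks switch from GaudiDistributedType to the upstream DistributedType enum. A minimal sketch of the multi-HPU DDP tweak shown in unwrap_model above, assuming an accelerate build that exposes DistributedType.MULTI_HPU (the helper name below is illustrative, not from the script):

```python
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs
from accelerate.utils.dataclasses import DistributedType

accelerator = Accelerator()

def enable_bucket_view_ddp(accelerator: Accelerator) -> None:
    # On multi-HPU runs, reuse gradient bucket views to reduce DDP memory overhead,
    # mirroring the gradient_as_bucket_view tweak in the training scripts.
    if accelerator.distributed_type == DistributedType.MULTI_HPU:
        accelerator.ddp_handler = DistributedDataParallelKwargs(gradient_as_bucket_view=True)

enable_bucket_view_ddp(accelerator)
print(accelerator.distributed_type)
```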
@@ -34,8 +34,10 @@
 import torch
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import DistributedDataParallelKwargs, ProjectConfiguration
+from accelerate.utils.dataclasses import DistributedType
 from datasets import load_dataset
 from diffusers import (
     AutoencoderKL,
@@ -68,8 +70,6 @@
 from transformers import T5EncoderModel
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
-from optimum.habana.accelerate.utils.dataclasses import GaudiDistributedType
 from optimum.habana.utils import set_seed
 
 
@@ -643,12 +643,11 @@ def main(args):
 
     accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
     kwargs = DistributedDataParallelKwargs(find_unused_parameters=False)
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         mixed_precision=args.mixed_precision,
         log_with=args.report_to,
         project_config=accelerator_project_config,
-        force_autocast=gaudi_config.use_torch_autocast,
         kwargs_handlers=[kwargs],
     )
 
@@ -762,7 +761,7 @@ def save_model_hook(models, weights, output_dir):
     def load_model_hook(models, input_dir):
         transformer_ = None
 
-        if not accelerator.distributed_type == GaudiDistributedType.DEEPSPEED:
+        if not accelerator.distributed_type == DistributedType.DEEPSPEED:
             while len(models) > 0:
                 model = models.pop()
 
@@ -1075,7 +1074,7 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
                 progress_bar.update(1)
                 global_step += 1
 
-                if accelerator.is_main_process or accelerator.distributed_type == GaudiDistributedType.DEEPSPEED:
+                if accelerator.is_main_process or accelerator.distributed_type == DistributedType.DEEPSPEED:
                     if global_step % args.checkpointing_steps == 0:
                         # _before_ saving state, check if this save would set us over the `checkpoints_total_limit`
                         if args.checkpoints_total_limit is not None:
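Note: the extra DEEPSPEED condition in the checkpointing branch above exists because ZeRO-sharded state has to be saved from every rank, not just the main process. A hedged sketch of that gating logic (step counts and the output path are placeholders):

```python
from accelerate import Accelerator
from accelerate.utils.dataclasses import DistributedType

accelerator = Accelerator()
global_step = 500          # placeholder
checkpointing_steps = 500  # placeholder

# Under DeepSpeed, save_state must run on every process because weights and optimizer
# state are sharded; otherwise only the main process writes the checkpoint.
if accelerator.is_main_process or accelerator.distributed_type == DistributedType.DEEPSPEED:
    if global_step % checkpointing_steps == 0:
        accelerator.save_state(f"checkpoints/checkpoint-{global_step}")
```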
@@ -35,8 +35,10 @@
 import torch.nn.functional as F
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import DistributedDataParallelKwargs
+from accelerate.utils.dataclasses import DistributedType
 from diffusers import (
     AutoencoderKL,
     DDPMScheduler,
@@ -67,8 +69,6 @@
 from transformers import AutoTokenizer, PretrainedConfig
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
-from optimum.habana.accelerate.utils.dataclasses import GaudiDistributedType
 from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline
 from optimum.habana.transformers.trainer import _is_peft_model
 from optimum.habana.utils import set_seed
@@ -821,12 +821,11 @@ def main(args):
     logging_dir = Path(args.output_dir, args.logging_dir)
     gaudi_config = GaudiConfig.from_pretrained(args.gaudi_config_name)
     gaudi_config.use_torch_autocast = gaudi_config.use_torch_autocast or args.mixed_precision == "bf16"
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         mixed_precision=args.mixed_precision,
         log_with=args.report_to,
         project_dir=logging_dir,
-        force_autocast=gaudi_config.use_torch_autocast,
     )
     if args.report_to == "wandb":
         if not is_wandb_available():
@@ -1019,7 +1018,7 @@ def unwrap_model(model, training=False):
         if not training:
             return model
         else:
-            if accelerator.distributed_type == GaudiDistributedType.MULTI_HPU:
+            if accelerator.distributed_type == DistributedType.MULTI_HPU:
                 kwargs = {}
                 kwargs["gradient_as_bucket_view"] = True
                 accelerator.ddp_handler = DistributedDataParallelKwargs(**kwargs)
11 changes: 5 additions & 6 deletions examples/stable-diffusion/training/train_text_to_image_sdxl.py
@@ -41,8 +41,10 @@
 import torch.nn.functional as F
 import torch.utils.checkpoint
 import transformers
+from accelerate import Accelerator
 from accelerate.logging import get_logger
 from accelerate.utils import DistributedDataParallelKwargs, ProjectConfiguration
+from accelerate.utils.dataclasses import DistributedType
 from datasets import load_dataset
 from diffusers import (
     AutoencoderKL,
@@ -61,8 +63,6 @@
 from transformers import AutoTokenizer, PretrainedConfig
 
 from optimum.habana import GaudiConfig
-from optimum.habana.accelerate import GaudiAccelerator
-from optimum.habana.accelerate.utils.dataclasses import GaudiDistributedType
 from optimum.habana.diffusers import (
     GaudiDDIMScheduler,
     GaudiEulerAncestralDiscreteScheduler,
@@ -714,12 +714,11 @@ def main(args):
 
     gaudi_config = GaudiConfig.from_pretrained(args.gaudi_config_name)
     gaudi_config.use_torch_autocast = gaudi_config.use_torch_autocast or args.bf16
-    accelerator = GaudiAccelerator(
+    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision="bf16" if gaudi_config.use_torch_autocast else "no",
        log_with=args.report_to,
        project_config=accelerator_project_config,
-        force_autocast=gaudi_config.use_torch_autocast,
     )
 
     if args.report_to == "wandb":
@@ -896,7 +895,7 @@ def main(args):
         for idx, dt in enumerate(dataset["train"]):
             dt["image"].save(f"{args.mediapipe}/{idx}.jpg")
             f.write(dt["text"] + "\n")
-        if accelerator.distributed_type != GaudiDistributedType.NO:
+        if accelerator.distributed_type != DistributedType.NO:
             torch.distributed.barrier()
 
         from media_pipe_imgdir import get_dataset_for_pipeline
@@ -1145,7 +1144,7 @@ def unwrap_model(model, training=False):
         if not training:
             return model
         else:
-            if accelerator.distributed_type == GaudiDistributedType.MULTI_HPU:
+            if accelerator.distributed_type == DistributedType.MULTI_HPU:
                 kwargs = {}
                 kwargs["gradient_as_bucket_view"] = True
                 accelerator.ddp_handler = DistributedDataParallelKwargs(**kwargs)
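Note: the barrier guard above only synchronizes when a process group actually exists, now keyed on the upstream DistributedType.NO value. A short sketch:

```python
import torch.distributed
from accelerate import Accelerator
from accelerate.utils.dataclasses import DistributedType

accelerator = Accelerator()

# Only rank-synchronize when running distributed; with DistributedType.NO there is
# no process group and torch.distributed.barrier() would raise.
if accelerator.distributed_type != DistributedType.NO:
    torch.distributed.barrier()
```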
4 changes: 2 additions & 2 deletions examples/trl/ppo.py
@@ -5,14 +5,14 @@
 from typing import List, Optional
 
 import torch
+from accelerate import Accelerator
 from datasets import load_dataset
 from peft import LoraConfig
 from tqdm import tqdm
 from transformers import Adafactor, AutoModelForSequenceClassification, AutoTokenizer, HfArgumentParser, pipeline
 from trl import AutoModelForCausalLMWithValueHead
 from trl.core import LengthSampler
 
-from optimum.habana.accelerate import GaudiAccelerator
 from optimum.habana.trl import GaudiPPOConfig, GaudiPPOTrainer, adapt_PreTrainedModelWrapper_to_gaudi
 from optimum.habana.utils import set_seed
 
@@ -191,7 +191,7 @@ def collator(data):
 set_seed(config.seed)
 
 # Now let's build the model, the reference model, and the tokenizer.
-current_device = GaudiAccelerator().local_process_index
+current_device = Accelerator().local_process_index
 lora_config = LoraConfig(
     r=script_args.lora_r,
     lora_alpha=script_args.lora_alpha,
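Note: in the TRL example the device index now comes from the upstream Accelerator. A sketch of how that index is typically used (only the local_process_index line comes from this diff; the device_map usage is an illustrative, commonly seen pattern):

```python
from accelerate import Accelerator

# Each process asks the upstream Accelerator for its local rank and targets that device.
current_device = Accelerator().local_process_index
print(f"local process index: {current_device}")

# Illustrative only: a common follow-up is pinning the model to that device at load time, e.g.
# AutoModelForCausalLMWithValueHead.from_pretrained(model_name, device_map={"": current_device})
```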
4 changes: 2 additions & 2 deletions optimum/habana/transformers/modeling_utils.py
@@ -16,8 +16,8 @@
 import transformers
 import transformers.utils.fx
 
-from ..accelerate.utils import extract_model_from_parallel
-from ..accelerate.utils.modeling import gaudi_check_device_same
+from ..local_accelerate.utils import extract_model_from_parallel
+from ..local_accelerate.utils.modeling import gaudi_check_device_same
 from ..quantizers.bitsandbytes import (
     gaudi_bitsandbytesconfig_post_init,
     gaudi_create_quantized_param,
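Note: the relative imports above point at the renamed local package, so the not-yet-upstreamed Gaudi helpers no longer shadow the upstream accelerate distribution (see the "change local accelerate name to avoid import issues" commit). A sketch of the absolute-import equivalent, assuming the package layout on this branch:

```python
# Assumes this PR branch's layout: optimum.habana.local_accelerate replaces the old
# optimum.habana.accelerate subpackage for helpers that have not been upstreamed yet.
import torch

from optimum.habana.local_accelerate.utils import extract_model_from_parallel
from optimum.habana.local_accelerate.utils.modeling import gaudi_check_device_same

# extract_model_from_parallel unwraps (Distributed)DataParallel containers back to the bare module.
model = torch.nn.Linear(4, 4)
wrapped = torch.nn.DataParallel(model)
assert extract_model_from_parallel(wrapped) is model
```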