[Bug]: GPU memory does not get cleared after stopping training. (SD1.5) #649

Open
AlexanderZhk opened this issue Jan 20, 2025 · 2 comments
Labels
bug Something isn't working

AlexanderZhk commented Jan 20, 2025

What happened?

When I stop training in the GUI (Stop Training button) during SD1.5 fine-tuning, GPU memory is not cleared, so the next run fails with an out-of-memory (OOM) error.
Tested on e895ddc, where memory is cleared successfully.

I'm running CUDA 12.4 on Ubuntu. Please let me know what additional information would be helpful.

If logs are needed, I'll be happy to provide them, but I haven't seen anything out of the ordinary: the OneTrainer process simply keeps using about as much VRAM as it did while training.

The only thing that helps is completely restarting OneTrainer.

What did you expect would happen?

Successful garbage collection after stopping the run, as in earlier versions. My guess is that torch_gc() fails on torch 2.5.x or with CUDA 12.4, but I'm not familiar enough with it to troubleshoot further.
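
To help narrow this down, here is a minimal diagnostic sketch. It is not OneTrainer's actual torch_gc(); `release_cached_memory` and `cuda_memory_mib` are hypothetical helper names, and I'm assuming a torch_gc()-style helper does roughly a Python garbage collection followed by `torch.cuda.empty_cache()`. The sketch reports allocated and reserved memory before and after such a cleanup.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def cuda_memory_mib():
    """Return (allocated, reserved) in MiB for the current CUDA device, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return (torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib)


def release_cached_memory():
    """Hypothetical stand-in for a torch_gc()-style helper (an assumption, not OneTrainer's code)."""
    before = cuda_memory_mib()
    gc.collect()  # collect unreachable Python objects, including reference cycles
    if before is not None:
        torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    return before, cuda_memory_mib()


before, after = release_cached_memory()
print("before:", before, "after:", after)
```

If the allocated figure stays high after the cleanup, some live tensors are probably still referenced somewhere, and `empty_cache()` cannot release those. If only the reserved figure was high and it drops, the caching allocator was holding the memory.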

What I tried without success:
Calling del on the model and optimizer, and calling torch_gc() after the model has been saved.
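
One possible reason `del` made no difference, sketched below in plain Python: `del` removes only one name, so the object survives while anything else still refers to it. `FakeModel` and `FakeTrainer` are hypothetical stand-ins, not OneTrainer classes, and whether such a lingering reference exists in OneTrainer is only an assumption.

```python
import gc
import weakref


class FakeModel:
    """Stand-in for a model object; a real one would hold CUDA tensors."""


class FakeTrainer:
    """Hypothetical long-lived object that keeps its own reference to the model."""

    def __init__(self, model):
        self.model = model


model = FakeModel()
trainer = FakeTrainer(model)
probe = weakref.ref(model)  # checks liveness without keeping the model alive

del model  # removes only this name, not the object itself
gc.collect()
still_alive_after_del = probe() is not None

trainer.model = None  # drop the last strong reference
gc.collect()
freed_after_clearing_owner = probe() is None

print(still_alive_after_del, freed_after_clearing_owner)  # prints: True True
```

The same weak-reference probe could be attached to the real model before stopping a run, to check whether it is still alive afterwards.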

Output of pip freeze

Identical to requirements-cuda.txt, as I'm running on a clean VM.

my config.json

{
    "__version": 6,
    "training_method": "FINE_TUNE",
    "model_type": "STABLE_DIFFUSION_15",
    "debug_mode": false,
    "debug_dir": "debug",
    "workspace_dir": "workspace/run",
    "cache_dir": "workspace-cache/run",
    "tensorboard": true,
    "tensorboard_expose": false,
    "tensorboard_port": 6006,
    "validation": false,
    "validate_after": 10,
    "validate_after_unit": "NEVER",
    "continue_last_backup": false,
    "include_train_config": "NONE",
    "base_model_name": "/home/root/OneTrainer/models/model3.safetensors",
    "weight_dtype": "FLOAT_32",
    "output_dtype": "FLOAT_32",
    "output_model_format": "SAFETENSORS",
    "output_model_destination": "/home/root/OneTrainer/models/model3.safetensors",
    "gradient_checkpointing": "OFF",
    "enable_async_offloading": true,
    "enable_activation_offloading": true,
    "layer_offload_fraction": 0.0,
    "force_circular_padding": false,
    "concept_file_name": "training_concepts/test.json",
    "concepts": null,
    "aspect_ratio_bucketing": true,
    "latent_caching": false,
    "clear_cache_before_training": false,
    "learning_rate_scheduler": "LINEAR",
    "custom_learning_rate_scheduler": null,
    "scheduler_params": [
        {
            "key": "",
            "value": ""
        }
    ],
    "learning_rate": 6e-06,
    "learning_rate_warmup_steps": 0.0,
    "learning_rate_cycles": 1.0,
    "learning_rate_min_factor": 0.0,
    "epochs": 5,
    "batch_size": 3,
    "gradient_accumulation_steps": 1,
    "ema": "OFF",
    "ema_decay": 0.8,
    "ema_update_step_interval": 999,
    "dataloader_threads": 4,
    "train_device": "cuda",
    "temp_device": "cuda",
    "train_dtype": "BFLOAT_16",
    "fallback_train_dtype": "BFLOAT_16",
    "enable_autocast_cache": false,
    "only_cache": false,
    "resolution": "512",
    "attention_mechanism": "XFORMERS",
    "align_prop": false,
    "align_prop_probability": 1.0,
    "align_prop_loss": "AESTHETIC",
    "align_prop_weight": 0.1,
    "align_prop_steps": 20,
    "align_prop_truncate_steps": 0.5,
    "align_prop_cfg_scale": 7.0,
    "mse_strength": 0.0,
    "mae_strength": 1.0,
    "log_cosh_strength": 0.0,
    "vb_loss_strength": 1.0,
    "loss_weight_fn": "CONSTANT",
    "loss_weight_strength": 5.0,
    "dropout_probability": 0.01,
    "loss_scaler": "NONE",
    "learning_rate_scaler": "NONE",
    "clip_grad_norm": null,
    "offset_noise_weight": 0.0,
    "perturbation_noise_weight": 0.0,
    "rescale_noise_scheduler_to_zero_terminal_snr": false,
    "force_v_prediction": false,
    "force_epsilon_prediction": false,
    "min_noising_strength": 0.0,
    "max_noising_strength": 1.0,
    "timestep_distribution": "UNIFORM",
    "noising_weight": 1.0,
    "noising_bias": 0.0,
    "unet": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": 0,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "prior": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": 0,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "text_encoder": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "text_encoder_layer_skip": 0,
    "text_encoder_2": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": 30,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "text_encoder_2_layer_skip": 0,
    "text_encoder_3": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": 30,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "text_encoder_3_layer_skip": 0,
    "vae": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "effnet_encoder": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "decoder": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "decoder_text_encoder": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "decoder_vqgan": {
        "__version": 0,
        "model_name": "",
        "include": true,
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE",
        "dropout_probability": 0.0,
        "train_embedding": true,
        "attention_mask": false,
        "guidance_scale": 1.0
    },
    "masked_training": false,
    "unmasked_probability": 0.97,
    "unmasked_weight": 0.1,
    "normalize_masked_area_loss": false,
    "embedding_learning_rate": null,
    "preserve_embedding_norm": false,
    "embedding": {
        "__version": 0,
        "uuid": "6a1518cd-ddef-4fd5-aea4-da8019004a3f",
        "model_name": "",
        "placeholder": "<embedding>",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "token_count": 1,
        "initial_embedding_text": "*"
    },
    "additional_embeddings": [],
    "embedding_weight_dtype": "FLOAT_32",
    "cloud": {
        "__version": 0,
        "enabled": false,
        "type": "RUNPOD",
        "file_sync": "NATIVE_SCP",
        "create": true,
        "name": "OneTrainer",
        "tensorboard_tunnel": true,
        "sub_type": "",
        "gpu_type": "",
        "volume_size": 100,
        "min_download": 0,
        "remote_dir": "/workspace",
        "huggingface_cache_dir": "/workspace/huggingface_cache",
        "onetrainer_dir": "/workspace/OneTrainer",
        "install_cmd": "git clone https://github.com/Nerogar/OneTrainer",
        "install_onetrainer": true,
        "update_onetrainer": true,
        "detach_trainer": false,
        "run_id": "job1",
        "download_samples": true,
        "download_output_model": true,
        "download_saves": true,
        "download_backups": false,
        "download_tensorboard": false,
        "delete_workspace": false,
        "on_finish": "NONE",
        "on_error": "NONE",
        "on_detached_finish": "NONE",
        "on_detached_error": "NONE"
    },
    "peft_type": "LOHA",
    "lora_model_name": "",
    "lora_rank": 32,
    "lora_alpha": 32.0,
    "lora_decompose": false,
    "lora_decompose_norm_epsilon": true,
    "lora_weight_dtype": "BFLOAT_16",
    "lora_layers": "attentions",
    "lora_layer_preset": "attn-mlp",
    "bundle_additional_embeddings": true,
    "optimizer": {
        "__version": 0,
        "optimizer": "ADAMW",
        "adam_w_mode": false,
        "alpha": null,
        "amsgrad": false,
        "beta1": 0.9,
        "beta2": 0.999,
        "beta3": null,
        "bias_correction": false,
        "block_wise": false,
        "capturable": false,
        "centered": false,
        "clip_threshold": null,
        "d0": null,
        "d_coef": null,
        "dampening": null,
        "decay_rate": null,
        "decouple": false,
        "differentiable": false,
        "eps": 1e-08,
        "eps2": null,
        "foreach": false,
        "fsdp_in_use": false,
        "fused": false,
        "fused_back_pass": false,
        "growth_rate": null,
        "initial_accumulator_value": null,
        "is_paged": false,
        "log_every": null,
        "lr_decay": null,
        "max_unorm": null,
        "maximize": false,
        "min_8bit_size": null,
        "momentum": null,
        "nesterov": false,
        "no_prox": false,
        "optim_bits": null,
        "percentile_clipping": null,
        "r": null,
        "relative_step": false,
        "safeguard_warmup": false,
        "scale_parameter": false,
        "stochastic_rounding": true,
        "use_bias_correction": false,
        "use_triton": false,
        "warmup_init": false,
        "weight_decay": 0.03,
        "weight_lr_power": null,
        "decoupled_decay": false,
        "fixed_decay": false,
        "rectify": false,
        "degenerated_to_sgd": false,
        "k": null,
        "xi": null,
        "n_sma_threshold": null,
        "ams_bound": false,
        "adanorm": false,
        "adam_debias": false,
        "slice_p": null,
        "cautious": false
    },
    "optimizer_defaults": {
        "SGD": {
            "__version": 0,
            "optimizer": "SGD",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": 0.0,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": null,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": 0.0,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.01,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "SCHEDULE_FREE_ADAMW": {
            "__version": 0,
            "optimizer": "SCHEDULE_FREE_ADAMW",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": 0.9,
            "beta2": 0.999,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": true,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "r": 0.0,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.1,
            "weight_lr_power": 0.0,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "ADAGRAD_8BIT": {
            "__version": 0,
            "optimizer": "ADAGRAD_8BIT",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": true,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-10,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": 0,
            "is_paged": false,
            "log_every": null,
            "lr_decay": 0.0,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": 4096,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": 8,
            "percentile_clipping": 100.0,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.05,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "SGD_8BIT": {
            "__version": 0,
            "optimizer": "SGD_8BIT",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": true,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": 0.0,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": null,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": 4096,
            "momentum": 0.0001,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": 100.0,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.05,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "ADAMW_8BIT": {
            "__version": 0,
            "optimizer": "ADAMW_8BIT",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": 0.9,
            "beta2": 0.999,
            "beta3": null,
            "bias_correction": false,
            "block_wise": true,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": 4096,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": 32,
            "percentile_clipping": 100.0,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.01,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "ADAFACTOR": {
            "__version": 0,
            "optimizer": "ADAFACTOR",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": 1.0,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": -0.8,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-30,
            "eps2": 0.001,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "r": null,
            "relative_step": true,
            "safeguard_warmup": false,
            "scale_parameter": true,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.03,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "ADAM_8BIT": {
            "__version": 0,
            "optimizer": "ADAM_8BIT",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": 0.9,
            "beta2": 0.999,
            "beta3": null,
            "bias_correction": false,
            "block_wise": true,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": 4096,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": 32,
            "percentile_clipping": 100.0,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.03,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "ADAMW": {
            "__version": 0,
            "optimizer": "ADAMW",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": 0.9,
            "beta2": 0.999,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.03,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "PRODIGY": {
            "__version": 0,
            "optimizer": "PRODIGY",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": 0.9,
            "beta2": 0.99,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": 1e-06,
            "d_coef": 1.0,
            "dampening": null,
            "decay_rate": null,
            "decouple": true,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": "inf",
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": true,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.01,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "ADAM": {
            "__version": 0,
            "optimizer": "ADAM",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": 0.9,
            "beta2": 0.999,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": true,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": false,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.03,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "RMSPROP": {
            "__version": 0,
            "optimizer": "RMSPROP",
            "adam_w_mode": false,
            "alpha": 0.99,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": true,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": 4096,
            "momentum": 0.0,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": 32,
            "percentile_clipping": 100.0,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.01,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        },
        "RMSPROP_8BIT": {
            "__version": 0,
            "optimizer": "RMSPROP_8BIT",
            "adam_w_mode": false,
            "alpha": 0.99,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": true,
            "capturable": false,
            "centered": false,
            "clip_threshold": null,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": null,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-08,
            "eps2": null,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": 4096,
            "momentum": 0.0,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": 100.0,
            "r": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.01,
            "weight_lr_power": null,
            "decoupled_decay": false,
            "fixed_decay": false,
            "rectify": false,
            "degenerated_to_sgd": false,
            "k": null,
            "xi": null,
            "n_sma_threshold": null,
            "ams_bound": false,
            "adanorm": false,
            "adam_debias": false,
            "slice_p": null,
            "cautious": false
        }
    },
    "sample_definition_file_name": "training_samples/samples.json",
    "samples": null,
    "sample_after": 1,
    "sample_after_unit": "HOUR",
    "sample_image_format": "JPG",
    "sample_video_format": "MP4",
    "sample_audio_format": "MP3",
    "samples_to_tensorboard": true,
    "non_ema_sampling": false,
    "backup_after": 3,
    "backup_after_unit": "NEVER",
    "rolling_backup": false,
    "rolling_backup_count": 3,
    "backup_before_save": false,
    "save_every": 10,
    "save_every_unit": "HOUR",
    "save_skip_first": 0,
    "save_filename_prefix": "newer_latest"
}
@AlexanderZhk AlexanderZhk added the bug Something isn't working label Jan 20, 2025
@O-J1
Collaborator

O-J1 commented Jan 20, 2025

Please edit your post to include your config.json (just Ctrl+F and replace your username). This looks like a known bug that occurs when using offload, but I’m not sure why you would be using that with SD 1.5, so I want to make sure.

@AlexanderZhk
Author

> Please edit your post to include your config.json (just Ctrl+F and replace your username). This looks like a known bug that occurs when using offload, but I’m not sure why you would be using that with SD 1.5, so I want to make sure.

Added my config.
Relevant variables from what I see:

    "enable_async_offloading": true,
    "enable_activation_offloading": true,
    "layer_offload_fraction": 0.0,

As you noticed, I didn't intend to offload with SD1.5, but I can't rule out that I turned these settings on accidentally. I don't even see a toggle for offloading in the GUI. 👀
I will try setting the two `enable_*` variables to false in the config and test again with the latest version once my current training run is done.
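
For anyone who wants to try the same thing, here is a minimal sketch of how the two flags could be flipped without editing the file by hand. It assumes both keys sit at the top level of config.json, as they appear to in the config above. The helper names `disable_offloading` and `patch_config_file` are made up for this example and are not part of OneTrainer.

```python
import json
from pathlib import Path

# The two boolean offloading flags quoted above. Treating them as
# top-level keys is an assumption based on the posted config.json.
OFFLOAD_KEYS = ("enable_async_offloading", "enable_activation_offloading")


def disable_offloading(config: dict) -> dict:
    """Return a shallow copy of the config with the offloading flags set to False."""
    updated = dict(config)
    for key in OFFLOAD_KEYS:
        if key in updated:  # leave the config untouched if a key is absent
            updated[key] = False
    return updated


def patch_config_file(path: str) -> None:
    """Rewrite a config file in place with the offloading flags disabled."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    patched = disable_offloading(config)
    config_path.write_text(json.dumps(patched, indent=4), encoding="utf-8")
```

Calling `patch_config_file("config.json")` rewrites the file in place, so it is worth keeping a backup copy first. `layer_offload_fraction` is left alone since it is already 0.0.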
