Bugfix: model device setting does not match the flag when running Prompt Inference #26

Se-Hun · 2023-01-03T15:03:21Z

When the bug occurs

Running the existing code in a GPU environment raises the following error.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)

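Fix

The error appears to occur because the device argument passed to the KoGPTInference constructor is never used to set the device of the GPT model, which leaves the model and its input tensors on different devices.
I therefore fixed the problem by adding the code below.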

import os
from typing import Optional, Union

from transformers import GPTJForCausalLM


class KoGPTInference:
    def __init__(
            self,
            pretrained_model_name_or_path: Optional[Union[str, os.PathLike]],
            revision: str = 'KoGPT6B-ryan1.5b-float16',
            device: str = 'cuda',
            model_parallel: bool = False,
    ):
        ...

        # Load the model, then move it to the device requested by the caller.
        self.model = GPTJForCausalLM.from_pretrained(
            pretrained_model_name_or_path, revision=revision,
            pad_token_id=self.tokenizer.eos_token_id,
            torch_dtype='auto', low_cpu_mem_usage=True
        ).to(device)

        ...
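
One way to confirm the fix is to check where the model's weights actually live before running inference. The helper below is a minimal sketch and not part of this PR: `devices_of` and `check_on_device` are hypothetical names, and the code assumes only that the model exposes `parameters()` with a `.device` on each parameter, as `torch.nn.Module` does. A stub stands in for the real model so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def devices_of(model):
    """Return the set of device types (e.g. {'cpu'} or {'cuda'}) holding the parameters."""
    # str(torch.device('cuda:0')) is 'cuda:0'; keep only the part before the colon.
    return {str(p.device).split(':')[0] for p in model.parameters()}


def check_on_device(model, device):
    """Raise early, with a readable message, if any parameter is not on `device`."""
    expected = device.split(':')[0]
    found = devices_of(model)
    if found != {expected}:
        raise RuntimeError(f'model is on {sorted(found)}, expected {expected!r}')


class StubModel:
    """Hypothetical stand-in for a torch.nn.Module; it only mimics parameters()."""

    def __init__(self, device):
        self._params = [SimpleNamespace(device=device) for _ in range(3)]

    def parameters(self):
        return iter(self._params)


check_on_device(StubModel('cuda:0'), 'cuda')  # passes silently
try:
    check_on_device(StubModel('cpu'), 'cuda')  # the situation this PR fixes
except RuntimeError as err:
    message = str(err)
```

Calling such a check right after the constructor would surface the mismatch at load time rather than deep inside the first forward pass.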

cosine0

This was my mistake: I did not account for the single-GPU case.

https://github.com/kakaobrain/kogpt/pull/26/files#diff-35c6d888319d17cc536568888b31dbf47b3610492cf2588b2a24af4dc51db2c4L38-L42

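I put a branch here because I was worried that self.model.to(device) could run out of memory, so I wanted self.model.parallelize() to spread the model across multiple devices from the start.
As @Se-Hun points out, rather than calling .to(device) where the model is declared, it would be cleaner to handle this afterwards, either like this: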

if model_parallel:
    self.model.parallelize()
else:
    self.model.to(device)

or, to reflect that model parallelism is not possible on CPU, like this:

if device != 'cpu' and model_parallel:
    self.model.parallelize()
else:
    self.model.to(device)

I think either of these would work well.
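
The CPU-aware variant can be factored into a small helper. This is a sketch under assumptions: `place_model` is a hypothetical name, and the stub class only records which method was called. In the real code the model would be the `GPTJForCausalLM` instance whose `parallelize()` and `to()` calls appear in the snippets above.

```python
def place_model(model, device='cuda', model_parallel=False):
    """Spread the model over GPUs if requested, otherwise move it to `device`."""
    if device != 'cpu' and model_parallel:
        model.parallelize()
    else:
        model.to(device)
    return model


class RecordingModel:
    """Hypothetical stand-in that records placement calls instead of moving weights."""

    def __init__(self):
        self.calls = []

    def parallelize(self):
        self.calls.append('parallelize')

    def to(self, device):
        self.calls.append(f'to({device})')
        return self


single_gpu = place_model(RecordingModel(), device='cuda', model_parallel=False)
multi_gpu = place_model(RecordingModel(), device='cuda', model_parallel=True)
cpu_only = place_model(RecordingModel(), device='cpu', model_parallel=True)
```

With this branch, requesting model parallelism on CPU falls back to a plain `to('cpu')` instead of attempting to parallelize.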

bugfix : not matched device setting between model and flags

58962bf

cosine0 suggested changes Feb 20, 2023
