ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on.
```
ValueError                                Traceback (most recent call last)
Cell In[17], line 25
     22 print_trainable_parameters(model)
     24 # Apply the accelerator. You can comment this out to remove the accelerator.
---> 25 model = accelerator.prepare_model(model)

File /vc_data/shankum/miniconda3/envs/llm2/lib/python3.11/site-packages/accelerate/accelerator.py:1392, in Accelerator.prepare_model(self, model, device_placement, evaluation_mode)
   1389 if torch.device(current_device_index) != self.device:
   1390     # if on the first device (GPU 0) we don't care
   1391     if (self.device.index is not None) or (current_device_index != 0):
-> 1392         raise ValueError(
   1393             "You can't train a model that has been loaded in 8-bit precision on a different device than the one "
   1394             "you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}"
   1395         )
   1397 if "cpu" in model_devices or "disk" in model_devices:
   1398     raise ValueError(
   1399         "You can't train a model that has been loaded in 8-bit precision with CPU or disk offload."
   1400     )

ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}
```
Are the token tensors produced by your tokenizer on the same device as your model?
If the model is on a GPU, make sure you move the tokenizer outputs to that GPU as well.
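For example, a minimal sketch of what I mean (the checkpoint name and input text are just placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; use your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# tokenize on CPU, then move the resulting tensors to the model's device
inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```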
This is an accelerate issue on a multi-GPU setup. I have used the same setup with other SLMs like Zephyr and Llama 2, and they seem to work.
Playing around with the accelerate settings fixed it for me.
Same issue here.
> Playing around with the accelerate settings fixed it for me.
Could you elaborate more, please? Thanks!
Hi everyone!
To fix this issue, you need to force-load the entire model onto a single GPU and replicate it across all GPUs, rather than sharding it across devices. To achieve this, please follow the solution proposed here: https://github.com/huggingface/accelerate/issues/1840#issuecomment-1683105994
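A sketch of that approach, assuming you load the model in 8-bit with transformers + bitsandbytes (the model name is a placeholder): map the empty-string key, i.e. the whole model, to the current process's device instead of using `device_map="auto"`.

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

accelerator = Accelerator()

# Place the whole model on this process's GPU; each process gets a full replica.
device_map = {"": accelerator.process_index}

model = AutoModelForCausalLM.from_pretrained(
    "your-model-name",  # placeholder
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map=device_map,
)
```

With this, `accelerator.prepare_model(model)` no longer sees the model split across devices, so the check in `prepare_model` passes.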