How do I run inference with this model? Could you provide example code?
#6 by liu00 - opened
Where is the script example/24B/run.sh?
liu00 changed discussion status to closed
Why was this issue closed? I also cannot find this. I have an examples folder but everything is different.
Thanks. I followed the Docker instructions, then realised I also need to follow the source code instructions to get the GitHub repo.
Then I got this error:
W0427 18:02:13.684000 135185498919360 torch/distributed/run.py:778] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0427 18:02:13.684000 135185498919360 torch/distributed/run.py:778] *****************************************
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
/usr/bin/python: can't open file '/workspace/MagiAttention/main.py': [Errno 2] No such file or directory
E0427 18:02:13.799000 135185498919360 torch/distributed/elastic/multiprocessing/api.py:832] failed (exitcode: 2) local_rank: 0 (pid: 2182) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==2.4.0a0+3bcc3cddb5.nv24.7', 'console_scripts', 'torchrun')())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 900, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 891, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
main.py FAILED
Failures:
[1]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 1 (local_rank: 1)
exitcode : 2 (pid: 2183)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 2 (local_rank: 2)
exitcode : 2 (pid: 2184)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 3 (local_rank: 3)
exitcode : 2 (pid: 2185)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 4 (local_rank: 4)
exitcode : 2 (pid: 2186)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 5 (local_rank: 5)
exitcode : 2 (pid: 2187)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 6 (local_rank: 6)
exitcode : 2 (pid: 2188)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 7 (local_rank: 7)
exitcode : 2 (pid: 2189)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2025-04-27_18:02:13
host : leo-X399-AORUS-Gaming-7
rank : 0 (local_rank: 0)
exitcode : 2 (pid: 2182)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
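The eight `can't open file '/workspace/MagiAttention/main.py'` lines suggest that every rank looked for `main.py` in `/workspace/MagiAttention`, which is probably the directory the launch command was started from and not the directory that actually contains `main.py`. That is an assumption based only on the log above; I have not checked the repository layout. A minimal pre-flight check, using a hypothetical helper named `missing_entry_scripts`, could confirm this before calling torchrun:

```python
from pathlib import Path


def missing_entry_scripts(workdir, scripts=("main.py",)):
    """Return the names in `scripts` that are not regular files under `workdir`.

    `missing_entry_scripts` is a hypothetical helper for illustration only;
    it is not part of torchrun or of the model's repository.
    """
    base = Path(workdir)
    return [name for name in scripts if not (base / name).is_file()]


# Check the current working directory, i.e. where a relative path such as
# "main.py" would be looked up when the launch command is started from here.
missing = missing_entry_scripts(Path.cwd())
if missing:
    print(f"Missing in {Path.cwd()}: {', '.join(missing)}")
else:
    print(f"All entry scripts found in {Path.cwd()}")
```

If `main.py` is reported as missing, changing into the directory of the cloned repo that contains it before running the launch script may resolve the `ChildFailedError`, since all ranks fail with exit code 2 for the same reason.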