
Comments (1)

pipiyapi commented on August 24, 2024

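The problem above is solved; it was caused by errors in my data. However, I have now hit a new problem with distributed training: only one card seems to be doing any work, yet the other card's GPU is also fully occupied.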
(openke) jupyter-xingcheng@dell:~/OpenKE2.0$ python -m torch.distributed.launch --nproc_per_node 2 train_rotate_data_390_dist.py
/home/jupyter-xingcheng/.conda/envs/openke/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Input Files Path : ./benchmarks/data-390/
The toolkit is importing datasets.
The total of relations is 28.
The total of entities is 700324.
Input Files Path : ./benchmarks/data-390/
The toolkit is importing datasets.
The total of relations is 28.
The total of entities is 700324.
The total of train triples is 2849846.
The total of train triples is 2849846.
Input Files Path : ./benchmarks/data-390/
Input Files Path : ./benchmarks/data-390/
The total of test triples is 258712.
The total of valid triples is 1293564.
The total of test triples is 258712.
The total of valid triples is 1293564.
Finish initializing...
0%| | 0/6000 [00:00<?, ?it/s]Finish initializing...
Epoch 0 | loss: 1141.047029: 0%| | 1/6000 [03:04<307:32:46, 184.56s/it

Here is the nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:3B:00.0 Off | N/A |
| 70% 86C P2 297W / 350W | 22428MiB / 24576MiB | 89% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:D8:00.0 Off | N/A |
| 88% 88C P2 278W / 350W | 22428MiB / 24576MiB | 89% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3311491 C ...cheng/.conda/envs/openke/bin/python 22422MiB |
| 1 N/A N/A 3311492 C ...cheng/.conda/envs/openke/bin/python 22422MiB |
+-----------------------------------------------------------------------------------------+
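
The FutureWarning in the log above says that a script launched through torchrun should read the local rank from os.environ['LOCAL_RANK'] rather than from a --local_rank argument. A minimal sketch of that lookup follows. The helper name resolve_local_rank is hypothetical, and it is only an assumption that train_rotate_data_390_dist.py parses --local_rank today; this illustrates what the warning describes and is not a confirmed fix for the single-card behavior.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return this process's local rank (hypothetical helper).

    LOCAL_RANK from the environment (set by torchrun) takes priority;
    the legacy --local_rank argument passed by torch.distributed.launch
    is used as a fallback, and 0 is returned if neither is present.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores any other arguments the training script defines.
    args, _unknown = parser.parse_known_args(argv)

    if "LOCAL_RANK" in environ:
        return int(environ["LOCAL_RANK"])
    return args.local_rank


if __name__ == "__main__":
    print("local rank:", resolve_local_rank())
```

In a training script the returned value would typically be passed to torch.cuda.set_device before the model is built, so that each process works on its own GPU. With torchrun the launch command from the log would presumably become `torchrun --nproc_per_node 2 train_rotate_data_390_dist.py`.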

from openke.
