Comments (5)

classicsong commented on May 20, 2024

Can you check cpu/gpu usage?
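Below is a minimal sketch for collecting these numbers. It assumes a Linux host: it reads the process's scheduler state and CPU time from /proc/<pid>/stat twice, one interval apart, and queries nvidia-smi only if that tool is installed. The function name cpu_gpu_snapshot is hypothetical and not part of DGL-KE.

```shell
#!/bin/sh
# Minimal sketch (Linux only): report a process's scheduler state and average
# CPU use over a short window by reading /proc/<pid>/stat twice, then print GPU
# utilization if nvidia-smi is installed. cpu_gpu_snapshot is a hypothetical name.
# Usage: cpu_gpu_snapshot PID [SECONDS]
cpu_gpu_snapshot() (
  pid=$1
  interval=${2:-1}                                   # whole seconds
  [ -r "/proc/$pid/stat" ] || { echo "process $pid not found"; exit 1; }
  ticks=$(getconf CLK_TCK 2>/dev/null || echo 100)   # clock ticks per second

  read -r stat < "/proc/$pid/stat"
  set -- ${stat##*\) }          # drop "pid (comm) " so that $1 is field 3 (state)
  t0=$(( ${12} + ${13} ))       # utime + stime (fields 14 and 15), in ticks
  sleep "$interval"
  read -r stat < "/proc/$pid/stat"
  set -- ${stat##*\) }
  t1=$(( ${12} + ${13} ))

  # State letter: R = running, S = sleeping/waiting, D = uninterruptible I/O
  echo "pid=$pid state=$1 cpu=$(( 100 * (t1 - t0) / (ticks * interval) ))%"

  if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV line per GPU: index, utilization, memory used
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader
  fi
)
```

Find the PID with pgrep -f dglke_eval and pass it in, for example cpu_gpu_snapshot 12345. With --num_proc above 1 the evaluation work probably runs in child processes, which pgrep -P <pid> lists, so those are worth sampling too. A process sitting near 0% CPU in state S is waiting on something. Steady CPU or GPU use points to a run that is slow rather than hung.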

from dgl-ke.

vlad43210 commented on May 20, 2024

I have the same issue. I tried it both with and without the GPU; CPU usage quickly drops to 0 and the program hangs there with no error:

DGLBACKEND=pytorch dglke_eval \
    --data_path (data path) --dataset mydataset \
    --data_files entities.tsv relations.tsv all_ctups_2.tsv valid.tsv test.tsv --format udd_hrt \
    --model_name ComplEx \
    --hidden_dim 512 --gamma 175 \
    --num_proc 8 --num_thread 4 \
    --batch_size_eval 1024 --neg_sample_size_eval 10000 --eval_percent 5 \
    --model_path (model path)

Using backend: pytorch
Reading train triples....
Finished. Read 211968 train triples.
Reading valid triples....
Finished. Read 27026 valid triples.
Reading test triples....
Finished. Read 31796 test triples.
/opt/conda/envs/kge/lib/python3.8/site-packages/dgl/base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
  warnings.warn(msg, warn_type)
|valid|: 27026
|test|: 31796

Package versions:

- dgl==0.4.3post2
- dglke==0.1.2
- torch==1.7.1

mtoles commented on May 20, 2024

Has anyone discovered a fix? I am experiencing a similar problem.

classicsong commented on May 20, 2024

Can you provide more details?

PoloWitty commented on May 20, 2024

I ran into the same issue, too, but I found that it may be because the eval process is very slow.
The command is as follows:

dglke_eval \
    --model_name TransE \
    --data_path ~/KGE/data/BIOS \
    --dataset BIOS \
    --format raw_udd_hrt \
    --data_files train.tsv valid.tsv test.tsv \
    --hidden_dim 512 \
    --batch_size_eval 256 \
    --neg_sample_size_eval 256 \
    --model_path ~/KGE/output/TransE_BIOS_2/ \
    --gpu 0

After the process prints |valid|: 8449749 and |test|: 8449806, it looks as if it has died, but it is actually still running. I don't know why evaluation is so slow when training is so fast. I also tried --batch_size_eval 10000 --neg_sample_size_eval 10000, as mentioned in #106, but it is still quite slow and GPU utilization stays as low as 10% (on an A100).
I also tried multi-process evaluation on the CPU, but I haven't seen any improvement yet.

Is there any solution to this problem?
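Here is a rough back-of-envelope calculation that puts the sizes in this thread in perspective. It rests on an assumption not verified here against the DGL-KE source: that each evaluated triple is scored against neg_sample_size_eval corrupted heads plus the same number of corrupted tails, and that --eval_percent keeps that percentage of the test triples. The helper name eval_cost is hypothetical.

```shell
#!/bin/sh
# Rough size of a dglke_eval run. ASSUMPTION (not verified against the DGL-KE
# source): each evaluated triple is ranked against NEG corrupted heads and NEG
# corrupted tails, i.e. 2 * NEG candidate scores per triple, and --eval_percent
# keeps that share of the test triples. eval_cost is a hypothetical helper.
# Usage: eval_cost N_TEST NEG BATCH [PERCENT]
eval_cost() (
  n_test=$1 neg=$2 batch=$3 percent=${4:-100}
  n=$(( n_test * percent / 100 ))          # triples actually evaluated
  batches=$(( (n + batch - 1) / batch ))   # ceil(n / batch)
  scores=$(( n * neg * 2 ))                # head and tail corruptions
  echo "triples=$n batches=$batches scores=$scores"
)

eval_cost 8449806 256 256       # triples=8449806 batches=33008 scores=4326300672
eval_cost 8449806 10000 10000   # triples=8449806 batches=845 scores=168996120000
eval_cost 31796 10000 1024 5    # triples=1589 batches=2 scores=31780000
```

Under that assumption, the 256/256 run needs about 4.3 billion scores over roughly 33k batches, so a long silent stretch after the |test| line is plausible. The first command in this thread (|test| = 31796 with --eval_percent 5) would cover only about 1.6k triples in two batches, which suggests a genuine hang there rather than slowness.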
