Comments (14)

ArdalanM commented on August 25, 2024

Had the same issue, here is the fix:

Modify run_epoch to cast all counters to NumPy values with .detach().numpy() or just .numpy().

Here is the corrected function:

import time

def run_epoch(data_iter, model, loss_compute):
    "Standard Training and Logging Function"
    start = time.time()
    total_tokens = 0
    total_loss = 0
    tokens = 0
    for i, batch in enumerate(data_iter):
        out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
        loss = loss_compute(out, batch.trg_y, batch.ntokens)
        # Convert to NumPy values once so the counters are not tensors
        # (.numpy() requires the tensors to be on the CPU).
        loss_value = loss.detach().numpy()
        ntokens = batch.ntokens.numpy()
        total_loss += loss_value
        total_tokens += ntokens
        tokens += ntokens
        if i % 50 == 1:
            elapsed = time.time() - start
            print("Epoch Step: %d Loss: %f Tokens per Sec: %f" % (i, loss_value / ntokens, tokens / elapsed))
            start = time.time()
            tokens = 0
    return total_loss / total_tokens
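
A variant of the same idea, as a minimal sketch rather than the repository's own code: the counters can also be kept as plain Python floats instead of NumPy values. to_scalar is a hypothetical helper name, and it assumes each value is either a one-element tensor or already a number. Because .item() also works for tensors that live on a GPU, this variant should not need the tensors to be on the CPU.

```python
import time


def to_scalar(value):
    """Return a plain Python float for a one-element tensor or a number.

    Hypothetical helper: it relies only on the .detach() and .item()
    methods that torch tensors provide; anything else goes through float().
    """
    if hasattr(value, "detach"):
        value = value.detach()
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


def run_epoch(data_iter, model, loss_compute):
    "Training and logging loop that keeps every counter as a Python float."
    start = time.time()
    total_tokens = 0.0
    total_loss = 0.0
    tokens = 0.0
    for i, batch in enumerate(data_iter):
        out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
        loss = to_scalar(loss_compute(out, batch.trg_y, batch.ntokens))
        ntokens = to_scalar(batch.ntokens)
        total_loss += loss
        total_tokens += ntokens
        tokens += ntokens
        if i % 50 == 1:
            # Guard against a zero elapsed time on very fast steps.
            elapsed = max(time.time() - start, 1e-9)
            print("Epoch Step: %d Loss: %f Tokens per Sec: %f"
                  % (i, loss / ntokens, tokens / elapsed))
            start = time.time()
            tokens = 0.0
    return total_loss / total_tokens
```

Whether this also avoids the exception on a given setup is an assumption; it only guarantees that the arithmetic in the logging code runs on Python floats rather than on tensors.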

from annotated-transformer.

DavidZhang88 commented on August 25, 2024

@xvdp Have you run into this problem? Could you offer me some help? Thank you so much.

wesg52 commented on August 25, 2024

I am having the same issue.

rchavezj commented on August 25, 2024

I'm also having the same issue.

ngarneau commented on August 25, 2024

Also having the same issue. Running it in Python directly gives a floating point exception (core dumped).

v-iashin commented on August 25, 2024

The same issue on Ubuntu 16.04, Threadripper 2950X, PyTorch 1.0.1.

Update: I am not sure, but it looks like a deadlock somewhere, because I couldn't catch it with a debugger.

chenjun0210 commented on August 25, 2024

I'm also having the same issue.

BerenLuthien commented on August 25, 2024

I am having the same issue.

rchavezj commented on August 25, 2024

@ArdalanM You're the MVP

anantshah200 commented on August 25, 2024

I still have the same issue. It runs fine for a few batches and then gives a floating point exception. Any other suggestions?

anantshah200 commented on August 25, 2024

@ArdalanM, it runs for about 400-500 batches and then throws a floating point exception. Had you experienced the same type of error? Any suggestions to solve it?

clived2 commented on August 25, 2024

I am having the same issue running PyTorch 1.2.0 on an Ubuntu 18.04.3 desktop: it happens every time I run a CNN script, at the point where training is invoked. ANN and RNN scripts work without any such issue. It seems that a lot of people have been having this problem for quite a while; I am amazed that it is still unresolved.

rithikreddy2k2 commented on August 25, 2024

First, uninstall PyTorch:
conda uninstall pytorch
pip uninstall torch (run this twice to confirm that it was uninstalled successfully)

Then install PyTorch afresh:
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
or check the official website for the latest version: https://pytorch.org/

This solved the issue for me!
Hope it helps.
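
One hedged way to confirm that the uninstall and the fresh install took effect, sketched with the standard library only: installed_version is a hypothetical helper name, and the sketch assumes the package registers itself under the distribution name torch. It returns the installed version string, or None when no such distribution is found, so it can be run after each step.

```python
from importlib import metadata


def installed_version(dist_name="torch"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("torch")
    if version is None:
        print("torch is not installed in this environment")
    else:
        print("torch version:", version)
```

Run it with the same Python interpreter that the conda environment uses; a different interpreter would report on a different environment.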

canlinzhang commented on August 25, 2024

I just created a new virtual environment in conda (I use Anaconda), then ran:

pip install transformers
conda install pytorch torchvision torchaudio -c pytorch

After that, re-running your code should work.
