
hw4's Introduction

Homework 4

Public repository and stub/testing code for Homework 4 of 10-714.

hw4's People

Contributors

arav-agarwal2, ashertrockman, zwang86, leslierice1

Stargazers

Bewuw

Watchers

Zico Kolter, Tianqi Chen

hw4's Issues

Wrong accuracy and loss returned by `one_iter_of_cifar10_training`

return correct/(y.shape[0]*niter), total_loss/(y.shape[0]*niter)

    for batch in dataloader:
        opt.reset_grad()
        X, y = batch
        X,y = ndl.Tensor(X, device=device), ndl.Tensor(y, device=device)
        out = model(X)
        correct += np.sum(np.argmax(out.numpy(), axis=1) == y.numpy())
        loss = loss_fn(out, y)
        total_loss += loss.data.numpy() * y.shape[0]
        loss.backward()
        opt.step()
        if i >= niter:
            break
        i += 1
    return correct/(y.shape[0]*niter), total_loss/(y.shape[0]*niter)

`correct/(y.shape[0]*niter), total_loss/(y.shape[0]*niter)` is only correct if every batch has the same size as the last one processed (`y` holds only the final batch at this point), which is not guaranteed for every dataset.
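A possible fix, sketched below under the same assumptions as the test (numpy imported as np, needle as ndl), is to accumulate the number of examples actually processed and divide by that instead of by `y.shape[0]*niter`:

def one_iter_of_cifar10_training(dataloader, model, niter=1, loss_fn=ndl.nn.SoftmaxLoss(), opt=None, device=None):
    np.random.seed(4)
    model.train()
    correct, total_loss, total_examples = 0, 0, 0
    i = 1
    for batch in dataloader:
        opt.reset_grad()
        X, y = batch
        X, y = ndl.Tensor(X, device=device), ndl.Tensor(y, device=device)
        out = model(X)
        correct += np.sum(np.argmax(out.numpy(), axis=1) == y.numpy())
        loss = loss_fn(out, y)
        total_loss += loss.data.numpy() * y.shape[0]
        loss.backward()
        opt.step()
        total_examples += y.shape[0]  # count the samples in this batch, whatever its size
        if i >= niter:
            break
        i += 1
    # average over the examples actually seen, not an assumed uniform batch size
    return correct / total_examples, total_loss / total_examples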

Type weirdness in ResNet9 submission code

hw4/tests/hw4/test_conv.py, lines 471 to 483 at commit 1ef8c3b:

def one_iter_of_cifar10_training(dataloader, model, niter=1, loss_fn=ndl.nn.SoftmaxLoss(), opt=None, device=None):
    np.random.seed(4)
    model.train()
    correct, total_loss = 0, 0
    i = 1
    for batch in dataloader:
        opt.reset_grad()
        X, y = batch
        X, y = ndl.Tensor(X, device=device), ndl.Tensor(y, device=device)
        out = model(X)
        correct += np.sum(np.argmax(out.numpy(), axis=1) == y.numpy())
        loss = loss_fn(out, y)
        total_loss += loss.data.numpy() * y.shape[0]

Is it intended that `correct` is a float scalar while `total_loss` is a NumPy array? They are heterogeneous, and feeding them into the `ndl.Tensor` constructor later raises an error.
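A minimal self-contained sketch of the mismatch and one way around it, assuming (as the issue suggests) that `loss.data.numpy()` yields a NumPy array while the comparison sum yields a NumPy scalar: cast both accumulators to plain Python numbers so the pair later handed to the `ndl.Tensor` constructor is homogeneous.

import numpy as np

# Stand-ins for the values produced inside the training loop (hypothetical shapes).
batch_preds = np.argmax(np.array([[0.1, 0.9], [0.8, 0.2]]), axis=1)
batch_labels = np.array([1, 1])
batch_loss = np.array(0.7, dtype=np.float32)  # stands in for loss.data.numpy()

correct, total_loss = 0, 0.0
correct += int(np.sum(batch_preds == batch_labels))       # plain Python int
total_loss += batch_loss.item() * batch_labels.shape[0]   # plain Python float

print(type(correct), type(total_loss))  # <class 'int'> <class 'float'>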

The initialization order of modules is crucial in resnet9

The initialization order of modules is crucial and should be explicitly stated.

Because of the random seed set in the test file, the order in which modules are constructed matters when building the ResNet9 network. It would be worth stating this explicitly in the homework documentation; a small sketch after the two examples below illustrates why.

For ResNet9:
Correct:

class ResNet9(ndl.nn.Module):
    def __init__(self, device=None, dtype="float32"):
        super().__init__()

        self.module = nn.Sequential(
            ConvBN(3, 16, 7, 4, device=device, dtype=dtype),
            ConvBN(16, 32, 3, 2, device=device, dtype=dtype),
            ndl.nn.Residual(nn.Sequential(
                ConvBN(32, 32, 3, 1, device=device, dtype=dtype),
                ConvBN(32, 32, 3, 1, device=device, dtype=dtype),
            )),

            ConvBN(32, 64, 3, 2, device=device, dtype=dtype),
            ConvBN(64, 128, 3, 2, device=device, dtype=dtype),
            ndl.nn.Residual(nn.Sequential(
                ConvBN(128, 128, 3, 1, device=device, dtype=dtype),
                ConvBN(128, 128, 3, 1, device=device, dtype=dtype),
            )),

            nn.Flatten(),
            nn.Linear(128, 128, device=device, dtype=dtype),
            nn.ReLU(),
            nn.Linear(128, 10, device=device, dtype=dtype),
        )

Incorrect (ResNet9; the train_cifar10 test will fail):

class ResNet9(ndl.nn.Module):
    def __init__(self, device=None, dtype="float32"):
        super().__init__()

        residual1 = ndl.nn.Residual(nn.Sequential(
            ConvBN(32, 32, 3, 1, device=device, dtype=dtype),
            ConvBN(32, 32, 3, 1, device=device, dtype=dtype),
        ))

        residual2 = ndl.nn.Residual(nn.Sequential(
            ConvBN(128, 128, 3, 1, device=device, dtype=dtype),
            ConvBN(128, 128, 3, 1, device=device, dtype=dtype),
        ))

        self.module = nn.Sequential(
            ConvBN(3, 16, 7, 4, device=device, dtype=dtype),
            ConvBN(16, 32, 3, 2, device=device, dtype=dtype),
            residual1,

            ConvBN(32, 64, 3, 2, device=device, dtype=dtype),
            ConvBN(64, 128, 3, 2, device=device, dtype=dtype),
            residual2,

            nn.Flatten(),
            nn.Linear(128, 128, device=device, dtype=dtype),
            nn.ReLU(),
            nn.Linear(128, 10, device=device, dtype=dtype),
        )
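The reason is that every module draws its initial weights from the same seeded RNG stream at construction time, so building the residual blocks before the surrounding ConvBN layers consumes the random numbers in a different order than the reference solution expects. A small needle-independent sketch of the effect (the shapes are made up):

import numpy as np

# Two "modules" whose weights are drawn from a shared, seeded RNG stream.
np.random.seed(4)
small_first = (np.random.randn(2), np.random.randn(3))  # construct the small module, then the large one

np.random.seed(4)
large_first = (np.random.randn(3), np.random.randn(2))  # construct the large module, then the small one

# The same module ends up with different weights depending on construction order.
print(np.allclose(small_first[0], large_first[1]))  # False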

It seems that the subtraction operator for tensors has an issue.

In autograd.py, tensor subtraction is implemented by defining `__sub__` and then setting `__rsub__ = __sub__`. When a tensor is subtracted from a scalar (for example `2 - t`), this clearly produces the negation of the expected result.

    def __pow__(self, other):
        if isinstance(other, Tensor):
            return needle.ops.EWisePow()(self, other)
        else:
            return needle.ops.PowerScalar(other)(self)

    def __sub__(self, other):
        if isinstance(other, Tensor):
            return needle.ops.EWiseAdd()(self, needle.ops.Negate()(other))
        else:
            return needle.ops.AddScalar(-other)(self)

    __radd__ = __add__
    __rmul__ = __mul__
    __rsub__ = __sub__
    __rmatmul__ = __matmul__
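A possible fix, sketched here as a method that would live in the Tensor class in autograd.py (not necessarily how the course staff would resolve it), is to give `__rsub__` its own definition that evaluates `other - self`:

    def __rsub__(self, other):
        # Reflected subtraction: evaluate other - self rather than self - other.
        if isinstance(other, Tensor):
            return needle.ops.EWiseAdd()(other, needle.ops.Negate()(self))
        else:
            # scalar - self == (-self) + scalar
            return needle.ops.AddScalar(other)(needle.ops.Negate()(self))

With this in place, `2 - t` evaluates to `2 - t` rather than `t - 2`, while `t - 2` still goes through `__sub__` unchanged.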
