wangkuiyi / gotorch
A Go idiomatic binding to the C++ core of PyTorch
License: MIT License
The current definition is here:
gotorch/example/resnet/resnet.go, lines 31 to 40 in 9010c80
I noticed some issues with this function.

The frequent calls to append might be expensive: each call may reallocate and copy the existing slice data when the capacity is exceeded. Instead, we can make([]int64, n) once, then copy each generated element to the right place.

This function would also crash the program if the parameter n is too large. It would be safer to add a check that panics if n is larger than a threshold.
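A minimal sketch of both suggestions; the function name, the generator callback, and the threshold are hypothetical stand-ins for the real element computation:

// makeInts illustrates the two suggestions above: preallocate with
// make instead of growing via append, and reject an oversized n
// before allocating.
const maxLen = 1 << 24 // assumed threshold

func makeInts(n int, gen func(i int) int64) []int64 {
	if n > maxLen {
		panic("n exceeds the allowed threshold")
	}
	s := make([]int64, n) // one allocation up front
	for i := range s {
		s[i] = gen(i) // copy each generated element to the right place
	}
	return s
}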
Recently, two of my PRs worked well in my local Mac development environment but failed in CI.

In #60, I had already run clang-format on the C++ code locally, but it fails the pre-commit check in CI.
$ pre-commit run -a
go fmt...................................................................Passed
go lint..................................................................Passed
validate toml........................................(no files to check)Skipped
Check files aren't using go's testing package........(no files to check)Skipped
cpplint..................................................................Passed
cppcheck.................................................................Passed
clang-format.............................................................Failed
- hook id: clang-format
- files were modified by this hook
In #64, I found that macOS and Linux treat int64_t differently: on macOS, int64_t is long long, while on Linux it is long. So I have to write (*C.longlong)(unsafe.Pointer(&stride[0])) on macOS but (*C.long)(unsafe.Pointer(&stride[0])) on Linux. This introduces conditional compilation into the Go code.
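A minimal sketch of how the conditional compilation could look, using Go's file-suffix build constraints; the file and helper names are assumptions:

// cint64_darwin.go -- built only on macOS, where int64_t is long long.
// A cint64_linux.go twin would return (*C.long)(unsafe.Pointer(&s[0])).
package gotorch

// #include <stdint.h>
import "C"
import "unsafe"

// cInt64Ptr converts a []int64 to the C pointer type that matches
// int64_t on this platform.
func cInt64Ptr(s []int64) *C.longlong {
	return (*C.longlong)(unsafe.Pointer(&s[0]))
}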
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch/cgotorch $ make -f Makefile.rpi
rm -f libtorch
ln -s rpi/libtorch libtorch
clang++ -std=c++14 \
-I .. \
-I libtorch/include \
-I libtorch/include/torch/csrc/api/include \
-L libtorch/lib \
-fPIC \
-shared \
cgotorch.cc \
-o libcgotorch.so -install_name @rpath/libcgotorch.so \
-Wl,-rpath,libtorch/lib \
-Wl,-force_load libtorch/lib/libc10.so \
-lc10 -ltorch -ltorch_cpu \
-D_GLIBCXX_USE_CXX11_ABI=1
clang: warning: argument unused during compilation: '-install_name @rpath/libcgotorch.so' [-Wunused-command-line-argument]
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch/cgotorch $ cd ..
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch $ go test -v
=== RUN TestPanicMNIST
--- PASS: TestPanicMNIST (0.00s)
=== RUN TestLogSoftmax
--- PASS: TestLogSoftmax (0.00s)
=== RUN ExampleBackward
--- FAIL: ExampleBackward (0.00s)
panic: size mismatch, m1: [17179869187 x 0], m2: [4294967300 x 0] at /home/pi/src/pytorch/aten/src/TH/generic/THTensorMath.cpp:4 [recovered]
panic: size mismatch, m1: [17179869187 x 0], m2: [4294967300 x 0] at /home/pi/src/pytorch/aten/src/TH/generic/THTensorMath.cpp:4
goroutine 1 [running]:
testing.(*InternalExample).processRunResult(0x253de90, 0x0, 0x0, 0x15c6f4, 0x0, 0x28c140, 0x2410848, 0x1)
/home/pi/usr/go/src/testing/example.go:89 +0x488
testing.runExample.func2(0x78b7046f, 0xbfc4a256, 0x7982f7, 0x0, 0x5020e0, 0x2410818, 0x24100d8, 0x241c400, 0x253de90, 0x253dea8)
/home/pi/usr/go/src/testing/run_example.go:58 +0xd4
panic(0x28c140, 0x2410848)
/home/pi/usr/go/src/runtime/panic.go:969 +0x118
github.com/wangkuiyi/gotorch.MustNil(0x2153628)
/home/pi/go/src/github.com/wangkuiyi/gotorch/tensor.go:59 +0x70
github.com/wangkuiyi/gotorch.MM(0x2410838, 0x2410820, 0x2)
/home/pi/go/src/github.com/wangkuiyi/gotorch/tensor.go:171 +0x48
github.com/wangkuiyi/gotorch_test.ExampleBackward()
/home/pi/go/src/github.com/wangkuiyi/gotorch/backward_test.go:13 +0x104
testing.runExample(0x2d0b6c, 0xf, 0x2e64a8, 0x0, 0x0, 0x0, 0x0)
/home/pi/usr/go/src/testing/run_example.go:62 +0x184
testing.runExamples(0x253df70, 0x4ca840, 0x7, 0x7, 0x101)
/home/pi/usr/go/src/testing/example.go:44 +0x104
testing.(*M).Run(0x24512c0, 0x0)
/home/pi/usr/go/src/testing/testing.go:1250 +0x1f8
main.main()
_testmain.go:62 +0x120
exit status 2
FAIL github.com/wangkuiyi/gotorch 0.325s
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch $
We need a standard MNIST example instead of mnist_test.go so that we can run it on CPU, GPU, or Pi.
NewFunctional only accepts functions of type func(torch.Tensor) torch.Tensor, so we can write the following code:

nn.NewFunctional(torch.Tanh)

However, LeakyRelu takes two input parameters:

func LeakyRelu(t Tensor, negativeSlope float64) Tensor {
return t.LeakyRelu(negativeSlope)
}

So we cannot write the following code directly:

nn.NewFunctional(torch.LeakyRelu(0.2))
Maybe we should borrow more features from functional programming languages, like currying in Haskell: torch.LeakyRelu(0.2) would return a function of type func(torch.Tensor) torch.Tensor, which would then work well with NewFunctional.
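Even in today's Go we can express the curried form; a minimal sketch inside the torch package, reusing the Tensor.LeakyRelu method shown above:

// LeakyRelu returns a closure of type func(Tensor) Tensor, so the
// result can be passed to nn.NewFunctional directly.
func LeakyRelu(negativeSlope float64) func(Tensor) Tensor {
	return func(t Tensor) Tensor {
		return t.LeakyRelu(negativeSlope)
	}
}

// Usage: nn.NewFunctional(torch.LeakyRelu(0.2))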
There is also a project maxsz/curry which provides a way to support currying in Go.
We need to configure pre-commit on this project to check the Go code.
Just as writing a program to print "Hello World" is our first exercise in coding, training a model for handwriting recognition on the MNIST database is usually the first exercise in Deep Learning.

This issue compares how various front-end languages express the training of this model: C++, Go, Python, and Go+Torch.
C++ | Go |
#include <torch/torch.h>
#include <cstddef>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
struct Net: torch::nn::Module {
Net()
: conv1(torch::nn::Conv2dOptions(1, 10, /*kernel_size=*/5)),
conv2(torch::nn::Conv2dOptions(10, 20, /*kernel_size=*/5)),
dropout1(0.25),
dropout2(0.5),
fc1(320, 50),
fc2(50, 10) {
register_module("conv1", conv1);
register_module("conv2", conv2);
register_module("dropout1", dropout1);
register_module("dropout2", dropout2);
register_module("fc1", fc1);
register_module("fc2", fc2);
}
torch::Tensor forward(torch::Tensor x) {
x = conv1->forward(x);
x = torch::relu(x);
x = conv2->forward(x);
x = torch::relu(x);
x = torch::max_pool2d(x, 2);
x = dropout1(x);
x = torch::flatten(x, 1);
x = fc1(x);
x = torch::relu(x);
x = dropout2(x);
x = fc2(x);
return torch::log_softmax(x, 1);
}
torch::nn::Conv2d conv1;
torch::nn::Conv2d conv2;
torch::nn::Dropout dropout1;
torch::nn::Dropout dropout2;
torch::nn::Linear fc1;
torch::nn::Linear fc2;
};
auto main() -> int {
Net model;
model.train();
auto sgd = torch::optim::SGD(
model.parameters(), torch::optim::SGDOptions(0.01).momentum(0.5));
sgd.zero_grad();
auto data = torch::rand({2, 3, 224, 224});
auto target = torch::randint(1, 10, {2, });
auto output = model.forward(data);
auto loss = torch::nll_loss(output, target);
loss.backward();
sgd.step();
std::printf("Loss: %.6f", loss.template item<float>());
} |
package main
import (
"fmt"

torch "github.com/wangkuiyi/gotorch"
)
type Net struct {
torch.Module
conv1 torch.Conv2d
conv2 torch.Conv2d
dropout1 torch.Dropout
dropout2 torch.Dropout
fc1 torch.Linear
fc2 torch.Linear
}
func NewNet() *Net {
n := &Net{
Module: torch.Module{},
conv1: torch.NewConv2d(1, 10, 5),
conv2: torch.NewConv2d(10, 20, 5),
dropout1: torch.NewDropout(0.25),
dropout2: torch.NewDropout(0.5),
fc1: torch.NewLinear(9216, 128),
fc2: torch.NewLinear(128, 10),
}
n.registerModule()
return n
}
func (n Net) registerModule() {
n.RegisterModule("conv1", n.conv1)
n.RegisterModule("conv2", n.conv2)
n.RegisterModule("dropout1", n.dropout1)
n.RegisterModule("dropout2", n.dropout2)
n.RegisterModule("fc1", n.fc1)
n.RegisterModule("fc2", n.fc2)
}
func (n Net) Forward(x torch.Tensor) torch.Tensor {
x = n.conv1.Forward(x)
x = torch.Relu(x)
x = n.conv2.Forward(x)
x = torch.Relu(x)
x = torch.MaxPool2d(x, 2)
x = n.dropout1.Forward(x)
x = torch.Flatten(x, 1)
x = n.fc1.Forward(x)
x = torch.Relu(x)
x = n.dropout2.Forward(x)
x = n.fc2.Forward(x)
output := torch.LogSoftMax(x, 1)
return output
}
func main() {
model := NewNet()
model.Train()
sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
sgd.ZeroGrad()
data := torch.Rand([]int64{2, 1, 28, 28})
target := torch.RandInt(1, 10, []int64{2})
output := model.Forward(data)
loss := torch.NllLoss(output, target)
loss.Backward()
sgd.Step()
fmt.Printf("Loss: %.6f\n", loss.Item())
} |
Python | Go+Torch |
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
model = Net()
model.train()
optimizer = optim.Adadelta(model.parameters(), lr=0.1)
data = torch.rand((2, 1, 28, 28))
target = torch.randint(1, 10, (2,))
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
print("Loss: {:.6f}".format(loss.item())) |
package main
import (
torch "github.com/wangkuiyi/gotorch"
)
type Net struct {
torch.Module
conv1 torch.Conv2d
conv2 torch.Conv2d
dropout1 torch.Dropout
dropout2 torch.Dropout
fc1 torch.Linear
fc2 torch.Linear
}
func NewNet() *Net {
n := &Net{
Module: torch.Module{},
conv1: torch.NewConv2d(1, 10, 5),
conv2: torch.NewConv2d(10, 20, 5),
dropout1: torch.NewDropout(0.25),
dropout2: torch.NewDropout(0.5),
fc1: torch.NewLinear(9216, 128),
fc2: torch.NewLinear(128, 10),
}
return n
}
func (n Net) Forward(x torch.Tensor) torch.Tensor {
x = n.conv1.Forward(x)
x = torch.Relu(x)
x = n.conv2.Forward(x)
x = torch.Relu(x)
x = torch.MaxPool2d(x, 2)
x = n.dropout1.Forward(x)
x = torch.Flatten(x, 1)
x = n.fc1.Forward(x)
x = torch.Relu(x)
x = n.dropout2.Forward(x)
x = n.fc2.Forward(x)
output := torch.LogSoftMax(x, 1)
return output
}
model := NewNet()
model.Train()
sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
sgd.ZeroGrad()
data := torch.Rand({2, 1, 28, 28})
target := torch.RandInt(1, 10, {2})
output := model.Forward(data)
loss := torch.NllLoss(output, target)
loss.Backward()
sgd.Step()
println("Loss:", loss.Item()) |
To process the ImageNet dataset, we need an ImageNetDataset to load ImageNet images from a .tar.gz file, plus a set of transform functions.
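Reading the archive itself needs only Go's standard library; a minimal sketch of iterating the entries of a .tar.gz file (the path and the decoding step are placeholders):

package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("train.tar.gz") // hypothetical archive path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	gz, err := gzip.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(hdr.Name) // decode the image and apply transforms here
	}
}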
Following the discussion on the torch.Module API in #28, I will complete the MNIST e2e example. Some PyTorch modules have already been ported into GoTorch, but we need some others to complete the MNIST example.

The MNIST example would look like:
import (
"fmt"

torch "github.com/wangkuiyi/gotorch"
)
type Net struct {
fc1, fc2, fc3 torch.Linear
}
func NewNet() torch.Module {
return &Net{
fc1: torch.Linear(28*28, 512, false),
fc2: torch.Linear(512, 512, false),
fc3: torch.Linear(512, 10, false),
}
}
func (n Net) Forward(x torch.Tensor) torch.Tensor {
x = torch.View(x, []int64{-1, 28 * 28})
x = n.fc1(x)
x = torch.Relu(x)
x = n.fc2(x)
x = torch.Relu(x)
return n.fc3(x)
}
func main() {
dataset := torch.NewMNIST(dataDir())
dataset.AddTransforms([]torch.Transform{
torch.NewNormalize(0.1307, 0.3081),
torch.NewStack(),
})
trainLoader := torch.NewDataLoader(dataset, 8)
net := NewNet()
criterion := torch.CrossEntropyLoss()
opt := torch.SGD(0.1, 0, 0, 0, false)
opt.AddParameters(torch.GetParameters(net))
batchIdx := 0
for trainLoader.Scan() {
batch := trainLoader.Batch()
pre := net.Forward(batch.Data)
loss := criterion(pre, batch.Target)
fmt.Println("BatchIdx: [%d], Loss: [%f]", batchIdx, loss.item())
opt.ZeroGrad()
loss.Backward()
opt.Step()
batchIdx++
}
torch.FinishGC()
opt.Close()
torch.CloseModule(net)
}
The ResNet50 model is a classic model in image classification, often used to benchmark distributed training performance, c.f. https://dawn.cs.stanford.edu/benchmark/ImageNet/train.html.

I would like to write a ResNet50 example to compare the code in Python, C++, and Go+.

Reference:
The arch is aarch64.
yi@nvidia:~/go/src/github.com/wangkuiyi/gotorch$ uname -a
Linux nvidia 4.4.38-rt49-tegra #1 SMP PREEMPT RT Tue Jul 25 09:26:02 PDT 2017 aarch64 aarch64 aarch64 GNU/Linux
I downloaded the pre-built libtorch for ARM from https://github.com/ljk53/pytorch-rpi. (That build targets 32-bit Raspbian, which would explain the "File in wrong format" linker error below on this 64-bit machine.)
yi@nvidia:~/go/src/github.com/wangkuiyi/gotorch$ cgotorch/build.sh
~/go/src/github.com/wangkuiyi/gotorch/cgotorch ~/go/src/github.com/wangkuiyi/gotorch
Building for Raspbian ...
rm -f libtorch
ln -s rpi/libtorch libtorch
g++ -std=c++14 \
-I .. \
-I libtorch/include \
-I libtorch/include/torch/csrc/api/include \
-L libtorch/lib \
-fPIC \
-shared \
optim.cc device.cc mnist_dataset.cc torch.cc pickle.cc functional.cc tensor.cc init.cc \
-O -o libcgotorch.so \
-Wl,-rpath,libtorch/lib \
-Wl,-force_load libtorch/lib/libc10.so \
-lc10 -ltorch -ltorch_cpu \
-D_GLIBCXX_USE_CXX11_ABI=1
libtorch/lib/libc10.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
Makefile:7: recipe for target 'libcgotorch.so' failed
make: *** [libcgotorch.so] Error 1
~/go/src/github.com/wangkuiyi/gotorch
We'd better add a suite of Chinese docs corresponding to our English version.
To get a ResNet50 training baseline for the loss value, I ran the resnet.py example and got the following logs:
batch: 10, loss: 10.711030, acc1: 0.000000, acc5: 0.000000
batch: 20, loss: 7.499339, acc1: 0.000000, acc5: 0.000000
batch: 30, loss: 7.281894, acc1: 0.000000, acc5: 0.000000
batch: 40, loss: 7.059255, acc1: 0.000000, acc5: 0.000000
batch: 50, loss: 7.000484, acc1: 0.000000, acc5: 0.000000
batch: 60, loss: 6.871602, acc1: 0.000000, acc5: 3.125000
batch: 70, loss: 6.962079, acc1: 0.000000, acc5: 0.000000
batch: 80, loss: 6.872428, acc1: 0.000000, acc5: 0.000000
batch: 90, loss: 6.922100, acc1: 0.000000, acc5: 0.000000
batch: 100, loss: 6.918412, acc1: 0.000000, acc5: 0.000000
batch: 110, loss: 6.880023, acc1: 0.000000, acc5: 0.000000
batch: 120, loss: 6.936709, acc1: 0.000000, acc5: 3.125000
batch: 130, loss: 6.936309, acc1: 0.000000, acc5: 0.000000
batch: 140, loss: 6.923660, acc1: 0.000000, acc5: 0.000000
batch: 150, loss: 6.924109, acc1: 0.000000, acc5: 0.000000
batch: 160, loss: 6.923644, acc1: 0.000000, acc5: 3.125000
...
=== RUN ExampleMNIST
libc++abi.dylib: terminating with uncaught exception of type c10::Error: Error opening images file at ./data/train-images-idx3-ubyte (read_images at ../torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x477ff47 in libc10.dylib)
frame #1: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 3018 (0xb2d046a in libtorch_cpu.dylib)
frame #2: MNIST + 70 (0x465c1e6 in libcgotorch.so)
frame #3: _cgo_2879cf5c9dd9_Cfunc_MNIST + 29 (0x42a0a2d in gotorch.test)
frame #4: runtime.asmcgocall + 112 (0x4067760 in gotorch.test)
It seems that we need to download the dataset and unpack it to ./data.
Monad is a programming pattern that records the output of each function call in a data structure, so that we can free them all at once afterward. It applies to many programming languages. Let us see why it is important to Go+Torch.

Go uses the pattern extensively; see https://www.innoq.com/en/blog/golang-errors-monads/ for an example.
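In the spirit of that post, a minimal errWriter-style sketch: a wrapper records the first error so later calls short-circuit, much as we could record tensors for a single cleanup later.

package main

import (
	"fmt"
	"io"
	"strings"
)

// errWriter keeps the first write error, so callers can chain writes
// without checking an error after every call.
type errWriter struct {
	w   io.Writer
	err error
}

func (ew *errWriter) write(p []byte) {
	if ew.err != nil {
		return // short-circuit after the first failure
	}
	_, ew.err = ew.w.Write(p)
}

func main() {
	var sb strings.Builder
	ew := &errWriter{w: &sb}
	ew.write([]byte("hello "))
	ew.write([]byte("world"))
	fmt.Println(sb.String(), ew.err) // "hello world <nil>"
}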
We now allocate Tensor objects using new to keep the reference count in the shared_ptr field of the C++ Tensor class:

Line 16 in 4ade9aa

Tensor objects allocated with new would cause a memory leak if we don't recycle them.
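One possible recycling hook is a finalizer; a minimal sketch, where the C handle type and free function are hypothetical stand-ins for the cgotorch API, and noting that finalizers run only at the GC's discretion, so explicit Close calls remain preferable:

package gotorch

// typedef void *Tensor;
// static void Tensor_Close(Tensor t) { /* would call into libtorch */ }
import "C"
import "runtime"

// Tensor wraps a C++ tensor held behind a C pointer.
type Tensor struct {
	T C.Tensor
}

// NewTensor attaches a finalizer so that an unreachable Tensor
// eventually releases its C++ shared_ptr.
func NewTensor(t C.Tensor) *Tensor {
	ret := &Tensor{T: t}
	runtime.SetFinalizer(ret, func(x *Tensor) { x.Close() })
	return ret
}

// Close releases the underlying C++ tensor immediately.
func (t *Tensor) Close() { C.Tensor_Close(t.T) }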
Assume that Go has a frontend API similar to the C++ one; then, following the C++ MNIST example, let's think about the following problems.

Recycle Tensors Created In The Train Loop To Avoid Memory Leak

Tensors created in the C++ train loop:

The train loop in mnist.cpp looks like:
for (auto& batch : data_loader) {
auto data = batch.data.to(device), targets = batch.target.to(device); // `data` and `targets` are `Tensor`s
optimizer.zero_grad();
auto output = model.forward(data); // `output` is a `Tensor`
auto loss = torch::nll_loss(output, targets); // `loss` is a `Tensor`
AT_ASSERT(!std::isnan(loss.template item<float>()));
loss.backward();
optimizer.step();
//...
}
We can see these Tensors have to be created:

- data and targets, as the features and labels of the dataset
- output, as the predictions on the data
- loss

Use defer to destruct Tensors in the train loop

Because data, targets, output, and loss are all stack variables, they are created and destroyed in each iteration of the C++ train loop. This implies that the libtorch framework takes ownership of the Tensors where necessary. As a result, a naive gotorch API can use defer to recycle the reference-counted Tensors. That is, the following imaginary code would work okay.
// We need this nested function to make `defer` works as expected.
func step(batch *Batch) {
// `data`, `targets`, `output`, `loss` are `Tensor`s.
data := batch.Data.To(device)
defer data.Close()
target := batch.Target.To(device)
defer target.Close()
optimizer.ZeroGrad()
output := model.Forward(data)
defer output.Close()
loss := torch.NllLoss(output, target)
defer loss.Close()
loss.Backward()
optimizer.Step()
// ...
}
for batch := range data_loader {
step(batch)
}
The defers are a bit tedious; maybe we can improve the syntax of Go+ to save typing.
Tensors created in the C++ forward method:

The forward method is called by the train loop above. In the C++ MNIST example, it looks like:
torch::Tensor forward(torch::Tensor x) {
x = torch::relu(torch::max_pool2d(conv1->forward(x), 2));
x = torch::relu(
torch::max_pool2d(conv2_drop->forward(conv2->forward(x)), 2));
x = x.view({-1, 320});
x = torch::relu(fc1->forward(x));
x = torch::dropout(x, /*p=*/0.5, /*training=*/is_training());
x = fc2->forward(x);
return torch::log_softmax(x, /*dim=*/1);
}
Use defer to destruct Tensors in the Forward function (in a tricky way)

Similar to the train loop above, x is a Tensor on the stack and is destroyed at the end of the function scope. The difference is that x is reassigned multiple times, so we cannot simply use defer x.Close() here. A workaround is requiring users to use a different idiom; for a naive example:
func (net *Net) Forward(x torch.Tensor) torch.Tensor { // The argument x is recycled in the train loop
var tensors []torch.Tensor
defer func() {
for _, t := range tensors {
t.Close()
}
}()
x = torch.Relu(torch.MaxPool2d(net.conv1.Forward(x), 2))
tensors = append(tensors, x)
x = torch.Relu(
torch.MaxPool2d(net.conv2_drop.Forward(net.conv2.Forward(x)), 2))
tensors = append(tensors, x)
x = x.View([]int64{-1, 320})
tensors = append(tensors, x)
x = torch.Relu(net.fc1.Forward(x))
tensors = append(tensors, x)
x = torch.Dropout(x, /*p=*/ 0.5, /*training=*/ net.IsTraining())
tensors = append(tensors, x)
x = net.fc2.Forward(x)
tensors = append(tensors, x)
return torch.LogSoftmax(x, /*dim=*/ 1) // The return value is recycled in the train loop
}
Obviously, this is not very elegant.
Keep Tensors in C++?

A better way is keeping the tensors array in C++ rather than in Go. For example, we can use a std::vector to record each C++ Tensor created by the Go API, and provide a torch.CleanTensors for users to call at the end of the train loop. However, this solution is harder to design properly: for example, we have to take goroutines into consideration so as to avoid corrupting the std::vector.
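For comparison, the Go side of such a registry would need the same care; a minimal sketch with assumed names, guarding the record with a mutex so concurrent goroutines cannot corrupt it:

package gotorch

import "sync"

// The registry records every tensor created since the last cleanup,
// so a train loop can free them in one call.
var (
	tensorMu    sync.Mutex
	liveTensors []Tensor
)

func recordTensor(t Tensor) {
	tensorMu.Lock()
	liveTensors = append(liveTensors, t)
	tensorMu.Unlock()
}

// CleanTensors frees all recorded tensors; users would call it at the
// end of each train-loop iteration.
func CleanTensors() {
	tensorMu.Lock()
	ts := liveTensors
	liveTensors = nil
	tensorMu.Unlock()
	for _, t := range ts {
		t.Close()
	}
}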
Few functions in libtorch have the noexcept tag, which implies that most of the C++ functions may throw exceptions. We have to expose an error return type in these functions' Go wrappers. Recall the step function above:
func step(batch *Batch) {
// `data`, `targets`, `output`, `loss` are `Tensor`s.
data := batch.Data.To(device)
defer data.Close()
// ...
}
It may become the following in production code:
func step(batch *Batch) error {
// `data`, `targets`, `output`, `loss` are `Tensor`s.
data, err := batch.Data.To(device)
if err != nil {
return ...
}
defer data.Close()
// ...
}
That is, the user should check whether there's an error on each line, which may be tedious too. Go+ has a neat syntax to unwrap errors, but I cannot think of an elegant way to solve the problem for the time being. See also the previous discussions: goplus/gop#307 (comment), goplus/gop#307 (comment)
Examples in C++ and Python are in https://github.com/pytorch/vision/tree/master/torchvision/csrc/models and https://github.com/pytorch/vision/tree/master/torchvision/models
Prerequisites:

- torch::adaptive_avg_pool2d in libtorch
- torch::max_pool2d in libtorch
- torch::nn::functional::group_norm in libtorch (the C++ version of ResNet doesn't use group_norm at all; the Python version uses nn.GroupNorm only in a conditional statement)

Go code:
x = torch.View(x, []int64{-1, 28 * 28})
Go+ code:
x = torch.View(x, {-1, 28 * 28})
Go code:
loss := F.NllLoss(pred, target, torch.Tensor{}, -100, "mean")
Go+ code:
loss := F.NllLoss(pred, target)
Go code:
for epoch := 0; epoch < epochs; epoch++ {
}
Go+ code:
for epoch <- range(epochs) {
}
We want to compare the DCGAN example with different frontend languages: Python/C++/Go/Go+
Python version: https://github.com/pytorch/examples/blob/master/dcgan/main.py
C++ version: https://github.com/pytorch/examples/blob/master/cpp/dcgan/dcgan.cpp
We will add the Go version and Go+ version later.
MobileNetV2 (Inverted Residuals and Linear Bottlenecks) is a vision model for mobile devices. PyTorch has official implementations in both Python (mobilenet.py) and C++ (mobilenet.h, mobilenet.cpp), with about the same amount of code (162 lines vs. 185 lines).

The following table compares the Go version and the Python version in terms of lines; as expected, they have about the same amount of code, too:
Go | Python |
package vision
import (
"math"
"torch"
"torch/nn"
"torch/nn/init"
)
func max(x, y int64) int64 { // math.max only works on floats
if x > y {
return x
}
return y
}
func makeDivisible(value float64, divisor int64, minValue *int64) int64 {
if minValue == nil {
minValue = &divisor
}
newValue := max(*minValue, (int64(value+float64(divisor)/2)/divisor)*divisor)
if float64(newValue) < 0.9*value {
newValue += divisor
}
return newValue
}
type ConvBNReLU struct {
nn.Sequential
}
func NewConvBNReLU(in_planes, out_planes, kernel_size, stride, groups int64) ConvBNReLU {
ret := ConvBNReLU{nn.NewSequential()}
padding := (kernel_size - 1) / 2
options := nn.Conv2dOptions{in_planes, out_planes, kernel_size}
ret.PushBack(nn.NewConv2d(
options.stride(stride).padding(padding).groups(groups).bias(false)))
ret.PushBack(nn.BatchNorm2d{out_planes})
ret.PushBack(nn.Functional{nn.ReLU})
return ret
}
func (net *ConvBNReLU) Forward(x torch.Tensor) torch.Tensor {
return net.Sequential.Forward(x)
}
type MobileNetInvertedResidual struct {
nn.Module
stride int64
useResConnect bool
conv nn.Sequential
}
func NewMobileNetInvertedResidual(
input, output, stride int64, expandRatio float64) MobileNetInvertedResidual {
net := MobileNetInvertedResidual{
Module: nn.NewModule(),
stride: stride,
useResConnect: stride == 1 && input == output,
conv: nn.NewSequential()}
doubleCompare := func(a, b float64) bool {
return math.Abs(a-b) < 1e-20
}
torch.CHECK(stride == 1 || stride == 2)
hiddenDim := int64(math.Round(float64(input) * expandRatio))
if !doubleCompare(expandRatio, 1) {
net.conv.PushBack(NewConvBNReLU(input, hiddenDim, 1, 1, 1))
}
net.conv.PushBack(NewConvBNReLU(hiddenDim, hiddenDim, 3, stride, hiddenDim))
options := nn.Conv2dOptions{hiddenDim, output, 1}
net.conv.PushBack(nn.NewConv2d(options.stride(1).padding(0).bias(false)))
net.RegisterModule("conv", net.conv)
return net
}
func (net *MobileNetInvertedResidual) Forward(x torch.Tensor) torch.Tensor {
if net.useResConnect {
return x.Add(net.conv.Forward(x))
}
return net.conv.Forward(x)
}
type MobileNetV2 struct {
nn.Module // nn.Module is a monadic type
lastChannel int64
features, classifier nn.Sequential
}
func NewMobileNetV2(
numClasses int64,
widthMult float64,
invertedResidualSettings [][]int64,
roundNearest int64) MobileNetV2 {
net := MobileNetV2{
Module: nn.NewModule(),
features: nn.NewSequential(),
classifier: nn.NewSequential()}
var inputChannel int64 = 32
var lastChannel int64 = 1280
if invertedResidualSettings == nil || len(invertedResidualSettings) == 0 {
invertedResidualSettings = [][]int64{
// t, c, n, s
{1, 16, 1, 1},
{6, 24, 2, 2},
{6, 32, 3, 2},
{6, 64, 4, 2},
{6, 96, 3, 1},
{6, 160, 3, 2},
{6, 320, 1, 1},
}
}
torch.CHECK(
len(invertedResidualSettings[0]) == 4,
"inverted_residual_settings should contain 4-element vectors")
inputChannel = makeDivisible(float64(inputChannel)*widthMult, roundNearest, nil)
net.lastChannel =
makeDivisible(float64(lastChannel)*math.Max(1.0, widthMult), roundNearest, nil)
net.features.PushBack(NewConvBNReLU(3, inputChannel, 3, 2, 1))
for _, setting := range invertedResidualSettings {
outputChannel := makeDivisible(float64(setting[1])*widthMult, roundNearest, nil)
for i := int64(0); i < setting[2]; i++ {
stride := int64(1)
if i == 0 {
stride = setting[3]
}
net.features.PushBack(
NewMobileNetInvertedResidual(
inputChannel, outputChannel, stride, float64(setting[0])))
inputChannel = outputChannel
}
}
net.features.PushBack(NewConvBNReLU(inputChannel, net.lastChannel, 1, 1, 1))
net.classifier.PushBack(nn.Dropout(0.2))
net.classifier.PushBack(nn.Linear(net.lastChannel, numClasses))
net.RegisterModule("features", net.features)
net.RegisterModule("classifier", net.classifier)
for _, module := range net.Modules(false) {
switch M := module.(type) {
case nn.Conv2d:
init.KaimingNormal(M.Weight, 0, torch.kFanOut)
if M.options.Bias {
init.Zeros(M.Bias)
}
case nn.BatchNorm2d:
init.Ones(M.Weight)
init.Zeros(M.Bias)
case nn.Linear:
init.Normal(M.Weight, 0, 0.01)
init.Zeros(M.Bias)
}
}
return net
}
func (net *MobileNetV2) Forward(x torch.Tensor) torch.Tensor {
x = net.features.Forward(x)
x = x.Mean([]int64{2, 3})
x = net.classifier.Forward(x)
return x
} |
from torch import nn
def _make_divisible(v, divisor, min_value=None):
"""
This function is taken from the original tf repo.
It ensures that all layers have a channel number that is divisible by 8
It can be seen here:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNReLU(nn.Sequential):
def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None):
padding = (kernel_size - 1) // 2
if norm_layer is None:
norm_layer = nn.BatchNorm2d
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
norm_layer(out_planes),
nn.ReLU6(inplace=True)
)
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
if norm_layer is None:
norm_layer = nn.BatchNorm2d
hidden_dim = int(round(inp * expand_ratio))
self.use_res_connect = self.stride == 1 and inp == oup
layers = []
if expand_ratio != 1:
# pw
layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer))
layers.extend([
# dw
ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
norm_layer(oup),
])
self.conv = nn.Sequential(*layers)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class MobileNetV2(nn.Module):
def __init__(self,
num_classes=1000,
width_mult=1.0,
inverted_residual_setting=None,
round_nearest=8,
block=None,
norm_layer=None):
"""
MobileNet V2 main class
Args:
num_classes (int): Number of classes
width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
inverted_residual_setting: Network structure
round_nearest (int): Round the number of channels in each layer to be a multiple of this number
Set to 1 to turn off rounding
block: Module specifying inverted residual building block for mobilenet
norm_layer: Module specifying the normalization layer to use
"""
super(MobileNetV2, self).__init__()
if block is None:
block = InvertedResidual
if norm_layer is None:
norm_layer = nn.BatchNorm2d
input_channel = 32
last_channel = 1280
if inverted_residual_setting is None:
inverted_residual_setting = [
# t, c, n, s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
# only check the first element, assuming user knows t,c,n,s are required
if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
raise ValueError("inverted_residual_setting should be non-empty "
"or a 4-element list, got {}".format(inverted_residual_setting))
# building first layer
input_channel = _make_divisible(input_channel * width_mult, round_nearest)
self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
features = [ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)]
# building inverted residual blocks
for t, c, n, s in inverted_residual_setting:
output_channel = _make_divisible(c * width_mult, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
input_channel = output_channel
# building last several layers
features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1, norm_layer=norm_layer))
# make it nn.Sequential
self.features = nn.Sequential(*features)
# building classifier
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(self.last_channel, num_classes),
)
# weight initialization
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out')
if m.bias is not None:
nn.init.zeros_(m.bias)
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.zeros_(m.bias)
def _forward_impl(self, x):
# This exists since TorchScript doesn't support inheritance, so the superclass method
# (this one) needs to have a name other than `forward` that can be accessed in a subclass
x = self.features(x)
# Cannot use "squeeze" as batch-size can be 1 => must use reshape with x.shape[0]
x = nn.functional.adaptive_avg_pool2d(x, 1).reshape(x.shape[0], -1)
x = self.classifier(x)
return x
def forward(self, x):
return self._forward_impl(x) |
174 lines | 162 lines |
The C++ version has 185 lines because it strictly follows the 80-character line length limit. The amounts of code in C++, Go, and Python are comparable.
I used the ToTensor transform to read the same image in GoTorch and PyTorch.

The last three tensor values in GoTorch:

0.0788 0.0936 0.0936

In PyTorch:

0.0784, 0.0941, 0.0941

There is a small difference.
With the PyTorch API, users can easily specify the runtime device, as in the following code:

device = torch.device("cuda") # create a CUDA device instance
x = torch.randn((2,3)).to(device) # assign Tensor memory to CUDA device
net = MNISTNet()
net.to(device) # assign module parameters to CUDA device
In GoTorch, we would like to provide the same APIs, Tensor.To and Module.To, so that users can train or run inference with a model on various devices.
Device := torch.NewDevice("cuda") // create a CUDA device instance
x := torch.RandN([]int64{2,3}, false).To(device) // assign Tensor x memory to CUDA device
net = NewMNISTNet()
net.To(device) // assign parameters memory to CUDA device
For the Tensor.To API, we just port device and the Tensor::to function to Go. For the Module.To API, we should assign the parameters to the device recursively, c.f. https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/module.h#L676
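A minimal sketch of the recursive move, with hypothetical Parameters and Children accessors on Module; the SetData call mutates each tensor in place so it stays a leaf:

// To moves this module's own parameters to the device, then recurses
// into sub-modules, mirroring the C++ Module::to in module.h.
func (m *Module) To(device Device) {
	for _, p := range m.Parameters() { // direct parameters of this module
		p.SetData(p.To(device, p.Dtype()))
	}
	for _, sub := range m.Children() { // recurse into sub-modules
		sub.To(device)
	}
}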
TODO list:

- Port device to Go to implement the Tensor.To API.
- Add Dockerfile.gpu and Makefile.gpu to build GoTorch with CUDA.
- Implement the Module.To API.

From the definition below, I understand that the only purpose for user-defined modules to call Module.Init in their newers is to let each sub-module know about its parent.
Lines 46 to 64 in 047d424
Is this purpose due to the requirement that, when the user calls a module's To or ZeroGrad method, we can trace up to the top ancestor of the sub-module hierarchy and make sure that all modules in the hierarchy move to the specified device or have all parameter gradients cleared?
If this is the reasoning behind Module.Init, I am afraid that the implementations of To and ZeroGrad do not trace up to the root; instead, I see them simply call m.outer.
Lines 97 to 101 in 047d424
=== RUN ExampleBackward
fatal error: checkptr: unsafe pointer arithmetic
goroutine 1 [running]:
runtime.throw(0x4504143, 0x23)
/usr/local/Cellar/go/1.14.6/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc0001499b0 sp=0xc000149980 pc=0x4036842
runtime.checkptrArithmetic(0xc0000100a0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.14.6/libexec/src/runtime/checkptr.go:43 +0xb5 fp=0xc0001499e0 sp=0xc0001499b0 pc=0x4008c15
github.com/wangkuiyi/gotorch.Optimizer.AddParameters.func1(0xc000010098, 0xc0000100a0, 0xc000149a98)
/Users/yi/go/src/github.com/wangkuiyi/gotorch/optim.go:45 +0x70 fp=0xc000149a28 sp=0xc0001499e0 pc=0x41c6150
github.com/wangkuiyi/gotorch.Optimizer.AddParameters(0xc000010098, 0xc000149af0, 0x1, 0x1)
/Users/yi/go/src/github.com/wangkuiyi/gotorch/optim.go:45 +0x1d4 fp=0xc000149ac0 sp=0xc000149a28 pc=0x41c4c94
github.com/wangkuiyi/gotorch_test.ExampleBackward()
/Users/yi/go/src/github.com/wangkuiyi/gotorch/backward_test.go:10 +0x110 fp=0xc000149b28 sp=0xc000149ac0 pc=0x44128a0
testing.runExample(0x44fc6be, 0xf, 0x4511ed0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:62 +0x275 fp=0xc000149c68 sp=0xc000149b28 pc=0x4149eb5
testing.runExamples(0xc000149ed8, 0x47e60e0, 0x7, 0x7, 0x101)
/usr/local/Cellar/go/1.14.6/libexec/src/testing/example.go:44 +0x212 fp=0xc000149d68 sp=0xc000149c68 pc=0x4147c52
testing.(*M).Run(0xc000140080, 0x0)
/usr/local/Cellar/go/1.14.6/libexec/src/testing/testing.go:1250 +0x4f4 fp=0xc000149f00 sp=0xc000149d68 pc=0x414ed84
main.main()
_testmain.go:62 +0x224 fp=0xc000149f88 sp=0xc000149f00 pc=0x4414f54
runtime.main()
/usr/local/Cellar/go/1.14.6/libexec/src/runtime/proc.go:203 +0x1fa fp=0xc000149fe0 sp=0xc000149f88 pc=0x4038eaa
runtime.goexit()
/usr/local/Cellar/go/1.14.6/libexec/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000149fe8 sp=0xc000149fe0 pc=0x406a431
goroutine 9 [runnable]:
testing.runExample.func1(0xc000010080, 0xc00005a540)
/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:35
created by testing.runExample
/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:35 +0x1c7
FAIL github.com/wangkuiyi/gotorch 0.391s
An alternative to having gotorch/aten and gotorch/torch is to have only gotorch/, which includes files like tensor.go, optim/adam.go, and optim/sgd.go. I am afraid that in the future we are going to use the XLA backend of PyTorch, https://github.com/pytorch/xla, and there might be a vague boundary between aten and torch.
The definition is here:

Lines 21 to 27 in 66df6a4

After converting the C pointer err into a Go string msg, we free err. Is msg still a valid Go object with its underlying C pointer freed?
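For reference, it is: C.GoString copies the bytes into Go-managed memory, so the Go string stays valid after the C buffer is freed. A minimal standalone sketch:

package main

// #include <stdlib.h>
import "C"
import (
	"fmt"
	"unsafe"
)

func main() {
	cs := C.CString("some error message") // allocated on the C heap
	msg := C.GoString(cs)                 // copies the bytes into a Go string
	C.free(unsafe.Pointer(cs))            // msg no longer references cs
	fmt.Println(msg)                      // safe: prints "some error message"
}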
PyTorch provides the Dataset and DataLoader APIs to load and feed data for training models, c.f. https://pytorch.org/tutorials/advanced/cpp_frontend.html#loading-data. We would like to provide the same in GoTorch.
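A minimal sketch of what the Go-side loop could look like, reusing the Scan/Batch style proposed in the MNIST example above (all names are from that proposal, not an existing API):

func trainOneEpoch(dataDir string) {
	dataset := torch.NewMNIST(dataDir)
	loader := torch.NewDataLoader(dataset, 64) // batch size 64
	for loader.Scan() {
		batch := loader.Batch()
		_ = batch.Data   // features tensor
		_ = batch.Target // labels tensor
	}
}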
As described in https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference, it seems that we need to:

- Rename Module.GetNamedParameters into Module.StateDict.
- Add Module.LoadStateDict.
We are trying to compare the GPU memory consumption between GoTorch and PyTorch with the ResNet50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet. The GPU card is a P100 with 16 GB of memory.

Experiment 1:

The following is the result, measured with the nvidia-smi command.
| | Only Forward | Forward and Backward |
| --- | --- | --- |
| PyTorch | 3719 MiB | 2545 MiB |
| GoTorch | 2447 MiB | 2767 MiB |
We removed three lines of code in the Only Forward scenario:
# optimizer.zero_grad()
# loss.backward()
# optimizer.step()
Experiment 2:

GPU memory with different batch sizes:

| Batch Size | 16 | 128 | 160 |
| --- | --- | --- | --- |
| PyTorch | 2545 MiB | 13161 MiB | 15295 MiB |
| GoTorch | 2767 MiB | 14755 MiB | OOM |
When I run the dcgan example on a GPU, I get the following error message:
terminate called after throwing an instance of 'c10::Error'
what(): can't optimize a non-leaf Tensor
Exception raised from add_param_group at ../torch/csrc/api/src/optim/optimizer.cpp:80 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f52653a7eb9 in /go/src/github.com/wangkuiyi/gotorch/cgotorch/libtorch/lib/libc10.so)
I found the same issue in the PyTorch community: https://discuss.pytorch.org/t/tensor-to-device-changes-is-leaf-causing-cant-optimize-a-non-leaf-tensor/37659
test = torch.zeros((10,10)).requires_grad_(True)
print(test.is_leaf) # True
test = test.to(data.device)
print(test.is_leaf) # False
The To operation returns a new tensor, so test becomes a non-leaf tensor. We should keep the original reference: instead of calling v.Set(reflect.ValueOf(t.To(device, t.Dtype()))), we should call t.SetData(t.To(device, t.Dtype())) in the Go code.
LibTorch MNIST example got the loss 0.0269 after 5 epochs:
Epoch 0, Loss: 0.1280
Epoch 1, Loss: 0.0659
Epoch 2, Loss: 0.0396
Epoch 3, Loss: 0.0304
Epoch 4, Loss: 0.0269
GoTorch MNIST example got the loss 1.4148 after 5 epochs:
2020/08/12 22:37:41 Epoch: 0, Loss: 4.8264
2020/08/12 22:37:46 Epoch: 1, Loss: 5.9624
2020/08/12 22:37:52 Epoch: 2, Loss: 2.4493
2020/08/12 22:37:58 Epoch: 3, Loss: 0.9619
2020/08/12 22:38:04 Epoch: 4, Loss: 1.4148
I noticed the failure because TravisCI failed on #191. TravisCI is configured to run on a macOS VM.
=== RUN TestTensorString
TestTensorString: tensor_test.go:84:
Error Trace: tensor_test.go:84
Error: Not equal:
expected: " 1.0000 1.1000 1.2000\n 2.0000 3.0000 4.0000\n[ CPUFloatType{2,3} ]"
actual : " 0.0141 2.0000 512.0001\n 0.0000 0.0000 0.0000\n[ CPUDoubleType{2,3} ]"
Diff:
--- Expected
+++ Actual
@@ -1,3 +1,3 @@
- 1.0000 1.1000 1.2000
- 2.0000 3.0000 4.0000
-[ CPUFloatType{2,3} ]
+ 0.0141 2.0000 512.0001
+ 0.0000 0.0000 0.0000
+[ CPUDoubleType{2,3} ]
Test: TestTensorString
As the title describes, the crash logs are as follows:
=== RUN TestExampleMNIST
libc++abi.dylib: terminating with uncaught exception of type c10::Error: Error opening images file at ./unsdsdf/train-images-idx3-ubyte (read_images at ../torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x4759f47 in libc10.dylib)
frame #1: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 3018 (0xc8ab46a in libtorch_cpu.dylib)
frame #2: MNIST + 73 (0x4630859 in libcgotorch.so)
frame #3: _cgo_e3c33f78a9c2_Cfunc_MNIST + 29 (0x42920dd in gotorch.test)
frame #4: runtime.asmcgocall + 112 (0x405efa0 in gotorch.test)
SIGABRT: abort
PC=0x7fff6a37533a m=0 sigcode=0
goroutine 0 [idle]:
runtime: unknown pc 0x7fff6a37533a
stack: frame={sp:0x7ffeefbfec88, fp:0x0} stack=[0x7ffeefb80718,0x7ffeefbff780)
00007ffeefbfeb88: 00007ffeefbff1a0 00007ffeefbfebb0
00007ffeefbfeb98: 0000000009900acb 0000000000000000
00007ffeefbfeba8: 0000000000000041 00007ffeefbff130
00007ffeefbfebb8: 00007ffeefbfebf0 000000000000037f
00007ffeefbfebc8: 0000000000000000 0000000032aaaba2
00007ffeefbfebd8: 0000000000000000 0000000000000000
00007ffeefbfebe8: 00007ffeefbfed20 0000000000000000
....
The command go test -v in the container complains that it cannot find symbols, including those for the MNIST dataset. It is weird that go test -v works on macOS.
root@a483ce0b3e5d:/go/src/github.com/wangkuiyi/gotorch# go test -v
# github.com/wangkuiyi/gotorch
./cgotorch/libcgotorch.so: undefined reference to `torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode)'
./cgotorch/libcgotorch.so: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
./cgotorch/libcgotorch.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
FAIL github.com/wangkuiyi/gotorch [build failed]
Anyway, we can merge this PR and fix the problem in future PRs.
Originally posted by @wangkuiyi in #58 (comment)
By grepping the official DCGAN example program, we see the following modules need to be ported before we can run DCGAN with GoTorch.
$ curl -Ls https://raw.githubusercontent.com/pytorch/examples/master/dcgan/main.py | grep 'nn\.'
import torch.nn.parallel
cudnn.benchmark = True
torch.nn.init.normal_(m.weight, 0.0, 0.02)
torch.nn.init.normal_(m.weight, 1.0, 0.02)
torch.nn.init.zeros_(m.bias)
class Generator(nn.Module):
self.main = nn.Sequential(
nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False),
nn.BatchNorm2d(ngf * 8),
nn.ReLU(True),
nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
nn.BatchNorm2d(ngf * 4),
nn.ReLU(True),
nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
nn.BatchNorm2d(ngf * 2),
nn.ReLU(True),
nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
nn.BatchNorm2d(ngf),
nn.ReLU(True),
nn.ConvTranspose2d( ngf, nc, 4, 2, 1, bias=False),
nn.Tanh()
output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
class Discriminator(nn.Module):
self.main = nn.Sequential(
nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
nn.LeakyReLU(0.2, inplace=True),
nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
nn.BatchNorm2d(ndf * 2),
nn.LeakyReLU(0.2, inplace=True),
nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
nn.BatchNorm2d(ndf * 4),
nn.LeakyReLU(0.2, inplace=True),
nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
nn.BatchNorm2d(ndf * 8),
nn.LeakyReLU(0.2, inplace=True),
nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
nn.Sigmoid()
output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
criterion = nn.BCELoss()
There are three levels of abstraction:

Level 1: native functions are a low-level API containing many basic mathematical operations.

Level 2: nn.functional is a middle-level API, closer to deep learning. It composes basic mathematical operations into complex neural network operations.

Level 3: nn.module is a high-level API. A module contains state, such as parameters and buffers. It's a C++ class.
Let's take the padding operator as an example:

- Among the native functions, there is a cat function.
- In nn.functional, there is a pad function, which calls the cat function in level 1.
- In nn.module, there are the ZeroPad2d and ReplicationPad3d classes, which call the pad function in level 2.
| | expose to Go | API | contains state |
| --- | --- | --- | --- |
| native function | C++ function, easy | low-level API, flexible, few users may use it | No |
| nn.functional | C++ function, easy | middle-level API, most users use it | No |
| nn.module | C++ class, hard | high-level API, most users use it | Yes, parameters and buffers |
There is another interesting thing: nn.functional will try to fuse some basic native functions. Here is an example: nn.functional.linear.

I am wondering which levels of abstraction in C++ I am supposed to expose to Go.
The PyTorch API has a key concept: torch.nn.Module. Many built-in and user-defined models are classes derived from torch.nn.Module. The only method to override is forward(x).

Usually, a torch.nn.Module-derived class has data members representing the model parameters. For example, nn.Linear, the PyTorch implementation of the fully-connected layer, has W and B: the weights and the bias, respectively.

In Go/Go+, the concept that corresponds to a Python base class is an interface, so we provide type Module interface to mimic torch.nn.Module.
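A minimal sketch of that interface, with torch.Tensor assumed:

package nn

import torch "github.com/wangkuiyi/gotorch"

// Module mimics torch.nn.Module: the only method a user-defined model
// must provide is Forward, just as in PyTorch.
type Module interface {
	Forward(x torch.Tensor) torch.Tensor
}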
Then, we need a solution to free up tensors when a model's life is over.
gotorch/example/dcgan/visualize_pickle.py, line 39 in f875d63: what does the range(64) mean?
After #98, GoTorch builds and runs on Raspbian 10, but the test ExampleTrainMNIST takes forever.