Comments (9)
Thanks for the report. However, I had a quick look and see 2-3 things that may cause memory to blow up:
- In your func `getTensor`, the last operation should be `tensor = tensor.MustUnsqueeze(0, true)`. The `true` flag deletes the existing tensor before assigning the new one; otherwise there is a leak here.
- In the goroutine for loop, you run `net.Forward(tensor)`, which returns a tensor. That tensor should be deleted after being used by `log.Println()`, otherwise it leaks as well.
- When doing `forward` in inference mode, you should wrap it inside `ts.NoGrad()`; otherwise autograd state will build up (not really a memory leak, but hidden tensors).
Please try those things and see how it goes. Thanks.
from gotch.
Thank you for your response. I have adjusted my code according to your suggestions, but the memory usage still keeps increasing. Below is my latest code:
package main
import (
"encoding/json"
"os"
"time"
"github.com/sugarme/gotch"
"github.com/sugarme/gotch/nn"
"github.com/sugarme/gotch/pickle"
"github.com/sugarme/gotch/ts"
"github.com/sugarme/gotch/vision"
)
func getModel() (net nn.FuncT) {
modelName := "resnet18"
url, ok := gotch.ModelUrls[modelName]
if !ok {
panic("Unsupported model name")
}
modelFile, err := gotch.CachedPath(url)
if err != nil {
panic(err)
}
vs := nn.NewVarStore(gotch.CPU)
net = vision.ResNet18NoFinalLayer(vs.Root())
err = pickle.LoadAll(vs, modelFile)
if err != nil {
panic(err)
}
return
}
func getTensor() (tensor *ts.Tensor) {
b, err := os.ReadFile("test.data")
if err != nil {
panic(err)
}
var data []float32
err = json.Unmarshal(b, &data)
if err != nil {
panic(err)
}
tensor = ts.MustOfSlice(data).MustView([]int64{3, 224, 224}, true)
tensor = tensor.MustUnsqueeze(0, true)
return
}
func main() {
net := getModel()
tensor := getTensor()
defer tensor.MustDrop()
var goroutineNum = 10
for i := 0; i < goroutineNum; i++ {
go func(net nn.FuncT) {
for {
ts.NoGrad(func() {
result := net.ForwardT(tensor, false)
result.MustDrop()
})
}
}(net)
}
time.Sleep(5 * time.Minute)
}
from gotch.
When calling the model in multiple goroutines, a lot of warning messages appear, as follows:
2023/10/30 11:54:50 WARNING: Probably double free tensor "Conv2d_000235087". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "BatchNorm_000235091". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "Relu_000235100". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "Relu_000235098". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "BatchNorm_000235215". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "Relu_000235245". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "Relu_000235395". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "Conv2d_000235566". Called from "ts.Drop()". Just skipping...
2023/10/30 11:54:50 WARNING: Probably double free tensor "Relu_000235609". Called from "ts.Drop()". Just skipping...
from gotch.
Probably you should create a model for each goroutine then. Actually, I have never tried concurrency on a single model like that. I guess there will be a lot of data collisions, since all goroutines feed into one model.
from gotch.
I created a model for each goroutine and used the corresponding model when calling within that goroutine, but there are still issues.
package main
import (
"encoding/json"
"os"
"time"
"github.com/sugarme/gotch"
"github.com/sugarme/gotch/nn"
"github.com/sugarme/gotch/pickle"
"github.com/sugarme/gotch/ts"
"github.com/sugarme/gotch/vision"
)
func getModel() (net nn.FuncT) {
modelName := "resnet18"
url, ok := gotch.ModelUrls[modelName]
if !ok {
panic("Unsupported model name")
}
modelFile, err := gotch.CachedPath(url)
if err != nil {
panic(err)
}
vs := nn.NewVarStore(gotch.CPU)
net = vision.ResNet18NoFinalLayer(vs.Root())
err = pickle.LoadAll(vs, modelFile)
if err != nil {
panic(err)
}
return
}
func getTensor() (tensor *ts.Tensor) {
b, err := os.ReadFile("test.data")
if err != nil {
panic(err)
}
var data []float32
err = json.Unmarshal(b, &data)
if err != nil {
panic(err)
}
tensor = ts.MustOfSlice(data).MustView([]int64{3, 224, 224}, true)
tensor = tensor.MustUnsqueeze(0, true)
return
}
func main() {
var goroutineNum = 10
var nets []nn.FuncT
for i := 0; i < goroutineNum; i++ {
nets = append(nets, getModel())
}
tensor := getTensor()
defer tensor.MustDrop()
for i := 0; i < goroutineNum; i++ {
net := nets[i]
go func(net nn.FuncT) {
for {
ts.NoGrad(func() {
result := net.ForwardT(tensor, false)
result.MustDrop()
})
}
}(net)
}
time.Sleep(5 * time.Minute)
}
from gotch.
I will try to reproduce your problem when I have time this week. However, your latest `go func()` should not take an input parameter then.
What about something like this:
for i := 0; i < goroutineNum; i++ {
go func() {
net := getModel()
tensor := getTensor()
ts.NoGrad(func() {
result := net.ForwardT(tensor, false)
result.MustDrop()
})
tensor.MustDrop()
}()
}
from gotch.
The memory usage still keeps increasing. The key code is as follows:
for i := 0; i < goroutineNum; i++ {
go func() {
// goroutine model
net := getModel()
// test input tensor
tensor := getTensor()
defer tensor.MustDrop()
// stress test to observe memory increase
for {
ts.NoGrad(func() {
result := net.ForwardT(tensor, false)
// drop result tensor
result.MustDrop()
})
}
}()
}
from gotch.
I understand now: I seem to have found a bug in tensor.go that causes some tensors not to be released.
This is the old code:
atomic.AddInt64(&TensorCount, 1)
nbytes := x.nbytes()
atomic.AddInt64(&AllocatedMem, nbytes)
lock.Lock()
if _, ok := ExistingTensors[name]; ok {
name = fmt.Sprintf("%s_%09d", name, TensorCount)
}
ExistingTensors[name] = struct{}{}
lock.Unlock()
changed to:
tensorCount := atomic.AddInt64(&TensorCount, 1)
nbytes := x.nbytes()
atomic.AddInt64(&AllocatedMem, nbytes)
lock.Lock()
if _, ok := ExistingTensors[name]; ok {
name = fmt.Sprintf("%s_%09d", name, tensorCount)
}
ExistingTensors[name] = struct{}{}
lock.Unlock()
I just realized that you had made a fix for this issue last week, but I wasn't using your latest code. The problem is resolved now; this issue can be closed.
from gotch.
Thanks for reporting.
from gotch.