Hi, I encountered memory allocation problem during initialization (so

Memory allocation problem about fpn HOT 7 CLOSED

xmyqsh commented on June 16, 2024

Memory allocation problem

from fpn.

Comments (7)

xmyqsh commented on June 16, 2024

@leighton613
Please pull my latest code.
FPN is memory-efficient because its features are diverse and its Fast-RCNN head is light.
Training uses only 5428 MiB for me: 1 image, shorter side 800, batch size 512, RPN batch size 256.

leighton613 commented on June 16, 2024

Thanks @xmyqsh, I haven't checked your latest update yet, but it must be very efficient.

Just for comparison, my implementation is adapted from faster-rcnn and is pretty similar to your last version.

faster-rcnn consumes 5000+ MiB
fpn w/ only p2 consumes 8600+ MiB
fpn w/ all layers consumes > 12189 MiB and fails with a ResourceExhaustedError.

Any suggestions?

xmyqsh commented on June 16, 2024

@leighton613
Your faster-rcnn run consumes 5000+ MiB, so your base model should be ResNet50, right?
Your FPN consumes another 8600+ MiB, so I think you are not sharing the base model between Fast-RCNN and FPN. Meanwhile, your RPN head is incredibly big. Note that in FPN the RPN conv's output is 256 channels, not 512 as in faster-rcnn, and I suspect yours is even larger than 512. P2's feature map, which is the RPN conv's input, is 64 times larger than P5's, whose feature map is the same size as C5's. This makes memory use very sensitive to the RPN conv's output size.
P2's large feature map also makes the number of anchors grow quickly as the number of anchor scales increases. The outputs of the box-regression layer (reg) and the box-classification layer (cls) also grow linearly with the number of anchor scales.
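
To make the scaling argument concrete, here is a rough sketch that counts feature-map cells and anchors per pyramid level. The strides 4, 8, 16, 32 for P2 to P5 follow the FPN paper; the 800x1280 input size and 3 anchors per location are assumptions chosen for illustration, not values taken from either implementation in this thread.

```python
import math

# Strides of pyramid levels P2..P5 relative to the input image (FPN paper).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def level_stats(height, width, anchors_per_location=3):
    """Return {level: (feat_h, feat_w, num_anchors)} for each pyramid level."""
    stats = {}
    for name, stride in STRIDES.items():
        feat_h = math.ceil(height / stride)
        feat_w = math.ceil(width / stride)
        stats[name] = (feat_h, feat_w, feat_h * feat_w * anchors_per_location)
    return stats


if __name__ == "__main__":
    stats = level_stats(800, 1280)  # assumed input size
    p5_cells = stats["P5"][0] * stats["P5"][1]
    for name, (h, w, n) in stats.items():
        # Ratio of this level's spatial size to P5's spatial size.
        print(f"{name}: {h}x{w} cells, {n} anchors, {h * w / p5_cells:.0f}x P5")
```

With these assumed numbers, P2 has 64 times as many cells as P5, so any per-location cost (RPN conv output channels, anchors per location) is multiplied by that factor on P2.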

your RPN conv size / faster-rcnn's RPN conv size = (256 / 2048) * 64 * (your RPN conv's output / faster-rcnn's RPN conv's output)
your reg/cls layer size / faster-rcnn's reg/cls layer size = (your RPN conv's output / faster-rcnn's RPN conv's output) * 64 * (your number of anchor scales / faster-rcnn's number of anchor scales)
and
your RPN conv size / my RPN conv size = (your RPN conv's output / ((1 + 1/4 + 1/16 + 1/64) * 256))
your reg/cls layer size / my reg/cls layer size = ((your RPN conv's output * 1 * your number of anchor scales) / (256 * (1 + 1/4 + 1/16 + 1/64) * 1))
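
The four ratios above can be evaluated directly. The sketch below is a literal transcription of them into Python; the function and parameter names are made up for illustration, and the example inputs (512 output channels and 3 anchor scales on both sides) are hypothetical, since the actual values of the other implementation are not given in this thread.

```python
# Relative spatial sizes of P2..P5 with P2 normalized to 1.
PYRAMID_SUM = 1 + 1 / 4 + 1 / 16 + 1 / 64


def ratios_vs_faster_rcnn(out_yours, out_frcnn, scales_yours, scales_frcnn):
    """P2-only RPN compared with faster-rcnn's single-level RPN."""
    conv = (256 / 2048) * 64 * (out_yours / out_frcnn)
    head = (out_yours / out_frcnn) * 64 * (scales_yours / scales_frcnn)
    return conv, head


def ratios_vs_shared_fpn(out_yours, scales_yours):
    """P2-only RPN compared with a 256-channel RPN shared over P2..P5
    that uses one anchor scale per level."""
    conv = out_yours / (PYRAMID_SUM * 256)
    head = (out_yours * 1 * scales_yours) / (256 * PYRAMID_SUM * 1)
    return conv, head


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(ratios_vs_faster_rcnn(512, 512, 3, 3))
    print(ratios_vs_shared_fpn(512, 3))
```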

In short, I think you could try decreasing your RPN conv's output and your number of anchor scales, and share the base model between FPN and Fast-RCNN.

That said, none of the above seems to be the main problem. There must be other causes, but I can't tell without seeing your source code.

leighton613 commented on June 16, 2024

Thanks @xmyqsh! That's a very thoughtful and detailed analysis. I actually use VGG as the base model for convenience, but that doesn't matter for now. The problem was resolved by changing the tf configuration to let the program use all of the GPU...
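
The exact configuration change is not stated here. As a sketch of what such a change could look like, assuming TensorFlow 1.x and assuming the session was previously capped to a fraction of GPU memory, the following builds a session config that lets the process use the whole GPU. `make_session_config` is a hypothetical helper name; `tf.ConfigProto` and its `gpu_options` fields are the TF 1.x API (under TF 2.x the same class lives at `tf.compat.v1.ConfigProto`).

```python
def make_session_config(memory_fraction=1.0, allow_growth=False):
    """Build a TF 1.x session config controlling GPU memory use.

    memory_fraction: share of each visible GPU's memory the process may use.
    allow_growth: if True, allocate GPU memory on demand instead of up front.
    """
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf  # assumes TensorFlow 1.x

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    config.gpu_options.allow_growth = allow_growth
    return config


# Usage (TF 1.x):
#     sess = tf.Session(config=make_session_config(memory_fraction=1.0))
```

If a training script hard-codes a small `per_process_gpu_memory_fraction`, a ResourceExhaustedError can appear even though the card has free memory, which would be consistent with the fix described above.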

Also, I read your updated code, and having one shared anchor_target_layer (instead of four) of course helps save some RAM ;) However, the (possible) downside is that some of the four roi-pooling layers get empty input (no proposals for that scale) and then fail with "cudaCheckError() failed". I wonder whether anyone else has encountered this (or whether I mis-implemented some part)?
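
To illustrate how a level can end up with no proposals, here is a small sketch of the level-assignment rule from the FPN paper, k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4, clamped to levels 2 to 5. The function names are stand-ins and not the code from either repository, and the three example RoIs are made up.

```python
import math
from collections import Counter


def calc_level(x1, y1, x2, y2, k0=4, canonical=224, k_min=2, k_max=5):
    """Map one RoI (x1, y1, x2, y2) to a pyramid level, clamped to [k_min, k_max]."""
    w = max(x2 - x1, 1e-6)  # guard against degenerate boxes
    h = max(y2 - y1, 1e-6)
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(max(k, k_min), k_max)


def rois_per_level(rois, levels=(2, 3, 4, 5)):
    """Count how many RoIs land on each pyramid level."""
    counts = Counter(calc_level(*roi) for roi in rois)
    return {k: counts.get(k, 0) for k in levels}


# Made-up RoIs: two small boxes and one box close to the canonical 224 size.
rois = [(0, 0, 100, 100), (10, 10, 250, 230), (0, 0, 60, 40)]
counts = rois_per_level(rois)
empty_levels = [k for k, n in counts.items() if n == 0]
print(counts, empty_levels)
```

Any level listed in `empty_levels` would feed an empty batch to its roi-pooling layer, so the op either has to tolerate zero RoIs or the caller has to skip or pad that level.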

xmyqsh commented on June 16, 2024

@leighton613
I have run into the same cudaCheckError() as you.
I just hacked roi_pooling_op_gpu at L100 and L205 to cope with it.

leighton613 commented on June 16, 2024

@xmyqsh Thanks! I'll take some time to look into the op later... For now I changed the _calc_level-related function to ensure that no level ends up with empty rois.
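
How the function was changed is not shown. One possible way to guarantee non-empty levels, given purely as an assumption, is to post-process the level assignment and move one RoI from the most populated level to each empty one. `fill_empty_levels` is a hypothetical helper, not code from this repository.

```python
from collections import Counter


def fill_empty_levels(assigned, levels=(2, 3, 4, 5)):
    """Return a copy of `assigned` in which every level owns at least one RoI.

    assigned[i] is the pyramid level of RoI i. For each empty level, the first
    RoI of the currently most populated level is reassigned to it.
    """
    if len(assigned) < len(levels):
        raise ValueError("need at least one RoI per level")
    result = list(assigned)
    counts = Counter(result)
    for level in levels:
        if counts[level] > 0:
            continue
        # With at least len(levels) RoIs and one empty level,
        # the fullest level always holds two or more RoIs.
        donor = max(levels, key=lambda k: counts[k])
        result[result.index(donor)] = level
        counts[donor] -= 1
        counts[level] += 1
    return result


print(fill_empty_levels([2, 2, 2, 4, 2]))
```

The moved RoIs are pooled from a level that does not match their size, so this trades a little accuracy on a few proposals for never calling roi-pooling with an empty input.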

jiaxiaoharry commented on June 16, 2024

@xmyqsh Hi, I've been testing your code but got the same issue "cudaCheckError() failed in ROIPoolForward: invalid device function". Could you show me more details of how to hack the roi_pooling_op_gpu L100 and L205 to cope with it? Thank you.
