cfzd / fcanet
FcaNet: Frequency Channel Attention Networks
License: MIT License
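My input has shape (64, 48, 48). If I keep the original c2wh = dict([(64,56), (128,28), (256,14), (512,7)]), adaptive_avg_pool2d() resizes 48 to 56 (upsampling a small map to a larger one). Will this hurt performance, and what is a better way to handle it? Any advice would be appreciated.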
Hi, I noticed that your paper reports FLOPs for the FcaNet models.
How do you compute the FLOPs of the 2D DCT? Could you provide your formula or code?
Thanks!
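As a rough estimate only, and not necessarily the formula used in the paper: if the DCT is applied as an elementwise product with a precomputed filter followed by a spatial sum (as in the call method of the ported layer quoted later on this page), the cost per sample can be counted directly. The function name dct_pooling_ops is hypothetical.

```python
def dct_pooling_ops(channels, height, width):
    """Count operations for y[c] = sum over (h, w) of x[c, h, w] * weight[c, h, w].

    Assumption: the DCT filter is precomputed, so no cosine is evaluated at run time.
    """
    multiplies = channels * height * width          # one product per element
    additions = channels * (height * width - 1)     # summing H*W products per channel
    return {"multiplies": multiplies, "additions": additions}


# Example: the last ResNet stage in c2wh, 512 channels at 7x7.
ops = dct_pooling_ops(512, 7, 7)
print(ops)
```

Under this counting the layer adds one multiplication per element on top of what global average pooling already needs for its sum.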
This is great work!
By the way, where can I find the supplementary file?
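Hello, how should FCA be modified for use in a 3D network?
Can this be used to directly extract the low-frequency components of an image, and would that work better than a learnable DCT?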
Thanks for your work!
Have you tried training FcaNet on CIFAR-10 or CIFAR-100 classification? If so, which frequency component settings did you use?
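Hello, I have read your paper, which provides the proof for the 2D case.
Can 1D global average pooling (GAP) likewise be regarded as a special case of the 1D discrete cosine transform (DCT)?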
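A short derivation may help with the 1D question. It is a sketch using the unnormalized 1D DCT-II; the symbols L (signal length), x_i (samples), and f_k (the k-th frequency component) are notation assumed here, not taken from the paper.

```latex
\begin{align}
f_k &= \sum_{i=0}^{L-1} x_i \cos\!\left(\frac{\pi k}{L}\left(i + \frac{1}{2}\right)\right),
      \qquad k = 0, 1, \dots, L-1, \\
f_0 &= \sum_{i=0}^{L-1} x_i \cos(0)
     = \sum_{i=0}^{L-1} x_i
     = L \cdot \frac{1}{L}\sum_{i=0}^{L-1} x_i
     = L \cdot \mathrm{GAP}(x).
\end{align}
```

So, under this definition, 1D GAP is proportional to the lowest-frequency (k = 0) component of the 1D DCT, in the same way as the 2D case.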
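Hello, I really like your idea, so I tried running your classification model.
After downloading the ImageNet2012 dataset I tried to launch your model and ran into the following problem. Could you tell me whether any of my settings are wrong?
The error message is: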
Traceback (most recent call last):
  File "main.py", line 643, in <module>
    main()
  File "main.py", line 389, in main
    avg_train_time = train(train_loader, model, criterion, optimizer, epoch, logger, scheduler)
  File "main.py", line 471, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
  File "main.py", line 631, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
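This error usually comes from calling .view on a non-contiguous tensor, and the message itself suggests .reshape. Below is a minimal sketch of that fix. It assumes accuracy in main.py follows the common PyTorch ImageNet example; only the correct_k line is confirmed by the traceback, and the rest is reconstructed for illustration.

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Return the top-k precision (in percent) for each k in topk."""
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()  # (maxk, batch); the transpose can leave the tensor non-contiguous
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # reshape copies when it has to, whereas view raises on this layout
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res
```

Calling .contiguous() before .view(-1) is an equivalent alternative.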
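For feature maps larger than 7×7, how is the value at a given frequency selected?
For example, a 14×14 feature map gives a 14×14 DCT result. If the lowest-frequency component is selected, is the value actually the average of the top-left 2×2 block of that result?
I don't understand how the code computes this and would appreciate an explanation.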
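The layer code quoted later on this page rescales the 7×7 indices with `temp_x * (dct_h // 7)` before the filters are built. The sketch below reproduces only that step. The helper name scale_indices is hypothetical, and the example indices are the first four entries of the 'top' lists.

```python
def scale_indices(mapper, dct_size, base=7):
    """Rescale indices chosen on a base x base grid to a dct_size grid.

    Mirrors `[temp_x * (dct_h // 7) for temp_x in mapper_x]` from the layer code.
    """
    return [idx * (dct_size // base) for idx in mapper]


mapper_x = scale_indices([0, 0, 6, 0], 14)
mapper_y = scale_indices([0, 1, 0, 5], 14)
print(list(zip(mapper_x, mapper_y)))
```

Read this way, index (0, 0) stays (0, 0) on a 14×14 map, so a single DCT coefficient is picked for each selected frequency and no 2×2 block is averaged.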
I can't find your model initialization code. Can anybody tell me where it is? Thanks.
Hi, I've read your paper. It's good work.
Could you provide the visualization code for Figure 5 and Figure 6 of the FcaNet paper?
Thanks a lot!
In your code, the expansion of FcaBottleneck is 4 and that of FcaBasicBlock is 1, and FcaBottleneck has one more convolution layer than FcaBasicBlock. How should I choose which module to use?
Hello, could you please tell me how the indices for the three methods in the code were calculated? Thank you!
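The input goes through downsampling and upsampling, so it is shrunk and enlarged several times. Do I have to pass the input size in __init__, or is there a way to get the input size at run time?
I tried reading the size of x in the forward function, but training then became very slow, and I realized the DCT filter probably needs to be precomputed when the network is built.
Hello, how do I run the LF, TS, and NAS experiments? TS in particular: I don't quite understand how it is implemented in the code.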
Hi, how did you select the frequency components shown in Figure 6?
I want to select 1, 3, 6, or 10 frequencies, as in zigzag DCT ordering.
I would also like to know the meaning of the numbers in layer.py:
num_freq = int(method[3:])
if 'top' in method:
    all_top_indices_x = [0,0,6,0,0,1,1,4,5,1,3,0,0,0,3,2,4,6,3,5,5,2,6,5,5,3,3,4,2,2,6,1]
    all_top_indices_y = [0,1,0,5,2,0,2,0,0,6,0,4,6,3,5,2,6,3,3,3,5,1,1,2,4,2,1,1,3,0,5,3]
    mapper_x = all_top_indices_x[:num_freq]
    mapper_y = all_top_indices_y[:num_freq]
elif 'low' in method:
    all_low_indices_x = [0,0,1,1,0,2,2,1,2,0,3,4,0,1,3,0,1,2,3,4,5,0,1,2,3,4,5,6,1,2,3,4]
    all_low_indices_y = [0,1,0,1,2,0,1,2,2,3,0,0,4,3,1,5,4,3,2,1,0,6,5,4,3,2,1,0,6,5,4,3]
    mapper_x = all_low_indices_x[:num_freq]
    mapper_y = all_low_indices_y[:num_freq]
elif 'bot' in method:
    all_bot_indices_x = [6,1,3,3,2,4,1,2,4,4,5,1,4,6,2,5,6,1,6,2,2,4,3,3,5,5,6,2,5,5,3,6]
    all_bot_indices_y = [6,4,4,6,6,3,1,4,4,5,6,5,2,2,5,1,4,3,5,0,3,1,1,2,4,2,1,1,5,3,3,3]
    mapper_x = all_bot_indices_x[:num_freq]
    mapper_y = all_bot_indices_y[:num_freq]
else:
    raise NotImplementedError
return mapper_x, mapper_y
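Each (x, y) pair in these lists reads as a frequency index on a 7×7 DCT grid, and the first num_freq pairs are used. For the zigzag selection asked about above, one possible sketch follows. The functions zigzag_indices and first_k are hypothetical helpers and not part of layer.py, and the traversal follows the usual JPEG-style zigzag, which is an assumption about what the question intends.

```python
def zigzag_indices(size=7):
    """Return (x, y) index pairs of a size x size grid in zigzag order."""
    order = []
    for s in range(2 * size - 1):                      # s = x + y, one anti-diagonal at a time
        diag = [(i, s - i) for i in range(size) if 0 <= s - i < size]
        if s % 2 == 0:
            diag.reverse()                             # alternate the direction per diagonal
        order.extend(diag)
    return order


def first_k(k, size=7):
    """Split the first k zigzag pairs into mapper_x / mapper_y style lists."""
    pairs = zigzag_indices(size)[:k]
    return [p[0] for p in pairs], [p[1] for p in pairs]


for k in (1, 3, 6, 10):
    print(k, first_k(k))
```

Note that the ported layer shown later on this page asserts that the channel count is divisible by the number of frequencies, so counts such as 3, 6, or 10 would need channel counts that are multiples of them.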
Hi @cfzd,
I have been following FcaNet since last year. Is this the official code?
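Hello, I'm a beginner in deep learning. Adding two FCA modules raised the mIoU of my original model by 2.3, which is a great result.
However, I have some other thoughts about the channel grouping.
If the grouped channels carry different information and each group uses a different frequency component, this seems likely to lose more information. The DCT can be viewed as a weighted sum, and the paper shows that, apart from GAP, which treats every pixel in a channel equally, every other component attends more to one or a few spatial regions. That is clearly biased, and it may also explain why GAP performs best in the single-frequency-component experiments. In that case, might grouping the channels cause even more information loss?
After thinking it over, I believe FCA works mainly because of channel redundancy and a kind of "complementarity" formed by the DCT weights.
Because channels are redundant, some groups may carry similar information after grouping, and the weights of those groups are "complementary": for example, one weight matrix emphasizes the left half and another the right half. The module seems to learn better from this kind of "sparse" relationship.
One could say that FCA makes fuller use of redundant channels than SE.
I have thought of two experiments to test this:
Keep reducing the number of input channels and compare FCA with the original model or SE. Once the channels are reduced far enough, the information is no longer so redundant, a lot of information should be lost, and accuracy should fall below that of the original model.
For frequency component selection, choose "symmetric" and "complementary" weight matrices instead of selecting by single-component performance, and drop the "chaotic" weight matrices, since the single-component experiments show that such chaotic weights do worse than simple block patterns.
In addition, one could use large groups when the channel count is large and small groups when it is small, to check whether this gives better performance.
I may not have expressed myself fully. If anything is wrong, please point it out!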
Hi, if the channels are not grouped, would the following code work?
def get_dct_filter(self, tile_size_x, tile_size_y, mapper_x, mapper_y, channel):
    dct_filter = torch.zeros(channel, tile_size_x, tile_size_y)
    for i in range(channel):
        for _, (u_x, v_y) in enumerate(zip(mapper_x, mapper_y)):
            for t_x in range(tile_size_x):
                for t_y in range(tile_size_y):
                    dct_filter[i, t_x, t_y] = self.build_filter(t_x, u_x, tile_size_x) * self.build_filter(
                        t_y, v_y, tile_size_y)
    return dct_filter
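One thing to check in the loop above: every (u_x, v_y) pair writes to the same dct_filter[i, t_x, t_y] entry with plain assignment, so each channel appears to end up holding only the last frequency in the list. The pure-Python sketch below illustrates this for a single channel. build_filter is copied from the layer code quoted later on this page; basis, overwritten_filter, and summed_filter are hypothetical helpers. Accumulating with += is just one possible way to combine frequencies without grouping, not something the authors prescribe.

```python
import math


def build_filter(pos, freq, POS):
    result = math.cos(math.pi * freq * (pos + 0.5) / POS) / math.sqrt(POS)
    return result if freq == 0 else result * math.sqrt(2)


def basis(u, v, h, w):
    """2D DCT basis for frequency (u, v) on an h x w grid."""
    return [[build_filter(x, u, h) * build_filter(y, v, w) for y in range(w)]
            for x in range(h)]


def overwritten_filter(h, w, mapper_x, mapper_y):
    """Mirror the posted loop for one channel: '=' keeps only the last pair."""
    f = [[0.0] * w for _ in range(h)]
    for u, v in zip(mapper_x, mapper_y):
        b = basis(u, v, h, w)
        for x in range(h):
            for y in range(w):
                f[x][y] = b[x][y]
    return f


def summed_filter(h, w, mapper_x, mapper_y):
    """Alternative (an assumption): accumulate all selected frequencies."""
    f = [[0.0] * w for _ in range(h)]
    for u, v in zip(mapper_x, mapper_y):
        b = basis(u, v, h, w)
        for x in range(h):
            for y in range(w):
                f[x][y] += b[x][y]
    return f


# With two frequencies, the overwritten version equals the second basis alone.
same_as_last = overwritten_filter(7, 7, [0, 0], [0, 1]) == basis(0, 1, 7, 7)
print(same_as_last)
```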
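Hello, I have a question about the split. The paper says the input channels are divided into n parts, each with a DCT filter of a different frequency. What is the purpose of dividing them into n parts? Why not apply every DCT filter to all channels? Is it to limit computation?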
In layer.py, inside
class MultiSpectralAttentionLayer(torch.nn.Module):
there is
self.dct_layer = MultiSpectralDCTLayer(dct_h, dct_w, mapper_x, mapper_y, channel)
so dct_h comes first and dct_w second, i.e. h before w.
But in class MultiSpectralDCTLayer(nn.Module):
def __init__(self, width, height, mapper_x, mapper_y, channel):
width comes first and height second, i.e. w before h.
Is there a reason for this? I'm confused.
self.register_buffer('weight', self.get_dct_weights())
self.fc = nn.Sequential(
    nn.Linear(c2, c2 // reduction, bias=False),
    nn.ReLU(inplace=True),
    nn.Linear(c2 // reduction, c2, bias=False),
    nn.Sigmoid()
)
How should the parameters of the get_dct_weights() function be set?
I don't know how to use this module. Any suggestions?
Hello, I read your code but couldn't find where the channel split is performed. Could you point me to it?
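Judging from the get_dct_filter code quoted later on this page, the split seems to happen implicitly when the filter is built: the slice `i * c_part: (i + 1) * c_part` assigns the i-th frequency to the i-th block of channels. The sketch below reproduces only that bookkeeping; channel_to_frequency is a hypothetical helper.

```python
def channel_to_frequency(channel, mapper_x, mapper_y):
    """List which (u, v) frequency each channel index is assigned."""
    assert len(mapper_x) == len(mapper_y)
    assert channel % len(mapper_x) == 0
    c_part = channel // len(mapper_x)          # channels per frequency group
    assignment = []
    for u, v in zip(mapper_x, mapper_y):
        assignment.extend([(u, v)] * c_part)   # same frequency for a whole block
    return assignment


# Eight channels with two frequencies: channels 0-3 get (0, 0), channels 4-7 get (0, 1).
print(channel_to_frequency(8, [0, 0], [0, 1]))
```

The assignment always has one entry per channel, so the pooled vector keeps length `channel` whatever the number of frequencies.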
The class MultiSpectralAttentionLayer contains the following part:
if h != self.dct_h or w != self.dct_w:
    x_pooled = torch.nn.functional.adaptive_avg_pool2d(x, (self.dct_h, self.dct_w))
    # If you have concerns about one-line-change, don't worry. :)
    # In the ImageNet models, this line will never be triggered.
    # This is for compatibility in instance segmentation and object detection.
If my task is object detection, how should I set self.dct_h and self.dct_w?
How were the lists in the get_freq_indices function predefined? I would like to adapt your work to 1D data.
Hello!
When using
# learnable DCT init
self.register_parameter('weight', self.get_dct_filter(height, width, mapper_x, mapper_y, channel))
# learnable random init
self.register_parameter('weight', torch.rand(channel, height, width))
these two initialization methods, the following error occurs:
TypeError: cannot assign 'torch.FloatTensor' object to parameter 'weight' (torch.nn.Parameter or None required)
help plz :)
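The error message says that register_parameter expects a torch.nn.Parameter (or None), so wrapping the tensor should resolve it. A minimal sketch follows. The class name LearnableDCTLayer and its init argument are hypothetical, and the forward pass assumes the elementwise-multiply-then-sum pattern used by the DCT layer.

```python
import torch
import torch.nn as nn


class LearnableDCTLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, channel, height, width, init=None):
        super().__init__()
        if init is None:
            # "learnable random init" variant
            init = torch.rand(channel, height, width)
        # Wrap the plain tensor: register_parameter rejects torch.FloatTensor
        self.register_parameter('weight', nn.Parameter(init))

    def forward(self, x):
        # x: (batch, channel, height, width) -> (batch, channel)
        return torch.sum(x * self.weight, dim=[2, 3])


layer = LearnableDCTLayer(4, 7, 7)
out = layer(torch.ones(2, 4, 7, 7))
print(out.shape)
```

For the "learnable DCT init" variant, the tensor returned by get_dct_filter could be passed as init in the same way.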
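Isn't it supposed to be a one-line change? Why are there so many parameters to set?
Hello, what is the purpose of the index arrays in the get_freq_indices function? I can't quite see the pattern in them.
Thanks!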
How should I set dct_h and dct_w if I want to add an FCA layer to another model? The feature maps at the layers where I want to insert it are 160x160, 80x80, 40x40, and 20x20.
Please advise.
In my model, the output feature map has shape (512, 16, 16), and I am worried that the adaptive_avg_pool2d() operation in layer.py will cause information loss. Does the parameter c2wh = dict([(64,56), (128,28), (256,14), (512,7)]) need to be changed?
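One option, offered as an assumption and not as the authors' recommendation, is to set dct_h and dct_w to the actual feature-map size of each stage, so that the pooling branch is never taken. The sketch below reports what the `* (dct_h // 7)` index scaling from the layer code would then produce. describe_dct_config is a hypothetical helper, and the example indices are the first four 'top' entries.

```python
def describe_dct_config(feature_sizes, mapper_x, mapper_y, base=7):
    """For each (h, w), show the scaled frequency indices if dct_h=h and dct_w=w."""
    report = []
    for h, w in feature_sizes:
        scaled_x = [u * (h // base) for u in mapper_x]
        scaled_y = [v * (w // base) for v in mapper_y]
        report.append({
            "dct_h": h,
            "dct_w": w,
            "mapper_x": scaled_x,
            "mapper_y": scaled_y,
            # True when every index collapses to 0, i.e. only the (0, 0) component remains
            "all_zero": not any(scaled_x) and not any(scaled_y),
        })
    return report


for row in describe_dct_config([(160, 160), (16, 16), (5, 5)], [0, 0, 6, 0], [0, 1, 0, 5]):
    print(row)
```

With this scaling, any size below 7 makes `h // 7` equal to 0, so every selected frequency falls back to the (0, 0) component; sizes of 7 or more keep distinct indices.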
elif 'bot' in method:
    all_bot_indices_x = [6,1,3,3,2,4,1,2,4,4,5,1,4,6,2,5,6,1,6,2,2,4,3,3,5,5,6,2,5,5,3,6]
    all_bot_indices_y = [6,4,4,6,6,3,1,4,4,5,6,5,2,2,5,1,4,3,5,0,3,1,1,2,4,2,1,1,5,3,3,3]
Hi author, thank you for your great work. I'm implementing MultiSpectralAttentionLayer (MSA) in TensorFlow, but it makes training quite slow, so I think I made a mistake in the re-implementation.
I cannot find a TensorFlow alternative to register_buffer for creating the fixed DCT initialization, which may be the problem. Could you review it?
import math

import numpy as np
import tensorflow as tf


def get_freq_indices(method):
    assert method in ['top1', 'top2', 'top4', 'top8', 'top16', 'top32',
                      'bot1', 'bot2', 'bot4', 'bot8', 'bot16', 'bot32',
                      'low1', 'low2', 'low4', 'low8', 'low16', 'low32']
    num_freq = int(method[3:])
    if 'top' in method:
        all_top_indices_x = [0, 0, 6, 0, 0, 1, 1, 4, 5, 1, 3, 0, 0, 0, 3, 2, 4, 6, 3, 5, 5, 2, 6, 5, 5, 3, 3, 4, 2, 2, 6, 1]
        all_top_indices_y = [0, 1, 0, 5, 2, 0, 2, 0, 0, 6, 0, 4, 6, 3, 5, 2, 6, 3, 3, 3, 5, 1, 1, 2, 4, 2, 1, 1, 3, 0, 5, 3]
        mapper_x = all_top_indices_x[:num_freq]
        mapper_y = all_top_indices_y[:num_freq]
    elif 'low' in method:
        all_low_indices_x = [0, 0, 1, 1, 0, 2, 2, 1, 2, 0, 3, 4, 0, 1, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
        all_low_indices_y = [0, 1, 0, 1, 2, 0, 1, 2, 2, 3, 0, 0, 4, 3, 1, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3]
        mapper_x = all_low_indices_x[:num_freq]
        mapper_y = all_low_indices_y[:num_freq]
    elif 'bot' in method:
        all_bot_indices_x = [6, 1, 3, 3, 2, 4, 1, 2, 4, 4, 5, 1, 4, 6, 2, 5, 6, 1, 6, 2, 2, 4, 3, 3, 5, 5, 6, 2, 5, 5, 3, 6]
        all_bot_indices_y = [6, 4, 4, 6, 6, 3, 1, 4, 4, 5, 6, 5, 2, 2, 5, 1, 4, 3, 5, 0, 3, 1, 1, 2, 4, 2, 1, 1, 5, 3, 3, 3]
        mapper_x = all_bot_indices_x[:num_freq]
        mapper_y = all_bot_indices_y[:num_freq]
    else:
        raise NotImplementedError
    return mapper_x, mapper_y


class MultiSpectralAttentionLayer(tf.keras.layers.Layer):
    def __init__(self, channel, dct_h, dct_w, reduction=16, freq_sel_method='top16'):
        super(MultiSpectralAttentionLayer, self).__init__()
        self.reduction = reduction
        self.dct_h = dct_h
        self.dct_w = dct_w
        mapper_x, mapper_y = get_freq_indices(freq_sel_method)
        self.num_split = len(mapper_x)
        # Rescale the 7x7 frequency indices to the actual DCT size
        mapper_x = [temp_x * (dct_h // 7) for temp_x in mapper_x]
        mapper_y = [temp_y * (dct_w // 7) for temp_y in mapper_y]
        self.dct_layer = MultiSpectralDCTLayer(dct_h, dct_w, mapper_x, mapper_y, channel)
        self.fc = tf.keras.Sequential([
            tf.keras.layers.Dense(channel // reduction, use_bias=False),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(channel, use_bias=False),
            tf.keras.layers.Activation('sigmoid')
        ])

    def call(self, x):
        # x is channels-last: (batch, height, width, channel)
        n, h, w, c = x.shape
        x_pooled = x
        if h != self.dct_h or w != self.dct_w:
            # tf.image.resize is bilinear by default, which is not the same
            # operation as the adaptive_avg_pool2d used in the PyTorch layer
            x_pooled = tf.image.resize(x, (self.dct_h, self.dct_w))
        y = self.dct_layer(x_pooled)
        y = self.fc(y)
        y = tf.expand_dims(tf.expand_dims(y, axis=1), axis=1)  # (batch, 1, 1, channel)
        return x * y


class MultiSpectralDCTLayer(tf.keras.layers.Layer):
    def __init__(self, height, width, mapper_x, mapper_y, channel):
        super(MultiSpectralDCTLayer, self).__init__()
        assert len(mapper_x) == len(mapper_y)
        assert channel % len(mapper_x) == 0
        self.num_freq = len(mapper_x)
        self.height = height
        self.width = width
        self.mapper_x = mapper_x
        self.mapper_y = mapper_y
        self.channel = channel
        # In your model you used self.register_buffer to create the fixed DCT
        # init; I could not find an alternative in TensorFlow, so I use a
        # non-trainable variable here.
        self.weight = tf.Variable(initial_value=self.get_dct_filter(), trainable=False, name='weight')

    def call(self, x):
        x = x * self.weight  # weight has shape (height, width, channel)
        # Sum over the spatial axes (1 and 2 for channels-last input) to get (batch, channel)
        result = tf.reduce_sum(x, axis=[1, 2])
        return result

    def build_filter(self, pos, freq, POS):
        result = math.cos(math.pi * freq * (pos + 0.5) / POS) / math.sqrt(POS)
        if freq == 0:
            return result
        else:
            return result * math.sqrt(2)

    def get_dct_filter(self):
        dct_filter = np.zeros((self.height, self.width, self.channel))
        c_part = self.channel // self.num_freq  # channels per frequency group
        for i, (u_x, v_y) in enumerate(zip(self.mapper_x, self.mapper_y)):
            for t_x in range(self.height):
                for t_y in range(self.width):
                    dct_filter[t_x, t_y, i * c_part: (i + 1) * c_part] = \
                        self.build_filter(t_x, u_x, self.height) * self.build_filter(t_y, v_y, self.width)
        return tf.constant(dct_filter, dtype=tf.float32)
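As in the title: how should we understand and explain that a fixed DCT works better than a learnable one? Does it mean the network cannot learn, or can hardly learn, a better set of frequency component templates?
Do the best frequency components in the top list apply only to 7×7? If the size changes to 48×48, do the best components need to be replaced, or do they still work? Please advise.
Wasn't it supposed to be a one-line change compared with SENet? Also, I can't find the get_dct_weights() function.
Do top1 and top2 produce output vectors of the same dimension?
Hello, thank you very much for your excellent work. Using your code and model, I got a validation accuracy of 78.39 for FcaNet50 on ImageNet. Is that normal? My environment is Ubuntu 20.04, CUDA 11.6, PyTorch 1.10, and four 3090 GPUs. I look forward to your reply.
Hi everyone, I work on defect detection for wind turbine blades. Can Fca be used with the YOLOX network?
How can I verify in the code that the [0,0] component is GAP?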
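A quick numerical check is possible without any framework. The sketch below reuses the build_filter formula from the layer code quoted on this page; dct_component and gap are hypothetical helpers. With this normalization the (0, 0) filter is the constant 1/sqrt(H*W), so the (0, 0) component should equal GAP scaled by sqrt(H*W).

```python
import math
import random


def build_filter(pos, freq, POS):
    result = math.cos(math.pi * freq * (pos + 0.5) / POS) / math.sqrt(POS)
    return result if freq == 0 else result * math.sqrt(2)


def dct_component(x, u, v):
    """2D DCT coefficient (u, v) of a single-channel map x (list of rows)."""
    h, w = len(x), len(x[0])
    return sum(x[i][j] * build_filter(i, u, h) * build_filter(j, v, w)
               for i in range(h) for j in range(w))


def gap(x):
    """Global average pooling of a single-channel map."""
    h, w = len(x), len(x[0])
    return sum(sum(row) for row in x) / (h * w)


random.seed(0)
x = [[random.random() for _ in range(7)] for _ in range(7)]
ratio = dct_component(x, 0, 0) / gap(x)
print(ratio, math.sqrt(7 * 7))  # the two values should agree up to rounding
```

The constant factor does not depend on the input, which is the sense in which GAP corresponds to the lowest-frequency component.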
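May I ask how these frequency components were determined?
Dear author:
Could you share the trained weights for the ResNet50 in Table 1 of the paper, the one with 77.27 top-1? Many thanks.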