Computer Vision Course 2024-01
pilab-cau / computervision-2401
License: Apache License 2.0
In the lecture, @yjyoo3312 explained that random cropping of images may exclude object regions, which can decrease accuracy.
However, I think Cut-Mix and Cut-Out can also exclude object regions, as shown in the image below. If the object region is small, Cut-Out may randomly cut a patch that contains it, and Cut-Mix has a similar issue. Is it necessary to additionally annotate where the object is and its size to avoid excluding the object region?
Hello Professor Youngjoon Yoo
This is a question I asked you in person in a class quite a while ago, and I wanted to write down my question and share it with other students.
The question I asked at the time was, "You said that the impulse function can be used to check the value of the kernel, but in computer vision we already know the value of the kernel, so why do we need to use the impulse function?"
My understanding of the answer was as follows.
'Impulse functions are used not only in computer vision but also in signal processing, where there is a real reason to use them because the kernel may be unknown. In computer vision, where the kernel is known, the impulse function is only a mathematical tool imported from signal processing.'
I'm raising this issue to confirm that my understanding is correct.
Hyunwoong LIM
Hello, professor @yjyoo3312. I understand that both pointwise convolution and depthwise convolution can reduce the number of parameters because they reduce the dimensions. So, do the two convolutions differ in the degree to which they reduce the parameters?
-Kim JiHyeon
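A quick back-of-the-envelope sketch may help frame the question (the layer sizes below are made-up examples, not values from the lecture): both reduce parameters, but to different degrees, and depthwise-separable blocks combine the two.

```python
# Parameter counts for standard, depthwise, and pointwise convolutions.
# C_in, C_out, K are hypothetical example sizes, not values from the slides.
C_in, C_out, K = 64, 128, 3

standard = C_in * C_out * K * K      # one KxK filter per (input, output) pair
depthwise = C_in * K * K             # one KxK filter per input channel only
pointwise = C_in * C_out             # 1x1 filters that only mix channels

separable = depthwise + pointwise    # depthwise separable = depthwise + pointwise
print(standard, depthwise, pointwise, separable)  # 73728 576 8192 8768
```

With these example sizes, the separable pair costs roughly an order of magnitude fewer parameters than one standard convolution.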
The reason for using the traditional bottleneck architecture is to reduce the number of parameters and computational cost. The inverted bottleneck, as I understand it, is illustrated on the right. I am curious if this inverted bottleneck structure, despite the intermediate expansion of channels, affects the number of parameters or computational cost.
Sumin Park
Thank you.
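As a rough numeric sketch of this question (my own illustrative channel counts, not the lecture's): the intermediate expansion does increase parameters and FLOPs relative to a classic bottleneck, but because the expanded 3x3 stage is depthwise, the cost stays far below what a dense 3x3 at that width would require.

```python
C, t = 64, 6                          # hypothetical input channels, expansion factor

# Classic ResNet bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand
mid = C // 4
classic = C * mid + mid * mid * 9 + mid * C

# Inverted bottleneck (MobileNet v2 style): 1x1 expand -> depthwise 3x3 -> 1x1 project
exp = C * t
inverted = C * exp + exp * 9 + exp * C

dense_3x3_at_width = exp * exp * 9    # what a NON-depthwise 3x3 would cost there
print(classic, inverted, dense_3x3_at_width)  # 4352 52608 1327104
```

So the expansion is affordable precisely because the middle stage is depthwise.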
I know that "Nesterov Momentum anticipates the next step's location by first applying momentum in the current direction from the current position, then computing the gradient at that position, and moving in that direction once more."
I think I have some understanding of what Nesterov momentum is. However, I don't understand the role of the `if nesterov` branch in the pseudocode in the image above.
My understanding is that if Nesterov momentum is used, it should be applied at every step, and if not, it should not be applied at all. I don't see how an `if nesterov` condition can exist within a single step.
I would be grateful if you could explain the `if nesterov` branch.
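My reading of this, as a sketch in the style of PyTorch's documented SGD pseudocode (not the exact code from the slide): `nesterov` is a fixed hyperparameter chosen when the optimizer is created, so the `if nesterov` branch runs at every step but always takes the same side; it selects which update formula is used, rather than deciding per step.

```python
# A sketch of SGD with momentum, following PyTorch-style pseudocode.
# `nesterov` is a fixed hyperparameter: the `if nesterov` branch is not a
# per-step decision, it simply selects which formula runs at EVERY step.
def sgd_step(w, grad, buf, lr=0.1, momentum=0.9, nesterov=False):
    buf = momentum * buf + grad          # velocity (momentum buffer) update
    if nesterov:
        step = grad + momentum * buf     # look-ahead gradient
    else:
        step = buf
    return w - lr * step, buf

w, buf = 1.0, 0.0
for _ in range(3):                       # the same branch is taken on every step
    w, buf = sgd_step(w, 2 * w, buf, nesterov=True)
print(w)
```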
@yjyoo3312 Hello Professor, my name is Jeon Yonghyeon, and I am currently enrolled in your class. I am writing to inquire about the reasons behind the differences in the algorithms of the Canny edge operator and the Harris corner detector.
In the images above, the algorithm for the Canny edge operator applies a Sobel filter to a Gaussian-smoothed image, whereas the Harris corner detector seems to calculate the gradient with the Sobel filter first, followed by Gaussian smoothing. What is the reason for this difference in the order of operations between the two algorithms?
My hypothesis is that the Harris corner detector necessitates the calculation of the window function (Gaussian smoothing) later in the process due to transformations in the formula.
However, in Lecture 3, there is an image showing the problems that arise when calculating the gradient before smoothing. Does the Harris corner detector not encounter these problems? Or would it be acceptable for the Canny edge detector to also compute the gradient before applying Gaussian smoothing?
Thanks for reading my issue!
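A small 1-D numerical sketch of one possible answer (my own toy example, not from the lecture): purely linear operations like Gaussian smoothing and derivative filtering commute, so for Canny the order is mathematically interchangeable; Harris, however, smooths the products of gradients, and that nonlinear step cannot be moved before differentiation.

```python
import numpy as np

sig = np.array([0., 0., 1., 3., 6., 3., 1., 0., 0.])  # toy 1-D signal
gauss = np.array([0.25, 0.5, 0.25])   # small smoothing kernel
deriv = np.array([-1., 0., 1.])       # central-difference kernel

# Canny's case: smoothing and differentiation are both linear, so full
# convolution gives identical results in either order.
a = np.convolve(np.convolve(sig, gauss), deriv)   # smooth, then derivative
b = np.convolve(np.convolve(sig, deriv), gauss)   # derivative, then smooth
print(np.allclose(a, b))              # True: linear operations commute

# Harris smooths PRODUCTS of gradients (e.g. Ix*Ix); squaring is nonlinear,
# so the Gaussian window cannot be moved before the derivative.
gx = np.convolve(sig, deriv)
c = np.convolve(gx * gx, gauss)       # smooth the squared gradient (Harris)
d = np.convolve(gx, gauss) ** 2       # squaring a smoothed gradient differs
print(np.allclose(c, d))              # False: the product breaks commutation
```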
This issue is about how group convolution achieves a computation reduction.
We would appreciate it if you could confirm that this math is correct and let us know if you have any corrections.
The idea of a group convolution is to divide the channels into groups.
However, a common misconception is that since you're dividing the channels into G groups, you're doing G separate convolutions, so there's no computational gain.
The reason for this misconception is that you must divide not only the input channels but also each kernel's channels into groups; this is sometimes well illustrated with pictures but is easy to miss.
Therefore, I have attached the math for this in the photo below.
Thank you.
Hyunwoong Lim
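To complement the photo, here is the same argument as a small counting sketch (hypothetical layer sizes): because each group's kernels see only C_in/G input channels, the parameter count and the multiply count both shrink by a factor of G.

```python
# Parameter/multiply counts for group convolution: dividing BOTH the input
# channels and each kernel's depth into G groups shrinks the cost by 1/G.
C_in, C_out, K, H, W = 64, 64, 3, 32, 32   # hypothetical layer sizes

def conv_params(c_in, c_out, k, groups=1):
    # each output channel sees only c_in // groups input channels
    return c_out * (c_in // groups) * k * k

for G in (1, 2, 4, 8):
    p = conv_params(C_in, C_out, K, G)
    print(G, p, p * H * W)   # parameters, multiplies per output feature map
```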
Hi, professor! I have a question about obtaining the weight parameter w of bounding-box regression.
When calculating w, the equation appears to use an MSE loss in ridge-regression form.
But you mentioned that BBox regression is not an easy process, so it cannot be solved using just an MSE loss.
I am confused about this part. Could you explain a little more about this issue?
Thank you!!!
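For reference, a minimal sketch of the ridge-regression closed form the slide appears to use (toy random data; the feature matrix X and targets t here are placeholders, not actual R-CNN pooled features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 proposals, 5-dim pooled features (toy)
t = rng.normal(size=(100,))          # regression targets, e.g. (G_x - P_x) / P_w
lam = 1.0                            # ridge regularization strength

# closed form: w = (X^T X + lam*I)^{-1} X^T t
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ t)
print(w.shape)                       # (5,)
```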
When I was trying to find the homography in the picture above, I was told that the array built up by A.append would make sense if I worked through it myself.
I created an issue to confirm my understanding and to share it with other students.
Here's what I want to write:
Also, as I understand it, I would like to know why it is okay to fix h_22 at 1.
If we extend this to v_1, we can see that the equation is satisfied.
However, I still don't understand why it's okay to set h_22 = 1.
I would like you to confirm whether the following understanding and explanation are correct.
Thank you.
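A minimal DLT sketch of my understanding (a toy example with a made-up point correspondence): each point pair appends two rows to A, the SVD gives h only up to an arbitrary scale, and that scale freedom is exactly why dividing through to force h_22 = 1 is allowed, as long as the true h_22 is nonzero.

```python
import numpy as np

# DLT sketch: each pair (x, y) -> (u, v) appends two rows to A; Ah = 0 is
# solved by SVD, and h is defined only up to scale, which is why fixing
# h_22 = 1 is allowed (whenever the true h_22 is nonzero).
def homography(src, dst):
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A))
    h = Vt[-1]                          # null-space vector (up to scale)
    return (h / h[-1]).reshape(3, 3)    # normalize so h_22 = 1

src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2, 2), (0, 2)]  # a pure scaling by 2 (toy case)
print(np.round(homography(src, dst), 6))
```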
When you upload the issue, please follow the title format in this issue.
Hello, @yjyoo3312. I have a question about training the model.
When training an AI model, many factors besides the model architecture determine its performance.
It can be challenging to identify whether a high loss is due to issues with the architecture itself or with the training setting (including the optimizer, hyperparameters, learning-rate scheduler, or training duration).
Very often I can't decide whether the architecture or the training setting is the problem.
Do you have any personal tips for finding where the high loss comes from?
Thank you!!
It seems like convolution layers produce the desired number of outputs for the specified kernels even without the ReLU function. Then, for page 55, what is the necessity of the ReLU function here? I'm curious what output would be produced if the ReLU computation were added, compared to the output without it.
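One standard way to see the necessity (a toy numpy sketch with made-up matrices, not page 55's network): without a nonlinearity between them, two stacked linear layers collapse into a single linear map, so the extra layer adds no expressive power.

```python
import numpy as np

W1 = np.array([[1., -1.],
               [2.,  0.]])              # toy "layer 1" weights
W2 = np.array([[1., 1.]])               # toy "layer 2" weights
x = np.array([1., 2.])

two_layers = W2 @ (W1 @ x)              # no activation in between
one_layer = (W2 @ W1) @ x               # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))   # True: depth collapsed away

relu = lambda z: np.maximum(z, 0)
with_relu = W2 @ relu(W1 @ x)           # the nonlinearity breaks the collapse
print(two_layers[0], with_relu[0])      # 1.0 2.0
```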
Minjung Kim (20210172)
@yjyoo3312 I have a question about the fast approximation in Harris corner detection!
As you can see in this figure's highlighted formula, the theta value is computed using eigenvalues of matrix M.
But you mentioned that the benefit of this fast approximation is that we do not need to calculate the eigenvalues.
How can I understand this part?
Thank you! :)
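My reading of this point, as a small numeric check (M below is a made-up 2x2 structure tensor): since det(M) = λ1·λ2 and trace(M) = λ1 + λ2, the response R = det(M) − k·trace(M)² can be computed without ever running an eigen-decomposition, and that is the "fast" part.

```python
import numpy as np

# The Harris response R = det(M) - k * trace(M)^2 equals
# lam1*lam2 - k*(lam1 + lam2)^2, so det and trace are all we need.
k = 0.04
M = np.array([[10.0, 3.0],
              [3.0, 2.0]])             # toy structure tensor (symmetric)

lam = np.linalg.eigvalsh(M)            # the "slow" way, via eigenvalues
via_eigs = lam[0] * lam[1] - k * (lam[0] + lam[1]) ** 2
via_det_trace = np.linalg.det(M) - k * np.trace(M) ** 2

print(np.isclose(via_eigs, via_det_trace))   # True: same response value
```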
Hello, @yjyoo3312. My name is JongHan Leem.
Upon reviewing Lecture 1, the part where we discuss the similarity of two images using the Bhattacharyya distance of their histograms, I noticed that the formula for the Bhattacharyya distance might be incorrect.
According to the definition of the Bhattacharyya distance, D_B(p, q) = -ln(BC(p, q)), where the Bhattacharyya coefficient is BC(p, q) = \sum_i \sqrt{p_i q_i}.
However, in the code, the Bhattacharyya coefficient is calculated as:
bc = np.sqrt(np.sum(hist1 * hist2))
Also, since the Bhattacharyya distance measures the similarity between two probability distributions, the input histograms should be normalized so that they can be treated as probability distributions, i.e. normalized with the L1 norm so that the elements of each histogram sum to 1. (Currently, the code normalizes the histograms with the L2 norm.)
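A minimal sketch of the fix being proposed (with my own toy histogram, not data from the notebook): L1-normalize first, and take the square root elementwise inside the sum rather than once over the whole sum.

```python
import numpy as np

# Sketch of the corrected computation: L1-normalize, then
# BC = sum(sqrt(p * q)) with the sqrt INSIDE the sum, and D_B = -ln(BC).
def bhattacharyya_distance(hist1, hist2):
    p = hist1 / np.sum(hist1)            # L1 normalization
    q = hist2 / np.sum(hist2)
    bc = np.sum(np.sqrt(p * q))          # coefficient, in [0, 1]
    return -np.log(bc)

h = np.array([1.0, 2.0, 3.0])            # toy histogram
print(bhattacharyya_distance(h, h))      # identical histograms -> 0.0
```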
It might be a subtle thing, but I will create a PR about this!
I also created a Jupyter Notebook file to show some visualizations :)
@yjyoo3312
I have a question about lecture 3.
In the lecture slide, the Sobel filter S_x is:
But in the source code of lecture 3, you used filter2D without flipping the Sobel kernel. I heard that cv2.filter2D actually computes correlation, not convolution. OpenCV doc: https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#filter2d
h_x = (1/8)*np.array([[-1.0, 0.0, 1.0],
[-2.0, 0.0, 2.0],
[-1.0, 0.0, 1.0]])
h_y = (1/8)*np.array([[1.0, 2.0, 1.0],
[0.0, 0.0, 0.0],
[-1.0, -2.0, -1.0]])
lenna_grad_x = cv2.filter2D(lenna_gaussian, -1, h_x, borderType=cv2.BORDER_CONSTANT)
lenna_grad_y = cv2.filter2D(lenna_gaussian, -1, h_y, borderType=cv2.BORDER_CONSTANT)
So I was wondering: doesn't the kernel of the filter have to be flipped?
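A quick check of why the unflipped kernel still works in this particular case (using the h_x from the code above): a 180° flip of this Sobel kernel only negates it, so correlation and true convolution differ only in sign, and edge magnitudes are unchanged. For an asymmetric kernel, the flip would actually matter.

```python
import numpy as np

# cv2.filter2D computes correlation; true convolution flips the kernel
# 180 degrees first. For this Sobel kernel, flipping only negates it.
h_x = (1 / 8) * np.array([[-1.0, 0.0, 1.0],
                          [-2.0, 0.0, 2.0],
                          [-1.0, 0.0, 1.0]])

flipped = np.flip(h_x)                   # 180-degree flip (both axes)
print(np.allclose(flipped, -h_x))        # True: convolution here would just
                                         # negate the correlation result
```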
Hello,
I am writing to ask about the differences between the ResNet architectures shown in lecture 12-1 (page 15) and lecture 11 (page 45). It appears that there are discrepancies in the detailed structure of ResNet between the two slides, particularly in terms of the block structure and the filter size of the pooling layer, as highlighted by the boxes in the images. Could you explain the reasons behind these differences?
Thank you for your time.
I have a question about the notation in Fully Connected Layer slide, on page 6 of lecture 10. I think the notation "# hidden layer" should be "# hidden units" or "# hidden neurons", because we are calculating the dimensions within a layer.
Is my opinion correct? Or, do we use the two notations interchangeably?
I understand the idea of the residual block.
For an input x, the plain block is trained to produce the output H(x).
However, in the residual block, we define and train a residual F(x) = H(x) - x, so the block outputs F(x) + x.
Since it is very difficult to learn the ideal mapping H(x) directly, we learn F(x), which is a more trainable form.
By adding the input x to F(x), the residual method brings stability to the optimization process: the block learns only the additional information it needs, while the input itself is passed through unchanged.
Intuitively, it seems that adding x to otherwise identical blocks would lead to more computation.
However, the overall residual network has rather fewer FLOPs.
So my question is:
Thank you!
HaeSeong Kim
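The FLOPs point above can be sketched in a few lines (toy 1-D "layers" of my own, not actual convolutions): the skip connection adds only an elementwise addition, which is negligible next to the matrix products inside the block.

```python
import numpy as np

# Toy sketch of plain vs residual computation: the skip connection costs
# only an O(n) addition, versus the O(n^2) matrix products in the block.
relu = lambda z: np.maximum(z, 0)
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
x = rng.normal(size=(8,))

plain = W2 @ relu(W1 @ x)        # the block must model H(x) directly
residual = plain + x             # the block only models F(x) = H(x) - x;
                                 # the skip adds x back at almost no cost
print(residual.shape)            # (8,)
```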
Hello, professor, I'm raising this issue because I'd like to review what we covered in class today about ReLU6, ShuffleNet, etc. and make sure I'm correct. I'd also like to ask if you know anything about the Shift method for computational reduction.
This issue is a compilation of several questions, so I hope it doesn't confuse anyone; only the second question was covered in class, and the third question is from my private study.
Here's the question in a nutshell (the questions are labeled with numbers)
ReLU6 is min(max(0, x), 6); values above 6 are clipped to 6.
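As a one-line check of that definition (plain numpy, with my own example values):

```python
import numpy as np

# ReLU6 clips activations to the range [0, 6]: min(max(0, x), 6).
relu6 = lambda x: np.minimum(np.maximum(x, 0), 6)
print(relu6(np.array([-3.0, 2.0, 7.0])))   # [0. 2. 6.]
```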
"ShuffleNet"
"Shift"
This is something I came across while looking for methods for computational reduction, and the original paper referenced the following. (Wu, Bichen, et al. "Shift: A zero flop, zero parameter alternative to spatial convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.)
In a nutshell, instead of a spatial convolution, the feature maps are shifted up, down, left, and right, and the parts that fall beyond the original range are excluded.
Thank you.
Hyunwoong LIM
Regarding page 28 of yesterday's lecture slides,
On the left side, you mentioned that in the inverted residual network, the activation function is not used in the final 1x1 part due to the risk of information loss.
So I thought the performance of this network should generally be better.
However, the table on the right shows that the performance of the ReLU6 bottleneck is lower.
I'm wondering if there is something wrong with my understanding on this part.
Minjung Kim (20210172)
Why is Conv_channel1 on the first slide different from Conv_channel1 on the second slide in the type of convolution it represents (channel-wise conv vs. point-wise conv), even though they have the same components?
Next, I was wondering what determines the characteristics of each convolution. At first, I thought it was the components that make up each convolution (like C_in, C_out, ...), but now I guess the position where the convolution sits also affects it. I look forward to hearing from the expert :)
Hello everyone, in discussing how the Transformation Matrix R is derived, we've noticed the explanation might be a bit confusing, specifically regarding clockwise rotation.
Thanks for bringing up this question; we'll provide a revised explanation.
The essential idea is that to rotate a point by \theta, we must rotate the coordinate frame by -\theta.
Hence, the derivation process will be adjusted accordingly.
Thank you for the comment! The slide including the changes will be updated in eclass.
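The active-vs-passive relationship stated above can be checked numerically (a small sketch added for illustration, not from the slides):

```python
import numpy as np

# Active vs passive view: rotating a point by +theta gives the same
# coordinates as keeping the point fixed and rotating the axes by -theta.
def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

p = np.array([1.0, 0.0])
theta = np.pi / 2

active = R(theta) @ p            # rotate the point by +theta
B = R(-theta)                    # basis of a frame rotated by -theta
passive = B.T @ p                # coordinates of the fixed point in that frame
print(np.allclose(active, passive))   # True
```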
Thanks for the comment in the lecture.
I thought about the rotation invariance of the two window functions:
Certainly, the uniform window itself does not guarantee rotation invariance (but it is right that we find corners by examining the eigenvalues, which implicitly accounts for rotation invariance).
Also, a continuous 2D Gaussian filter is rotation invariant, but strictly speaking a 3x3 Gaussian filter is not.
(continuous Gaussian: rotation invariant; 3x3 Gaussian: actually not)
So, forget about rotation invariance in this slide; it would be controversial in my opinion :)
Thank you for attending today's course! We have two things to fix in the slide.
@yjyoo3312, when I'm attempting to solve the self-study materials, specifically on problem 2, I have concerns that the proof regarding shift invariance may not be properly constructed. Could you please advise if the proof I have formulated is sufficient? If not, could you suggest which properties should be utilized?
As professor @yjyoo3312 pointed out in the lecture, one of the key architectural changes in MobileNet v2 is the use of a linear bottleneck structure.
So here I examined the official PyTorch implementation of MobileNet v2 and compared it to ResNet.
ResNet's block design uses ReLU activation in its output.
As you can see here, MobileNet v2's block does not use an activation function in its final point-wise convolution, employing a linear activation.
Additionally, you can notice that the block uses skip-connection when the spatial dimensions and channels of the input and output match.
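To summarize the structure, here is a shape-only toy sketch of the block (plain numpy stand-ins, not the PyTorch implementation; the depthwise 3x3 is simplified to a per-channel multiply):

```python
import numpy as np

# Toy sketch of MobileNet v2's inverted residual: expand (1x1 + ReLU6) ->
# depthwise 3x3 (+ ReLU6) -> project (1x1, LINEAR, no activation) ->
# skip connection only when input and output shapes match.
relu6 = lambda x: np.minimum(np.maximum(x, 0), 6)

def inverted_residual(x, W_expand, W_dw, W_project):
    h = relu6(W_expand @ x)          # 1x1 expansion, C -> t*C
    h = relu6(W_dw * h)              # stand-in for the depthwise 3x3 stage
    out = W_project @ h              # 1x1 projection, t*C -> C, linear output
    if out.shape == x.shape:         # skip connection when shapes match
        out = out + x
    return out

rng = np.random.default_rng(0)
C, t = 4, 6
y = inverted_residual(rng.normal(size=(C,)),
                      rng.normal(size=(t * C, C)),
                      rng.normal(size=(t * C,)),
                      rng.normal(size=(C, t * C)))
print(y.shape)                       # (4,)
```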
As many have noticed, terms like filtering, convolution, and correlation are often used ambiguously in computer vision. The ambiguity often arises because much of the implementation follows the conventions of widely used libraries, such as OpenCV. For example, OpenCV's cv2.filter2D operation is implemented as a correlation by default, which is a common convention in the field, so filters are typically specified with correlation in mind.
Similarly, OpenCV uses BGR instead of RGB as its default color format, and while points are accessed in (x, y) order, matrix elements are accessed in (row, col) order, which can be quite confusing, even though there might be valid reasons for this.
Given this, many libraries in the field implement convolution and filtering (which is typically correlation) in different ways, so I often check the results to see how the operation actually works. If it's working as intended, I use it; otherwise, I flip the kernel and try again.
However, I need to be extra careful when it comes to exams; I'll be sure to define notations precisely in the questions. As long as you're aware of the difference between convolution and correlation, you should be fine.
Thanks for the comment! @nshuhsn
@yjyoo3312, I have two questions about Laplacian of Gaussian (LoG)!
When using LoG, if edge detection is based on zero-crossings, how do we differentiate between case 1 and case 2 in Figure, which are both zero-crossing but one is on edge and the other is not?
I heard that the reason for approximating LoG with DoG is due to computational complexity. However, since convolutional filters are fixed in advance, it doesn't seem necessary to worry about computation. Are there cases where LoG is computed multiple times?
Thank you very much!
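On the second question, a 1-D numerical sketch (my own, with assumed values σ = 1 and k = 1.2) shows how closely a difference of Gaussians tracks the Laplacian of Gaussian:

```python
import numpy as np

# 1-D comparison: DoG = G(k*sigma) - G(sigma) is nearly proportional to
# the second derivative of the Gaussian (the 1-D analogue of LoG).
x = np.linspace(-6, 6, 1001)
sigma, kk = 1.0, 1.2                 # assumed scale and scale ratio

def gauss(x, s):
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

dog = gauss(x, kk * sigma) - gauss(x, sigma)
log = (x**2 / sigma**4 - 1 / sigma**2) * gauss(x, sigma)   # G''(x)

# cosine similarity between the two kernel shapes
corr = np.dot(dog, log) / (np.linalg.norm(dog) * np.linalg.norm(log))
print(corr)
```

The similarity is close to 1, which is why DoG (cheap, reusing the Gaussian pyramid already computed across scales) stands in for LoG.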
Hi professor!
I have a question about setting the number of anchor boxes.
By default, we usually set it to 9.
However, if we set it to a lower value, I would expect higher FPS without a performance loss.
Is there any research, or are there any results, about adjusting this value?
Thank you!
First, I checked issue #39. I'm curious about the dimension of W. In (3x32x32)x(#hidden=10)=30K, what is the 3, what is the first 32, and what is the second 32? I've never studied neural networks, so please understand that I lack the relevant knowledge.
@yjyoo3312, I have 4 questions about Harris Corner and SIFT.
This question is about issue #5. I want to double-check whether my understanding is correct.
There are two ways to blur an image: applying a Gaussian blur, or downsampling and then resampling. If blurring is done using the former method, the result is a Gaussian pyramid, while the latter method leads to a Laplacian pyramid. Is that right?
It seems like the meaning of "scale" differs between the first and second images. In the first image, it appears to represent octaves, while in the second image, it seems to represent layers, i.e. different sigmas. Are both concepts related to the term "scale" and thus referred to as "scale"?
SIFT does not use Harris Corner Detector on the 'space' axis; instead, it employs DoG (Difference of Gaussians). As learned in the previous lecture, DoG approximates the Laplacian. Therefore, does SIFT treat the local maxima of the Laplacian as keypoints? And is it reasonable? I'm not sure what the local maxima of the Laplacian means.
Thank you for reading my lengthy text!
I don't quite understand the idea that "weight decay removes the effect of old parameters." I know that weight decay is a regularization technique used to prevent overfitting by reducing large weights, which helps the model generalize better. Could you explain what it means by "weight decay removes the effect of old parameters"? Does it simply mean that weight decay reduces the values of large weights?
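One way to read that phrase (a toy sketch with made-up lr and wd values): each decay step multiplies the weight by (1 − lr·wd), so whatever value the parameter had long ago is forgotten geometrically unless new gradients keep refreshing it.

```python
# Weight decay multiplies the weight by (1 - lr * wd) each step, so the
# contribution of old parameter values decays geometrically over time.
lr, wd = 0.1, 0.5                  # made-up hyperparameters
w = 10.0                           # stand-in for an "old" parameter value
for _ in range(50):
    grad = 0.0                     # no new gradient signal: only decay acts
    w = w - lr * (grad + wd * w)
print(w)                           # 10 * (1 - 0.05)^50, roughly 0.77
```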
I am writing to inquire about the specific limitations of the YOLO-v1 model as discussed in our recent lecture. YOLO-v1, while being an innovative and efficient object detection model, is known to have several limitations that impact its performance. I would like to understand these limitations better and verify their validity.
Could you please elaborate on the following points regarding YOLO-v1's limitations?
Detection of Multiple Objects in a Single Grid Cell: It has been noted that YOLO-v1 struggles to detect multiple objects within a single grid cell. How does this limitation affect the model’s performance in dense object scenarios?
Handling of Small Objects: The model reportedly has difficulties with small object detection due to its grid cell approach favoring larger objects. What are the specific challenges YOLO-v1 faces with small objects, and are there any particular cases where this limitation is most evident?
Bounding Box Regression Issues: YOLO-v1’s bounding box predictions can sometimes be inaccurate, leading to poor localization. How significant is this issue in practical applications, and are there known methods to mitigate it?
I would appreciate a detailed explanation of these points to understand the limitations of YOLO-v1 better. Additionally, if there are any insights or counterarguments that might provide a more balanced view, I would be very interested in hearing them.
Thank you.
NCWH -> NCHW
In PyTorch, the dimension order of input values is known to be NC"HW". However, in the CutMix slide, it seems to be in NC"WH" order. So I think in W = size[2], 2 should be changed to 3 and in H = size[3], 3 should be changed to 2.
Additionally, in the cutmix_data function on the left, since the third dimension is H and the fourth dimension is W, I think the order of bbx and bby should be swapped. Am I correct?
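A sketch of the suggested correction (modeled on the usual CutMix rand_bbox pattern, with hypothetical tensor sizes): read H from size[2] and W from size[3], then slice rows with the y bounds and columns with the x bounds.

```python
import numpy as np

# For an NCHW tensor, H = size[2] and W = size[3]; the random box is
# (bbx1, bby1, bbx2, bby2) in (x, y) terms, so slicing must put the y
# bounds on the H axis and the x bounds on the W axis.
def rand_bbox(size, lam, rng):
    H, W = size[2], size[3]                  # NCHW order
    cut_rat = np.sqrt(1.0 - lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = rng.integers(W), rng.integers(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2

rng = np.random.default_rng(0)
x = np.zeros((2, 3, 8, 16))                  # N, C, H=8, W=16
bbx1, bby1, bbx2, bby2 = rand_bbox(x.shape, lam=0.5, rng=rng)
x[:, :, bby1:bby2, bbx1:bbx2] = 1.0          # rows = H bounds, cols = W bounds
print(bbx2 <= 16 and bby2 <= 8)              # True: box stays inside the image
```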
np.int deprecated
I know that np.int has been deprecated in NumPy. Currently, np.int32 or np.int64 should be used instead. Could you please check this?
I know that the larger the scale, the smaller the image size should be, as shown below.
I also think that the larger the octave and scale, the smaller the image size should be, as shown below.
However, in our PPT, the scale at 0 octave is represented as the largest.
I think this problem can be solved by reversing the direction of the arrow for scale.
Is my way of thinking correct?
I apologize if this has already been mentioned in class.
Thank you!
HaeSeong Kim
When we calculate H in RANSAC, we set h22 = 1 because it maintains the 1 in the translation component. So my question is: why are h20 and h21 not zero? We don't need h20 and h21 for scaling, rotation, or translation, which is why in lecture 2 we set h20 and h21 to zero.
Are there any reasons we cannot set h20 and h21 to zero?
I am writing to inquire about the specific ways in which the auxiliary fully connected (FC) layers in GoogLeNet help mitigate the gradient vanishing problem during training. The gradient vanishing issue is a significant challenge in deep neural network training, where gradients become progressively smaller as they are backpropagated through the layers, leading to slow learning or even a complete halt in learning for the earlier layers.
Could you please elaborate on how these auxiliary FC layers address this problem within the context of GoogLeNet? I am particularly interested in understanding the mechanisms by which these layers influence the backpropagation process, the strategic placement of these layers within the network, and any additional benefits they provide beyond mitigating the gradient vanishing issue.
Thank you.