
siamese-fc's Introduction

IMPORTANT. At CVPR'17 we presented CFNet, which uses a slightly modified version of SiamFC (which I have been calling v2 or baseline-conv5) to compare against that paper's Correlation Filter Network. The difference is simply that it has only 32 output channels instead of 256, and its activations have a higher spatial resolution. Results are slightly better, speed is slightly worse. For this reason, if you are starting fresh it makes much more sense to use the more recent code from the CFNet repository, which I think is also a bit cleaner. However, if you have started with this repo, no worries: things are only marginally different, so there is not much use in switching.

Fully-Convolutional Siamese Networks for Object Tracking


Project page: http://www.robots.ox.ac.uk/~luca/siamese-fc.html

The code in this repository enables you to reproduce the experiments of our paper. It can be used in two ways: (1) tracking only and (2) training and tracking.


pipeline image


If you find our work and/or curated dataset useful, please cite:

@inproceedings{bertinetto2016fully,
  title={Fully-Convolutional Siamese Networks for Object Tracking},
  author={Bertinetto, Luca and Valmadre, Jack and Henriques, Jo{\~a}o F and Vedaldi, Andrea and Torr, Philip H S},
  booktitle={ECCV 2016 Workshops},
  pages={850--865},
  year={2016}
}

[ Tracking only ] If you don't care much about training, simply plug one of our pretrained networks into our basic tracker and see it in action.

  1. Prerequisites: GPU, CUDA drivers, cuDNN, Matlab (we used 2015b), MatConvNet (we used v1.0-beta20).
  2. Clone the repository.
  3. Download one of the pretrained networks from http://www.robots.ox.ac.uk/~luca/siamese-fc.html
  4. Go to siam-fc/tracking/ and remove the trailing .example from env_paths_tracking.m.example, startup.m.example and run_tracking.m.example, editing the files as appropriate.
  5. Be sure to have at least one video sequence in the appropriate format. You can find an example here in the repository (siam-fc/demo-sequences/vot15_bag).
  6. siam-fc/tracking/run_tracking.m is the entry point for running the tracker (a minimal invocation sketch follows this list). Have fun!
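For orientation, here is a minimal sketch of launching the tracker on the bundled demo sequence. The call signature is an assumption borrowed from the "error run_tracker" issue further down this page; adapt it to whatever entry point your edited run_tracking.m actually exposes.

% Minimal sketch (assumed call signature, borrowed from the issue below).
cd tracking;
startup;                                        % set up paths (from startup.m.example)
run_tracker('demo-sequences/vot15_bag', true);  % sequence folder, visualization flag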

[ Training and tracking ] Well, if you prefer to train your own network, the process is slightly more involved (but also more fun).

  1. Prerequisites: GPU, CUDA drivers, cuDNN, Matlab (we used 2015b), MatConvNet (we used v1.0-beta20).
  2. Clone the repository.
  3. Follow these step-by-step instructions, which will help you generate a curated dataset compatible with the rest of the code.
  4. If you did not generate your own, download the imdb_video.mat (6.7GB) with all the metadata and the dataset stats.
  5. Go to siam-fc/training/ and remove the trailing .example from env_paths.m.example, startup.m.example and run_experiment.m.example, editing the files as appropriate.
  6. siam-fc/training/run_experiment.m is the entry point to start training. Default hyper-params are at the start of experiment.m and can be overwritten by custom ones specified in run_experiment.m (see the sketch after this list).
  7. By default, training plots are saved in siam-fc/training/data/. When you are happy, grab a network snapshot (net-epoch-X.mat) and save it somewhere convenient to use it for tracking.
  8. Go to point 4. of Tracking only and enjoy the result of the labour of your own GPUs!
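For reference, here is a minimal sketch of how training can be launched and how the default hyper-parameters can be overridden, following the calls quoted in the issues below; the exact option names (e.g. opts.augment.grayscale) are assumptions and should be checked against the defaults at the top of experiment.m.

% Minimal sketch, mirroring the calls that appear in the issues below.
cd training;
startup;
tmp = load('/path/to/imdb_video.mat');      % metadata from step 3 or 4
imdb_video = tmp.imdb_video;
opts = struct();
opts.augment.grayscale = 0.25;              % e.g. to reproduce the color+gray model
experiment(imdb_video, opts);               % run_experiment(imdb_video) wraps this call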

siamese-fc's People

Contributors

bertinetto, shi-yan, zhangliliang


siamese-fc's Issues

Suffering from problems while implementing the algorithm

Hi, Luca Bertinetto. I am working on something similar and just found that you've done a great job in this regard. I've read the paper and watched the video (one of your demos is exactly what I am working on). As far as I know, one of the advantages of your model is that it avoids repeated computation by being fully convolutional. I am trying to incorporate your idea into my network, but I am facing some challenges, so I would like to ask for your opinion.

When tracking the object, we find the location of the maximum score and multiply it by the stride. The first problem is that we can't estimate scores for all locations in the next frame, because some strides in the network are larger than 1, so scores for some locations cannot be computed. Do you solve this problem by upsampling the score maps? The second problem is that we need a larger search image as we increase the number of downsampling layers (whether conv or pooling). In the original paper there are three layers with stride 2, which results in a total stride of 8. If we increase the stride of the network (e.g. to 32), the search image has to grow considerably. Is there any way to prevent the network from suffering from this problem?
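For context, the parameters of the released tracker (quoted in the VOT evaluation script further down this page: scoreSize 17, totalStride 8, responseUp 16) suggest one answer to the first question: the coarse score map is upsampled and the argmax is mapped back through the total stride, which recovers sub-stride displacements. A rough sketch under those assumptions:

% Rough sketch, using the parameters quoted in the VOT script below.
scoreSize   = 17;                         % raw score map is 17x17
totalStride = 8;                          % stride between score-map cells, in image pixels
responseUp  = 16;                         % upsampling factor applied to the response

responseMap = rand(scoreSize, 'single');  % stand-in for the xcorr/adjust output
responseMapUp = imresize(responseMap, responseUp, 'bicubic');

[~, idx] = max(responseMapUp(:));
[r, c] = ind2sub(size(responseMapUp), idx);
center = (size(responseMapUp, 1) + 1) / 2;
% displacement within the search crop, finer than one network stride
disp_crop = ([r c] - center) * totalStride / responseUp;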

cannot reproduce your pre-trained model

Hi, @bertinetto

I have rerun your training code for 50 epochs without any modification, but I cannot reproduce your released model for color images. Specifically, it performs worse on the OTB50 benchmark than your released color model in all aspects (BC, DEF, IV, OCC, etc.).

Could you kindly point out what I may have done wrong?

Attached are the training metrics: net-train.pdf

about negative pairs

hi,
In the paper, negative pairs were chosen with probability 0.25; however, in experiment.m,

neg_eltwise = []; % no negative pairs at the moment

Does this mean that no negative pairs are used in the experiment, or have I misunderstood?

Thanks

about the pretrained .net.mat file

Hi Luca,

are there any files that can help me use a pretrained .mat file like '2016-08-17.net.mat' for training on my own dataset?

thank you!

load_pretrained

Hey, I tried to run the pre-trained model and found that in [~,xcorrId] = find_layers_from_type(net, 'xcorr'); xcorrId is empty ([]), so the next line, xcorrId = xcorrId{1};, throws an error. I don't know if there is a problem somewhere?

Best regards

How about gaussian weights?

Different label-weight strategies have been implemented in the code. Intuitively, however, gaussian weights should work better, since we should emphasize the center of the object: weights near the center should have larger values. So how do gaussian weights perform, @bertinetto? Has anyone tried this? Or does anyone have an idea why balanced weights work better than gaussian weights?
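For what it's worth, here is a hypothetical sketch of the gaussian weighting proposed above, on a 17x17 score map; this is not from the repository (which uses balanced +/-1 weights), and the bandwidth is arbitrary:

% Hypothetical gaussian label weights on a 17x17 score map (not from the repo).
scoreSize = 17;
sigma = 2;                                   % arbitrary bandwidth, in score-map cells
[u, v] = meshgrid(1:scoreSize, 1:scoreSize);
center = (scoreSize + 1) / 2;
w = exp(-((u - center).^2 + (v - center).^2) / (2 * sigma^2));
w = w / sum(w(:));                           % normalize so the weights sum to 1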

how to deal with shared siamese during bp

Hi Bertinetto,

When you define the shared siamese net, during backpropagation the first BatchNorm that is reached sets obj.moments = [] after the pass (see matconvnet/matlab/+dagnn/BatchNorm.m, the backward() function).
The second BatchNorm then sees an empty obj.moments, which causes an error.

How did you deal with that? Did you modify the code somehow?

SiamFC-R architecture design

Hi, Luca.
SiamFC is a great framework for high-speed visual tracking. The default feature extractor is similar to AlexNet.
To be competitive with recent state-of-the-art visual trackers, it would be helpful to release a ResNet version.
Since SiamFC is a fully-convolutional network with zero-padding, it is hard to use an identity mapping.

I tried using two 3x3 max-pooling layers in the identity-mapping branch to get a compatible resolution, but performance degraded a lot (0.48 on OTB2013).

load_pretrained

Hey, I tried to use your pre-trained network and ran into issues. I started by running the tracker function and found that one network layer has type Xcorr, which does not work and raises an error. After changing it everywhere to 'xcorr' I get:
Error using xcorr (line 66)
Not enough input arguments.
Error in dagnn.DagNN.loadobj (line 27)
block = constr() ;
I have spent a lot of time on this and it still does not work.
Best regards

Training on Negative Samples

Hi! First of all, I want to say I appreciate the great work you have done in object tracking!
Could you please clarify how the siamese network is trained with negative samples, i.e. images from two different ImageNet sequences?
In your paper, you mention something like,

We employ a discriminative approach, training the network on positive and
negative pairs and adopting the logistic loss

In line 178 of experiment.m,
neg_eltwise = []; % no negative pairs at the moment
you specify that no negative pairs are used at the moment.

Is there a reason why negative pairs were not included during training? Thanks in advance!

on model update

Hi,

One interesting aspect of your work is that no model update is employed during tracking. As the paper states: "We found that updating(the feature representation of) the exemplar online through simple strategies, such as linear interpolation, does not gain much performance and thus we keep it fixed."

So my question is,

  1. How did you conduct your experiments on model update? Can you elaborate on that?
  2. Specifically, how did you extract the features of the target in later frames that are used for the model update?
  3. Are there any numerical results compared to not using a model update?

Thanks
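For reference, the "linear interpolation" mentioned in the quoted sentence would amount to an exponential moving average of the exemplar features. A hedged sketch (feature sizes and the learning rate are illustrative; the released tracker keeps the exemplar fixed):

% Hedged sketch of the exemplar update the paper mentions but does not use.
z_features     = rand(6, 6, 256, 'single');  % stand-in for the initial exemplar features
z_features_new = rand(6, 6, 256, 'single');  % stand-in for features re-extracted at the new target position
update_lr = 0.01;                            % arbitrary interpolation factor
z_features = (1 - update_lr) * z_features + update_lr * z_features_new;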

error run_tracker

Hi @bertinetto

I got the following error:

run_tracker('demo-sequences/vot15-bag', true);
Warning: No stats found at /path/to/ILSVRC15-VID/stats.mat
In tracker at 49
In run_tracker at 11
No public field numChannels exists for class dagnn.BatchNorm.

Error in dagnn.Layer/load (line 195)
obj.(f) = s.(f) ;

Error in dagnn.DagNN.loadobj (line 17)
block.load(struct(s.layers(l).block)) ;

Error in load_pretrained (line 25)
net = dagnn.DagNN.loadobj(net);

Error in tracker (line 53)
net_z = load_pretrained([p.net_base_path p.net], p.gpus);

Error in run_tracker (line 11)
tracker(params);

About xCorr layer

function outputs = forward(obj, inputs, params)
        assert(numel(inputs) == 2, 'two inputs are needed');

        z = inputs{1}; % exemplar
        x = inputs{2}; % instance (search region)

        assert(ndims(z) == ndims(x), 'z and x have different number of dimensions');
        assert(size(z,1) <= size(x,1), 'exemplar z has to be smaller than instance x');
        
        
        % c -> channels, b -> batch
        [wx,hx,cx,bx] = size(x);
        x = reshape(x, [wx,hx,cx*bx,1]);
        o = vl_nnconv(x, z, []);
        [wo,ho,co,bo] = size(o);
        assert(co==bx);
        outputs{1} = reshape(o, [wo,ho,bo,co]);
    end

Before calling vl_nnconv(x, z, []) to compute the score, you reshape x to [wx,hx,cx*bx,1]; after this operation the shape of x no longer matches z, so I am confused about how vl_nnconv can still operate correctly.
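One possible reading, assuming MatConvNet's filter-group behaviour (filters with fewer channels than the input are applied group-wise): the reshape stacks all batch elements along the channel dimension so that a single vl_nnconv call correlates each exemplar with its own search region. A loop-based sketch of what that call is assumed to compute:

% Loop-based sketch of what the grouped vl_nnconv call above is assumed to compute.
% z: [wz, hz, c, b] exemplar features; x: [wx, hx, c, b] instance features (before the reshape).
[wx, hx, c, b] = size(x);
scores = cell(1, b);
for k = 1:b
    % correlate the k-th exemplar only with the k-th search region
    scores{k} = vl_nnconv(x(:,:,:,k), z(:,:,:,k), []);
end
o = cat(4, scores{:});   % [wo, ho, 1, b], matching the reshaped output above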

transfer standard vot format to the suitable format

% Convert sequences from the standard VOT layout to the format expected by the tracker.
base_path = '/home/qiangwang/Desktop/vot/vot15';
goal_path = '/home/qiangwang/Desktop/vot/vot15_SiameseFC';

dirs = dir(base_path);
videos_name = {dirs.name};
videos_name(strcmp('.', videos_name) | strcmp('..', videos_name) | ...
    ~[dirs.isdir]) = [];

mkdir(goal_path);
for video = videos_name
    disp(video)
    mkdir([goal_path '/vot15_' video{1}]);
    mkdir([goal_path '/vot15_' video{1} '/imgs']);
    copyfile([base_path '/' video{1} '/*.jpg'], [goal_path '/vot15_' video{1} '/imgs']);
    copyfile([base_path '/' video{1} '/groundtruth.txt'], [goal_path '/vot15_' video{1} '/groundtruth.txt']);
    n = numel(dir([base_path '/' video{1} '/*.jpg']));
    txt_name = ['vot15_' video{1} 'frames.txt'];
    fid = fopen([goal_path '/vot15_' video{1} '/' txt_name], 'w');
    fprintf(fid, '%d,%d', 1, n);
    fclose(fid);
end

Poor tracking performance

I've successfully installed your program, but the tracking performance is poor. I'm pretty confident this is just an error somewhere in my configuration. Perhaps the default hyperparameters in tracker.m don't match the pretrained network I'm using?

I am using the pretrained weights in 2016-8-17.net.mat. I've also tried adding the stats file ILSVRC15.stats.mat to the configuration, but this didn't seem to affect the behavior of the tracker. I'm using matlab r2017b and I installed matconvnet from its github repo today; I presume that's the most recent version (v1.0-beta25).

Specifically, the tracker tends to drift to the left in most of the videos, which seems like a strange and non-random characteristic.

Any help here is appreciated. Thanks so much for making this work public.

Different instanceSize during training and testing

Hi. Thanks for sharing such a good code for studying.

I have a problem when reading your code.

In the testing procedure, instanceSize is set to 255. However, during training, instanceSize is set to 255 - 2*8.

Why are they different? And what is the meaning of subtracting 2*8 during training?

Looking forward to your response.

what is deep-one-shot?

Hi Bertinetto,
In the file env_paths_training.m there is a line of code that reads "addpath(genpath('/path/to/deep-one-shot/util'));". I don't know what deep-one-shot is. Is this line necessary? Thank you.

Multi-object tracking?

Thanks for your work. I'm interested in whether your code can be used to track multiple objects at the same time. The demo videos provided all show single-object tracking.

adjust layer

Nice work! But I have a question: why does the siamese network add the adjust layer after xcorr? f*xcorr+b seems to have no effect on the position of the maximum.
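As a quick sanity check of that claim (for a positive scale f), an affine map does not move the location of the maximum; presumably the adjust layer matters for the range of the scores fed to the logistic loss during training rather than for the argmax at test time:

% Quick check: an affine map f*S + b with f > 0 does not move the argmax.
S = randn(17, 17, 'single');   % stand-in for a raw xcorr score map
f = 1e-3;  b = -2;             % arbitrary positive scale and bias
[~, i1] = max(S(:));
[~, i2] = max(f * S(:) + b);
assert(i1 == i2);              % same maximum location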

About exemplar image

Why is the exemplar image larger than the true target? Obviously, this brings extra background into the exemplar image besides the target in the first frame. Would it affect the convergence of the siamese network? Why not just take the true target region in the first frame as the exemplar image?

net-train graph issue

Hey, thanks a lot for your work. I followed your instructions and tried to produce my own model for tracking. However, the training graph looks really weird, since errmax is almost 0 from the start of training:
net-train.pdf. I trained this model with around 3500 frames of video. Just wondering if you have any clue about the training.

Reference to non-existent field 'dilate'

I am using matlab 2017a and matconvnet-1.0-beta20. When I execute run_experiment, after loading imdb_video, I encounter this problem:

loading imdb video...
construct network
Reference to non-existent field 'dilate'.

Error in vl_simplenn_display (line 82)
      ks = (ks - 1) .* ly.dilate + 1 ;

Error in vid_create_net>add_block (line 60)
info = vl_simplenn_display(net) ;

Error in vid_create_net>modified_alexnet (line 141)
    net = add_block(net, opts, '2', 5, 5, 48, 256, strides(3), 0) ;

Error in vid_create_net (line 28)
    net = modified_alexnet(struct(), opts) ;

Error in make_siameseFC (line 22)
    branch = vid_create_net(...

Error in experiment>make_net (line 136)
    net = make_siameseFC(opts);

Error in experiment (line 69)
    net = make_net(opts);

Error in run_experiment (line 17)
	experiment(imdb_video, opts);

I have tried matconvnet-1.0-beta20, matconvnet-1.0-beta21 and matconvnet-1.0-beta24, and all of them end with the same problem.
How can I fix it?

What if the input exemplar image is smaller than 127x127?

Hi, the tracker resizes the exemplar image to 127x127 and the search region to 255x255; however, if I want to track a small object (e.g. smaller than 32x32), do you think that is possible? Are there any hints on modifying the tracking code? Thanks.

Step 3 modification

The folder structure for Annotations also needs to be changed along with the Data folder structure. In clear terms, we need 'Annotations/VID/train/{a,b,c,d,e}' along with 'Data/VID/train/{a,b,c,d,e}'.

Broken links

Hello,
Thanks for making this work public. I just wanted to ask about the broken links in the repository: can they be restored, or have they been removed?

Specifically I am talking about the links mentioned in this line:
"Run vid_setup_data.m to generate your own imdb_video.mat. Otherwise download the one we have already created - here for the one used for the ECCV'16 SiamFC, here for the one used for the CVPR'17 CFNet. 7bis. (only for CVPR'17 CFNet code) Add a field .set "

Thanks,
Abbas

about tracking speed

What determines the speed of siamese-fc? I notice that SiamFC-ResNet still runs at 25 fps (as reported in the ECCV 2016 paper), but we all know that ResNet has far more parameters than AlexNet.

are all training pairs positive?

Recently I have been converting the matconvnet training code to a pytorch version, but I have some questions.
I find that in "training/vid_get_random_batch.m" the labels are all 1, because only positive pairs are considered. But in "training/experiment.m", in the function get_batch, the label_inputs always have a gaussian-like shape (the center is 1 and the rest is 0, etc.). So this might train the network to output a gaussian-shaped score map all the time.
Training it with my pytorch code, I also find that the loss stays constant at convergence.
[training-loss image]

I guess this might be because the label shape is constant and the output is trained towards a constant.
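For reference, the ECCV'16 paper defines the label at each score-map position as +1 if its distance from the centre, measured in image pixels via the network stride, is within a radius R, and -1 otherwise, with weights that balance the two classes. A hedged MATLAB sketch of that construction (sizes and radius are illustrative):

% Hedged sketch of the paper's +/-1 label map with class-balancing weights.
scoreSize   = 17;         % illustrative score-map size
totalStride = 8;          % network stride
R = 16;                   % radius in image pixels (illustrative)
[u, v] = meshgrid(1:scoreSize, 1:scoreSize);
center = (scoreSize + 1) / 2;
dist = totalStride * sqrt((u - center).^2 + (v - center).^2);
labels = ones(scoreSize, 'single');
labels(dist > R) = -1;
% balance positives and negatives so each class contributes equally to the loss
weights = zeros(scoreSize, 'single');
weights(labels ==  1) = 0.5 / sum(labels(:) ==  1);
weights(labels == -1) = 0.5 / sum(labels(:) == -1);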

Error during training step when running run_experiment(imdb_video)

Hi @bertinetto and other siamese-fc users,
I am trying to train the network by following the instructions.
However, I get the following error when I run run_experiment(imdb_video):

>> run_experiment(imdb_video.imdb_video);
construct network

ly = 

  struct with fields:

            type: 'conv'
            name: 'conv1'
         weights: {[11×11×3×96 single]  [96×1 single]}
          stride: 2
             pad: 0
    learningRate: [1 2]
     weightDecay: [1 0]
            opts: {'CudnnWorkspaceLimit'  [1.0737e+09]}

Reference to non-existent field 'dilate'.

Error in vl_simplenn_display (line 82)
      ks = (ks - 1) .* ly.dilate + 1 ;

Error in vid_create_net>add_block (line 60)
info = vl_simplenn_display(net) ;

Error in vid_create_net>modified_alexnet (line 141)
    net = add_block(net, opts, '2', 5, 5, 48, 256, strides(3), 0) ;

Error in vid_create_net (line 28)
    net = modified_alexnet(struct(), opts) ;

Error in make_siameseFC (line 22)
    branch = vid_create_net(...

Error in experiment>make_net (line 137)
    net = make_siameseFC(opts);

Error in experiment (line 69)
    net = make_net(opts);

Error in run_experiment (line 17)
	experiment(imdb_video, opts);

I understand the error is raised because there is no dilate property in the layers. However, how do I fix this?
Should I simply change line 82 in vl_simplenn_display to
ks = (ks - 1) + 1 ;
by removing the dilate term? Or does it have something to do with the version of MatConvNet that I am using?

Thanks,
Jumabek
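For anyone hitting the same error: one hypothetical workaround (untested against this repository) is to give every conv layer the default dilation of 1 that vl_simplenn_display expects, before the network is displayed, rather than editing MatConvNet itself:

% Hypothetical workaround (untested): add the default dilation vl_simplenn_display expects.
for i = 1:numel(net.layers)
    if strcmp(net.layers{i}.type, 'conv') && ~isfield(net.layers{i}, 'dilate')
        net.layers{i}.dilate = 1;   % dilation of 1 leaves (ks - 1) .* dilate + 1 == ks
    end
end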

Error in training phase

I tried to train the network but I am getting an error.

train: epoch 01:   1/5985:Error using vl_nnbnorm
The MOMENTS size does not match the DATA depth.

Error in dagnn.BatchNorm/backward (line 29)
        vl_nnbnorm(inputs{1}, params{1}, params{2}, derOutputs{1}, ...

Error in dagnn.Layer/backwardAdvanced (line 120)
      [derInputs, derParams] = obj.backward ...

Error in dagnn.DagNN/eval (line 117)
  obj.layers(l).block.backwardAdvanced(obj.layers(l)) ;

Error in cnn_train_dag>processEpoch (line 253)
      net.eval(inputs, params.derOutputs, 'holdOn', s < params.numSubBatches) ;

Error in cnn_train_dag (line 105)
    [net, state] = processEpoch(net, state, params, 'train') ;

Error in experiment (line 102)
    [net, stats] = cnn_train_dag(net, imdb, batch_fn, opts.train);

I use matconvnet-1.0-beta25.

Can any help be provided?

Thanks

configurations for duplicating color+gray model

Hi,

As the paper states, for the OTB benchmark, 25% of color images are randomly converted to grayscale. The pretrained model is named color+gray on the project homepage:
http://www.robots.ox.ac.uk/~luca/stuff/siam-fc_nets/2016-08-17_gray025.net.mat.

Currently, I am trying to replicate training this model using configurations like:

    opts.augment.grayscale = 0.25

All other configurations remain untouched. But somehow, after training for 50 epochs, I only get AUC 0.577 using 3 scales and AUC 0.584 using 5 scales on the OTB cvpr13 test set.

However, using the pretrained model, I get AUC 0.582 using 3 scales and AUC 0.605 using 5 scales.

Is there anything I am missing?

evaluate on VOT benchmark

Hi, luca

Recently, I have been trying to reproduce your results on the VOT2016 benchmark. Using the provided result files, the computed EAO is 0.2905.

However, evaluating the pretrained color model with 3 search scales using the VOT benchmark toolkit myself, I only get an EAO of 0.2247.

Below is the evaluation script. Is there anything I am missing?

function sfc_vot
    % *************************************************************
    % VOT: Always call exit command at the end to terminate Matlab!
    % *************************************************************
    cleanup = onCleanup(@() exit() );

    % *************************************************************
    % VOT: Set random seed to a different value every time.
    % *************************************************************
    RandStream.setGlobalStream(RandStream('mt19937ar', 'Seed', sum(clock)));

    % *************************************************************
    % SFC: Set tracking parameters
    % *************************************************************
    p.numScale = 3;
    p.scaleStep = 1.0375;
    p.scalePenalty = 0.9745;
    p.scaleLR = 0.59; % damping factor for scale update
    p.responseUp = 16; % upsampling the small 17x17 response helps with the accuracy
    p.windowing = 'cosine'; % to penalize large displacements
    p.wInfluence = 0.176; % windowing influence (in convex sum)
    p.net = '2016-08-17.net.mat';
    %% execution, visualization, benchmark
    p.gpus = 1;
    p.fout = -1;
    %% Params from the network architecture, have to be consistent with the training
    p.exemplarSize = 127;  % input z size
    p.instanceSize = 255;  % input x size (search region)
    p.scoreSize = 17;
    p.totalStride = 8;
    p.contextAmount = 0.5; % context amount for the exemplar
    p.subMean = false;
    %% SiamFC prefix and ids
    p.prefix_z = 'a_'; % used to identify the layers of the exemplar
    p.prefix_x = 'b_'; % used to identify the layers of the instance
    p.prefix_join = 'xcorr';
    p.prefix_adj = 'adjust';
    p.id_feat_z = 'a_feat';
    p.id_score = 'score';
% -------------------------------------------------------------------------------------------------

    startup;

    % Get environment-specific default paths.
    p = env_paths_tracking(p);
    % Load ImageNet Video statistics
    if exist(p.stats_path,'file')
        stats = load(p.stats_path);
    else
        warning('No stats found at %s', p.stats_path);
        stats = [];
    end
    % Load two copies of the pre-trained network
    net_z = load_pretrained([p.net_base_path p.net], p.gpus);
    net_x = load_pretrained([p.net_base_path p.net], []);

    % Divide the net in 2
    % exemplar branch (used only once per video) computes features for the target
    remove_layers_from_prefix(net_z, p.prefix_x);
    remove_layers_from_prefix(net_z, p.prefix_join);
    remove_layers_from_prefix(net_z, p.prefix_adj);
    % instance branch computes features for search region x and cross-correlates with z features
    remove_layers_from_prefix(net_x, p.prefix_z);
    zFeatId = net_z.getVarIndex(p.id_feat_z);
    scoreId = net_x.getVarIndex(p.id_score);
    
    % **********************************
    % VOT: Get initialization data
    % **********************************
    [handle, first_image, region] = vot('rectangle');

    % If the provided region is a polygon ...
    if numel(region) > 4
      x1 = round(min(region(1:2:end)));
      x2 = round(max(region(1:2:end)));
      y1 = round(min(region(2:2:end)));
      y2 = round(max(region(2:2:end)));
      region = round([x1, y1, x2 - x1, y2 - y1]);
    else
      region = round([round(region(1)), round(region(2)), ... 
                      round(region(1) + region(3)) - round(region(1)), ...
                      round(region(2) + region(4)) - round(region(2))]);
    end;

    irect = region
    targetPosition = [irect(2) + (1 + irect(4)) / 2 irect(1) + (1 + irect(3)) / 2];
    targetSize = [irect(4) irect(3)];

    startFrame = 1;
    % get the first frame of the video
    im = gpuArray(single(imread(first_image)));
    % if grayscale repeat one channel to match filters size
    if(size(im, 3)==1)
        im = repmat(im, [1 1 3]);
    end
    % get avg for padding
    avgChans = gather([mean(mean(im(:,:,1))) mean(mean(im(:,:,2))) mean(mean(im(:,:,3)))]);

    wc_z = targetSize(2) + p.contextAmount*sum(targetSize);
    hc_z = targetSize(1) + p.contextAmount*sum(targetSize);
    s_z = sqrt(wc_z*hc_z);
    scale_z = p.exemplarSize / s_z;
    % initialize the exemplar
    [z_crop, ~] = get_subwindow_tracking(im, targetPosition, [p.exemplarSize p.exemplarSize], [round(s_z) round(s_z)], avgChans);
    if p.subMean
        z_crop = bsxfun(@minus, z_crop, reshape(stats.z.rgbMean, [1 1 3]));
    end
    d_search = (p.instanceSize - p.exemplarSize)/2;
    pad = d_search/scale_z;
    s_x = s_z + 2*pad;
    % arbitrary scale saturation
    min_s_x = 0.2*s_x;
    max_s_x = 5*s_x;

    switch p.windowing
        case 'cosine'
            window = single(hann(p.scoreSize*p.responseUp) * hann(p.scoreSize*p.responseUp)');
        case 'uniform'
            window = single(ones(p.scoreSize*p.responseUp, p.scoreSize*p.responseUp));
    end
    % make the window sum 1
    window = window / sum(window(:));
    scales = (p.scaleStep .^ ((ceil(p.numScale/2)-p.numScale) : floor(p.numScale/2)));
    % evaluate the offline-trained network for exemplar z features
    net_z.eval({'exemplar', z_crop});
    z_features = net_z.vars(zFeatId).value;
    z_features = repmat(z_features, [1 1 1 p.numScale]);

    % start tracking
    i = startFrame;
    while true
        % **********************************
        % VOT: Get next frame
        % **********************************
        [handle, image] = handle.frame(handle);

        if isempty(image)
          break;
        end;

        if i>startFrame
            % load new frame on GPU
            im = gpuArray(single(imread(image)));
            % if grayscale repeat one channel to match filters size
            if(size(im, 3)==1)
              im = repmat(im, [1 1 3]);
            end
            scaledInstance = s_x .* scales;
            scaledTarget = [targetSize(1) .* scales; targetSize(2) .* scales];
            % extract scaled crops for search region x at previous target position
            x_crops = make_scale_pyramid(im, targetPosition, scaledInstance, p.instanceSize, avgChans, stats, p);
            % evaluate the offline-trained network for exemplar x features
            [newTargetPosition, newScale] = tracker_eval(net_x, round(s_x), scoreId, z_features, x_crops, targetPosition, window, p);
            targetPosition = gather(newTargetPosition);
            % scale damping and saturation
            s_x = max(min_s_x, min(max_s_x, (1-p.scaleLR)*s_x + p.scaleLR*scaledInstance(newScale)));
            targetSize = (1-p.scaleLR)*targetSize + p.scaleLR*[scaledTarget(1,newScale) scaledTarget(2,newScale)];
        else
            % at the first frame output position and size passed as input (ground truth)
        end
        i = i + 1;
        rectPosition = [targetPosition([2,1]) - targetSize([2,1])/2, targetSize([2,1])];
        % output bbox in the original frame coordinates
        oTargetPosition = targetPosition; % .* frameSize ./ newFrameSize;
        oTargetSize = targetSize; % .* frameSize ./ newFrameSize;
        region  = [oTargetPosition([2,1]) - oTargetSize([2,1])/2, oTargetSize([2,1])];

        % **********************************
        % VOT: Report position for frame
        % **********************************
        handle = handle.report(handle, region);
    end
      
    % **********************************
    % VOT: Output the results
    % **********************************
    handle.quit(handle);
end

Why keep the original ratio of the object when generating the exemplar

Hi,

When generating the exemplar for the tracked object, the code tries to add 50% background as context information while keeping the original aspect ratio of the object.

In my opinion, this strategy might be harmful, especially when the aspect ratio of the tracked object is very small or very large (e.g. pedestrians, whose aspect ratios are around 0.4). In this situation, the strategy might bring too much background (possibly 70% or more) into the exemplar. The tracker might then pay more attention to the background instead of the tracked object, which might hurt its robustness.

In my opinion, I prefer the brute-force strategy adopted in R-CNN: it ignores the aspect ratio of the object, resizes the object to a fixed size, and then adds a fixed amount of background as context.

Could you kindly give some comments for comparing these two strategies?
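As a quick numerical check of the concern above, using the context formula quoted in the VOT evaluation script earlier on this page (wc_z = w + 0.5*(w+h), hc_z = h + 0.5*(w+h), s_z = sqrt(wc_z*hc_z)), a 40x100 pedestrian-shaped box does indeed end up with roughly 80% background in the exemplar crop:

% Quick check of the context formula from the tracking code quoted above.
w = 40;  h = 100;                     % pedestrian-like box, aspect ratio 0.4
context = 0.5;                        % p.contextAmount
wc_z = w + context * (w + h);         % 110
hc_z = h + context * (w + h);         % 170
s_z = sqrt(wc_z * hc_z);              % ~136.7: side of the square crop before resizing to 127x127
background_fraction = 1 - (w * h) / s_z^2   % ~0.79, i.e. roughly 79% background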

A little problem in save_crops.m

Hi, I have a question about padding. In save_crops.m the code calculates the padded coordinates xmin, xmax, ymin and ymax, but only left_pad and top_pad are applied. I think right_pad and bottom_pad should also be applied to xmax and ymax respectively. What do you think?

From save_crops.m, around line 91:
%check out-of-bounds coordinates, and set them to avg_chans
context_xmin = round(pos(2) - c(2));
context_xmax = context_xmin + sz(2) - 1;
context_ymin = round(pos(1) - c(1));
context_ymax = context_ymin + sz(1) - 1;
left_pad = double(max(0, 1-context_xmin));
top_pad = double(max(0, 1-context_ymin));
right_pad = double(max(0, context_xmax - im_sz(2)));
bottom_pad = double(max(0, context_ymax - im_sz(1)));

context_xmin = context_xmin + left_pad;
context_xmax = context_xmax + left_pad;
context_ymin = context_ymin + top_pad;
context_ymax = context_ymax + top_pad;

Unable to obtain the full original ImageNet Video dataset (the 86 GB archive).

Hi,

I am trying to create my own dataset from the ImageNet Video dataset mentioned. However, I cannot find a way to download the 86 GB archive on the ImageNet website: the 2015 competition is over, and after registering I could not find a link to the archive.

May I ask whether there is another place where I can get this 86 GB archive, or is it actually hosted somewhere on http://www.image-net.org?

Thank you!
