visualcomputinginstitute / triplet-reid Goto Github PK

View Code? Open in Web Editor NEW

761.0 56.0 216.0 584 KB

Code for reproducing the results of our "In Defense of the Triplet Loss for Person Re-Identification" paper.

Home Page: https://arxiv.org/abs/1703.07737

License: MIT License

Python 98.31% Shell 1.69%

triplet-reid's Introduction

Triplet-based Person Re-Identification

Code for reproducing the results of our In Defense of the Triplet Loss for Person Re-Identification paper.

We provide the following things:

The exact pre-trained weights for the TriNet model as used in the paper, including some rudimentary example code for using it to compute embeddings. See section Pretrained models.
A clean re-implementation of the training code that can be used for training your own models/data. See section Training your own models.
A script for evaluation which computes the CMC and mAP of embeddings in an HDF5 ("new .mat") file. See section Evaluating embeddings.
A list of independent re-implementations.

If you use any of the provided code, please cite:

@article{HermansBeyer2017Arxiv,
  title       = {{In Defense of the Triplet Loss for Person Re-Identification}},
  author      = {Hermans*, Alexander and Beyer*, Lucas and Leibe, Bastian},
  journal     = {arXiv preprint arXiv:1703.07737},
  year        = {2017}
}

Pretrained TensorFlow models

For convenience, we provide the pretrained weights for our TriNet TensorFlow model, trained on Market-1501 using the code from this repository and the settings form our paper. The TensorFlow checkpoint can be downloaded in the release section.

Pretrained Theano models

We provide the exact TriNet model used in the paper, which was implemented in Theano and Lasagne.

As a first step, download either of these pre-trained models:

TriNet trained on MARS (md5sum: 72fafa2ee9aa3765f038d06e8dd8ef4b)
TriNet trained on Market1501 (md5sum: 5353f95d1489536129ec14638aded3c7)

Next, create a file (files.txt) which contains the full path to the image files you want to embed, one filename per line, like so:

/path/to/file1.png
/path/to/file2.jpg

Finally, run the trinet_embed.py script, passing both the above file and the pretrained model file you want to use, like so:

python trinet_embed.py files.txt /path/to/trinet-mars.npz

And it will output one comma-separated line for each file, containing the filename followed by the embedding, like so:

/path/to/file1.png,-1.234,5.678,...
/path/to/file2.jpg,9.876,-1.234,...

You could for example redirect it to a file for further processing:

python trinet_embed.py files.txt /path/to/trinet-market1501.npz >embeddings.csv

You can now do meaningful work by comparing these embeddings using the Euclidean distance, for example, try some K-means clustering!

A couple notes:

The script depends on Theano, Lasagne and OpenCV Python (pip install opencv-python) being correctly installed.
The input files should be crops of a full person standing upright, and they will be resized to 288x144 before being passed to the network.

Training your own models

If you want more flexibility, we now provide code for training your own models. This is not the code that was used in the paper (which became a unusable mess), but rather a clean re-implementation of it in TensorFlow, achieving about the same performance.

This repository requires at least version 1.4 of TensorFlow.
The TensorFlow code is Python 3 only and won't work in Python 2!

💥 🔥 ❗ If you train on a very different dataset, don't forget to tune the learning-rate and schedule ❗ 🔥 💥

If the dataset is much larger, or much smaller, you might need to train much longer or much shorter. Market1501, MARS (in tracklets) and DukeMTMC are all roughly similar in size, hence the same schedule works well for all. CARS196, for example, is much smaller and thus needs a much shorter schedule.

Defining a dataset

A dataset consists of two things:

An image_root folder which contains all images, possibly in sub-folders.
A dataset .csv file describing the dataset.

To create a dataset, you simply create a new .csv file for it of the following form:

identity,relative_path/to/image.jpg

Where the identity is also often called PID (Person IDentity) and corresponds to the "class name", it can be any arbitrary string, but should be the same for images belonging to the same identity.

The relative_path/to/image.jpg is relative to aforementioned image_root.

Training

Given the dataset file, and the image_root, you can already train a model. The minimal way of training a model is to just call train.py in the following way:

python train.py \
    --train_set data/market1501_train.csv \
    --image_root /absolute/image/root \
    --experiment_root ~/experiments/my_experiment

This will start training with all default parameters. We recommend writing a script file similar to market1501_train.sh where you define all kinds of parameters, it is highly recommended you tune hyperparameters such as net_input_{height,width}, learning_rate, decay_start_iteration, and many more. See the top of train.py for a list of all parameters.

As a convenience, we store all the parameters that were used for a run in experiment_root/args.json.

Pre-trained initialization

If you want to initialize the model using pre-trained weights, such as done for TriNet, you need to specify the location of the checkpoint file through --initial_checkpoint.

For most common models, you can download the checkpoints provided by Google here. For example, that's where we get our ResNet50 pre-trained weights from, and what you should pass as second parameter to market1501_train.sh.

Example training log

This is what a healthy training on Market1501 looks like, using the provided script:

The Histograms tab in tensorboard also shows some interesting logs.

Interrupting and resuming training

Since training can take quite a while, interrupting and resuming training is important. You can interrupt training at any time by hitting Ctrl+C or sending SIGINT (2) or SIGTERM (15) to the training process; it will finish the current batch, store the model and optimizer state, and then terminate cleanly. Because of the args.json file, you can later resume that run simply by running:

python train.py --experiment_root ~/experiments/my_experiment --resume

The last checkpoint is determined automatically by TensorFlow using the contents of the checkpoint file.

Performance issues

For some reason, current TensorFlow is known to have inconsistent performance and can sometimes become very slow. The current only known workaround is to install google's performance-tools and preload tcmalloc:

env LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 python train.py ...

This fixes the issues for us most of the time, but not always. If you know more, please open an issue and let us know!

Out of memory

The setup as described in the paper requires a high-end GPU with a lot of memory. If you don't have that, you can still train a model, but you should either use a smaller network, or adjust the batch-size, which itself also adjusts learning difficulty, which might change results.

The two arguments for playing with the batch-size are --batch_p which controls the number of distinct persons in a batch, and --batch_k which controls the number of pictures per person. We usually lower batch_p first.

Custom network architecture

TODO: Documentation. It's also pretty straightforward.

The core network

The network head

Computing embeddings

Given a trained net, one often wants to compute the embeddings of a set of pictures for further processing. This can be done with the embed.py script, which can also serve as inspiration for using a trained model in a larger program.

The following invocation computes the embeddings of the Market1501 query set using some network:

python embed.py \
    --experiment_root ~/experiments/my_experiment \
    --dataset data/market1501_query.csv \
    --filename market1501_query_embeddings.h5

The embeddings will be written into the HDF5 file at ~/experiments/my_experiment/test_embeddings.h5 as dataset embs. Most relevant settings are automatically loaded from the experiment's args.json file, but some can be overruled on the commandline.

If the training was performed using data augmentation (highly recommended), one can invest a some more time in the embedding step in order to compute augmented embeddings, which are usually more robust and perform better in downstream tasks.

The following is an example that computes extensively augmented embeddings:

python embed.py \
    --experiment_root ~/experiments/my_experiment \
    --dataset data/market1501_query.csv \
    --filename market1501_query_embeddings_augmented.h5 \
    --flip_augment \
    --crop_augment five \
    --aggregator mean

This will take 10 times longer, because we perform a total of 10 augmentations per image (2 flips times 5 crops). All individual embeddings will also be stored in the .h5 file, thus the disk-space also increases. One question is how the embeddings of the various augmentations should be combined. When training using the euclidean metric in the loss, simply taking the mean is what makes most sense, and also what the above invocation does through --aggregator mean. But if one for example trains a normalized embedding (by using a _normalize head for instance), The embeddings must be re-normalized after averaging, and so one should use --aggregator normalized_mean. The final combined embedding is again stored as embs in the .h5 file, as usual.

Evaluating embeddings

Once the embeddings have been generated, it is a good idea to compute CMC curves and mAP for evaluation. With only minor modifications, the embedding .h5 files can be used in the official Market1501 MATLAB evaluation code, which is exactly what we did for the paper.

For convenience, and to spite MATLAB, we also implemented our own evaluation code in Python. This code additionally depends on scikit-learn, and still uses TensorFlow only for re-using the same metric implementation as the training code, for consistency. We verified that it produces the exact same results as the reference implementation.

The following is an example of evaluating a Market1501 model, notice it takes a lot of parameters 😄:

./evaluate.py \
    --excluder market1501 \
    --query_dataset data/market1501_query.csv \
    --query_embeddings ~/experiments/my_experiment/market1501_query_embeddings.h5 \
    --gallery_dataset data/market1501_test.csv \
    --gallery_embeddings ~/experiments/my_experiment/market1501_test_embeddings.h5 \
    --metric euclidean \
    --filename ~/experiments/my_experiment/market1501_evaluation.json

The only thing that really needs explaining here is the excluder. For some datasets, especially multi-camera ones, one often excludes pictures of the query person from the gallery (for that one person) if it is taken from the same camera. This way, one gets more of a feeling for across-camera performance. Additionally, the Market1501 dataset contains some "junk" images in the gallery which should be ignored too. All this is taken care of by excluders. We provide one for the Market1501 dataset, and a diagonal one, which should be used where there is no such restriction, for example the Stanford Online Products dataset.

❗ Important evaluation NOTE ❗

The implementation of mAP computation has changed from sklearn v0.18 to v0.19. The implementation in v0.18 and earlier is exactly the same as in the official Market1501 MATLAB evaluation code, but is wrong. The implementation in v0.19 and later leads to a roughly one percentage point increase in mAP score. It is not correct to compare values across versions, and again, all values in our paper were computed by the official Market1501 MATLAB code. The evaluation code in this repository simply uses the scikit-learn code, and thus the score depends on which version of scikit-learn you are using. Unfortunately, almost no paper mentions which code-base they used and how they computed mAP scores, so comparison is difficult. Other frameworks have the same problem, but we expect many not to be aware of this.

We provide evaluation code that computes the mAP as done by the Market-1501 MATLAB evaluation script, independent of the scikit-learn version. This can be used by providing the --use_market_ap flag when running evaluate.py.

Independent re-implementations

These are the independent re-implementations of our paper that we are aware of, please send a pull-request to add more:

Open-ReID (PyTorch, MIT license)
https://github.com/huanghoujing/person-reid-triplet-loss-baseline (PyTorch, no license)
https://github.com/CoinCheung/triplet-reid-pytorch (PyTorch, Apache 2.0 license)

Not technically independent re-implementation, but open-sourced works which use this code in some way that we are aware of, and again pull-requests to add more are welcome:

https://github.com/VisualComputingInstitute/towards-reid-tracking is our own work exploring the integration of ReID and tracking (Code for the paper Towards a Principled Integration of Multi-Camera Re-Identification and Tracking through Optimal Bayes Filters)
https://github.com/cftang0827/human_recognition is a simple wrapper, combining this with an OpenCV detector. See also #47

triplet-reid's People

Contributors

Stargazers

Watchers

Forkers

yemika facegen dengshuo kirawxz 646677064 ml-lab yaokeepmoving december-boy hyzcn huixianglin deep0learning arasharchor amena6490 winper001 tpys gaojie0105 hsdong2012 zhyj3038 redhat12345 linranran ethanyhzhang xun-yang panxipeng daijucug zhfwyy sillytheif redskyymao irfanicmll zhangxujinsh trantorrepository choiyeren zouliangyu 3born sorenbouma n-mca smivv pilotbear kakoedlinnoeslovo hxl1990 xtanitfy morty2049 bnu-wangxun jianqiangq yichunher absorbguo lochappy ccrainy heiheiyy nirvanalan jackeywang777 mahfujau sunshinezhihuo rhapso crusader183 miracle-fmh pandoro maxisme cm-unit leeshd phoebe-star ked19 ims94 emad-monier zjc798719285 robinsonko wubin1836 samangel93 sunxz lucasb-eyer cupwater rjs211 wanjinchang wanggcong songbaihust garygaryry codegank apearlriverwater alexdoney johnhush liyuanyaun rahuja123 doriswzg sequenz-fang deecamp5 afcarl jereo meimeiainaonao statml ahuirecome amoliu arcelien swearos fendaq solomon1588 george-zhu wanglc2008 hariag cyfff kongyitian hitmit123

triplet-reid's Issues

Error while trying to use pretrained tensorflow model

Hey, thanks for the nice work!
I was trying to use the TF checkpoint (provided here ).
Steps used:

Create a ./files.txt in which each line is a path to an image to be embedded
Downloaded the checkpoint provided in Releases and put in './checkpoints/
Ran the following command:
python embed.py --experiment_root ./checkpoint/ --dataset ./files.txt --filename test_embeddings.h5
However, the following error comes:

raise IOError('args.json could not be found in: {}'.format(args_file))
OSError: args.json could not be found in: ./checkpoint/args.json

Could you please help on how to use the checkpoint of Market to extract features from images?

train for apparel embedding

Hi, I use this project for my apparel embedding project. Basically, the main for my project is to query similar clothes by an given cloth. I take the DeepFashion dataset as the train data. The positive samples for the triplet loss are the images under same sub directory, they are similar but not exactly the same. Now, I have already test several different structures for training, but the result is not satisfactory. To be specific, it sometimes leads to trivial solution, the embedding for each cloth is almost the same. And the active count never drop and it converges very fast at somehow high loss value. Is there any suggestion for training? Thanks in advance.

Problems wiht converting this Loss to a video-based re-id training

Hello, authors,

I'm trying to use the effective training strategies( like batch hard ) proposed in your paper to do some video-based re-id training. I noticed that in your work you use single frame as a training sample of a person. I want to use videos like N consecutive frames as the training sample of a person, however, I found it is hard to train and get a poor result.

So if this Triplet Loss is sensitive to image or video input?

low mAP as the low batch_p?

I have trained the Market1501 dataset, and only change the batch_p to 12 as the OOM of my GPU. The final evaluation result is truly low as below:
mAP: 47.96% | top-1: 65.53%
I have also evaluate the pretrained model and the result is close to the given result (mAP: 66.88% | top-1: 83.40%).
I have noticed that someone using batch_p to 18 still get the reasonable result. Is my result caused by the low batch_p?
"batch_k": 4, "batch_p": 12, "checkpoint_frequency": 1000, "crop_augment": false, "decay_start_iteration": 15000, "detailed_logs": false, "embedding_dim": 128, "experiment_root": "./experiments/my_experiment", "flip_augment": false, "head_name": "fc1024", "image_root": "/home/tensor-2/triplet-reid/Market-1501-v15.09.15", "initial_checkpoint": null, "learning_rate": 0.0003, "loading_threads": 8, "loss": "batch_hard", "margin": "soft", "metric": "euclidean", "model_name": "resnet_v1_50", "net_input_height": 256, "net_input_width": 128, "pre_crop_height": 288, "pre_crop_width": 144, "resume": false, "train_iterations": 25000, "train_set": "data/market1501_train.csv"

The training step seems normal except the convergency is later than given plot.

IndexError: boolean index did not match indexed array along dimension 1; dimension is 3368 but corresponding boolean dimension is 19732

when I followed the README.md to evaluate the embedings, i meet an error which is IndexError: boolean index did not match indexed array along dimension 1; dimension is 3368 but corresponding boolean dimension is 19732.
can you tell me how to solve it?
what's more, i am confused about these lines.

--gallery_embeddings and --query_embeddings are use the same path of h5 files?
thanks.

ValueError: Unable to configure handler 'stderr': bad argument type for built-in operation

I follow the README.md，but i can't run the train.py and
the error is ValueError: Unable to configure handler 'stderr': bad argument type for built-in operation

can you tell me how to use it? thank

Lunet (train from scatch)

Nice work. I wonder to know if it is possible to release the implementation of the proposed LuNet? Thanks.

Questions about the results of the experiment

Hi，whether data augmentation is used to get the result in your experiment?
such as Table 2 in your paper
Market-1501 SQ,
mAP :69.14%
rank-1:84.92%

why the same embedding result when two input image?

`img = cv2.imread('./test/004.jpg')
img = cv2.resize(img, (128,256))
x = np.expand_dims(img,axis=0)

sess = tf.Session()
saver = tf.train.import_meta_graph('market1501_weights/checkpoint-25000.meta')
saver.restore(sess, 'market1501_weights/checkpoint-25000')

input_x = sess.graph.get_tensor_by_name("sub:0")
emb_weights = sess.graph.get_tensor_by_name('emb/weights:0')
feature1 = sess.run(emb_weights, feed_dict={input_x: x})
print feature1`

I have tested the tensorflow code using pretrained model, but it outputs the same result when I tested it on two images. I don't know why. Are there some errors in my code?

Besides, I have tested the pytorch version, and the embedding shape is (1,2048) but your code is (1024, 128). To my knowledge, the traditional embedding features shape is like pytorch version. Is there some difference in both version?

Sensitivity to unclean datasets?

I've been doing some experiments with your batch hard triplet loss function and different architectures/datasets. On MARS I manage to reproduce the results from your paper (network seems to converge), but with many other datasets I get stuck at a loss of ~0.6931 which is softplus(0). Looking at the embedding it seems like the network starts to yield the same embeddings for all different classes.

Worth to know is that a center loss formulation works quite well for generating usable embeddings for these datasets, I've tried with me-celeb-1m (after cleaning it up), and with casia-webface.

My interpretation of these results is that the batch hard triplet loss function is really sensitive to mislabeled datasets, and it might get stuck in a local minima if the dataset contains mislabeled images. I've tried some hyperparameter tuning (e. g. changing lr and optimizer), but I haven't managed to avoid the local minimum.

Have you seen similar results in your work when experimenting with different datasets?

Performance on different dataset not good

Hello,
I used the pre-trained model given in this repository on my own image dataset to calculate embeddings and then calculated eucledian distance between the embeddings. Testing on same person with different view, postures, gives bad result(eucledian distance is high for some images and low for some). Link to sample images of a person i tested on https://drive.google.com/open?id=15QKkosOP6sRFB8xzJz8O36OXEvEfegeI.

NOTE:
I have used the pre-trained model given.

How to use pretrained tensorFlow models? (question)

Sorry to bother you, but I have no idea that the check point you listed here is about which net (resnet_v1, resnet_v1_50 or resnet_v1_100)? Thank you.

combine with re-ranking

Hi, Thanks for the code, I'm new to this subject.
In the paper, you combine with re-ranking. I add a little code to compute gallery-gallery distances, but error: OOM when allocating tensor with shape[19732,19732,128].
So could you teow you combine the code with re-ranking.
Thank you!

Data augmentation

Hello again,

I have just been looking through the project again and looking at ways to improve the accuracy of my model for facial recognition.

I am now looking at ways to generate more training data by performing some data augmentation. In your pipeline you implement an (optional) random flip and crop to the dataset but should you not append this to the dataset opposed to replacing?

Also since adding a few more images (2,469,064 cropped faces of 9491 people) my model has become a lot harder to convolve.

I realise this is kind of a broad ML question and not really todo with this repo but would love to get your input!

Thank you

Can't achieve high performance as listed in your paper.

Hi,
I carefully follow your paper, using a pre-trained ResNet50 and hard triplets mining in my implementation. And only got rank1 73% and map 54%. I have adjust many different hyper-parameters, and still cann't achieve the high performance as listed in your paper.

Is there any implementation details or tricks should I consider in my experiments?

Thanks.

Some error occured on computing embeddings

Hello!
In order to try the model out I trained the model on market-1501 on my laptop using all the default parameters, and for convenience I only went through 10 iterations(I use python3.5.4 and tensorflow1.8.0 without gpu). When I used the slightly-trained model to compute embeddings(also on market-1501), the cmd prompted an error like this: "AttributeError: 'str' object has no attribute 'get_shape' ". Could you tell me how to solve the problem? Thanks a lot.

Here is the screenshot of the error message:

Failed in training fine-grained categorization dataset CUB-200-2011

I use the same triplet loss (with BatchHard, Euclidean distance and Soft-margin) on the fine-grained categorization dataset CUB-200-2011. It aims to distinguish different species of birds (200 categorization, 5994 image in all). I know fine-grained categorization is a kind of classification task. But I want to see if it is possible to treat it as an image retrieval problem (Or Person reid).

However, when I use VGG16 (pre-trained on ImageNet) to extract features for images, and train the whole model with your triplet loss. It does not converge. All images' activations of conv5_3 are negative values, and the activations become 0 after the following relu layer. It outputs same features (the last fc layer) for different images at last.

I follow your instructions but use another dataset. The loss drops at first and then keep at 0.7. Nonzero triplets never decreases.

Problem with evauate.py

Thank you

Not working when testing on our own image dataset! [without training]

Hello,
I used the pre-trained model given in this repository on my own image dataset to calculate embeddings and then calculated co-sine similarity between the embeddings. Testing on same person with different view, postures, I am getting very less co-sine similarity. But, testing on person with different view of CUHK dataset, the co-sine similarity is very high.

NOTE:

I have used the pre-trained model given.
I have not performed any training on my dataset.

Question about evaluation

Hello, i wish to evaluate your pre-trained model with market dataset. I created the embendings of the test set in a csv file. How can i use the csv file to calculate mAP and rank-1 statistics? Do i have to convert the csv to h5? If yes then how? Thank you.

Want to try Margin Sample Mining Loss (MSML)

Hi @lucasb-eyer and @Pandoro , I want to try a different loss from arxiv.org/abs/1710.00478. According to my understanding, I only need to change the line

diff = furthest_positive - closest_negative

diff = tf.reduce_max(furthest_positive) - tf.reduce_min(closest_negative)

Is this understanding correct? Thanks in advance!

Issue with embed.py

ssur@suresure:~$ python3 /home/ssur/triplet-reid-master/embed.py --experiment_root /home/ssur/experiments/my_experiment --dataset /home/ssur/triplet-reid-master/data/market1501_query.csv --filename test_embeddings.h5
Loading args from /home/ssur/experiments/my_experiment/args.json.
Evaluating using the following parameters:
aggregator: None
batch_k: 4
batch_p: 32
batch_size: 256
checkpoint: None
checkpoint_frequency: 1000
crop_augment: None
dataset: /home/ssur/triplet-reid-master/data/market1501_query.csv
decay_start_iteration: 15000
detailed_logs: False
embedding_dim: 128
experiment_root: /home/ssur/experiments/my_experiment
filename: /home/ssur/experiments/my_experiment/test_embeddings.h5
flip_augment: False
head_name: fc1024
image_root: /home/ssur/Market-1501-v15.09.15
initial_checkpoint: None
learning_rate: 0.0003
loading_threads: 8
loss: batch_hard
margin: soft
metric: euclidean
model_name: resnet_v1_50
net_input_height: 256
net_input_width: 128
pre_crop_height: 288
pre_crop_width: 144
quiet: False
resume: False
train_iterations: 25000
train_set: /home/ssur/triplet-reid-master/data/market1501_train.csv
Traceback (most recent call last):
File "/home/ssur/triplet-reid-master/embed.py", line 249, in
main()
File "/home/ssur/triplet-reid-master/embed.py", line 160, in main
num_parallel_calls=args.loading_threads)
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 853, in map
return ParallelMapDataset(self, map_func, num_parallel_calls)
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1870, in init
super(ParallelMapDataset, self).init(input_dataset, map_func)
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1839, in init
self._map_func.add_to_graph(ops.get_default_graph())
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/framework/function.py", line 484, in add_to_graph
self._create_definition_if_needed()
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/framework/function.py", line 319, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/framework/function.py", line 336, in _create_definition_if_needed_impl
outputs = self._func(*inputs)
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1827, in tf_map_func
ret, [t.get_shape() for t in nest.flatten(ret)])
File "/home/ssur/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1827, in
ret, [t.get_shape() for t in nest.flatten(ret)])
AttributeError: 'str' object has no attribute 'get_shape'

I am trying to re-implement the code. for the training, it went well. but when I am trying to test... I am getting a problem with this code embed.py ............. please help me out.

Thank you

args.json file not available in Market-1501 TF Checkpoint Release

Please provide the 'args.json' file which was used during training in Market-1501 TF Checkpoint Release, so that 'embed.py' fiile could be used for generating embeddings.

Questions on using Inception_ResNet_v1 and test accuracy

Hi, I am new to deep learning, and thus may not understand your paper fully, hope is all right with you. I tried to implement the batch_hard using Inception_resnet_v1 and trained from scratch using market1501 dataset. The rank 1 cmc is only about 70%. I did not implement re-ranking and augmented test. Do you think this model is able to get rank 1 cmc above 80%?

The second problem I faced was that the test results fluctuate a lot. The rank 1 value can range from 60% to 70%. Can you shade some lights on the test strategy? or point me to papers online? I am using 100 identities to verify the trained model

Thanks!

Code does not utilize GPU

After running your code I noticed that it does not run in GPU even after asking it explicitly with THEANO_FLAGS=device=cuda0 python trinet_embed.py

Could you please explain why this happens?
Thank you

Unable to approach loss of less than 0.7 even when testing multiple learning rates.

I have tried many different learning rates and optimizers but I have not once seen a min loss drop below 0.69.

If I use learning_rate = 1e-2:

iter:    20, loss min|avg|max: 0.713|2.607|60.013, batch-p@3: 4.43%, ETA: 6:01:18 (0.87s/it)
iter:    40, loss min|avg|max: 0.696|1.204|21.239, batch-p@3: 5.99%, ETA: 5:58:53 (0.86s/it)
iter:    60, loss min|avg|max: 0.696|1.643|25.543, batch-p@3: 4.69%, ETA: 5:36:32 (0.81s/it)
iter:    80, loss min|avg|max: 0.695|1.679|42.339, batch-p@3: 7.03%, ETA: 5:58:01 (0.86s/it)
iter:   100, loss min|avg|max: 0.694|1.806|47.572, batch-p@3: 6.51%, ETA: 6:08:57 (0.89s/it)
iter:   120, loss min|avg|max: 0.695|1.200|21.791, batch-p@3: 4.43%, ETA: 6:14:15 (0.90s/it)
iter:   140, loss min|avg|max: 0.694|2.744|87.940, batch-p@3: 5.47%, ETA: 6:21:29 (0.92s/it)

If I use learning_rate = 1e-6:

iter:    20, loss min|avg|max: 0.741|14.827|440.151, batch-p@3: 1.04%, ETA: 6:23:26 (0.92s/it)
iter:    40, loss min|avg|max: 0.712|9.662|146.125, batch-p@3: 2.86%, ETA: 6:03:24 (0.87s/it)
iter:    60, loss min|avg|max: 0.697|3.944|100.707, batch-p@3: 4.17%, ETA: 6:10:44 (0.89s/it)
iter:    80, loss min|avg|max: 0.695|2.408|75.002, batch-p@3: 2.86%, ETA: 5:44:48 (0.83s/it)
iter:   100, loss min|avg|max: 0.694|2.272|67.504, batch-p@3: 2.86%, ETA: 6:03:45 (0.88s/it)
iter:   120, loss min|avg|max: 0.694|1.091|17.292, batch-p@3: 2.86%, ETA: 5:42:45 (0.83s/it)
iter:   140, loss min|avg|max: 0.693|1.069|15.975, batch-p@3: 5.73%, ETA: 5:46:48 (0.84s/it)
...
iter:   900, loss min|avg|max: 0.693|0.694| 0.709, batch-p@3: 2.08%, ETA: 5:15:00 (0.78s/it)
iter:   920, loss min|avg|max: 0.693|0.693| 0.701, batch-p@3: 2.34%, ETA: 5:39:12 (0.85s/it)
iter:   940, loss min|avg|max: 0.693|0.694| 0.704, batch-p@3: 5.99%, ETA: 5:46:12 (0.86s/it)
iter:   960, loss min|avg|max: 0.693|0.693| 0.705, batch-p@3: 2.86%, ETA: 5:24:59 (0.81s/it)
iter:   980, loss min|avg|max: 0.693|0.693| 0.700, batch-p@3: 3.65%, ETA: 5:39:47 (0.85s/it)
iter:  1000, loss min|avg|max: 0.693|0.693| 0.698, batch-p@3: 3.39%, ETA: 5:27:59 (0.82s/it)
iter:  1020, loss min|avg|max: 0.693|0.693| 0.700, batch-p@3: 6.51%, ETA: 5:36:38 (0.84s/it)
iter:  1040, loss min|avg|max: 0.693|0.694| 0.699, batch-p@3: 2.86%, ETA: 5:22:05 (0.81s/it)
...
iter:  1640, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 2.60%, ETA: 5:09:58 (0.80s/it)
iter:  1660, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 2.08%, ETA: 5:48:27 (0.90s/it)
iter:  1680, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 4.43%, ETA: 5:23:23 (0.83s/it)
iter:  1700, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 6.51%, ETA: 5:25:04 (0.84s/it)
iter:  1720, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 3.12%, ETA: 5:39:08 (0.87s/it)

What does this affectively mean? "Nonzero triplets never decreases" - not quite sure what that means?

I am using the vgg dataset with the file structure like this:

class_a/file.jpg
class_b/file.jpg
class_c/file.jpg
...

I set the pids, fids = [], [] like this:

classes = [path for path in os.listdir(DATA_DIR) if os.path.isdir(os.path.join(DATA_DIR, path))]
for c in classes:
    for file in glob.glob(DATA_DIR+c+"/*.jpg"):
        pids.append(c)
        fids.append(file)

where DATA_DIR is the directory of the vgg dataset.

Problem with the Tensorflow version

Hello everybody,
I'm trying to run embed.py by following step by step the readme.txt but i'm always getting an error related to the tensorflow version, if i use tensorflow==0.12 i get this error
" File "embed.py", line 153, in main
dataset = tf.data.Dataset.from_tensor_slices(data_fids)
AttributeError: 'module' object has no attribute 'data' "
and i use tensorflow==1.5 i get this error
"AttributeError: 'str' object has no attribute 'get_shape' "
someone can help me please?

Performance not as good on MARS [soln: combine by avg]

Hi, we have been doing some experiments to reproduce your results on the Market1501 and MARS dataset, and when using the exactly the same hyperparameters and training strategy in your paper, we have successfully reproduce the results on the Market1501 dataset. However, we could not reproduce the result on MARS under the exactly same settings, and the rank1 CMC is only 75. Do you have any ideas on this?
Thanks!

what is tensorflow's pretrained checkpoint output node?

i want to use pre-trained model in deploy, so, i want to transfer checkpoint to pb file, but when i use emb/biases node, there only freeze this node to pb, so, which node is the output for object tracking inference

How could I use multi-gpu

It seems that the model only runs on a single gpu no matter how many gpus are available. If the space the model takes up is more than the volume of one gpu, there would be oom error. I can train the model on a single gpu with default configuration, but once I double the batch size and use two gpus, there is oom errors. How could I use multi-gpu in this case ?

Performance on CUHK03

Hello, authors.
I was wondering, if you could provide some extra details about training on CUHK03. There is third-party re-implementation of your work. This implementation shows almost the same performance on Market1501, according to their benchmarks they did not use Test-time data augmentation. However, your performance on chuk03 is a little bit far away from theirs. Why? Can Test-data augmentation influence final result that much? By the way, did you use only one GPU fro training?

Discussion on batch fetch strategy

In your TF codes, you first shuffle person ids and then repeat them forever. In training, you choose batch_p from the dataset according to the queue. For one person, you randomly choose batch_k examples each time. Am I right? For this situation, I have two questions:

The order of person ids is repeated. It means that each person will only be compared within batch_p=25 people around them. You know, some person are easier to identify. You just maximize the margins in this small group.
How about choosing examples of one person in a repeated way ? It enables every example to be trained repeated. I know that randomly choose first batch_k is theoretically OK. What about their differences on performance? Are them totally equivalent?

By the way, based on your codes, I try to implement a ResNet-50 fine-tuning baseline (just modify the last FC layer) for image classification (on CUB-200-2011). For testing, I feed data to the model in a normal ordered way. But the test accuracy (for classification) is only 22%. (It is supposed to be abound 81%). The training accuracy rises to 100% and the loss drops to 0.03 in 5000 iterations (about 100 epochs). Though training accuracy is of nonsense in this situation. What might be wrong? Is is due to the sampling strategy? Thank you.

some question about triplet selection and training logs

Thanks your work firstly.
I have some question want to ask you:
(1) In your paper 3.3 , your set batch size to 72 containing 18 persons with 4 images. I want to know whether you mean there are 72 triplet(if use batch hard) in one iteration during training and whether you update the network parameter when choose new batch in every iteration.
(2) In your supplementary material, you provided the typical training logs. I can't understand the Subfigure (b) (blue plots) in Figures 6, 7, 8, and 9. Whether the 0,5,50,95,and 100-percenties within a mini-batch stand for the diffenerent distance between anchor-positive and anchor-nagetive?

Looking forward to your reply.

soft-margin formula question

Hi,
I am new to this subject, in the paper the soft-margin formula is ln(1+exp(x)), I don't understand what the x stand for? how to caculate it ?
thank you very much !

Questions about the re-implementation of lifted loss.

Sorry it's not the key point of your paper.
After succesfully getting result of your batch hard loss function. I tried to check the result of lifted loss.
When I use the euclidean distance, the distance between negative pairs tends to keep increasing. Finally the distance between negative pairs is much larger than the margin. While the distance between positive pairs is not small enough.
When I use cosine similarity, I set the range of distance between 0 to 1. The problem I met is the final loss keeps at a high level. I tried to change the learning rate and momentum. But it didn't work.
I didn't use any hard mining process.

Do you have any idea which part of the settings is wrong?
Best wishes.

two questions L2norm and zero losses

I have replicated your papar : << Defense of Triplet Loss >> , on the Market1501 dataset by caffe. It has good performance just as your paper said.

As I can see, the Batch hard loss without softplus function, will be 0 mostly when at the last iterations. So I want ask do you have tried any other type of hard mining (In your discussion section : Notes on network training )? If you have done, I want to hear more detail about your experiments performance .

Secondly, I also tried to add a L2 norm layer for the embedding , the training is not stable and the result is very poor. I read your explanation about that, but I think it can not explain such phenomenon. Because as I know, some other types of metric learning losses have good performance with L2 norm such as << DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer >> . I want to ask whether you have any deeper thinking on such phenomenon.

What should I do if I want to train VIPeR

Hello, I want to train VIPeR, and set the batch-k to 2,but when I test, the top-1 is 25.33%. I want to know did you trained the VIPeR?Thank you

how to create dataset

Hi,I'm a new learner.I do not understand how to prepare dataset for train,eventhough I've read your instruction.I download market1501 dataset,and I create a folder named images to include all imgaes of dataset.Well,there are so many pictures in dataset,I don't understand how to create a new .csv file to describe dataset.I cloned all your project files in a local folder where I see .csv files in data folder.Thus,I pass the step.I got a problem as follows:
"PS F:\tf\ReID> python train.py --train_set data/market1501_train.csv --image_root images
--experiment_root experiments/my_experiment
Training using the following parameters:
batch_k: 4
batch_p: 32
checkpoint_frequency: 1000
crop_augment: False
decay_start_iteration: 15000
detailed_logs: False
embedding_dim: 128
experiment_root: experiments/my_experiment
flip_augment: False
head_name: fc1024
image_root: images
initial_checkpoint: None
learning_rate: 0.0003
loading_threads: 8
loss: batch_hard
margin: soft
metric: euclidean
model_name: resnet_v1_50
net_input_height: 256
net_input_width: 128
pre_crop_height: 288
pre_crop_width: 144
resume: False
train_iterations: 25000
train_set: data/market1501_train.csv
Traceback (most recent call last):
File "train.py", line 427, in
main()
File "train.py", line 226, in main
pids, fids = common.load_dataset(args.train_set, args.image_root)
File "F:\tf\ReID\common.py", line 132, in load_dataset
csv_file, image_root, missing_count, len(fids)))
OSError: Using the data/market1501_train.csv file and images as an image root 12936/12936 images are missing"
Please tell me how to do.Thanks for your time and your reply!

A simple api wrapper of triplet-reid

Hi, thanks for the excellent work, it helps me a lot.
Because I need to deploy the model in my project, so I made a simple wrapper to do human detection and human embedding by using your model.

here's the link
https://github.com/cftang0827/human_recognition

And I am also training the Mobilenet version, after I finished, I will release in my repository.
Thank you very much :)

Where is the models defined?

Good day,

Can I ask which part of the code is the training model defined? I would like to visualize it.

Also is labelling done as part of the loss function, or do you directly minimize the feature vector space?

How to train with gpu?

I run python train.py directly, and it takes a minute for each iteration. My host is equipped with a GTX980, but the gpu is not running.

Troubles replicating paper results

Hi, i tried to reproduce your results for my research and got poor performance. I've done the followng:

Download market-1501 dataset here http://www.liangzheng.org/Project/project_reid.html
Download resnet-50v1 from tf.slim models
Clone this repo and execute market1501_train.sh with specified paths and default parameters

In the end of the training i got around 0.15 rank-1 accuracy according to tensorboard logs, and around 0.84 rank-1 at 2000 iterations, see attached screenshot

Is there any chance that something wrong with default hyperparameters or with reimplementation code?
Thanks!

Weights

Hello, authors!
My PC's memory is too small, thus it's very hard for me to train the model. I want to know if you can share your model trained well. I'll appreciate it if you can provide the model weights file which is based on tensorflow.
Anyway, thanks!

The boundary value of judging whether two person is the same

Hello !
I'm now using the pre-trained model of Market1501.
My purpose is to judge whether a person can be found in a given dataset.
So I'm looking for the boundary value of distance to judge.
What's the recommended value of distance?

How to use in video?

Hello, the trained model how to use in video to Re-ID?

Extracting npz weights file after training own model

Hi,
I trained the resnet-v1-50 using the train.py given here. It produces checkpoint.index, checkpoint.meta and checkpoint.data files after completion of training. However, I wanted to have the weights file as .npz file similar to the weights file provided by you for the MARS and MARKET dataset so that I can use my model with the trinet_embed.py file. Can you please suggest how can I get the .npz file after my training?

Thanks in advance.

What's the proper setting of P and K

The expeirment is conducted based on the default setting of parameters.
Because of the limitation of memory, I can not test on P=32, and K=4.
I did the experiment based on P=25, K=4.
But looks like the loss can not decrease and the final result of mAP is less than 10%.

When test on K=2 , it works well. And result of mAP is about 50%.

Is it because some mistake I make? Since K = 4 is also the setting in your paper, do you have any idea why this happens?

Best wishes.

Is it true?

visualcomputinginstitute / triplet-reid Goto Github PK

triplet-reid's Introduction

Triplet-based Person Re-Identification

Pretrained TensorFlow models

Pretrained Theano models

Training your own models

Defining a dataset

Training

Pre-trained initialization

Example training log

Interrupting and resuming training

Performance issues

Out of memory

Custom network architecture

The core network

The network head

Computing embeddings

Evaluating embeddings

❗ Important evaluation NOTE ❗

Independent re-implementations

triplet-reid's People

Contributors

Stargazers

Watchers

Forkers

triplet-reid's Issues

Recommend Projects

Recommend Topics

Recommend Org