yule-li / cosface
TensorFlow implementation of the paper "CosFace: Large Margin Cosine Loss for Deep Face Recognition"
Hi, I am a newbie. Can you help me with how to train and test on my own dataset using CosFace?
I downloaded the preprocessed CASIA-WebFace-112X96 dataset, but it does not include the list_file needed for training. Could you provide it, or explain how to generate one?
When I use four GPUs to train the CosFace model, the following exception occurs:
ValueError: Variable conv1_/conv2d/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "networks/sphere_network.py", line 49, in first_conv
network = tf.layers.conv2d(input, num_output, kernel_size = [3, 3], strides = (2, 2), padding = 'same', kernel_initializer = xavier, bias_initializer = zero_init, kernel_regularizer = l2_regularizer, bias_regularizer = l2_regularizer)
File "networks/sphere_network.py", line 14, in infer
network = first_conv(input, 64, name = 'conv1')
File "train/train_multi_gpu.py", line 197, in main
prelogits = network.infer(batch_image_split[i], args.embedding_size)
The line prelogits = network.infer(batch_image_split[i], args.embedding_size) constructs the graph for every GPU, but there is no reuse setting, so it should be modified to:
with tf.variable_scope(name_or_scope='', reuse=tf.AUTO_REUSE):
    prelogits = network.infer(batch_image_split[i], args.embedding_size)
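For context, a minimal sketch of how the per-GPU loop in train/train_multi_gpu.py could be wrapped; the loop structure and the args.num_gpus name are assumptions, while batch_image_split, network, and args.embedding_size come from the traceback above:

import tensorflow as tf

tower_prelogits = []
for i in range(args.num_gpus):  # args.num_gpus is an assumed flag name
    with tf.device('/gpu:%d' % i):
        # Reuse the same weights for every tower instead of creating new ones,
        # which avoids "Variable conv1_/conv2d/kernel already exists".
        with tf.variable_scope(name_or_scope='', reuse=tf.AUTO_REUSE):
            prelogits = network.infer(batch_image_split[i], args.embedding_size)
        tower_prelogits.append(prelogits)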
Hi, I have a question: what is the difference between the following two implementations?
#implemented by py_func
#value = tf.identity(xw)
#subtract the margin and scale it
value = coco_func(xw_norm,y,alpha) * scale
#implemented by tf api
#margin_xw_norm = xw_norm - alpha
#label_onehot = tf.one_hot(y,num_cls)
#value = scale*tf.where(tf.equal(label_onehot,1), margin_xw_norm, xw_norm)
The py_func implementation is more complex because you have to compute the gradient yourself, while the tf API implementation is simple, only 3 lines. Is there any difference between the two?
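For reference, a self-contained sketch of the tf-API branch quoted above (the alpha and scale defaults here are placeholders, not necessarily the repo's settings). The forward values of the two branches should match; the practical difference is that TensorFlow differentiates this version automatically, whereas the py_func version needs a hand-written gradient because TF cannot backpropagate through arbitrary Python code:

import tensorflow as tf

def cos_margin_tf(xw_norm, y, num_cls, alpha=0.35, scale=64.0):
    # xw_norm: [batch, num_cls] cosine similarities; y: [batch] integer labels.
    margin_xw_norm = xw_norm - alpha              # cos(theta) - m
    label_onehot = tf.one_hot(y, num_cls)         # mask of target-class positions
    # Subtract the margin only at the target class, then scale everything.
    value = scale * tf.where(tf.equal(label_onehot, 1), margin_xw_norm, xw_norm)
    return value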
Hi
Thanks for sharing your work. I am working on a Linux machine, so I'm finding it difficult to download the model and data. I was hoping someone could upload the model/data to Google Drive or Dropbox.
Thanks in advance.
Hi, first of all, excellent job! But I have a question. During training, the images are normalized as:
image = tf.subtract(image,127.5)
image = tf.div(image,128.)
However, when testing, you use the prewhiten function:
import numpy as np

def prewhiten(x):
    mean = np.mean(x)
    std = np.std(x)
    std_adj = np.maximum(std, 1.0 / np.sqrt(x.size))
    y = np.multiply(np.subtract(x, mean), 1 / std_adj)
    return y
It is a little different from the above. Will these two different image preprocessing operations degrade performance when testing?
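If you want the two stages to match, one option is to reuse the training normalization at test time instead of prewhiten; a minimal sketch (the function name is hypothetical):

import numpy as np

def fixed_normalize(x):
    # Same operation as the training pipeline: (x - 127.5) / 128,
    # mapping uint8 pixel values to roughly [-1, 1].
    return (np.asarray(x, dtype=np.float32) - 127.5) / 128.0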
Which model should I use, model-20180309-083949.ckpt-60000 or model-20180626-205832.ckpt-60000?
And how should I choose the parameters?
$ python test.py
usage: test.py [-h] [--network_type NETWORK_TYPE] [--fc_bn] [--prewhiten]
[--save_model SAVE_MODEL] [--do_flip DO_FLIP]
[--lfw_batch_size LFW_BATCH_SIZE]
[--embedding_size EMBEDDING_SIZE] [--image_size IMAGE_SIZE]
[--image_height IMAGE_HEIGHT] [--image_width IMAGE_WIDTH]
[--lfw_pairs LFW_PAIRS] [--lfw_file_ext {jpg,png}]
[--lfw_nrof_folds LFW_NROF_FOLDS] [--model_def MODEL_DEF]
lfw_dir model
test.py: error: too few arguments
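For reference, a hypothetical invocation that supplies the two required positional arguments (lfw_dir and model); the paths are placeholders and the flag values are guesses that should match how the checkpoint was trained:

python test.py \
    --network_type sphere_network \
    --image_height 112 --image_width 96 \
    /path/to/lfw-112X96 \
    /path/to/models/model-20180626-205832.ckpt-60000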
When I debug main(parse_arguments(sys.argv[1:])), I get:
NotFoundError (see above for traceback): Key resnet_v2_50/block3/unit_2/bottleneck_v2/conv3/weights not found in checkpoint
[[Node: save/RestoreV2_145 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_145/tensor_names, save/RestoreV2_145/shape_and_slices)]]
Could someone tell me how to solve this error?
I can test your uploaded model with test.sh,
but when I retrain a model with train.sh, I cannot test it with my own model path. So I want to know which network structure your model uses...
Does --network_type sphere_network in test.sh mean I should train the model with #NETWORK=sphere_network? Thanks very much~
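One way to check which architecture a checkpoint was trained with is to list the variable names it stores and see whether they begin with the sphere_network scopes (e.g. conv1_) or with resnet_v2_50. A minimal sketch, with a placeholder checkpoint path:

import tensorflow as tf

# Print every variable name and shape stored in the checkpoint.
for name, shape in tf.train.list_variables('/path/to/model-20180626-205832.ckpt-60000'):
    print(name, shape)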
Thank you so much for this project. I might have missed it, but I could not find a license declaration for this project. What is the license of cosface? Could you add this information to the README and put a dedicated LICENSE file in the root of the repo?
Hi, when I am training on WebFace, I find that the loss does not decrease.
My network is sphere_network and the loss is softmax.
Can anyone tell me the loss value at convergence and how many epochs you trained?
Thanks!
python test.py
2019-05-08 08:59:41.657635: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-08 08:59:41.751877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-08 08:59:41.752518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce 930M major: 5 minor: 0 memoryClockRate(GHz): 0.941
pciBusID: 0000:04:00.0
totalMemory: 1.96GiB freeMemory: 1.76GiB
2019-05-08 08:59:41.752549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-05-08 08:59:42.368344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-08 08:59:42.368376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-05-08 08:59:42.368400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-05-08 08:59:42.368564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1518 MB memory) -> physical GPU (device: 0, name: GeForce 930M, pci bus id: 0000:04:00.0, compute capability: 5.0)
Skipped 6000 image pairs
image size 224
Traceback (most recent call last):
File "test.py", line 158, in
main(parse_arguments(sys.argv[1:]))
File "test.py", line 49, in main
with slim.arg_scope(resnet_v2.resnet_arg_scope(False)):
File "/home/wlc/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_utils.py", line 257, in resnet_arg_scope
weights_regularizer=regularizers.l2_regularizer(weight_decay),
File "/home/wlc/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/regularizers.py", line 92, in l2_regularizer
raise ValueError('scale cannot be an integer: %s' % (scale,))
ValueError: scale cannot be an integer: False
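The traceback shows that False is passed as the first positional argument (weight_decay) of resnet_arg_scope, and l2_regularizer rejects boolean/integer scales. A minimal sketch of one possible fix in test.py, assuming the intent was simply to disable weight decay at evaluation time:

import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.nets import resnet_v2

# Pass a float weight decay instead of the boolean False; 0.0 disables
# L2 regularization, which is presumably what is wanted during testing.
with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=0.0)):
    pass  # build the network here exactly as test.py already does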
Hi, in the paper it is stated that "In the testing phase, the testing data is fed into CosFace to extract face features which are later used to compute the cosine similarity score to perform face verification and identification", but it seems like you are using Euclidean distance in test.py. Is there anything I got wrong?
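For what it's worth, if the embeddings are L2-normalized before comparison, the two metrics are equivalent up to a monotonic transform, since ||a - b||^2 = 2 - 2*cos(a, b) for unit-norm vectors, so thresholding the Euclidean distance and thresholding the cosine similarity give the same verification decisions. A small numpy check (embedding size 512 is just an example):

import numpy as np

a = np.random.randn(512); a /= np.linalg.norm(a)
b = np.random.randn(512); b /= np.linalg.norm(b)

cos_sim = np.dot(a, b)
sq_dist = np.sum((a - b) ** 2)
# For unit-norm vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
print(np.isclose(sq_dist, 2 - 2 * cos_sim))  # True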
AssertionError: file ../CASIA-WebFace-112X96/0000045/001.jpg not exist
Can you please include a header to explain cleaned_list.txt?
Allocator (GPU_0_bfc) ran out of memory trying to allocate 12.0 kiB (rounded to 12288) as requested by op tower_0/l2_normalize_1.
I tried to reduce batch_size, max_nrof_epochs, images_per_person, etc. Nothing works. Can anyone help me?
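One setting that is sometimes worth trying with bfc allocator errors (it does not add memory, but it stops TensorFlow from reserving the whole GPU up front and makes it easier to see whether another process is holding memory) is incremental GPU allocation; a minimal TF 1.x sketch, assuming you can change where the session is created:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)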
The results with my own list file are particularly bad. I hope you can provide your cleaned_list.txt file, thank you very much.
Hello, thanks for sharing this code. But you haven't provided the pairs.txt file! Could you please share it?
Hello,
When I try to restore the model named model-20180626-205832.ckpt-60000 from the Google Drive link, I get the following error.
Unable to open table file models\model-20180626-205832.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
tensorflow: 1.9.0
python: 3.5.5
I think that TensorFlow requires the .data, .index, and .meta files in the same directory. However, the Google Drive link only provides "model-20180626-205832.ckpt-60000.data-00000-of-00001".
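That matches how TF 1.x checkpoints work: the .data-00000-of-00001, .index, and .meta files share one prefix, and the restore call must receive that prefix rather than any single file. A minimal restore sketch, with placeholder paths, assuming all three files are present under models/:

import tensorflow as tf

ckpt_prefix = 'models/model-20180626-205832.ckpt-60000'

with tf.Session() as sess:
    # Rebuild the graph from the .meta file, then load the weights from the prefix.
    saver = tf.train.import_meta_graph(ckpt_prefix + '.meta')
    saver.restore(sess, ckpt_prefix)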