
show_attend_and_tell.tensorflow's Introduction

Neural Caption Generator with Attention

Code

  • make_flickr_dataset.py: Extracts conv5_3 activations of the VGG network for the Flickr30k images and saves them to data/feats.npy
  • model_tensorflow.py: Main model code (training and testing)
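
A minimal sketch of the feature-extraction step, using the CNN helper from cnn_util.py as it appears in the issue tracebacks further down; the paths and the image list are placeholders, not the repo's exact code:

    import numpy as np
    from cnn_util import CNN

    vgg_model  = 'VGG_ILSVRC_19_layers.caffemodel'        # placeholder path to the weights
    vgg_deploy = 'VGG_ILSVRC_19_layers_deploy.prototxt'   # placeholder path to the deploy prototxt

    cnn = CNN(model=vgg_model, deploy=vgg_deploy, width=224, height=224)

    # placeholder: the real list of image paths comes from the Flickr30k dataset.json
    unique_images = ['flickr30k/1000092795.jpg']

    feats = cnn.get_features(unique_images, layers='conv5_3', layer_sizes=[512, 14, 14])
    np.save('data/feats.npy', feats)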

Usage

  • Download the Flickr30k dataset.
  • Extract VGG conv5_3 features with make_flickr_dataset.py.
  • Train: run train() in model_tensorflow.py.
  • Test: run test() in model_tensorflow.py.
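
A minimal usage sketch, assuming train() and test() take no required arguments (their exact signatures in model_tensorflow.py may differ):

    # Run from the repository root after data/feats.npy has been created.
    from model_tensorflow import train, test

    train()   # trains the captioner; checkpoints are presumably written under ./model/
    test()    # generates captions with the trained model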


show_attend_and_tell.tensorflow's People

Contributors

jazzsaxmafia


show_attend_and_tell.tensorflow's Issues

Has anyone trained a good model?

Has anybody managed to train a good model with the code in this repo? I have been trying to reproduce the results, but so far they are not satisfactory.

Cannot initialize the CNN in the cnn_util.py file

  1. I downloaded the VGG_19 model from here: jcjohnson/neural-style#202
  2. Changed the vgg_model and vgg_deploy paths in make_flickr_dataset.py
  3. Ran make_flickr_dataset.py; it gets stuck at
    cnn = CNN(model=vgg_model, deploy=vgg_deploy, width=224, height=224)

Any suggestions? Thanks so much!

It seems to be a problem with LMDB...

/home/alex/anaconda2/lib/python2.7/site-packages/matplotlib/font_manager.py:273:
UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
WARNING: Readline services not available or not loaded.
WARNING: The auto-indent feature requires the readline library
Backend Qt4Agg is interactive backend. Turning interactive mode on.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1003 00:49:16.097299 11216 upgrade_proto.cpp:52] Attempting to upgrade input file specified using deprecated V1LayerParameter: /home/alex/caffe/models/vgg/VGG_ILSVRC_19_layers_deploy.prototxt
I1003 00:49:16.097414 11216 upgrade_proto.cpp:60] Successfully upgraded file specified using deprecated V1LayerParameter
I1003 00:49:16.097470 11216 net.cpp:313] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I1003 00:49:16.097595 11216 net.cpp:49] Initializing net from parameters:
...
I1003 00:49:16.097708 11216 layer_factory.hpp:77] Creating layer data
I1003 00:49:16.098646 11216 net.cpp:91] Creating Layer data
I1003 00:49:16.098654 11216 net.cpp:399] data -> data
I1003 00:49:16.098672 11216 net.cpp:399] data -> label
F1003 00:49:16.099346 11248 db_lmdb.hpp:15] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
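
The fatal check in db_lmdb.hpp comes from an LMDB-backed Data layer, which suggests the file being passed as vgg_deploy actually contains a training Data layer rather than being a true deploy prototxt (a deploy file has only an input blob, no Data/label layers). A hedged sketch of loading VGG-19 for feature extraction with plain pycaffe, which needs no LMDB at all (paths are placeholders):

    import caffe

    vgg_deploy = 'VGG_ILSVRC_19_layers_deploy.prototxt'  # placeholder: a real deploy prototxt
    vgg_model  = 'VGG_ILSVRC_19_layers.caffemodel'       # placeholder: the pretrained weights
    net = caffe.Net(vgg_deploy, vgg_model, caffe.TEST)   # TEST phase, no LMDB source required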

About state = tf.zeros([self.batch_size, self.lstm.state_size])

In model.py, both build_model(self) and build_generator(self, maxlen) contain the line state = tf.zeros([self.batch_size, self.lstm.state_size]).
However, this line fails to run (I am on TF 0.12): self.lstm.state_size is a tuple there, while tf.zeros needs an int here.
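
A workaround sketch, assuming self.lstm is a tf.nn.rnn_cell.BasicLSTMCell and state_size is a tuple of two ints (one for c, one for h) on this TensorFlow version:

    # Let the cell build its own zero state (an LSTMStateTuple of c and h):
    state = self.lstm.zero_state(self.batch_size, tf.float32)

    # Or, if the surrounding code expects a single flat tensor, sum the tuple entries:
    state = tf.zeros([self.batch_size, sum(self.lstm.state_size)])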

IndexError: index 0 is out of bounds for axis 1 with size 0

File "/home/hzhou/anaconda3/envs/py35/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
runfile('/home/hzhou/code_More/show_attend_and_tell_p1_2/make_flickr_dataset.py', wdir='/home/hzhou/code_More/show_attend_and_tell_p1_2')
File "/home/hzhou/zhoueheng/pycharm-2018.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/home/hzhou/zhoueheng/pycharm-2018.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/hzhou/code_More/show_attend_and_tell_p1_2/make_flickr_dataset.py", line 30, in
feats = cnn.get_features(unique_images, layers='conv5_3', layer_sizes=[512,14,14])
File "/home/hzhou/code_More/show_attend_and_tell_p1_2/cnn_util.py", line 72, in get_features
caffe_in = np.zeros(np.array(image_batch.shape)[[0,3,1,2]], dtype=np.float32)
IndexError: index 0 is out of bounds for axis 1 with size 0
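
A likely cause, not verified against this exact setup: get_features() builds image_batch with np.array(map(...)), and under Python 3 map() returns a lazy iterator, so the resulting array is empty and image_batch.shape has no usable axes. A hedged sketch of the fix in cnn_util.py:

    # Materialize the cropped images before wrapping them in an ndarray (Python 3 fix):
    image_batch = np.array([crop_image(x, target_width=self.width, target_height=self.height)
                            for x in image_batch_file])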

model-8

Hi @jazzsaxmafia, I just don't know where to find 'model-8', and the error is

"InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./model/model-8: Not found: ./model"

Why is the gradient vanishing?

Thank you very much for sharing. I want to use the attention model for video classification, but the gradient always vanishes during training. Have you run into a similar problem?

About the LSTM

Hello.
This is a minor point, but in the LSTM part, around line 123, shouldn't
h = o * tf.nn.tanh(new_c) actually be
h = o * tf.nn.tanh(c)? I am opening this issue to ask.

Also, I am not sure whether I am simply misunderstanding this, but in build_model and build_generator, is it correct that context_encode keeps being added at every LSTM step?
That is, instead of adding only the context_encode defined outside the for loop once, it is accumulated again at every step of the loop.
My understanding was that the context_encode defined outside the loop is added once per step,
but I am not sure, so I am leaving this question here.

Thank you!

How to see the attended images?

I want to see the attended images while testing the model.
(The model shifts its attention to the relevant part of the image as it generates each word.)

How can I visualize the attended regions at test time?
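
Not part of the repo, but a visualization sketch: assuming the generator also returns the per-word attention weights (one vector of length 196 per generated word, corresponding to the 14x14 conv5_3 grid), they can be upsampled and overlaid on the image:

    import matplotlib.pyplot as plt
    import skimage.transform

    def show_attention(image, words, alphas):
        # image: HxWx3 array; words: generated caption tokens;
        # alphas: list of length-196 attention vectors, one per word (assumed outputs)
        for t, (word, alpha) in enumerate(zip(words, alphas)):
            alpha_img = skimage.transform.resize(alpha.reshape(14, 14), image.shape[:2])
            plt.subplot(1, len(words), t + 1)
            plt.imshow(image)
            plt.imshow(alpha_img, alpha=0.6, cmap='gray')   # overlay the upsampled attention map
            plt.title(word)
            plt.axis('off')
        plt.show()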

A lot of weight matrices

In build_model I see a lot of weight matrices, for example image_att_W, hidden_att_W, att_W, image_encode_W, and so on, and I don't understand why.
In my opinion the LSTM has two weight matrices, W for the input and U for the hidden state, so I would write the code in the for loop like this:

context_encode = input * w + b
context_encode += h * u
context_encode = tanh(context_encode)

But what is alpha = tf.matmul(context_encode_flat, self.att_W) + self.att_b about? And why is there another softmax at line 110, and yet another weight matrix, image_encode_W, at line 114?
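
For what it's worth, those extra matrices implement the paper's soft-attention step rather than the LSTM itself. A NumPy sketch for a single image (the names mirror the repo, but the exact shapes are assumptions):

    import numpy as np

    def soft_attention(context, h, image_att_W, hidden_att_W, att_W, att_b):
        # context: [196, 512] conv5_3 features; h: previous hidden state [dim_hidden]
        e = np.tanh(context.dot(image_att_W) + h.dot(hidden_att_W))  # [196, dim_hidden]
        scores = e.dot(att_W) + att_b                                # one score per location
        alpha = np.exp(scores) / np.sum(np.exp(scores))              # softmax over the 196 locations
        weighted_context = (alpha * context).sum(axis=0)             # [512] attended feature
        return weighted_context, alpha

image_encode_W then projects this 512-d weighted context into the LSTM pre-activation, which is the tf.matmul(weighted_context, self.image_encode_W) term visible in the LSTM snippet further down.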

TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'

ERROR LOG:

C:\Python35\python.exe C:/MainProject/show_attend_and_tell/model_tensorflow.py
Using TensorFlow backend.
preprocessing word counts and creating vocab based on word count threshold 30
filtered words from 20326 to 2942
Traceback (most recent call last):
  File "C:/MainProject/show_attend_and_tell/model_tensorflow.py", line 334, in <module>
    train()
  File "C:/MainProject/show_attend_and_tell/model_tensorflow.py", line 255, in train
    n_lstm_steps=int(maxlen)+1, # +1 because after predicting w1..wN the model must also predict the final '.'
TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'

If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True


Process finished with exit code 1

Can anyone please tell me how to continue from here?
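
This looks like the usual Python 2 to 3 map() change: maxlen ends up being a lazy map object, so int(maxlen) fails. A hedged sketch of a fix; the exact expression used in model_tensorflow.py may differ, and the names below are a hypothetical reconstruction:

    # Compute the longest caption length as a plain int instead of a map object:
    maxlen = np.max([len(caption.split(' ')) for caption in captions])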

A question about parameters update

In the following code, it seems that the updated cell state c is never used afterwards:

        lstm_preactive = tf.matmul(h, self.lstm_U) + x_t + tf.matmul(weighted_context, self.image_encode_W)

        i, f, o, new_c = tf.split(1, 4, lstm_preactive)

        i = tf.nn.sigmoid(i)
        f = tf.nn.sigmoid(f)
        o = tf.nn.sigmoid(o)
        new_c = tf.nn.tanh(new_c)
        c = f * c + i * new_c
        h = o * tf.nn.tanh(new_c)

Why does h depend on new_c rather than c?
In my opinion, the update should be
c(t) = f(t) * c(t−1) + i(t) * new_c(t)
h(t) = o(t) * tanh(c(t))
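
For reference, a sketch of the standard update the issue describes, written against the snippet above:

    c = f * c + i * new_c       # c(t) = f(t) * c(t-1) + i(t) * new_c(t)
    h = o * tf.nn.tanh(c)       # h(t) gates tanh of the updated cell state, not new_c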

What is flickr30k/1000092795.jpg?

envy@ub1404:/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow$ ll flickr30k
total 222645
drwx------ 1 envy envy 0 3月 12 22:32 ./
drwx------ 1 envy envy 4096 3月 12 23:02 ../
-rw------- 1 envy envy 38318553 11月 25 2014 dataset.json
-rw------- 1 envy envy 236 11月 27 2014 readme.txt
-rw------- 1 envy envy 189659502 11月 25 2014 vgg_feats.mat

envy@ub1404:/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow$ PYTHONPATH=/home/envy/os_pri/github/caffe/python python make_flickr_dataset.py

I0312 23:06:40.975996 5711 net.cpp:228] relu1_1 does not need backward computation.
I0312 23:06:40.976006 5711 net.cpp:228] conv1_1 does not need backward computation.
I0312 23:06:40.976017 5711 net.cpp:270] This network produces output prob
I0312 23:06:40.976057 5711 net.cpp:283] Network initialization done.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
I0312 23:06:42.111446 5711 upgrade_proto.cpp:51] Attempting to upgrade input file specified using deprecated V1LayerParameter: VGG_ILSVRC_19_layers.caffemodel
I0312 23:06:42.520989 5711 upgrade_proto.cpp:59] Successfully upgraded file specified using deprecated V1LayerParameter
Traceback (most recent call last):
File "make_flickr_dataset.py", line 28, in <module>
  feats = cnn.get_features(unique_images, layers='conv5_3', layer_sizes=[512,14,14])
File "/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow/taeksoo/cnn_util.py", line 69, in get_features
  image_batch = np.array(map(lambda x: crop_image(x, target_width=self.width, target_height=self.height), image_batch_file))
File "/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow/taeksoo/cnn_util.py", line 69, in <lambda>
  image_batch = np.array(map(lambda x: crop_image(x, target_width=self.width, target_height=self.height), image_batch_file))
File "/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow/taeksoo/cnn_util.py", line 7, in crop_image
  image = skimage.img_as_float(skimage.io.imread(x)).astype(np.float32)
File "/home/envy/.local/lib/python2.7/site-packages/skimage/io/_io.py", line 100, in imread
  img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "/home/envy/.local/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 207, in call_plugin
  return func(*args, **kwargs)
File "/home/envy/.local/lib/python2.7/site-packages/skimage/io/_plugins/pil_plugin.py", line 46, in imread
  im = Image.open(fname)
File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1996, in open
  fp = builtins.open(fp, "rb")
IOError: [Errno 2] No such file or directory: 'flickr30k/1000092795.jpg'
envy@ub1404:/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow$

Bug report and problems

@jazzsaxmafia
Have you trained a good model with your code?

There might be a bug:
in build_model and build_generator in model_tensorflow.py,
h = o * tf.nn.tanh(new_c) should be replaced by
h = o * tf.nn.tanh(c)

Another question is about context_encode: is it handled the same way as in the original code?
Moreover, I think the data should be shuffled every epoch; the code seems to shuffle it only once.
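
A sketch of per-epoch shuffling, assuming train() iterates over parallel feats/captions arrays; n_epochs and batch_size are placeholder names:

    import numpy as np

    for epoch in range(n_epochs):
        # reshuffle the (feature, caption) pairs every epoch, not just once before training
        index = np.random.permutation(len(captions))
        feats, captions = feats[index], captions[index]
        for start in range(0, len(captions), batch_size):
            batch_feats = feats[start:start + batch_size]
            batch_captions = captions[start:start + batch_size]
            # ... feed this batch to the training op, as in train()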

ValueError when saving feats.npy

Hello,

Thanks for the great project. I am trying to reproduce the result, but when I run make_flickr_dataset.py I get a ValueError when saving the features: around 30G requested and 10G written (ValueError: 3189487617 requested and 129452450 written).
Have you encountered errors like this? And is your feats.npy around 30G?

Thank you very much!

What is the file model-8?

When I run train() in model_tensorflow.py, I get a DataLossError: unable to find model-8.
So what is model-8? A pre-trained TensorFlow model? How can I get it?
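
For what it's worth, files named model-<N> are usually TensorFlow checkpoints written by a tf.train.Saver during training; they are not shipped with this repo, so any restore path has to point at a checkpoint you have already produced by running train(). A hypothetical sketch of the usual pattern, not the repo's exact code:

    # Typical checkpointing pattern that produces ./model/model-0, model-1, ...
    saver = tf.train.Saver(max_to_keep=50)
    # inside the training loop, once per epoch:
    saver.save(sess, './model/model', global_step=epoch)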
