
show_attend_and_tell.tensorflow's Introduction

Neural Caption Generator with Attention

Code

  • make_flickr_dataset.py: Extracts conv5_3 activations of the VGG network for the Flickr30k images and saves them to data/feats.npy
  • model_tensorflow.py: Main model code (training and testing)
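
A minimal sketch of the feature-extraction step, using the CNN helper from cnn_util.py as it appears in the issue tracebacks further down; the paths and the image list are placeholders, not the repo's exact code:

    import numpy as np
    from cnn_util import CNN

    vgg_model  = 'VGG_ILSVRC_19_layers.caffemodel'        # placeholder path to the weights
    vgg_deploy = 'VGG_ILSVRC_19_layers_deploy.prototxt'   # placeholder path to the deploy prototxt

    cnn = CNN(model=vgg_model, deploy=vgg_deploy, width=224, height=224)

    # placeholder: the real list of image paths comes from the Flickr30k dataset.json
    unique_images = ['flickr30k/1000092795.jpg']

    feats = cnn.get_features(unique_images, layers='conv5_3', layer_sizes=[512, 14, 14])
    np.save('data/feats.npy', feats)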

Usage

  • Download the Flickr30k dataset.
  • Extract VGG conv5_3 features with make_flickr_dataset.py.
  • Train: run train() in model_tensorflow.py.
  • Test: run test() in model_tensorflow.py.
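
A minimal usage sketch, assuming train() and test() take no required arguments (their exact signatures in model_tensorflow.py may differ):

    # Run from the repository root after data/feats.npy has been created.
    from model_tensorflow import train, test

    train()   # trains the captioner; checkpoints are presumably written under ./model/
    test()    # generates captions with the trained model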


show_attend_and_tell.tensorflow's People

Contributors

jazzsaxmafia


show_attend_and_tell.tensorflow's Issues

Has anyone trained a good model?

Has anybody managed to train a good model with the code in this repo? I have been trying to reproduce the results, but so far they are not satisfactory.

Cannot initialize the CNN in the cnn_util.py file

  1. I downloaded the VGG_19 model from here: jcjohnson/neural-style#202
  2. Changed the vgg_model and vgg_deploy paths in make_flickr_dataset.py
  3. Ran make_flickr_dataset.py; it gets stuck at
    cnn = CNN(model=vgg_model, deploy=vgg_deploy, width=224, height=224)

Any suggestions? Thanks so much!

It seems to be a problem with LMDB...

/home/alex/anaconda2/lib/python2.7/site-packages/matplotlib/font_manager.py:273:
UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
WARNING: Readline services not available or not loaded.
WARNING: The auto-indent feature requires the readline library
Backend Qt4Agg is interactive backend. Turning interactive mode on.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1003 00:49:16.097299 11216 upgrade_proto.cpp:52] Attempting to upgrade input file specified using deprecated V1LayerParameter: /home/alex/caffe/models/vgg/VGG_ILSVRC_19_layers_deploy.prototxt
I1003 00:49:16.097414 11216 upgrade_proto.cpp:60] Successfully upgraded file specified using deprecated V1LayerParameter
I1003 00:49:16.097470 11216 net.cpp:313] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I1003 00:49:16.097595 11216 net.cpp:49] Initializing net from parameters:
...
I1003 00:49:16.097708 11216 layer_factory.hpp:77] Creating layer data
I1003 00:49:16.098646 11216 net.cpp:91] Creating Layer data
I1003 00:49:16.098654 11216 net.cpp:399] data -> data
I1003 00:49:16.098672 11216 net.cpp:399] data -> label
F1003 00:49:16.099346 11248 db_lmdb.hpp:15] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
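
The fatal check in db_lmdb.hpp comes from an LMDB-backed Data layer, which suggests the file being passed as vgg_deploy actually contains a training Data layer rather than being a true deploy prototxt (a deploy file has only an input blob, no Data/label layers). A hedged sketch of loading VGG-19 for feature extraction with plain pycaffe, which needs no LMDB at all (paths are placeholders):

    import caffe

    vgg_deploy = 'VGG_ILSVRC_19_layers_deploy.prototxt'  # placeholder: a real deploy prototxt
    vgg_model  = 'VGG_ILSVRC_19_layers.caffemodel'       # placeholder: the pretrained weights
    net = caffe.Net(vgg_deploy, vgg_model, caffe.TEST)   # TEST phase, no LMDB source required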

About state = tf.zeros([self.batch_size, self.lstm.state_size])

In model.py, both build_model(self) and build_generator(self, maxlen) contain the line state = tf.zeros([self.batch_size, self.lstm.state_size]).
However, this line fails to run (I am on TF 0.12): self.lstm.state_size is a tuple there, while tf.zeros needs an int here.
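
A workaround sketch, assuming self.lstm is a tf.nn.rnn_cell.BasicLSTMCell and state_size is a tuple of two ints (one for c, one for h) on this TensorFlow version:

    # Let the cell build its own zero state (an LSTMStateTuple of c and h):
    state = self.lstm.zero_state(self.batch_size, tf.float32)

    # Or, if the surrounding code expects a single flat tensor, sum the tuple entries:
    state = tf.zeros([self.batch_size, sum(self.lstm.state_size)])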

IndexError: index 0 is out of bounds for axis 1 with size 0

File "/home/hzhou/anaconda3/envs/py35/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
runfile('/home/hzhou/code_More/show_attend_and_tell_p1_2/make_flickr_dataset.py', wdir='/home/hzhou/code_More/show_attend_and_tell_p1_2')
File "/home/hzhou/zhoueheng/pycharm-2018.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/home/hzhou/zhoueheng/pycharm-2018.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/hzhou/code_More/show_attend_and_tell_p1_2/make_flickr_dataset.py", line 30, in
feats = cnn.get_features(unique_images, layers='conv5_3', layer_sizes=[512,14,14])
File "/home/hzhou/code_More/show_attend_and_tell_p1_2/cnn_util.py", line 72, in get_features
caffe_in = np.zeros(np.array(image_batch.shape)[[0,3,1,2]], dtype=np.float32)
IndexError: index 0 is out of bounds for axis 1 with size 0
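
A likely cause, not verified against this exact setup: get_features() builds image_batch with np.array(map(...)), and under Python 3 map() returns a lazy iterator, so the resulting array is empty and image_batch.shape has no usable axes. A hedged sketch of the fix in cnn_util.py:

    # Materialize the cropped images before wrapping them in an ndarray (Python 3 fix):
    image_batch = np.array([crop_image(x, target_width=self.width, target_height=self.height)
                            for x in image_batch_file])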

model-8

Hi @jazzsaxmafia, I just don't know where to find 'model-8', and the error is

"InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./model/model-8: Not found: ./model"

Why is the gradient vanishing?

Thank you very much for sharing. I want to use the attention model for video classification, but the gradient always vanishes during training. Have you run into a similar problem?

About the LSTM

Hello.
This is a minor point, but in the LSTM part, around line 123, shouldn't
h = o * tf.nn.tanh(new_c) actually be
h = o * tf.nn.tanh(c)? I am opening this issue to ask.

Also, I am not sure whether I am simply misunderstanding this, but in build_model and build_generator, is it correct that context_encode keeps being added at every LSTM step?
That is, instead of adding only the context_encode defined outside the for loop once, it is accumulated again at every step of the loop.
My understanding was that the context_encode defined outside the loop is added once per step,
but I am not sure, so I am leaving this question here.

Thank you!

How to see the attended images?

I want to see the attended images while testing the model.
(The model shifts its attention to the relevant part of the image as it generates each word.)

How can I visualize the attended regions at test time?
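
Not part of the repo, but a visualization sketch: assuming the generator also returns the per-word attention weights (one vector of length 196 per generated word, corresponding to the 14x14 conv5_3 grid), they can be upsampled and overlaid on the image:

    import matplotlib.pyplot as plt
    import skimage.transform

    def show_attention(image, words, alphas):
        # image: HxWx3 array; words: generated caption tokens;
        # alphas: list of length-196 attention vectors, one per word (assumed outputs)
        for t, (word, alpha) in enumerate(zip(words, alphas)):
            alpha_img = skimage.transform.resize(alpha.reshape(14, 14), image.shape[:2])
            plt.subplot(1, len(words), t + 1)
            plt.imshow(image)
            plt.imshow(alpha_img, alpha=0.6, cmap='gray')   # overlay the upsampled attention map
            plt.title(word)
            plt.axis('off')
        plt.show()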

A lot of weight matrices

In build_model I see a lot of weight matrices, for example image_att_W, hidden_att_W, att_W, image_encode_W, and so on, and I don't understand why.
In my opinion the LSTM has two weight matrices, W for the input and U for the hidden state, so I would write the code in the for loop like this:

context_encode = input * w + b
context_encode += h * u
context_encode = tanh(context_encode)

But what is alpha = tf.matmul(context_encode_flat, self.att_W) + self.att_b about? And why is there another softmax at line 110, and yet another weight matrix, image_encode_W, at line 114?
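
For what it's worth, those extra matrices implement the paper's soft-attention step rather than the LSTM itself. A NumPy sketch for a single image (the names mirror the repo, but the exact shapes are assumptions):

    import numpy as np

    def soft_attention(context, h, image_att_W, hidden_att_W, att_W, att_b):
        # context: [196, 512] conv5_3 features; h: previous hidden state [dim_hidden]
        e = np.tanh(context.dot(image_att_W) + h.dot(hidden_att_W))  # [196, dim_hidden]
        scores = e.dot(att_W) + att_b                                # one score per location
        alpha = np.exp(scores) / np.sum(np.exp(scores))              # softmax over the 196 locations
        weighted_context = (alpha * context).sum(axis=0)             # [512] attended feature
        return weighted_context, alpha

image_encode_W then projects this 512-d weighted context into the LSTM pre-activation, which is the tf.matmul(weighted_context, self.image_encode_W) term visible in the LSTM snippet further down.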

TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'

ERROR LOG:

C:\Python35\python.exe C:/MainProject/show_attend_and_tell/model_tensorflow.py
Using TensorFlow backend.
preprocessing word counts and creating vocab based on word count threshold 30
filtered words from 20326 to 2942
Traceback (most recent call last):
  File "C:/MainProject/show_attend_and_tell/model_tensorflow.py", line 334, in <module>
    train()
  File "C:/MainProject/show_attend_and_tell/model_tensorflow.py", line 255, in train
    n_lstm_steps=int(maxlen)+1, # +1 because after predicting w1..wN the model must also predict the final '.'
TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'

If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True


Process finished with exit code 1

Can anyone please tell me how to continue from here?
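
This looks like the usual Python 2 to 3 map() change: maxlen ends up being a lazy map object, so int(maxlen) fails. A hedged sketch of a fix; the exact expression used in model_tensorflow.py may differ, and the names below are a hypothetical reconstruction:

    # Compute the longest caption length as a plain int instead of a map object:
    maxlen = np.max([len(caption.split(' ')) for caption in captions])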

A question about parameters update

In the following code, it seems that the updated cell state c is never used afterwards:

        lstm_preactive = tf.matmul(h, self.lstm_U) + x_t + tf.matmul(weighted_context, self.image_encode_W)

        i, f, o, new_c = tf.split(1, 4, lstm_preactive)

        i = tf.nn.sigmoid(i)
        f = tf.nn.sigmoid(f)
        o = tf.nn.sigmoid(o)
        new_c = tf.nn.tanh(new_c)
        c = f * c + i * new_c
        h = o * tf.nn.tanh(new_c)

Why does h depend on new_c rather than c?
In my opinion, the update should be
c(t) = f(t) * c(t−1) + i(t) * new_c(t)
h(t) = o(t) * tanh(c(t))
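
For reference, a sketch of the standard update the issue describes, written against the snippet above:

    c = f * c + i * new_c       # c(t) = f(t) * c(t-1) + i(t) * new_c(t)
    h = o * tf.nn.tanh(c)       # h(t) gates tanh of the updated cell state, not new_c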

What is flickr30k/1000092795.jpg?

envy@ub1404:/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow$ ll flickr30k
total 222645
drwx------ 1 envy envy 0 3月 12 22:32 ./
drwx------ 1 envy envy 4096 3月 12 23:02 ../
-rw------- 1 envy envy 38318553 11月 25 2014 dataset.json
-rw------- 1 envy envy 236 11月 27 2014 readme.txt
-rw------- 1 envy envy 189659502 11月 25 2014 vgg_feats.mat

envy@ub1404:/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow$ PYTHONPATH=/home/envy/os_pri/github/caffe/python python make_flickr_dataset.py

I0312 23:06:40.975996 5711 net.cpp:228] relu1_1 does not need backward computation.
I0312 23:06:40.976006 5711 net.cpp:228] conv1_1 does not need backward computation.
I0312 23:06:40.976017 5711 net.cpp:270] This network produces output prob
I0312 23:06:40.976057 5711 net.cpp:283] Network initialization done.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
I0312 23:06:42.111446 5711 upgrade_proto.cpp:51] Attempting to upgrade input file specified using deprecated V1LayerParameter: VGG_ILSVRC_19_layers.caffemodel
I0312 23:06:42.520989 5711 upgrade_proto.cpp:59] Successfully upgraded file specified using deprecated V1LayerParameter
Traceback (most recent call last):
File "make_flickr_dataset.py", line 28, in <module>
  feats = cnn.get_features(unique_images, layers='conv5_3', layer_sizes=[512,14,14])
File "/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow/taeksoo/cnn_util.py", line 69, in get_features
  image_batch = np.array(map(lambda x: crop_image(x, target_width=self.width, target_height=self.height), image_batch_file))
File "/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow/taeksoo/cnn_util.py", line 69, in <lambda>
  image_batch = np.array(map(lambda x: crop_image(x, target_width=self.width, target_height=self.height), image_batch_file))
File "/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow/taeksoo/cnn_util.py", line 7, in crop_image
  image = skimage.img_as_float(skimage.io.imread(x)).astype(np.float32)
File "/home/envy/.local/lib/python2.7/site-packages/skimage/io/_io.py", line 100, in imread
  img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "/home/envy/.local/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 207, in call_plugin
  return func(*args, **kwargs)
File "/home/envy/.local/lib/python2.7/site-packages/skimage/io/_plugins/pil_plugin.py", line 46, in imread
  im = Image.open(fname)
File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1996, in open
  fp = builtins.open(fp, "rb")
IOError: [Errno 2] No such file or directory: 'flickr30k/1000092795.jpg'
envy@ub1404:/media/envy/data1t/os_prj/github/show_attend_and_tell.tensorflow$

Bug report and problems

@jazzsaxmafia
Have you trained a good model with your code?

There might be a bug:
in build_model and build_generator in model_tensorflow.py,
h = o * tf.nn.tanh(new_c) should be replaced by
h = o * tf.nn.tanh(c)

Another question is about context_encode: is it handled the same way as in the original code?
Moreover, I think the data should be shuffled every epoch; the code seems to shuffle it only once.
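
A sketch of per-epoch shuffling, assuming train() iterates over parallel feats/captions arrays; n_epochs and batch_size are placeholder names:

    import numpy as np

    for epoch in range(n_epochs):
        # reshuffle the (feature, caption) pairs every epoch, not just once before training
        index = np.random.permutation(len(captions))
        feats, captions = feats[index], captions[index]
        for start in range(0, len(captions), batch_size):
            batch_feats = feats[start:start + batch_size]
            batch_captions = captions[start:start + batch_size]
            # ... feed this batch to the training op, as in train()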

ValueError when saving feats.npy

Hello,

Thanks for the great project. I am trying to reproduce the result, but when I run make_flickr_dataset.py I get a ValueError when saving the features: around 30G requested and 10G written (ValueError: 3189487617 requested and 129452450 written).
Have you encountered errors like this? And is your feats.npy around 30G?

Thank you very much!

What is the file model-8?

When I run train() in model_tensorflow.py, I get a DataLossError: unable to find model-8.
So what is model-8? A pre-trained TensorFlow model? How can I get it?
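
For what it's worth, files named model-<N> are usually TensorFlow checkpoints written by a tf.train.Saver during training; they are not shipped with this repo, so any restore path has to point at a checkpoint you have already produced by running train(). A hypothetical sketch of the usual pattern, not the repo's exact code:

    # Typical checkpointing pattern that produces ./model/model-0, model-1, ...
    saver = tf.train.Saver(max_to_keep=50)
    # inside the training loop, once per epoch:
    saver.save(sess, './model/model', global_step=epoch)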
