
neuraltalk's Introduction

I like deep neural nets.

neuraltalk's People

Contributors

alyxb, ericzeiberg, huyouare, karpathy, simov8, vanessad


neuraltalk's Issues

How to generate json for new data?

Hi Andrej,
I am interested in using your algorithm on some new data. Each image is associated with one sentence. Is there a convenient way to generate the JSON file as in your examples (Flickr8k, etc.)? What is the structure of the JSON, and is there any way to avoid using the JSON format?

Thanks!
Wei
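The thread does not spell out the format. As a sketch, assuming the layout of the Karpathy-split dataset.json files distributed for Flickr8k/30k and COCO (a top-level "images" list whose entries carry "filename", "split", and a "sentences" list with "tokens" and "raw"), a file for one-sentence-per-image data could be generated as below. The function names, the simple regex tokenizer, and the imgid/sentid bookkeeping are illustrative assumptions; check the fields against what the data provider actually reads.

```python
import json
import re

def tokenize(sentence):
    # Lowercase and keep alphanumeric runs; a simplification, not the original preprocessing.
    return re.findall(r"[a-z0-9]+", sentence.lower())

def build_dataset(pairs, dataset_name):
    """pairs: list of (filename, sentence, split) tuples, split in {'train', 'val', 'test'}."""
    images = []
    for imgid, (filename, sentence, split) in enumerate(pairs):
        images.append({
            "filename": filename,
            "imgid": imgid,
            "split": split,
            "sentids": [imgid],  # one sentence per image, so sentence id == image id
            "sentences": [{
                "raw": sentence,
                "tokens": tokenize(sentence),
                "imgid": imgid,
                "sentid": imgid,
            }],
        })
    return {"dataset": dataset_name, "images": images}

def write_dataset(path, data):
    # Serialize the structure to disk, e.g. write_dataset("dataset.json", build_dataset(...)).
    with open(path, "w") as f:
        json.dump(data, f)
```

The image features would still have to be stored in the same order as the "images" list, since a JSON entry carries no features of its own.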

Confusion about an equation in the paper

Thanks for making the code public. This is a great work!
This issue is not about the code, but I feel a little confused about the 11th equation in the paper, since the relevant code is not available.
[Equation 11 from the paper; image not preserved]
What does i indicate in the above equation? Does t refer to the index of an image fragment or of a sentence fragment? Also, wouldn't maximizing this term make more sense? I would really appreciate it if you could point out my misunderstandings here. Thanks!

R Kelly

For extra street cred, please adopt an R Kelly "real talk" meme photo in the Readme.

MRFs for text segment alignments

Hi Andrej,
Thank you very much for open sourcing the code!
Your paper talks about MRFs for decoding text segment alignments to images, but I couldn't find any code related to that. Am I missing something?

Thanks
Pradeep.
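The issue notes that this part is not in the repository. For illustration only, here is a minimal sketch of chain-MRF decoding of the kind the paper describes, assuming each word j has an affinity score for each region t and that adjacent words receive a bonus beta for choosing the same region (which favors longer phrases aligned to one box). The best assignment is found with Viterbi-style dynamic programming. The scoring convention (maximizing affinity plus bonus) and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def align_words_to_regions(scores, beta):
    """Viterbi decoding for a chain MRF over the words of one sentence.

    scores[j, t]: affinity of word j for region t (e.g. a dot product of a
    region vector and a word vector). Adjacent words that pick the same
    region gain an extra beta. Returns the region index chosen for each word.
    """
    scores = np.asarray(scores, dtype=float)
    n_words, n_regions = scores.shape
    best = np.zeros((n_words, n_regions))             # best total score with word j in region t
    back = np.zeros((n_words, n_regions), dtype=int)  # best predecessor region, for backtracking
    best[0] = scores[0]
    for j in range(1, n_words):
        # trans[p, t]: best score up to word j-1 in region p, then word j in region t
        trans = best[j - 1][:, None] + beta * np.eye(n_regions)
        back[j] = np.argmax(trans, axis=0)
        best[j] = scores[j] + np.max(trans, axis=0)
    path = [int(np.argmax(best[-1]))]
    for j in range(n_words - 1, 0, -1):
        path.append(int(back[j, path[-1]]))
    return path[::-1]
```

With beta = 0 each word simply takes its highest-scoring region; raising beta makes the decoder keep neighboring words on the same region unless the unary scores strongly disagree.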

Problems in gradient check

Hi,

When I run the gradient check for Ws, it prints "VAL SMALL WARNING". I printed the numerical and analytical gradients in this case and found that the numerical gradients are exactly zero, while the analytical gradients are on the order of 1e-12.

I am confused by this. A numerical gradient of zero means some words are not in the batch, so changing their values does not affect the cost (in grad_check, we add delta to the word vectors). However, the analytical gradients are not zero, which would mean these words actually appear in the batch and their word vectors are updated.

Why does this happen?

Thanks.
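One plausible explanation (an assumption, not confirmed in the thread): an analytic gradient around 1e-12 is floating-point round-off rather than a real signal, and when both values are that small a relative-error test only compares noise. A generic centered-difference check that reports such entries separately might look like this; the function name and thresholds are illustrative, not the repository's gradcheck code.

```python
import numpy as np

def grad_check(f, x, analytic_grad, delta=1e-5, small=1e-8, tol=1e-5):
    """Centered-difference check of analytic_grad for a scalar function f at x.

    x (a float array) is perturbed in place and restored. Entries where both
    gradients are below `small` are reported as 'both tiny' instead of getting
    a relative error, which would just compare round-off noise.
    """
    results = []
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + delta
        f_plus = f(x)
        x[idx] = old - delta
        f_minus = f(x)
        x[idx] = old  # restore the original value
        num = (f_plus - f_minus) / (2 * delta)
        ana = analytic_grad[idx]
        if abs(num) < small and abs(ana) < small:
            status = "both tiny"
        else:
            rel = abs(num - ana) / max(abs(num), abs(ana))
            status = "ok" if rel < tol else "mismatch"
        results.append((idx, num, ana, status))
    return results
```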

Transfer Learning with word2vec?

Hi Andrej & Fei-fei,
I've been playing around with this and reading through the code; many thanks for making it such a pleasure to read! I was under the impression that it used pretrained word vector embeddings from Mikolov et al.:
[Image: slide with the Mikolov et al. reference]

...but I don't see anywhere in the code where these vectors are loaded. Are the word embeddings learned from scratch, or are they initialized in some way?

Many thanks!
chris moody
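The thread does not settle whether neuraltalk loads pretrained vectors. If one wanted to try it, a minimal sketch is to initialize the word-embedding matrix from a dictionary of pretrained vectors and fall back to small random values for words that have none. The names ix_to_word and pretrained and the (vocab_size, dim) row-per-word layout are assumptions for illustration; parsing the word2vec file itself is not shown.

```python
import numpy as np

def init_word_matrix(ix_to_word, pretrained, dim, scale=0.01, seed=0):
    """Build a (vocab_size, dim) embedding matrix, one row per word index.

    ix_to_word: dict mapping row index -> word (hypothetical vocabulary map).
    pretrained: dict mapping word -> vector of length dim (e.g. parsed word2vec).
    Words without a pretrained vector keep a small random initialization.
    """
    rng = np.random.default_rng(seed)
    Ws = rng.standard_normal((len(ix_to_word), dim)) * scale
    hits = 0
    for ix, word in ix_to_word.items():
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            Ws[ix] = vec
            hits += 1
    return Ws, hits
```

Training would then continue to update these rows, so the pretrained vectors act only as a starting point.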

multi-bleu.perl

Hi,

Is this the same script as Moses's multi-bleu.perl? I've noticed some modifications to the original version. I've been investigating why my baseline model's (Google NIC with VGG-E) BLEU-2/3/4 scores are so low, and what I've found is that we are not using the same evaluation script. I know this task is different from machine translation, though. So my questions are:

  • What's the intention behind modifying the BLEU evaluation script?
  • Do all captioning researchers evaluate their models this way?

Thanks in advance.
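For reference when comparing scripts, here is a sketch of standard corpus-level BLEU: clipped n-gram precision against multiple references, a geometric mean over n = 1..max_n, and a brevity penalty using the closest reference length, roughly as Moses' multi-bleu.perl computes it. It does not reproduce the modified script discussed above; differences in tokenization, lowercasing, or brevity-penalty details between scripts can noticeably shift BLEU-2/3/4.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """hypotheses: list of token lists; references: list of lists of token lists."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # closest reference length, ties broken toward the shorter reference
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # per-n-gram maximum count over references
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)
```

BLEU-2 and BLEU-3 correspond to max_n=2 and max_n=3.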

Bounding box

Hello Andrej,

Great work!

Is it possible to get the bounding box associated with words? Or is that part of the alignment/retrieval model?

Thanks!

Have you implemented Visual-Semantic Alignments ?

Thanks for kindly releasing this code!
It has helped me a lot!
I am interested in your CVPR paper, Deep Visual-Semantic Alignments for Generating Image Descriptions, but I did not find anything about the visual-semantic alignments in the released code. Have I missed something? Thanks!

eval_sentence_predictions.py: error: too few arguments

~/tf/neuraltalk-master$ python eval_sentence_predictions.py
usage: eval_sentence_predictions.py [-h] [-b BEAM_SIZE]
[--result_struct_filename RESULT_STRUCT_FILENAME]
[-m MAX_IMAGES] [-d DUMP_FOLDER]
checkpoint_path
eval_sentence_predictions.py: error: too few arguments

When I run this script I get this error. Is it a problem with the checkpoint path, or something else? Thank you.
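The usage line shows that checkpoint_path is a required positional argument, so running the script with no arguments fails before anything is loaded. The sketch below mirrors that usage line with argparse; the long option names and defaults are assumptions inferred from the metavars, not copied from the script. Python 2's argparse reports a missing positional as "too few arguments", while Python 3 names it ("the following arguments are required: checkpoint_path"), which is the same failure predict_on_images.py reports under Python 3.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the CLI from its usage line; defaults are placeholders.
    parser = argparse.ArgumentParser(prog="eval_sentence_predictions.py")
    parser.add_argument("checkpoint_path", type=str, help="path to a saved model checkpoint (.p file)")
    parser.add_argument("-b", "--beam_size", type=int, default=1, help="beam size for decoding")
    parser.add_argument("--result_struct_filename", type=str, default=None)
    parser.add_argument("-m", "--max_images", type=int, default=None)
    parser.add_argument("-d", "--dump_folder", type=str, default=None)
    return parser

def parse_or_none(argv):
    """Return parsed args, or None if argparse rejects argv (it calls sys.exit(2) on error)."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None
```

In other words, the script expects to be invoked with the path of a checkpoint file as its first argument, for example python eval_sentence_predictions.py path/to/model_checkpoint.p.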

Why optimize the Ws matrix directly?

Other approaches, such as Show and Tell, use a word-embedding matrix We and optimize We. In neuraltalk, however, I found that it directly optimizes Ws, in which each row represents a word. Why take this approach, and which one performs better?
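One way to see the relationship, assuming Ws is a (vocab_size, d) matrix with one row per word: multiplying a one-hot word vector by a We-style embedding matrix selects exactly one row, so optimizing the rows of Ws directly is the same parameterization as a learned embedding matrix. The sketch checks this numerically and shows that the gradient only touches rows of words that appear in the batch. Sizes and indices are arbitrary.

```python
import numpy as np

vocab_size, d = 5, 3
rng = np.random.default_rng(0)
Ws = rng.standard_normal((vocab_size, d))   # one row per word (assumed layout)

ix = np.array([2, 0, 2])                    # word indices of a toy sentence
onehot = np.eye(vocab_size)[ix]             # (3, vocab_size) one-hot rows

emb_matmul = onehot @ Ws                    # "We"-style: one-hot times embedding matrix
emb_lookup = Ws[ix]                         # row lookup: index Ws directly

dX = np.ones((len(ix), d))                  # upstream gradient w.r.t. the embeddings
dWs = np.zeros_like(Ws)
np.add.at(dWs, ix, dX)                      # scatter-add; repeated word 2 accumulates twice
```

Any performance difference between the two setups would come from other choices (initialization, embedding size, regularization), not from the lookup itself.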

Running On Raw Images

How exactly would I go about getting a trained model's prediction on an image (in some raw format) that I have?

How can I use this code to train the regions & snippets RNN model?

In this code, I only find how to use images and their description sentences to train a multimodal RNN, but I don't see any functions for training the model on regions and snippets, as in Figure 5 or Section 4.3 of the paper.
How can I train my own model, and how can I get results like those in Figure 5?

predict_on_images.py error

usage: predict_on_images.py [-h] [-r ROOT_PATH] [-b BEAM_SIZE] checkpoint_path
predict_on_images.py: error: the following arguments are required: checkpoint_path
An exception has occurred, use %tb to see the full traceback.

This error occurred. What should I do?

Use for sentence input to sentence output

Would it be possible to use this code to accept a sentence as input and output the most likely response sentence, in order to sustain a dialogue, instead of taking an image as input and producing a sentence? I believe there is a paper on this. Sorry, this is not an issue; I didn't know where else to comment.

Question about usage of RCNN

Hello, I recently read your paper, and I very much appreciate you sharing your code here.

By the way, your paper indicates that you first extract the top regions obtained by RCNN and then compute their CNN features, but I do not see that object detection step in your implementation. Neither the training nor the test phase seems to use object detection. Is that because it still works fine using the holistic image?

Thank you.

Training on a new dataset

I am training it on a new dataset and get this error at the checkpoint-saving step:
36/1850 batch done in 2.356s. at epoch 0.97. loss cost = 9.295156, reg cost = 0.000000, ppl2 = 4.59 (smooth 14.32)
evaluating val performance in batches of 100
Traceback (most recent call last):
  File "driver.py", line 315, in <module>
    main(params)
  File "driver.py", line 232, in main
    val_ppl2 = eval_split('val', dp, model, params, misc) # perform the evaluation on VAL set
  File "/root/neuraltalk/imagernn/imagernn_utils.py", line 38, in eval_split
    ppl2 = 2 ** (logppl / logppln)
ZeroDivisionError: integer division or modulo by zero
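The division fails because logppln is zero, and the "integer division" wording (Python 2, int divided by int) suggests nothing at all was accumulated. Assuming, as its use here suggests, that logppln counts the words scored on the split and logppl is their summed negative log2-probability, the likely cause is that the new dataset has no images marked "val". A defensive sketch of that final step, with hypothetical helper names:

```python
def perplexity2(logppl, logppln, split="val"):
    """Base-2 perplexity: 2 ** (average negative log2-probability per word)."""
    if logppln == 0:
        raise ValueError(f"no words were scored on the '{split}' split; "
                         "check that the dataset marks some images with this split")
    return 2 ** (logppl / logppln)

def count_split(dataset, split):
    # Hypothetical helper; assumes a dataset.json-style dict with an 'images' list.
    return sum(1 for img in dataset["images"] if img["split"] == split)
```

Checking count_split(dataset, "val") before training starts would catch this without waiting for the first evaluation.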

py_caffe_feat_extract

I think the bicubic implementation has a problem.

The output image contains some obvious artifacts if you visualize it.
It's definitely not the same as MATLAB's imresize or OpenCV's resize (INTER_CUBIC).

I guess the vgg_feats.mat inside examples_images was produced by this function.
The results produced by py_caffe_feat_extract were also slightly different from those produced by OpenCV's resize (cubic).
I hope someone can fix the bicubic implementation some day.

Thanks a lot.
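For anyone comparing implementations: MATLAB's imresize and OpenCV's INTER_CUBIC both use Keys' cubic convolution kernel but with different parameters (a = -0.5 and a = -0.75 respectively), and imresize also antialiases when downscaling, so the two are not expected to agree exactly even when each is correct. Below is a small separable reference sketch with a configurable a, edge clamping, half-pixel-centered sampling, and no antialiasing. Which of these choices py_caffe_feat_extract makes is not established here; the function names are illustrative.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a=-0.5 matches MATLAB, OpenCV uses a=-0.75."""
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    m1 = x <= 1
    m2 = (x > 1) & (x < 2)
    out[m1] = (a + 2) * x[m1] ** 3 - (a + 3) * x[m1] ** 2 + 1
    out[m2] = a * x[m2] ** 3 - 5 * a * x[m2] ** 2 + 8 * a * x[m2] - 4 * a
    return out

def resize_1d(signal, new_len, a=-0.5):
    """Bicubic resampling of a 1-D signal (no antialiasing), clamping at the edges."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    scale = n / new_len
    out = np.empty(new_len)
    for i in range(new_len):
        src = (i + 0.5) * scale - 0.5             # half-pixel-centre coordinate mapping
        base = int(np.floor(src))
        taps = np.arange(base - 1, base + 3)      # four neighbouring samples
        w = cubic_kernel(src - taps, a)
        out[i] = np.dot(w, signal[np.clip(taps, 0, n - 1)])
    return out

def resize_2d(img, new_h, new_w, a=-0.5):
    """Separable bicubic resize of a 2-D array: rows first, then columns."""
    tmp = np.apply_along_axis(resize_1d, 1, img, new_w, a)
    return np.apply_along_axis(resize_1d, 0, tmp, new_h, a)
```

Running the same image through resize_2d with a=-0.5 and a=-0.75 gives a quick sense of how much of the observed mismatch comes from the kernel parameter alone.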

Maybe a mistake in lstm_generator.py

In lstm_generator.py, lines 71 (Hin[t,1:1+d] = X[t]) and 72 (Hin[t,1+d:] = prev) should be swapped, because the hidden size is d, which is the dimension of prev.
But I don't know why it doesn't raise an error. Can anyone explain this?
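A likely reason no error appears (an assumption about the configuration, not verified against the driver defaults): NumPy slice assignment only raises when the shapes disagree, so if the input encoding size equals the hidden size d, writing X[t] into a slice of width d and prev into the remainder is shape-compatible, and the layout is consistent either way. The hypothetical helper below reproduces the two quoted lines with an explicit input_size to show when they would fail.

```python
import numpy as np

def fill_hin(x_t, prev, input_size, d):
    """Mimic the two quoted lines: Hin row = [bias | X[t] | prev], slices sized by d."""
    hin = np.zeros(1 + input_size + d)
    hin[0] = 1.0                 # bias entry
    hin[1:1 + d] = x_t           # slice width is d; x_t has input_size values
    hin[1 + d:] = prev           # slice width is input_size; prev has d values
    return hin
```

With input_size == d both assignments succeed; with input_size != d the first assignment raises ValueError, so a silent run suggests the two sizes are equal in that configuration.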

Best hyperparameters for RNN model

Hi,

When I try to train the RNN model, the performance is quite poor with the default parameters, since the default values are tuned for the LSTM.

So, could you please share the tuned hyperparameters for the RNN model?

Thanks.

multiple hosts

Hi Andrej,

I really love this implementation.
The most intriguing part to me is your monitorcv to visualize the cross-validation. It could help a lot during training.

In the code, I found that it can show up to 40 results with different host names, but my computer has only one hostname (using Python's gethostname).
This is probably a gap in my knowledge.
I guess we could run on separate hosts (with different parameters or models) using the same computer, right?

Could you please give some instructions on how to do so?

Thank you so much.
Best,
-Ethan

list index out of range error

I created a coco_sample directory containing the following files:

  • COCO_val2014_000000463825.jpg
  • model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p (from here)
  • tasks.txt (containing one line COCO_val2014_000000463825.jpg)
  • vgg_feats.mat (from here)

I ran the following command.

python predict_on_images.py coco_sample/model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p -r coco_sample

I got an error message as below.

parsed parameters:
{
"beam_size": 1,
"checkpoint_path": "coco_sample/model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p",
"root_path": "coco_sample"
}
loading checkpoint coco_sample/model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p
image 0/123287:
/home/ec2-user/neuraltalk/imagernn/lstm_generator.py:227: RuntimeWarning: overflow encountered in exp
  IFOGf[t,:3*d] = 1.0/(1.0+np.exp(-IFOG[t,:3*d]))
PRED: (-14.587771) a man and a woman sitting on a bench in the middle of a park
image 1/123287:
Traceback (most recent call last):
  File "predict_on_images.py", line 109, in <module>
    main(params)
  File "predict_on_images.py", line 66, in main
    img['local_file_path'] =img_names[n]
IndexError: list index out of range

Isn't it possible to run predict_on_images.py on just a few images?
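The log hints at the cause: the loop counts 123,287 images (the size of COCO train2014 plus val2014), which suggests the downloaded vgg_feats.mat holds features for the whole COCO set while tasks.txt lists only one file, so indexing the name list fails on the second image. Assuming, hypothetically, that the features are stored one column per image, a consistency check before the loop would surface this directly; running on a few images then needs a feature file computed for exactly those images.

```python
import numpy as np

def pair_images_with_features(img_names, feats):
    """feats: array of shape (feature_dim, num_images), one column per image (assumed layout)."""
    n_feats = feats.shape[1]
    if len(img_names) != n_feats:
        raise ValueError(f"tasks file lists {len(img_names)} image(s) but the feature "
                         f"matrix has {n_feats} column(s); they must correspond one-to-one")
    return [{"local_file_path": name, "feat": feats[:, n]} for n, name in enumerate(img_names)]
```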

Encountered runtime warning while computing logistic function

@karpathy Thanks for open-sourcing your image-to-sentences work. I got the code up and running with the Flickr30K dataset but encountered a runtime warning:
" RuntimeWarning: overflow encountered in exp"

I have fixed it locally by using the scipy.special.expit function. I have attached the patch below in case you want to cherry-pick my commit. Let me know whether this patch is useful to you and whether you'd like me to make a PR with the fix:

From d3b8d3401a7ebeae1aff88538f1f5eff440b31cf Mon Sep 17 00:00:00 2001
From: Vimal Thilak
Date: Wed, 3 Dec 2014 15:16:28 -0800
Subject: [PATCH] [bugfix] Fix overflow runtime warning

- Warning encountered in logistic function computation

Signed-off-by: Vimal Thilak

 imagernn/lstm_generator.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/imagernn/lstm_generator.py b/imagernn/lstm_generator.py
index 011e333..af6797f 100644
--- a/imagernn/lstm_generator.py
+++ b/imagernn/lstm_generator.py
@@ -1,5 +1,6 @@
 import numpy as np
 import code
+import scipy.special
 
 from imagernn.utils import initw
 
@@ -75,7 +76,7 @@ class LSTMGenerator:
       IFOG[t] = Hin[t].dot(WLSTM)
 
       # non-linearities
-      IFOGf[t,:3*d] = 1.0/(1.0+np.exp(-IFOG[t,:3*d])) # sigmoids; these are the gates
+      IFOGf[t,:3*d] = scipy.special.expit(IFOG[t,:3*d]) # sigmoids; these are the gates
       IFOGf[t,3*d:] = np.tanh(IFOG[t, 3*d:]) # tanh
 
       # compute the cell activation
@@ -224,7 +225,7 @@ class LSTMGenerator:
       C = np.zeros((1, d))
       Hout = np.zeros((1, d))
       IFOG[t] = Hin[t].dot(WLSTM)
-      IFOGf[t,:3*d] = 1.0/(1.0+np.exp(-IFOG[t,:3*d]))
+      IFOGf[t,:3*d] = scipy.special.expit(IFOG[t,:3*d]) # expit(x) == 1/(1+exp(-x)); pass x, not -x
       IFOGf[t,3*d:] = np.tanh(IFOG[t, 3*d:])
       C[t] = IFOGf[t,:d] * IFOGf[t, 3*d:] + IFOGf[t,d:2*d] * c_prev
       if tanhC_version:
-- 
2.0.1
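The warning itself is benign here: for large negative inputs np.exp overflows to inf, and 1.0/(1.0+inf) still evaluates to the correct limit 0. scipy.special.expit(x) computes the same logistic function 1/(1+exp(-x)); note that it takes x, not -x. For a NumPy-only alternative, a common numerically stable formulation branches on the sign of the input so that exp is only ever evaluated at non-positive arguments. This is a generic sketch, not code from the repository.

```python
import numpy as np

def stable_sigmoid(x):
    """Logistic function 1/(1+exp(-x)) without overflow in exp."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))   # -x <= 0, so exp(-x) <= 1
    ex = np.exp(x[~pos])                       # x < 0, so exp(x) < 1
    out[~pos] = ex / (1.0 + ex)                # algebraically equal to 1/(1+exp(-x))
    return out
```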

CAFFE API error

When I tried to run the Python script python_features/extract_features.py today, I ran into the following problem:

Traceback (most recent call last):
  File "./extract_features.py", line 102, in <module>
    net = caffe.Net(args.model_def, args.model)
Boost.Python.ArgumentError: Python argument types in
    Net.__init__(Net, str, str)
did not match C++ signature:
    __init__(boost::python::api::object, std::string, std::string, int)
    __init__(boost::python::api::object, std::string, int)

Then I searched for this error online and found the same issue on Caffe's issue page: Caffe#1905. I think the error is caused by an update to Caffe's API.
So I changed line 101 of extract_features.py to: net = caffe.Net(args.model_def, args.model, caffe.TEST). That worked, but a new problem came up:

Traceback (most recent call last):
  File "./extract_features.py", line 102, in <module>
    caffe.set_phase_test()
AttributeError: 'module' object has no attribute 'set_phase_test'

I think the reason is that some of the API calls in python_features/extract_features.py are outdated.
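Both tracebacks are consistent with newer pycaffe taking the phase as a third constructor argument (caffe.TEST) and dropping the module-level set_phase_test(). A small compatibility sketch, assuming those are the only two changes that matter here (other calls in the script may need similar updates), could pick the right form at runtime. The caffe module is passed in so the helper can be exercised without Caffe installed; the function name is hypothetical.

```python
def load_test_net(caffe, model_def, weights):
    """Build a caffe.Net in test phase under either pycaffe API generation.

    Newer pycaffe: caffe.Net(model_def, weights, caffe.TEST).
    Older pycaffe (assumed): two-argument constructor plus caffe.set_phase_test().
    """
    if hasattr(caffe, "TEST"):
        return caffe.Net(model_def, weights, caffe.TEST)
    net = caffe.Net(model_def, weights)
    caffe.set_phase_test()
    return net
```

In extract_features.py this would replace the constructor call and the set_phase_test() line, e.g. net = load_test_net(caffe, args.model_def, args.model).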

Size of Descriptive Sentences

Hi Andrej,

Is there a limit to the length of the descriptive sentences? Has it been tried with multiple sentences, each describing a different feature of the image? For example, if an image had the description "A dog in a park. A kite in the sky.", could the model generate two sentences if the training data were in a similar format? Or is it better to split the description into several single-sentence examples and show the same image for each (i.e. image A: dog in a park; image A: kite in the sky)?

Also, is the MATLAB feature extractor GPU-enabled?

Thanks!

Incorrect prediction while testing.

When I evaluate and predict on the example_images dataset you provided, after training on Flickr8k images, all of the outputs are wrong: the prediction is incorrect for every image. Why is this happening?

question about dropout implementation

Hi Andrej,

I have been learning a ton about RNNs and their implementation from looking through your code. I have a (perhaps silly) question about your dropout implementation. You claim that your code creates a mask that drops a fraction, drop_prob, of the units and then scales the remaining units by 1/(1-drop_prob). This doesn't seem correct to me since you are sampling using np.random.randn, which seems to sample from a normal distribution of mean 0 and variance 1.

For example, if you set drop_prob=1 (ignoring the fact that this makes your scale factor infinite), you should be dropping all the units, but in reality you will be testing the boolean condition np.random.randn(some_shape)<(1-drop_prob). Since np.random.randn gives negative values half the time (on average), you will only drop half the units (on average).

It seems like you want to be sampling from a uniform distribution from 0 to 1 in order for this to work properly.

Best,
Sam
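The fix described above is to draw the mask from a uniform distribution on [0, 1). A minimal inverted-dropout sketch along those lines (the function name and RNG plumbing are illustrative, not the repository's code) keeps each unit with probability 1 - drop_prob and rescales the survivors so the expected activation is unchanged:

```python
import numpy as np

def dropout_mask(shape, drop_prob, rng):
    """Inverted dropout mask: 0 for dropped units, 1/(1-drop_prob) for kept ones."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = rng.random(shape) < (1.0 - drop_prob)   # uniform [0, 1), unlike randn
    return keep / (1.0 - drop_prob)
```

Multiplying activations by this mask drops about drop_prob of them, while the mean of the mask stays close to 1.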

predict_on_images.py: error: too few arguments

Hello,
Thanks for the code and the very helpful readme files.
I tried to run predict_on_images.py on the examples folder you provided but got this error:
C:\neuraltalk-master>python predict_on_images.py
usage: predict_on_images.py [-h] [-r ROOT_PATH]
predict_on_images.py: error: too few arguments

I would appreciate any help.

Regards

Aborting, cost seems to be exploding.

Training with Flickr8k aborts:

253/15000 batch done in 5.037s. at epoch 0.84. loss cost = 37.447347, reg cost = 0.000001, ppl2 = 26.10 (smooth 48.09)
254/15000 batch done in 5.082s. at epoch 0.85. loss cost = 39.408169, reg cost = 0.000001, ppl2 = 29.19 (smooth 47.91)
255/15000 batch done in 4.914s. at epoch 0.85. loss cost = 140.730310, reg cost = 0.000001, ppl2 = 237360.65 (smooth 2421.03)
Aboring, cost seems to be exploding. Run gradcheck? Lower the learning rate?

image captioning

Hi, I'd like to work on image captioning. I used a novel approach for image segmentation, and now I'd like to use these segmented images as a preprocessing step for image captioning. Can you give me an idea of what my next step should be? And if possible, may I have MATLAB code for captioning?
