deeplearningproject's People

Contributors

anshulbasia, biogeek, bobbleoxs, brandly, mel-jecker, mkilavuz, spandan-madan, tomraulet, vargas

deeplearningproject's Issues

Got IOError half way through learning

When I got to the last section on the textual model, the model ran for 5 epochs and then threw this error:

IOError: [Errno 24] Too many open files

I first tried changing the limit with ulimit -n, but realized the easiest fix is to set num_workers to 4. The value is arbitrary, but someone suggested that 4 * the number of GPUs is a good approximation for num_workers.

It's not a specific issue per se, but I think it would help people to know that this is one of the nuances of building an ML pipeline that is not necessarily apparent.
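
For anyone who would rather raise the open-file limit than lower num_workers, here is a minimal sketch using Python's standard resource module (Unix only). The helper name raise_open_file_limit and the default target of 4096 are assumptions for illustration; neither comes from the tutorial.

```python
import resource


def raise_open_file_limit(target=4096):
    """Try to raise the soft limit on open files to `target` (hypothetical helper).

    Returns the soft limit in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    # The soft limit cannot exceed the hard limit, so cap the request.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the OS refused the change; keep the existing limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print("soft limit on open files:", raise_open_file_limit())
```

This only affects the current Python process and its children, so it would have to run before the data-loading workers are started.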

Example binarized vector representation is syntactically incorrect

Now let's store the genres for these movies in a list that we will later transform into a binarized vector.

Binarized vector representation is a very common and important way data is stored/represented in ML. Essentially, it's a way to reduce a categorical variable with n possible values to n binary indicator variables. What does that mean? For example, let [(1,3),(4)] be the list saying that sample A has two labels 1 and 3, and sample B has one label 4. For every sample, for every possible label, the representation is simply 1 if it has that label, and 0 if it doesn't have that label. So the binarized version of the above list will be -

> [(1,0,1,0]),
> (0,0,0,1])]

The output in this section contains a mismatched-bracket syntax error. Not a huge problem, but probably a bit confusing for a beginner. Otherwise, great tutorial!
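
As a sketch of what the well-formed output would look like, the quoted example can be reproduced in plain Python. The function name binarize and the num_labels parameter are assumptions made for this illustration; the tutorial itself may build the vectors differently.

```python
def binarize(samples, num_labels):
    """Turn tuples of 1-based label ids into rows of 0/1 indicators."""
    rows = []
    for labels in samples:
        row = [0] * num_labels
        for label in labels:
            row[label - 1] = 1  # label ids start at 1, list indices at 0
        rows.append(row)
    return rows


# Sample A has labels 1 and 3; sample B has label 4.
samples = [(1, 3), (4,)]
print(binarize(samples, num_labels=4))  # prints [[1, 0, 1, 0], [0, 0, 0, 1]]
```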

Would be nice to support Python 3.

Just to be clear - consider this as just a minor suggestion rather than a complaint.

Thank you for this tutorial! It's rare that people actually spend a lot of time to make a great free learning resource.

small correction cell no.18

It should be

the list() method of the Genres() class returns a listing of all genres in the form of a dictionary.

list_of_genres=genres.movie_list()['genres']

Thanks for the amazing tutorial.

A few things to look at in *Deep Learning to extract visual features from posters* (Section 7)

  1. You declared the VGG model and stored it in the variable 'model', but used the variable 'model_viz' for training, which means you did not use VGG at all. You can check your model by typing 'print(model_viz.layers)'. If you struggle to fix this issue, I can help you with this section if you add me as an author.
  2. It is important to show how well your model is trained. I would recommend plotting the loss and accuracy curves using the history instance returned by the 'model.fit()' function, or showing a confusion matrix built from the predictions to expose false positives and false negatives.
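
To make the second point concrete, here is a minimal sketch that counts true/false positives and negatives per genre from binarized labels. It is plain Python and independent of Keras; the function name confusion_counts and the toy data are assumptions for illustration, not the tutorial's code.

```python
def confusion_counts(y_true, y_pred):
    """Count tp/fp/fn/tn for each label column of 0/1 indicator rows."""
    num_labels = len(y_true[0])
    counts = [{"tp": 0, "fp": 0, "fn": 0, "tn": 0} for _ in range(num_labels)]
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                key = "tp"
            elif p:
                key = "fp"  # predicted the genre, but it is not a true label
            elif t:
                key = "fn"  # missed a genre that is a true label
            else:
                key = "tn"
            counts[j][key] += 1
    return counts


# Toy data: three movies, two genres (made up for this example).
y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 1], [0, 0], [0, 1]]
for genre_index, c in enumerate(confusion_counts(y_true, y_pred)):
    print(genre_index, c)
```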

Possible wrong syntax

In [26]: # Create a tmdb genre object!
genres=tmdb.Genres()
the list() method of the Genres() class returns a listing of all genres in the form of a dictionary.
list_of_genres=genres.list()['genres']

The above segment throws this error:
Create a tmdb genre object!
genres=tmdb.Genres()
the list() method of the Genres() class returns a listing of all genres in the form of a dictionary.
list_of_genres=genres.list()['genres']

I apologize if this is a trivial issue. I'm new to Python. It'll be great if someone can help me resolve this. T

Dependency Issue on Windows

I tried to set up the environment with the .yml file, but the setup was aborted.
Installing the requested packages manually led to an error: tensorflow on Windows is only supported in 64-bit Python 3.5. Updating Python raises dependency errors for functools32 and subprocess32, which only run with Python 2.7.
So, based on my limited knowledge, there is no way of setting up the environment on Windows. Or am I missing something?

Help !

Hi, I am new to ML. Can I start with this tutorial?
If not, where should I start, and how?
Thanks in advance.

Possible points of confusion and typos

Points of confusion

  • This section uses f as the generalized function and g as the exact function, whereas before f was exact and g was generalized. This has the potential to confuse readers.
  • On In [51] and In [52], id is assigned a value but does not seem to be used
  • On the section after Out [62] it says that the shape of Y is 1666,20 but the output of print Y.shape is (1595, 20). Where does the 1666 come from?

Typos

  • In the last sentence of the first paragraph of the same section, "listen to" should be changed to "watch"
  • In the last paragraph before In [68] (this section), "vocabular" should be "vocabulary"
  • In the first paragraph of this section, "can only integer values" should be "can only be integer values"
  • In the second item of the first list in this section, "difference models" should be "different models"

"That" vs "Which" grammatical error.

Kudos on a very well done writeup. I have a simple grammatical correction ... in many cases, you have used 'which' in place of 'that'.

See http://www.writersdigest.com/online-editor/which-vs-that

If 'which' is used to describe something, and is not preceded by a comma, it is a likely candidate for the confusion.

For example,
'use the available data to learn a function which can' ==> 'use the available data to learn a function that can'

Variables undefined when running the scripts

This is in section 7, when extracting VGG features for the scraped images.
In the for loop that contains the try/except block, the variable 'imname' is never declared. Maybe it should be changed as follows:

for mov in poster_movies:
    i+=1
    mov_name=mov['original_title']
    mov_name1=mov_name.replace(':','/')
    poster_name=mov_name.replace(' ','_')+'.jpg'
    if poster_name in imnames:
        img_path=poster_folder+poster_name
        try:
            img = image.load_img(img_path, target_size=(224, 224))
            succesful_files.append(imname) # **imname undefined , change to poster_name ?**
            x = image.img_to_array(img)
            x = np.expand_dims(x, axis=0)
            x = preprocess_input(x)
            features = model.predict(x)
            file_order.append(img_path)
            feature_list.append(features)
            genre_list.append(mov['genre_ids'])
            if np.max(np.asarray(feature_list))==0.0:
                print('problematic',i)
            if i%250==0 or i==1:
                print "Working on Image : ",i
        except Exception,e:
            print Exception,":",e   # **for debugging**
            failed_files.append(imname) # **imname undefined , change to poster_name ?**
            continue
    else:
        continue

help

When I execute cell [41], I get the following error:

HTTPError Traceback (most recent call last)
in ()
17 url += '&with_genres=' + str(g_id) + '&page=' + str(page)
18
---> 19 data = urllib2.urlopen(url).read()
20
21 dataDict = json.loads(data)

/home/shouvik/anaconda3/envs/deeplearningproject/lib/python2.7/urllib2.pyc in urlopen(url, data, timeout, cafile, capath, cadefault, context)
152 else:
153 opener = _opener
--> 154 return opener.open(url, data, timeout)
155
156 def install_opener(opener):

/home/shouvik/anaconda3/envs/deeplearningproject/lib/python2.7/urllib2.pyc in open(self, fullurl, data, timeout)
433 for processor in self.process_response.get(protocol, []):
434 meth = getattr(processor, meth_name)
--> 435 response = meth(req, response)
436
437 return response

/home/shouvik/anaconda3/envs/deeplearningproject/lib/python2.7/urllib2.pyc in http_response(self, request, response)
546 if not (200 <= code < 300):
547 response = self.parent.error(
--> 548 'http', request, response, code, msg, hdrs)
549
550 return response

/home/shouvik/anaconda3/envs/deeplearningproject/lib/python2.7/urllib2.pyc in error(self, proto, *args)
471 if http_err:
472 args = (dict, 'default', 'http_error_default') + orig_args
--> 473 return self._call_chain(*args)
474
475 # XXX probably also want an abstract factory that knows when it makes

/home/shouvik/anaconda3/envs/deeplearningproject/lib/python2.7/urllib2.pyc in _call_chain(self, chain, kind, meth_name, *args)
405 func = getattr(handler, meth_name)
406
--> 407 result = func(*args)
408 if result is not None:
409 return result

/home/shouvik/anaconda3/envs/deeplearningproject/lib/python2.7/urllib2.pyc in http_error_default(self, req, fp, code, msg, hdrs)
554 class HTTPDefaultErrorHandler(BaseHandler):
555 def http_error_default(self, req, fp, code, msg, hdrs):
--> 556 raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
557
558 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 401: Unauthorized


help appreciated.
Thanks
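
A 401 response on this request most likely means the API key in the URL is missing or invalid. The sketch below shows one way to surface that more clearly. It uses Python 3's urllib rather than the Python 2.7 urllib2 in the traceback, and the helper names explain_http_error and fetch_json are assumptions for illustration.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def explain_http_error(err):
    """Turn an HTTPError into a short, readable message (hypothetical helper)."""
    if err.code == 401:
        return "401 Unauthorized: the API key in the request URL is missing or invalid"
    return "HTTP error %d: %s" % (err.code, err.reason)


def fetch_json(url):
    """Fetch a URL and decode the JSON body, re-raising HTTP errors with context."""
    try:
        with urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as err:
        raise RuntimeError(explain_http_error(err))
```

With this in place, the failing line in the cell would read dataDict = fetch_json(url), and a bad key would produce the one-line message instead of the full traceback.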

TMDB API key

Great walkthrough!

One recommendation is to replace your actual TMDB API key with a placeholder. That way no one can abuse your account via your API key.

P.S. Super nitpicky, but in that same block, I think the Jupyter step should read In [5]:
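
One common way to keep the key out of the notebook entirely is to read it from an environment variable. This is only a sketch: the variable name TMDB_API_KEY and the helper load_tmdb_api_key are assumptions, not something the tutorial defines.

```python
import os


def load_tmdb_api_key(env_var="TMDB_API_KEY"):
    """Read the API key from the environment (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the %s environment variable to your own TMDB API key" % env_var
        )
    return key
```

The notebook cell would then assign the result of load_tmdb_api_key() to the key variable instead of a literal string, so the published notebook never contains a real key.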

Cut out warnings from imports due to numpy ufunc and dtype sizes

Nice job with the notebooks. In block 2, if you'd like to get rid of the "RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88" warnings, you can just add the following at the bottom of the block:

import warnings
warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

Tutorial is broken? Low recall/precision in TF results

Hey Spandan,

It looks like something has changed in the data or the model: the TF precision and recall in the final runs are very low (0.2 or so). The tutorial might need to be future-proofed a bit more against changes to the TMDB or IMDB APIs.

TMDB Genre list() changed to movie_list()

In Section 3, when retrieving genres from TMDB, the instructions say to use the .list() method of the Genres object returned by tmdb.Genres().

There has been an update to the API and list() no longer exists. There are separate lists for movies, tv, etc. Currently the function we're looking for is movie_list(), which returns the list of movie genres.
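
A notebook that has to work with both old and new versions of the wrapper could try movie_list() first and fall back to list(). The sketch below assumes only what this issue describes about the two method names; the helper fetch_movie_genres and the stub class are hypothetical and stand in for the real tmdb.Genres() object so the example runs without an API key.

```python
def fetch_movie_genres(genres_obj):
    """Return the movie genre list from a Genres-like object (assumed interface)."""
    method = getattr(genres_obj, "movie_list", None)
    if method is None:
        method = genres_obj.list  # older wrapper versions, as in the tutorial text
    return method()["genres"]


class StubGenres(object):
    """Stand-in for tmdb.Genres(); returns made-up sample data."""

    def movie_list(self):
        return {"genres": [{"id": 28, "name": "Action"}]}


print(fetch_movie_genres(StubGenres()))
```

In the notebook, the call would become list_of_genres = fetch_movie_genres(tmdb.Genres()).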

Finding words that are most predictive of a genre

Hi, This was an extremely useful document, and I learnt a lot from the tutorial. An interesting extension to the problem would be to identify the words in the synopses that most distinguish a genre from other genres in the model - I have an analogous task in my project.

Is there a way to find the words that are most predictive of a genre? For example, is there a way to identify that the words ‘battle’, ‘challenge’ and ‘fight’ (for example) are the most predictive of a movie falling into the ‘Action’ category, based on the model we trained? i.e. which are the words (in the synopsis) that most prominently indicate that the synopsis would fall under a particular genre? (Using the model we have fit).
This basically translates to decoding the algorithm to find out how it works “under the hood.” - what features (words?) it uses "under the hood" to classify a synopsis into a genre.

A solution I found online is in the code snippet below - Using the classifier coefficients from clf.coef_ (clf is the name of the model I fit) and picking the top 10 words (which the model uses to distinguish/identify a genre based on a given text).

def print_top10(vectorizer, clf, class_labels):
    """Prints features with the highest coefficient values, per class"""
    feature_names = vectorizer.get_feature_names()
    for i, class_label in enumerate(class_labels):
        top10 = np.argsort(clf.coef_[i])[-10:]
        print("%s: %s" % (class_label,
              " ".join(feature_names[j] for j in top10)))

Please let me know if this is appropriate and if there is a better way of doing this.
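
The ranking step in the snippet above can be sketched in plain Python, which makes it easy to check on toy data. This assumes a linear model whose per-class coefficients are available as a list (as with clf.coef_[i]); the function name top_features and the sample words and weights are made up for illustration. Note also that recent scikit-learn releases replaced get_feature_names() with get_feature_names_out().

```python
def top_features(feature_names, coefficients, k=10):
    """Return the k feature names with the largest coefficients, highest first."""
    ranked = sorted(
        range(len(coefficients)),
        key=lambda j: coefficients[j],
        reverse=True,
    )
    return [feature_names[j] for j in ranked[:k]]


# Toy vocabulary and made-up coefficients for a hypothetical 'Action' class.
words = ["battle", "love", "fight", "wedding"]
action_coefs = [2.1, -0.7, 1.8, -1.2]
print(top_features(words, action_coefs, k=2))  # prints ['battle', 'fight']
```

Unlike np.argsort(...)[-10:], which lists the top words in ascending order of weight, this returns the strongest word first.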
