Task: recognising book genre based on the Goodreads dataset. We use several kinds of features, including text features (description and title), image features (book covers) and numerical features (such as number of pages and ratings), to recognise a book's genre.
(Updates were made from danielaneuralx, which is my working GitHub account, but the work is all mine.)
- eda_goodreads.ipynb (exploration of the dataset)
- inference_goodreads.ipynb (training the models)
Report: 701projectreport_bookgenre.pdf
Project structure:
- root
- data
- images-source (directory; don't delete. This holds the dataset images and exists in GitHub)
- 1.jpg (image file)
- 2.jpg (image file)
- ... (230K images)
- books_images_names.csv
- goodreads_imagestxt.txt
- images-train (directory; this directory and its content will be created)
- images-val (directory; this directory and its content will be created)
- images-test (directory; this directory and its content will be created)
- goodreads_books_eng_f1.csv (don't delete; this is the first dataset csv)
- goodreads_books_eng_f2.csv (don't delete; this is the second dataset csv)
- goodreads (package)
- __init__.py
- baseline.py
- conv_goodreads.py
- custom_nn_with_embeddings
- results_utils.py
- utils.py
- configuration.yml (very important, don't delete: this holds the hyperparameter configuration and general parameters)
- eda_goodreads.ipynb
- inference_goodreads.ipynb
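The images-train, images-val and images-test directories listed above are created from images-source. As a rough illustration only, a split of that kind could be sketched as below. The function name `split_images`, the 80/10/10 ratios and the random shuffle are assumptions made for this example, not the project's actual logic, which lives in the goodreads package and configuration.yml.

```python
import random
import shutil
from pathlib import Path


def split_images(source_dir, dest_root, ratios=(0.8, 0.1, 0.1), seed=0):
    """Copy .jpg files from source_dir into images-train/val/test under dest_root.

    Hypothetical helper: the ratios and the seeded shuffle are assumptions.
    """
    source = Path(source_dir)
    files = sorted(p for p in source.iterdir() if p.suffix.lower() == ".jpg")
    random.Random(seed).shuffle(files)  # seeded, so the split is reproducible

    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    splits = {
        "images-train": files[:n_train],
        "images-val": files[n_train:n_train + n_val],
        "images-test": files[n_train + n_val:],  # whatever remains
    }
    for name, members in splits.items():
        target = Path(dest_root) / name
        target.mkdir(parents=True, exist_ok=True)
        for path in members:
            # Copy rather than move, so images-source stays intact.
            shutil.copy2(path, target / path.name)
    return {name: len(members) for name, members in splits.items()}
```

Copying instead of moving keeps images-source untouched, which matches the "don't delete" note on that directory.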
Required packages:
- tensorflow
- matplotlib
- scikit-learn
- os (Python standard library)
- pyyaml
- numpy
- pandas
- tensorflow-addons (for F1 metric)
- nltk
- gensim
- spacy
- pickle (Python standard library)
- shutil (Python standard library)
After installing spaCy, run: python -m spacy download en_core_web_md
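Before opening the notebooks, it may help to check that the packages above can be imported. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the list uses import names rather than install names (scikit-learn imports as sklearn, pyyaml as yaml, tensorflow-addons as tensorflow_addons).

```python
import importlib.util

# Import names for the third-party packages listed above.
REQUIRED_MODULES = [
    "tensorflow", "matplotlib", "sklearn", "yaml", "numpy", "pandas",
    "tensorflow_addons", "nltk", "gensim", "spacy",
]


def missing_modules(names):
    """Return the top-level module names not found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The spaCy model is installed as a regular package named en_core_web_md.
    missing = missing_modules(REQUIRED_MODULES + ["en_core_web_md"])
    print("Missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located. It does not check versions.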
For any problem or question, contact me at [email protected]