book_recommender's Introduction

Book Recommender

Exploratory Data Analysis + Data Visualization + Modelling

1 - Abstract

In this project I carried out Exploratory Data Analysis, Data Visualisation and, lastly, Modelling. The dataset is a CSV file with 11123 rows. Each row represents a book described by 12 fields. Before modelling, I checked for NaN values, made some small adjustments to make the dataset easier to use, and merged several English language codes into one (en-AUS and en-UK into eng). I then made several visualizations to understand the dataset better. In the modelling part, I used K-means, an unsupervised learning algorithm that groups unlabelled data. To decide the number of clusters I used the Elbow method and chose 5 clusters. Finally, I tested the model with several books and added an input function to make searching easier.

2 - Data

Dataset contains 12 columns and 11123 rows.

Columns Description:

  • bookID = contains the unique ID for each book/series
  • title = contains the titles of the books
  • authors = contains the author of the particular book
  • average_rating = the average rating of the books, as decided by the users
  • ISBN = the ISBN-10 number, which carries information about a book such as its edition and publisher
  • ISBN 13 = the newer 13-digit ISBN format, implemented in 2007
  • language_code = tells the language for the books
  • Num_pages = contains the number of pages for the book
  • Ratings_count = contains the number of ratings given for the book
  • text_reviews_count = has the count of reviews left by users
  • publication_date = date of publication
  • publisher = name of the publisher
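
The cleaning step mentioned in the abstract (checking for missing values and merging English language codes into eng) can be sketched as follows. This is a minimal, dependency-free illustration, not the project's actual code: the sample rows, the helper names `count_missing` and `merge_language_codes`, and the `ENGLISH_VARIANTS` mapping are assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file has 11123 rows and 12 columns.
SAMPLE_CSV = """bookID,title,language_code,average_rating
1,Book A,eng,4.10
2,Book B,en-UK,3.90
3,Book C,en-AUS,
4,Book D,spa,4.30
"""

# English variants named in the abstract, folded into a single code.
ENGLISH_VARIANTS = {"en-AUS": "eng", "en-UK": "eng"}


def load_rows(text):
    """Parse CSV text into a list of dicts, one per book."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Return {column: number of empty cells}, a stand-in for a NaN check."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


def merge_language_codes(rows, mapping=ENGLISH_VARIANTS):
    """Return new rows with regional codes replaced by their merged code."""
    return [
        {**row, "language_code": mapping.get(row["language_code"], row["language_code"])}
        for row in rows
    ]


rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows)
cleaned = merge_language_codes(rows)
language_counts = Counter(row["language_code"] for row in cleaned)
```

Codes that are not in the mapping (such as spa in the sample) pass through unchanged, so only the listed English variants are merged.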

3 - Exploratory Data Analysis

In the EDA I visualize the language distribution, the top 20 authors by number of books, the top 20 highest-rated books, and the average rating distribution for all books.

Secondly, I create a list of my favorite authors and visualize their books by average rating.

authors = ['Gabriel García Márquez', 'Jack London', 'George Orwell', 'Jules Verne', 'Richard P. Feynman']

After all these steps, I investigated the relationships between columns. The plots cover three pairs: Average Rating and Number of Pages, Average Rating and Reviews Count, and Ratings Count and Average Rating.
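
A numeric complement to such plots is a correlation coefficient for each pair of columns. The sketch below computes the Pearson correlation by hand. It is an illustration only, not something the project is stated to do: the `pearson` helper and the sample values are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values standing in for two columns of the dataset.
average_rating = [3.9, 4.1, 4.3, 3.5, 4.0]
num_pages = [320, 450, 610, 150, 380]

r = pearson(average_rating, num_pages)
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship between the two columns.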

4 - Modelling

In the modelling part, I had already decided to use the K-Means algorithm, but I still had to decide how many clusters to use. For this I used the Elbow Method, which gives a good estimate. The elbow graph is shown in the accompanying figure.

After settling on 5 clusters, I plotted the clusters to visualize them.
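
To make the Elbow Method concrete, here is a small dependency-free sketch of what it computes: K-Means is run for several values of k, and the within-cluster sum of squared distances (inertia) is recorded for each. The point where the curve stops dropping sharply is the "elbow". This is not the project's code, which presumably relies on a library implementation: the toy points, the function names, and the deterministic farthest-point initialization are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans_inertia(points, k, iterations=50):
    """Run Lloyd's algorithm and return the final within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Toy data: three well-separated groups of points (made up for illustration).
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

# Inertia for k = 1..6; on this data the sharp drop ends at k = 3.
inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
```

On the toy data the inertia falls steeply up to k = 3 and only slightly afterwards, which is the pattern the Elbow Method looks for. The project applied the same idea to the book features and arrived at 5 clusters.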

Lastly, I applied a Min-Max scaler to reduce bias, because some books have very large feature values and others very small ones. The Min-Max scaler rescales each feature to a common range (typically 0 to 1), so that no single feature dominates.
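
As an illustration of what Min-Max scaling does to a single feature, each value x is mapped to (x - min) / (max - min). The sketch below is a hand-written version for clarity, not the project's code: the `min_max_scale` helper and the sample values are assumptions.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so the smallest maps to low and the largest to high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant feature carries no information; map everything to low.
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]


# Made-up values: ratings counts span a far wider range than page counts.
ratings_count = [12, 250, 2_000_000]
num_pages = [120, 350, 900]

scaled_counts = min_max_scale(ratings_count)
scaled_pages = min_max_scale(num_pages)
```

After scaling, both features lie between 0 and 1, so a distance-based algorithm such as K-Means is no longer dominated by the feature with the largest raw numbers.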

5 - Result & Future Work

print_similar_books("Caesar (Masters of Rome  #5)")
  • The Metaphysical Club
  • One Hundred Years of Solitude
  • Alice's Adventures in Wonderland and Through the Looking-Glass (Alice's Adventures in Wonderland #1-2)
  • In Cold Blood
  • Desperation / The Regulators: Box Set
print_similar_books("Lord of the Flies")
  • Introduction to the Philosophy of History with Selections from The Philosophy of Right
  • Marie Dancing
  • The Odyssey
  • The Hour Before Dark
  • A Philosophical Enquiry into the Origin of our Ideas of the Sublime and Beautiful

As a result, my book recommender gives good results, but there is still room for improvement. For example, knowing the category of each book would make the recommendations more effective, and a larger dataset (more rows) could make them more accurate.
