
lab-cleaning-categorical-data's Introduction


Lab | Cleaning categorical data

For this lab, we will use the dataset from the Customer Analysis Business Case, which can be found in the files_for_lab folder. In this lab we will explore categorical data. You can continue working in the same Jupyter notebook from the previous lab, but that is not necessary.

Instructions

  1. Import the necessary libraries if you are starting a new notebook.
  2. Load the CSV into a variable named customer_df, as in customer_df = pd.read_csv().
  3. What should we do with the customer_id column?
  4. Load the numerical and categorical variables into numerical_df and categorical_df respectively, e.g.:
    numerical_df = customer_df.select_dtypes()
    categorical_df = customer_df.select_dtypes()
  5. Plot every categorical variable. What can you see in the plots? In the previous lab you used a bar plot for categorical data, with each unique category of the column on the x-axis and an appropriate measure on the y-axis. This time, try a different plot: for each categorical variable, put each unique category of the column on the x-axis and the target (which is numerical) on the y-axis.
  6. For the categorical data, check whether any data cleaning needs to be performed. Hint: You can use the function value_counts() on each of the categorical columns and check how well the different categories are represented in each column. Discuss whether this information could be used for data cleaning.
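
The steps above can be sketched roughly as follows. This is only one possible approach, not the lab's reference solution. The toy DataFrame stands in for the real CSV, and every column name in it (customer_id, state, coverage, income, total_claim_amount), the choice of total_claim_amount as the target, and the helper plot_target_by_category are assumptions made for illustration. A box plot is used as one example of a plot with categories on the x-axis and a numerical target on the y-axis.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the lab CSV: every column name and value here is made up.
# In the lab you would instead call pd.read_csv(...) on the file in files_for_lab.
customer_df = pd.DataFrame({
    "customer_id": ["A1", "B2", "C3", "D4", "E5", "F6"],
    "state": ["California", "Oregon", "California", "Nevada", "Oregon", "California"],
    "coverage": ["Basic", "Premium", "Basic", "Extended", "Basic", "Basic"],
    "income": [56000, 0, 48000, 72000, 31000, 64000],
    "total_claim_amount": [380.0, 1100.0, 560.0, 530.0, 140.0, 420.0],  # assumed target
})

# Step 3 (one possible answer): an identifier that is unique per row carries no
# category information, so it can be moved to the index (or dropped).
customer_df = customer_df.set_index("customer_id")

# Step 4: split the columns by dtype into numerical and non-numerical frames.
numerical_df = customer_df.select_dtypes(include=np.number)
categorical_df = customer_df.select_dtypes(exclude=np.number)


def plot_target_by_category(df, cat_columns, target):
    """Step 5 (hypothetical helper): one box plot of `target` per categorical column."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without matplotlib

    n = len(cat_columns)
    fig, axes = plt.subplots(n, 1, figsize=(6, 4 * n), squeeze=False)
    for ax, col in zip(axes.ravel(), cat_columns):
        df.boxplot(column=target, by=col, ax=ax)  # categories on x, target on y
        ax.set_xlabel(col)
        ax.set_ylabel(target)
    fig.tight_layout()
    return fig


# Step 6: share of rows per category; very small shares hint at rare categories
# that might be merged or otherwise cleaned.
category_shares = {
    col: categorical_df[col].value_counts(normalize=True)
    for col in categorical_df.columns
}
```

To draw the plots, one could call plot_target_by_category(customer_df, categorical_df.columns, "total_claim_amount"), replacing the target name with whichever numerical target the dataset actually uses.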

lab-cleaning-categorical-data's People

Contributors

haggarw3, sandrabosk


lab-cleaning-categorical-data's Issues

Suggestion | Repeated questions

@sandrabosk
This lab is very similar to the week 1 lab, and questions 9-12 are repeated from the previous lab (4.01). It might be a good idea to make it a bit more challenging, for example by adding a task that requires np.where and/or .apply(), such as creating and applying a function that reduces the number of categories in some categorical column.
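
As a rough illustration of the kind of task suggested here, the sketch below reduces the categories of a column in two ways: with np.where and with .apply() plus a small function. The column name vehicle_class, its values, the threshold of two rows, and the helper reduce_rare are all hypothetical and not taken from the lab dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column; names and values are invented for illustration.
df = pd.DataFrame({
    "vehicle_class": [
        "Four-Door Car", "SUV", "Luxury SUV", "Luxury Car",
        "Two-Door Car", "Sports Car", "SUV",
    ]
})

# Option 1, np.where: merge the two luxury classes into a single "Luxury" label
# and leave every other value unchanged.
is_luxury = df["vehicle_class"].isin(["Luxury SUV", "Luxury Car"])
df["vehicle_class_np"] = np.where(is_luxury, "Luxury", df["vehicle_class"])


def reduce_rare(value, keep):
    """Return the value if it is in `keep`, otherwise the catch-all label "Other"."""
    return value if value in keep else "Other"


# Option 2, .apply(): keep only categories that appear at least twice
# (an arbitrary threshold) and map the rest to "Other".
counts = df["vehicle_class"].value_counts()
keep = set(counts[counts >= 2].index)
df["vehicle_class_apply"] = df["vehicle_class"].apply(reduce_rare, args=(keep,))
```

np.where suits a single yes/no condition, while a function passed to .apply() is easier to extend when the grouping rule has several branches.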
