
Comments (2)

Usernamezhx commented on May 23, 2024

Thank you very much for your patient reply.

from pytorch-widedeep.

jrzaurin commented on May 23, 2024

Hi @Usernamezhx

ok, let me explain. If you go here:
https://github.com/jrzaurin/Wide-and-Deep-PyTorch/blob/master/prepare_data.py#L52

you will read the following:

    embeddings_cols: List
        List containing just the name of the columns that will be represented
        with embeddings or a Tuple with the name and the embedding dimension.
        e.g.:  [('education',32), ('relationship',16)]
    continuous_cols: List
        List with the name of the so called continuous cols
    standardize_cols: List
        List with the name of the continuous cols that will be standardised.
        Only included because the Airbnb dataset includes Longitude and
        Latitude and does not make sense to normalise that

The functions in prepare_data.py are highly customised to this particular problem. So, given this input for the Airbnb dataset:

continuous_cols = ['latitude', 'longitude', 'security_deposit', 'extra_people']
standardize_cols = ['security_deposit', 'extra_people']

what will happen is that 'security_deposit' and 'extra_people' will be standardised, while 'latitude' and 'longitude' will not (because it does not make sense to standardise coordinates).
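To make the behaviour concrete, here is a minimal sketch in plain Python. It is not the code in prepare_data.py: the helper `standardise_columns`, the row values, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardise_columns(rows, continuous_cols, standardize_cols):
    """Rescale only the columns in standardize_cols to zero mean and unit
    variance. Hypothetical helper, not part of prepare_data.py."""
    # Every column to standardise must also be declared as continuous.
    unknown = set(standardize_cols) - set(continuous_cols)
    if unknown:
        raise ValueError(f"not continuous columns: {sorted(unknown)}")

    # Mean and (population) standard deviation per standardised column.
    stats = {}
    for col in standardize_cols:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values))

    scaled = []
    for row in rows:
        new_row = dict(row)  # untouched columns are copied as-is
        for col, (mu, sigma) in stats.items():
            # Guard against constant columns, where sigma is 0.
            new_row[col] = (row[col] - mu) / sigma if sigma else 0.0
        scaled.append(new_row)
    return scaled


# Made-up rows using the Airbnb column names from above.
rows = [
    {"latitude": 51.50, "longitude": -0.12, "security_deposit": 100.0, "extra_people": 0.0},
    {"latitude": 51.52, "longitude": -0.10, "security_deposit": 200.0, "extra_people": 10.0},
    {"latitude": 51.48, "longitude": -0.14, "security_deposit": 300.0, "extra_people": 20.0},
]
continuous_cols = ["latitude", "longitude", "security_deposit", "extra_people"]
standardize_cols = ["security_deposit", "extra_people"]

scaled = standardise_columns(rows, continuous_cols, standardize_cols)
```

After this runs, 'latitude' and 'longitude' in `scaled` hold their original values, while 'security_deposit' and 'extra_people' are centred on zero.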

Regarding the other column-type inputs, if you go here:
https://github.com/jrzaurin/Wide-and-Deep-PyTorch/blob/master/prepare_data.py#L128

you will read the following:

    wide_cols: List
        List with the name of the columns that will be one-hot encoded and
        passed through the Wide model
    crossed_cols: List
        List of Tuples with the name of the columns that will be "crossed"
        and then one-hot encoded. e.g. (['education', 'occupation'], ...)
    already_dummies: List
        List of columns that are already dummies/one-hot encoded

The wide columns are normally one-hot encoded and then passed through the wide model. However, some columns might already be one-hot encoded, and I call those already_dummies.
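As a rough illustration of that distinction, the sketch below one-hot encodes the wide columns and copies the already-encoded ones unchanged. The helper `one_hot_wide` and the column names `room_type` and `has_wifi` are hypothetical, and the sketch assumes that already_dummies is a subset of wide_cols, which may not match how prepare_data.py handles it.

```python
def one_hot_wide(rows, wide_cols, already_dummies):
    """One-hot encode wide columns, leaving already-dummy columns untouched.
    Hypothetical helper, not part of prepare_data.py."""
    to_encode = [c for c in wide_cols if c not in already_dummies]
    # Sort the categories so the generated column order is stable.
    categories = {c: sorted({row[c] for row in rows}) for c in to_encode}

    encoded = []
    for row in rows:
        new_row = {}
        for col in wide_cols:
            if col in already_dummies:
                new_row[col] = row[col]  # already 0/1, copy unchanged
            else:
                for cat in categories[col]:
                    new_row[f"{col}_{cat}"] = int(row[col] == cat)
        encoded.append(new_row)
    return encoded


# Made-up rows: 'room_type' is categorical, 'has_wifi' is already a dummy.
rows = [
    {"room_type": "private", "has_wifi": 1},
    {"room_type": "shared", "has_wifi": 0},
    {"room_type": "private", "has_wifi": 0},
]
encoded = one_hot_wide(rows, wide_cols=["room_type", "has_wifi"], already_dummies=["has_wifi"])
```

Here 'room_type' expands into one 0/1 column per category, while 'has_wifi' is passed along exactly as it was.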

As for your last question, "how to select the element to constitute interaction feature?": there is no rule for that, so you have to experiment. For example, if you have a couple of features and you think that their interaction might add useful information, then it is probably worth "crossing" them. To quote the TensorFlow tutorials directly: "...If you have a feature 'favorite_sport' and a feature 'home_city' and you're trying to predict whether a person likes to wear red, your linear model won't be able to learn that baseball fans from St. Louis especially like to wear red..."
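A small sketch of what "crossing" two columns can look like, using the feature names from the quote. The helper `add_crossed_columns`, the "-" separator, and the sample rows are assumptions for illustration and are not taken from prepare_data.py.

```python
def add_crossed_columns(rows, crossed_cols):
    """Add one new categorical column per group of crossed columns.
    Hypothetical helper, not part of prepare_data.py."""
    out = []
    for row in rows:
        new_row = dict(row)
        for cols in crossed_cols:
            name = "_".join(cols)
            # Each distinct combination of values becomes its own category.
            new_row[name] = "-".join(str(row[c]) for c in cols)
        out.append(new_row)
    return out


# Made-up rows using the feature names from the TensorFlow quote.
rows = [
    {"favorite_sport": "baseball", "home_city": "St. Louis"},
    {"favorite_sport": "baseball", "home_city": "Boston"},
    {"favorite_sport": "soccer", "home_city": "St. Louis"},
]
crossed = add_crossed_columns(rows, crossed_cols=[["favorite_sport", "home_city"]])
combinations = sorted({row["favorite_sport_home_city"] for row in crossed})
```

Once the new column is one-hot encoded, a linear model gets a separate weight for each combination, such as baseball fans from St. Louis, which it cannot express with the two original columns alone.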

Let me know if this helps.
