
Comments (4)

oywtece commented on August 20, 2024

Each row of the Avito data csv file contains the following blocks, in this order (listed one per line here; in the file they form a single row):
<target label>
<target one-hot fts>
<target multi-hot fts>
<ctxt1 one-hot fts, ctxt2 one-hot fts, ..., ctxt5 one-hot fts>
<ctxt1 multi-hot fts, ctxt2 multi-hot fts, ..., ctxt5 multi-hot fts>
<clk1 one-hot fts, clk2 one-hot fts, ..., clk5 one-hot fts>
<clk1 multi-hot fts, clk2 multi-hot fts, ..., clk5 multi-hot fts>
<unclk1 one-hot fts, unclk2 one-hot fts, ..., unclk5 one-hot fts>
<unclk1 multi-hot fts, unclk2 multi-hot fts, ..., unclk5 multi-hot fts>

Each ad has n_one_hot_slot = 25 one-hot fts and n_mul_hot_slot = 2 multi-hot fts. Each multi-hot ft has max_len_per_slot = 5 values; e.g., the multi-hot ft "search_params" may contain param a, param b, ..., param e. So each ad takes 25 + 2*5 = 35 columns.
A row covers 1 + 5 + 5 + 5 = 16 ads (the target ad, 5 ctxt ads, 5 clk ads, and 5 unclk ads) plus 1 label column, so the csv file has 1 + (25 + 2*5)*(1+5+5+5) = 561 columns in total.
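
As a sanity check on the column count, here is a minimal sketch that derives the start and end column of each block from the three parameters above. The block names, the `n_aux_ads` constant, and the `column_layout` helper are hypothetical names chosen for illustration; they are not taken from the dstn code.

```python
# Parameters as described in this comment.
n_one_hot_slot = 25
n_mul_hot_slot = 2
max_len_per_slot = 5
n_aux_ads = 5  # assumption: 5 ads each for ctxt, clk, and unclk


def column_layout():
    """Return ({block name: (start, end)}, total number of columns)."""
    one_hot_width = n_one_hot_slot
    mul_hot_width = n_mul_hot_slot * max_len_per_slot

    # Blocks in the order they appear in a row.
    blocks = [
        ("label", 1),
        ("target_one_hot", one_hot_width),
        ("target_mul_hot", mul_hot_width),
    ]
    for group in ("ctxt", "clk", "unclk"):
        blocks.append((group + "_one_hot", n_aux_ads * one_hot_width))
        blocks.append((group + "_mul_hot", n_aux_ads * mul_hot_width))

    layout = {}
    start = 0
    for name, width in blocks:
        layout[name] = (start, start + width)  # end index is exclusive
        start += width
    return layout, start


layout, total = column_layout()
for name, (begin, end) in layout.items():
    print(name, begin, end)
print("total columns:", total)
```

The total agrees with 1 + (25 + 2*5)*(1+5+5+5), and the (start, end) pairs can be used directly as Python slice bounds on a parsed row.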

The 25 one_hot fts are:
bias, ad_id, position, ip_id, user_id, is_user_logged_on, search_query_keyword, search_loc_id, search_loc_level, search_region_id, search_city_id, search_cate_id, search_cate_level, search_par_cate_id, search_sub_cate_id, user_agent_id, user_agent_os_id, user_device_id, user_agent_family_id, ad_title_keyword, ad_cate_id, ad_cate_level, ad_parent_cate_id, ad_sub_cate_id, hist_ctr_bin

The 2 multi-hot fts are:
search_params, ad_params
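
To make the naming concrete, here is a rough sketch that attaches these names to the target-ad part of one parsed row. It assumes that the one-hot columns follow the order of the list above and that the multi-hot block stores the 5 values of search_params followed by the 5 values of ad_params; neither ordering is confirmed here, and `parse_target` is a hypothetical helper, not part of dstn.

```python
ONE_HOT_FTS = [
    "bias", "ad_id", "position", "ip_id", "user_id", "is_user_logged_on",
    "search_query_keyword", "search_loc_id", "search_loc_level",
    "search_region_id", "search_city_id", "search_cate_id",
    "search_cate_level", "search_par_cate_id", "search_sub_cate_id",
    "user_agent_id", "user_agent_os_id", "user_device_id",
    "user_agent_family_id", "ad_title_keyword", "ad_cate_id",
    "ad_cate_level", "ad_parent_cate_id", "ad_sub_cate_id", "hist_ctr_bin",
]
MUL_HOT_FTS = ["search_params", "ad_params"]
MAX_LEN_PER_SLOT = 5


def parse_target(row):
    """Split the label and target-ad columns of one row into named fields.

    `row` is a sequence holding all columns of one csv row.
    """
    label = row[0]
    start = 1
    # Assumption: one-hot columns appear in the order of ONE_HOT_FTS.
    one_hot = dict(zip(ONE_HOT_FTS, row[start:start + len(ONE_HOT_FTS)]))
    start += len(ONE_HOT_FTS)

    # Assumption: multi-hot values are stored slot by slot.
    mul_hot = {}
    for name in MUL_HOT_FTS:
        mul_hot[name] = list(row[start:start + MAX_LEN_PER_SLOT])
        start += MAX_LEN_PER_SLOT
    return label, one_hot, mul_hot


# Synthetic row whose values equal their column index, for illustration only.
demo_row = list(range(561))
label, one_hot, mul_hot = parse_target(demo_row)
print(label, one_hot["ad_id"], mul_hot["ad_params"])
```

The same slicing pattern would extend to the ctxt, clk, and unclk blocks, each of which repeats the per-ad layout 5 times.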

from dstn.

oywtece commented on August 20, 2024

'bias' is a constant feature whose value is always 1. It is redundant if your model already has a bias parameter.
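
To illustrate the redundancy, here is a toy sketch for a plain linear scorer (an assumption made for illustration; the actual dstn models are not linear). The weight on a constant-1 feature plays exactly the role of a separate bias parameter. The `linear_score` helper and the numbers are made up.

```python
def linear_score(weights, features, bias=0.0):
    """Weighted sum of the features plus an optional bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# The first feature is the constant 'bias' feature (always 1).
features = [1.0, 0.5, -2.0]
weights = [0.3, 1.2, 0.7]

# Variant A: keep the constant feature, no separate bias parameter.
score_a = linear_score(weights, features)

# Variant B: drop the constant feature and use its weight as the bias.
score_b = linear_score(weights[1:], features[1:], bias=weights[0])

print(abs(score_a - score_b) < 1e-12)
```

Both variants give the same score, so keeping both the constant feature and a model bias only adds a duplicate parameter.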


caotianwei commented on August 20, 2024

Each row of the Avito data csv file contains:
<target label><target one-hot fts><target mul-hot fts><ctxt1 one-hot fts, ctxt2 one-hot fts, ..., ctxt5 one-hot fts><ctxt1 multi-hot fts, ctxt2 multi-hot fts, ..., ctxt5 multi-hot fts><clk1 one-hot fts, clk2 one-hot fts, ..., clk5 one-hot fts><clk1 multi-hot fts, clk2 multi-hot fts, ..., clk5 multi-hot fts><unclk1 one-hot fts, unclk2 one-hot fts, ..., unclk5 one-hot fts><unclk1 multi-hot fts, unclk2 multi-hot fts, ..., unclk5 multi-hot fts>

There are n_one_hot_slot = 25 one-hot fts and n_mul_hot_slot = 2 multi-hot fts (with max_len_per_slot = 5; i.e., each multi-hot ft has 5 values; e.g., the multi-hot ft "search_params" may contain param a, param b, ..., param e) for each ad.
Therefore, there are 1 + (25 + 2*5)*(1+5+5+5) = 561 columns in total in the csv file.

The 25 one_hot fts are:
ad_id, position, ip_id, user_id, is_user_logged_on, search_query_keyword, search_loc_id, search_loc_level, search_region_id, search_city_id, search_cate_id, search_cate_level, search_par_cate_id, search_sub_cate_id, user_agent_id, user_agent_os_id, user_device_id, user_agent_family_id, ad_title_keyword, ad_cate_id, ad_cate_level, ad_parent_cate_id, ad_sub_cate_id, hist_ctr_bin

The 2 multi-hot fts are:
search_params, ad_params

Thanks for your patience! :)


caotianwei commented on August 20, 2024

Each row of the Avito data csv file contains:
<target label><target one-hot fts><target mul-hot fts><ctxt1 one-hot fts, ctxt2 one-hot fts, ..., ctxt5 one-hot fts><ctxt1 multi-hot fts, ctxt2 multi-hot fts, ..., ctxt5 multi-hot fts><clk1 one-hot fts, clk2 one-hot fts, ..., clk5 one-hot fts><clk1 multi-hot fts, clk2 multi-hot fts, ..., clk5 multi-hot fts><unclk1 one-hot fts, unclk2 one-hot fts, ..., unclk5 one-hot fts><unclk1 multi-hot fts, unclk2 multi-hot fts, ..., unclk5 multi-hot fts>

There are n_one_hot_slot = 25 one-hot fts and n_mul_hot_slot = 2 multi-hot fts (with max_len_per_slot = 5; i.e., each multi-hot ft has 5 values; e.g., the multi-hot ft "search_params" may contain param a, param b, ..., param e) for each ad.
Therefore, there are 1 + (25 + 2*5)*(1+5+5+5) = 561 columns in total in the csv file.

The 25 one_hot fts are:
bias, ad_id, position, ip_id, user_id, is_user_logged_on, search_query_keyword, search_loc_id, search_loc_level, search_region_id, search_city_id, search_cate_id, search_cate_level, search_par_cate_id, search_sub_cate_id, user_agent_id, user_agent_os_id, user_device_id, user_agent_family_id, ad_title_keyword, ad_cate_id, ad_cate_level, ad_parent_cate_id, ad_sub_cate_id, hist_ctr_bin

The 2 multi-hot fts are:
search_params, ad_params

One more question: what does 'bias' (the first one-hot feature) refer to? It does not seem to be included in the Avito dataset.

