
torch-choice's People

Contributors

kanodiaayush, rodonn, tianyudu

torch-choice's Issues

Coefficient Initialization

It would be ideal if the package allowed for greater flexibility in adjusting the coefficient initialization.

Documentation

Document Checklist
The documentation will be generated automatically once the docstrings of the classes listed below are complete.

  • Dataset Objects.
  • Conditional Logit Model.
  • Nested Logit Model.

Tutorial Checklist
We still need someone to proofread these tutorials and point out unclear parts @kanodiaayush.

  • Dataset objects: ChoiceDataset and JointDataset wrapper. Jupyter notebook tutorial is here
  • Conditional Logit Model. Jupyter notebook tutorial is here
  • Nested Logit Model. Jupyter notebook tutorial is here

Other TODOs

Allowing for `user-item` and `user-session-item` specific observables.

The package currently supports the following four types of observables:
[image omitted]

We are implementing new features so that the package supports useritem_obs and useritemsession_obs observables. As you would expect, they would have shapes of (num_users, num_items, num_variables) and (num_users, num_items, num_sessions, num_variables).
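To make the proposed shapes concrete, here is a minimal sketch that uses NumPy arrays in place of the package's tensors. The sizes and the user_index / session_index vectors are made up for illustration, and the indexing shown is only one plausible way to expand the arrays to one row per purchase record; it is not taken from the package's _expand_tensor implementation.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_users, num_items, num_sessions, num_variables = 3, 4, 2, 5

useritem_obs = np.zeros((num_users, num_items, num_variables))
useritemsession_obs = np.zeros((num_users, num_items, num_sessions, num_variables))

# One entry per purchase record: who bought, and in which session.
user_index = np.array([0, 2, 2, 1])
session_index = np.array([0, 0, 1, 1])

# Select the slice that belongs to each record's user (and session).
per_record_useritem = useritem_obs[user_index]
per_record_useritemsession = useritemsession_obs[user_index, :, session_index]

print(per_record_useritem.shape)         # (4, 4, 5)
print(per_record_useritemsession.shape)  # (4, 4, 5)
```

Both results have one row per purchase record, followed by the item and variable dimensions.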

The following components need to be updated.

  • ChoiceDataset class: the x_dict method and the _expand_tensor method, which interact directly with the model estimation API in CLM and NLM.
  • EasyDatasetWrapper class.
  • Testing.

Allow different lambdas across nests?

Hi Tianyu,

Thanks for the amazing package! I'm trying to allow different lambdas across nests by setting shared_lambda = False. However, I ran into the following RuntimeError: Error(s) in loading state_dict for NestedLogitModel: Unexpected key(s) in state_dict: "lambdas". I wonder if there is a way to fix this. Thanks!

Issue with pandas upgrade

Description of the Issue

Running the following code from the main branch:

import warnings
warnings.filterwarnings("ignore")

import random
from time import time
import numpy as np
import pandas as pd
import torch
import torch_choice
from torch_choice import run
from tqdm import tqdm
from torch_choice.data import ChoiceDataset, JointDataset, utils, load_mode_canada_dataset, load_house_cooling_dataset_v1
from torch_choice.model import ConditionalLogitModel, NestedLogitModel

# set the random seed to enforce reproducibility.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.use_deterministic_algorithms(True)

car_choice = pd.read_csv("./tutorials/public_datasets/car_choice.csv")
car_choice.head()

user_observable_columns=["gender", "income"]
from torch_choice.utils.easy_data_wrapper import EasyDatasetWrapper
data_wrapper_from_columns = EasyDatasetWrapper(
    main_data=car_choice,
    purchase_record_column='record_id',
    choice_column='purchase',
    item_name_column='car',
    user_index_column='consumer_id',
    session_index_column='session_id',
    user_observable_columns=['gender', 'income'],
    item_observable_columns=['speed'],
    session_observable_columns=['discount'],
    itemsession_observable_columns=['price'])

data_wrapper_from_columns.summary()
dataset = data_wrapper_from_columns.choice_dataset
# ChoiceDataset(label=[], item_index=[885], provided_num_items=[], user_index=[885], session_index=[885], item_availability=[885, 4], item_speed=[4, 1], user_gender=[885, 1], user_income=[885, 1], session_discount=[885, 1], itemsession_price=[885, 4, 1], device=cpu)

Depending on the pandas version, one may encounter the following pandas error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 27
     25 user_observable_columns=["gender", "income"]
     26 from torch_choice.utils.easy_data_wrapper import EasyDatasetWrapper
---> 27 data_wrapper_from_columns = EasyDatasetWrapper(
     28     main_data=car_choice,
     29     purchase_record_column='record_id',
     30     choice_column='purchase',
     31     item_name_column='car',
     32     user_index_column='consumer_id',
     33     session_index_column='session_id',
     34     user_observable_columns=['gender', 'income'],
     35     item_observable_columns=['speed'],
     36     session_observable_columns=['discount'],
     37     itemsession_observable_columns=['price'])
     39 data_wrapper_from_columns.summary()
     40 dataset = data_wrapper_from_columns.choice_dataset

File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:142, in EasyDatasetWrapper.__init__(self, main_data, purchase_record_column, item_name_column, choice_column, user_index_column, session_index_column, user_observable_data, item_observable_data, useritem_observable_data, session_observable_data, price_observable_data, itemsession_observable_data, useritemsession_observable_data, user_observable_columns, item_observable_columns, useritem_observable_columns, session_observable_columns, price_observable_columns, itemsession_observable_columns, useritemsession_observable_columns, device)
    135 self.derive_observable_from_main_data(item_observable_columns,
    136                                       user_observable_columns,
    137                                       session_observable_columns,
    138                                       price_observable_columns)
    140 self.observable_data_to_observable_tensors()
--> 142 self.create_choice_dataset()
    143 print('Finished Creating Choice Dataset.')

File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:303, in EasyDatasetWrapper.create_choice_dataset(self)
    301 if len(np.unique(choice_set_size)) > 1:
    302     print(f'Note: choice sets of different sizes found in different purchase records: {rep}')
--> 303     self.item_availability = self.get_item_availability_tensor()
    304 else:
    305     # None means all items are available.
    306     self.item_availability = None

File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:349, in EasyDatasetWrapper.get_item_availability_tensor(self)
    347 if self.session_index_column is None:
    348     raise ValueError(f'Item availability cannot be constructed without session index column.')
--> 349 A = self.main_data.pivot(self.session_index_column, self.item_name_column, self.choice_column)
    350 return torch.BoolTensor(~np.isnan(A.values))

TypeError: pivot() takes 1 positional argument but 4 were given

Cause of the Issue

After the pandas 2.0.0 upgrade, the pivot method changed (see this issue): its arguments can no longer be passed positionally. Previously we could simply write df.pivot("A", "B", "C"), but now we need to write df.pivot(index="A", columns="B", values="C").
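As a minimal sketch of the keyword-argument form applied to the failing line, the snippet below builds an availability matrix from a toy DataFrame. The data is made up, and the column names only mirror the car_choice example above; this is an illustration of the call, not the patch that was merged into the package.

```python
import numpy as np
import pandas as pd

# Toy long-format data; column names mirror the car_choice example above.
main_data = pd.DataFrame({
    "session_id": [0, 0, 0, 1, 1],
    "car": ["A", "B", "C", "A", "B"],
    "purchase": [1, 0, 0, 0, 1],
})

# Keyword arguments are accepted both before and after pandas 2.0.
A = main_data.pivot(index="session_id", columns="car", values="purchase")

# Missing (session, item) pairs become NaN, which marks the item as unavailable.
item_availability = ~np.isnan(A.values)
print(item_availability)
```

Here car "C" does not appear in session 1, so the last entry of the second row is False.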

Complete PyTorch Lightning training loop.

  • add a PyTorch Lightning wrapper class for both CLM and NLM, ideally a single wrapper class that handles both model classes.
  • add a notebook/tutorial demonstrating the use of the PyTorch Lightning wrapper.

getting empty dataset.x_dict

Hi,
I am using the snippet below to create a dataset. However, when I print dataset.x_dict, it comes out as an empty dictionary. This is unlike the notebook that I followed, where dataset.x_dict seems to be created automatically.

`item_index = df[df['choice'] == 1].sort_values(by='case')['alt'].reset_index(drop=True)
print(item_index)
item_names = ['BRAND_1', 'BRAND_2', 'BRAND_3', 'BRAND_4']
num_items = 4
encoder = dict(zip(item_names, range(num_items)))
print(f"{encoder=:}")
item_index = item_index.map(lambda x: encoder[x])
item_index = torch.LongTensor(item_index)
print(f"{item_index=:}")

cost = utils.pivot3d(df, dim0='case', dim1='alt', values='cost')
msrp = utils.pivot3d(df, dim0='case', dim1='alt', values='msrp')

dataset = ChoiceDataset(item_index=item_index,
                        cost_data=cost,
                        msrp_data=msrp,
                        ).to(device)
`

This leads to the error below when I run the code:

KeyError Traceback (most recent call last)
in
1 start_time = time()
----> 2 run(model, dataset, num_epochs=500, dataset_test=None, batch_size=-1, learning_rate=0.01, model_optimizer="Adam")
3 print('Time taken:', time() - start_time)

30 frames
/usr/local/lib/python3.8/dist-packages/torch_choice/model/conditional_logit_model.py in forward(self, batch, manual_coef_value_dict)
267 corresponding_observable = var_type.split("[")[0]
268 total_utility += coef(
--> 269 x_dict[corresponding_observable],
270 batch.user_index,
271 manual_coef_value=None if manual_coef_value_dict is None else manual_coef_value_dict[var_type])

KeyError: 'cost_data'

PS: Along with the above-mentioned notebook, I was also following your official documentation step by step, and I am not able to find what I am missing.

Can you please help me with this issue?

Thanks!
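One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the ChoiceDataset printout elsewhere on this page shows observables named item_speed, user_gender, session_discount and itemsession_price, which suggests that observables are recognized by a name prefix. If so, names such as cost_data and msrp_data would not be picked up. The helper below is hypothetical and only checks names against the prefixes seen in that printout.

```python
# Prefixes seen in the ChoiceDataset printout on this page; treat this list as
# an assumption, not as the package's authoritative set.
ASSUMED_PREFIXES = ("user_", "item_", "session_", "itemsession_")


def unrecognized_observables(names, prefixes=ASSUMED_PREFIXES):
    """Return the keyword names that start with none of the given prefixes."""
    return [name for name in names if not name.startswith(prefixes)]


print(unrecognized_observables(["cost_data", "msrp_data"]))
print(unrecognized_observables(["itemsession_cost", "itemsession_msrp"]))
```

The first call flags both names from the snippet above, while the second (with hypothetical renamed keywords) returns an empty list.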

marginal effect

Hi,

Thank you so much for the useful package.

My understanding is that the current version of torch-choice does not support calculating marginal effects after estimating the logit regression. Is there any way I could get around this issue?

Thank you!
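For a plain multinomial logit with utilities V_j = beta * x_j, marginal effects have a closed form: dP_j/dx_k = beta * P_j * (1[j = k] - P_k). The sketch below computes them by hand in pure Python. It assumes a single coefficient shared across items and that the estimated coefficient has already been read off the fitted model; the values of beta and price are made up, and nothing here uses the torch-choice API.

```python
import math


def choice_probabilities(utilities):
    """Softmax over item utilities, shifted by the max for numerical stability."""
    top = max(utilities)
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def marginal_effects(beta, x):
    """Matrix M with M[j][k] = dP_j / dx_k for utilities V_j = beta * x_j."""
    p = choice_probabilities([beta * x_j for x_j in x])
    n = len(p)
    return [[beta * p[j] * ((1.0 if j == k else 0.0) - p[k]) for k in range(n)]
            for j in range(n)]


# Hypothetical coefficient and attribute values, for illustration only.
beta = -0.5
price = [1.0, 2.0, 3.0]
effects = marginal_effects(beta, price)
print(effects[0][0])  # own-price effect on the probability of item 0
print(effects[0][1])  # cross-price effect of item 1's price on item 0
```

With a negative price coefficient, the own-price effect is negative and the cross-price effects are positive; each row sums to zero because the probabilities always sum to one.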

Add user-session-obs support

Previously we did not support user-session-specific observables because there tend to be many users and many sessions, so the product space is huge. We need to include this feature for completeness.

Error in EarlyStopping function when providing a validation dataset

I encountered the error below when a validation dataset is provided:

RuntimeError: Early stopping conditioned on metric `val_ll` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train_loss`, `val_log_likelihood`

I changed the following line in run_helper_lightening.py:

# current
callbacks = [EarlyStopping(monitor="val_ll", mode="max", patience=10, min_delta=0.001)] if val_dataloader is not None else []

# modified
callbacks = [EarlyStopping(monitor="val_log_likelihood", mode="max", patience=10, min_delta=0.001)] if val_dataloader is not None else []

After this change the code runs. I am not very familiar with PyTorch, so I am not sure whether this is a bug or I am making some newbie error here... Thanks for your help!

Package pip install issue - does not install some modules

Some of the package's modules appear to be missing when it is installed from pip.
I used pip3 install torch-choice to install the package, as suggested in the documentation.

With that install, from torch_choice.data import ChoiceDataset raises a ModuleNotFoundError for torch_choice.data. The same issue does not occur when installing from source.

Note that import torch_choice works with both installation methods.

@TianyuDu
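A quick way to see which submodules a given install actually ships is to probe them with the standard library. This is a minimal sketch; the list of submodule names is taken from the imports used elsewhere on this page, and the helper name is made up.

```python
import importlib.util


def submodule_available(name):
    """Return True if `name` can be located, without importing the submodule itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is itself missing.
        return False


for module in ("torch_choice", "torch_choice.data", "torch_choice.model"):
    print(module, submodule_available(module))
```

If the top-level package reports True but the submodules report False, the published distribution is probably missing its subpackages.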

Handling of sparse matrix

Hi,

Thank you so much for the useful package.

I am dealing with a large number of items (10,000+) across many sessions (6,000+), and item availability is roughly block diagonal across sessions. The availability and session-item matrices are only about 5% dense, meaning about 5% of the elements of the SxI matrix are non-zero.

My understanding is that the current version of torch-choice does not support sparse tensors. Is there any way I could get around this issue?

Thank you!

Fix bug that multi-level fixed effect cannot be correctly parsed

Currently, the model doesn't seem to support (1|user) and (1|item) at the same time: even if both are included in the formula, only one of them (the later one) is used. This is because the model has only one internal variable, intercept, and the formula parser tries to attach both user-level and item-level variation to it, which causes a conflict.

Additionally, currently, the model only allows for one level of variation for each observable (e.g., you cannot have both user-specific coefficient and the item-specific coefficient for the same observable). We are actively working on methods to fix this issue.
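The conflict described above can be illustrated with a toy parser. Everything in this sketch is hypothetical (the regular expression, the function names, and the key format) and it is not the package's actual parser; it only shows how keying terms by variable name alone lets a later term overwrite an earlier one, and how keying by (variable, level) would keep both.

```python
import re

# Toy stand-in for a formula parser; not the package's actual implementation.
TERM = re.compile(r"\((\w+)\|(\w+)\)")


def parse_single_level(formula):
    """Keep one variation level per variable, so later terms overwrite earlier ones."""
    levels = {}
    for variable, level in TERM.findall(formula):
        name = "intercept" if variable == "1" else variable
        levels[name] = level
    return levels


def parse_multi_level(formula):
    """Key each term by (variable, level) so both intercepts survive."""
    terms = {}
    for variable, level in TERM.findall(formula):
        name = "intercept" if variable == "1" else variable
        terms[(name, level)] = f"{name}[{level}]"
    return terms


formula = "(1|user) + (1|item)"
print(parse_single_level(formula))  # {'intercept': 'item'}
print(parse_multi_level(formula))
```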

Undeclared dependencies

Hey, thanks for this new package. Looks cool!

I hit an early snag trying to load the library after pip install; I don't think you've declared any of the dependencies. See the Requires field below.

$ pip3 show torch_choice
WARNING: Skipping /usr/local/lib/python3.11/site-packages/six-1.16.0-py3.11.egg-info due to invalid metadata entry 'name'
Name: torch-choice
Version: 1.0.2
Summary: A Pytorch Backend Library for Choice Modelling
Home-page:
Author: Tianyu Du
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.11/site-packages
Requires:
Required-by:

This is easy enough to resolve manually using your requirements.txt file on the repo but it's not something you'd want every user to be doing.
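The Requires field can also be checked programmatically with the standard library, which is handy for verifying a fixed release. This is a small sketch; the helper name is made up, and the output depends on which version is installed.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the dependencies a distribution declares, or None if it is not installed."""
    try:
        # metadata.requires returns None when no dependencies are declared.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


print(declared_requirements("torch-choice"))
```

An empty list corresponds to the blank Requires field shown above.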

How to model outside option?

First, thank you for this great package!

I wonder if there is a way to model an outside option in the package? For example, it is possible that the user buys nothing in a visit. One hacky solution I have is to force all item_obs and user_item_obs to be zero for the outside option. But this approach cannot handle session_obs unless we turn every session_obs into a session_item_obs.

According to the following comment in this issue, it seems that there is a way to formally model the outside option. I am fine with normalizing it to 0 in each category. I would very much appreciate it if you could give me some guidance on how to do this. Thank you!

there needs to be a discussion on how the outside option is modeled. How does the model choose that the user buys nothing from a given category? Can we change the value of the outside option in each category or is normalized to 0 for each category?
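For reference, normalizing the outside option to 0 means fixing its utility at 0, so that P(outside) = 1 / (1 + sum_k exp(V_k)) and P(j) = exp(V_j) / (1 + sum_k exp(V_k)). The sketch below shows only this arithmetic in pure Python with made-up utilities; it does not show how the package exposes an outside option.

```python
import math


def probabilities_with_outside_option(utilities):
    """Return (P(outside), [P(item j)]) with the outside utility fixed at 0."""
    # Include the outside option's utility (0) when shifting for stability.
    top = max(0.0, max(utilities))
    outside_weight = math.exp(0.0 - top)
    weights = [math.exp(v - top) for v in utilities]
    total = outside_weight + sum(weights)
    return outside_weight / total, [w / total for w in weights]


# Hypothetical utilities for three items in one category.
p_outside, p_items = probabilities_with_outside_option([0.0, -1.0, 0.5])
print(p_outside, p_items)
```

An item whose utility is exactly 0 gets the same probability as the outside option, which is what the normalization implies.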

More items than purchase records (i.e. unchosen items)

Thank you so much for the package.

In my setting there are more items on the market than purchase records, which means that some items are never chosen.

I think this variation is important in estimating the choice model, but I do not see how this gets passed into ChoiceDataset object.

It seems like it is limiting item numbers to be the ones that are seen in the purchase record.

Is there a fix to this?

Thank you!
