gsbdbi / torch-choice

Choice modeling with PyTorch: logit model and nested logit model

License: MIT License

Python 5.05% Jupyter Notebook 94.71% R 0.22% Shell 0.02%

torch-choice's People

rodonn jintao-sun yuxing-liang girmatkassie tuantx7110 leongogin xiaosanmeng marek-chadim

torch-choice's Issues

Add the `run()`/`fit()` as a member function of LitBEMBFlex for the models in the torch-choice library.

Moved discussion to the developer forum

Documentation

Document Checklist
The documentation will be generated automatically once the docstrings of the classes listed below are complete.

Dataset Objects.
Conditional Logit Model.
Nested Logit Model.

Tutorial Checklist
We still need someone to proofread these tutorials and point out unclear parts @kanodiaayush.

Dataset objects: ChoiceDataset and JointDataset wrapper. Jupyter notebook tutorial is here
Conditional Logit Model. Jupyter notebook tutorial is here
Nested Logit Model. Jupyter notebook tutorial is here

Other TODOs

Simulation studies (required by the Journal of Statistical Software).
Paper-like documentation for this project; see https://www.jstatsoft.org/authors

Change observable names `itemsession_<obs_name>` to `sessionitem_<obs_name>`.

The ChoiceDataset is expecting a tensor of shape (S, I, *) for observables depending on both item and session. The shape of tensors is session-item-*, but right now such a variable is called itemsession_obs. We need to change this to sessionitem_obs to make it more intuitive.
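A rename like this could keep old code working for a transition period by accepting both prefixes. The sketch below assumes observables are passed around as a name-to-tensor mapping; `normalize_observable_names` is a hypothetical helper for illustration, not an existing torch-choice function.

```python
import warnings

OLD_PREFIX = "itemsession_"
NEW_PREFIX = "sessionitem_"


def normalize_observable_names(observables):
    """Return a copy of `observables` with legacy `itemsession_*` keys renamed.

    Hypothetical helper: `observables` maps observable names to tensors of
    shape (S, I, *); the values are passed through unchanged.
    """
    renamed = {}
    for name, value in observables.items():
        if name.startswith(OLD_PREFIX):
            new_name = NEW_PREFIX + name[len(OLD_PREFIX):]
            warnings.warn(
                f"'{name}' is deprecated, use '{new_name}' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            name = new_name
        if name in renamed:
            # The same observable was given under both the old and the new name.
            raise ValueError(f"Observable '{name}' was provided twice.")
        renamed[name] = value
    return renamed
```

Raising on duplicates avoids silently picking one of two tensors when a caller passes both `itemsession_price` and `sessionitem_price`.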

How to model outside option?

First, thank you for this great package!

Is there a way to model an outside option in the package? For example, a user may buy nothing during a visit. One hacky solution I have is to force all item_obs and user_item_obs to be zero for the outside option. However, this approach cannot handle session_obs unless every session_obs is converted to a session_item_obs.

According to the following comment in this issue, it seems that there is a way to formally model the outside option. I am fine with normalizing it to 0 in each category. I would very much appreciate some guidance on how to do this. Thank you!

there needs to be a discussion on how the outside option is modeled. How does the model choose that the user buys nothing from a given category? Can we change the value of the outside option in each category or is normalized to 0 for each category?
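For reference, the textbook way to normalize an outside option to 0 is to add one extra alternative whose utility is fixed at zero. The sketch below shows only that arithmetic in plain Python; it does not show how torch-choice handles the outside option, and `choice_probabilities` is a hypothetical name.

```python
import math


def choice_probabilities(item_utilities, outside_utility=0.0):
    """Multinomial-logit probabilities with an outside option.

    Returns (p_outside, p_items). The outside option is added as one more
    alternative whose utility is fixed (normalized to 0 by default).
    """
    utilities = [outside_utility] + list(item_utilities)
    max_u = max(utilities)  # subtract the max for numerical stability
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    probs = [e / total for e in exp_u]
    return probs[0], probs[1:]


p_outside, p_items = choice_probabilities([1.0, 0.5, -0.2])
print(f"P(buy nothing) = {p_outside:.3f}")
print("P(item) =", [round(p, 3) for p in p_items])
```

In data terms this corresponds to the workaround described above: an extra item whose observables are all zero and which carries no intercept.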

Package pip install issue - does not install some modules

There seems to be an issue in some package functionality on installing from pip.
I used pip3 install torch-choice to install the package, as suggested in the documentation.

With this installation, from torch_choice.data import ChoiceDataset raises a ModuleNotFoundError for torch_choice.data. The same issue does not occur when installing from source.

Note that import torch_choice works with both installation methods.

@TianyuDu

Review Torch Choice tutorial

Link to tutorial.

I will review and comment on all the pages accessible from the link above.

@TianyuDu and @kanodiaayush

Allowing for `user-item` and `user-session-item` specific observables.

The package currently supports the following four types of observables:

We are implementing new features so that the package supports useritem_obs and useritemsession_obs observables. As you would expect, they would have shapes of (num_users, num_items, num_variables) and (num_users, num_items, num_sessions, num_variables).
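The shapes described above can be written down as a small lookup. This is only a sketch: `expected_shape` is a hypothetical helper, and matching observables by name prefix is an assumption made for illustration. The `itemsession_` entry uses the (S, I, *) layout mentioned in the renaming issue.

```python
def expected_shape(name, num_users, num_items, num_sessions, num_variables):
    """Return the tensor shape implied by an observable's name prefix."""
    shapes = {
        "itemsession_": (num_sessions, num_items, num_variables),
        "useritem_": (num_users, num_items, num_variables),
        "useritemsession_": (num_users, num_items, num_sessions, num_variables),
    }
    # The trailing underscore keeps 'useritem_' from matching 'useritemsession_...'.
    for prefix, shape in shapes.items():
        if name.startswith(prefix):
            return shape
    raise ValueError(f"Unrecognized observable prefix in '{name}'.")


print(expected_shape("useritem_rating", num_users=10, num_items=4, num_sessions=7, num_variables=2))
```

A check like this could run when the dataset is constructed, so that a tensor with swapped dimensions fails early instead of during estimation.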

The following components need to be updated.

ChoiceDataset class: the x_dict method and the _expand_tensor method; these methods interact directly with the model estimation API in CLM and NLM.
EasyDatasetWrapper class.
Testing.

Coefficient Initialization

It would be ideal if the package allowed for greater flexibility in adjusting the coefficient initialization.

Integrate `Pydantic` into the torch-choice package

We may want to consider integrating Pydantic into our package for a more robust implementation.

Complete PyTorch Lightning training loop.

Add a PyTorch Lightning wrapper class for both CLM and NLM; ideally a single wrapper class should handle both model classes.
Add a notebook/tutorial demonstrating the use of the PyTorch Lightning wrapper.

Add user-session-obs support

Previously we did not support user-session-specific observables because there tend to be many users and many sessions, so their product space is huge. We need to include this feature for completeness.

Undeclared dependencies

Hey, thanks for this new package. Looks cool!

I hit an early snag trying to load the library after pip install; I don't think you've declared any of the dependencies. See the Requires field below.

$ pip3 show torch_choice
WARNING: Skipping /usr/local/lib/python3.11/site-packages/six-1.16.0-py3.11.egg-info due to invalid metadata entry 'name'
Name: torch-choice
Version: 1.0.2
Summary: A Pytorch Backend Library for Choice Modelling
Home-page:
Author: Tianyu Du
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.11/site-packages
Requires:
Required-by:

This is easy enough to resolve manually using your requirements.txt file on the repo but it's not something you'd want every user to be doing.
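One common way to keep the declared dependencies in sync with requirements.txt is to read that file from setup.py. The sketch below shows only the parsing step; `parse_requirements` is a hypothetical helper, the example entries are illustrative rather than the package's real dependency list, and the repo's actual packaging setup is not shown here.

```python
def parse_requirements(text):
    """Turn the contents of a requirements.txt file into a list of specifiers."""
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line and not line.startswith("-"):  # skip blanks and pip options such as '-r'
            requirements.append(line)
    return requirements


# In setup.py the result would be passed to setuptools, for example:
#     setup(..., install_requires=parse_requirements(open("requirements.txt").read()))
example = """
# illustrative entries only
numpy
pandas>=1.0  # a version bound
"""
print(parse_requirements(example))
```

Splitting on `#` is a simplification; requirement lines that contain URL fragments would need more careful handling.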

More items than purchase records (i.e. unchosen items)

Thank you so much for the package.

In my setting there are more items on the market than purchase records, which means that some items are never chosen.

I think this variation is important in estimating the choice model, but I do not see how this gets passed into ChoiceDataset object.

It seems like it is limiting item numbers to be the ones that are seen in the purchase record.

Is there a fix to this?

Thank you!

Error in EarlyStopping function when providing a validation dataset

I encountered the error below when a validation dataset is provided:

RuntimeError: Early stopping conditioned on metric `val_ll` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train_loss`, `val_log_likelihood`

I changed the following line in run_helper_lightening.py :

# current
callbacks = [EarlyStopping(monitor="val_ll", mode="max", patience=10, min_delta=0.001)] if val_dataloader is not None else []

# modified
callbacks = [EarlyStopping(monitor="val_log_likelihood", mode="max", patience=10, min_delta=0.001)] if val_dataloader is not None else []

After this change the code runs. I am not very familiar with PyTorch, so I am not sure whether this is a bug or I am making some newbie error here... Thanks for your help!

Adding support for the name `sessionitem_obs`

LBFGS optimization

Add the LBFGS optimization algorithm to the run file.

Handling of sparse matrix

Hi,

Thank you so much for the useful package.

I am dealing with quite a few items (10,000+) across sessions (6,000+), and the sessions are more or less block diagonal in item availability. The density of the availability and session-item matrices is around 5%, meaning that about 5% of the elements of the S x I matrix are non-zero.

My understanding is that current version of torch-choice does not support sparse tensors. Is there any way I could get around this issue?

Thank you!
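A quick back-of-the-envelope calculation shows what a sparse layout could save at this scale. The sketch assumes a 2-D COO layout with two 8-byte indices per non-zero entry; both helper names are hypothetical, and nothing here reflects how torch-choice stores its tensors.

```python
def dense_bytes(num_sessions, num_items, bytes_per_value):
    """Size of a dense (S, I) array."""
    return num_sessions * num_items * bytes_per_value


def coo_bytes(num_sessions, num_items, density, bytes_per_value, bytes_per_index=8):
    """Approximate size of a 2-D COO layout: two indices plus one value per non-zero."""
    nnz = round(num_sessions * num_items * density)
    return nnz * (2 * bytes_per_index + bytes_per_value)


S, I, density = 6_000, 10_000, 0.05
for label, width in [("bool availability", 1), ("float32 observable", 4)]:
    dense = dense_bytes(S, I, width) / 1e6
    sparse = coo_bytes(S, I, density, width) / 1e6
    print(f"{label}: dense {dense:.0f} MB, COO {sparse:.0f} MB")
```

Under these assumptions the boolean availability matrix gains little (about 51 MB against 60 MB), while each float32 session-item observable shrinks from about 240 MB to about 60 MB, so the observables are where a sparse representation would matter most.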

Improve code quality of `std.py` for Hessian and standard deviation computations.

There is new documentation on Hessian computation here; we might want to improve the current Hessian computation code with functorch.

Issue with pandas upgrade

Description of the Issue

When running the following code from the main branch,

import warnings
warnings.filterwarnings("ignore")

import random
from time import time
import numpy as np
import pandas as pd
import torch
import torch_choice
from torch_choice import run
from tqdm import tqdm
from torch_choice.data import ChoiceDataset, JointDataset, utils, load_mode_canada_dataset, load_house_cooling_dataset_v1
from torch_choice.model import ConditionalLogitModel, NestedLogitModel

# set the random seed to enforce reproducibility.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.use_deterministic_algorithms(True)

car_choice = pd.read_csv("./tutorials/public_datasets/car_choice.csv")
car_choice.head()

user_observable_columns=["gender", "income"]
from torch_choice.utils.easy_data_wrapper import EasyDatasetWrapper
data_wrapper_from_columns = EasyDatasetWrapper(
    main_data=car_choice,
    purchase_record_column='record_id',
    choice_column='purchase',
    item_name_column='car',
    user_index_column='consumer_id',
    session_index_column='session_id',
    user_observable_columns=['gender', 'income'],
    item_observable_columns=['speed'],
    session_observable_columns=['discount'],
    itemsession_observable_columns=['price'])

data_wrapper_from_columns.summary()
dataset = data_wrapper_from_columns.choice_dataset
# ChoiceDataset(label=[], item_index=[885], provided_num_items=[], user_index=[885], session_index=[885], item_availability=[885, 4], item_speed=[4, 1], user_gender=[885, 1], user_income=[885, 1], session_discount=[885, 1], itemsession_price=[885, 4, 1], device=cpu)

Depending on the pandas version, one may encounter the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 27
     25 user_observable_columns=["gender", "income"]
     26 from torch_choice.utils.easy_data_wrapper import EasyDatasetWrapper
---> 27 data_wrapper_from_columns = EasyDatasetWrapper(
     28     main_data=car_choice,
     29     purchase_record_column='record_id',
     30     choice_column='purchase',
     31     item_name_column='car',
     32     user_index_column='consumer_id',
     33     session_index_column='session_id',
     34     user_observable_columns=['gender', 'income'],
     35     item_observable_columns=['speed'],
     36     session_observable_columns=['discount'],
     37     itemsession_observable_columns=['price'])
     39 data_wrapper_from_columns.summary()
     40 dataset = data_wrapper_from_columns.choice_dataset

File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:142, in EasyDatasetWrapper.__init__(self, main_data, purchase_record_column, item_name_column, choice_column, user_index_column, session_index_column, user_observable_data, item_observable_data, useritem_observable_data, session_observable_data, price_observable_data, itemsession_observable_data, useritemsession_observable_data, user_observable_columns, item_observable_columns, useritem_observable_columns, session_observable_columns, price_observable_columns, itemsession_observable_columns, useritemsession_observable_columns, device)
    135 self.derive_observable_from_main_data(item_observable_columns,
    136                                       user_observable_columns,
    137                                       session_observable_columns,
    138                                       price_observable_columns)
    140 self.observable_data_to_observable_tensors()
--> 142 self.create_choice_dataset()
    143 print('Finished Creating Choice Dataset.')

File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:303, in EasyDatasetWrapper.create_choice_dataset(self)
    301 if len(np.unique(choice_set_size)) > 1:
    302     print(f'Note: choice sets of different sizes found in different purchase records: {rep}')
--> 303     self.item_availability = self.get_item_availability_tensor()
    304 else:
    305     # None means all items are available.
    306     self.item_availability = None

File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:349, in EasyDatasetWrapper.get_item_availability_tensor(self)
    347 if self.session_index_column is None:
    348     raise ValueError(f'Item availability cannot be constructed without session index column.')
--> 349 A = self.main_data.pivot(self.session_index_column, self.item_name_column, self.choice_column)
    350 return torch.BoolTensor(~np.isnan(A.values))

TypeError: pivot() takes 1 positional argument but 4 were given

Cause of the Issue

The pandas 2.0.0 upgrade changed the pivot method (see this issue): positional arguments to pivot are no longer accepted. Previously we could simply write df.pivot("A", "B", "C"), but now we need to specify df.pivot(index="A", columns="B", values="C").
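A minimal reproduction of the keyword-argument form, applied to the availability computation from the traceback. The small data frame is made up for illustration, with column names mirroring the car-choice example; to my knowledge the keyword form also works on earlier pandas versions.

```python
import numpy as np
import pandas as pd

# Toy purchase data: item C is not offered in session 1.
df = pd.DataFrame({
    "session_id": [0, 0, 0, 1, 1],
    "car": ["A", "B", "C", "A", "B"],
    "purchase": [1, 0, 0, 0, 1],
})

# Keyword arguments are required by pandas >= 2.0.
A = df.pivot(index="session_id", columns="car", values="purchase")

# Missing (session, item) pairs become NaN, which marks the item as unavailable.
availability = ~np.isnan(A.values)
print(availability)
```

In get_item_availability_tensor, the same change would mean passing self.session_index_column, self.item_name_column, and self.choice_column as index, columns, and values respectively.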

Allow different lambdas across nests?

Hi Tianyu,

Thanks for the amazing package! I'm trying to allow different lambdas across nests by setting shared_lambda = False. However, I ran into the following error: RuntimeError: Error(s) in loading state_dict for NestedLogitModel: Unexpected key(s) in state_dict: "lambdas". Is there a way to fix this? Thanks!

Fix bug that multi-level fixed effect cannot be correctly parsed

Currently, the model doesn't seem to support (1|user) and (1|item) at the same time: even if both are included in the formula, only one of them (the later one) is used. This is because the model has only one internal variable, intercept, and the formula parser tries to attach both the user-level variation and the item-level variation to it, which causes a conflict.

Additionally, currently, the model only allows for one level of variation for each observable (e.g., you cannot have both user-specific coefficient and the item-specific coefficient for the same observable). We are actively working on methods to fix this issue.
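The overwrite described above can be reproduced with a toy parser. This is a hypothetical sketch, not the library's formula parser: it only handles terms of the form (variable|level) and contrasts keying coefficients by variable name with keying them by name and level.

```python
import re

# Matches terms such as (1|user) or (price|item).
TERM = re.compile(r"\(\s*([^|()\s]+)\s*\|\s*([^|()\s]+)\s*\)")


def parse_by_name(formula):
    """Key coefficients by variable name only: a later term overwrites an earlier one."""
    coefs = {}
    for var, level in TERM.findall(formula):
        name = "intercept" if var == "1" else var
        coefs[name] = level
    return coefs


def parse_by_name_and_level(formula):
    """Key coefficients by name and level, so both variations are kept."""
    coefs = {}
    for var, level in TERM.findall(formula):
        name = "intercept" if var == "1" else var
        coefs[f"{name}[{level}]"] = level
    return coefs


formula = "(1|user) + (1|item)"
print(parse_by_name(formula))
print(parse_by_name_and_level(formula))
```

With name-only keys the user-level intercept is lost; including the level in the key keeps both, and the same idea would allow user-specific and item-specific coefficients on a single observable.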

getting empty dataset.x_dict

Hi,
I am using the snippet below to create a dataset. However, when I print dataset.x_dict, it comes out as an empty dictionary. This is unlike the notebook that I followed, where dataset.x_dict seems to be created automatically.

item_index = df[df['choice'] == 1].sort_values(by='case')['alt'].reset_index(drop=True)
print(item_index)
item_names = ['BRAND_1', 'BRAND_2', 'BRAND_3', 'BRAND_4']
num_items = 4
encoder = dict(zip(item_names, range(num_items)))
print(f"{encoder=:}")
item_index = item_index.map(lambda x: encoder[x])
item_index = torch.LongTensor(item_index)
print(f"{item_index=:}")

cost = utils.pivot3d(df, dim0='case', dim1='alt', values='cost')
msrp = utils.pivot3d(df, dim0='case', dim1='alt', values='msrp')

dataset = ChoiceDataset(item_index=item_index,
                        cost_data=cost,
                        msrp_data=msrp,
                        ).to(device)

this is leading to the below error when I run the code

KeyError Traceback (most recent call last)
in
1 start_time = time()
----> 2 run(model, dataset, num_epochs=500, dataset_test=None, batch_size=-1, learning_rate=0.01, model_optimizer="Adam")
3 print('Time taken:', time() - start_time)

30 frames
/usr/local/lib/python3.8/dist-packages/torch_choice/model/conditional_logit_model.py in forward(self, batch, manual_coef_value_dict)
267 corresponding_observable = var_type.split("[")[0]
268 total_utility += coef(
--> 269 x_dict[corresponding_observable],
270 batch.user_index,
271 manual_coef_value=None if manual_coef_value_dict is None else manual_coef_value_dict[var_type])

KeyError: 'cost_data'

PS: Along with the above-mentioned notebook, I was also following your official documentation step by step, and I cannot find what I am missing.

Can you please help me with this issue?

Thanks!
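One plausible explanation, not confirmed anywhere in this thread, is that observables are recognized by their name prefix, as the names elsewhere on this page (user_gender, item_speed, session_discount, itemsession_price) suggest. Under that assumption, cost_data and msrp_data would simply be ignored. The sketch below illustrates the idea in plain Python; `select_observables` and the prefix list are hypothetical and do not reproduce the library's code.

```python
# Assumed prefixes, taken from observable names that appear elsewhere on this page.
KNOWN_PREFIXES = ("user_", "item_", "session_", "itemsession_")


def select_observables(**kwargs):
    """Keep only keyword arguments whose names start with a recognized prefix."""
    return {k: v for k, v in kwargs.items() if k.startswith(KNOWN_PREFIXES)}


# Names without a recognized prefix are dropped, giving an empty dictionary.
print(select_observables(cost_data="cost tensor", msrp_data="msrp tensor"))

# Prefixed names are kept.
print(select_observables(itemsession_cost="cost tensor", itemsession_msrp="msrp tensor"))
```

If that is the cause, renaming the keyword arguments (for example to itemsession_cost and itemsession_msrp) and updating the model's variable names to match would be the thing to try first.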

Summary Statistics Helper Function for `ChoiceDataset` Class

Code Release Plan for `torch_choice`

See issue

marginal effect

Hi,

Thank you so much for the useful package.

My understanding is that the current version of torch-choice does not support calculating marginal effects after estimating a logit model. Is there any way I could get around this issue?

Thank you!
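As a possible workaround, marginal effects can be computed by hand from the fitted utilities. For a logit model in which an attribute x enters each item's utility linearly with a common coefficient beta, the textbook result is dP_j/dx_j = beta * P_j * (1 - P_j) and dP_j/dx_i = -beta * P_j * P_i for i != j. The sketch below implements that formula in plain Python; the function names are hypothetical, and how to extract beta and the utilities from a fitted torch-choice model is not shown here.

```python
import math


def logit_probabilities(utilities):
    """Softmax over item utilities, with the max subtracted for stability."""
    max_u = max(utilities)
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def marginal_effects(utilities, beta):
    """Matrix M with M[j][i] = dP_j / dx_i, where x enters every item's
    utility linearly with the same coefficient `beta`."""
    probs = logit_probabilities(utilities)
    n = len(probs)
    return [
        [beta * probs[j] * ((1.0 if i == j else 0.0) - probs[i]) for i in range(n)]
        for j in range(n)
    ]


effects = marginal_effects(utilities=[0.4, 0.1, -0.3], beta=-1.5)
for row in effects:
    print([round(v, 4) for v in row])
```

Each column sums to zero, because raising one item's attribute only shifts probability between items. Averaging these matrices over sessions would give average marginal effects.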

Add device information to conditional-logit and nested-logit models

(1) Add the device attribute to the conditional-logit and nested-logit models; (2) add the device information to the __str__ method as well; and (3) add the to() method to these models to make it easier to move them between CPU and GPU.
