gsbdbi / torch-choice
Choice modeling with PyTorch: logit model and nested logit model
License: MIT License
Add the run()/fit() as a member function of LitBEMBFlex for the models in the torch-choice library.
(Moved discussion to the developer forum)
Document Checklist
The documentation will be generated automatically after completing the docstring of classes mentioned below.
Tutorial Checklist
We still need someone to proofread these tutorials and point out unclear parts @kanodiaayush.
ChoiceDataset and JointDataset wrapper. Jupyter notebook tutorial is here
Other TODOs
The ChoiceDataset is expecting a tensor of shape (S, I, *) for observables depending on both item and session. The shape of tensors is session-item-*, but right now such a variable is called itemsession_obs. We need to change this to sessionitem_obs to make it more intuitive.
First, thank you for this great package!
I wonder if there is a way to model an outside option in the package? For example, it is possible that the user buys nothing in a visit. One hacky solution I have is to force all item_obs and user_item_obs to be zero for the outside option. But this approach cannot handle session_obs unless we turn every session_obs into a session_item_obs.
According to the following comment in this issue, it seems that there is a way to formally model outside option. I am good with normalizing it to 0 in each category. I very much appreciate it if you can give me some guidance on how to do this. Thank you!
There needs to be a discussion on how the outside option is modeled. How does the model choose that the user buys nothing from a given category? Can we change the value of the outside option in each category, or is it normalized to 0 for each category?
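As a generic illustration of the normalization mentioned in the quoted comment (this is not the package's API), the outside option can be treated as one extra alternative whose utility is fixed at 0, so that choice probabilities are a softmax over the inside utilities plus a zero. The function name below is hypothetical.

```python
import math

def choice_probs_with_outside_option(inside_utilities):
    """Return (P(outside), [P(item_i)]) with the outside utility fixed at 0."""
    # Prepend the outside option, whose utility is normalized to 0.
    utilities = [0.0] + list(inside_utilities)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[0], probs[1:]

# Three inside items; "buys nothing" competes with them at utility 0.
p_outside, p_items = choice_probs_with_outside_option([1.0, -0.5, 0.2])
print(p_outside, p_items)
```

Because only utility differences matter in a logit model, fixing the outside utility at 0 pins down the level of the remaining utilities rather than restricting the model.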
There seems to be an issue with some package functionality when installing from pip.
I used pip3 install torch-choice to install the package, as suggested in the documentation.
After this, from torch_choice.data import ChoiceDataset raises a ModuleNotFoundError for torch_choice.data. The same issue does not occur when you install from source.
Note that import torch_choice works with both methods of installation.
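A quick way to confirm which subpackages are actually present in the installed copy is to ask the import system to locate them. This is only a diagnostic sketch using the standard library; the helper name is hypothetical and it does not explain why a subpackage might be missing.

```python
import importlib.util

def submodule_available(name):
    """Return True if the import system can locate `name`."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False

for name in ("torch_choice", "torch_choice.data", "torch_choice.model"):
    status = "found" if submodule_available(name) else "missing"
    print(f"{name}: {status}")
```

If torch_choice is found but torch_choice.data is missing, the installed distribution itself lacks the subpackage, which would match the difference between the pip and source installs described above.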
I will review and comment on all the pages accessible in the link above.
The package currently supports the following four types of observables:
We are implementing new features so that the package supports useritem_obs and useritemsession_obs observables. As you would expect, they would have shapes of (num_users, num_items, num_variables) and (num_users, num_items, num_sessions, num_variables).
The following components need to be updated.
ChoiceDataset class: the x_dict method and _expand_tensor method; these methods directly interact with the model estimation API in CLM and NLM.
EasyDatasetWrapper class.
It would be ideal if the package allowed for greater flexibility in adjusting the coefficient initialization.
We may want to consider integrating Pydantic into our package for a more robust implementation.
Previously we did not support user-session-specific observables because there tend to be many users and many sessions, so the product space is huge. We need to include this feature for completeness.
Hey, thanks for this new package. Looks cool!
I hit an early snag trying to load the library after pip install; I don't think you've declared any of the dependencies. See the Requires field below.
$ pip3 show torch_choice
WARNING: Skipping /usr/local/lib/python3.11/site-packages/six-1.16.0-py3.11.egg-info due to invalid metadata entry 'name'
Name: torch-choice
Version: 1.0.2
Summary: A Pytorch Backend Library for Choice Modelling
Home-page:
Author: Tianyu Du
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.11/site-packages
Requires:
Required-by:
This is easy enough to resolve manually using your requirements.txt file on the repo but it's not something you'd want every user to be doing.
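The same check can be scripted with the standard library's importlib.metadata, which reads the metadata that pip show prints. This is a small sketch; the helper name is hypothetical.

```python
from importlib import metadata

def declared_requirements(dist_name):
    """Return declared requirement strings, or None if not installed."""
    try:
        reqs = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # metadata.requires returns None when no requirements are declared.
    return list(reqs or [])

reqs = declared_requirements("torch-choice")
if reqs is None:
    print("torch-choice is not installed")
elif not reqs:
    print("torch-choice declares no dependencies")
else:
    print("declared dependencies:", reqs)
```

An empty list here corresponds to the empty Requires field in the pip show output above.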
Thank you so much for the package.
In my setting I have more purchase records than there are items on the market, which means that some items are left unchosen.
I think this variation is important in estimating the choice model, but I do not see how this gets passed into ChoiceDataset object.
It seems like it is limiting item numbers to be the ones that are seen in the purchase record.
Is there a fix to this?
Thank you!
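One generic workaround, independent of how ChoiceDataset infers the number of items, is to build the item encoding from the full catalog of items on the market rather than from the purchase records. The names below are hypothetical, and whether ChoiceDataset accepts an explicit item count is not confirmed here.

```python
def build_item_encoder(catalog):
    """Map every item on the market to an integer index, chosen or not."""
    return {name: idx for idx, name in enumerate(sorted(set(catalog)))}

catalog = ["bus", "car", "train", "bike"]   # all items on the market
purchases = ["car", "car", "bus", "car"]    # "train" and "bike" never chosen

encoder = build_item_encoder(catalog)
item_index = [encoder[name] for name in purchases]
num_items = len(encoder)  # counts unchosen items as well

print(encoder)
print(item_index, num_items)
```

With this encoding, the unchosen items still occupy their own indices, so item-level observables and availability can be laid out over all items instead of only the purchased ones.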
I encountered the error below when a validation dataset is provided:
RunTimeError: Early stopping conditioned on metric `val_ll` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train_loss`, `val_log_likelihood`
I changed the following line in run_helper_lightening.py:
# current
callbacks = [EarlyStopping(monitor="val_ll", mode="max", patience=10, min_delta=0.001)] if val_dataloader is not None else []
# modified
callbacks = [EarlyStopping(monitor="val_log_likelihood", mode="max", patience=10, min_delta=0.001)] if val_dataloader is not None else []
After this change the code runs. I am not very familiar with PyTorch, so I am not sure whether this is a bug or I am making some newbie error here... Thanks for your help!
Add the LBFGS optimization algorithm to the run file.
Hi,
Thank you so much for the useful package.
I am dealing with quite a few items (10,000+) across sessions (6,000+), and sessions are more or less block diagonal in availability of items. In fact, only about 5% of the elements of the availability and session-item (S x I) matrices are non-zero.
My understanding is that the current version of torch-choice does not support sparse tensors. Is there any way I could get around this issue?
Thank you!
I found there is new documentation on Hessian computation here. We might want to improve the current code for Hessian computation with functorch.
While running using the following code from the main branch,
import warnings
warnings.filterwarnings("ignore")
import random
from time import time
import numpy as np
import pandas as pd
import torch
import torch_choice
from torch_choice import run
from tqdm import tqdm
from torch_choice.data import ChoiceDataset, JointDataset, utils, load_mode_canada_dataset, load_house_cooling_dataset_v1
from torch_choice.model import ConditionalLogitModel, NestedLogitModel
# set the random seed to enforce reproducibility.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.use_deterministic_algorithms(True)
car_choice = pd.read_csv("./tutorials/public_datasets/car_choice.csv")
car_choice.head()
user_observable_columns=["gender", "income"]
from torch_choice.utils.easy_data_wrapper import EasyDatasetWrapper
data_wrapper_from_columns = EasyDatasetWrapper(
main_data=car_choice,
purchase_record_column='record_id',
choice_column='purchase',
item_name_column='car',
user_index_column='consumer_id',
session_index_column='session_id',
user_observable_columns=['gender', 'income'],
item_observable_columns=['speed'],
session_observable_columns=['discount'],
itemsession_observable_columns=['price'])
data_wrapper_from_columns.summary()
dataset = data_wrapper_from_columns.choice_dataset
# ChoiceDataset(label=[], item_index=[885], provided_num_items=[], user_index=[885], session_index=[885], item_availability=[885, 4], item_speed=[4, 1], user_gender=[885, 1], user_income=[885, 1], session_discount=[885, 1], itemsession_price=[885, 4, 1], device=cpu)
Depending on the pandas version, one may encounter a pandas error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 27
25 user_observable_columns=["gender", "income"]
26 from torch_choice.utils.easy_data_wrapper import EasyDatasetWrapper
---> 27 data_wrapper_from_columns = EasyDatasetWrapper(
28 main_data=car_choice,
29 purchase_record_column='record_id',
30 choice_column='purchase',
31 item_name_column='car',
32 user_index_column='consumer_id',
33 session_index_column='session_id',
34 user_observable_columns=['gender', 'income'],
35 item_observable_columns=['speed'],
36 session_observable_columns=['discount'],
37 itemsession_observable_columns=['price'])
39 data_wrapper_from_columns.summary()
40 dataset = data_wrapper_from_columns.choice_dataset
File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:142, in EasyDatasetWrapper.__init__(self, main_data, purchase_record_column, item_name_column, choice_column, user_index_column, session_index_column, user_observable_data, item_observable_data, useritem_observable_data, session_observable_data, price_observable_data, itemsession_observable_data, useritemsession_observable_data, user_observable_columns, item_observable_columns, useritem_observable_columns, session_observable_columns, price_observable_columns, itemsession_observable_columns, useritemsession_observable_columns, device)
135 self.derive_observable_from_main_data(item_observable_columns,
136 user_observable_columns,
137 session_observable_columns,
138 price_observable_columns)
140 self.observable_data_to_observable_tensors()
--> 142 self.create_choice_dataset()
143 print('Finished Creating Choice Dataset.')
File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:303, in EasyDatasetWrapper.create_choice_dataset(self)
301 if len(np.unique(choice_set_size)) > 1:
302 print(f'Note: choice sets of different sizes found in different purchase records: {rep}')
--> 303 self.item_availability = self.get_item_availability_tensor()
304 else:
305 # None means all items are available.
306 self.item_availability = None
File ~/Development/torch-choice/torch_choice/utils/easy_data_wrapper.py:349, in EasyDatasetWrapper.get_item_availability_tensor(self)
347 if self.session_index_column is None:
348 raise ValueError(f'Item availability cannot be constructed without session index column.')
--> 349 A = self.main_data.pivot(self.session_index_column, self.item_name_column, self.choice_column)
350 return torch.BoolTensor(~np.isnan(A.values))
TypeError: pivot() takes 1 positional argument but 4 were given
After the pandas 2.0.0 upgrade, there was a change in the pivot method (see this issue). Positional arguments to pivot have been disabled. Previously we could simply write df.pivot("A", "B", "C"), but now we need to specify df.pivot(index="A", columns="B", values="C").
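A minimal sketch of the keyword-argument form applied to an availability matrix, mirroring the pivot call in the traceback. The toy data frame is made up for illustration; only the column names are taken from the example above.

```python
import numpy as np
import pandas as pd

# Toy long-format data: item "c" is not offered in session 1.
df = pd.DataFrame({
    "session_id": [0, 0, 0, 1, 1],
    "car": ["a", "b", "c", "a", "b"],
    "purchase": [1, 0, 0, 0, 1],
})

# Keyword arguments are accepted both before and after pandas 2.0.
A = df.pivot(index="session_id", columns="car", values="purchase")

# Missing (session, item) pairs become NaN, i.e. the item was unavailable.
availability = ~np.isnan(A.values)
print(availability)
```

Passing the arguments by keyword keeps the call valid across pandas versions, so no version check is needed.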
Hi Tianyu,
Thanks for the amazing package! I'm trying to allow different lambdas across nests by setting shared_lambda = False. However, it ran into the RuntimeError: Error(s) in loading state_dict for NestedLogitModel: Unexpected key(s) in state_dict: "lambdas". I wonder if there is a way to fix this. Thanks!
Currently the model doesn't seem to support (1|user) and (1|item) at the same time; even if both of these are included in the formula, only one of them (the later one) will be used. This is because there is only one internal variable intercept in the model, and the current formula parser tries to attach both user-level variation and item-level variation to the same variable intercept, which can cause a conflict.
Additionally, currently, the model only allows for one level of variation for each observable (e.g., you cannot have both user-specific coefficient and the item-specific coefficient for the same observable). We are actively working on methods to fix this issue.
Hi,
I am using the snippet below to create the dataset; however, when I print dataset.x_dict, it comes out as an empty dictionary. This is unlike the notebook that I followed, where dataset.x_dict seems to be created automatically.
`item_index = df[df['choice'] == 1].sort_values(by='case')['alt'].reset_index(drop=True)
print(item_index)
item_names = ['BRAND_1', 'BRAND_2', 'BRAND_3', 'BRAND_4']
num_items = 4
encoder = dict(zip(item_names, range(num_items)))
print(f"{encoder=:}")
item_index = item_index.map(lambda x: encoder[x])
item_index = torch.LongTensor(item_index)
print(f"{item_index=:}")
cost = utils.pivot3d(df, dim0='case', dim1='alt', values='cost')
msrp = utils.pivot3d(df, dim0='case', dim1='alt', values='msrp')
dataset = ChoiceDataset(item_index=item_index,
cost_data=cost,
msrp_data=msrp,
).to(device)
`
This leads to the error below when I run the code:
KeyError Traceback (most recent call last)
in
1 start_time = time()
----> 2 run(model, dataset, num_epochs=500, dataset_test=None, batch_size=-1, learning_rate=0.01, model_optimizer="Adam")
3 print('Time taken:', time() - start_time)
30 frames
/usr/local/lib/python3.8/dist-packages/torch_choice/model/conditional_logit_model.py in forward(self, batch, manual_coef_value_dict)
267 corresponding_observable = var_type.split("[")[0]
268 total_utility += coef(
--> 269 x_dict[corresponding_observable],
270 batch.user_index,
271 manual_coef_value=None if manual_coef_value_dict is None else manual_coef_value_dict[var_type])
KeyError: 'cost_data'
PS: Along with the above-mentioned notebook, I was also following your official documentation step by step, and I am not able to find what I am missing.
Can you please help me with this issue?
Thanks!
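One thing worth checking, as an assumption drawn from the ChoiceDataset summary printed earlier on this page (item_speed, user_income, session_discount, itemsession_price): every observable there carries a level prefix, whereas cost_data and msrp_data do not. The prefix list and helper below are illustrative only and are not the library's own validation.

```python
# Prefixes seen in the dataset summary and wrapper signature on this page
# (an assumption, not read from the library source).
KNOWN_PREFIXES = (
    "user_", "item_", "session_", "itemsession_", "price_",
    "useritem_", "useritemsession_",
)

def unprefixed_observables(names):
    """Return the names that do not start with any known level prefix."""
    return [n for n in names if not n.startswith(KNOWN_PREFIXES)]

suspects = unprefixed_observables(["cost_data", "msrp_data", "itemsession_cost"])
print(suspects)  # ['cost_data', 'msrp_data']
```

If the prefix assumption holds, renaming the keyword arguments to something like itemsession_cost and itemsession_msrp, and using the same names in the model specification, would be the first thing to try.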
See issue
Hi,
Thank you so much for the useful package.
My understanding is that the current version of torch-choice does not support the calculation of marginal effects after estimating the logit regression. Is there any way I could get around this issue?
Thank you!
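As a generic workaround outside the package, marginal effects of a plain multinomial logit can be computed by hand from the estimated coefficient. The sketch below assumes utilities of the form U_j = beta * x_j with a single shared coefficient, for which dP_j/dx_k = beta * P_j * (1[j = k] - P_k). The function names are hypothetical and nothing here uses the torch-choice API.

```python
import math

def softmax(utilities):
    """Numerically stable softmax over a list of utilities."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def logit_marginal_effects(beta, x):
    """Matrix M[j][k] = dP_j / dx_k for utilities U_j = beta * x_j."""
    p = softmax([beta * xj for xj in x])
    n = len(x)
    return [
        [beta * p[j] * ((1.0 if j == k else 0.0) - p[k]) for k in range(n)]
        for j in range(n)
    ]

# Example: a price coefficient of -0.8 and three item prices.
effects = logit_marginal_effects(beta=-0.8, x=[1.0, 2.5, 4.0])
for row in effects:
    print([round(v, 4) for v in row])
```

Diagonal entries are own effects and off-diagonal entries are cross effects; each row sums to zero because the choice probabilities always sum to one.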
(1) Add the device attribute to conditional-logit and nested-logit models; (2) add the device information to the __str__ method as well; and (3) add the to() method to these models to make it easier to move them between CPU and GPU.