We provide the code for our work: "Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank".
Install the dependencies:
pip install -r requirements.txt
We provide the following datasets for testing:
- dataset_fully_simulation/K=1: a connected identifiability graph (IG)
- dataset_fully_simulation/K=2: an IG with 2 connected components
- dataset_fully_simulation/K=3: an IG with 3 connected components
- dataset_fully_simulation/K=4: an IG with 4 connected components
- dataset_fully_simulation/K=2_node_intervention: a connected IG obtained by applying node intervention to the K=2 dataset
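To make the notion of "K connected components" concrete, here is a minimal sketch that counts the components of a toy IG. It assumes the IG has one node per bias factor (for example, a position) and links two bias factors whenever the same feature is observed under both. The input format, the function name `count_ig_components`, and the modelling of node merging as simply joining two nodes are assumptions for illustration; this is not the code in identifiability_check.py.

```python
from collections import defaultdict


def count_ig_components(observations, merged_pairs=()):
    """Count connected components of a toy identifiability graph.

    observations: iterable of (feature_id, bias_factor) pairs (hypothetical format).
    merged_pairs: iterable of (bias_factor, bias_factor) pairs treated as one node.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group the bias factors under which each feature was observed.
    factors_by_feature = defaultdict(list)
    for feature_id, bias_factor in observations:
        find(bias_factor)  # register the node even if it stays isolated
        factors_by_feature[feature_id].append(bias_factor)

    # A feature seen under several bias factors links all of them.
    for factors in factors_by_feature.values():
        for other in factors[1:]:
            union(factors[0], other)

    # Assumption: merging two nodes is modelled as joining their components.
    for a, b in merged_pairs:
        union(a, b)

    return len({find(node) for node in parent})


# Toy data: features x1 and x2 link positions 1-2-3; x3 links positions 4-5.
observations = [("x1", 1), ("x1", 2), ("x2", 2), ("x2", 3), ("x3", 4), ("x3", 5)]
print(count_ig_components(observations))                         # 2
print(count_ig_components(observations, merged_pairs=[(3, 4)]))  # 1
```

In this toy example the graph has two components, and joining nodes 3 and 4 connects it. The node numbers are illustrative and are not taken from the provided datasets.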
We used the ULTRA framework (https://github.com/ULTR-Community/ULTRA) to process the Yahoo! and Istella-S datasets. You can run the following commands in the ULTRA project:
bash ./example/Yahoo/offline_exp_pipeline.sh
bash ./example/Istella-S/offline_exp_pipeline.sh
Check datasets without context types:
- Fully simulated dataset
python identifiability_check.py --data_path "dataset_fully_simulation/K=2"
- Yahoo! LETOR
python identifiability_check.py --data_path "Yahoo_letor/tmp_data"
Check datasets with context types (simulate them first; see the context-type simulation step below):
python identifiability_check.py --data_path "Yahoo_letor/tmp_data" --context_path "Yahoo_letor/tmp_data/context.pkl"
- Test the $K=2$ case:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2" --algorithm "dla"
Algorithm choices: dla / regression_em / two_tower
- Test the $K=2$ case with node intervention:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2_node_intervention"
- Test the $K=2$ case with node merging (here we merge nodes 3 and 4, which is the best strategy, but you can try other strategies):
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2" --node_merging_strategies "3-4"
- Test the $K=2$ case with node intervention, but with random cost:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2" --random_node_intervention
- Test with a different number of clicks:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=1" --number_of_clicks 10000
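To compare the three algorithm choices listed above on one dataset, the runs can be scripted. The following is a minimal sketch that only uses the `--data_path` and `--algorithm` flags shown above; the helper names `build_commands` and `run_all` are hypothetical, and the script is assumed to be launched from the repository root.

```python
import subprocess
import sys

# Algorithm names as listed in this README.
ALGORITHMS = ["dla", "regression_em", "two_tower"]


def build_commands(data_path, algorithms=ALGORITHMS):
    """Return one argv list per algorithm for test_fully_simulation.py."""
    return [
        [sys.executable, "test_fully_simulation.py",
         "--data_path", data_path, "--algorithm", algorithm]
        for algorithm in algorithms
    ]


def run_all(data_path):
    """Run every algorithm in turn, stopping at the first failing run."""
    for command in build_commands(data_path):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_all("dataset_fully_simulation/K=2")` would then launch the three runs one after another with the current Python interpreter.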
- Simulate 5,000 context types on Yahoo! and write them to Yahoo_letor/tmp_data/context.pkl. This script also performs node merging and saves the merging results.
python simulate_context_and_node_merging.py --data_path "Yahoo_letor/tmp_data" --context_path "Yahoo_letor/tmp_data/context.pkl" --n_context 5000
- Test node merging:
python test_semi_synthetic.py --data_path Yahoo_letor/tmp_data --context_path Yahoo_letor/tmp_data/context.pkl
- Test without identification:
python test_semi_synthetic.py --data_path Yahoo_letor/tmp_data --context_path Yahoo_letor/tmp_data/context.pkl --no_identification
Please consider citing the following paper when using our code for your application.
@inproceedings{chen2024identifiability,
title={Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank},
author={Mouxiang Chen and Chenghao Liu and Zemin Liu and Zhuo Li and Jianling Sun},
booktitle={Proceedings of the 41st International Conference on Machine Learning},
year={2024}
}