ULTR-identifiability

We provide the code for our work: "Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank".

Install the dependencies:

pip install -r requirements.txt

Dataset

We provide the following datasets for testing:

Fully simulated datasets

  • dataset_fully_simulation/K=1: a connected identifiability graph (IG)
  • dataset_fully_simulation/K=2: an IG with 2 connected components
  • dataset_fully_simulation/K=3: an IG with 3 connected components
  • dataset_fully_simulation/K=4: an IG with 4 connected components
  • dataset_fully_simulation/K=2_node_intervention: a connected IG obtained by applying node intervention to the K=2 dataset

Semi-synthetic datasets

We used the ULTRA framework (https://github.com/ULTR-Community/ULTRA) to process the Yahoo! and Istella-S datasets. To reproduce this step, run the following commands in the ULTRA project:

bash ./example/Yahoo/offline_exp_pipeline.sh
bash ./example/Istella-S/offline_exp_pipeline.sh

Check identifiability with regard to a dataset

Check datasets without context types:

  • Fully simulated dataset
python identifiability_check.py --data_path "dataset_fully_simulation/K=2"
  • Yahoo! LETOR
python identifiability_check.py --data_path "Yahoo_letor/tmp_data"

Check datasets with context types (see the section "Simulate context types and do node merging"):

  • Yahoo! LETOR
python identifiability_check.py --data_path "Yahoo_letor/tmp_data" --context_path "Yahoo_letor/tmp_data/context.pkl"
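
As a rough illustration of what an identifiability check involves, the sketch below builds an identifiability graph from (bias factor, feature) observations and counts its connected components. It assumes, following the paper's description, that nodes are bias factors (for example, positions), that two bias factors are linked when some feature is observed under both, and that the dataset is identifiable when the graph is connected. The helper `count_ig_components` and its input format are hypothetical; this is not the code in identifiability_check.py.

```python
def count_ig_components(observations):
    """Count connected components of the graph induced by (bias_factor, feature) pairs.

    Hypothetical helper: two bias factors are linked when they share a feature.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    first_factor_for_feature = {}
    for bias_factor, feature in observations:
        parent.setdefault(bias_factor, bias_factor)
        if feature in first_factor_for_feature:
            # Same feature seen under another bias factor: link the two nodes.
            union(bias_factor, first_factor_for_feature[feature])
        else:
            first_factor_for_feature[feature] = bias_factor

    return len({find(node) for node in parent})


# Toy data (made up): positions 1 and 2 share feature "a"; position 3 shares nothing.
observations = [(1, "a"), (2, "a"), (2, "b"), (3, "c")]
num_components = count_ig_components(observations)
print("components:", num_components, "identifiable:", num_components == 1)
```

In this toy input the graph has two components, which corresponds to the K=2 setting described in the Dataset section.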

Test performance on fully simulated datasets

  • Test the $K=2$ case:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2" --algorithm "dla"

Algorithm choices: dla / regression_em / two_tower

  • Test the $K=2$ case with node intervention:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2_node_intervention"
  • Test the $K=2$ case with node merging (here we merge nodes 3 and 4, which is the best strategy; you can also try other strategies):
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2" --node_merging_strategies "3-4"
  • Test the $K=2$ case with node intervention, but with random cost:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=2" --random_node_intervention
  • Test with a different number of clicks:
python test_fully_simulation.py --data_path "dataset_fully_simulation/K=1" --number_of_clicks 10000
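
To compare the three algorithms on one dataset, the commands above can be generated in a loop. The sketch below only uses the `--data_path` and `--algorithm` flags shown in this section; the helper `build_commands` is a hypothetical convenience wrapper, not part of the repository. It prints the commands instead of running them.

```python
import shlex

ALGORITHMS = ["dla", "regression_em", "two_tower"]  # choices listed in this README


def build_commands(data_path, algorithms=ALGORITHMS):
    """Return one test_fully_simulation.py command line per algorithm."""
    return [
        ["python", "test_fully_simulation.py",
         "--data_path", data_path, "--algorithm", algo]
        for algo in algorithms
    ]


commands = build_commands("dataset_fully_simulation/K=2")
for cmd in commands:
    # Print rather than execute; use subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell quoting issues with paths such as `K=2` if you later pass it to `subprocess.run`.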

Simulate context types and do node merging

  • Simulate 5,000 context types on Yahoo! and write them to Yahoo_letor/tmp_data/context.pkl. The script also performs node merging and saves the merging results.
python simulate_context_and_node_merging.py --data_path "Yahoo_letor/tmp_data" --context_path "Yahoo_letor/tmp_data/context.pkl" --n_context 5000
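
To give a feel for why node merging helps, the toy sketch below represents a disconnected identifiability graph as a list of components and shows that merging one node from each of two components leaves a single component. The graph, the node ids, and the `merge_nodes` helper are made up for illustration, and the "a-b" string is only assumed to mean "merge nodes a and b"; the repository's script may choose and apply merges differently.

```python
def merge_nodes(components, strategy):
    """Merge the components containing the two nodes named in an "a-b" string."""
    a, b = (int(part) for part in strategy.split("-"))
    merged = set()
    untouched = []
    for component in components:
        if a in component or b in component:
            # This component contains one of the merged nodes, so it joins the union.
            merged |= component
        else:
            untouched.append(component)
    return untouched + ([merged] if merged else [])


# Toy graph (made up): nodes 1, 2, 3 form one component and nodes 4, 5, 6 another.
components = [{1, 2, 3}, {4, 5, 6}]
after = merge_nodes(components, "3-4")
print(len(components), "->", len(after), "components")
```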

Test performance on semi-synthetic datasets

  • Test node merging:
python test_semi_synthetic.py --data_path Yahoo_letor/tmp_data --context_path Yahoo_letor/tmp_data/context.pkl
  • Test without identification:
python test_semi_synthetic.py --data_path Yahoo_letor/tmp_data --context_path Yahoo_letor/tmp_data/context.pkl --no_identification

Citation

Please consider citing the following paper when using our code for your application.

@inproceedings{chen2024identifiability,
  title={Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank},
  author={Mouxiang Chen and Chenghao Liu and Zemin Liu and Zhuo Li and Jianling Sun},
  booktitle={Proceedings of the 41st International Conference on Machine Learning},
  year={2024}
}
