Shifts Challenge

This repository contains data readers and examples for the three tracks of the Shifts Dataset and the Shifts Challenge.

The Shifts Dataset contains curated and labelled examples of real, 'in-the-wild' distributional shift across three large-scale tasks: tabular weather prediction, machine translation, and vehicle motion prediction. Dataset shift is ubiquitous in all of these tasks and modalities. The dataset, assessment metrics, and benchmark results are detailed in our associated paper, "Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks".

If you use the Shifts Dataset in your work, please cite our paper using the following BibTeX entry:

@article{shifts2021,
  author    = {Malinin, Andrey and Band, Neil and Ganshin, Alexander and Chesnokov, German and Gal, Yarin and Gales, Mark J. F. and Noskov, Alexey and Ploskonosov, Andrey and Prokhorenkova, Liudmila and Provilkov, Ivan and Raina, Vatsal and Raina, Vyas and Roginskiy, Denis and Shmatova, Mariya and Tigar, Panos and Yangel, Boris},
  title     = {Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks},
  journal   =  {arXiv preprint arXiv:2107.07455},
  year      = {2021},
}

If you have any questions about the Shifts Dataset, the paper, or the benchmarks, please contact [email protected].

Dataset Download And Licenses

License

The Shifts Dataset is released under a mixed license; the terms for each task are listed below.

Weather Prediction

The Shifts Weather Prediction Dataset is released under the CC BY NC SA 4.0 license. This dataset was constructed by combining features from publicly available weather prediction services and models. Specifically, we combined data from NOAA/NWS servers, data generated by the WRF model from NCAR/UCAR, and data from the Meteorological Service of Canada. Ground station readings were taken from NOAA (https://www.weather.gov/disclaimer). The data was cleaned and the features were standardized.

Machine Translation

The Shifts Machine Translation Dataset is released under a mixed license.

GlobalVoices evaluation data is released under CC BY NC SA 4.0.

The English source data was taken from GlobalVoices (originally licensed under CC BY 3.0), and the target Russian translations were provided by Yandex in-house professional translators.

The source-side text for the Reddit development and evaluation datasets exists under the terms of the Reddit API. The target-side Russian sentences were obtained by Yandex via in-house professional translators and are released under CC BY NC SA 4.0. Note that the development set source sentences are the same as those used in the MTNT dataset.

Motion Prediction

The Shifts SDC Motion Prediction Dataset is released under the CC BY NC SA 4.0 license.

Download links

As the Shifts Challenge is currently underway, we are only releasing the full training and development sets of the canonical partition for all tasks of the Shifts Dataset, as detailed in our paper. Evaluation data without ground-truth labels or metadata will be released on October 17th 2021. The evaluation data labels and ground-truth predictions, as well as the full Shifts Dataset, will become available on November 1st 2021, after the Shifts Challenge concludes.

By downloading the Shifts Dataset, you automatically agree to the licenses described above.

Weather Prediction

The canonical partition of the training and development data can be downloaded here.

Machine Translation

The development data can be downloaded here. The training data for this task is the WMT'20 En-Ru dataset, which can be downloaded via the scripts provided here.

Motion Prediction

The canonical partition of the training and development data can be downloaded here.
