Tabular data plays a pivotal role in many Kaggle competitions, highlighting the need for a versatile framework that integrates various architectures tailored for such datasets.
Since the revolutionary "Attention Is All You Need" paper, Transformer-based models have demonstrated exceptional generalization capabilities across numerous domains, including computer vision (CV) and natural language processing (NLP). Our goal is to harness these capabilities for tabular data.
Despite the existence of Transformer-based frameworks for tabular data, PyTorch-based implementations remain scarce. Furthermore, many existing APIs fall short of satisfactory coding practices, and end-to-end frameworks are nearly nonexistent. Although challenging, we believe this is a worthwhile endeavor to explore.
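To make the idea concrete, the sketch below illustrates in plain NumPy what a TabTransformer-style model does: each categorical column is mapped to a learned embedding, self-attention lets every column attend to every other (producing contextual embeddings), and the result is concatenated with the continuous features before a final MLP head. This is an illustrative toy, not the framework's PyTorch implementation; all shapes and names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cat, d = 4, 3, 8           # batch size, categorical columns, embedding dim
cardinalities = [5, 7, 3]            # distinct categories per column

# One embedding table per categorical column (randomly initialized here;
# in a real model these are learned parameters).
tables = [rng.normal(size=(c, d)) for c in cardinalities]
cats = np.stack([rng.integers(0, c, size=n_rows) for c in cardinalities], axis=1)

# Look up column embeddings: shape (n_rows, n_cat, d).
emb = np.stack([tables[j][cats[:, j]] for j in range(n_cat)], axis=1)

def self_attention(x):
    """Single-head scaled dot-product self-attention over the column axis."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

ctx = self_attention(emb)            # contextual embeddings, same shape as emb
cont = rng.normal(size=(n_rows, 2))  # two continuous features, used as-is
features = np.concatenate([ctx.reshape(n_rows, -1), cont], axis=1)
print(features.shape)                # (4, 26) -> fed to an MLP head
```

A full model stacks several such attention layers (with projections, feed-forward blocks, and normalization), but the data flow, categorical embeddings contextualized by attention and then joined with continuous inputs, is the same.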
- `TabularTransformer` (Source: TabTransformer: Tabular Data Modeling Using Contextual Embeddings)
- `FeatureTokenizerTransformer` (Source: Revisiting Deep Learning Models for Tabular Data)
- A PyTorch-compatible dataset implementation for streamlined data handling.
- `train`, `inference`: essential functions for training models and making predictions.
- `seed_everything`: ensures reproducibility by setting a global random seed.
- `get_data`, `get_dataset`, `get_data_loader`: functions for efficient data manipulation.
- `plot_learning_curve`: visualizes the training and validation loss over epochs.
- `to_submission_csv`: facilitates the creation of submission files for Kaggle competitions.
- Custom metrics specifically designed for Kaggle competitions.
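As an example of the reproducibility utility, a helper like `seed_everything` typically seeds every random number generator in play. The sketch below shows a common minimal pattern; the framework's actual implementation may differ.

```python
import os
import random

def seed_everything(seed: int = 42) -> None:
    """Seed all available RNGs so experiments are reproducible.

    Minimal sketch of the usual pattern; the library's helper may
    cover additional backends (e.g. cuDNN determinism flags).
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # NumPy and PyTorch are seeded only if installed
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

# Usage: the same seed yields the same draw.
seed_everything(123)
first = random.random()
seed_everything(123)
assert first == random.random()
```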
Detailed examples demonstrating the usage of our models can be found in the template directory.
For classification tasks, refer to classification.
For regression tasks, refer to regression.
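Both templates end by writing predictions to disk. A submission helper presumably produces the flat two-column CSV layout Kaggle expects; the sketch below is a hypothetical stand-in, and the library's actual `to_submission_csv` signature may differ.

```python
import csv
import tempfile

def to_submission_csv(ids, preds, path, id_col="id", target_col="target"):
    """Write (id, prediction) rows in the two-column CSV layout Kaggle expects.

    Hypothetical sketch; column names and signature are illustrative only.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_col, target_col])
        writer.writerows(zip(ids, preds))

# Usage: write three dummy predictions to a temporary file.
out = tempfile.NamedTemporaryFile(mode="r", suffix=".csv", delete=False)
to_submission_csv([0, 1, 2], [0.1, 0.9, 0.5], out.name)
print(open(out.name).read().splitlines()[0])  # id,target
```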
We present an end-to-end, PyTorch-based Transformer framework specifically designed for tabular data. Accompanied by pre-integrated templates and functions, our framework aims to streamline your workflows without sacrificing flexibility. We believe it will prove to be a valuable asset for your data modeling tasks.
This project is licensed under the MIT License.
- Contributions are welcome! For guidelines, please refer to our contribution guide.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30).
- Gorishniy, Y., Rubachev, I., Khrulkov, V., & Babenko, A. (2021). Revisiting deep learning models for tabular data. In Advances in Neural Information Processing Systems (Vol. 34, pp. 18932–18943).
- Huang, X., Khetan, A., Cvitkovic, M., & Karnin, Z. (2020). TabTransformer: Tabular data modeling using contextual embeddings. arXiv preprint arXiv:2012.06678.