tda_clustering

Overview

This project predicts the optimal clustering algorithm and quality metric for a given dataset from topological data analysis (TDA) features.

The main idea is to train a meta-algorithm on classification datasets so that it can make predictions for unseen data.
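
The meta-learning idea can be illustrated with a minimal sketch. It is not the project's implementation: the meta-feature vectors, the algorithm labels, and the function name `predict_best_algorithm` are hypothetical placeholders, and a 1-nearest-neighbour rule stands in for whatever meta-algorithm the project actually trains. Each training dataset is reduced to a feature vector (standing in for TDA features) and labelled with the clustering algorithm that worked best on it. A new dataset then receives the label of its closest neighbour.

```python
import math
from typing import List, Tuple

# Hypothetical meta-dataset: one row per training dataset.
# Each row pairs a meta-feature vector (a stand-in for TDA features)
# with the clustering algorithm that scored best on that dataset.
# All numbers and labels are illustrative placeholders, not project results.
MetaRow = Tuple[List[float], str]

META_TRAIN: List[MetaRow] = [
    ([0.10, 0.80], "kmeans"),
    ([0.15, 0.75], "kmeans"),
    ([0.90, 0.20], "dbscan"),
    ([0.85, 0.30], "dbscan"),
]


def predict_best_algorithm(meta_features: List[float], train: List[MetaRow]) -> str:
    """Return the label of the nearest training row (1-nearest-neighbour)."""
    nearest = min(train, key=lambda row: math.dist(row[0], meta_features))
    return nearest[1]


# Meta-features of a dataset the meta-model has not seen (placeholder values).
new_dataset_features = [0.88, 0.25]
predicted = predict_best_algorithm(new_dataset_features, META_TRAIN)
print(predicted)
```

The sketch predicts only the algorithm. Predicting the quality metric as well could follow the same pattern with a second label per row.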

How to install dependencies

Declare any dependencies in requirements.txt for pip installation.

To install them, run:

pip install -r requirements.txt

Project Organization

├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── get_openml_data.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   ├── calcers    <- Implementations of the feature calcers
│   │   │   ├── base_calcer.py
│   │   │   ├── mapper_features.py
│   │   │   ├── stats_features.py
│   │   │   ├── target_features.py
│   │   │   └── tda_features.py
│   │   ├── tda_feature_engineering.py
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models
│   │   ├── feature_importances.py
│   │   └── train_model.py
│   │
│   ├── mlflow_utils.py
│   └── utils.py
├── tests              <- unit tests
│   ├── conftest.py
│   └── tests.py
└── test_environment.py

How to run

First, start the MLflow server. Set --backend-store-uri and --default-artifact-root to whatever suits your setup:

mlflow server --backend-store-uri sqlite:///mlruns.db --default-artifact-root artifacts

To collect OpenML meta-data, run:

src/data/get_openml_data.py --config config/params.yaml

To collect the OpenML datasets and build the processed data, run:

src/features/build_features.py --config config/params.yaml

To train the model, run:

src/models/train_model.py --config config/params.yaml
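
All three scripts above take the same `--config` flag. As a rough sketch of how a script might accept it, assuming `argparse` is used (the README does not say how the scripts parse their arguments, and `parse_args` is a hypothetical helper name):

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; only the --config flag comes from the README."""
    parser = argparse.ArgumentParser(description="Pipeline step (illustrative).")
    parser.add_argument(
        "--config",
        type=Path,
        required=True,
        help="Path to the YAML parameters file.",
    )
    return parser.parse_args(argv)


# Passing argv explicitly mimics: <script> --config config/params.yaml
args = parse_args(["--config", "config/params.yaml"])
print(args.config)
```

Reading the YAML file itself would need a YAML parser, which is left out to keep the sketch limited to the standard library.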

Project based on the cookiecutter data science project template. #cookiecutterdatascience
