aisingapore / kapitan-hull

The one-stop shop of a Cookiecutter template to spin up a working AISG project in minutes.

Home Page: https://aisingapore.github.io/kapitan-hull/

Python 1.84% HTML 97.72% Dockerfile 0.26% Makefile 0.02% Batchfile 0.03% Jupyter Notebook 0.12%
cookiecutter cookiecutter-python cookiecutter-python3 cookiecutter-template machine-learning ml mlops ai-singapore

kapitan-hull's People

Contributors

asherchewzy, auggie246, deonchia, jjthia, ryzalk, siewyeng, syakyr, wennalyy, yu-mingyi

kapitan-hull's Issues

[Bug]: Batch Inferencing's `hydra.job.chdir=True` parameter doesn't write `batch-infer-res.jsonl` into the `outputs` folder

Problem Domain

Python

Problem Brief

The batch inferencing section of the guide states that the hydra.job.chdir=True parameter can be used to write batch-infer-res.jsonl into the outputs folder, but the file is not written there.

Steps to Reproduce

Run the following snippet from https://aisingapore.github.io/kapitan-hull/guide-for-user/09-batch-inferencing/ (with the modification specified at the end of that page):

python src/batch_inferencing.py \
    hydra.job.chdir=True \
    batch_infer.model_path=$PRED_MODEL_PATH \
    batch_infer.input_data_dir="$PWD/data/batched-mnist-input-data"

Expected Result

batch-infer-res.jsonl is written to the outputs/<date>/<time>/ folder.

Actual Result

batch-infer-res.jsonl is not saved in any location I could find.

Logtrace

❯ python src/batch_inferencing.py hydra.job.chdir=True batch_infer.model_path=$(pwd)/models/model.pt
[2024-01-17 14:18:52,269][__main__][INFO] - Setting up logging configuration.
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Loading the model..."}
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Conducting inferencing on image files..."}
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Batch inferencing has completed."}
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Output result location: /home/nus/Codebase/aiap/aiap-15-test/aiap-dsp-mlops/outputs/2024-01-17/14-18-52/batch-infer-res.jsonl"}
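One way to narrow this down is to check where the working directory and the reported output path actually point at the moment the file is written. The sketch below uses only the standard library and does not reproduce the template's batch_inferencing.py; the helper name `write_results`, the sample record, and the simulated run directory are all hypothetical. It resolves the output path to an absolute one before writing and confirms that the file exists before returning its location, which would surface a mismatch like the one in the log trace above.

```python
import json
import os
import tempfile
from pathlib import Path


def write_results(records, output_dir, filename="batch-infer-res.jsonl"):
    """Write records as JSON Lines and return the absolute path written.

    Hypothetical helper: resolves `output_dir` against the current working
    directory so the reported path and the real path cannot diverge.
    """
    out_dir = Path(output_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    # Fail loudly instead of reporting a location that does not exist.
    if not out_path.is_file():
        raise FileNotFoundError(out_path)
    return out_path


# Simulate a run directory like the one Hydra switches into when
# hydra.job.chdir=True is set (the directory used here is made up).
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "outputs" / "2024-01-17" / "14-18-52"
    run_dir.mkdir(parents=True)
    previous_cwd = os.getcwd()
    os.chdir(run_dir)
    try:
        written = write_results([{"image": "a.png", "prediction": 7}], ".")
        written_in_run_dir = written.parent == run_dir.resolve()
        line_count = len(written.read_text(encoding="utf-8").splitlines())
    finally:
        # Restore the working directory before the temp directory is removed.
        os.chdir(previous_cwd)
```

If the real script builds its path in one place and logs it in another, comparing the two against `os.getcwd()` in the same way may show where they diverge.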

[Bug]: GitLab WebUI pipeline borked due to DAG implementation

Problem Domain

GitLab

OS/Platform(s) Used

None

Problem Brief

Implementing DAG pipelines in the template broke the ability to run pipelines manually.

Steps to Reproduce

Run a pipeline from the GitLab web UI.

Expected Result

The pipeline runs successfully.

Actual Result

The pipeline fails because of the use of needs: a job does not run if the job it depends on is not required to execute during that pipeline run.

Logtrace

No response
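The failure mode described above can be modelled outside GitLab. The sketch below is a simplified illustration, not GitLab's implementation: given the set of jobs that a pipeline run actually includes, it lists every needs entry that points at a job absent from that run. The job names are borrowed from another issue on this page, and the dependency between them is assumed for illustration only.

```python
def find_missing_needs(jobs, included):
    """Return {job: [needed jobs absent from this run]} for included jobs.

    `jobs` maps a job name to the list of job names it needs;
    `included` is the set of jobs selected for this pipeline run.
    """
    missing = {}
    for name in sorted(included):
        absent = [dep for dep in jobs.get(name, []) if dep not in included]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical pipeline definition: the second job needs the first.
jobs = {
    "test:conda-build": [],
    "test:pylint-pytest": ["test:conda-build"],
}

# A run in which the build job was not selected leaves a dangling need.
web_run = {"test:pylint-pytest"}
print(find_missing_needs(jobs, web_run))

# A run that includes both jobs has no dangling dependencies.
print(find_missing_needs(jobs, set(jobs)))
```

GitLab CI also offers an `optional: true` setting on needs entries for dependencies that may be absent from a pipeline; whether that fits this template is not established here.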

[Refactor]: Switch JSON replay files to YAML

Query Domain

  • Cookiecutter

Query Brief

Testing confirmed that YAML files can also be used as cookiecutter replay files, so the replay files will be switched to YAML in 0.4.0 as a cosmetic change, in line with the other configuration files in this repository.

High Priority Updates

  • Add YAML files documentation
  • Add Coder-specific step-by-step into the documentation
  • Change the example to use scikit-learn instead of PyTorch
    • Keeping PyTorch for its use of checkpointing, and so that they get to know PyTorch better before their project phases

[Bug]: Switch from artifacts to cache for `test:conda-build` job's generated objects to be used across pipelines

Problem Domain

  • Gitlab

OS/Platform(s) Used

  • None

Problem Brief

Currently, the test:conda-build job saves the conda environment only as artifacts, which are available only within the same pipeline. However, the environment does not need to change unless the conda YAML file changes. We will therefore test whether cache, rather than artifacts, is better suited to storing the environment, so that it does not have to be rebuilt in every pipeline.

Steps to Reproduce

Trigger the CI/CD pipeline by pushing modified files.

Expected Result

test-conda does not need to run if the conda YAML file is unmodified.

Actual Result

test-conda has to run; otherwise, the pylint-pytest job fails.
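Rebuilding only when the conda YAML file changes amounts to keying the stored environment on that file's contents. The sketch below illustrates the idea in plain Python; it is not the pipeline configuration, and the file name `env.yaml` and both helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def env_cache_key(env_file):
    """Derive a cache key from the bytes of the environment file."""
    digest = hashlib.sha256(Path(env_file).read_bytes()).hexdigest()
    return f"conda-env-{digest[:16]}"


def needs_rebuild(env_file, cached_keys):
    """True when no cached environment matches the current file contents."""
    return env_cache_key(env_file) not in cached_keys


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "env.yaml"  # hypothetical file name
    env_file.write_text("dependencies:\n  - python=3.10\n", encoding="utf-8")

    cached_keys = {env_cache_key(env_file)}  # environment built once
    unchanged = needs_rebuild(env_file, cached_keys)  # same contents

    env_file.write_text("dependencies:\n  - python=3.11\n", encoding="utf-8")
    changed = needs_rebuild(env_file, cached_keys)  # contents differ
```

GitLab CI's `cache:key:files` setting derives a cache key from listed files in a similar spirit, which may make it a candidate for the test described above.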

[Bug]: Default issue tracker template for Gitlab to not be the default template used

Problem Domain

Gitlab

OS/Platform(s) Used

None

Problem Brief

Making the current issue template the default in GitLab breaks some conventions for how different teams use the issue tracker. The issue template should therefore be an option rather than the default.

Steps to Reproduce

  • Create a new issue in GitLab after creating a repository from this template

Expected Result

Empty template

Actual Result

The new issue is pre-filled with the template, since it is the default.

[Bug]: For `cv` problem template, removing `cpuonly` package doesn't enable GPU; CPU and GPU images do not work as intended

Problem Domain

  • Docker
  • Python

OS/Platform(s) Used

  • None

Problem Brief

The cv problem template is meant to be used as an example for AIAP MLOps Week. GPUs are not provided during that week, so this issue would go unnoticed during the session. However, if GPUs are used for the guide, removing the cpuonly package alone does not enable them, because the installed PyTorch build uses only the CPU even when a GPU and CUDA are available within the container. As a result, the current Dockerfile used to create GPU images is redundant and bloated, since the PyTorch package would not use the GPUs attached to the container.

Steps to Reproduce

  • Attach GPUs to the container created using the *-gpu.Dockerfile with the cv problem template
  • Run Python/iPython within the container
  • Run import torch; torch.cuda.is_available()

Expected Result

True

Actual Result

False
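When reproducing this, it can help to inspect the build itself and not just the availability flag. The sketch below is a small diagnostic, not part of the template; the function name `describe_torch_build` is hypothetical. It reports the installed PyTorch version, the CUDA version the build was compiled against (`torch.version.cuda` is `None` for CPU-only builds), and whether CUDA is usable at runtime. It returns `None` when PyTorch is not installed.

```python
def describe_torch_build():
    """Return basic facts about the installed PyTorch build, or None.

    None means PyTorch is not importable in this environment.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        "version": torch.__version__,
        # None when the build was compiled without CUDA support.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
print(info)
```

A `built_with_cuda` value of `None` inside a GPU container would be consistent with the CPU-only build described in this issue.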

[Feature]: To remove Pytorch example and separate it as an example section in the documentation

Query Brief

The PyTorch code would be pulled out of the codebase, to be downloaded separately and to replace the corresponding files from the template generation. There should not be any errors as long as the prompts that are not meant to be personalised (Docker registry name, author name, project name, etc.) are filled in correctly. This makes the template package-agnostic and should reduce confusion when following the guide.

Tasks

  • Add a post-gen-project hook to include the source code for the example problem(s)
    • CV problem with the MNIST dataset
  • Write up base template code
  • Change the guide site since it's different for each problem statement
  • Test the base template
  • Write up the underlying changes made with PR #24 on top of the feature that is to be implemented from this issue tracker
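The first task above could be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's hook: the function name `install_example`, the `examples/<problem>` directory layout, and the file `train_model.py` are all hypothetical. The idea is that the hook copies the example sources for the chosen problem over the freshly generated project and leaves the base template untouched when no example matches.

```python
import shutil
import tempfile
from pathlib import Path


def install_example(problem, examples_dir, project_dir):
    """Copy the example sources for `problem` into the generated project.

    Returns the copied files as POSIX-style paths relative to the project.
    A `problem` with no matching example directory copies nothing.
    """
    source = Path(examples_dir) / problem
    if not source.is_dir():
        return []
    # Merge into the existing project tree (requires Python 3.8+).
    shutil.copytree(source, Path(project_dir), dirs_exist_ok=True)
    return sorted(
        p.relative_to(source).as_posix()
        for p in source.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    examples = Path(tmp) / "examples"
    project = Path(tmp) / "my-project"
    (examples / "cv" / "src").mkdir(parents=True)
    (examples / "cv" / "src" / "train_model.py").write_text(
        "# example\n", encoding="utf-8"
    )
    project.mkdir()

    copied = install_example("cv", examples, project)
    copied_exists = (project / "src" / "train_model.py").is_file()
    skipped = install_example("base", examples, project)
```

Cookiecutter runs `hooks/post_gen_project.py` from the root of the generated project, so a real hook could pass the current directory as `project_dir`.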

[Feature]: CI/CD scaffolding

  • Move PVC creation section to Kapitan Hull Admin
  • Banners generated based on what platform/orchestrator is used
  • The sample GitHub/GitLab pages to be replaced by something other than the cookiecutter placeholders
  • CI/CD for the generated template
    • Add back the UI component to interact with the model (Stick to Streamlit first, Gradio later)
    • Depending on the orchestrator, set the CI job to manually run a job to process data/train model/deploy UI instead of relying on commands
      • This assumes a couple of variables are set up on the template repo for it to run properly; to be
  • CI/CD for the main repo
    • Set up web pipeline to populate the 100E projects directly from the repo
