aisingapore / kapitan-hull
The one-stop shop of a Cookiecutter template to spin up a working AISG project in minutes.
Home Page: https://aisingapore.github.io/kapitan-hull/
Python
The section on batch inferencing shows that the hydra.job.chdir=True parameter could be used to write batch-infer-res.jsonl into the outputs folder, but this doesn't seem to happen.
Run the following snippet from https://aisingapore.github.io/kapitan-hull/guide-for-user/09-batch-inferencing/ (with the modification specified at the end of the page):

```shell
python src/batch_inferencing.py \
    hydra.job.chdir=True \
    batch_infer.model_path=$PRED_MODEL_PATH \
    batch_infer.input_data_dir="$PWD/data/batched-mnist-input-data"
```
Expected: batch-infer-res.jsonl written in the outputs/<date>/<time>/ folder
Actual: batch-infer-res.jsonl not being saved anywhere that I know of
```
❯ python src/batch_inferencing.py hydra.job.chdir=True batch_infer.model_path=$(pwd)/models/model.pt
[2024-01-17 14:18:52,269][__main__][INFO] - Setting up logging configuration.
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Loading the model..."}
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Conducting inferencing on image files..."}
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Batch inferencing has completed."}
{"asctime": "2024-01-17T14:18:52+0800", "process": 3240, "name": "__main__", "levelname": "INFO", "message": "Output result location: /home/nus/Codebase/aiap/aiap-15-test/aiap-dsp-mlops/outputs/2024-01-17/14-18-52/batch-infer-res.jsonl"}
```
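One possible workaround, sketched outside the template (the `write_results` function and the record fields are hypothetical, not Kapitan Hull's API): resolve the output directory to an absolute path before anything changes the working directory, so `batch-infer-res.jsonl` lands in a predictable location regardless of whether `hydra.job.chdir` takes effect.

```python
import json
import os
import tempfile
from pathlib import Path

def write_results(records, out_dir):
    """Write batch-inference records as JSON Lines to an absolute path,
    so the result location does not depend on the process's cwd."""
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "batch-infer-res.jsonl"
    with out_file.open("w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return out_file

# Resolve the target BEFORE any chdir, then emulate a working-directory
# change (what hydra.job.chdir=True does) and write anyway.
target = Path(tempfile.mkdtemp()) / "outputs" / "run"
os.chdir(tempfile.mkdtemp())
result = write_results([{"filename": "0.png", "pred": 7}], target)
print(result)
```

The file ends up under the pre-resolved `outputs/run/` directory even though the process's working directory changed in between.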
GitLab
None
Implementing the DAG pipelines in the template broke the ability to run pipelines manually.
Steps: Trigger a pipeline from the web UI in GitLab
Expected: Pipeline runs successfully
Actual: Error in running the pipeline due to the use of needs: subsequent jobs would not run if a job they depend on is not required to execute during that pipeline run.
No response
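A sketch of one possible fix, assuming GitLab 13.10+ (the stage and script lines are placeholders; the job names are taken from elsewhere in this tracker): marking a need as optional lets the dependent job run even when the needed job is excluded from a particular pipeline, such as a manually triggered web UI run.

```yaml
# Hypothetical fragment of .gitlab-ci.yml
pylint-pytest:
  stage: test
  needs:
    - job: test:conda-build
      optional: true
  script:
    - pytest src/tests
```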
Add a code profiler to log the speed of every process, for transparency's sake.
Example of a Python profiler:
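A minimal sketch using the standard library's cProfile, wrapping a stand-in pipeline step (the `train_step` name is hypothetical):

```python
import cProfile
import io
import pstats

def train_step():
    """Stand-in for an actual pipeline step (hypothetical)."""
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
train_step()
profiler.disable()

# Render the slowest calls, sorted by cumulative time, into a string
# that can be handed to the project's logger.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The rendered report can then be emitted through the template's existing structured logging instead of printed.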
It has been tested that YAML files can also be used as Cookiecutter replay files, so this will be refactored in 0.4.0 as a cosmetic change, bringing it in line with the other configuration files in this repository.
Currently, mlflow_test.py is written independently of the rest of the scripts. This refactoring would use the general_utils.mlflow_init function, testing that function on top of testing the connection to the MLflow server that was created or given.
While experienced users can infer the commands to run the scripts locally, this makes the guide more explicit about it.
Exploring various technologies to incorporate into the Kapitan Hull stack:
Currently, the test:conda-build section only saves the conda environment as artifacts, which persist only within the same pipeline. However, the environment doesn't need to change unless the conda YAML file changes as well. Thus, we will test whether using cache instead of artifacts is better suited to storing the environment, so that we don't have to rebuild it on every pipeline.
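A sketch of what the cache-based variant might look like (the environment file path and cache paths are assumptions, not the template's actual layout): `cache:key:files` rebuilds the cache only when the listed file changes, and unlike artifacts, a cache can be reused across pipelines.

```yaml
# Hypothetical fragment of .gitlab-ci.yml
test:conda-build:
  stage: test
  cache:
    key:
      files:
        - conda-env.yml        # assumed env file path
    paths:
      - conda/envs/            # assumed environment location
  script:
    - conda env create -f conda-env.yml -p conda/envs/project || true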
CI/CD pipeline being triggered upon push of modified files: there is no need for test-conda to run if the conda YAML file is unmodified; however, test-conda currently has to run, otherwise the pylint-pytest job will fail.
To look into alternatives for .ipynb files, as notebook changes often introduce merge headaches.
Considerations
Kubernetes
To implement the Polyaxon section for legacy/open-source purposes
Gitlab
None
Having the current issue template as the default in GitLab breaks some conventions regarding how the issue tracker is used across different teams. The issue template should therefore be an option instead of the default.
Expected: Empty template
Actual: Filled template, with it being the default
The cv problem template is meant to be used as an example for AIAP MLOps Week, and within that week, GPUs are not provided, so this issue would fly under the radar during the session. However, if GPUs are used for the guide, just removing the cpuonly package doesn't enable them: the PyTorch version installed uses a CPU-only build even if a GPU and CUDA are available within the container. Thus, the current Dockerfile used to create GPU images is redundant and bloated, since the PyTorch package wouldn't use the GPUs attached to the container.
Steps: Build *-gpu.Dockerfile with the cv problem template, then run `import torch; torch.cuda.is_available()`
Expected: true
Actual: false
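For reference, a hypothetical conda environment fragment that would pull a CUDA-enabled PyTorch build instead of the CPU-only one (channels and version pin are illustrative, not the template's actual configuration):

```yaml
channels:
  - pytorch
  - nvidia
dependencies:
  - pytorch
  - pytorch-cuda=12.1   # replaces cpuonly; selects the CUDA build
```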
To align and integrate pre-commit hooks as part of the base Kapitan Hull template.
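A minimal sketch of what such a .pre-commit-config.yaml might contain (hook selection and pinned revisions are illustrative, not a decided list):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
  - repo: https://github.com/psf/black
    rev: 24.1.1
    hooks:
      - id: black
```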
ml-project-cookiecutter-gcp
has various features that the current Kapitan Hull doesn't have:
Thus, there's still room to reach parity with prior prototype versions of Kapitan Hull.
To create a problem template specifically for time series problems, reducing the time taken to generate a working pipeline.
The PyTorch code would be pulled out of the codebase and downloaded separately instead of being produced during template generation. There should not be any errors if the prompts that are not to be personalised (Docker registry name, author name, project name, etc.) are filled in correctly. This makes the template package-agnostic, and hopefully reduces confusion when following the guide.
post-gen-project hook to include source code for example problem(s)
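A hedged sketch of what such a hook could look like (the examples/ layout, function name, and problem-template argument are all assumptions, demonstrated here on a throwaway directory rather than a real generated project):

```python
import shutil
import tempfile
from pathlib import Path

def copy_example_sources(examples_dir: Path, problem_template: str, project_dir: Path) -> None:
    """Copy the chosen problem template's source code into the project's src/ folder."""
    src = examples_dir / problem_template
    if src.is_dir():
        shutil.copytree(src, project_dir / "src", dirs_exist_ok=True)

# Demo on a throwaway layout (assumption: examples/<template>/ holds the sources).
base = Path(tempfile.mkdtemp())
(base / "examples" / "cv").mkdir(parents=True)
(base / "examples" / "cv" / "train.py").write_text("# example trainer\n")
project = base / "my-project"
project.mkdir()
copy_example_sources(base / "examples", "cv", project)
print(sorted(p.name for p in (project / "src").iterdir()))  # ['train.py']
```

In a real hook, cookiecutter runs post-gen scripts with the working directory set to the generated project, so the project path would come from the hook's own cwd.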
Moved to GitLab: https://gitlab.aisingapore.net/mlops/kapitan-hull/-/issues/5