tencentmusic / cube-studio
2.5K 67.0 472.0 136.25 MB

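cube studio is an open-source, cloud-native, one-stop machine learning / deep learning AI platform. It supports SSO login, multi-tenancy / multiple project groups, big data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services with vGPU, edge computing, serverless, an annotation platform, automated annotation, dataset management, large model fine-tuning, vLLM large model inference, LLMOps, private knowledge bases, and an AI model app store. It supports one-click model development / inference / fine-tuning, Chinese domestic CPU/GPU/NPU chips, RDMA, and distributed pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano.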

License: Other

Dockerfile 0.29% Shell 0.62% Python 26.65% HTML 1.35% JavaScript 0.93% TypeScript 7.72% Mako 0.01% CSS 0.42% Jupyter Notebook 60.19% Less 1.62% Mustache 0.19% Smarty 0.02%
kubernetes inference mlops workflow ai pytorch spark argo kubeflow automl

cube-studio's People

Contributors

674345386, cdllp2, chendile, clementine124, colorfuldick, cyxnzb, data-infra, datascientistsamchan, ferdinandward, goldworker, gxin0426, harry201706, jacktao007, jlwll, kalenhaha, ldd91, lkad, nowbug, nutsjian, paopjian, stewart482, winifred43, xiaoyangmai, yanghua, yann-su, zhangchunsheng, zhuyaguang, znanjie

cube-studio's Issues

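Could there be a one-click docker compose deployment?

This project looks great: it supports pipelined data processing, modeling, and analysis. However, deployment currently seems quite complicated. Could the deployment process be simplified so that a plain docker compose up is enough? An online demo would be ideal too.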

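A small suggestion: please update the images to CUDA 11

I don't know which CUDA version everyone else is using. My machine runs CUDA 11.x, so I hit some problems when deploying notebook nodes.
The main problems: the GPU is not detected, and GPU training reports missing CUDA 11-related files and missing cuDNN files.
So every time I have to reinvent the wheel, reinstalling CUDA 11 and then cuDNN on each node I pull, which is tedious.
Building my own image would make it far too large, and it can be troublesome for someone without experience.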
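
Before rebuilding an image, it can help to confirm what the notebook container actually sees. The following is a minimal Python sketch, not part of cube-studio, that reports whether nvidia-smi is on the PATH and whether the dynamic loader can locate the CUDA runtime and cuDNN libraries. The library names cudart and cudnn are the conventional ones and are an assumption about how the image was built.

```python
import shutil
from ctypes.util import find_library


def gpu_environment_report():
    """Return a dict describing which GPU-related components are discoverable."""
    return {
        # Path to nvidia-smi, or None if the driver tools are not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        # Name or path of the CUDA runtime library, or None if not found.
        "cudart": find_library("cudart"),
        # Name or path of the cuDNN library, or None if not found.
        "cudnn": find_library("cudnn"),
    }


if __name__ == "__main__":
    for component, location in gpu_environment_report().items():
        print(f"{component}: {location or 'NOT FOUND'}")
```

A "NOT FOUND" entry is a hint rather than proof, since libraries installed in non-standard locations may not be visible to the loader lookup used here.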

Can claimRef be added to the PV?

The PV cannot bind to the PVC; adding claimRef solves it.

I sometimes add storageClassName: local to the PV, but it works only some of the time, which confuses me.
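
For reference, a PersistentVolume can be reserved for one specific claim by setting spec.claimRef to the claim's name and namespace. The Python sketch below builds such a manifest as a dict and prints it as JSON, which kubectl apply -f - accepts. The volume name, claim name, namespace, capacity, access mode, and host path are all hypothetical placeholders, not values taken from cube-studio.

```python
import json


def pv_with_claim_ref(pv_name, pvc_name, pvc_namespace, size, host_path,
                      storage_class="local"):
    """Build a PersistentVolume manifest reserved for one specific PVC."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # Should match the storageClassName requested by the PVC.
            "storageClassName": storage_class,
            "persistentVolumeReclaimPolicy": "Retain",
            "hostPath": {"path": host_path},
            # Reserves this volume for exactly one claim.
            "claimRef": {"name": pvc_name, "namespace": pvc_namespace},
        },
    }


if __name__ == "__main__":
    # All argument values here are illustrative placeholders.
    manifest = pv_with_claim_ref(
        "example-pv", "example-pvc", "pipeline", "10Gi", "/data/example"
    )
    print(json.dumps(manifest, indent=2))
```

If binding succeeds only intermittently, a mismatch between the PV and the PVC in storageClassName, access modes, or requested size may be the cause and is worth checking first.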

Some namespaces have no pods

In create_ns_secret.sh:
for namespace in 'infra' 'kubeflow' 'istio-system' 'knative-serving' 'pipeline' 'katib' 'jupyter' 'kfserving' 'service' 'pre-service' 'cert-manager' 'monitoring' 'logging' 'kube-system' 'volcano-system'

katib jupyter kfserving pre-service cert-manager logging

These namespaces have no pods.
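
The check behind this report can be reproduced with a short script. This is a sketch that assumes kubectl is installed and configured for the cluster; the namespace list is copied from the loop quoted above, and the function names are made up for illustration.

```python
import json
import subprocess

# Namespace list as quoted from create_ns_secret.sh in this issue.
NAMESPACES = [
    "infra", "kubeflow", "istio-system", "knative-serving", "pipeline",
    "katib", "jupyter", "kfserving", "service", "pre-service",
    "cert-manager", "monitoring", "logging", "kube-system", "volcano-system",
]


def empty_namespaces(pod_counts):
    """Return the namespaces whose pod count is zero, in input order."""
    return [ns for ns, count in pod_counts.items() if count == 0]


def count_pods(namespace):
    """Count pods in a namespace by asking kubectl for JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return len(json.loads(result.stdout)["items"])


if __name__ == "__main__":
    counts = {ns: count_pods(ns) for ns in NAMESPACES}
    print("Namespaces without pods:", empty_namespaces(counts))
```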

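Deploying a notebook on the platform immediately shows "unknown", and choosing reset raises an error

Version: latest master branch, pulled in the early hours of June 16, 2022
Single-node deployment
All nodes show a normal status in k8s
Steps to reproduce:
{
1. Create a new notebook
2. After seeing the status "unknown", click reset
}
Clicking the "name" link goes straight to a 404 page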

A small suggestion

For a full-featured platform, security should matter a great deal. Many features involve multi-machine cluster deployment and computation, and some even require calls across internal and external networks. In my humble opinion, the author could add warnings for sensitive operations in the relevant features.
For example:

  1. At login, compare the login location and time against previous logins
  2. Run the same comparison when a training task is published
  3. Verify identity when the password is changed
  4. Bind sensitive operations to a messaging tool or SMS for alerts, and so on
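
To make the first suggestion concrete, here is a minimal Python sketch of a login comparison. It is not cube-studio code: the LoginEvent type, the login_warnings function, and the "usual hours" window of 08:00 to 21:59 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LoginEvent:
    user: str
    location: str        # e.g. a city or region resolved from the client IP
    timestamp: datetime


def login_warnings(previous, current, usual_hours=range(8, 22)):
    """Return human-readable warnings for a login attempt.

    `previous` is the user's last successful login, or None if there is none.
    """
    warnings = []
    if previous is not None and previous.location != current.location:
        warnings.append(
            f"location changed from {previous.location} to {current.location}"
        )
    if current.timestamp.hour not in usual_hours:
        warnings.append(f"login at unusual hour {current.timestamp.hour}:00")
    return warnings


if __name__ == "__main__":
    last = LoginEvent("alice", "Shenzhen", datetime(2022, 6, 1, 10, 0))
    now = LoginEvent("alice", "Beijing", datetime(2022, 6, 2, 3, 0))
    for message in login_warnings(last, now):
        print("WARNING:", message)
```

A non-empty result could then trigger the SMS or messaging alert from item 4.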

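Bugs encountered in a single-node deployment on Tencent Cloud

  1. Tencent Cloud machines cannot reach Google's image registries. If an image the script needs is not available from any reachable mirror, the pull fails; in my case it was kubeflow-prometheus-adapter.
  2. The start.sh script has to download kfctl, which requires access to the external network. The Tencent Cloud server cannot do that, so the kubeflow base components cannot be installed.
  3. A Tencent Cloud CVM has two network interfaces, with an internal IP and an external IP. Use the internal IP in the k8s configuration and the external IP to open the web UI in a browser.
  4. Before deploying cube-studio, deploy matching versions of docker and k8s. If you deploy k8s with rancher, remember to use rancher v2.5.2 rather than latest. It is also best to install docker by following the official documentation and to set up the repository first; otherwise the installation tends to fail.
  5. Write-up of my Tencent Cloud single-node cube-studio deployment: https://blog.csdn.net/weixin_39750084/article/details/124986488?spm=1001.2014.3001.5502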
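
For item 1, a pre-flight check can show which images are absent from the node before the deployment scripts run, so they can be fetched some other way first. The sketch below assumes the docker CLI is available; the function names are made up for illustration, and the required list is a placeholder to be filled from your own deployment scripts (mysql:5.7 is used only because it appears in another issue on this page).

```python
import subprocess


def missing_images(required, present):
    """Return the required images that are not in the locally present list."""
    present_set = set(present)
    return [image for image in required if image not in present_set]


def local_images():
    """List local images as 'repository:tag' strings using the docker CLI."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    # Placeholder list: extend with the images your scripts reference.
    required = ["mysql:5.7"]
    for image in missing_images(required, local_images()):
        print("missing locally:", image)
```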

mysql error

Workload: mysql
It shows: ReplicaSet "mysql-69b7f785c9" has timed out progressing.

ImagePullBackOff: Back-off pulling image "mysql:5.7"

How can I fix it?
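
As a first diagnostic step, it may help to list every container that is stuck pulling an image, since the same network problem often affects more than one workload. The sketch below parses the JSON that kubectl get pods -o json returns; it assumes kubectl is configured for the cluster, and the function name is hypothetical.

```python
import json
import subprocess

PULL_FAILURE_REASONS = ("ImagePullBackOff", "ErrImagePull")


def image_pull_failures(pod_list):
    """Return (pod name, image, reason) for containers stuck pulling an image."""
    failures = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in PULL_FAILURE_REASONS:
                failures.append(
                    (pod["metadata"]["name"], status["image"], waiting["reason"])
                )
    return failures


if __name__ == "__main__":
    output = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, image, reason in image_pull_failures(json.loads(output)):
        print(f"{name}: {reason} for {image}")
```

If the node simply cannot reach the registry, one common workaround is to pull the image on a machine that can, then move it across with docker save and docker load.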

Image pull still fails after updating to the latest version

Failed to pull image "ai.tencentmusic.com/tme-public/notebook:jupyter-ubuntu-cpu-1.0.0": rpc error: code = Unknown desc = Error response from daemon: pull access denied for ai.tencentmusic.com/tme-public/notebook, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

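Add big data capabilities to JupyterLab

Is there any plan to add a Spark component for big data processing to JupyterLab, with the data exchange flow going through a Livy server?

P.S. How can I join your development effort?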

Task deployment

Can this be deployed locally on Windows 11? Docker Desktop can deploy k8s with one click, though Linux would be easier.

dataset+job-template+pipeline+inference demo

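Vision: training and inference support for YOLO-family models, Darknet-family models, PaddleSeg image segmentation, OCR-family models, and so on

Speech: training and inference support for WeNet speech recognition.

Recommendation: training and inference service support for algorithms such as bin, DeepFM, and PLE

Text: training and inference support for BERT-based models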

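Notebook is unreachable when deployed behind NAT traversal

I use frp for NAT traversal. Currently only the platform itself opens; the notebooks I create do not open and return 404 directly. What should I do? A simple, clear solution would be appreciated. Thanks!
I only forwarded port 80 through frp.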
