mlab's Introduction

MLab

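The "cloud" in "alchemists on the cloud".

MLab components

MLab is cloud infrastructure that serves "alchemists on the cloud" (deep learning practitioners). It consists of two parts:

  • MLab HomePod, the most advanced containerized PyTorch training environment to date.
  • MLab RookPod, the most advanced storage solution to date costing under 100,000 RMB.

Both MLab components are standalone products. Each can be deployed on its own with Docker, or with k8s.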

MLab HomePod

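The most advanced containerized PyTorch training environment to date.

The following software and hardware platforms are supported:

MLab HomePod is packaged as a Docker image (an OCI-compliant image) and is our deep learning training environment. The latest version is 2.0, which comes in a standard edition and a pro edition. The specifications are as follows: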

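| Item | MLab HomePod 2.0 | MLab HomePod 2.0 pro |
|---|---|---|
| Image | gemfield/homepod:2.0 | gemfield/homepod:2.0-pro |
| OS | Ubuntu 20.04 | Ubuntu 20.04 |
| PyTorch | 1.9.0 | 1.9.0 |
| PyTorch CUDA runtime | 11.1 | 11.1 |
| PyTorch CUDNN runtime | 8.0.5 | 8.0.5 |
| torchvision | 0.10.0 | 0.10.0 |
| torchaudio | 0.9.0a0+33b2469 | 0.9.0a0+33b2469 |
| torchtext | 0.10.0 | 0.10.0 |
| Conda | 4.10.1 | 4.10.1 |
| Python | 3.8.8 | 3.8.8 |
| numpy | 1.20.2 | 1.20.2 |
| cv2 | 4.5.2 | 4.5.2 |
| onnx | 1.8.1 | 1.8.1 |
| g++ | 9.3.0 | 9.3.0 |
| cmake | 3.16.3 | 3.16.3 |
| KDE Plasma | 5.22.1 | 5.22.1 |
| KDE Framework | 5.83.0 | 5.83.0 |
| Time zone | ** | ** |
| protobuf-dev | 3.6.1.3 | 3.6.1.3 |
| protobuf Python package | 3.17.3 | 3.17.3 |
| pybind11-dev | 2.4.3 | 2.4.3 |
| xrdp | 0.9.12 | 0.9.12 |
| tigervnc | 1.10.1 | 1.10.1 |
| VS CODE IDE | 1.57.1 | 1.57.1 |
| Firefox | 89.0.1 | 89.0.1 |
| Chinese input method | IBus sunpinyin 2.0.3 | IBus sunpinyin 2.0.3 |
| coremltools | 4.1 | 4.1 |
| NCNN conversion tool | 20210525 | 20210525 |
| TNN conversion tool | 0.3.0 | 0.3.0 |
| MNN conversion tool | 1.2.0 | 1.2.0 |
| tensorrt (conversion tool) | - | 8.0.0.3 |
| libboost-dev | - | 1.71.0 |
| CUDA development libraries | - | 11.2.2 |
| CUDNN development libraries | - | 8.1.1 |
| MKL static libraries | - | 2020.4-912 |
| pycuda package | - | 2020.1 |
| gemfield build of pytorch | - | 1.9.0 |
| opencv4deepvac | - | 4.4.0 |
| libtorch static libraries | - | 1.9.0 |
| deepvac project | - | /opt/gemfield/deepvac |
| libdeepvac project | - | /opt/gemfield/libdeepvac |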

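Beyond this core software, MLab HomePod has the following distinctive features:

  • Seamless use of the DeepVAC standard;
  • Seamless building and testing of libdeepvac;
  • Bundled tools including kdiff3, kompare, kdenlive, Dolphin, Kate, Gwenview, Konsole, and many more.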

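The standard and pro editions are otherwise identical. The pro edition adds the following:

  • tensorrt Python package, which can convert PyTorch models into TensorRT models;
  • libboost-dev, for C++ developers;
  • CUDA development libraries, for CUDA-based development or for building PyTorch from source;
  • MKL static libraries, for MKL-based development or for statically linking libtorch;
  • pycuda Python package, for running TensorRT models;
  • gemfield build of PyTorch, a PyTorch Python package built from the master branch. To use it, set export PYTHONPATH=/opt/gemfield, which overrides the stock PyTorch in the standard path;
  • opencv4deepvac, the OpenCV 4.4 static libraries, built for the libdeepvac project. Path: /opt/gemfield/opencv4deepvac;
  • libtorch static libraries, built for the libdeepvac project. Path: /opt/gemfield/libtorch;
  • deepvac project, a local clone of https://github.com/DeepVAC/deepvac;
  • libdeepvac project, a local clone of https://github.com/DeepVAC/libdeepvac.

To support these features, the pro image grew by a full 10 GB. That is also why homepod has been split into standard and pro editions starting with version 2.0.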

1. Deployment

MLab HomePod can be deployed in three ways:

  1. Plain Docker command line. Once deployed and running, you can only work on the command line.
# With a CUDA device
docker run --gpus all -it --entrypoint=/bin/bash gemfield/homepod:2.0
# Without a CUDA device
docker run -it --entrypoint=/bin/bash gemfield/homepod:2.0
  2. Graphical Docker deployment. Once deployed, the graphical desktop is reachable from a VNC client, an RDP client, or a browser.
# With a CUDA device
docker run --gpus all -it -eGEMFIELD_MODE=VNCRDP -p 3389:3389 -p 7030:7030 -p 5900:5900 -p 20022:22 gemfield/homepod:2.0
# Without a CUDA device
docker run -it -eGEMFIELD_MODE=VNCRDP -p 3389:3389 -p 7030:7030 -p 5900:5900 -p 20022:22 gemfield/homepod:2.0

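The ports in these commands are used as follows:

| Port | Protocol | Purpose |
|---|---|---|
| 3389 | rdp | RDP clients, such as the Windows Remote Desktop Connection client |
| 5900 | vnc | VNC clients |
| 7030 | http | Browsers |
| 20022 | ssh | SSH clients, SFTP clients, KDE Dolphin, VS Code Remote SSH, and so on |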

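Note: to use the VS Code Remote SSH feature, first create a new SSH target in VS Code, then type the following into the "Enter SSH Connection Command" input box:

ssh -p 20022 gemfield@<your_host_running_mlab_homepod>

When prompted for the password, enter: deepvac

  3. k8s cluster deployment (requires experience operating k8s clusters; suited to team collaboration and management). See "Deploying HomePod on k8s" for more deployment information.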
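
The two Docker-based deployment modes above differ only in the GPU flag, the entrypoint, the mode variable, and the port mappings, so the command line can be assembled programmatically. The following is a minimal Python sketch under that assumption. The helper names `homepod_docker_cmd` and `as_shell` are hypothetical and not part of MLab; the image tag, `GEMFIELD_MODE=VNCRDP`, and the ports are taken from the commands above.

```python
import shlex

# Host-to-container port mappings from the port table (RDP, browser, VNC, SSH).
GRAPHICAL_PORTS = [(3389, 3389), (7030, 7030), (5900, 5900), (20022, 22)]


def homepod_docker_cmd(gpu=True, graphical=False, image="gemfield/homepod:2.0"):
    """Return the `docker run` argument list for a HomePod container (hypothetical helper)."""
    cmd = ["docker", "run"]
    if gpu:
        cmd += ["--gpus", "all"]  # only meaningful on hosts with a CUDA device
    cmd.append("-it")
    if graphical:
        cmd += ["-e", "GEMFIELD_MODE=VNCRDP"]
        for host_port, container_port in GRAPHICAL_PORTS:
            cmd += ["-p", f"{host_port}:{container_port}"]
    else:
        cmd.append("--entrypoint=/bin/bash")  # command-line-only mode
    cmd.append(image)
    return cmd


def as_shell(cmd):
    """Quote the argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)
```

Calling `as_shell(homepod_docker_cmd(gpu=False))` reproduces the command-line-only invocation for hosts without a CUDA device; the list form can also be handed to `subprocess.run` directly.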

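2. Login

The first of the three deployment methods needs little explanation: log in with docker exec -it. For the other two, once deployment succeeds, log in and work through the graphical interface. The following usage modes are supported: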

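3. Account information

MLab HomePod provides the following default account:

  • User: gemfield
  • Password: deepvac
  • HOME: /home/gemfield

To change these defaults, inject the following environment variables on the docker command line (or in the k8s YAML):

  • DEEPVAC_USER=<my_name>
  • DEEPVAC_PASSWORD=<my_password>
  • HOME=<my_home_path>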
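
As an illustration of how these overrides could be passed on the docker command line, here is a small Python sketch that turns the three variables into `-e NAME=value` arguments. The function name `account_env_args` is hypothetical; only the variable names come from the list above.

```python
def account_env_args(user=None, password=None, home=None):
    """Return `-e NAME=value` pairs for the account overrides that are set."""
    overrides = {
        "DEEPVAC_USER": user,
        "DEEPVAC_PASSWORD": password,
        "HOME": home,
    }
    args = []
    for name, value in overrides.items():
        if value is not None:  # unset values keep the image defaults
            args += ["-e", f"{name}={value}"]
    return args
```

The returned list is meant to be placed before the image name in a `docker run` invocation. Keep in mind that a password passed this way is visible in the shell history and the process list.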

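4. Account security

For security, users should change the account password with the passwd command after first logging in to MLab HomePod. In daily use, lock the screen manually whenever you leave the computer for more than 5 minutes (KDE -> Leave -> Lock (Lock screen)).

MLab RookPod

The most advanced storage solution to date costing under 100,000 RMB. (To be added.)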

mlab's Issues

[bug] import onnx_coreml fails on HomePod

>>> import onnx_coreml
WARNING:root:scikit-learn version 0.24.1 is not supported. Minimum required version: 0.17. Maximum required version: 0.19.2. Disabling scikit-learn conversion API.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/onnx_coreml/__init__.py", line 6, in <module>
    from .converter import convert
  File "/opt/conda/lib/python3.8/site-packages/onnx_coreml/converter.py", line 35, in <module>
    from coremltools.converters.nnssa.coreml.graph_pass.mlmodel_passes import remove_disconnected_layers, transform_conv_crop

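Someone else has already run into this problem: onnx/onnx-coreml#585

xrdp does not work on k8s

HomePod 2.0 added RDP support. It tests fine on Docker, but connections fail on a k8s cluster.

[Adjustment] Reduce the size of the MKL static libraries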

rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/bin
rm -rf /opt/intel/conda_channel
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/benchmarks
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/examples
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/*.so
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_blas*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_blacs*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_gf*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_pgi_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_tbb_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_ilp64.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_lapack95*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_cdft_core.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_scalapack*
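
Before deleting anything, it can help to see which paths actually match and how much space each would free. The following Python sketch performs such a dry run. It is an illustration only: `tree_size` and `plan_removals` are hypothetical helpers, and `PATTERNS` lists just a subset of the paths from the rm commands above.

```python
import glob
import os

MKL_ROOT = "/opt/intel/compilers_and_libraries_2020.4.304/linux"

# Patterns relative to MKL_ROOT, copied from the rm commands above (a subset).
PATTERNS = [
    "bin",
    "mkl/benchmarks",
    "mkl/examples",
    "mkl/lib/intel64_lin/*.so",
    "mkl/lib/intel64_lin/libmkl_blas*",
]


def tree_size(path):
    """Total size in bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def plan_removals(root=MKL_ROOT, patterns=PATTERNS):
    """Return (path, size_in_bytes) for every existing match; nothing is deleted."""
    plan = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            plan.append((path, tree_size(path)))
    return plan
```

Summing the sizes in the returned plan gives an estimate of how much the image layer would shrink.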

Compiling a pybind11-based program on MLab HomePod 2.0 pro fails

Error message:

pydeepvac.cpp:37:53: error: no matching function for call to ‘pybind11::class_<deepvac::SyszuxVisionTerror>::def(const char [8], <unresolved overloaded function type>)’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1315:13: note: candidate: ‘template<class Func, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const char*, Func&&, const Extra& ...) [with Func = Func; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1315 |     class_ &def(const char *name_, Func&& f, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1315:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   couldn’t deduce template parameter ‘Func’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1333:13: note: candidate: ‘template<pybind11::detail::op_id id, pybind11::detail::op_type ot, class L, class R, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::op_<id, ot, L, R>&, const Extra& ...) [with pybind11::detail::op_id id = id; pybind11::detail::op_type ot = ot; L = L; R = R; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1333 |     class_ &def(const detail::op_<id, ot, L, R> &op, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1333:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::op_<id, ot, L, R>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1345:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::initimpl::constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1345 |     class_ &def(const detail::initimpl::constructor<Args...> &init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1345:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::initimpl::constructor<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1351:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::initimpl::alias_constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1351 |     class_ &def(const detail::initimpl::alias_constructor<Args...> &init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1351:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::initimpl::alias_constructor<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1357:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(pybind11::detail::initimpl::factory<Args ...>&&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1357 |     class_ &def(detail::initimpl::factory<Args...> &&init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1357:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘pybind11::detail::initimpl::factory<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1363:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(pybind11::detail::initimpl::pickle_factory<Args ...>&&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1363 |     class_ &def(detail::initimpl::pickle_factory<Args...> &&pf, const Extra &...extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1363:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘pybind11::detail::initimpl::pickle_factory<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);

Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired

Model inference floods the screen with warning messages:

  • Environment: MLab HomePod 2.0 pro
  • Error message:
[W ___torch_mangle_579.py:74] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:115] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:156] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:196] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:228] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:260] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
no text detected
[W ___torch_mangle_579.py:74] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:115] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:156] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:196] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:228] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:260] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
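
The warning is about how output pixel positions are mapped back to input coordinates when upsampling. As a rough illustration, the pure-Python sketch below shows the 1-D index mapping for the two `align_corners` settings as we understand them. The function name `source_index` is hypothetical, and clamping at the upper edge of the input is left out for brevity.

```python
def source_index(dst_index, in_size, out_size, align_corners):
    """Map an output index to a (fractional) input coordinate for linear upsampling."""
    if align_corners:
        if out_size == 1:
            return 0.0
        # Corner pixels of input and output are aligned exactly.
        return dst_index * (in_size - 1) / (out_size - 1)
    # Pixels are treated as unit areas: centers are aligned, then clamped at 0.
    src = (dst_index + 0.5) * in_size / out_size - 0.5
    return max(src, 0.0)


# Upsampling a length-2 signal to length 4 under both settings.
for flag in (True, False):
    coords = [source_index(i, 2, 4, flag) for i in range(4)]
    print(f"align_corners={flag}: {coords}")
```

Because the two mappings produce different results, the model should state the intended one. Passing `align_corners` explicitly to the upsampling call before the model is exported is expected to silence the warning, since it is raised only when the argument is left unspecified.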

homepod:2.0-pro: torch.cuda.is_available() returns False and throws "Error 804: forward compatibility was attempted on non supported HW"

ENV

  • GPU model: GeForce GTX 1650
  • Driver version: Driver Version: 460.80
  • CUDA version: CUDA Version: 11.3
  • OS: Ubuntu 20.04

ERROR

  • nvidia-smi ok
  • import torch; print(torch.cuda.is_available())
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448278899/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False
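
When diagnosing this kind of failure, it helps to record the driver and CUDA versions that nvidia-smi reports on the host and inside the container and compare them. The sketch below only parses those two fields from nvidia-smi style text; the helper name `parse_versions` is hypothetical. As an unverified assumption for this report: a 460-series driver normally reports CUDA 11.2, so seeing 11.3 may indicate that CUDA forward-compatibility libraries are being picked up inside the container, which GeForce cards generally do not support.

```python
import re

# Matches the "Driver Version: X   CUDA Version: Y" part of the nvidia-smi header.
HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+).*?CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_versions(text):
    """Extract (driver_version, cuda_version) from nvidia-smi style text, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")
```

Running it on the output captured on the host and in the container makes a mismatch between the two environments easy to spot.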

[New feature] PyTorch updates since MLab HomePod 2.0 pro

  • 0dc40474fe Peter Bell Tue Jul 6 19:05:39 2021 -0700 Migrate glu from the THC to ATen (CUDA) (#61153); note: glu stands for Gated Linear Unit;
  • a69e947ffd Freey0 Wed Jul 7 07:42:49 2021 -0700 avg_pool3d_backward: Port to structured (#59084)
  • 45ce26c397 Xue Haotian Wed Jul 7 12:32:43 2021 -0700 Port isposinf & isneginf kernel to structured kernels (#60633)
  • baa518e2f6 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add Int32 support for NNAPI (#59365)
  • cf285d8eea Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::slice NNAPI converter (#59364)
  • d26372794a Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::detach NNAPI converter (#58543)
  • 0be228dd5f Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::flatten NNAPI converter (#60885)
  • b297f65b66 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::div NNAPI converter (#58541)
  • eab18a9a40 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::to NNAPI converter (#58540)
  • 14d604a13e Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::softmax NNAPI converter (#58539)
  • 179b3ab88c Xiao Wang Wed Jul 7 20:45:42 2021 -0700 [cuDNN] Enable cudnn_batchnorm_spatial_persistent for BatchNorm3d channels_last_3d (#59129)

[New feature] Add the user to the sudo group

usermod -aG sudo gemfield
Defaults        env_reset
Defaults        mail_badpass
Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

# User privilege specification
root    ALL=(ALL:ALL) ALL
gemfield ALL=(ALL:ALL) ALL
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

#includedir /etc/sudoers.d

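Differences between HomePod:2.0 and HomePod:2.0-pro

On top of the standard edition, pro adds some packages aimed at low-level developers:

  • Boost development libraries (libboost-dev libboost-filesystem-dev libboost-program-options-dev libboost-system-dev);
  • CUDA development libraries (pro is built directly on the nvidia/cuda:devel series of images);
  • MKL static libraries (downloaded from the Intel repository; used to build statically linked libtorch applications);
  • the pycuda package, a dependency of the TensorRT runtime.

[New feature] xrdp runs successfully on Ubuntu 18.04 but fails on Ubuntu 20.04

After investigation, the difference shows up in the logs: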

# on ubuntu 18.04
xrdp-sesman[57]: (57)(140460571419968)[INFO ] ++ created session (access granted): username gemfield, ip ::ffff:222.128.gemfield:33552 - socket: 11
xrdp-sesman[57]: (57)(140460571419968)[INFO ] starting Xorg session...
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 5910)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 6010)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 6210)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp[62]: (62)(139675925534528)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp[62]: (62)(139675925534528)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp-sesman[64]: (64)(140460571419968)[INFO ] calling auth_start_session from pid 64
xrdp[62]: (62)(139675925534528)[DEBUG] xrdp_wm_log_msg: started connecting
xrdp-sesman[64]: pam_unix(xrdp-sesman:session): session opened for user gemfield by (uid=0)
dbus-daemon[10]: [system] Activating service name='org.freedesktop.login1' requested by ':1.5' (uid=0 pid=64 comm="/usr/sbin/xrdp-sesman --nodaemon " label="docker-default (enforce)") (using servicehelper)
New seat seat0.
dbus-daemon[10]: [system] Activating service name='org.freedesktop.systemd1' requested by ':1.6' (uid=0 pid=66 comm="/lib/systemd/systemd-logind " label="docker-default (enforce)") (using servicehelper)
dbus-daemon[10]: [system] Successfully activated service 'org.freedesktop.login1'
dbus-daemon[10]: [system] Activated service 'org.freedesktop.systemd1' failed: Launch helper exited with unknown return code 1


# on ubuntu 20.04
xrdp-sesman[11]: (11)(140645830477376)[INFO ] ++ created session (access granted): username gemfield, ip ::ffff:222.128.gemfield:54766 - socket: 11
xrdp-sesman[11]: (11)(140645830477376)[INFO ] starting Xorg session...
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 5910)
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 6010)
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 6210)
xrdp[37]: (37)(139680321079104)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp[37]: (37)(139680321079104)[DEBUG] xrdp_wm_log_msg: started connecting
xrdp-sesman[39]: (39)(140645830477376)[INFO ] calling auth_start_session from pid 39
xrdp-sesman[39]: pam_unix(xrdp-sesman:session): session opened for user gemfield by (uid=0)
dbus-daemon[9]: [system] Activating service name='org.freedesktop.login1' requested by ':1.0' (uid=0 pid=39 comm="/usr/sbin/xrdp-sesman --nodaemon " label="docker-default (enforce)") (using servicehelper)
dbus-daemon[9]: [system] Activated service 'org.freedesktop.login1' failed: Launch helper exited with unknown return code 1
xrdp-sesman[39]: pam_systemd(xrdp-sesman:session): Failed to create session: Launch helper exited with unknown return code 1
xrdp-sesman[39]: (39)(140645830477376)[DEBUG] Closed socket 6 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp-sesman[39]: (39)(140645830477376)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
......
dbus-daemon[9]: [system] Activating service name='org.freedesktop.login1' requested by ':1.1' (uid=1000 pid=43 comm="/usr/lib/xorg/Xorg :10 -auth .Xauthority -config x" label="docker-default (enforce)") (using servicehelper)
dbus-daemon[9]: [system] Activated service 'org.freedesktop.login1' failed: Launch helper exited with unknown return code 1

As the logs show, on Ubuntu 20.04 there are D-Bus errors related to org.freedesktop.login1.
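
Comparing long logs by eye is error-prone, so the failed D-Bus activations can be extracted mechanically. The following Python sketch scans log lines for the "Activated service ... failed" pattern seen above. The helper name `failed_dbus_services` is hypothetical, and the pattern is derived only from the log lines in this report.

```python
import re

# Matches dbus-daemon lines such as:
#   [system] Activated service 'org.freedesktop.login1' failed: <reason>
FAILED_RE = re.compile(r"Activated service '(?P<service>[^']+)' failed: (?P<reason>.+)")


def failed_dbus_services(log_lines):
    """Return the set of D-Bus service names whose activation failed."""
    failed = set()
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            failed.add(match.group("service"))
    return failed
```

Applied to the two logs above, this reports only org.freedesktop.systemd1 for Ubuntu 18.04, where org.freedesktop.login1 was activated successfully, but org.freedesktop.login1 for Ubuntu 20.04.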

[New feature] Install the latest protobuf

On HomePod:

# Build dependencies
sudo apt-get install autoconf automake libtool curl make g++ unzip

git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
git submodule update --init --recursive
bash -x ./autogen.sh

./configure
make
make check
# Installing into the system prefix requires root.
sudo make install
# Refresh the shared library cache.
sudo ldconfig

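Planned features for MLab RookPod 1.0

Features

  • Distributed storage with file system, block storage, and object storage support;
  • Support for hot storage and cold storage;
  • Support for data export and import, enabling off-site backup.

Hardware specifications

  • At least 10G network switching;
  • SSDs for hot storage; SSDs or HDDs for cold storage.

[bug] Running homepod 1.1 fails with: Error response from daemon: AppArmor enabled on system but the docker-default profile could not be loaded: strconv.Atoi: parsing "file": invalid syntax.

The error is as follows: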

gemfield@ThinkPad-X1C:~$ docker run -it --rm -p 5900:5900 -p 7030:7030 -v /app/gemfield:/app/gemfield -v /home/gemfield/github:/home/gemfield/github gemfield/homepod:1.1 bash
docker: Error response from daemon: AppArmor enabled on system but the docker-default profile could not be loaded: strconv.Atoi: parsing "file": invalid syntax.

[Optimization] Change how the pybind11 package is installed

The current installation method is:

apt install python3-pybind11

This package has the following dependencies:

gemfield@ThinkPad-X1C:~$ apt show python3-pybind11
Package: python3-pybind11
Version: 2.5.0-5
Source: pybind11
Origin: Ubuntu
Installed-Size: 610 kB
Depends: python3:any, pybind11-dev (= 2.5.0-5)
Recommends: python3-numpy
Homepage: https://github.com/pybind/pybind11
Download-Size: 113 kB
APT-Manual-Installed: yes
APT-Sources: http://archive.ubuntu.com/ubuntu groovy/universe amd64 Packages
......

As a result, 3 deb packages actually get installed:

  • python3-pybind11
  • pybind11-dev
  • python3

This is not necessary.

The files contained in python3-pybind11 are:

/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/dependency_links.txt
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/top_level.txt
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/not-zip-safe
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/PKG-INFO
/usr/lib/python3/dist-packages/pybind11/_version.py
/usr/lib/python3/dist-packages/pybind11/__main__.py
/usr/lib/python3/dist-packages/pybind11/__init__.py
/usr/lib/python3/dist-packages/pybind11/include/pybind11/cast.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/complex.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/buffer_info.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/common.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/operators.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/functional.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/attr.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/pytypes.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/embed.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/eigen.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/stl.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/eval.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/iostream.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/numpy.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/pybind11.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/options.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/chrono.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/stl_bind.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/common.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/typeid.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/internals.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/init.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/descr.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/class.h

The files contained in pybind11-dev are:

/usr/lib/cmake/pybind11/pybind11Targets.cmake
/usr/lib/cmake/pybind11/pybind11Tools.cmake
/usr/lib/cmake/pybind11/FindPythonLibsNew.cmake
/usr/lib/cmake/pybind11/pybind11ConfigVersion.cmake
/usr/lib/cmake/pybind11/pybind11Config.cmake
/usr/share/doc/python3-pybind11/copyright
/usr/share/doc/pybind11-dev/copyright
/usr/share/doc/pybind11-dev/changelog.Debian.gz
/usr/include/pybind11/cast.h
/usr/include/pybind11/complex.h
/usr/include/pybind11/buffer_info.h
/usr/include/pybind11/common.h
/usr/include/pybind11/operators.h
/usr/include/pybind11/functional.h
/usr/include/pybind11/attr.h
/usr/include/pybind11/pytypes.h
/usr/include/pybind11/embed.h
/usr/include/pybind11/eigen.h
/usr/include/pybind11/stl.h
/usr/include/pybind11/eval.h
/usr/include/pybind11/iostream.h
/usr/include/pybind11/numpy.h
/usr/include/pybind11/pybind11.h
/usr/include/pybind11/options.h
/usr/include/pybind11/chrono.h
/usr/include/pybind11/stl_bind.h
/usr/include/pybind11/detail/common.h
/usr/include/pybind11/detail/typeid.h
/usr/include/pybind11/detail/internals.h
/usr/include/pybind11/detail/init.h
/usr/include/pybind11/detail/descr.h
/usr/include/pybind11/detail/class.h
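
One possible alternative, assuming it is acceptable to install pybind11 into the Python environment itself (for example with pip or conda) rather than through apt, is to locate the headers via the Python package. The sketch below uses the package's `get_include()` function; the wrapper name `pybind11_include_dir` is hypothetical, and it returns None when the package is not installed.

```python
import importlib


def pybind11_include_dir():
    """Return the header directory of an installed pybind11 Python package, or None."""
    try:
        pybind11 = importlib.import_module("pybind11")
    except ImportError:
        return None  # pybind11 is not installed in this Python environment
    return pybind11.get_include()
```

The returned directory can be passed to the compiler as an include path, which would avoid pulling in the separate pybind11-dev and system python3 packages.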

[New feature] PyTorch updates since HomePod 1.0

  • fd02fc5d715a7647631c5806db736794edc2a52f: Port put_ and take from TH to ATen
  • 4170a6cc24c6867ca6cd48f5581e98a4be89e593: Migrate mode from TH to ATen
  • 6866c033d5aa134a83bc1cb84e3e084a7329167f: [JIT] Add recursive scripting for class type module attributes

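Minimum requirements for the AI inference deployment environment

Host:

  • x86_64 CPU;
  • Linux;
  • 8 GB RAM;
  • 4 GB CUDA memory (Turing architecture or newer);
  • nvidia-driver 450+;
  • Docker 19.03+;
  • NVIDIA Container Toolkit.

Per AI inference capability:

  • 4 GB CUDA memory;
  • 4 GB RAM;
  • 4 CPU cores.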
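
The per-capability figures make it possible to estimate how many inference capabilities a host can run. The sketch below is a simple illustration of that arithmetic. The function name `max_inference_capabilities` is hypothetical, and it assumes that nothing is reserved for the host itself, which the list above does not specify.

```python
# Resources needed per AI inference capability, taken from the list above.
PER_CAPABILITY = {"cuda_ram_gb": 4, "ram_gb": 4, "cpu_cores": 4}


def max_inference_capabilities(cuda_ram_gb, ram_gb, cpu_cores):
    """Number of capabilities a host can run, limited by its scarcest resource."""
    return int(min(
        cuda_ram_gb // PER_CAPABILITY["cuda_ram_gb"],
        ram_gb // PER_CAPABILITY["ram_gb"],
        cpu_cores // PER_CAPABILITY["cpu_cores"],
    ))
```

For example, a host with 24 GB of CUDA memory, 32 GB of RAM, and 16 CPU cores is limited by its CPU cores and fits four capabilities under this assumption.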

[bug] apt update fails on HomePod with: KeyError: 'suite'

The error is shown below:

gemfield@bd0d4c6acd4c:/etc/apt$ sudo apt update
Hit:1 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:2 http://packages.microsoft.com/repos/code stable InRelease                                                      
Hit:3 https://packages.microsoft.com/repos/vscode stable InRelease                                                   
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [109 kB]                                            
Hit:5 http://archive.ubuntu.com/ubuntu focal InRelease                                                               
Hit:6 http://ppa.launchpad.net/kdenlive/kdenlive-stable/ubuntu focal InRelease                                       
Hit:7 https://apt.repos.intel.com/mkl all InRelease                                                  
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]                              
Get:9 http://archive.neon.kde.org/user focal InRelease [166 kB]                           
Get:10 http://security.ubuntu.com/ubuntu focal-security/main amd64 DEP-11 Metadata [28.5 kB]        
Get:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/universe amd64 DEP-11 Metadata [71.3 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 DEP-11 Metadata [365 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/main DEP-11 64x64 Icons [87.9 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 DEP-11 Metadata [411 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 DEP-11 Metadata [2,540 B]
Get:17 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 DEP-11 Metadata [1,765 B]
Fetched 1,457 kB in 4s (386 kB/s)                                              
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 94, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 138, in _fill_commands
    self._parse_single_commands_file(con, fp)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 176, in _parse_single_commands_file
    suite=tagf.section["suite"]
KeyError: 'suite'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_lib_cnf-update-db.0.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 94, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 138, in _fill_commands
    self._parse_single_commands_file(con, fp)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 176, in _parse_single_commands_file
    suite=tagf.section["suite"]
KeyError: 'suite'
Reading package lists... Done

Converting ONNX to TNN on MLab HomePod fails with: ImportError: cannot import name 'optimizer' from 'onnx' (/opt/conda/lib/python3.8/site-packages/onnx/__init__.py)

The conversion command is as follows (with the -optimize=1 switch enabled):

python3 onnx2tnn.py /app/gemfield/onnxmodels/v2.onnx -version=v1.0 -optimize=1 -half=0 -o /app/gemfield/onnxmodels

It then fails with:

Traceback (most recent call last):
  File "onnx2tnn.py", line 41, in do_optimize
    import onnx2tnn.onnx_optimizer.onnx_optimizer as opt
ModuleNotFoundError: No module named 'onnx2tnn.onnx_optimizer'; 'onnx2tnn' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "onnx2tnn.py", line 148, in <module>
    main()
  File "onnx2tnn.py", line 120, in main
    do_optimize(onnx_net_path, input_shape)
  File "onnx2tnn.py", line 43, in do_optimize
    import onnx_optimizer.onnx_optimizer as opt
  File "/app/gemfield/github/TNN/tools/onnx2tnn/onnx-converter/onnx_optimizer/onnx_optimizer.py", line 8, in <module>
    from onnx import optimizer
ImportError: cannot import name 'optimizer' from 'onnx' (/opt/conda/lib/python3.8/site-packages/onnx/__init__.py)
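
To our understanding, newer onnx releases no longer ship the `onnx.optimizer` module, and the optimizer lives in the separate `onnxoptimizer` package instead. Under that assumption, a converter script could look for the module in both places, as in the sketch below. The helper name `load_onnx_optimizer` is hypothetical and this is not the TNN project's own fix.

```python
import importlib


def load_onnx_optimizer():
    """Return an optimizer module, or None if neither location is importable."""
    # Try the legacy location first, then the standalone package.
    for module_name in ("onnx.optimizer", "onnxoptimizer"):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None
```

If the function returns None, installing the `onnxoptimizer` package or running the conversion without the -optimize=1 switch are the two options to consider.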

[New feature] Add faiss

RUN git clone https://github.com/facebookresearch/faiss && \
    cd faiss && \
    mkdir build && \
    cd build && \
    cmake .. && \
    make VERBOSE=1 && \
    make install && \
    cd faiss/python && \
    python setup.py install && \
    cd ../../ && \
    make clean