
recognize_docker's Introduction

recognize_docker

Recognize by marcelklehr (https://github.com/marcelklehr) put into Docker. Big kudos to Marcel for coming up with such a nice piece of software, as well as for quickly reacting to an ask to add GPU support!

Speed difference? Roughly 25x (yes, 25 times quicker - an average based on testing on two systems):

  • Intel® Xeon® CPU E3-1505M v6 @ 3.00GHz × 4, 32GB RAM, Quadro M1200 Mobile (GM107GLM)
  • Intel® Core™ i7-4720HQ CPU @ 2.60GHz × 4, 16GB RAM, GeForce GTX 960M (GM107M)

How was that measured? Single runs on average take a similar amount of time (this could be a completely wrong way of measuring it, though I can see and touch the difference without advanced metrics). Any suggestions on how to measure it exactly, as well as test outputs, are more than welcome; one possible approach is sketched below.
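One hedged way to time it (a sketch only: it assumes container names like nextcloud-gpu/nextcloud-cpu and Recognize's occ recognize:classify command, which your setup may not match): run the same classification pass over an identical, fixed set of files in the CPU-only and the GPU-enabled container, and compare wall-clock times.

    # Hypothetical sketch: time one full classification pass per container,
    # over the same fixed set of files, then compare the "real" times.
    time docker exec -u www-data nextcloud-gpu php occ recognize:classify
    time docker exec -u www-data nextcloud-cpu php occ recognize:classify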

Docker images for Recognize with GPU support (based on the git master branch).

Three options available:

  • Debian 12 + additional repo + pip - as close as possible to Debian based repos/binaries,
  • Debian 12 + everything else added as additional items (MiniConda, Node, pip),
  • nVIDIA TensorFlow Docker image based (nvcr.io/nvidia/tensorflow:22.03-tf2-py3 as of 2023.10.20).

In all cases the resulting Docker image is heavy, due to CUDA/cuDNN, TensorFlow/TensorRT and the Recognize models being included together with the Nextcloud source.

Pre-reqs:

  • nVIDIA GPU
  • drivers enabled at host level (nvidia-smi required to work properly)
  • docker with the NVIDIA Container Toolkit enabled (to expose the GPU to containers) - info can be found here: https://github.com/NVIDIA/nvidia-docker

If all is prepped well, the following should work and provide nvidia-smi output from within a container: sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
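If that test fails because Docker doesn't know about the NVIDIA runtime, one possible fix with a recent NVIDIA Container Toolkit (an assumption; the link above has the authoritative steps) is:

    # Register the NVIDIA runtime with Docker and restart the daemon
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker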

Interestingly, the Debian 12 based images come out smaller. I may have missed some other elements included in the nVIDIA one, though everything works.

How to use:

nVIDIA TensorFlow Docker image based

This one uses a little hack: it tags the nVIDIA TensorFlow Docker image as Debian, which is then used in the next steps to build PHP, etc. Use with caution, and if you have other images built on top of the Debian base image, remove the Debian tag afterwards.
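A sketch of what that tag hack amounts to (the exact tags are assumptions; the build-all.sh script mentioned below is the authoritative version):

    # Re-tag the nVIDIA TensorFlow image so later build stages pick it up as "debian"
    docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
    docker tag nvcr.io/nvidia/tensorflow:22.03-tf2-py3 debian:latest
    # ... build the Recognize image ...
    # then remove the fake tag so it doesn't shadow the real Debian base image
    docker rmi debian:latest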

Use the build-all.sh script, as it takes care of all dependencies and options/versions.

Standard images

  1. Build the image:

     cd <Dockerfile folder>
     DOCKER_BUILDKIT=1 docker build -t local/nextcloud-recognize-gpu:latest .

  2. Run it (a fuller example follows this list):

     docker run -d --rm --gpus all <your usual mappings, i.e. volumes for NC data, etc> local/nextcloud-recognize-gpu:latest
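For illustration, a more complete invocation might look like this; the volume and port mappings are hypothetical placeholders for your usual ones:

    docker run -d --rm --gpus all \
        -v nextcloud_data:/var/www/html \
        -p 8080:80 \
        local/nextcloud-recognize-gpu:latest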

In case of an update, or when moving an existing data/deployment:

  • Recognize with GPU support lives in: /usr/src/nextcloud/custom_apps/recognize
  • it needs to be synced into the live instance: rsync -avH /usr/src/nextcloud/custom_apps/recognize/ /var/www/html/custom_apps/recognize/
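If that rsync runs as root, restoring ownership afterwards may be needed (a hedged suggestion; the container name is a placeholder):

    docker exec <your_container_name> chown -R www-data:www-data /var/www/html/custom_apps/recognize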

To check whether Recognize stands a chance of using the GPU, run: docker exec -it <your_container_name> /bin/bash -c "cd custom_apps/recognize && node ./src/test_gputensorflow.js"

Initial image sizes:

  • 20.8GB - debian - Debian base
  • 17.3GB - non-debian-packages - Debian + binary packages

The difference seems to come from the fact that the Debian build installs additional packages along the way.

Potential issues:

  1. The default Docker image size limit is 10GB; with the latest CUDA libraries the image surpasses it. To change the limit (sketched below):
  • update the configuration in /etc/docker/daemon.json: { "storage-opts": [ "dm.basesize=20G" ] }
  • restart docker
  • potentially remove cached images (or rebuild without cache): docker builder prune
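Put together, the change could look like this sketch (note the assumption that your Docker uses the devicemapper storage driver, which is what dm.basesize applies to):

    # /etc/docker/daemon.json
    {
        "storage-opts": [ "dm.basesize=20G" ]
    }

    # then restart the daemon and clear the build cache
    sudo systemctl restart docker
    docker builder prune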

Known issue(s):

  • When Recognize runs via cron, it complains about PTXAS missing from the paths and falls back to a library-known one - everything works (by the looks of it), it is just a complaint. This doesn't pop up when run from the CLI. I tried to set the paths for cron but didn't get it fixed, as attention moved to the nVIDIA TensorFlow based image.
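For reference, setting the path for cron could look like the sketch below (the CUDA location is an assumption, and as noted above this did not resolve the warning here):

    # At the top of the www-data crontab, ahead of the Nextcloud cron entry:
    PATH=/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    */5 * * * * php -f /var/www/html/cron.php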

recognize_docker's Issues

Could not load library libcublasLt.so.12.

I've just tested your container image, but I get the following error when trying to run recognize with GPU acceleration:

{
   "reqId":"YhPskH0byNEBPwuVNS59",
   "level":2,
   "time":"2023-02-19T17:50:14+00:00",
   "remoteAddr":"",
   "user":"--",
   "app":"recognize",
   "method":"",
   "url":"--",
   "message":"Classifier process output: 2023-02-19 17:50:04.773140: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n2023-02-19 17:50:04.848781: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n2023-02-19 17:50:04.849224: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n2023-02-19 17:50:06.242900: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n2023-02-19 17:50:06.243328: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n2023-02-19 17:50:06.243622: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n2023-02-19 17:50:06.243954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3376 MB memory:  -> device: 0, name: Quadro M2000, pci bus id: 0000:00:10.0, compute capability: 5.2\nCould not load library libcublasLt.so.12. Error: libcublasLt.so.12: cannot open shared object file: No such file or directory\n",
   "userAgent":"--",
   "version":"25.0.3.2",
   "data":{
      "app":"recognize"
   }
}

For some reason the application is looking for libcublasLt.so version 12 instead of 11.

Any ideas why this is happening?
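One way to narrow this down (a hedged suggestion, not from the original thread): check which TensorFlow version the image ended up with and which libcublasLt versions are actually visible, since newer TensorFlow wheels link against CUDA 12 while this image ships CUDA 11.

    # Inside the container: which TF version was installed, and which
    # libcublasLt libraries does the dynamic linker actually see?
    python3 -c "import tensorflow as tf; print(tf.__version__)"
    ldconfig -p | grep libcublasLt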

Illegal instruction (core dumped)

I was getting this when trying to build my own Dockerfile. I found yours and used all of it except building Recognize, as I already have it installed. I still get this error using your Debian Dockerfile (with a few edits).

EDIT: This message is when trying to load the tensorflow module in Python. I don't get any error in the console or the docker logs that I can find.

Python command run on the docker container
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Here is my Dockerfile

ARG NCVERSION=28

FROM nextcloud:$NCVERSION AS recognize-git
################################################
###### Install Recognize

### install composer
## composer - same as above
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    apt update && \
    apt install -y git

RUN --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    mkdir -p /usr/src/nextcloud/custom_apps && cd /usr/src/nextcloud/custom_apps/ && \
    git clone https://github.com/nextcloud/recognize 
# Separate RUN, as sometimes removal was failing, complaining dirs are not empty
RUN    rm -rf /usr/src/nextcloud/custom_apps/recognize/.git*/*

FROM nextcloud:$NCVERSION

# choose version, depending on HW support
# https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html
ENV CUDNN_VERSION=8.2.1
ENV CUDA_VERSION_NVIDIA=11-8

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/lib64
ENV PATH=$PATH:/usr/local/cuda/bin

# NodeJS version selection
ENV NODE_MAJOR=20

# Notes
# software-properties-common required for nvidia
RUN  --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    apt update && \
    apt install -y make ocrmypdf tesseract-ocr-eng yt-dlp imagemagick-6.q16 libmagickcore-6.q16-6-extra ffmpeg lsof coreutils wget gnupg2 git software-properties-common logrotate sudo && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    apt-key adv --fetch-keys https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key && \
    apt update && \
    apt install -y nodejs

######## TensorFlow/pip install with Cuda/cudnn via pip
### cuda & cudnn install start
# 2023.10.14 - no debian12 packages: debian12 => debian11 substitution
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  --mount=type=tmpfs,target=/tmp/ \
  --mount=type=tmpfs,target=/root/.cache \
  . /etc/os-release && \
  export OS="${ID}${VERSION_ID}" && \
[ "$OS" = "debian12" ] && export OS=debian11 && \
  curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        tee /etc/apt/sources.list.d/nvidia-container-toolkit.list && \
  apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/3bf863cc.pub && \
  add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /" && \
  apt update

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  --mount=type=tmpfs,target=/tmp/ \
  --mount=type=tmpfs,target=/root/.cache \
  apt -y install --download-only cuda-toolkit-${CUDA_VERSION_NVIDIA} libcudnn8-dev

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    apt -y install cuda-toolkit-${CUDA_VERSION_NVIDIA} libcudnn8-dev

######## TensorFlow installed with pip
RUN  --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    apt update && \
    apt -y install python3 python3-pip python3-setuptools python3-dev

### Tensorflow and TensorRT
# 2023.10.14 newer pip breaks tensorrt install at wheel time (download issue)
#    pip install --break-system-packages --upgrade pip && \
#   --use-pep517

RUN --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    pip install --break-system-packages --upgrade pip==21.3.1 && \
    pip install --upgrade tensorflow[and-cuda] tensorrt

################################################
###### Install Recognize

### install composer
## composer - same as above
RUN --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    curl -sS https://getcomposer.org/installer -o /tmp/composer-setup.php && \
      HASH=`curl -sS https://composer.github.io/installer.sig` && \
      CHECK=`php -r "if (hash_file('SHA384', '/tmp/composer-setup.php') === '$HASH') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"` && \
      if [ "$CHECK" = "Installer verified" ]; then \
        php /tmp/composer-setup.php --install-dir=/usr/local/bin --filename=composer; \
      else \
        exit 1 ; \
      fi

# heads up for the git installation, as it doesn't need apt access for git pull
RUN mkdir -p /tmp/certs && cd /tmp/certs && \
    curl -O http://www.cacert.org/certs/root_X0F.crt -O http://www.cacert.org/certs/class3_x14E228.crt && \
    mv root_X0F.crt /usr/local/share/ca-certificates/cacert-root.crt && \
    mv class3_x14E228.crt /usr/local/share/ca-certificates/cacert-class3.crt && \
    update-ca-certificates
#RUN sleep 15


### Grab Recognize
RUN mkdir -p /var/www/html/custom_apps/recognize && chown www-data: /var/www/html/custom_apps/recognize -R
USER www-data
COPY --from=recognize-git /usr/src/nextcloud/custom_apps/ /var/www/html/custom_apps/
#     chown www-data: /usr/src/nextcloud/custom_apps/recognizecd && 
USER root
RUN chown www-data: /var/www/html/custom_apps/recognize -R
USER www-data:www-data

RUN --mount=type=tmpfs,target=/tmp/ \
    --mount=type=tmpfs,target=/root/.cache \
    cd /var/www/html/custom_apps/recognize && \
    ls -l /var/www/html/custom_apps/ && \
    make

USER root
RUN sed -i 's/16KP/128KP/g' /etc/ImageMagick-6/policy.xml
RUN sed -i 's/128MP/1.0737GP/g' /etc/ImageMagick-6/policy.xml
RUN sed -i 's/256MiB/2GiB/g' /etc/ImageMagick-6/policy.xml
RUN sed -i 's/512MiB/4GiB/g' /etc/ImageMagick-6/policy.xml
RUN sed -i 's/1GiB/8GiB/g' /etc/ImageMagick-6/policy.xml

I put the recognize building stuff in, but it doesn't do anything as I have an existing volume for my data directory.
I'm not sure if there is something I need to change for Nextcloud 28, or if I did something wrong in copying your Dockerfile. Is it possible this only works on a fresh install?
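Not from the thread, but a hedged note: "Illegal instruction (core dumped)" on import tensorflow often means the prebuilt wheel uses CPU instructions the host lacks - the stock TensorFlow wheels require AVX. A quick check on the host:

    # List the AVX-family flags the CPU advertises; no output means the
    # stock TensorFlow wheels will crash with "Illegal instruction".
    grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u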
