
morpheus's Introduction

NVIDIA Morpheus


NVIDIA Morpheus is an open AI application framework that provides cybersecurity developers with a highly optimized AI framework and pre-trained AI capabilities that allow them to instantaneously inspect all IP traffic across their data center fabric. The Morpheus developer framework allows teams to build their own optimized pipelines that address cybersecurity and information security use cases. Bringing a new level of security to data centers, Morpheus provides development capabilities around dynamic protection, real-time telemetry, adaptive policies, and cyber defenses for detecting and remediating cybersecurity threats.

Documentation

Using Morpheus

Modifying Morpheus

Deploying Morpheus

Full documentation for the latest official release is available at https://docs.nvidia.com/morpheus/.

morpheus's People

Contributors

ajschmidt8, anuradhakaruppiah, asergarcia, ayodeawe, bartleyr, bsuryadevara, cwharris, dagardner-nv, dependabot[bot], drobison00, e-ago, edknv, efajardo-nv, exactlyallan, gbatmaz, gputester, hesaneasycoder, hsin-c, ifengw-nv, jadu-nv, jameslamb, jarmak-nv, jjacobelli, lobotmcj, mdemoret-nv, pdmack, raykallen, shawn-davis, tzemicheal, yczhang-nv


morpheus's Issues

Docker release image build failure

docker buildx build -t nvcr.io/nvidia/morpheus/morpheus:latest --target runtime --build-arg FROM_IMAGE=gpuci/miniforge-cuda --build-arg CUDA_VER=11.4 --build-arg LINUX_DISTRO=ubuntu --build-arg LINUX_VER=20.04 --build-arg RAPIDS_VER=21.10 --build-arg PYTHON_VER=3.8 --build-arg TENSORRT_VERSION=8.2.1.3 --build-arg NEO_GIT_URL=REDACTED --network=host --ssh default --load  -f docker/Dockerfile .
[+] Building 4.5s (23/31)
 => [internal] load build definition from Dockerfile                                                                                                                    0.0s
 => => transferring dockerfile: 32B                                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                                       0.0s
 => => transferring context: 35B                                                                                                                                        0.0s
 => resolve image config for docker.io/docker/dockerfile:1.3                                                                                                            0.7s
 => CACHED docker-image://docker.io/docker/dockerfile:1.3@sha256:42399d4635eddd7a9b8a24be879d2f9a930d0ed040a61324cfdf59ef1357b3b2                                       0.0s
 => [internal] load build definition from Dockerfile                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/gpuci/miniforge-cuda:11.4-devel-ubuntu20.04                                                                                  0.0s
 => [base 1/4] FROM docker.io/gpuci/miniforge-cuda:11.4-devel-ubuntu20.04                                                                                               0.0s
 => [internal] load build context                                                                                                                                       0.1s
 => => transferring context: 73.92kB                                                                                                                                    0.1s
 => CACHED [base 2/4] RUN apt-get update &&    apt-get upgrade -y &&    curl -sL https://deb.nodesource.com/setup_12.x | bash - &&    apt-get install --no-install-rec  0.0s
 => CACHED [base 3/4] WORKDIR /workspace                                                                                                                                0.0s
 => CACHED [base 4/4] RUN conda config --set ssl_verify false &&    conda config --add pkgs_dirs /opt/conda/pkgs &&    conda config --env --add channels conda-forge &  0.0s
 => CACHED [conda_bld_deps 1/4] COPY ci/conda/recipes/cudf/ ./ci/conda/recipes/cudf/                                                                                    0.0s
 => CACHED [conda_bld_deps 2/4] COPY ci/conda/recipes/libcudf/ ./ci/conda/recipes/libcudf/                                                                              0.0s
 => CACHED [conda_bld_deps 3/4] COPY ci/conda/recipes/run_conda_build.sh ./ci/conda/recipes/run_conda_build.sh                                                          0.0s
 => CACHED [conda_bld_deps 4/4] RUN --mount=type=ssh     --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked     --mount=type=cache,id=conda  0.0s
 => CACHED [conda_env 1/3] RUN --mount=type=bind,from=conda_bld_deps,source=/opt/conda/conda-bld,target=/opt/conda/conda-bld     --mount=type=cache,id=conda_pkgs,targ  0.0s
 => CACHED [conda_env 2/3] RUN source activate morpheus &&    conda config --env --add channels conda-forge &&    conda config --env --add channels nvidia &&    conda  0.0s
 => CACHED [conda_env 3/3] COPY docker/entrypoint.sh ./docker/                                                                                                          0.0s
 => CACHED [runtime  1/10] COPY docker/conda/environments/requirements.txt ./docker/conda/environments/                                                                 0.0s
 => CACHED [runtime  2/10] COPY docker/conda/environments/cuda11.4_runtime.yml ./docker/conda/environments/                                                             0.0s
 => CACHED [conda_bld_morpheus 1/2] COPY . ./                                                                                                                           0.0s
 => ERROR [conda_bld_morpheus 2/2] RUN --mount=type=ssh     --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked     --mount=type=cache,id=co  3.3s
------
 > [conda_bld_morpheus 2/2] RUN --mount=type=ssh     --mount=type=cache,id=workspace_cache,target=/workspace/.cache,sharing=locked     --mount=type=cache,id=conda_pkgs,target=/opt/conda/pkgs,sharing=locked     source activate base &&    MORPHEUS_ROOT=/workspace CONDA_BLD_DIR=/opt/conda/conda-bld CONDA_ARGS="--no-test" ./ci/conda/recipes/run_conda_build.sh morpheus:
#23 0.969 CUDA        : 11.4.1
#23 0.969 PYTHON_VER  : 3.8
#23 0.969 NEO_GIT_TAG : 5b55e37c6320c1a5747311a1e29e7ebb049d12bc
#23 0.969
#23 0.999 fatal: No names found, cannot describe anything.
#23 1.001 Running conda-build for morpheus...
#23 1.001 ++ conda mambabuild --use-local --build-id-pat '{n}-{v}' --variants '{python: 3.8}' -c rapidsai -c nvidia -c nvidia/label/dev -c conda-forge --no-test ci/conda/recipes/morpheus
#23 1.717 No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.16
#23 1.717 WARNING:conda_build.metadata:No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.16
#23 1.764 Cloning into bare repository '/opt/conda/conda-bld/git_cache/workspace'...
#23 1.778 done.
#23 1.785 Cloning into '/opt/conda/conda-bld/morpheus-split-None/work'...
#23 1.792 done.
#23 1.890 Your branch is up to date with 'origin/branch-22.04'.
#23 1.990 fatal: No names found, cannot describe anything.
#23 2.043 commit 09eb17f105dd0d719da26e8817892d89c0f33334
#23 2.043 Author: Michael Demoret <[email protected]>
#23 2.043 Date:   Thu Apr 21 17:20:55 2022 -0700
#23 2.043
#23 2.043     Initial Commit
#23 2.043
#23 2.043 commit 09eb17f105dd0d719da26e8817892d89c0f33334
#23 2.043 Author: Michael Demoret <[email protected]>
#23 2.043 Date:   Thu Apr 21 17:20:55 2022 -0700
#23 2.043
#23 2.043     Initial Commit
#23 2.043
#23 2.043 On branch branch-22.04
#23 2.043 Your branch is up to date with 'origin/branch-22.04'.
#23 2.043
#23 2.043 nothing to commit, working tree clean
#23 2.043
#23 2.043 Updating build index: /opt/conda/conda-bld
#23 2.043
#23 2.043 checkout: 'HEAD'
#23 2.043 ==> /usr/bin/git log -n1 <==
#23 2.043
#23 2.043 ==> /usr/bin/git describe --tags --dirty <==
#23 2.043
#23 2.043 ==> /usr/bin/git status <==
#23 2.043
#23 2.043 Adding in variants from internal_defaults
#23 2.043 INFO:conda_build.variants:Adding in variants from internal_defaults
#23 2.043 Adding in variants from /workspace/ci/conda/recipes/morpheus/conda_build_config.yaml
#23 2.043 INFO:conda_build.variants:Adding in variants from /workspace/ci/conda/recipes/morpheus/conda_build_config.yaml
#23 2.043 Adding in variants from argument_variants
#23 2.043 INFO:conda_build.variants:Adding in variants from argument_variants
#23 3.031 Attempting to finalize metadata for morpheus
#23 3.031 /opt/conda/lib/python3.8/site-packages/conda_build/environ.py:444: UserWarning: The environment variable 'CMAKE_CUDA_ARCHITECTURES' is being passed through with value 'ALL'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
#23 3.031   warnings.warn(
#23 3.031 /opt/conda/lib/python3.8/site-packages/conda_build/environ.py:444: UserWarning: The environment variable 'MORPHEUS_CACHE_DIR' is being passed through with value '/workspace/.cache'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
#23 3.031   warnings.warn(
#23 3.031 /opt/conda/lib/python3.8/site-packages/conda_build/environ.py:444: UserWarning: The environment variable 'PARALLEL_LEVEL' is being passed through with value '80'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
#23 3.031   warnings.warn(
#23 3.031 INFO:conda_build.metadata:Attempting to finalize metadata for morpheus
#23 3.108 Error: Failed to render jinja template in /workspace/ci/conda/recipes/morpheus/meta.yaml:
#23 3.108 list object has no element 1
------
error: failed to solve: executor failed running [/bin/bash -c source activate base &&    MORPHEUS_ROOT=/workspace CONDA_BLD_DIR=/opt/conda/conda-bld CONDA_ARGS="--no-test" ./ci/conda/recipes/run_conda_build.sh morpheus]: exit code: 1
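The repeated `fatal: No names found, cannot describe anything.` lines indicate that `git describe --tags` found no tags in the cloned workspace, so conda-build exposes an empty describe string to the recipe's jinja template. A plausible (assumed, not confirmed) failure mode is that `meta.yaml` indexes element 1 of a split of that string, which is exactly what jinja reports as "list object has no element 1". A minimal reproduction of that mechanism:

```python
# What an untagged clone yields for `git describe --tags`: an empty string.
describe_output = ""
parts = describe_output.split("-")
print(parts)  # [''] -- only element 0 exists

# Indexing element 1 fails; jinja2 renders the same out-of-range access as
# "list object has no element 1" when it happens inside a template.
try:
    parts[1]
except IndexError:
    print("no element 1")
```

If this is the cause, tagging the repo (or giving the recipe a fallback version when `GIT_DESCRIBE_TAG` is empty) would unblock the build.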

[FEA] Setup CI to use new Jenkins Build

Is your feature request related to a problem? Please describe.
In transitioning to Github, much of our CI system was lost. The style checks, build and tests need to be re-enabled using the new Jenkins build from RAPIDS.

Describe the solution you'd like
Convert the previous CI system to use the new Jenkins build from RAPIDS.

[FEA] Separate Pipeline Type Inference from Node Creation

Is your feature request related to a problem? Please describe.
Currently, Morpheus determines the input/output types of each stage at the same time the nodes are created. This is a relic of the streamz days and requires the _build() methods to accept and return tuples of StreamPair. This is an ugly implementation that delays type checking until the pipeline is actually being created.

Describe the solution you'd like
Before the pipeline object is created, Morpheus should run type inference on all stages to determine their input/output types. This will separate the two pieces and allow for invalid pipelines to be checked before any nodes are created.
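The proposed two-pass approach can be sketched as follows. The classes and method names here are hypothetical illustrations, not Morpheus's actual API:

```python
class Stage:
    """Hypothetical stage that declares its accepted/emitted types up front."""

    def __init__(self, accepts: type, emits: type):
        self.accepts = accepts
        self.emits = emits

    def output_type(self, input_type: type) -> type:
        # Reject mismatched connections during type inference.
        if not issubclass(input_type, self.accepts):
            raise TypeError(f"stage accepts {self.accepts.__name__}, "
                            f"got {input_type.__name__}")
        return self.emits


def infer_pipeline_types(source_type: type, stages: list) -> type:
    # Pass 1: propagate types through the stage list. Invalid pipelines are
    # rejected here, before any nodes are created in pass 2.
    current = source_type
    for stage in stages:
        current = stage.output_type(current)
    return current
```

With this split, node creation only ever runs on a pipeline that has already passed type inference.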

[FEA] Make the Morpheus CLI Extendable with the Ability to Add New Commands

Is your feature request related to a problem? Please describe.
All pipelines that define at least one custom Stage must use the Python interface and cannot use the CLI. This is because there is no way to register new stages with the CLI unless they are built as first-party stages. This forces all of our built-in examples to use the Python interface, duplicating a lot of click commands.

Describe the solution you'd like
Custom stages could be registered as commands with the CLI using various methods (such as a decorator). Once registered, CLI commands simply forward all options as kwargs to the constructor of each stage.

Additional context
Custom stages could be automatically discovered using the entry_points functionality in Python. This is used by tools such as pytest to build extensions into the pytest CLI.
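The decorator-plus-kwargs-forwarding idea could look like the sketch below. All names here (`register_stage`, `RecipientFilterStage`, `run_stage_command`) are illustrative assumptions, not existing Morpheus APIs:

```python
_STAGE_REGISTRY = {}


def register_stage(name):
    """Hypothetical decorator: makes a custom stage discoverable by the CLI."""
    def wrap(cls):
        _STAGE_REGISTRY[name] = cls
        return cls
    return wrap


@register_stage("recipient-filter")
class RecipientFilterStage:
    """Illustrative custom stage with constructor options."""
    def __init__(self, column="to", threshold=0.5):
        self.column = column
        self.threshold = threshold


def run_stage_command(name, **cli_options):
    # The CLI command simply forwards all parsed options as kwargs to the
    # registered stage's constructor, as described above.
    return _STAGE_REGISTRY[name](**cli_options)
```

An entry_points-based variant would populate `_STAGE_REGISTRY` by iterating installed packages' entry points instead of requiring an explicit import.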

[BUG] Intermittent ValueError: I/O operation on closed file

Describe the bug
I'm occasionally seeing this at the end of running pytest. I'm not sure if this is reproducible outside of the unit tests or if the unit tests are simply doing something weird.

Traceback (most recent call last):
  File "/home/dagardner/work/morpheus/morpheus/utils/logging.py", line 47, in emit
    click.echo(click.style(msg, **color_kwargs), file=file, err=is_error)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/site-packages/click/utils.py", line 299, in echo
    file.write(out)  # type: ignore
Traceback (most recent call last):
ValueError: I/O operation on closed file.
  File "/home/dagardner/work/morpheus/morpheus/utils/logging.py", line 47, in emit
    click.echo(click.style(msg, **color_kwargs), file=file, err=is_error)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/site-packages/click/utils.py", line 299, in echo
    file.write(out)  # type: ignore
Call stack:
ValueError: I/O operation on closed file.
Call stack:
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/logging/handlers.py", line 1487, in _monitor
    self.handle(record)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/logging/handlers.py", line 1468, in handle
    handler.handle(record)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/logging/__init__.py", line 954, in handle
    self.emit(record)
  File "/home/dagardner/work/morpheus/morpheus/utils/logging.py", line 53, in emit
    self.handleError(record)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/logging/handlers.py", line 1487, in _monitor
    self.handle(record)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/logging/handlers.py", line 1468, in handle
    handler.handle(record)
  File "/home/dagardner/work/conda/envs/morpheus/lib/python3.8/logging/__init__.py", line 954, in handle
    self.emit(record)
  File "/home/dagardner/work/morpheus/morpheus/utils/logging.py", line 53, in emit
    self.handleError(record)
Message: '====Pipeline Complete===='
Arguments: None
Message: '====Pipeline Complete===='
Arguments: None

Steps/Code to reproduce bug
Run pytest about 10 times.
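The traceback shows the handler's emit() writing to a stream that pytest (or interpreter teardown) has already closed. One possible mitigation, sketched here with a stdlib StreamHandler as a stand-in for the click-based handler in morpheus/utils/logging.py, is to drop records whose target stream is closed:

```python
import io
import logging


class SafeStreamHandler(logging.StreamHandler):
    """Sketch: skip records whose target stream is already closed, instead of
    letting write() fail with "ValueError: I/O operation on closed file"."""

    def emit(self, record):
        if getattr(self.stream, "closed", False):
            return  # stream gone (e.g. pytest capture teardown); drop record
        super().emit(record)


buf = io.StringIO()
logger = logging.getLogger("safe-demo")
logger.propagate = False
logger.addHandler(SafeStreamHandler(buf))

logger.warning("====Pipeline Complete====")
captured = buf.getvalue()

buf.close()
logger.warning("emitted after close")  # silently dropped, no logging error
```

This does not address why the stream is closed before logging finishes, but it makes late messages harmless.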

[BUG] Broken cudf builds

Describe the bug
Performing a clean build results in a broken build of cudf

ImportError: cannot import name 'builder' from 'google.protobuf.internal' 

Steps/Code to reproduce bug
Reproducible in both Conda & Docker builds

Additional context
The issue appears to be caused by a new version of protobuf published on conda-forge three days ago. Morpheus pins protobuf 3.19; however, the cudf builds don't specify a version.

[DOC] Typos in README.md; one more GitLab-style term in CONTRIBUTING.md

Report incorrect documentation

Location of incorrect documentation
See Pull Request #74 .

Describe the problems or issues found in the documentation
Some typos in README.md and one lingering gitlab-style reference in CONTRIBUTING.md.

Steps taken to verify documentation is incorrect
N/A

Suggested fix for documentation
See Pull Request #74 .

[BUG] cli default args for num_threads should be based on process thread affinity.

We are using os.cpu_count() (or psutil.cpu_count()) as the default argument for num_threads in both the CLI and example code. That value reflects the number of CPUs physically present on the machine, not the number the process is actually allowed to use.

Instead we should probably use len(os.sched_getaffinity(0)), which looks to be the appropriate default because it accounts for how many CPUs the current process has access to.

https://github.com/NVIDIA/Morpheus/blob/02bfbfbeb6e9e62d5fd8fe47a2bf2a92a85d1adc/examples/gnn_fraud_detection_pipeline/run.py#L36-L41
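A minimal sketch of the suggested default, with a fallback for platforms where os.sched_getaffinity is unavailable (the helper name is illustrative):

```python
import os


def default_num_threads() -> int:
    # Prefer the CPUs this process is actually allowed to run on (respects
    # taskset, cgroups, and container CPU sets); fall back to the machine-wide
    # count on platforms without sched_getaffinity (e.g. macOS, Windows).
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1
```

Used as `@click.option("--num_threads", default=default_num_threads())`, this keeps the CLI default aligned with the process's real CPU budget.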

[FEA] Add ability to build containers that default to a user other than root.

Is your feature request related to a problem? Please describe.
Yes. The primary issue is that, by default, containers run as the 'root' user. In addition to the baseline security concerns this can present, it also induces some inconvenient behavior and side effects. Specifically, if I run a container as myuser with a user ID of 1234 and mount my working directory into the container, then all files created by the container in the mounted directory will have root ownership. This can cause a variety of subtle problems and a degraded user experience.

Describe the solution you'd like
When running a Morpheus container an end user should be able to specify the UID they want the tasks in the container to run under.

Describe alternatives you've considered
No other alternatives considered

Additional context

There are some complicating issues with user permissions on base conda installation items, packages, etc., which make this more complicated than I'd originally hoped and is why it's on the backlog. I don't think there are any insurmountable problems, but it might require some trial and error to work out all the issues.

[BUG] Remove `no_args_is_help` from CLI

Describe the bug
If you use a Morpheus CLI command without any arguments or options, it just prints the help text instead of adding that stage to the pipeline.

Steps/Code to reproduce bug
For example, this launch command will just print the help:

(morpheus)$ morpheus run pipeline-nlp \
                from-file --filename=data/pcap_dump.jsonlines \
                deserialize
Configuring Pipeline via CLI
Usage: morpheus run pipeline-nlp deserialize [OPTIONS]

Options:
  --help  Show this message and exit.

Expected behavior
The pipeline should run with the specified commands
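The behavior described above can be demonstrated in isolation with click. With no_args_is_help=True, invoking a command with zero arguments prints its help and never runs the callback; the command below is illustrative, not the real Morpheus CLI:

```python
import click
from click.testing import CliRunner


@click.command(no_args_is_help=True)
@click.option("--filename", default=None)
def deserialize(filename):
    click.echo(f"adding deserialize stage (filename={filename})")


runner = CliRunner()
no_args = runner.invoke(deserialize, [])          # prints help, callback skipped
with_args = runner.invoke(deserialize, ["--filename", "pcap_dump.jsonlines"])
```

Removing `no_args_is_help=True` (or defaulting it to False for stage subcommands) would let `deserialize` run with its option defaults, which is the expected behavior.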

[FEA] AppShieldSource Stage

Messages from AppShield plugins should be read into the Morpheus pipeline by an AppShield source stage as a single data frame per source, ready for further processing such as feature creation.

[FEA] Improve LFS with Scripts and Optional Download

Is your feature request related to a problem? Please describe.
Currently, Git LFS is used to store large and binary files in the repo. However, some users have not had success with Git LFS. By default, Git LFS will download all files for a particular commit, even if the user is unlikely to need every file. This adds about 2 GB to the repo checkout.

Describe the solution you'd like
Ideally, the LFS configuration would be improved to minimize the download size for the average user. Utility scripts could be added to download the necessary files for a particular workload/example/model. This would improve the user experience by minimizing the repo checkout size and would not require anyone to learn the necessary Git LFS commands to download ignored or skipped files.

Describe alternatives you've considered
Other options include storing large files external to Git (i.e. S3) and building a set of scripts or tools to download these files as necessary. This option (while preferred by some) would just trade one problem for another. We would need to develop our own versioning system for storing these large files, create the tools to download them, and maintain a system which functions very similarly to Git LFS but is slightly different. While Git LFS is far from perfect, we can mitigate some of the issues by correctly configuring our repo and providing scripts to shield users from needing to deal with Git LFS.

Additional context
The recommended solution from Rapids Ops is outlined here: https://docs.rapids.ai/resources/git/
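A utility script wrapping Git LFS could be as thin as the sketch below. `git lfs pull --include <pattern>` is a standard Git LFS invocation; the wrapper names and the `examples/...` path pattern are assumptions for illustration:

```python
import subprocess


def lfs_pull_command(include_pattern: str) -> list:
    # `--include` limits the LFS download to paths matching the pattern,
    # so users only fetch the files their workload actually needs.
    return ["git", "lfs", "pull", "--include", include_pattern]


def fetch_files_for(example: str) -> None:
    # Hypothetical per-example wrapper so users never have to learn the
    # underlying Git LFS commands themselves.
    subprocess.run(lfs_pull_command(f"examples/{example}/**"), check=True)
```

Combined with an LFS config that skips large files by default, this keeps the initial checkout small while making the full download one command away.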

[DOC] Suggested updates to developer guide for clarity

In reading through the developer guide I had some changes to suggest.
Some suggestions are typos or copy/paste errors; others are an attempt to increase clarity.
Please edit/disregard these suggestions at will.

Files with suggested updates:

  • docs/source/developer_guide/architecture.md
  • docs/source/developer_guide/guides/1_simple_python_stage.md
  • docs/source/developer_guide/guides/2_real_world_phishing.md
  • docs/source/developer_guide/guides/3_simple_cpp_stage.md
  • docs/source/developer_guide/guides/4_source_cpp_stage.md

Suggested fix for documentation
See Pull Request #96 .

to-file stage failure

Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/bin/morpheus", line 11, in <module>
    sys.exit(run_cli())
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/cli.py", line 1381, in run_cli
    cli(obj={}, auto_envvar_prefix='CLX', show_default=True, prog_name="morpheus")
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1689, in invoke
    return _process_result(rv)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1626, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/cli.py", line 543, in post_pipeline
    pipeline.run()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 1239, in run
    asyncio.run(self._do_run())
  File "/opt/conda/envs/morpheus/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/envs/morpheus/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 1214, in _do_run
    self.build_and_start()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 1038, in build_and_start
    self.start()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 998, in start
    self._neo_executor.start()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 960, in inner_build
    s.build(seg)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 526, in build
    dep.build(seg, do_propagate=do_propagate)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 526, in build
    dep.build(seg, do_propagate=do_propagate)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 526, in build
    dep.build(seg, do_propagate=do_propagate)
  [Previous line repeated 10 more times]
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 504, in build
    out_ports_pair = self._build(seg, in_ports_pairs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 783, in _build
    return [self._build_single(seg, in_ports_streams[0])]
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/output/to_file.py", line 113, in _build_single
    to_file = neos.WriteToFileStage(seg,
ValueError: Read/Write mode ('+') is not supported by WriteToFileStage. Mode: w+

[DOC] ssh-agent must be set up to build as instructed in CONTRIBUTING.md

Report incorrect documentation

Location of incorrect documentation
CONTRIBUTING.MD in the "Build in Docker Container" section.

Describe the problems or issues found in the documentation
If ssh-agent is not set up as described in step 2, then step 1 may fail with the following error:
error: invalid empty ssh agent socket: make sure SSH_AUTH_SOCK is set.
If step 2 is executed first, then step 1 works without error.

Steps taken to verify documentation is incorrect
I encountered this error on:

  • Operating system: Ubuntu 18.04
  • Docker runtime version: 20.10.15

I suspect the offending lines are 44-47 of docker/build_container.sh here.

Suggested fix for documentation
Either switch steps 2 and 1 or (preferably) modify lines 44-47 of docker/build_container.sh to work even if ssh-agent has not been configured. Perhaps use a new, alternate environment variable here such as DOCKER_PRIVATE_REPOS instead of DOCKER_BUILDKIT?

[BUG] bert vocabulary file has incorrectly encoded special characters.

This file has incorrectly encoded special characters.

https://github.com/NVIDIA/Morpheus/blob/branch-22.04/models/training-tuning-scripts/log-parsing-models/resources/bert-base-cased-vocab.txt

A copy of the file which is correctly encoded is located here:

https://github.com/NVIDIA/Morpheus/blob/branch-22.04/models/training-tuning-scripts/sid-models/resources/bert-base-cased-vocab.txt

We should remove the incorrectly encoded file as well as the associated hash file, and update references to use the correctly encoded file and hash file.

cc @raykallen @efajardo-nv

[FEA] Create Utility Repo to Store Common Functionality

Is your feature request related to a problem? Please describe.
We currently have several repos that all require similar functionality related to CMake, CI, Conda, Docker, and testing. This functionality is duplicated, making it hard to maintain.

Describe the solution you'd like
Create a new repo containing the above common functionality that can be added as a submodule to all repos that need it. This will allow for versioning the common tools and keeping everything in sync. A partial list of items that could be included is:

#87 (comment)

#87 (comment)

#87 (comment)

Allow Building with `sccache` instead of `ccache`

In the PR to add a CI system, it was discovered that ccache doesn't work well between build stages. Instead, Morpheus should have the option to build using sccache.

Original comment from the PR review is below:

I think there might be a better way to configure using sccache. The script cmake/setup_cache.cmake does some templating to ensure ccache can figure out the compiler type during a conda-build run (look at the comments in that file for more info).

We could also just set the variable CCACHE_PROGRAM_PATH to sccache if they are drop-in replacements for each other.

Either way, we should make a new issue to explore this.

Originally posted by @mdemoret-nv in #80 (comment)

[BUG] CI scripts broken

Describe the bug
Both ci/scripts/python_checks.sh and ci/scripts/fix_all.sh are currently broken.
I believe python_checks might work when invoked as a git hook, and I have a pull request open which fixes it in CI, but these scripts should also work when invoked directly.

Steps/Code to reproduce bug
ci/scripts/python_checks.sh

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of Morpheus install: from source

[FEA] benchmark the example workflows

Let's benchmark the example workflows for end-to-end timing with fixed message counts and the ability to compare the Python and C++ stage implementations.

Goals:

  • run all example workflows as benchmarks
  • ability to run pure Python implementations or include C++ accelerated implementations
  • time-based benchmarks, not messages-per-second benchmarks
  • answer the question "on average, how fast are the example workflows in both pure Python and C++ accelerated modes?"

Non-Goals

  • per-stage metrics
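An end-to-end timing harness matching these goals could be as simple as the sketch below (the `benchmark` helper is an assumption; `run_pipeline` stands in for any callable that runs a whole example workflow over a fixed message count):

```python
import time


def benchmark(run_pipeline, runs: int = 5) -> float:
    """Sketch: average end-to-end wall-clock time for a workflow, so the
    pure Python and C++ accelerated builds can be compared on the same
    fixed input. No per-stage metrics are collected (a stated non-goal)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_pipeline()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Running the same harness twice, once with C++ stages disabled and once enabled, answers the "how fast on average" question directly.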

Convert GitLab Config to GitHub

With the transition to GitHub, we need to convert the GitLab-specific config files to their GitHub equivalents. Here is the task list:

  • Add the ops-bot config file into the repo to enable the ops-bot (Links: Ops-bot, Config file, and Example)
  • Convert the issue templates
  • Make a new CODEOWNERS file

For now, let's keep the ./.gitlab file around until after release.

[DOC] Add NGC references back to README

Report needed documentation
We removed the NGC references from the getting started section temporarily to avoid confusion about the multiple paths to get started. The removed link directed users here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/morpheus/collections/morpheus_. This is where a user would go to use Helm charts to run Morpheus in a K8s environment.

Describe the documentation you'd like
This link and wording around it should be added back into the README.md file once we can decide on the wording to avoid potential confusion with users mixing and matching paths to get started.

[FEA] Documentation build stage shouldn't require a gpu

Is your feature request related to a problem? Please describe.
If we try to build the documentation without a GPU, we get this error:

F20220518 00:39:57.558239   679 device_info.cpp:39] Check failed: nvmlInit_v2() == NVML_SUCCESS (9 vs. 0) Failed to initialize NVML

Describe the solution you'd like
Sphinx provides the ability to mock modules during the build phase, likely we just need to identify these and add them to our config.
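A minimal conf.py change could look like this. autodoc_mock_imports is a standard Sphinx autodoc option that substitutes mocks for the listed modules at doc-build time; the specific module names below are assumptions about which imports touch the GPU (and hence NVML) at import time:

```python
# docs/source/conf.py sketch -- mock out GPU-touching imports so the docs
# build does not need NVML/a physical GPU. Module names are illustrative.
autodoc_mock_imports = ["neo", "cudf", "morpheus._lib"]
```

With the right modules mocked, the docs stage can run on a CPU-only CI node.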

[BUG] WriteToFileStage fails when given a filename without a path

Describe the bug
Initializing WriteToFileStage with:
WriteToFileStage(config, filename='test.json')

Will cause a failure in _build_single:
os.makedirs(os.path.dirname(self._output_file), exist_ok=True)
The call to os.path.dirname will return '', causing os.makedirs to raise an exception.
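A guard like the following would fix the failure mode (the helper name is illustrative; the real fix would go inside _build_single):

```python
import os


def ensure_parent_dir(output_file: str) -> None:
    # os.path.dirname returns '' for a bare filename like 'test.json', and
    # os.makedirs('') raises FileNotFoundError -- so only create the parent
    # directory when the path actually contains one.
    parent = os.path.dirname(output_file)
    if parent:
        os.makedirs(parent, exist_ok=True)
```

`ensure_parent_dir('test.json')` is then a no-op, while `ensure_parent_dir('out/test.json')` still creates `out/`.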

[DOC] Morpheus source build instructions in CONTRIBUTING.md do not work as written.

Report incorrect documentation

Location of incorrect documentation
CONTRIBUTING.md

export CUDAToolkit_ROOT=/usr/local/cuda-{CUDA_VER}
mamba env create -n morpheus -f ./docker/conda/environments/cuda${CUDA_VER}_dev.yml
conda activate morpheus

Describe the problems or issues found in the documentation

  • This does not work, because the cuda_xxx_dev.yml specifies cudf 21.10.* *morpheus as a dependency.
  • The specified build path for cuDF also will not work, because the build.sh approach does not add the correct morpheus tag to the installed package.

Suggested fix for documentation
Steps 1 and 2 need to be swapped, and the cuDF build description needs to be updated to reflect the new conda packaging process.

[FEA] Improve What Is Stored In CI Artifacts to Reduce Size and Allow for In-Place Builds

Is your feature request related to a problem? Please describe.
The original implementation of the CI system in PR #80 saves the entire conda environment and workspace as artifacts so that the built Morpheus packages can be used in the docs and testing stages. This seems to be necessary because the cuDF packages are installed into the conda environment and the Morpheus packages are installed as in-place developer builds, which requires both the workspace and the conda environment to be zipped up and moved to the next build stage.

Describe the solution you'd like
Tarring up the entire conda environment is unnecessary and requires 4+ GB of storage per build, even though the majority of the files are identical between builds. Instead, only a few packages need to be transferred between the build and the test/docs phases:

  1. The built cudf conda package
  2. The built morpheus conda package
  3. The built morpheus wheel package

[FEA] Improve Monitor Stage

Is your feature request related to a problem? Please describe.
The MonitorStage currently has several flaws that reduce its usefulness:

  1. It has its own buffered queue, meaning messages are not tracked as they are processed but only when back pressure forces them out of the queue. This delays and distorts the measured results
  2. It requires a separate python thread to periodically update the screen with the latest results
  3. The throughput numbers cannot be queried or displayed anywhere besides the console
  4. It does not scale well. If multiple monitors are added, they all redraw and refresh at separate times.

Describe the solution you'd like
Ideally, the MonitorStage would:

  1. Immediately track the throughput of the previous stage without any buffering. This would require switching to a node type that does not have its own progress engine.
  2. Be implemented in C++ to avoid grabbing the GIL when monitoring throughput
  3. Utilize a common library such as Prometheus to allow extracting the measured values or displaying outside of the console
  4. Synchronize updates between multiple instances to avoid scaling issues.
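As a rough illustration of items 1 and 4, a single shared registry that stages increment inline (no buffered queue) and one renderer periodically reads avoids per-monitor redraw storms. ThroughputRegistry is a hypothetical name for this sketch, not a Morpheus API:

```python
import threading
from collections import defaultdict

class ThroughputRegistry:
    """Shared counters updated inline by each stage; a single reader renders them all."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def record(self, stage: str, n: int) -> None:
        # Called inline as messages pass through -- no buffered queue delaying counts.
        with self._lock:
            self._counts[stage] += n

    def snapshot(self) -> dict:
        # One periodic reader (console, Prometheus exporter, ...) pulls all values
        # at once, so multiple monitors never redraw at separate times.
        with self._lock:
            return dict(self._counts)
```

A Prometheus exporter would fit naturally behind snapshot(), satisfying item 3 as well.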

[FEA] Ensure all message classes have a C++ impl

Is your feature request related to a problem? Please describe.
This would allow pipelines like Hammah, where some stages have C++ impls, to run in a hybrid mode.

Describe the solution you'd like
There are a few message classes which don't have C++ impls but inherit from a message class that has a C++ impl. Typically these only add a few attributes.

Describe alternatives you've considered
Options would be:

  1. Just add the new C++ message classes (I have some of them in an old branch).
  2. Investigate pybind11 trampoline classes

[FEA] Update Morpheus to Use Latest cuDF

Is your feature request related to a problem? Please describe.
As of the GA release, Morpheus is stuck on cuDF version 21.10 because several modifications to the cuDF source code were necessary, and these modifications cannot be applied to later releases. The modified files are contained in a patch file located here and can be summarized as follows:

  1. Addition of ${CUDF_SOURCE_DIR}/include/nvtext to the installed headers to allow Morpheus to use the C++ SubwordTokenizer
  2. Public export of the cython Column class (this could possibly be removed)
  3. Public export of the cython Table class

All of these modifications revolve around exposing public aspects of the cuDF library to external projects and were originally intended to be upstreamed after the 0.2 EA release. However, in the 21.12 release of cuDF, the Table class (item #3 above) was removed completely, preventing the patch from applying to any release after 21.10.

With the removal of the Table cython class, Morpheus will need to refactor the morpheus.TableInfo class and the cudf_helpers.pyx code.

How the Table class is used in Morpheus
Morpheus stores a message's DataFrame object in C++ using a pybind11::object wrapped in the class IDataTable, which is then referenced by a TableInfo struct. To operate on the data in a DataFrame using C++, we need to convert the Python object referenced in a TableInfo into a cudf::table_view. The Table cython class facilitates this conversion in C++ even though it is not used directly (we cast a pybind11 object into cython via (PyTable*)data_frame.ptr()), and this is the main reason the Table class needed to be exported in patch item #3 above.

  • To convert from a pybind11::object -> cudf::table_view, we use the cudf cython method cudf._lib.table.table_view_from_table()
  • To convert from a cudf::table_view -> pybind11::object, we use the cudf cython method cudf._lib.utils.data_from_table_view()

Describe the solution you'd like
Ultimately, the solution to this problem should allow Morpheus to utilize the most recent version of cuDF without needing a patch file or custom cuDF build while maintaining the C++ performance currently available in GA. The best solution will likely require working with the cuDF team to find alternatives and upstream any changes they see fit. Below is a rough outline of some of the expected changes:

  1. Removal of a cuDF patch file and custom cuDF conda build from the Docker components
  2. Update to the IDataTable class to store C++ objects instead of the python DataFrame
  3. Update to the TableInfo class to handle slicing into the IDataTable using C++ functions instead of python
  4. Update cudf_helpers to work with the latest cuDF classes and utility functions

Describe alternatives you've considered
While updating the IDataTable and TableInfo classes to store data in C++ instead of python would be preferred, it may not be necessary. Storing the data in C++ could significantly improve performance but may complicate the implementation. If this proves to be too difficult, alternative options that find workarounds to the removed classes while preserving the functionality would also satisfy the main goal of using the latest cuDF without modifications.

[DOC] abp_nvsmi_detection example is out of date

Report incorrect documentation

Location of incorrect documentation
examples/abp_nvsmi_detection/README.md

Describe the problems or issues found in the documentation

  • Uses the deprecated buffer stage
  • Uses the visualization flag, which appears to be broken
  • Uses a Triton fork that appears to be no longer relevant and doesn't build properly

Steps taken to verify documentation is incorrect
I tried to follow the steps

[BUG] Stacktrace when interrupting (SIGINT) a from-kafka pipeline

Describe the bug
Ctrl-C of a from-kafka pipeline yields an abort and stacktrace.

====Pipeline Started====
Stopping pipeline. Please wait... Press Ctrl+C again to kill.C
From Kafka rate: 93085messages [03:45, 413.64messages/s]
Deserialization rate: 93085messages [03:45, 413.64messages/s]
Preprocessing rate: 93085messages [03:45, 413.65messages/s]
Inference rate: 93085inf [03:45, 413.65inf/s]
Serialization rate: 93085messages [03:45, 413.65messages/s]
To Kafka rate: 93085messages [03:45, 413.65messages/s]
====Stopping Pipeline====
%3|1651867570.038|ERROR|rdkafka#consumer-1| [thrd:GroupCoordinator]: 1/1 brokers are down
F20220506 20:06:10.052778  4511 executor_base.cpp:395] Check failed: search != m_segments.end() [segment id: 57864; rank: 1]: not found
*** Check failure stack trace: ***
    @     0x7fb4ed097c0d  google::LogMessage::Fail()
    @     0x7fb4ed09a7a6  google::LogMessage::SendToLog()
    @     0x7fb4ed097705  google::LogMessage::Flush()
    @     0x7fb4ed09ad6a  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb4e4d834bd  neo::ExecutorBase::segment()
    @     0x7fb4e4d84602  neo::ExecutorBase::remove_segment()
    @     0x7fb4e4d87fae  neo::ExecutorBase::do_update()
    @     0x7fb4e4d86f01  neo::ExecutorBase::update()
    @     0x7fb4e4d8f93d  neo::StandaloneExecutor::stop()
    @     0x7fb4caf2ea6c  neo::pyneo::Executor::stop()
    @     0x7fb4caea3daf  (unknown)
    @     0x7fb4caea88aa  (unknown)
    @     0x561d62c03e7e  PyCFunction_Call
    @     0x561d62bec631  _PyObject_MakeTpCall
    @     0x561d62c03bfd  method_vectorcall
    @     0x561d62be7ffc  _PyEval_EvalFrameDefault
    @     0x561d62bf42a6  _PyFunction_Vectorcall
    @     0x561d62be3d9d  _PyEval_EvalFrameDefault
    @     0x561d62be2b76  _PyEval_EvalCodeWithName
    @     0x561d62bf433c  _PyFunction_Vectorcall
    @     0x561d62d0cc6a  _PyObject_Vectorcall.lto_priv.10
    @     0x561d62ba2441  context_run
    @     0x561d62beb0c6  cfunction_vectorcall_FASTCALL_KEYWORDS
    @     0x561d62c040bf  PyVectorcall_Call
    @     0x561d62be8d2e  _PyEval_EvalFrameDefault
    @     0x561d62bf42a6  _PyFunction_Vectorcall
    @     0x561d62be3d9d  _PyEval_EvalFrameDefault
    @     0x561d62bf42a6  _PyFunction_Vectorcall
    @     0x561d62be3d9d  _PyEval_EvalFrameDefault
    @     0x561d62bf42a6  _PyFunction_Vectorcall
    @     0x561d62be3d9d  _PyEval_EvalFrameDefault
    @     0x561d62bf42a6  _PyFunction_Vectorcall
Aborted (core dumped)

Steps/Code to reproduce bug
Ctrl-C in the shell of a from-kafka pipeline

Expected behavior
No stacktrace. Graceful exit from pipeline.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of Morpheus install: [Docker/k8s]

Environment details
https://gist.github.com/pdmack/2f1924031e20995e99985cfd735fbbc3


[FEA] Python packages should be re-organized

Is your feature request related to a problem? Please describe.
Currently, Morpheus stages are not easily discoverable because they are intermixed with pipeline, utility, and message modules. Reorganizing the packages would allow users to quickly get a list of the available stages and messages.

Describe the solution you'd like
Ideally, the Python module layout should match the directory structure of our C++ headers:

_lib/include/morpheus
    io/
    messages/
    objects/
    stages/
    utilities/

Such that:

from morpheus.pipeline import LinearPipeline
from morpheus.pipeline.input.from_cloudtrail import CloudTrailSourceStage
from morpheus.pipeline.general_stages import FilterDetectionsStage
from morpheus.pipeline.messages import MessageMeta

Would become:

from morpheus.pipeline import LinearPipeline
from morpheus.stages.cloudtrail_source import CloudTrailSourceStage
from morpheus.stages.filter_detection import FilterDetectionsStage
from morpheus.messages.meta import MessageMeta

This would make it easier for users who work with both the C++ and Python APIs.

[DOC] nlp_si_detection example needs to be updated

Report incorrect documentation

  • Old Triton version
  • Reference to a git submodule
  • Remove the visualization flag
  • In C++ mode it fails an assert early: Check failed: this->mess_count == this->count At this time, mess_count and count must be the same for slicing
  • In Python mode it fails later with: ValueError: Cannot align indices with non-unique values

Location of incorrect documentation
examples/nlp_si_detection

[FEA] Add Python Stubs for Cython/PyBind11 Modules

Is your feature request related to a problem? Please describe.
Python modules created with cython/pybind11 do not show up in IDEs and fail all linting, which is a poor developer experience.

Describe the solution you'd like
Utilize mypy or pybind11-stubgen to create .pyi stubs for all C++ generated code. This allows linters and IDEs to pick up and check Python code that uses these modules.

[BUG] FilterDetectionsStage can spam the output edge with thousands of small sliced messages

Describe the bug
Degraded pipeline performance when using the filter stage with any threshold.
In this case pipeline_batch_size=8192, so messages entering the FilterDetectionsStage generally contain 8192 rows, and the FilterDetectionsStage emits roughly ~1,800 slices of each incoming message, which quickly causes the output edge to block on writes.
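To illustrate why this happens: each contiguous run of rows passing the threshold becomes one sliced output message, so a scattered boolean mask over an 8192-row batch can easily produce thousands of slices. A sketch of the counting logic (not the actual stage code):

```python
import numpy as np

def count_output_slices(mask: np.ndarray) -> int:
    """Count contiguous runs of True in `mask`; each run becomes one sliced message."""
    padded = np.concatenate(([False], mask, [False]))
    # A slice starts at every False -> True transition.
    return int(np.count_nonzero(~padded[:-1] & padded[1:]))
```

With detections scattered at random through a large batch, the number of runs approaches the number of flagged rows, which matches the ~1,800 slices observed above.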

Steps/Code to reproduce bug

morpheus --log_level=DEBUG run --use_cpp=True --num_threads=8 --pipeline_batch_size=8192 --model_max_batch_size=32 --edge_buffer_size=32 \
  pipeline-nlp --model_seq_length=256 \
  from-file --filename=./examples/data/pcap_dump.jsonlines dropna \
  monitor --description='Drop Null Attributes rate' \
  deserialize \
  monitor --description='Deserialization rate' \
  preprocess --vocab_hash_file=./morpheus/data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False \
  monitor --description='Preprocessing rate' \
  inf-triton --force_convert_inputs=True --model_name=sid-minibert-onnx --server_url=localhost:8001 --use_shared_memory=True \
  monitor --description='Inference rate' --smoothing=0.001 --unit inf \
  add-class \
  monitor --description='Classification rate' \
  filter \
  monitor --description='Filter rate' \
  serialize --exclude '^ts_' \
  monitor --description='Serialization rate' \
  to-file --filename=/tmp/sid-minibert-onnx-output.jsonlines --overwrite

Expected behavior
Performance in line with other stages.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of Morpheus install: Docker

Environment details
https://gist.github.com/pdmack/2f1924031e20995e99985cfd735fbbc3


[DOC] Add Quickstart Guide

To get started with Morpheus workflows, add a Morpheus Quickstart Guide doc for running precompiled workflows. The guide should describe how to install the Morpheus SDK and its components, as well as how to run a workflow end to end.

[BUG] tensorrt package missing leading to onnx-to-trt failure

# morpheus tools onnx-to-trt \
>             --input_model /common/models/sid-minibert-onnx/1/model.onnx \
>             --output_model /common/models/sid-minibert-trt_b1-8_b1-16_b1-32.engine \
>             --batches 1 8 \
>             --batches 1 16 \
>             --batches 1 32 \
>             --seq_length 256 \
>             --max_workspace_size 4000
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/bin/morpheus", line 11, in <module>
    sys.exit(run_cli())
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/cli.py", line 1381, in run_cli
    cli(obj={}, auto_envvar_prefix='CLX', show_default=True, prog_name="morpheus")
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/cli.py", line 136, in new_func
    return f(ctx, *args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/cli.py", line 233, in onnx_to_trt
    gen_engine(c)
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/utils/onnx_to_trt.py", line 30, in gen_engine
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'
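The immediate workaround is to install the TensorRT Python bindings into the Morpheus environment. A defensive sketch of how the import in gen_engine could surface a clearer error (require_tensorrt is a hypothetical helper, not the actual Morpheus code):

```python
def require_tensorrt():
    """Import tensorrt lazily, turning a missing package into an actionable error."""
    try:
        import tensorrt as trt  # provided by the TensorRT Python bindings
    except ImportError as exc:
        raise RuntimeError(
            "onnx-to-trt requires the 'tensorrt' package; install the TensorRT "
            "Python bindings into the Morpheus environment and retry."
        ) from exc
    return trt
```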
