craylabs / smartredis Goto Github PK

View Code? Open in Web Editor NEW

53.0 8.0 24.0 7.92 MB

SmartSim Infrastructure Library Clients.

License: BSD 2-Clause "Simplified" License

CMake 1.49% Shell 0.01% C++ 64.71% Python 12.86% C 8.94% Fortran 10.57% Makefile 0.99% Dockerfile 0.43%

redis-client redis redisai hpc machine-learning

smartredis's People

Contributors

Stargazers

Watchers

smartredis's Issues

CommandReply copy constructor and copy assignment operator

Description

Currently the CommandReply copy constructor and copy assignment operator are not implemented. However, as issue #1 is addressed, CommandReply should be fully implemented. While passing CommandReply by reference is encouraged, copy operations should be enabled and allowed.

Justification

This is a potential requirement for issue #1 as CommandReply objects are more broadly utilized for Python client commands. Additionally, if CommandReply is ever exposed to the user for custom command execution, all operators should be implemented.

Implementation Strategy

Implement copy assignment operator
Implement copy constructor

SmartRedis Fortran Client Does Not Build with PGI Compilers

Description

Building the serial Fortran client example smartredis_put_get_3D.F90 located in examples/serial/fortran with PGI compilers (nvfortran specifically) version 21.3.0 I get the following error message:

Scanning dependencies of target smartredis_dataset
[ 10%] Building Fortran object CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/fortran_c_interop.F90.o
[ 20%] Building Fortran object CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90.o
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   1 severes, 0 fatal for get_meta_scalars_i32
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   2 severes, 0 fatal for get_meta_scalars_i64
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   3 severes, 0 fatal for get_meta_scalars_float
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   4 severes, 0 fatal for get_meta_scalars_double
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   4 severes, 0 fatal for add_meta_scalar_i32
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   4 severes, 0 fatal for add_meta_scalar_i64
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   4 severes, 0 fatal for add_meta_scalar_float
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   4 severes, 0 fatal for add_meta_scalar_double
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
NVFORTRAN-S-0155-IGNORE_TKR may not be specified for meta (/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90)
  0 inform,   0 warnings,   4 severes, 0 fatal for add_meta_string
CMakeFiles/smartredis_dataset.dir/build.make:107: recipe for target 'CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90.o' failed
make[2]: *** [CMakeFiles/smartredis_dataset.dir/projects/cfdml_aesp/ML/smartredis-0.1.1/src/fortran/dataset.F90.o] Error 2
CMakeFiles/Makefile2:96: recipe for target 'CMakeFiles/smartredis_dataset.dir/all' failed
make[1]: *** [CMakeFiles/smartredis_dataset.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2

Please note that the specific branch of SmartRedis was cloned with the following command git clone https://github.com/CrayLabs/SmartRedis.git --depth=1 --branch v0.1.1 smartredis-0.1.1. The CMakeLists.txt file was used to build the example code, with the only modification being that CMAKE_FC_COMPILER is set to the desired nvfortran compiler path.

How to reproduce

The error is produced on the ThetaGPU machine at ALCF. From a clean environment obtained executing module purge, CMake version 3.19.5 and the NVIDIA HPC SDK version 21.3 are loaded. The CmakeLists.txt file available in the example directory (ensuring nvfortran is used) is then used to configure and build the smartredis_put_get_3D.F90 example code, producing the compiler error pasted above. No PrgEnv is loaded since none is available on ThetaGPU.

Expected behavior

The expected behavior is for the Fortran client to build correctly.

System

OS: Ubuntu 18.04.5 LTS
Commit/version of library: I am using the following SmartRedis commit 95ca568

Client.run_model seg faults if name is wrong

Description
The python client.run_model seg faults if the model name is wrong. This is because the C++ client throws an uncaught redis error stemming from a redis-plus-plus throw. To provide better behavior, we should have the C++ client throw a std::runtime_error to the python client for all redis errors. This would enable easier error catching in the python client.

Github Actions

Description

Setup a few workflows for github actions

build on pull request (or update to pull request)
run black, pylint and possibly others prior to pull request.
Use CIbuildwheel to build on manylinux and other OS's

Justification

We don't want to have to manually build pip wheels for every OS. We also don't want to include code that doesn't build or doesn't pass pylint or black standards. Automating this will be a huge improvement to our contributor workflow.

Implementation Strategy

Write Github Actions for each of the above items. Eventually we will want a deployer as well.

Rename, copy, and delete DataSet

Description
Add the ability to rename, copy, and delete DataSet objects from the C++ client.

Improve metadata serialization

Description
Currently, metadata in a DataSet is serialized using protobuf and associated with the key dataset_name.meta. However, this single key, value pair does not allow the user to view the name of the metadata fields using the redis CLI. Moreover, protobuf is a very large dependency to include, given that the metadata types (e.g. protobuf message types) are relatively simple and are fixed.

Justification
Users will be able to view metadata fields using the redis-cli. Additionally, build/test/release will be simplified with the removal of a large dependency.

Implementation Strategy

Change DataSet representation in the database to use hash field commands (e.g. HSET) for metadata fields. For example, a command would look like HSET dataset_name.meta meta_field_1 serialized_value. This would still use protobuf for each metadata field.
Remove protobuf for each metadata field and implement serialization strategies for each metadata field type.

Update SILC references to SmartRedis

With the new project name, all references to SILC/silc must be changed to SmartRedis/smartredis.

Expose AI.INFO Redis command to the python client

Description

Give the python client access to AI.INFO by exposing it through the Pybindings. Here is the documentation for AI.INFO. It returns general information about the execution of a given model or script. This is a portion of issue #1 .

Justification

Users will be able to obtain the information from the AI.INFO RedisAI command in the python client.

Implementation Strategy

The C++ client has this functionality and it just needs to be exposed through the Pybindings and pyclient.cpp file and to the python client layer. The implementation will be similar to the INFO command implementation.

PyBind not pinned to a version

Description

PyBind is not pinned to a version, and as a result, our github actions are failing on MacOSX. We should only pin to stable, final versions.

How to reproduce

Any PR will fail the checks on MacOSX because of a changed in PyBind.

Expected behavior

Github actions pass.

Setup_env.sh LD_LIBRARY_PATH error

Description
setup_env.sh does not set LD_LIBRARY_PATH correctly for dependencies other than protobuf. We should construct the full path and then set LD_LIBRARY_PATH.

Test TensorFlow backend

Description

RedisAI supports TensorFlow as a backend. However, only PyTorch is tested in the SmartRedis unit tests: test_model_methods_torch.py. We should
have tests that ensure TensorFlow functionality is correct in the client, specifically for input parameters that are specific to TensorFlow models (i.e. inputs and outputs in client.set_model()).

Justification

There needs to be SmartRedis tests that ensure TensorFlow functionality is implemented correctly.

Implementation Strategy

The following API functions should be tested:

Client.set_model_from_file()
Client.set_model()
Client.run_model()
Client.get_model()
Test that TensorFlow specific inputs (i.e. inputs and outputs in client.set_model) are implemented correctly for ensembles with prefixing for the above functions.

Expose custom command structure to python client

Description
There are certain commands like AI.INFO that would be great to have access to in the python client, but non-trival to include in the other compiled clients (C, C++, Fortran). For this reason, the custom command structure used from Redis Plus Plus should be exposed through the Pybindings such that the python client can call arbitrary Redis and RedisAI commands.

Justification
Users will be able to obtain useful information like INFO, AI.INFO keys and other helpful Redis commands in the python client.

Implementation Strategy
The C++ client has this functionality and it just needs to be exposed through the pyclient.cpp file to the python client layer.

Expose redis command CLUSTER INFO to python client

Description

Give the python client access to the redis command CLUSTER INFO by exposing it through the Pybindings. Here is the documentation for CLUSTER INFO. This is a portion of issue #1 .

Justification

Users will be able to obtain information about Redis Cluster vital parameters in the python client.

Implementation Strategy

The C++ client has this functionality so just needs to be exposed through the Pybindings and pyclient.cpp file to the python client layer. This will be a non-keyed command that takes in an address of a db node in the cluster. We may want to add a function to pick a db node that is active so that we don't have to initialize a new connection.

Improve compiled client test time

Description
Currently every compile test builds the entire client from source. It would be much more efficient to build the client as a library, and use that library for each client test. This would dramatically improve time to test the library

Add Fortran interface for ensemble prefixing

Description
The Fortran client misses the calls needed to run ensembles in a flexible way, using the current features which are available in the other clients.

Justification
This will help running ensembles in Fortran.

Implementation Strategy
Steps needed:

Fortran interface
Tests

Functions which still need a Fortran interface:

void use_tensor_ensemble_prefix(bool use_prefix);
void use_model_ensemble_prefix(bool use_prefix);
void set_data_source(std::string source_id);

Functions for which a Fortran interface exists (you might want to quickly review them):

bool poll_model(const std::string& name,
                int poll_frequency_ms,
                int num_tries);
bool poll_tensor(const std::string& name,
                 int poll_frequency_ms,
                 int num_tries);
bool model_exists(const std::string& name);
bool tensor_exists(const std::string& name);

Ideally a test would look like a simplified version of tests/cpp/client_test_ensemble.cpp but we don't need to run models and scripts. Just check that the keys are prefixed as expected for a tensor, a model, a script, and a dataset, if env vars SSKEYIN and SSKEYOUT are set.

Failed connection error not raised to Python

Description

If the C++ client fails to connects to a database, the failed connection is not raised as a std::runtime_error to the Python client. This should be changed as all other errors get raised as std::runtime_error to the Python client.

How to reproduce

Attempt to connect the Python client to a nonexistent database.

Expected behavior

A std::runtime_error should be raised.

System

All

Copy, delete, rename for Python client

Description

Currently the copy, delete, and rename operations for tensors and DataSet are not implemented in the Python client. These need to be implemented.

Justification

This is a requested feature in the Python client.

Implementation Strategy

These functions will follow the same behavior as the C++, C, and Fortran clients.

Python exists function error

Description

The Python client returns None instead of True or False.

How to reproduce

Any Python client that runs key_exists(), model_exists(), or tensor_exists() will run into this.

Expected behavior

The function should return a boolean answer.

System

This bug applies to all systems and is an internal Python client error.

Question about running example

I follow the instruction here: https://www.craylabs.org/docs/installation.html#smartredis to install SmartRedis. I think all of libs are successfully built. And then I compile the examples and no errors. However, when I run smartredis/examples/serial/cpp/build/smartredis_mnist, I get the error as below:

terminate called after throwing an instance of 'std::runtime_error'
what(): The environment variable SSDB must be set to use the client.
Aborted (core dumped)

Do I miss something when I install smartredis?

Upgrade RedisAI from 1.0.2 to 1.2-RC1

Currently we are pinning to RedisAI 1.0.2. However, there is a build failure in 1.0.2 related to dlpack not being pinned to v0.3. At this point, we should update to 1.2-RC1.

Implement functions for commands without keys

Description

The Command object and RedisServer object were originally written for keyed commands. However, as the ability to execute commands like INFO is added to the Python client (see #1 ), these objects need to handle non-keyed commands that might address a single database node or the entire cluster.

Justification

This is a prerequisite feature in order to be able to accomplish #1 which will give users better view of database characteristics and performance.

Implementation Strategy

The design changes to Command and RedisServer should allow for the following functionality:

A non-keyed command working for a Redis non-cluster database
A non-keyed command addressable to a single node in a cluster database
A non-keyed command addressable to all nodes in a cluster database (i.e. collect replies from all nodes in the database)

Add metadata functions to the Python DataSet

Description
The Python client should have the same metadata functionality that C/C++/Fortran has.

Pip install for the python client

Description
Make the python client "pip-installable"

Justification
Users should be able to easily pull down the python client for rapid testing and usage. This will also simplify the usage of the SILC python client within the infrastructure library.

Implementation Strategy
Most likely, there will need to be some custom build step for the silc client. We currently build the client through a command in the makefile make pyclient. This will need to be called during the pip install. For an example of how to do this, see how I did this for Arkouda Bears-R-Us/arkouda#153

Add Unicode support to client interface

Description

This enables users from all around the world to work in their native language/character set.

Justification

Users from non-English speaking countries will gain access to full character sets

Implementation Strategy

Easiest approach is to add parallel routines to the interface for Unicode support, then push the plumbing all the way through. Making sure that strings are tracked by type will be a good safety feature.

CUDNN error with multiple threads per queue with large synthetic scaling

Description
When scaling on horizon, at 200 client nodes (80 clients per node) and 16 db nodes, we get an error thrown by the redis database:

bool cask_cudnn::compareHandles<cask_cudnn::ConvolutionShader>(cask_cudnn::ConvolutionShader const*, cask_cudnn::ConvolutionShader const*) ()
   from /lus/cls01029/arigazzi/smartsim-dev/tm_improve/SmartSim/third-party/RedisAI/install-gpu/backends/redisai_torch/lib/libtorch_cuda.so

Decreasing the threads per queue to 1 seemed to fix this problem.

This was done with redisAI v1.0.2, and redis v6. Similiar issues were seen with the redisAI v1.2RC.

Client set_function Exception issue

Description
Originally, bad_function in test_errors.py had only the line:

raise Exception

However, with the new version of RedisAI, we get an error from RedisAI saying:

undefined value Exception:
  File "<string>", line 3
def bad_function(data):
    """Bad function which only raises an exception"""
    raise Exception
          ~~~~~~~~~ <--- HERE

We got around this by return False from the function instead (the exception is not actually required by the tests. However, we need to understand why this is the case.

Custom error handling

Description
Currently we throw std::runtime errors for all internal C++ client errors, including those that result for redis-plus-plus and hiredis errors. This was done initially for convenience so that we could easily handle them in the Python client. However, as client functionality grows, we need better visibility for users and developers into the errors being thrown so that better exception handling can be performed.

Implementation Strategy

Custom error classes should be implemented in core C++ client, and those errors should be wrapped into the Python client binding.

Select CPU or GPU for compiled client tests

Description
Currently, the compiled client tests have CPU specified in the source file. It might be worthwhile to specify CPU or GPU via the setup_test_env.sh file or some other mechanism so the user or developer can switch between CPU and GPU.

Incomplete or empty DataSet

Description

Currently the Client will look for the .tensors metadata field in the DataSetwhen Client.get_dataset() is called. However, a check should be performed to ensure that the field exists. Currently, a runtime_error is thrown if the field does not exist, meaning that a DataSet must contain tensors. This is not the intended behavior. A DataSet should be able to contain only tensors, only metadata, or nothing. The DataSet should have a user-facing function that can be used to check if a field exists in the DataSet. This can be used internally as well when Client.get_dataset() is executed.

How to reproduce

Perform a client.get_dataset() call on a DataSet that does not contain any tensors.

Expected behavior

The DataSet should allow for only tensors, only metadata, nothing, or tensors and metadata.

System

This behavior affects all systems

Update README

Description

The README, while not factually incorrect, does not supply a whole lot of information. In addition, we should have the same badges we use for SmartSim available in the SmartRedis repository.

Implement dataset_exists() for all clients

Description

Although this functionality is essentially the same as another _exists() command, the differences in naming have tripped up new users and new developers.

Justification

Newer users and developers will find the new routine easier to find when looking for this functionality. Further, if the implementation for dataset existence ever diverges, we'll have a start on retraining users to use the new approach.

This will require doc update as well.

Implementation Strategy

How should this feature be implemented? What will need to be done/included?

CommandReply constructors and operators for QueuedReplies

Description

As pipelines inside of SmartRedis are added to improve efficiency, code simplification would increase if CommandReply could be constructed from sw::redis::QueuedReplies and if the CommandReply [] operator could index into the sw::redis::QueuedReplies. This would allow merging and reuse of code between pipeline and non-pipeline commands.

Justification

Internal pipelines improve client efficiency, and to accelerate the implementation of pipelines CommandReply should work with QueuedReplies.

Implementation Strategy

All of the CommandReply operators and constructors should work with QueuedReplies.

C++ Code Cleanup

Description

Assorted code cleanup:

Put try/catch blocks around C++ code, mainly around new operator and memory allocation in constructors
Null pointer checks and pointer validation
Initialize all variables; do not pass uninitialized variables to other routines

Define default values for enums so that they can be passed in as initialized

Use copy constructors for all STL containers

A lot of code uses push_back instead of copy constructors

Use NULL, not 0, for pointers
NULL out pointers after deallocation
Switch from C-style casts to C++ casts
Make sure that const usage is correct and consistent
Make sure that error conditions are handled properly and locally
Make sure memory management strategy is fully document for callers
rename variables prefixed "g_" to avoid confusion with Hungarian notation (in which this would mean the variables are globals)
Refactor code in Client::poll_key(), Client::poll_model(), Client::poll_tensor()
Make an inline for prefix management in Client code
Unify memory management to all malloc/free or new/delete
Eliminate unnecessary Command::add_field() overload
Use standard iterators in Command::add_fields()
Replace while loops with for loops where it improves readability
Refactor duplicate code into MetaData::Clone()
Refactor MetaData::add_scalar() for efficiency
Check for type mismatch in MetaData::add_string()
Add default cases for all switch statements
Make sure compiler warnings are at the highest finickiness

Add sleep time to IOError

Right now we are not pausing between caught IOError attempts. This causes reliability issues when a database is momentarily unavailable because of high traffic. We should add a sleep to each catch block in RedisCluster.run() and Redis.run(). This sleep might be best as an adaptive function that waits longer each attempt.

Extend inverse CRC16 prefixing to all hash slots

Description

Currently the inverse CRC16 calculation is used to distribute models and scripts to all of the Redis cluster nodes by prefixing each model name and script name with a two character hash tag that forces the key into a hash slot owned by the database node. However, certain prefixes calculated contain special characters (i.e. "}") that prematurely conclude the hash tag, resulting in erroneous placement. Currently this is avoided by picking a different hash slot for the database node. However, to achieve full control over key placement, we need to extend the inverse prefix calculation to all hash slots.

Justification

This is important for multi-key commands where we need to group multiple keys into the same hash slot, but do not have the flexibility to pick from a range of hash slots.

Implementation Strategy

First an investigation into the frequency of problematic prefixes must be made. After that, a solution strategy can be formulated.

Python client unpack_tensor

Description
Compiled clients have unpack_tensor() to make memory management better (i.e. not have the client or dataset allocate memory and have that memory for the entire life of the object). However, Python does not have that. It seems like we could implement an analog of unpack_tensor() if the user allocated a numpy array ahead of time. This could probably be accomplished with pybind11::array::mutable_data() which appears to return a c-ptr to the underlying numpy array data. Since numpy defaults to c-contiguous memory, we should just be able to fill that (although none of this has been tested).

However, it would be more pythonic to not that the user allocate the numpy array, but just return a numpy array to them that relies on python reference counting to control scope. The path to implement this is a little less clear.

Add support for Fortran complex numeric types

Description

Fortran has built-in support for complex numeric types, and these are used in some simulations. We should plumb support for these through SmartRedis.

Justification

Fortran users who do simulations with complex types will benefit

Implementation Strategy

Paralleling existing numeric types should get us most of the way there once we figure out the in-memory representation of the complex types. There may need to be some back-end support to handle them if the data types are larger than what is supported today.

Remove output related to TimeoutError and IOError

Description

Currently warnings are printed if an IOError or TimeoutError are encountered when executing a command. This can lead to too much output when many clients are active and it is hard to read. We will only have output on failed execution after all attempts are made.

Justification

This will improve user experience.

Implementation Strategy

We will remove the warning output.

Add SILC install path to setup_env.sh

Description
Add SILC_INSTALL_PATH to setup_env.sh to make it easier for outside applications to reference the silc static library.

DockerFile

Description

SmartRedis should have a base Dockerfile that users can look at and use as a template for including with their containerized jobs.

Justification

Any user wanting to run a containerized workload with SmartRedis will be able to use the Dockerfile to build an image and include that image in their container.

Implementation Strategy

Write a DockerFile for SmartRedis that

Installs dependencies
builds the static lib into /usr/local/lib
build the python client and installs it into python environment.

CommandList and Command copy and move operators

Description

CommandList is used to group multiple commands to be executed by RedisCluster or Redis. The Command objects contained in the CommandList are dynamically allocated and currently only pointers to the dynamically created Command objects are stored. The copy and move operators for the CommandlList are not implemented. These operators should be defined and implemented.

Justification

As issue #43 is addressed, the CommandList object should be completed. The default operators defined by the compiler may not produce the intended behavior (i.e. copy operations would be a shallow copy of the underlying Command). Default move operators should be sufficient, but should be specified as such in the header with default.

Implementation Strategy

Implement copy operators for Command
Implement copy operators for CommandList
Define CommandList move operators as default
Define and document behavior of copy operators

Internal pipeline for run_model and run_script commands

Description
When executing run_model and run_script with a Redis cluster, data movement is needed to make sure that the input and output tensors conform to the hash slot data locality requirements for Redis. As a result, the run_model() and run_script() commands actually consist of multiple Redis commands. Pipelining should make these multiple commands more efficient. Note that this is not a feature to expose pipelining the user.

Justification
Pipelining the run_model and run_script API calls should increase performance.

Implementation Strategy
This feature can be implemented with the pipelining features in underlying redis libraries.

Tensor memory management in Python

Description

Currently the Python client uses SmartRedis::get_tensor() inside of the PyClient::get_tensor(). This means, that the SmartRedis::Client is holding onto dynamically allocated memory even though it is copying that memory into a PyArray. This could cause out of memory errors if PyClient::get_tensor() is used inside of loops with large arrays. Currently there is no analog to unpack_tensor() in the Python client.

Justification

It is likely users will use PyClient::get_tensor() in a loop, and we need to make sure memory management is sufficient in the Python client.

Implementation Strategy

There are two routes to solve this problem:

We implement an unpack_tensor() function in Python that mimics the C++, C, and Fortran clients. This does not seem very Pythonic, however, as Python users won't really want to create a numpy array ahead of time.
We change the behavior of Pyclient::get_tensor() to release the memory allocated by Client after it has been copied to the numpy array.

it seems like option 2 is the better behavior for Python users, but it would mean that get_tensor() would have different memory management behavior in C/C++/Fortran and Python. We would need to call this out explicitly in the documentation.

This ticket will involve change the PyClient code and adding protected methods to SmartRedis::Client to allow for the release of memory.

DataSet rename() and delete() CROSSSLOT error with ensembles

Description

Currently the DataSet ack key is the set to the name of the DataSet. In the case where the client is used in conjunction with SmartSim ensemble capabilities (e.g. SSKEYIN and/or SSKEYOUT are set), there will be an error because the ack key will take the form ensemble_name.dataset_name while all other keys are of the form {dataset_name}.meta or {dataset_name}.tensor_name. On clusters, this will cause a CROSSSLOT error because the ack key will not be in the same hash slot as the other fields.

A short-term fix for this problem could be just to separate the manipulations of the ack key for rename and delete such that there is no CROSSLOT error. This works for the current implementation because DataSet put and get commands internally are not pipelined, so we do not need to ensure everything goes to the same hash slot or database node. However, in the future we should have all DataSet entities go to the same hash slot. Possible future (more robust) solutions include:

Put the ack key/value into the redis hashes object that stores all of the DataSet metadata.
Get rid of the ack key and use the redis hash object key as the ack key.

If this either of the more robust fixes is implemented, the process for checking the existence of a DataSet will need to be changed.

How to reproduce

Export the following variables before running the client_test_delete_dataset test:

export SSKEYIN="test_ensemble"
export SSKEYOUT="test_ensemble"

This will produce terminating with uncaught exception of type sw::redis::ReplyError: CROSSSLOT Keys in request don't hash to the same slot.

Expected behavior

The ack key should work correctly with and without the ensemble functionality activated.

System

All systems

Incorrect order of library CMakeLists.txt include paths

Description

Currently, the order of include paths in CMakeLists.txt is:

include_directories(SYSTEM
   install/include
   include
   src/fortran
)

However, this means that install/include will be search before install. This is problematic for developers because when repeatedly building the library after modifying header files, the files in install/include will be used for the build. Currently, the install directory needs to be deleted to build the library after modifying header files.

The order of includes should be:

include_directories(SYSTEM
   include
   install/include
   src/fortran
)

How to reproduce

make lib
// modify header file
make lib

Expected behavior

The modified header file in include should be used for the build.

System

All OS and systems.

Expose redis command FLUSHALL in python client

Description

Give the python client access to the redis command FLUSHALL by exposing it through the Pybindings. Here is the documentation for FLUSHALL . This is a portion of issue #1 .

Justification

Users will be able delete all the keys of all the existing databases from the python client.

Implementation Strategy

The C++ client has this functionality so just needs to be exposed through the Pybindings and pyclient.cpp file to the python client layer. This will be a non-keyed command that takes in an address of a db node. As with issue #90, we may want to add a function to pick a db node that is active so that we don't have to initialize a new connection.

RedisAI WRONGTYPE error in Fortran test

Description
On some systems, when running the Fortran test suite, client_test_misc_tensor fails and the following error is returned by RedisAI:
WRONGTYPE Operation against a key holding the wrong kind of value

How to reproduce
Running the Fortran test suite on some systems.
Notice that this can easily be reproduced by first storing a string variable and then trying to store a tensor with the same key. This is happening in the Fortran suite (across two tests), and some changes to the used keys is needed.

Expected behavior
The Fortran suite should run without errors.

System

OS: RHEL
OS version: 8.1
Commit/version of library: develop
Workload manager (e.g. PBS, Slurm): Slurm
Workload manager version: 20.11.2

Test ONNX backend

Description

RedisAI supports ONNX as a backend. However, only PyTorch is tested in the SmartRedis unit tests: test_model_methods_torch.py. We should have tests that ensure ONNX functionality is correct in the client.

Justification

There needs to be SmartRedis tests that ensure ONNX functionality is implemented correctly.

Implementation Strategy

The API functions below should be tested. A potential test case could use a sckit-learn random forest model. See RedisAI examples for scikit-learn examples.

Client.set_model_from_file()
Client.set_model()
Client.run_model()
Client.get_model()
Test that the ONNX backend works with ensemble functionality for prefixing.

Rework error handling scheme

Description

The current error handling scheme consists of throwing exceptions whenever a problem is detected. While this works fine for C++ and Python clients, C and Fortran clients have no way to detect, catch, or interpret an exception. All exceptions will thus be immediately fatal. We need a softer approach that is suitable for these clients.

Justification

See above; there is essentially no ability to recover from errors in the C or Fortran client today

Implementation Strategy

The standard approach in C libraries would be creating an enumeration of potential errors that can be returned as a response code to API calls, with something like ERROR_NONE to indicate that an operation proceeded to normal completion. The API layers could then catch exceptions thrown in the higher layers, translate, and return the error. The text associated with exceptions could be saved and returned through a new get_last_error_message() command. I believe that the Fortran client could be channeled through the C client interface to allow exception catching.

Include examples in documentation

The examples directory contains simple programs showing how one can use the client in each supported language. In particular, the C++ and Fortran examples use MPI, which is not used in tests.

The documentation does not mention the examples anywhere, and we want to fix this by adding the examples to the documentation, in such a way that if examples are changed, the documentation is automatically updated.

craylabs / smartredis Goto Github PK

smartredis's People

Contributors

Stargazers

Watchers

Forkers

smartredis's Issues

Description

Justification

Implementation Strategy

Description

How to reproduce

Expected behavior

System

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

How to reproduce

Expected behavior

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

How to reproduce

Expected behavior

System

Description

Justification

Implementation Strategy

Description

How to reproduce

Expected behavior

System

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

How to reproduce

Expected behavior

System

Description

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

Justification

Implementation Strategy

Description

How to reproduce

Expected behavior

System

Description