
aihwkit's Introduction

IBM Analog Hardware Acceleration Kit


Description

IBM Analog Hardware Acceleration Kit is an open source Python toolkit for exploring and using the capabilities of in-memory computing devices in the context of artificial intelligence.

⚠️ This library is currently in beta and under active development. Please be mindful of potential issues, and keep an eye out for improvements, new features and bug fixes in upcoming versions.

The toolkit consists of two main components:

Pytorch integration

A series of primitives and features that allow using the toolkit within PyTorch:

  • Analog neural network modules (fully connected layer, 1d/2d/3d convolution layers, LSTM layer, sequential container).
  • Analog training using torch training workflow:
    • Analog torch optimizers (SGD).
    • Analog in-situ training using customizable device models and algorithms (Tiki-Taka).
  • Analog inference using torch inference workflow:
    • State-of-the-art statistical model of a phase-change memory (PCM) array, calibrated on hardware measurements from a chip containing 1 million PCM devices.
    • Hardware-aware training, with hardware non-idealities and noise included in the forward pass, to make the trained models more robust for inference on analog hardware (a minimal inference sketch follows this list).
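
As a quick illustration of the inference workflow listed above, the following minimal example uses the class and method names from the aihwkit documentation (InferenceRPUConfig, PCMLikeNoiseModel, program_analog_weights, drift_analog_weights); treat it as a hedged sketch and check the exact signatures against the installed version.

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

# Configure an inference tile that uses the PCM statistical noise model.
rpu_config = InferenceRPUConfig()
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)

model = AnalogLinear(4, 2, rpu_config=rpu_config)

# After (hardware-aware) training: program the weights onto the simulated PCM
# devices and apply conductance drift for a given inference time (in seconds).
model.eval()
model.program_analog_weights()
model.drift_analog_weights(t_inference=3600.0)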

Analog devices simulator

A high-performance (CUDA-capable) C++ simulator that allows simulating a wide range of analog devices and crossbar configurations, using abstract functional models of material characteristics with adjustable parameters. Features include (a short configuration sketch follows the list):

  • Forward pass output-referred noise and device fluctuations, as well as adjustable ADC and DAC discretization and bounds
  • Stochastic update pulse trains for rows and columns with finite weight update size per pulse coincidence
  • Device-to-device systematic variations, cycle-to-cycle noise and adjustable asymmetry during analog update
  • Adjustable device behavior for exploration of material specifications for training and inference
  • State-of-the-art dynamic input scaling, bound management, and update management schemes
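
The sketch below illustrates how these knobs are typically exposed on the Python side; the field names (dw_min, out_noise, inp_res, out_res) follow the documented device and IO parameters, and the values are purely illustrative.

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice

# Functional device model with an adjustable update step and weight bounds.
rpu_config = SingleRPUConfig(
    device=ConstantStepDevice(dw_min=0.001, w_max=1.0, w_min=-1.0)
)

# Adjustable forward-pass non-idealities: output noise and DAC/ADC resolution.
rpu_config.forward.out_noise = 0.02
rpu_config.forward.inp_res = 1 / 256.0   # effective DAC resolution
rpu_config.forward.out_res = 1 / 256.0   # effective ADC resolution

layer = AnalogLinear(4, 2, rpu_config=rpu_config)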

Other features

Along with the two main components, the toolkit includes other functionalities such as:

  • A library of device presets that are calibrated to real hardware data and based on models in the literature, along with a configuration that specifies a particular device and optimizer choice.
  • A module for executing high-level use cases ("experiments"), such as neural network training with minimal code overhead.
  • A utility to automatically convert a downloaded (e.g., pre-trained) model into its equivalent Analog model by replacing all linear/conv layers with Analog layers (e.g., for convenient hardware-aware training); see the sketch after this list.
  • Integration with the AIHW Composer platform, a no-code web experience that allows executing experiments in the cloud.
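
As an illustration of the conversion utility and the presets mentioned above, the sketch below converts a standard torch model into its analog counterpart using one of the documented presets (ReRamSBPreset); the preset name and the torchvision model are examples only and may need to be adapted to your version.

from torchvision.models import resnet18            # any pre-trained torch model
from aihwkit.nn.conversion import convert_to_analog
from aihwkit.simulator.presets import ReRamSBPreset

digital_model = resnet18()

# Replace all Linear/Conv layers with their analog counterparts, using a preset
# that bundles a calibrated device model together with an optimizer choice.
analog_model = convert_to_analog(digital_model, ReRamSBPreset())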

How to cite?

If you use the IBM Analog Hardware Acceleration Kit for your research, please cite the AICAS21 paper that describes the toolkit:

Malte J. Rasch, Diego Moreda, Tayfun Gokmen, Manuel Le Gallo, Fabio Carta, Cindy Goldberg, Kaoutar El Maghraoui, Abu Sebastian, Vijay Narayanan. "A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays" (2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems)

Usage

Training example

from torch import Tensor
from torch.nn.functional import mse_loss

# Import the aihwkit constructs.
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD

x = Tensor([[0.1, 0.2, 0.4, 0.3], [0.2, 0.1, 0.1, 0.3]])
y = Tensor([[1.0, 0.5], [0.7, 0.3]])

# Define a network using a single Analog layer.
model = AnalogLinear(4, 2)

# Use the analog-aware stochastic gradient descent optimizer.
opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)

# Train the network.
for epoch in range(10):
    # Reset the gradients accumulated from the previous iteration.
    opt.zero_grad()

    pred = model(x)
    loss = mse_loss(pred, y)
    loss.backward()

    opt.step()
    print('Loss error: {:.16f}'.format(loss))

You can find more examples in the examples/ folder of the project, and more information about the library in the documentation. Please note that the examples have some additional dependencies - you can install them via pip install -r requirements-examples.txt. You can find interactive notebooks and tutorials in the notebooks/ directory.

Further reading

We also recommend taking a look at the tutorial article that describes the usage of the toolkit:

Manuel Le Gallo, Corey Lammie, Julian Buechel, Fabio Carta, Omobayode Fagbohungbe, Charles Mackin, Hsinyu Tsai, Vijay Narayanan, Abu Sebastian, Kaoutar El Maghraoui, Malte J. Rasch. "Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference" (APL Machine Learning Journal:1(4) 2023)

What is Analog AI?

In traditional hardware architecture, computation and memory are siloed in different locations. Information is moved back and forth between computation and memory units every time an operation is performed, creating a limitation called the von Neumann bottleneck.

Analog AI delivers radical performance improvements by combining compute and memory in a single device, eliminating the von Neumann bottleneck. By leveraging the physical properties of memory devices, computation happens at the same place where the data is stored. Such in-memory computing hardware increases the speed and energy efficiency needed for next-generation AI workloads.

What is an in-memory computing chip?

An in-memory computing chip typically consists of multiple arrays of memory devices that communicate with each other. Many types of memory devices such as phase-change memory (PCM), resistive random-access memory (RRAM), and Flash memory can be used for in-memory computing.

Memory devices have the ability to store synaptic weights in their analog charge (Flash) or conductance (PCM, RRAM) state. When these devices are arranged in a crossbar configuration, an analog matrix-vector multiplication can be performed in a single time step, exploiting the advantages of analog storage capability and Kirchhoff’s circuit laws. You can learn more about it in our online demo.

In deep learning, data propagation through multiple layers of a neural network involves a sequence of matrix multiplications, as each layer can be represented as a matrix of synaptic weights. The devices are arranged in multiple crossbar arrays, creating an artificial neural network where all matrix multiplications are performed in-place in an analog manner. This structure allows deep learning models to run at reduced energy consumption.
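
The following back-of-the-envelope example (plain torch, not aihwkit code) illustrates the idea: with weights stored as conductances and inputs applied as voltages, the currents collected on each output line are exactly one matrix-vector product, performed in a single step by the array.

import torch

# Device conductances in siemens: one row per output line, one column per input.
G = torch.tensor([[1.0, 2.0, 0.5],
                  [0.2, 1.5, 1.0]]) * 1e-6
# Input voltages applied to the columns, in volts.
v = torch.tensor([0.1, 0.2, 0.3])

# Ohm's law gives the current through each device (I = G * V), and Kirchhoff's
# current law sums the currents on each output line: one matrix-vector product.
i_out = G @ v
print(i_out)   # output currents in amperes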


Installation

Installing from PyPI

The preferred way to install this package is by using the Python package index:

pip install aihwkit

Conda-based Installation

There is a conda package for aihwkit available on conda-forge. It can be installed in a conda environment running on Linux, or on WSL under Windows.

  • CPU

    conda install -c conda-forge aihwkit
  • GPU

    conda install -c conda-forge aihwkit-gpu

If you encounter any issues during download or want to compile the package for your environment, please take a look at the advanced installation guide. That section describes the additional libraries and tools required for compiling the sources using a build system based on cmake.

Docker Installation

For GPU support, you can also build a docker container following the CUDA Dockerfile instructions. You can then run a GPU-enabled docker container using the following command from your project directory:

docker run --rm -it --gpus all -v $(pwd):$HOME --name aihwkit aihwkit:cuda bash

Authors

IBM Research has developed IBM Analog Hardware Acceleration Kit, with Malte Rasch, Diego Moreda, Fabio Carta, Julian Büchel, Corey Lammie, Charles Mackin, Kim Tran, Tayfun Gokmen, Manuel Le Gallo-Bourdeau, and Kaoutar El Maghraoui as the initial core authors, along with many contributors.

You can contact us by opening a new issue in the repository or alternatively at the [email protected] email address.

License

This project is licensed under Apache License 2.0.


aihwkit's Issues

Add weight drift

Description and motivation

Add the possibility to drift the weights during training following a power law (similar to the existing weight decay); a sketch of the intended behavior is shown below.
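
A minimal sketch of what such a power-law drift could look like (illustrative pseudocode for the feature request, not the toolkit's implementation; the exponent value is just an example):

import torch

def drift_weights(weights: torch.Tensor, t: float, t0: float = 1.0, nu: float = 0.06) -> torch.Tensor:
    """Return the weights drifted from reference time t0 to time t (t >= t0 > 0)."""
    return weights * (t / t0) ** (-nu)

w0 = torch.randn(4, 2)
w_after_one_hour = drift_weights(w0, t=3600.0)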

Proposed solution

Alternatives and other information

Add more types of resistive devices

Description and motivation

The underlying C++ simulator has a number of resistive device types that have not yet been exposed in the Python layers, in particular:

  • Difference
  • LinearStep
  • Vector
  • Transfer
  • ExpStep

Making them available would facilitate using the simulator and experimenting with its features.

Proposed solution

Expose the remaining types of resistive devices, taking as basis the way ConstantStep has been exposed through the different layers.

Alternatives and other information

Enhance installation in conda environments

Description and motivation

Currently, the build system and documentation assume that the package will be installed in a pure-python environment. However, some of the steps could be simplified for conda users, taking advantage of some of the pre-packaged dependencies.

Proposed solution

Revise the install instructions, and the cmake configuration.

Alternatives and other information

Allow .to() usage for analog layers

Description and motivation

Currently, the .to() method of analog layers is not fully functional. At the moment the recommended way of moving layers to GPU is via .cuda() directly. Ideally, we should support:

  • moving the layers back to cpu
  • seamless usage of .to() with both devices

Proposed solution

The .apply() and ._apply() methods used internally in torch for that purpose are likely to make the implementation tricky, as they are meant to operate recursively only on the layer Parameters and Buffers. We should evaluate whether it is feasible to fully tackle it without resorting to turning the Tile into a Tensor-like structure (which is likely desirable, but longer term). As a first stage, focusing on AnalogSequential, where we have more control over the recursion, can be an option.

Alternatives and other information

Remove the numpy-specific tile

Description and motivation

During the initial stages of the toolkit, a numpy-based tile was created as an intermediate step, using numpy as the preferred format for the structures passed between the layers. As it has been a while since the tile was strictly needed, and both rpucuda and the upper layers are now centered around torch tensors, it is a good time to finally remove it.

Proposed solution

Remove the numpy tiles and its tests.

Alternatives and other information

CUDA 11 build compatibility

Description and motivation

It seems that CUB is included with the CUDA Toolkit since version 11, which can cause issues during the build (thanks @chaeunl for the valuable feedback and troubleshooting!):

$ python setup.py install -DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES="80"
[ 14%] Built target cub
[ 50%] Built target RPU_CPU
[ 51%] Building CUDA object CMakeFiles/RPU_GPU.dir/src/rpucuda/cuda/bit_line_maker.cu.o
In file included from /usr/local/cuda-11.1/targets/x86_64-linux/include/thrust/system/cuda/detail/execution_policy.h:33:0,
                 from /usr/local/cuda-11.1/targets/x86_64-linux/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /usr/local/cuda-11.1/targets/x86_64-linux/include/thrust/iterator/detail/iterator_facade_category.h:22,
                 from /usr/local/cuda-11.1/targets/x86_64-linux/include/thrust/iterator/iterator_facade.h:37,
                 from /.../aihwkit/_skbuild/linux-x86_64-3.8/cmake-build/cub-prefix/src/cub/cub/iterator/arg_index_input_iterator.cuh:48,
                 from /.../aihwkit/_skbuild/linux-x86_64-3.8/cmake-build/cub-prefix/src/cub/cub/device/device_reduce.cuh:41,
                 from /.../aihwkit/_skbuild/linux-x86_64-3.8/cmake-build/cub-prefix/src/cub/cub/cub.cuh:53,
                 from /.../aihwkit/src/rpucuda/cuda/bit_line_maker.cu:24:
/usr/local/cuda-11.1/targets/x86_64-linux/include/thrust/system/cuda/config.h:78:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
#error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.

Proposed solution

We should revise the use of CUB in the build system. Currently, we attempt to find it, and if that is not possible, we automatically download and include the package. This might not be needed at all for CUDA 11 (as it might be included in the default CUDA header paths), or the THRUST_IGNORE_CUB_VERSION_CHECK flag might allow bypassing the check and using the downloaded version (which might not be ideal, though).

Alternatives and other information

Weight initialization

Description

There are two issues on the weight initialization:

  1. It seems that aihwkit follows He's initialization, but the max/min bound is not correct.
  2. Some memory devices have their own bounds, and those bounds can be smaller than the bounds of Xavier's initialization.

How to reproduce

  1. From [1], the expected range of weights for an aihwkit.nn.AnalogLinear layer with 256 fan-ins and 128 fan-outs should be -0.12 to 0.12. But, as shown in the attached screenshot, the observed range does not match (it is instead half of the max/min value).
    So, I think it would be better to modify the allowed range of weights, or alternatively to support various types of initialization methods (a reference bound calculation is included after [1] below).

  2. The attached figure shows the response curve of an aihwkit.simulator.configs.devices.LinearStepDevice whose slope_up and slope_down are 0.0083.
    As you might expect, if the number of neurons increases, the allowed range of the device may no longer match the range of the initial weights. Although it is rare, somebody may also report this issue in the future. I don't have concrete ideas for this issue, but it would be a good alternative if people could modify the initial weights easily.

[1] Kaiming He, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015.
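
For reference, here is the quick bound calculation mentioned in item 1 above (illustrative only, not aihwkit code); both common uniform-initialization bounds are computed for the 256/128 layer shape:

import math

fan_in, fan_out = 256, 128
xavier_bound = math.sqrt(6.0 / (fan_in + fan_out))   # ~0.125
he_bound = math.sqrt(6.0 / fan_in)                    # ~0.153
print(xavier_bound, he_bound)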

Expected behavior

Other information

  • Pytorch version: 1.8.0
  • Package version:
  • OS: Ubuntu 18.04.2
  • Python version: 3.8
  • Conda version (or N/A):

-D BUILD_TEST=ON fails during compilation

Description

Currently, it is not possible to use the -DBUILD_TEST=ON flag during compilation, as it results in linking errors. The culprit seems to be that the googletest library is compiled using a different libstdc++ ABI (old vs. new) than pytorch on most distributions and systems.

How to reproduce

Compiling with -DBUILD_TEST=ON.

Expected behavior

Other information

As a workaround, manually compiling googletest with the right ABI should re-enable compilation of the tests.

The usage of TransferCompound device

Description

In the docs, I think it is not clear which device is "transferred" or "transferring". Following the paper [1], there are two types of arrays: A for the accumulation of gradients, and C for the weights of the neural network. When defining devices via the unit_cell_devices parameter, we have no idea which index of the device corresponds to A and which to C. As I looked over the C++ source code, the 0-th index of unit_cell_devices corresponds to A and the 1-st index to C (see the configuration sketch below).
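
A hedged configuration sketch matching that reading (index 0 for the gradient-accumulation array A, index 1 for the weight array C); the device types and parameter values are illustrative, not a recommendation:

from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, SoftBoundsDevice

rpu_config = UnitCellRPUConfig(
    device=TransferCompound(
        unit_cell_devices=[
            SoftBoundsDevice(w_min=-0.3, w_max=0.3),  # index 0: "A" (gradient accumulation)
            SoftBoundsDevice(w_min=-0.6, w_max=0.6),  # index 1: "C" (network weights)
        ],
        transfer_every=1,
    )
)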

How to reproduce

Expected behavior

I hope the details of unit_cell_devices usage can be described in the documentation.

Other information

[1] Tayfun Gokmen and Wilfried Haensch, Algorithm for Training Neural Networks on Resistive Device Arrays, Frontiers in Neuroscience, 2020.

  • Pytorch version: 1.8.0
  • Package version:
  • OS: Ubuntu 18.04.2
  • Python version: 3.8
  • Conda version (or N/A):

Serialization of children analog layers

Description

Currently, if a model makes use of analog layers as children layers, it seems serialization is not handled properly:

load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
	Unexpected key(s) in state_dict: "analog_tile_state"

How to reproduce

Attempt to load the state dict of a model that has analog children layers.
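
A minimal hedged reproduction (not taken from the original report) could look like the following, with an analog layer used as a child of a plain torch Module:

from torch import nn
from aihwkit.nn import AnalogLinear

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = AnalogLinear(4, 2)   # analog child layer

    def forward(self, x):
        return self.fc(x)

model = Model()
state = model.state_dict()
model.load_state_dict(state)   # reported to raise: Unexpected key(s) "analog_tile_state"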

Expected behavior

Other information

It seems the issue is related to the way the custom analog_tile_state is used in the state dicts, causing it always to be appended at the top level instead of respecting the prefix applied to the children layers.

Document aihwkit.simulator.rpu_base in readthedocs

Description and motivation

Currently, aihwkit.simulator.rpu_base is not properly displayed in the API reference. This is due to not being able to compile the simulator in the readthedocs build environment, as it does not allow the same flexibility when installing external packages (and at the moment the bindings need to be compiled in order to produce its documentation correctly).

It would be great to have online documentation for that module as well, in the same way as the rest of the modules.

Proposed solution

Alternatives and other information

We might have some luck using .pyi stubs - even if they seem not fully supported yet in sphinx (sphinx-doc/sphinx#4824), there are pointers for other options such as https://github.com/readthedocs/sphinx-autoapi.

Add new optimizer: Adam

Description and motivation

As part of widening the scope of the library, it would be great to include more analog optimizers that can be used in different contexts and use cases. One of the likely first candidates would be one of the Adam variations, due to its popularity.

Proposed solution

In a similar way to AnalogSGD, create a new subclass of the chosen torch optimizer, customizing its behavior when passing through the analog layers.

Alternatives and other information

  • As this would be the second optimizer, some refactoring is also likely in order to generalize the functionality common to both optimizers (thanks Corey for the suggestion!) - in particular, regroup_param_groups() seems like a good candidate for a generic function shared by all analog optimizers through a base class or similar means.
  • Longer term, we should combine this approach with others that allow us to "adapt" other non-analog optimizers more easily - for example, making the tiles behave more like tensors, or dividing the optimizers between analog and non-analog. These approaches are a bit more far-fetched.

Avoid osx compilation warnings

Description and motivation

Under recent versions of macOS/Xcode, a couple of warnings are emitted during compilation. While they seem harmless, it would be nice to tackle them in an effort to keep the output readable and not hide other (future) warnings.

Proposed solution

Revise the compilation output on macOS, and update the C++ sources to fix the warnings.

Alternatives and other information

Revise modifier types test

Description and motivation

As a follow-up to #90, it would be nice to revise test_post_forward_modifier_types in order to be able to run each test independently.

Proposed solution

Alternatives and other information

ImportError: cannot import name 'AnalogSGD' from 'aihwkit.optim'

Description

I tried to run the .py files under tests/ and examples/, but they return an error: ImportError: cannot import name 'AnalogSGD' from 'aihwkit.optim'

How to reproduce

python tests/test_utils.py

Expected behavior

Other information

Followed the installation guide; the Python version is 3.7.3.

  • Pytorch version:
  • Package version: 0.2
  • OS: macOS Catalina
  • Python version: 3.7.3
  • Conda version (or N/A):

Revise Python 3.9 compatibility

Description and motivation

As Python 3.9 has been released for a reasonable amount of time, we should ensure that the package is fully compatible with it, and update the tooling (travis, etc.) and other meta-information.

Proposed solution

Test 3.9 compatibility and provide fixes as needed.

Alternatives and other information

Add visualization utilities

Description and motivation

It would be nice to add a small visualization module that provides convenient plotting functions, which would help users explore the different RPU configurations and other core entities.

Proposed solution

Add plotting functions as part of a module, which would also serve as the start of a "utilities" sub-package aimed at providing convenient tools and helpers on top of the core functionality.

Alternatives and other information

Refactor common functionality of AnalogConv layers

Description and motivation

The analog convolution layers (AnalogConv1d, AnalogConv2d and AnalogConv3d) share quite a good chunk of functionality. As we now have enough of them, it would be a good time to generalize their shared pieces, in order to avoid code repetition.

Proposed solution

The digital Conv classes already share a common ancestor in PyTorch, with some interesting reuse. It can serve as inspiration, or be taken into account, when coming up with the generalization.

Alternatives and other information

Add Windows build support

Description and motivation

Currently, the build system has not been checked and adjusted to support Windows-based systems. While in theory cmake should help us in the process, it has not been tested and is likely lacking some Windows-specific adjustments. We should revise the build system to allow easy compilation on Windows machines, and update the documentation accordingly if needed.

Proposed solution

Alternatives and other information

Travis Windows support can help in automating and verifying the changes.

Add logging support

Description and motivation

As the toolkit evolves, there are some cases where it would be good to have a system for emitting information at several levels, in order to help debugging and allow users to choose the granularity of the displayed information.

Proposed solution

Establish a convention for logging and make use of the logging module sparingly, identifying the cases where producing output would be useful.
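
A minimal sketch of the convention, assuming a hypothetical aihwkit/logger.py helper module (the module and helper names are illustrative):

import logging

def get_logger(name: str) -> logging.Logger:
    """Return a package-namespaced logger, leaving handler setup to the user."""
    logger = logging.getLogger('aihwkit.{}'.format(name))
    logger.addHandler(logging.NullHandler())  # library default: emit nothing unless configured
    return logger

# Usage inside a module:
LOGGER = get_logger('simulator.tiles')
LOGGER.debug('Creating analog tile of size %d x %d', 256, 128)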

Alternatives and other information

Specifying AnalogLinear or AnalogConv2d to run on cuda

Thank you for you guys to give detailed replies for any issues.

I am now composing a neural network that consists of AnalogLinear modules. But when I try to specify the device, I have to specify each AnalogLinear module to run on cuda, using .cuda(). As you know, PyTorch modules automatically run on cuda when users specify the network to run on cuda. But with AnalogLinear or AnalogConv2d, without specifying each layer to use cuda, an error message occurs.

I am not sure how much effort this improvement requires. It would be convenient if each module automatically ran on cuda by specifying it at the network level.
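
For reference, a hedged illustration of the behavior described above: wrapping the analog layers in AnalogSequential (rather than nn.Sequential) lets a single .cuda() call propagate to every analog child layer (this requires a CUDA-enabled build of aihwkit):

from aihwkit.nn import AnalogLinear, AnalogSequential

model = AnalogSequential(
    AnalogLinear(784, 256),
    AnalogLinear(256, 10),
).cuda()   # moves all analog tiles to the GPU in one call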

Pytorch 1.7 compatibility

Description

As pytorch 1.7 was released yesterday, we should check the compatibility with the newer version.

Other information

At a glance:

  • it seems that other than an extra CUDA warning when running the tests if the host does not have a recent cuda version, the test suite is running fine.
  • if installing via the wheels, an error at linking time might be present.

Revise bindings docstrings

Description and motivation

While working on #115, it seems the documentation for some functions in the bindings might have fallen out of sync with the latest updates. It would be nice to double-check that the documentation is accurate, as for the bindings we cannot take advantage of pylint and the tooling.

Proposed solution

Check that the docstrings blocks match the current function signatures (latest updates have been #102
and #115), and update them accordingly.

Alternatives and other information

Enhance decay_weights in AnalogSGD

Description and motivation

Currently, some aspects of using decay_weights in AnalogSGD differ a bit from the standard SGD. It would be great if we could refine the remaining aspects and revise the current documentation to highlight the differences and potential pitfalls to users, and streamline its usage a bit.

Proposed solution

Alternatives and other information

This is meant to be a small, "refinement" issue - not meaning to include bigger changes to the optimizer.

Would there be any way of fixing the value of specific weights in nn.AnalogLinear?

Hi! I'm trying to modify the values of specific weights in an nn.AnalogLinear layer.
I'm currently using an MLP (784-256-128-10), and each layer is made up of nn.AnalogLinear.
I know how to access each weight value and initialize it, but what I want to do is not only to modify the values but to fix them for the entire training process, without affecting the gradients.
Would there be any way of doing this using AnalogLinear?

(Sorry for not properly adding the label 'question', that I'm so new to this github system..!)
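
One possible workaround, sketched below under the assumption that the layer's get_weights()/set_weights() methods are available, is to re-write the pinned entries from a saved copy after every optimizer step (a hedged suggestion, not an official recipe, and it only approximates "without affecting the gradients"):

import torch
from aihwkit.nn import AnalogLinear

layer = AnalogLinear(784, 256)

weights, biases = layer.get_weights()
mask = torch.zeros_like(weights, dtype=torch.bool)
mask[0, :10] = True                       # entries to keep fixed
fixed_values = weights[mask].clone()

def restore_fixed_weights():
    """Call after each optimizer step to overwrite the pinned entries."""
    current, current_bias = layer.get_weights()
    current[mask] = fixed_values
    layer.set_weights(current, current_bias)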

More informative detection of dependency versions

Description and motivation

While we have a manageable number of build dependencies and we list the required versions in the docs, for some of them we don't explicitly check during the cmake build, which can lead to obscure, hard-to-debug problems during compilation.

By checking the versions at compile time we would be able to troubleshoot quicker, and overall provide a smoother experience when building. One particularly tricky case is pybind11, which is lagging behind a bit on some distributions and might go unnoticed until the final stages of the compilation.

Proposed solution

Make use of cmake commands for defining the minimal versions, or other means that allow failing the build early if unsupported versions are detected.

Alternatives and other information

Include analog device presets

Description and motivation

While using the family of RPUConfig devices and adjusting their parameters manually offers a great deal of flexibility, for some use cases being able to start from a set of pre-defined, interesting configurations would be useful. This would help users become familiar with the different options, serve as a basis for exploring the toolkit, and allow using "well-known" configurations off the shelf for a number of purposes.

Proposed solution

Add a number of curated analog device presets to the package, representing devices in existing literature or other specific configurations that seem relevant or interesting.

Alternatives and other information

Get error when trying with Windows 10:

with open(version_path) as version_file:

I got the error message "UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 3855: illegal multibyte sequence." This is solved by adding encoding="UTF-8" when opening the file
(i.e., open(version_path, encoding="UTF-8")).

The same issue occurs at line #37.

Add analog LSTM layer

Description and motivation

Continuing the approach of adding new analog layers to the library that was recently tackled in #47, it would be interesting to include an analog LSTM layer.

Proposed solution

Follow the same approach used for the previous layers: add a new class that sticks to the interface of the existing pytorch digital counterpart, and make use of AnalogModuleBase for setting up the tile and the rest of the analog-specific tweaks.

Alternatives and other information

This issue also implies revising whether other parts of the library need to be adjusted in order to support LSTM.

Revise forward_indexed for CUDA

Description

During some tests using the CUDA-compiled version of the library, the following error is displayed:

TypeError: forward_indexed(): incompatible function arguments. The following argument types are supported:

It seems the forward_indexed binding in rpu_base_tiles_cuda.cpp needs to be revised in order to match its CPU counterpart.

How to reproduce

Expected behavior

Other information

  • Pytorch version: 1.7.1
  • Package version: master
  • OS: Linux (CUDA 11.2)
  • Python version: 3.9
  • Conda version (or N/A): N/A

Add additional examples

Description and motivation

As part of improving the documentation and describing the usage of the different features, it would be ideal to include more examples in the examples/ directory.

Proposed solution

Alternatives and other information

Error encountered when: Trying to import aihwkit inside Jupyter Notebook

Description

Hello, in my last comment about the Jupyter notebook a few issues ago, I said I would post about the Jupyter notebook error with aihwkit. Please see the attached output below.

(Screenshot attached in the original issue: aihwkit import error in the Jupyter notebook.)

How to reproduce

So basically, I've followed the advanced and development guides for starting the virtual environment and building aihwkit from scratch, and that works (with conda managing the dependencies). But now I would like to run the example codes in Jupyter Notebook, and I'm getting an aihwkit import error (Symbol not found).

python3 -m venv aihwkit_env
cd aihwkit_env/
source bin/activate
cd aihwkit
git clone https://github.com/IBM/aihwkit.git

*** So after I've done the above steps, installed the dependencies with conda (conda activated also), and built aihwkit with setup.py, I started a Jupyter Notebook and put that notebook inside the "aihwkit" folder (where the python venv and conda environment are). Then, after trying to run the example code, I get the import error.

Note: I have both conda activated and python venv activated within this folder (aihwkit_env)

Is there a path I'd have to set for the Jupyter Notebook?

Expected behavior

The expected behavior is that I should be able to run the code with no problem, as I have already built the project from scratch with setup.py.

Other information

  • Pytorch version: 1.60
  • Package version: aihwkit 0.2.0
  • OS: OSX 10.15.7
  • Python version: 3.8.6
  • Conda version (or N/A): 4.9.2

Simplify and streamline testing of multiple tiles and devices

Description and motivation

In a relatively large number of cases, we have tests that perform the same check but vary with regard to the tile or device that they test. Currently, each test file handles the variation in a "similar but not quite identical" way, mostly relying on TestCase inheritance and mixins.

For maintainability, it seems we could extract the functionality they have in common and streamline it a bit, making it easier to add new tests and update existing ones. This will become especially relevant as the number of tiles/devices/layer types grows, for example during #6.

Proposed solution

There are several approaches that can be combined:

  • unifying the get_tile variations into a factory-like method
  • making use of a parametrizing library (a sketch follows below)
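
A hedged sketch of that idea, assuming the third-party parameterized package and illustrative tile classes and helper names (the real test helpers would need to be adapted):

from unittest import TestCase
from parameterized import parameterized_class

from aihwkit.simulator.tiles import AnalogTile, FloatingPointTile

@parameterized_class(('tile_cls',), [(AnalogTile,), (FloatingPointTile,)])
class TileTest(TestCase):
    """The same checks, run once per tile class."""

    def get_tile(self, out_size, in_size):
        # Factory-like helper shared by all parametrized variants.
        return self.tile_cls(out_size, in_size)

    def test_weight_shape(self):
        tile = self.get_tile(3, 4)
        weights, _ = tile.get_weights()
        self.assertEqual(tuple(weights.shape), (3, 4))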

Alternatives and other information

Flaky test in test_inference_tiles.py

Description and motivation

There is a test that seems to be failing occasionally:

tests/test_inference_tiles.py:160: in test_post_forward_modifier_types
    self.assertNotAlmostEqualTensor(x_output, x_output_post)
    tests/helpers/testcases.py:36: in assertNotAlmostEqualTensor
    assert_raises(AssertionError, self.assertTensorAlmostEqual, tensor_a, tensor_b)
E   AssertionError: AssertionError not raised by assertTensorAlmostEqual

Proposed solution

Check the test and update it as needed - it seems to be an issue with the test rather than with the code.

Alternatives and other information

Support basic hardware-aware training

Description and motivation

As part of the next round of features, we should enable hardware-aware training (albeit in a basic form that can be expanded in later iterations).

Proposed solution

Provide an interface for applying during forward pass:

  • simple additive noise
  • DACs, ADCs applied during forward pass

Alternatives and other information

Provide CUDA-enabled wheels

Description and motivation

For the initial 0.1.0 release, the pre-packaged wheels are compiled without GPU support. This requires users to explicitly compile the library with CUDA enabled, which can be a non-trivial process - while we have some documentation in place, distributing GPU-enabled wheels in upcoming releases would be more convenient.

Proposed solution

Update the wheel build process in order to enable the USE_CUDA (-DUSE_CUDA=ON) flag.

Alternatives and other information

Segmentation fault (core dumped)

Description

Segmentation fault (core dumped) during rebuilding a neural network model.
This is sample code:

def create_model(k):
    model = AnalogSequential(AnalogLinear(784, k), nn.Sigmoid(), AnalogLinear(k, 10)).cuda()
    return model

x = [100,200,300]
for i in x:
    ...
    model = create_model(i)
    ...

When I run the above code, after the first iteration, I got the message, "segmentation fault (core dumped)".
If I replace AnalogSequential with nn.Sequential and AnalogLinear with nn.Linear, it works. So, I think PyTorch and cuda are not the cause of this issue. In addition, if I run without cuda(), it also works.

How to reproduce

Expected behavior

Other information

  • Pytorch version: 1.7.1
  • Package version: git clone
  • OS: Ubuntu 18.04.2
  • Python version: 3.8
  • Conda version (or N/A):
  • NVIDIA Graphic Driver: 450.51.05
  • CUDA Version: 11.0
  • GPU: RTX 6000

Issues on installation with latest CUDA toolkit and PyTorch

Description

Hello,
For several reasons, I am trying to update my CUDA toolkit and NVIDIA driver. As PyTorch only supports CUDA 11.0, I installed PyTorch 1.8.0 from a wheel. But during the installation of aihwkit, I got an error message saying:

-- The CUDA compiler identification is unknown
-- Check for working CUDA compiler: /usr/bin/nvcc
CMake Error in /home/chaeunl/aihwkit/_skbuild/linux-x86_64-3.8/cmake-build/CMakeFiles/CMakeTmp/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "cmTC_6aef9".

A few months ago, I also tried with PyTorch 1.8.0 and it was successful. The differences are the CUDA/cuDNN/NVIDIA driver versions (previously, CUDA=11.1, cuDNN=8.0.4, NVIDIA driver=455.xx). Also, I tried with the same command: python setup.py install -DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES="80"

I confirmed that cmake, pybind11, blas, and scikit-build satisfy the conditions described in installation page. Could you let me know how to fix it?

  • Pytorch version: 1.8.0.dev20210208+cu110
  • Package version: -
  • OS: 18.04.2
  • Python version: 3.8.5
  • Conda version (or N/A):
  • CUDA: 11.2
  • cuDNN: 8.1
  • NVIDIA driver: 460.39
  • VGA: RTX 3090

Provide pre-built conda package

Description and motivation

In the same way we provide pip wheels, it would be great to also provide conda packages for each release, for the convenience of end users. This continues the work in #21 and might help close #58 (in terms of ironing out and testing further).

Proposed solution

We currently depend on two packages that are not in the main anaconda channel (pytorch and scikit-build). We should find out if it is possible to contribute to conda-forge and follow its procedures, which is incidentally the channel where scikit-build is available (and it seems that there is also a pytorch recipe, although it lags behind the official channel quite a bit).

Alternatives and other information

If depending on channels outside conda-forge is not an option, an alternative might be to register our own channel, although that might be a bit far-fetched. Another alternative would be including a base meta.yaml in this repository, along with instructions on how to build the package manually.

Start Exception hierarchy

Description and motivation

As the functionality grows, it would be a good time to introduce a small number of custom Exceptions, in order to allow users finer-grained control over unexpected behavior and to give us room to expand in a controlled way.

Proposed solution

Introduce an aihwkit.exceptions module with a minimal number of useful base exceptions, as sketched below.
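
A minimal sketch of what such a module could contain (the class names are illustrative, not necessarily the ones that would ship):

class AihwkitError(Exception):
    """Base class for all exceptions raised by aihwkit."""

class ModuleError(AihwkitError):
    """Error related to analog neural network modules."""

class TileError(AihwkitError):
    """Error related to analog tiles and the simulator bindings."""

class ConfigError(AihwkitError):
    """Error related to RPU configurations and their parameters."""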

Alternatives and other information

The learning rate of analog tile for TransferCompound

Description and motivation

I am trying to change the learning rate of an analog tile using nn.AnalogLinear.analog_tile.set_learning_rate(). However, in the case of TransferCompound, I think there is no way to change the learning rate for each of the devices defined in TransferCompound. For instance, if I define a TransferCompound having two devices, A and C (supposing that each device is defined in a different tile), then there is no way to adjust the learning rate of the tile of A or of C, respectively.

  • Question: I confirmed that the learning rate of an analog tile equals dw_min of the device. I want to know whether there is any specific reason to set the learning rate equal to dw_min.

Proposed solution

It would be great if a tile defined with a TransferCompound device could have multiple learning rates, one for each device, or if the device could be defined with multiple tiles.

Use pip-installable pybind11 2.6.0

Description and motivation

Currently, we recommend installing pybind11 from the package manager (or compiling it manually), mostly because the pip versions did not include the .cmake files. It seems the latest release finally includes them in the pip package, which would simplify the instructions and the setting up of pybind for the build quite a bit.

Proposed solution

Test the new version and update the cmake and documentation accordingly.

Alternatives and other information

Add examples for presets and visualization

Description and motivation

After the introduction of #133 and #141, it would be good to add some examples, or revise the existing ones, as a quick way of showing the usage of the new functionality.

Proposed solution

Add new examples or update the existing ones.

Alternatives and other information

Is Aihwkit compatible with AMD EPYC 7002 model?

Hi again!
It feels like I ask too many questions, but I'm now actively exploring this nice toolkit :3
I'm trying to buy a new PC with an AMD EPYC 7002 CPU, but I'm a little worried that it might not be compatible with aihwkit.
So before I buy, I just want to check that aihwkit works okay with an AMD processor.
Would there be any issues if I use an AMD CPU for aihwkit?

Inference with PCM statistical models

Description and motivation

As part of the next round of features, we should add the ability to perform inference using PCM statistical models.

Proposed solution

  • provide an abstraction of PCM statistical models
  • provide convenience integrations for performing inference
  • integrate into existing forward pass:
    • programming noise
    • read noise
    • drift

Alternatives and other information

Add README.md for examples/ folder

Description and motivation

As we have had a number of examples since 0.2.0, it would be nice to have a summary in the examples/ folder to make it easier to navigate through them (and pave the way for additional examples as well).

Proposed solution

Alternatives and other information

Additional options to control pulse update on each device

void DifferenceRPUDevice<T>::doSparseUpdate(

To make use of reference devices, which correspond to the value "0" of the weights, pulse updates should be controlled for each device. The behavior of reference devices is described in the paper [1]. It would be convenient for users to have the following options:

  1. Fix the conductance of the reference device, so that the other device is the only one updated by pulses.
  2. Modulate the updating of the reference device, not only to make use of techniques such as zero-shifting [1], but also to design custom update rules for the device.

In addition, it is recommended that every device inheriting from the aihwkit.simulator.configs.devices.PulsedDevice base have a reference device. So I think it would be better to modify the code at the level of the device class, in rpu_pulsed.h and rpu_pulsed.cpp.

[1] Kim, Hyungjun, Malte Rasch, Tayfun Gokmen, Takashi Ando, Hiroyuki Miyazoe, Jae-Joon Kim, John Rozen, and Seyoung Kim. "Zero-shifting Technique for Deep Neural Network Training on Resistive Cross-point Arrays." arXiv preprint arXiv:1907.10228 (2019).

Add new layers: Conv1d and Conv3d

Description and motivation

As part of widening the scope of the library, it would be great to include more analog layers that can be used in different contexts and use cases. Likely first candidates are additional convolution layers, taking advantage of the fact that we already have the AnalogConv2d layer implemented and that they would share quite some functionality.

Proposed solution

Using AnalogConv2d as a reference, implement other types of convolution layers. Note that torch actually uses an internal _ConvNd module as the basis for its convolution layers: it seems likely that the same approach would help us avoid some duplication and streamline the new layers.

Alternatives and other information

Use add_compile_options in cmake

Description and motivation

Currently we are modifying CMAKE_CXX_FLAGS directly for setting compilation flags:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wno-narrowing -Wno-strict-overflow")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3 -ftree-vectorize")

Proposed solution

It would be nice to use add_compile_options instead. It seems to support generator expressions that can be used for differentiating between release and debug builds, although the main blocker is that the flags end up being passed to the cuda compiler directly, which might result in an error (there seems to be a $<COMPILE_LANGUAGE:lang> expression that can be used in the generator expression).

Alternatives and other information

ImportError: cannot import name 'AnalogSGD'

Description

Traceback (most recent call last):
File "examples/1_simple_layer.py", line 25, in
from aihwkit.optim import AnalogSGD
ImportError: cannot import name 'AnalogSGD'

How to reproduce

I tried both
python examples/1_simple_layer.py
python3 examples/1_simple_layer.py

/home/zzzzzz/venv/aihwkit_env/git/aihwkit

Expected behavior

Other information

  • Pytorch version: 1.6
  • Package version: 0.2.0
  • OS: Ubuntu 20.04
  • Python version: 3.6.9
  • Conda version (or N/A):
