google-research / rigl

End-to-end training of sparse deep neural networks with little-to-no performance loss.

License: Apache License 2.0

Python 91.40% Shell 0.33% Jupyter Notebook 8.27%

computer-vision machine-learning neural-networks sparse-training

rigl's Introduction

Google Research

This repository contains code released by Google Research.

All datasets in this repository are released under the CC BY 4.0 International license, which can be found here: https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this repository are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.

Because the repo is large, we recommend you download only the subdirectory of interest:

SUBDIR=foo
svn export https://github.com/google-research/google-research/trunk/$SUBDIR

If you'd like to submit a pull request, you'll need to clone the repository; we recommend making a shallow clone (without history).

git clone git@github.com:google-research/google-research.git --depth=1

Disclaimer: This is not an official Google product.

Updated in 2023.

rigl's Issues

Grow & Drop Ambiguity

The paper states that when selecting the weights to grow, by applying the ArgTopK function to the gradients, you may NOT select indices that are still active after the drop phase.

After the drop phase, the topology of the network is technically smaller for a brief moment before entering the grow phase, so I have a couple of questions:

1. In the grow phase, can elements that were dropped in the drop phase be selected again, effectively undoing their drop (assuming their gradient is large enough to be selected by ArgTopK)?
2. If the answer to question 1 is "yes", are these values re-initialized to 0, or are they left unaltered?

EDIT: It seems like here covers this. It looks to me like they are NOT re-initialized by default, but rather kept as-is.

I have had a hard time parsing through the code here, though it seems to me that the answer to 1 should be "yes" and the answer to 2 should be "they are unaltered".

I am reimplementing the paper in PyTorch and am having a hard time reproducing your results. I ran my previous simulations under the assumption that the answer to 1 is "no"; I am now re-running them with the "yes" and "unaltered" answers.

Thank you!
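
For readers following this question, here is a minimal NumPy sketch of one drop/grow step under the reading given above: connections dropped in this step stay eligible for regrowth, and a regrown connection keeps its value unless a flag says otherwise. The function name `drop_and_grow`, the `reinit_regrown` flag, and the use of flat arrays with a magnitude-based drop are illustrative assumptions, not code taken from this repository.

```python
import numpy as np

def drop_and_grow(weights, mask, grads, n_update, reinit_regrown=False):
    """One hypothetical drop/grow step on flat arrays (mask holds 0/1 integers)."""
    w = weights.copy()
    # Drop: remove the n_update active connections with the smallest magnitude.
    active = np.flatnonzero(mask)
    order = np.argsort(np.abs(w[active]), kind="stable")
    dropped = active[order[:n_update]]
    mask_after_drop = mask.copy()
    mask_after_drop[dropped] = 0
    # Grow: candidates are all connections inactive after the drop,
    # which includes the ones dropped a moment ago.
    candidates = np.flatnonzero(mask_after_drop == 0)
    order = np.argsort(-np.abs(grads[candidates]), kind="stable")
    grown = candidates[order[:n_update]]
    new_mask = mask_after_drop.copy()
    new_mask[grown] = 1
    # Connections that were inactive before this step start at zero,
    # as the paper describes for newly activated connections.
    w[grown[mask[grown] == 0]] = 0.0
    # Connections dropped and regrown in the same step keep their value
    # unless reinit_regrown is set (this flag is the assumption under discussion).
    if reinit_regrown:
        w[grown[mask[grown] == 1]] = 0.0
    return w * new_mask, new_mask
```

Calling it twice on the same inputs, once with `reinit_regrown=False` and once with `True`, isolates the effect of the second question: the masks are identical and only the values of regrown connections differ.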

How to train own convolutional network

Hi, good work!
How can I train my CNN using RIGL?

In issue #2, you recommend modifying the model and wrapping the optimizer, but no matter how I try to do it, it doesn't work for my CNN.

Can you tell me exactly how this is done for CNN?

Thank you!

RigL TF2 on Resnet50 + Imagenet

I wanted to use RigL TF2 code to train a sparsified Resnet50 architecture and see how that goes.
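I loaded a ResNet50:

    import tensorflow as tf

    # ResNet50 is the model constructor; tf.keras.applications.resnet50
    # is a module and cannot be called directly.
    model = tf.keras.applications.ResNet50(
        include_top=True,
        weights="imagenet",
        input_tensor=None,
        input_shape=None,
        pooling=None,
        classes=1000,
        classifier_activation="softmax",
    )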

And then sparsified it.
The test accuracy seems to be stuck at 0.1%, even after the first 2000 steps.
Is this a common problem?

RigL TF2 - Initial Sparsification

There is an ongoing effort to open-source a TF2 version with the tf-model_optimization toolkit. It should be out in a few months. The PyTorch implementation is on hold (I plan to do it when I have more time), but I am happy to help if you are interested in adding that to the repo.

Originally posted by @evcu in #2 (comment)

Hello @evcu ,
I am going through the TF2 implementation of RigL and trying to understand how the initial sparsification of the model is controlled. I don't see an explicit argument like mask_init_method in the TF1 implementation, which I can set to ERK or uniform. Can you point me to how I can control the initial sparsification of the model? Is it through pruning_params in the rigl.gin config file?
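
For context on the two options named in this question: a uniform initialization gives every layer the same sparsity, while ERK (Erdős-Rényi-Kernel, as described in the RigL paper) makes each layer's density proportional to the sum of its dimensions divided by their product, rescaled so the whole network meets the target. The following is a minimal NumPy sketch of that rule only. The function name `erk_densities` and the dictionary of layer shapes are hypothetical, it assumes a target density strictly between 0 and 1, and it does not show how the TF2 code in this repository is configured.

```python
import numpy as np

def erk_densities(shapes, target_density):
    """Per-layer densities under an ERK-style rule (illustrative sketch).

    shapes: dict mapping layer name -> weight shape tuple.
    target_density: overall fraction of weights to keep, in (0, 1).
    """
    sizes = {name: int(np.prod(shape)) for name, shape in shapes.items()}
    budget = target_density * sum(sizes.values())  # non-zeros allowed overall
    dense = set()  # layers whose scaled density would exceed 1 are kept dense
    while True:
        # Raw ERK score: sum of dimensions divided by number of parameters.
        raw = {name: sum(shape) / sizes[name]
               for name, shape in shapes.items() if name not in dense}
        remaining = budget - sum(sizes[name] for name in dense)
        eps = remaining / sum(raw[name] * sizes[name] for name in raw)
        worst = max(raw, key=raw.get)
        if raw[worst] * eps <= 1.0:
            break
        dense.add(worst)  # cap this layer at density 1 and rescale the rest
    return {name: 1.0 if name in dense else raw[name] * eps for name in shapes}
```

A uniform scheme would instead return `target_density` for every layer. The ERK-style rule typically leaves small layers denser and large layers sparser for the same overall budget.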

Can you post the CIFAR-10 results of the paper?

Hi,

I am trying to reproduce your paper's results on CIFAR-10 in PyTorch. Figure 4 in the paper shows the results, but it gives no exact numbers.
Can you please post the exact results you got from your experiments on CIFAR?

Thanks

How to load pretrained weights?

Hi!
How can I load the pretrained weights mentioned in the README?

I want to extract those weights for research purposes.

Can you tell me how to load them, or give me a brief code snippet that builds the model and loads the weights?

Using Rigl to train with structured sparsity

Hi,
Thank you for sharing your work.

I have been using this codebase by modifying the initial sparsification method to give some structured sparsity to the network. Now, what I would like is to make sure that when back-propagation takes place and weights are updated, they only update the values in the initial non-zero positions and do not update the positions which are already zero.

How can I achieve this with the codebase here?
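
The behavior asked for here, where only the initially non-zero positions are ever updated, is commonly obtained by multiplying the gradient by a fixed binary mask and re-applying the mask after each step. Below is a minimal NumPy sketch of that idea for plain SGD. The function `masked_sgd_step` is hypothetical and is not an API of this codebase.

```python
import numpy as np

def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Plain SGD step that leaves masked-out (zero) positions untouched."""
    # Zero the gradient at pruned positions so they receive no update.
    masked_grads = grads * mask
    # Re-apply the mask after the update to keep the sparsity pattern fixed.
    return (weights - lr * masked_grads) * mask
```

With optimizers that keep state, such as momentum, the final re-masking matters: accumulated state can otherwise move a pruned weight away from zero even when its current gradient is masked.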

Hello, can I use this method to sparsify my own neural network?

Hello, can I use this method to sparsify my own neural network?
Your work is so cool!

MetaInit w. VGG

I want to try MetaInit with VGG. Can you share some instructions for it? Thank you so much

No module named 'officialresnet'

Hi,
I am trying to run imagenet_train_eval.py but I get the following error:
Traceback (most recent call last):
  File "rigl/imagenet_resnet/imagenet_train_eval.py", line 37, in <module>
    from officialresnet import imagenet_input
ModuleNotFoundError: No module named 'officialresnet'
Should I install the officialresnet library separately?
Thanks

Question: what is the training speedup from RigL and the other methods implemented here?

Hi,

I wonder whether the implementation of RigL (and other methods like SET) here delivers a real speedup and lower memory consumption during training compared to training a dense model, or whether it only simulates the methods in terms of accuracy.

Thanks,
Ofir

Specify TF and TF.data versions?

Hi,

I tried running the cifar10 example with:

tensorflow-datasets      1.3.0
tensorflow-estimator     1.15.1
tensorflow-gpu           1.15.4

This fails with the error:

  File "/srv/home/varunsundar/rigl/rigl/cifar_resnet/data_helper.py", line 105, in input_fn
    images_batch, labels_batch = tf.compat.v1.data.make_one_shot_iterator(
  File "/srv/home/varunsundar/.conda/envs/tf37/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.compat.v1.compat' has no attribute 'v1'

It seems like the TensorFlow version is too high (too new). I'm guessing tf.compat.v1.data became tf.data?

Could you specify the exact tf and tf-datasets versions to be used for reproduction?

TF2 Grow Scores calculation

The TF1 code calculates grow scores from the gradients of the masked variables (after the masks and variables are multiplied). See rigl/rigl/sparse_optimizers_base.py, line 481 in 0f02973:

masked_grads_vars = self._optimizer.compute_gradients(

where masked_weights are fetched from https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/contrib/model_pruning/python/pruning.py#L258

In the TF2 code there is no such step, which I think affects the performance of RigL in general. Is there any way to apply a fix for this?
Also, is this because in the TF2 code the updates are made using a part of the validation dataset? If I were to change it back to how TF1 does the updates, based on the specific training batch, I would have to make the above-mentioned change, right?
[We want the gradient calculation for mask update to happen after the mask and weights are multiplied and not on the original weights before parameterization as shown in the figure attached]
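
The distinction raised in this issue can be illustrated on a single linear layer. With effective weights theta = w * mask, the gradient with respect to theta is dense and gives a score for inactive connections, whereas the gradient with respect to the raw weights w is that same gradient multiplied by the mask, so it is zero exactly where growth candidates live. The sketch below uses a hypothetical helper `grads_for_linear_layer` and a squared-error loss chosen only for illustration. It does not reproduce either the TF1 or the TF2 code path.

```python
import numpy as np

def grads_for_linear_layer(w, mask, x, y):
    """Gradients of 0.5 * ||(w * mask) @ x - y||^2 for one input vector x."""
    theta = w * mask                    # effective (masked) weights
    residual = theta @ x - y
    grad_theta = np.outer(residual, x)  # dL/dtheta: dense, usable as grow scores
    grad_w = grad_theta * mask          # dL/dw by the chain rule: zero where mask == 0
    return grad_theta, grad_w
```

Ranking inactive connections by `abs(grad_w)` would give every candidate a score of zero, which is why the gradient has to be taken after the mask and weights are multiplied.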
