activation_sparsity's People

Contributors

adamskiij99, ndaultryball1, samuel-chlam

Forkers

adamskiij99

activation_sparsity's Issues

TO DO (coding tasks in italics) from Jared

  • The recursion function for hard and soft thresholding: Simplify the formulae so that they are easy to understand, avoiding having both $\phi$ and $\tilde\Phi$; make it as easy to interpret as possible, and remember you are writing for someone who wants to read this. Then plot this function, $q$ vs $V(q)$, for a few values of $(\sigma_w,\sigma_b,\tau)$.
  • Formula for $\chi$: Setting $\chi=1$ links $\sigma_w$ and $\sigma_b$, so at $\chi=1$ you can express $V(q)$ as a function $V(q; \tau,\sigma_w)$ alone and illustrate what this looks like. Plot it: say, fix $\sigma_w$ and make a surface plot with $x$ being $q$, $y$ being $\tau$, and $z$ being $V(q,\tau)$, for each of hard and soft thresholding, and on that surface draw the line of fixed points.
  • Prove that $\frac{d}{dq}V(q)$ at the fixed points is less than one, and consequently that the map has a stable fixed point. Compute the fixed points, and plot the fixed point $q^*$ as a function of $\sigma_w$ and $\tau$: say, one-dimensional plots of $q^*$ as a function of $\tau$ for a few values of $\sigma_w$, then $q^*$ as a function of $\sigma_w$ for a few values of $\tau$. Think about how to give a reader insight into what you computed.
  • The correlation map may be harder to compute, but it should be computed, as it is an important part of the story. Michael Murray is part of the team and has code that can do this. Write to him and ask for the code; if you have trouble using it, ask him for guidance on adapting it to the new activations.
  • You can also repeat the above for FAT ReLU. There is a lot one can do here to build this up, and it would be good to get started on these things.
  • Great to see the loss function for the network. Be sure to explain what actually is being done here: this is a paper about initialisation, so how were the networks initialised, and what was $\tau$? Write this with all of the parameters explained so that a reader could reproduce what was done. The experiment should be conducted with $\chi=1$ and a few values of $\tau$ and $\sigma_w$. In addition to showing the loss function, you should show the training and test accuracy. Once you have code for this kind of experiment, run it for a few different choices of network depth and show a table/plot of the final/asymptotic training and test accuracy as a function of $\tau$ and/or the fraction of nonzeros in the hidden layers. For a fixed $\sigma_w$ and $\tau$, show the fraction of nonzeros in the hidden layers as a function of training time, and include a few values of $\tau$.
  • Once you can get good results for soft and hard thresholding, one needs to compare with other methods. Repeat the experiments for FAT ReLU, and for $\ell_1$ regularisation with a more traditional nonlinear activation such as ReLU or hard tanh. Be sure to initialise them as in the edge-of-chaos theory. Include curves for these methods alongside your results, in the hope of showing an improvement over their approaches. Some thought will need to go into deciding what experiments to do here. Mostly one will aim to have their approaches give a similar fractional sparsity as yours (for some given $\tau$), determine the parameters in their approach that achieve this, and then plot things like training and test accuracy, final test accuracy, etc.
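The recursion-map items above can be prototyped quickly. Below is a minimal sketch, assuming the standard mean-field length map $V(q) = \sigma_w^2\,\mathbb{E}_{z\sim\mathcal{N}(0,1)}[\phi(\sqrt{q}\,z)^2] + \sigma_b^2$ and the usual definition $\chi = \sigma_w^2\,\mathbb{E}[\phi'(\sqrt{q}\,z)^2]$; all function names here are illustrative, not taken from the repository.

```python
import math
import numpy as np

def soft_threshold(x, tau):
    # Soft thresholding: shrink each entry towards zero by tau.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def hard_threshold(x, tau):
    # Hard thresholding: zero out entries with |x| <= tau.
    return np.where(np.abs(x) > tau, x, 0.0)

def V(q, phi, sigma_w, sigma_b, tau):
    # Length map V(q) = sigma_w^2 E_{z~N(0,1)}[phi(sqrt(q) z)^2] + sigma_b^2,
    # with the Gaussian expectation evaluated by Gauss-Hermite quadrature.
    z, w = np.polynomial.hermite_e.hermegauss(80)  # nodes for weight exp(-z^2/2)
    w = w / np.sqrt(2.0 * np.pi)                   # normalise to the N(0,1) density
    return sigma_w**2 * np.sum(w * phi(np.sqrt(q) * z, tau) ** 2) + sigma_b**2

def chi_soft(q, sigma_w, tau):
    # For soft thresholding phi'(x) = 1{|x| > tau}, so
    # chi = sigma_w^2 P(|sqrt(q) z| > tau) = sigma_w^2 erfc(tau / sqrt(2 q)).
    return sigma_w**2 * math.erfc(tau / math.sqrt(2.0 * q))

def fixed_point(phi, sigma_w, sigma_b, tau, q0=1.0, n_iter=500):
    # Iterate q <- V(q); the iteration converges when |V'(q*)| < 1.
    q = q0
    for _ in range(n_iter):
        q = V(q, phi, sigma_w, sigma_b, tau)
    return q
```

Plotting `V(q, ...)` over a grid of `q` reproduces the first bullet, and scanning `chi_soft` over $(\sigma_w, \tau)$ at the fixed point locates the $\chi=1$ curve for the second.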
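For the experiment items, a short numpy sketch (hypothetical names; the real experiments would use a training framework) of the quantity the last two bullets ask to track: propagate one input through a randomly initialised network at a given $(\sigma_w, \sigma_b, \tau)$ and record the fraction of nonzero hidden units per layer.

```python
import numpy as np

def forward_sparsity(depth=20, width=500, sigma_w=0.9, sigma_b=0.5,
                     tau=0.5, seed=0):
    # Propagate a random input through a randomly initialised network with a
    # soft-thresholding activation, recording the fraction of nonzero units
    # in each hidden layer (the sparsity the TODO asks to report).
    rng = np.random.default_rng(seed)
    soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
    h = rng.standard_normal(width)
    fractions = []
    for _ in range(depth):
        # Mean-field scaling: W_ij ~ N(0, sigma_w^2 / width), b_i ~ N(0, sigma_b^2).
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width)
        h = soft(W @ h + b)
        fractions.append(float(np.mean(h != 0.0)))
    return fractions
```

Sweeping `tau` (and `sigma_w` along the $\chi=1$ curve) and plotting these fractions against depth gives the "fraction of nonzeros in the hidden layers" panels before any training is involved.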
