explanations_can_be_manipulated's Introduction

Explanations can be manipulated and Geometry is to blame

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.

What we do

We manipulate images so their explanation resembles an arbitrary target map. Below you can see our algorithm in action:

In our paper we show how to achieve such manipulations. We discuss their nature and derive an upper bound on how much the explanation can change. Based on this bound we propose β-smoothing, a method that can be applied to any of the considered explanation methods to increase robustness against manipulations.
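The idea of such a manipulation can be sketched on a toy problem. The example below is a minimal illustration and not the code in run_attack.py: the two-input network, its weights, the loss weighting gamma, and the finite-difference optimizer are all assumptions made for this sketch. It searches for a perturbed input whose gradient explanation moves towards a target map while a penalty keeps the network output close to its original value.

```python
import math

BETA = 5.0  # softplus parameter; an illustrative value


def sigmoid(z):
    # Logistic function written via tanh so it cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def softplus(z, beta=BETA):
    # Stable form of log(1 + exp(beta * z)) / beta.
    return max(z, 0.0) + math.log1p(math.exp(-abs(beta * z))) / beta


# Hypothetical toy network g(x) = sum_j V[j] * softplus(W[j] . x + B[j]).
W = [[1.0, -2.0], [-1.5, 0.5], [0.7, 1.2]]
B = [0.1, -0.3, 0.2]
V = [1.0, -1.0, 0.5]


def pre_activations(x):
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(W, B)]


def output(x):
    return sum(v * softplus(z) for v, z in zip(V, pre_activations(x)))


def explanation(x):
    # Gradient explanation h(x) = dg/dx, written out analytically.
    zs = pre_activations(x)
    return [sum(v * sigmoid(BETA * z) * w[i] for v, z, w in zip(V, zs, W))
            for i in range(len(x))]


def attack_loss(x, x_orig, target, gamma):
    # Explanation should match the target; output should stay where it was.
    expl_term = sum((h - t) ** 2 for h, t in zip(explanation(x), target))
    out_term = (output(x) - output(x_orig)) ** 2
    return expl_term + gamma * out_term


def numerical_grad(f, x, eps=1e-5):
    # Central finite differences; stands in for automatic differentiation.
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def manipulate(x_orig, target, gamma=10.0, steps=200, lr=0.05):
    def loss(p):
        return attack_loss(p, x_orig, target, gamma)

    x = list(x_orig)
    for _ in range(steps):
        grad = numerical_grad(loss, x)
        step = lr
        # Backtracking: halve the step until the loss actually decreases.
        while step > 1e-8:
            candidate = [xi - step * gi for xi, gi in zip(x, grad)]
            if loss(candidate) < loss(x):
                x = candidate
                break
            step /= 2
    return x


x0 = [0.5, -0.2]
target_map = [0.0, 0.0]  # arbitrary target explanation for this sketch
x_adv = manipulate(x0, target_map)
print("loss before:", attack_loss(x0, x0, target_map, 10.0))
print("loss after: ", attack_loss(x_adv, x0, target_map, 10.0))
```

On real images the same kind of loss would be minimized with automatic differentiation rather than finite differences, which is why the activations need a non-zero second derivative.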

β-smoothing

We have demonstrated that one can drastically change the explanation map while keeping the output of the neural network constant. We argue that this vulnerability is related to the large curvature of the output manifold of the neural network. We focus on the gradient method. The fact that the gradient can be changed drastically by slightly perturbing the input along a hypersurface of constant network output suggests that the curvature of this hypersurface is large. If we replace the ReLU activations with softplus activations with parameter β and reduce β, we reduce the curvature of the lines of equal network output. Below you can see the smoothing in action for a two-layer neural network.
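The role of β can be made concrete at the level of a single activation. The sketch below assumes the standard definition softplus_β(x) = log(1 + exp(βx)) / β; the function names are chosen for this example and are not taken from the repository. For large β the function approaches ReLU, and its second derivative, β·s·(1 − s) with s = sigmoid(βx), peaks at β/4, so lowering β lowers the maximal curvature of the activation.

```python
import math


def softplus(x, beta):
    # softplus_beta(x) = log(1 + exp(beta * x)) / beta, in a numerically stable form.
    return max(x, 0.0) + math.log1p(math.exp(-abs(beta * x))) / beta


def softplus_second_derivative(x, beta):
    # d^2/dx^2 softplus_beta(x) = beta * s * (1 - s), with s = sigmoid(beta * x).
    # Using tanh: s * (1 - s) = (1 - tanh(beta * x / 2) ** 2) / 4.
    t = math.tanh(0.5 * beta * x)
    return beta * 0.25 * (1.0 - t * t)


for beta in (1.0, 10.0, 100.0):
    gap = softplus(0.0, beta)                     # distance to ReLU at 0: log(2) / beta
    peak = softplus_second_derivative(0.0, beta)  # maximal curvature: beta / 4
    print(f"beta={beta:6.1f}  softplus(0)={gap:.4f}  max second derivative={peak:.2f}")
```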

Links

NeurIPS paper

arXiv version

google drive

Code

Install

Install dependencies using

 pip install -r requirements.txt 

Usage

Manipulate an image to reproduce a given target explanation using

python run_attack.py --cuda

For explanation methods other than LRP, you need to enable --beta_growth so that the second derivative of the activations is not zero.

python run_attack.py --cuda --method gradient --beta_growth
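Why a non-zero second derivative matters can be seen on a toy network. The sketch below is only an illustration with hypothetical weights and an assumed β of 5; it does not reproduce what --beta_growth does internally. With ReLU activations the gradient explanation is piecewise constant in the input, so a small change of the input leaves it unchanged and an optimizer gets no signal. With softplus activations the explanation responds to the perturbation.

```python
import math

# Hypothetical toy network g(x) = sum_j V[j] * act(W[j] * x + B[j]) with scalar input x.
W = [1.0, -1.5, 0.7]
B = [0.1, -0.3, 0.2]
V = [1.0, -1.0, 0.5]


def relu_slope(z):
    # Derivative of ReLU: a step function with zero derivative away from 0.
    return 1.0 if z > 0 else 0.0


def softplus_slope(z, beta=5.0):
    # Derivative of softplus with parameter beta: sigmoid(beta * z).
    return 0.5 * (1.0 + math.tanh(0.5 * beta * z))


def gradient_explanation(x, slope):
    # h(x) = dg/dx = sum_j V[j] * act'(W[j] * x + B[j]) * W[j]
    return sum(v * slope(w * x + b) * w for v, w, b in zip(V, W, B))


x, eps = 0.5, 1e-3
relu_change = gradient_explanation(x + eps, relu_slope) - gradient_explanation(x, relu_slope)
soft_change = gradient_explanation(x + eps, softplus_slope) - gradient_explanation(x, softplus_slope)
print("ReLU explanation change:    ", relu_change)
print("softplus explanation change:", soft_change)
```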

Plot softplus explanations for various values of beta using

python plot_expl.py --cuda 

To download patterns for pattern attribution, please use the following link:

https://drive.google.com/open?id=1RdvAiUZgfhSE8sVF2JOyURpnk1HQ_hZk

Copy the downloaded file into the models subdirectory.

License

This repository is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Contributors

annahdo, pankessel
