

AutoWebCompat - Automatically detect web compatibility issues


The aim of this project is to create a tool that automatically detects web compatibility issues without human intervention.

Collecting screenshots

The project uses Selenium to collect web page screenshots automatically on Firefox and Chrome.

The crawler loads web pages from the URLs on the webcompat.com tracker and tries to reproduce the reported issues by interacting with the elements of the page. As soon as the page is loaded and after every interaction with the elements, the crawler takes a screenshot.

The crawler repeats the same steps in Firefox and Chrome, generating a set of comparable screenshots.
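
The crawl loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's collect.py: the function names, the `a, button` selector, the `max_steps` limit, and the file naming scheme are all hypothetical. The `driver` argument is expected to be a Selenium WebDriver, such as `webdriver.Firefox()` or `webdriver.Chrome()`.

```python
import os


def screenshot_name(page_id, step, browser):
    # Hypothetical naming scheme: the same page_id and step in both
    # browsers yields two directly comparable files.
    return f"{page_id}_{step}_{browser}.png"


def crawl(driver, browser, page_id, url, out_dir="data", max_steps=3):
    """Load `url`, then screenshot after the load and after each click."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []

    def snap(step):
        path = os.path.join(out_dir, screenshot_name(page_id, step, browser))
        driver.save_screenshot(path)
        paths.append(path)

    driver.get(url)
    snap(0)  # screenshot taken right after the page has loaded

    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", "a, button")
    for step, element in enumerate(elements[:max_steps], start=1):
        try:
            element.click()
        except Exception:
            # Simplification: skip elements that are hidden, stale, or
            # otherwise not clickable; no screenshot is saved for them.
            continue
        snap(step)
    return paths
```

With Selenium installed, the same function would be called once per browser, so that each step produces one Firefox file and one Chrome file with matching names. A real crawler would also need to handle navigation triggered by clicks, which this sketch does not.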

The data/ directory contains the screenshots generated by the crawler (N.B.: This directory is not present in the repository itself, but it is created automatically after you set up the project as described in the Setup section).

Labeling

Now that the screenshots are available, they need to be labeled. The labeling phase operates on pairs of comparable screenshots.

There are three possible labels:

  1. Y for pairs of images that are clearly compatible;
  2. D for pairs of images that are compatible but differ in content (e.g. on a news site, two screenshots can be compatible even though they show different news stories, simply because the stories shown depend on when the screenshot was taken rather than on which browser was used);
  3. N for pairs of images that are not compatible.

Here are some examples of the three labels:

[Example screenshot pairs for labels Y, D, and N]

In the training phase, the best case is that we can distinguish between Y+D and N. If we cannot do that, we should at least aim for the relaxed problem of distinguishing between Y and D+N. This is why we use a three-label system.
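
The two problem formulations above amount to two different ways of collapsing the three labels into a binary target. A minimal sketch, assuming labels are stored as the lowercase strings "y", "d", and "n" (the `to_binary` helper and its `relaxed` flag are hypothetical names, not part of the project):

```python
LABELS = ("y", "d", "n")


def to_binary(label, relaxed=False):
    """Map a three-way label to 1 (compatible) or 0 (incompatible)."""
    label = label.lower()
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if relaxed:
        # Relaxed problem: Y vs. D+N (content differences count as incompatible).
        return 1 if label == "y" else 0
    # Preferred problem: Y+D vs. N (content differences count as compatible).
    return 0 if label == "n" else 1
```

Keeping D as a separate label means the same labeled dataset can serve either formulation; only the mapping changes.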

The labeling technical details are described in this issue.

Training

Now that we have a dataset with labels, we can train a neural network to automatically detect screenshots that are incompatible. We are currently using a Siamese architecture with different Convolutional Neural Networks, but are open to testing other ideas.
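
The core idea of a Siamese architecture is that both screenshots pass through the same embedding function (shared weights), and the distance between the two embeddings drives the prediction. The sketch below illustrates this in plain Python. The text above does not say which loss the project uses; contrastive loss appears here only as one common choice for Siamese networks, and `embed` stands in for a trained CNN.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def siamese_distance(embed, image_a, image_b):
    # The SAME `embed` function is applied to both inputs: this weight
    # sharing is what makes the architecture "Siamese".
    return euclidean_distance(embed(image_a), embed(image_b))


def contrastive_loss(distance, compatible, margin=1.0):
    """Contrastive loss for one pair (assumed loss, for illustration only).

    Compatible pairs are pulled together (loss grows with distance);
    incompatible pairs are pushed apart until they are `margin` away.
    """
    if compatible:
        return distance ** 2
    return max(margin - distance, 0.0) ** 2
```

At prediction time, a pair would be classified as compatible when its distance falls below some threshold, which would have to be chosen on validation data.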

We plan to employ three training methodologies:

  1. Training from scratch on the entire training set;
  2. Fine-tuning a network previously pretrained on ImageNet (or other datasets);
  3. Fine-tuning a network previously pretrained in an unsupervised fashion.

For the unsupervised pretraining, we use a related problem for which we already have labels: detecting whether two screenshots belong to the same website. Pretraining can help because plenty of data is available (it does not need to be labeled manually), and the pretrained network can then be fine-tuned for our problem of interest.
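
The pretraining labels come for free because the crawler already knows which website each screenshot belongs to. A minimal sketch of how such pairs could be generated, assuming each screenshot is represented as a `(site_id, path)` tuple (this representation and the `pretraining_pairs` name are hypothetical):

```python
from itertools import combinations


def pretraining_pairs(screenshots):
    """Build (path_a, path_b, label) triples from (site_id, path) tuples.

    label is 1 when both screenshots come from the same website, else 0.
    No manual labeling is involved.
    """
    return [
        (path_a, path_b, int(site_a == site_b))
        for (site_a, path_a), (site_b, path_b) in combinations(screenshots, 2)
    ]
```

Enumerating all combinations produces far more different-site pairs than same-site pairs, so in practice the negative pairs would likely need to be subsampled to keep the classes balanced.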

Structure of the project

  • The autowebcompat/utils.py module contains some utility functions;
  • The autowebcompat/network.py module contains the neural network and optimizer definitions, along with the loss and accuracy functions;
  • The collect.py script is the crawler that collects screenshots of web pages in different browsers;
  • The label.py script is a utility that helps with labeling pairs of screenshots (are they the same in the two browsers or are there differences?);
  • The pretrain.py script trains a neural network on the website screenshots for a slightly different problem (for which we know the solution), so that we can reuse the network weights for the training on the actual problem;
  • The train.py script trains the neural network on the website screenshots to detect compat issues;
  • The data_inconsistencies.py script checks the generated screenshots and takes note of any data inconsistency (e.g. screenshots that were taken in Firefox but not in Chrome).
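
As an illustration of the kind of check data_inconsistencies.py performs, the sketch below finds screenshots that exist for one browser but not the other. It is not the project's implementation: the `find_inconsistencies` name and the assumed file naming scheme (`<key>_<browser>.png`) are hypothetical.

```python
import os


def find_inconsistencies(file_names, browsers=("firefox", "chrome")):
    """Return {screenshot_key: [missing browsers]} for incomplete pairs.

    Assumes names of the form "<key>_<browser>.png"; other files are ignored.
    """
    seen = {}
    for name in file_names:
        stem, ext = os.path.splitext(name)
        if ext != ".png":
            continue
        key, _, browser = stem.rpartition("_")
        if browser in browsers:
            seen.setdefault(key, set()).add(browser)
    return {
        key: [b for b in browsers if b not in found]
        for key, found in sorted(seen.items())
        if len(found) < len(browsers)
    }
```

Under these assumptions, it could be run over the crawler output with `find_inconsistencies(os.listdir("data"))`.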

Setup

Python 3 is required.

  • Install Git Large File Storage, either manually or through a package like git-lfs if available on your system.
  • Clone the repository with submodules: git clone --recurse-submodules git@github.com:marco-c/autowebcompat.git
  • Install the dependencies in requirements.txt: pip install -r requirements.txt.
  • Install the dependencies in test-requirements.txt: pip install -r test-requirements.txt.
  • Run the pretrain.py / train.py script to train the neural network.

Communication

Real-time communication for this project happens on Mozilla's IRC network, irc.mozilla.org, in the #webcompat channel.
