
autowebcompat's People

Contributors

aarushgupta, adbugger, amaaniqbal, anujanegi, aru31, gabriel-v, manasvikundalia, marco-c, marxmit7, osmarcedron, poush, rhcu, sagarvijaygupta, sdv4, shashi456, skulltech, sviter, trion129, vedarth, vibss2397, vikasmahato, vrishank97, zzoey


autowebcompat's Issues

Make collect.py support Mac and Windows too

Using TensorFlow backend.
10280
Traceback (most recent call last):
  File "/Users/amit/Documents/GitHub/autowebcompat/collect.py", line 203, in <module>
    firefox_driver = webdriver.Firefox(firefox_profile=firefox_profile, firefox_binary='tools/nightly/firefox-bin')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
    keep_alive=True)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Unable to find a matching set of capabilities
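
One way to fix this could be to pick the Firefox binary path based on the platform. A minimal sketch, assuming get_dependencies.py unpacks Nightly under tools/ (the Mac and Windows paths below are assumptions, not verified against the actual download layout):

    import sys
    from selenium import webdriver

    # Hypothetical per-platform locations of the Nightly binary.
    if sys.platform.startswith('linux'):
        firefox_binary = 'tools/nightly/firefox-bin'
    elif sys.platform == 'darwin':
        firefox_binary = 'tools/Nightly.app/Contents/MacOS/firefox-bin'
    elif sys.platform in ('win32', 'cygwin'):
        firefox_binary = 'tools\\nightly\\firefox.exe'
    else:
        raise RuntimeError('Unsupported platform: %s' % sys.platform)

    firefox_driver = webdriver.Firefox(firefox_binary=firefox_binary)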

Setup .travis.yml to run flake8

We should add flake8 to test-requirements.txt and write a .travis.yml to run flake8.
The flake8 configuration file should ignore line-length errors (E501).
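
A minimal sketch of what this could look like (the Python version is an assumption):

    # .travis.yml
    language: python
    python:
      - "3.6"
    install:
      - pip install -r test-requirements.txt
    script:
      - flake8 .

and, for the E501 exclusion, a [flake8] section in e.g. setup.cfg:

    [flake8]
    ignore = E501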

Try using Simnet architecture

I found this architecture, 'SimNet', by Amazon Development Services. It uses a variation of a Siamese network: alongside the ImageNet-based model, they use two extra, shallower CNN models trained on downsampled images.
The final results are better.
https://arxiv.org/pdf/1709.08761.pdf

This would be a great experiment as per issue #1
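
Not the exact SimNet from the paper, but an illustrative Keras sketch of the idea: a Siamese setup where each screenshot feeds both a full-resolution branch and a shallower branch on a downsampled copy, with the two embeddings concatenated per image (all shapes and layer sizes below are made-up placeholders):

    import keras.backend as K
    from keras.layers import Conv2D, Dense, Flatten, Input, Lambda, MaxPooling2D, concatenate
    from keras.models import Model

    def cnn_branch(shape):
        # A small CNN producing an embedding for one resolution.
        inp = Input(shape=shape)
        x = Conv2D(32, (3, 3), activation='relu')(inp)
        x = MaxPooling2D((2, 2))(x)
        x = Conv2D(64, (3, 3), activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
        x = Dense(128, activation='relu')(Flatten()(x))
        return Model(inp, x)

    full, small = (192, 256, 3), (48, 64, 3)
    branch_full = cnn_branch(full)    # stand-in for the ImageNet-based branch
    branch_small = cnn_branch(small)  # shallow branch on downsampled images

    in_a_full, in_b_full = Input(shape=full), Input(shape=full)
    in_a_small, in_b_small = Input(shape=small), Input(shape=small)

    # Weights are shared within each resolution, as in a Siamese network.
    emb_a = concatenate([branch_full(in_a_full), branch_small(in_a_small)])
    emb_b = concatenate([branch_full(in_b_full), branch_small(in_b_small)])

    distance = Lambda(lambda t: K.abs(t[0] - t[1]))([emb_a, emb_b])
    out = Dense(1, activation='sigmoid')(distance)

    model = Model([in_a_full, in_a_small, in_b_full, in_b_small], out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

The weight sharing within each resolution is what makes it Siamese; the second resolution just gives the distance layer both coarse and fine information.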

Improve user-experience of get_dependencies.py

The get_dependencies.py script prints no messages and takes a long time to finish (it downloads and extracts a zip file of about 1.4 GB). While the script runs, this makes for a poor user experience: the user doesn't know what's going on, and it gives the impression that the script is stuck. It can be improved by:

  • Printing some info while the execution occurs.
  • Showing a progress bar while it downloads the data.zip file.

The progress bar can be added using a library like clint, but that also means adding an extra dependency. Still, as we polish this project we will need a proper CLI interface, so it won't hurt to start adding such dependencies now.
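
A dependency-free sketch of such a progress display, using only the standard library (the URL below is a placeholder; the real one lives in get_dependencies.py):

    import sys
    import urllib.request

    URL = 'https://example.com/data.zip'  # placeholder

    def download_with_progress(url, path):
        response = urllib.request.urlopen(url)
        total = int(response.headers.get('Content-Length', 0))
        done = 0
        with open(path, 'wb') as f:
            while True:
                chunk = response.read(8192)
                if not chunk:
                    break
                f.write(chunk)
                done += len(chunk)
                if total:
                    sys.stdout.write('\rDownloading data.zip: %d%%' % (done * 100 // total))
                    sys.stdout.flush()
        sys.stdout.write('\n')

    download_with_progress(URL, 'data.zip')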

Modify label.py to handle multiple people performing the labeling

Modify the label.py script to handle multiple people performing the labeling (I would suggest simply adding a PERSON parameter and writing a labels_PERSON.csv file according to this parameter). Then, write another script that generates labels.csv from all the labels_*.csv files (basically storing the lines from the labels_*.csv files which agree on the label).
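
A minimal sketch of the merging script, assuming each labels_PERSON.csv file has two columns (image pair, label):

    import csv
    import glob
    from collections import defaultdict

    # For every image pair, collect the labels assigned by each person.
    labels = defaultdict(set)
    for path in glob.glob('labels_*.csv'):
        with open(path, newline='') as f:
            for image_pair, label in csv.reader(f):
                labels[image_pair].add(label)

    # Keep only the pairs on which all labelers agree (pairs labeled by a
    # single person pass this check trivially).
    with open('labels.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for image_pair, assigned in sorted(labels.items()):
            if len(assigned) == 1:
                writer.writerow([image_pair, assigned.pop()])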

Investigate cases where the crawler wasn't able to take screenshots

There are a few cases where the crawler was not able to take screenshots. We should figure out why and try to fix any issue that we notice.

The files under data/ are in the format WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png. WEBCOMPAT-ID is the ID from webcompat.com.
ELEMENT-ID is the ID of the element the crawler clicked before taking the screenshot.
BROWSER is the name of the browser.

We should investigate these cases:

  1. XXXX_firefox.png is present but XXXX_chrome.png is not present.
  2. XXXX_ELEMENT_firefox.png is present but XXXX_ELEMENT_chrome.png is not present.
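
A minimal sketch of a script listing both kinds of missing counterparts (assuming everything lives directly under data/):

    import os

    files = set(os.listdir('data'))

    for name in sorted(files):
        if not name.endswith('_firefox.png'):
            continue
        # Covers both XXXX_firefox.png and XXXX_ELEMENT-ID_firefox.png.
        counterpart = name[:-len('_firefox.png')] + '_chrome.png'
        if counterpart not in files:
            print('Missing Chrome screenshot for %s' % name)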

ModuleNotFoundError for Tkinter

sumit@HAL9000:~/Documents/autowebcompat$ python3 label.py 
Traceback (most recent call last):
  File "label.py", line 4, in <module>
    from Tkinter import Tk, Label
ModuleNotFoundError: No module named 'Tkinter'

This is happening because the module is being imported as Tkinter, which is its Python 2 name. For Python 3 it should be imported as tkinter (see this SO answer).
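
If the script still needs to run on Python 2 as well, a version-agnostic import is a common pattern:

    try:
        from tkinter import Tk, Label   # Python 3 module name
    except ImportError:
        from Tkinter import Tk, Label   # Python 2 fallback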

Document that Git LFS is required and how to clone the repo with submodules

On doing a fresh recursive clone of autowebcompat, the data submodule fetches the .png files by filename, but inside they contain Git LFS pointer text instead of the actual content.
So it downloaded just 5 MB of data instead of 959 MB.

How to reproduce the bug:

  1. git clone --recurse-submodules git@github.com:marco-c/autowebcompat.git autoweb
  2. cat autoweb/data/7_firefox.png


It is text instead of PNG binary gibberish!
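
Assuming the data submodule does use Git LFS, the README could document something along these lines (install Git LFS before cloning, then fetch the real file contents):

    git lfs install
    git clone --recurse-submodules git@github.com:marco-c/autowebcompat.git autoweb
    cd autoweb/data
    git lfs pull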

In data_inconsistencies.py, print how many screenshots we have collected and how many we could have collected

@marco-c
Once #78 is fixed, I think a script should be written which ensures that inconsistent data is excluded and/or reports what part of the collected data is actually being used for training, so that we have an idea of how successful we are at data collection and can maybe update the crawler script later on.
Although I do know that label.py will only take a pair of images if both are present, I haven't seen any enforcement measures.
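
A rough sketch of the statistic (this only counts complete Firefox/Chrome pairs; counting how many screenshots we could have collected would additionally need the list of webcompat issues the crawler was given):

    import os

    screenshots = os.listdir('data')
    firefox = {s[:-len('_firefox.png')] for s in screenshots if s.endswith('_firefox.png')}
    chrome = {s[:-len('_chrome.png')] for s in screenshots if s.endswith('_chrome.png')}

    complete = len(firefox & chrome)
    total = len(firefox | chrome)
    if total:
        print('%d of %d screenshot pairs are complete (%.1f%%)' % (complete, total, 100.0 * complete / total))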

Add UTF-8 Encoding to get_dependencies.py

On running python get_dependencies.py, we get the following error:

File "get_dependencies.py", line 31
SyntaxError: Non-ASCII character '\xe2' in file get_dependencies.py on line 31, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

To solve this, a UTF-8 encoding declaration should be added at the top of get_dependencies.py.
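
Per PEP 263, the declaration has to be on the first or second line of the file:

    # -*- coding: utf-8 -*-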

Downloading Images for training

I just forked and cloned the repository and ran the following commands as described in the README:

  1. pip3 install -r requirements.txt
  2. pip3 install -r test-requirements.txt

But when I run python3 pretrain.py I get the following error:

/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from 'float' to 'np.floating' is deprecated. In future, it will be treated as 'np.float64 == np.dtype(float).type'.
  from ._conv import register_converters as _register_converters

Using TensorFlow backend.
Traceback (most recent call last):
  File "pretrain.py", line 14, in <module>
    image = utils.load_image(all_images[0])
IndexError: list index out of range

This might be due to missing training images, and I am unable to figure out how to download them.
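
Independently of documenting the data download, pretrain.py could fail with a clearer message when the images are missing. A sketch (the glob here is a hypothetical stand-in for however pretrain.py actually builds all_images):

    import glob
    import sys

    all_images = glob.glob('data/*.png')  # hypothetical stand-in
    if not all_images:
        sys.exit('No images found under data/. Did you clone the data submodule with Git LFS installed?')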

Investigate running multiple crawlers in parallel

We should try to modify collect.py to run multiple instances of the browsers. Since the crawler spends most of its time waiting (for pages to fully load), running multiple instances can consistently increase the number of screenshots we can take per second.
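
A minimal sketch with a thread pool (create_drivers, take_screenshots and issue_ids are hypothetical stand-ins for what collect.py actually does):

    from concurrent.futures import ThreadPoolExecutor

    def crawl_issue(issue_id):
        # Each worker owns its own pair of browser instances, so the
        # drivers never step on each other.
        firefox, chrome = create_drivers()       # hypothetical helper
        try:
            take_screenshots(issue_id, firefox, chrome)  # hypothetical helper
        finally:
            firefox.quit()
            chrome.quit()

    with ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(crawl_issue, issue_ids)     # hypothetical issue list

Threads are enough here because the workers mostly block on the network. Starting a fresh browser per issue would be wasteful, though; a pool of long-lived driver pairs handed out to workers would amortize the startup cost.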

Label dataset

The labeling can be performed using the label.py script.

This script will show you a couple of images, and then you can press:

  • 'y' to label them as being compatible;
  • 'd' to label them as being compatible with content differences (e.g. on a news site, two screenshots could be compatible even though they are showing two different news items, simply because the news shown depends on the time the screenshot was taken and not on the browser);
  • 'n' to label them as not being compatible;
  • 'RETURN' to skip them (in case you are not sure yet);
  • 'ESCAPE' to terminate the current labeling session and store the current results.

More details about this three-way labeling system are available in the documentation at https://github.com/marco-c/autowebcompat#labeling.
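
For reference, the dispatch boils down to a small key-to-action mapping. A self-contained Tkinter sketch of that scheme (not the actual label.py code):

    from tkinter import Tk, Label

    root = Tk()
    Label(root, text='y = compatible, d = content differences, n = incompatible').pack()

    def on_key(event):
        if event.keysym == 'Escape':
            root.destroy()              # store results and terminate
        elif event.keysym == 'Return':
            print('skipped')            # leave the pair for a later session
        elif event.char in ('y', 'd', 'n'):
            print('labeled as', event.char)

    root.bind('<Key>', on_key)
    root.mainloop()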

Make the crawler check if screenshots for the operations were taken

Right now, for each webcompat issue, we are just checking if we have taken screenshots for the main page. We should instead check if we have taken screenshots for all the possible operations too.

Basically, this should take care of the TODO here:

# Assume that if we generated the main file, we also generated the one with

Given the associated text file with the sequence of operations, we should check if a screenshot exists for every operation. When a screenshot doesn't exist, we should attempt to create it.
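
A sketch of such a check, assuming the sequence of operations is stored one element ID per line in data/<WEBCOMPAT-ID>.txt (the exact location and format of that file is an assumption):

    import os

    def missing_operation_screenshots(issue_id, browser):
        missing = []
        with open('data/%s.txt' % issue_id) as f:
            for line in f:
                element_id = line.strip()
                if not element_id:
                    continue
                # Screenshots follow the WEBCOMPAT-ID_ELEMENT-ID_BROWSER.png scheme.
                screenshot = 'data/%s_%s_%s.png' % (issue_id, element_id, browser)
                if not os.path.exists(screenshot):
                    missing.append(screenshot)
        return missing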
