
UI2Code implementation

Inspired by the original pix2code problem and dataset, which translate a screenshot of a website UI into its corresponding code representation. This PyTorch codebase uses Timo Angerer and Marvin Knoll (@marvinknoll)'s university project (https://github.com/timoangerer/pix2code-pytorch) as a reference.

Setup and run the model

Follow the steps below to train and evaluate the model on Google Cloud Platform (GCP). Note: GCP instances already ship with PyTorch and the necessary CUDA dependencies, so these are not listed in requirements.txt.

The notebook UI2Code.ipynb walks through the following steps for training and evaluation.

  1. Clone the repository

  2. Install the dependencies

    Install the dependencies given in requirements.txt:

     pip install -r requirements.txt
    
  3. Dataset

    Unzip data.zip (https://drive.google.com/file/d/1IU42LKAk32yFPAiTt4d2F7FZmqo1Ug0G/view?usp=share_link) to obtain three datasets, D1, D2, and D3, inside a folder named data.

    The folder structure of the data looks like:

     data
     ├── ...
     └── D3
         └── input
             ├── AF4840B2-2B9F-4ED0-A58D-E260B14858E1.png
             ├── AF4840B2-2B9F-4ED0-A58D-E260B14858E1.gui
             └── ...
    
  4. Split the dataset

    The train.py and evaluate.py scripts expect two data-split files, train_dataset.txt and test_dataset.txt, each containing the IDs of the data examples in the respective split. The split files must sit at the same folder level as the folder containing the data examples. The dataset is split 85/15 into train and test sets. Note: the current implementation does not use a validation set to monitor the training process or to store the best weights.

    Run split_data.py to generate the data split files for each dataset, e.g.:

     python split_data.py --data_path=./data/D1/input
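A minimal sketch of what such an 85/15 split could look like (the function name and shuffling seed here are assumptions for illustration, not the repo's actual split_data.py):

```python
# Hypothetical reconstruction of the 85/15 data split. Only the output
# file names (train_dataset.txt, test_dataset.txt) come from the README;
# everything else is assumed.
import random
from pathlib import Path

def split_ids(data_path, train_frac=0.85, seed=0):
    """Write train_dataset.txt and test_dataset.txt one level above data_path."""
    ids = sorted(p.stem for p in Path(data_path).glob("*.gui"))
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    splits = {"train_dataset.txt": ids[:cut], "test_dataset.txt": ids[cut:]}
    for name, split in splits.items():
        # split files live at the same folder level as the examples folder
        (Path(data_path).parent / name).write_text("\n".join(split))
    return splits
```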
    
  5. Vocabulary file

    The model needs a vocab.txt file containing all the tokens it should be able to predict, separated by whitespace.

    Run build_vocab.py to generate vocab.txt at the same folder level as the folder containing the data examples, based on the tokens that appear in the specified dataset:

     python build_vocab.py --data_path=./data/D1/input
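A hedged sketch of what vocabulary building might involve (the special tokens and function name are assumptions; the repo's build_vocab.py may differ):

```python
# Hypothetical vocabulary builder: collect every whitespace-separated
# token from the .gui files and write them to vocab.txt one level up.
# The special tokens below are an assumption, not confirmed by the README.
from pathlib import Path

PAD, START, END, UNK = "<PAD>", "<START>", "<END>", "<UNK>"  # assumed

def build_vocab(data_path):
    tokens = set()
    for gui_file in Path(data_path).glob("*.gui"):
        tokens.update(gui_file.read_text().split())
    vocab = [PAD, START, END, UNK] + sorted(tokens)
    # vocab.txt sits at the same folder level as the examples folder
    (Path(data_path).parent / "vocab.txt").write_text(" ".join(vocab))
    return vocab
```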
    
  6. Model Architecture

    A simple encoder-decoder architecture, similar to an image-captioning model, is used for this problem.

    Encoder: A pretrained ResNet-152 with its final classification layer removed serves as the CNN feature-extractor backbone; the classification layer is replaced by a fully connected (FC) layer projecting to the embedding size.

    Embedding size: 256

    Decoder: At its core, the decoder is an LSTM network.

    Hidden layer size: 512; number of recurrent layers: 1

    The encoder output (the image features) is concatenated with the token embeddings and fed as input to the decoder. The decoder then learns to predict the next token in the sequence, one token at a time, until it emits the end-of-sequence token. To predict the token at time step t, the model receives the image features as well as the embedding of the previous token (time step t-1) as additional context. During training, the embedding of the token at position t-1 in the ground-truth sequence is fed instead (teacher forcing) to speed up convergence; during inference, the decoder's own output at time step t-1 is fed as its input at time step t.

    At t = 0, the token embeddings are initialized randomly. This initialization could be improved by using pretrained embeddings from transformer-based models; another possible improvement is increasing the context window, which is currently set to 1.
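The decoder described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the repo's actual code: the class and method names are assumptions, and the encoder (ResNet-152 backbone plus FC projection) is stood in by a random 256-dimensional feature tensor.

```python
# Minimal sketch of the LSTM decoder with teacher forcing (training)
# and greedy decoding (inference). Names are assumptions.
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    def __init__(self, vocab_size, embed_size=256, hidden_size=512, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: the image features act as the first input step,
        # followed by the embeddings of the ground-truth tokens.
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)  # (batch, seq_len + 1, vocab_size)

    @torch.no_grad()
    def greedy_decode(self, features, end_id, max_len=100):
        # Inference: feed the model's own prediction from step t-1 at step t.
        inputs, states, tokens = features.unsqueeze(1), None, []
        for _ in range(max_len):
            hidden, states = self.lstm(inputs, states)
            pred = self.fc(hidden.squeeze(1)).argmax(dim=1)
            tokens.append(pred)
            if (pred == end_id).all():  # stop at end-of-sequence token
                break
            inputs = self.embed(pred).unsqueeze(1)
        return torch.stack(tokens, dim=1)
```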

  7. Train the model

    Run the following command to train the model:

     python train.py --data_path=./data/D1/input --epochs=50 --save_after_epochs=10 --batch_size=4 --split=train --models_dir=./models/D1/
    
  8. Evaluate the model

    Run the following command to evaluate the model and calculate BLEU scores:

     python evaluate.py --data_path=./data/D1/input --model_file_path=<path_to_model_file> --split=test 
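As an illustration of the metric, BLEU can be computed per example between the predicted token sequence and the ground-truth .gui tokens. The sketch below uses NLTK and a smoothing function as assumptions; evaluate.py may compute the score differently.

```python
# Hedged sketch of BLEU-4 scoring with NLTK (an assumption; not
# necessarily how evaluate.py implements the metric).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def gui_bleu(reference, hypothesis):
    """BLEU between one ground-truth token list and one predicted token list."""
    smooth = SmoothingFunction().method4  # avoids zero scores on short outputs
    return sentence_bleu([reference], hypothesis, smoothing_function=smooth)
```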
    
