
skynet-train

SegNet training and testing scripts

These scripts are for use in training and testing the SegNet neural network, particularly with OpenStreetMap + Satellite Imagery training data generated by skynet-data.

Contributions are very welcome!

Quick start

The quickest and easiest way to use these scripts is via the developmentseed/skynet-train docker image, but note that to make this work with a GPU (necessary for reasonable training times), you will need a machine set up to use nvidia-docker. (The start_instance script uses docker-machine to spin up an AWS EC2 g2 instance and set it up with nvidia-docker. The start_spot_instance script does the same thing but creates a spot instance instead of an on-demand one.)
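For example, following the invocation shown in the issues below (the keypath and machine name are placeholders):

start_instance --ssh-keypath ~/.ssh/id_rsa skynet-test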

  1. Create a training dataset with skynet-data.
  2. Run:
nvidia-docker run \
    -v /path/to/training/dataset:/data \
    -v /path/to/training/output:/output \
    -e AWS_ACCESS_KEY_ID=... \
    -e AWS_SECRET_ACCESS_KEY=... \
    developmentseed/skynet-train:gpu \
      --sync s3://your-bucket/training/blahbla

This will kick off a training run with the given data. Every 10000 iterations, the model is snapshotted and run on the test data, the training loss is plotted, and all of this is uploaded to S3. (Omit the --sync argument and AWS credentials to skip the upload.)

Each batch of test results includes a view.html file: a bare-bones viewer that lets you browse the results on a map and compare model outputs to the ground-truth data.

Customize the training run with these params:

--model MODEL # segnet or segnet_basic, defaults to segnet
--output OUTPUT # directory in which to output training assets
--data DATA # training dataset
[--fetch-data FETCH_DATA] # s3 uri from which to download training data into DATA
[--snapshot SNAPSHOT] # snapshot frequency
[--cpu] # sets cpu mode
[--gpu [GPU [GPU ...]]] # set gpu devices to use
[--display-frequency DISPLAY_FREQUENCY] # frequency of logging output (affects granularity of plots)
[--iterations ITERATIONS] # total number of iterations to run
[--crop CROP] # crop training images to CROPxCROP pixels
[--batch-size BATCH_SIZE] # batch size (adjust up or down based on GPU memory; defaults to 6 for segnet and 16 for segnet_basic)
[--sync SYNC] # s3 uri to which training assets are uploaded
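For example, a customized run combining several of these flags (all paths and values are illustrative):

nvidia-docker run \
    -v /path/to/training/dataset:/data \
    -v /path/to/training/output:/output \
    developmentseed/skynet-train:gpu \
      --model segnet_basic \
      --batch-size 16 \
      --crop 256 \
      --iterations 40000 \
      --gpu 0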

Monitoring

On an instance where training is happening, expose a simple monitoring page with:

docker run --rm -it -v /mnt/training:/output -p 80:8080 developmentseed/skynet-monitor
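With the -p 80:8080 mapping above, the page is served on port 80 of the host, so it should be reachable at http://<instance-ip>/.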

Details

Prerequisites / Dependencies:

  • Node and Python
  • As of now, training SegNet requires building the caffe-segnet fork of Caffe (a build sketch follows this list).
  • Install node dependencies by running npm install in the root directory of this repo.
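A rough build sketch for the fork, assuming the standard Caffe Makefile workflow (the repository URL and parallelism shown are typical, not prescriptive):

git clone https://github.com/alexgkendall/caffe-segnet.git
cd caffe-segnet
cp Makefile.config.example Makefile.config   # edit for your CUDA / BLAS setup
make all -j8
make pycaffe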

Set up model definition

After creating a dataset with the skynet-data scripts, set up the model prototxt definition files by running:

segnet/setup-model --data /path/to/dataset/ --output /path/to/training/workdir

Also copy segnet/templates/solver.prototxt to the training work directory, and edit it to (a) point to the right paths, and (b) set up the learning "hyperparameters".

(NOTE: this is hard to get right at first; when we post links to a couple of pre-trained models, we'll also include a copy of the solver.prototxt we used as a reference / starting point.)
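For orientation, a Caffe solver file looks roughly like this; every value below is an illustrative placeholder, not a recommendation:

net: "segnet_train.prototxt"        # model definition produced by setup-model
base_lr: 0.001                      # initial learning rate
lr_policy: "step"
momentum: 0.9
weight_decay: 0.0005
max_iter: 40000
snapshot: 10000                     # snapshot frequency, in iterations
snapshot_prefix: "snapshots/segnet"
solver_mode: GPU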

Train

Download the pre-trained VGG weights VGG_ILSVRC_16_layers.caffemodel from http://www.robots.ox.ac.uk/~vgg/research/very_deep/

From your training work directory, run

$CAFFE_ROOT/build/tools/caffe train -gpu 0 -solver solver.prototxt \
    -weights VGG_ILSVRC_16_layers.caffemodel \
    2>&1 | tee train.log
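If a run is interrupted, it can be resumed from a saved solver state instead of the VGG weights (the snapshot filename here is hypothetical; compare the command in the "Script the training process" issue below):

$CAFFE_ROOT/build/tools/caffe train -gpu 0 -solver solver.prototxt \
    -snapshot snapshots/segnet_iter_10000.solverstate \
    2>&1 | tee -a train.log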

You can monitor the training with:

segnet/util/plot_training_log.py train.log --watch

This will generate and continually update a plot of the "loss" (i.e., training error) which should gradually decrease as training progresses.

Testing the Trained Network

segnet/run_test --output /path/for/test/results/ \
    --train /path/to/segnet_train.prototxt \
    --weights /path/to/snapshots/segnet_blahblah_iter_XXXXX.caffemodel \
    --classes /path/to/dataset/classes.json

This script essentially carries out the instructions outlined here: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

Inference

After you have a trained and tested network, you'll often want to use it to predict over a larger area. We've included scripts for running this process locally or on AWS.

Local Inference

To run predictions locally you'll need:

  • Raster imagery (as either a GeoTIFF or a VRT)
  • A line-delimited list of XYZ tile indices to predict on, e.g. 49757-74085-17 (these can be made with geodex; see the sample after this list)
  • A skynet model, trained weights, and class definitions (.prototxt, .caffemodel, .json)
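For reference, tiles.txt contains one x-y-zoom index per line; a short illustrative sample (made-up neighboring tiles):

49757-74085-17
49758-74085-17
49757-74086-17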

To run:

docker run -v /path/to/inputs:/inputs -v /path/to/model:/model -v /path/to/output/:/inference \
  developmentseed/skynet-run:local-gpu /inputs/raster.tif /inputs/tiles.txt \
  --model /model/segnet_deploy.prototxt \
  --weights /model/weights.caffemodel \
  --classes /model/classes.json \
  --output /inference

If you are running on a CPU, use the :local-cpu docker image and add --cpu-only as a final flag to the above command.
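Putting that together, a CPU-only run would look like this (same placeholder paths as above):

docker run -v /path/to/inputs:/inputs -v /path/to/model:/model -v /path/to/output/:/inference \
  developmentseed/skynet-run:local-cpu /inputs/raster.tif /inputs/tiles.txt \
  --model /model/segnet_deploy.prototxt \
  --weights /model/weights.caffemodel \
  --classes /model/classes.json \
  --output /inference \
  --cpu-only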

The predicted rasters and vectorized GeoJSON outputs will be located in /inference (and in the corresponding mounted volume).

AWS Inference

TODO: for now, see command line instructions in segnet/queue.py and segnet/batch_inference.py

GPU

These scripts were originally developed for use on an AWS g2.2xlarge instance. To support newer GPUs, you may need to:

  • use a newer NVIDIA driver
  • use a newer version of CUDA. To support CUDA 8+, you can use the docker images tagged with :cuda8; they are built from an updated caffe-segnet fork with cuDNN 5 support (example below).
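For example, assuming the :cuda8 tag applies to the training image, the quick-start command becomes:

nvidia-docker run \
    -v /path/to/training/dataset:/data \
    -v /path/to/training/output:/output \
    developmentseed/skynet-train:cuda8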

Contributors

anandthakker, drewbo, ellielitwack, felskia, wronk


Issues

How to add access token for viewing results?

This is just what I was looking for, thanks! But is there a way to input the Mapbox access token so users can view the original input image? When I click the 'view results' link, I can view the OSM annotation and prediction images, but it's not too helpful without being able to see what kind of input image they came from. Thanks.

"Dedupe" main run command

The standard training start command has some duplication: the two folders are mounted with -v and then referenced again as parameters to the skynet command.

start_instance script fails when installing

Steps to reproduce:

  1. Run start_instance --ssh-keypath <SSH_KEYPATH> skynet-test, where <SSH_KEYPATH> is a file path that points to an SSH key.

Results:
The script runs successfully until the step "Setting Docker configuration on the remote daemon...". Then it produces this error:

Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err     : exit status 1
output  : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

When ssh'd into the newly created instance, running systemctl status docker.service produces:

● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─10-machine.conf
   Active: inactive (dead) (Result: exit-code) since Wed 2017-07-05 19:07:31 UTC; 2min 37s ago
     Docs: https://docs.docker.com
  Process: 5209 ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert 
 Main PID: 5209 (code=exited, status=1/FAILURE)

Jul 05 19:07:31 skynet-test systemd[1]: Failed to start Docker Application Container Engine.
Jul 05 19:07:31 skynet-test systemd[1]: docker.service: Unit entered failed state.
Jul 05 19:07:31 skynet-test systemd[1]: docker.service: Failed with result 'exit-code'.
Jul 05 19:07:31 skynet-test systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jul 05 19:07:31 skynet-test systemd[1]: Stopped Docker Application Container Engine.
Jul 05 19:07:31 skynet-test systemd[1]: docker.service: Start request repeated too quickly.
Jul 05 19:07:31 skynet-test systemd[1]: Failed to start Docker Application Container Engine.

and running journalctl -xe produces:

-- The start-up result is done.
Jul 05 19:09:44 skynet-test systemd[5218]: Startup finished in 8ms.
-- Subject: System start-up is now complete
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- All system services necessary queued for starting at boot have been
-- successfully started. Note that this does not mean that the machine is
-- now idle as services might still be busy with completing start-up.
-- 
-- Kernel start-up required KERNEL_USEC microseconds.
-- 
-- Initial RAM disk start-up required INITRD_USEC microseconds.
-- 
-- Userspace start-up required 8643 microseconds.
Jul 05 19:09:44 skynet-test systemd[1]: Started User Manager for UID 1000.
-- Subject: Unit [email protected] has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit [email protected] has finished starting up.
-- 
-- The start-up result is done.

Segmentation vs Object detection?

Is there a reason why you chose a semantic segmentation approach over object detection? Wouldn't it be easier to train, and more accurate, if you only needed to label the object rather than optimize pixelwise correctness/completeness metrics?

Add CI build

Mainly to centralize/systematize docker builds

Add simple monitoring server

Make a new docker image for use on a training instance that exposes a web server with:

  • /training: a page showing current training progress parsed out from the training log every few minutes (plot of loss vs. iterations; learning rate; etc.).
  • /data: a simple viewer that lets us browse the training and validation datasets.

Release of trained caffemodel?

I'm looking to use this as a pretraining step for fine-tuning later on. Would it be possible to have the pre-trained caffemodel that you've created?

I'm working on a similar task, and I believe transferring weights from, say, ImageNet or Pascal to this overhead-imagery case may be problematic in more difficult situations. So I believe that using a network pretrained in the correct domain of overhead imagery would improve performance significantly. Thanks.

Script the training process

Currently running training like:

caffe train --solver solver.prototxt [--snapshot snapshots/segnet_basic_iter_xxxxx.solverstate] 2>&1 | tee -a train-xxx.log

And then separately running plot_training_log.py to continuously re-plot loss vs. iterations.

It works, but it would be great to use the pycaffe API to script the process to:

  • Automatically restart from the most recent solverstate
  • Persist to s3 when a snapshot is made
  • Every so often, run inference on the test data and push results to s3
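A rough pycaffe sketch of that loop (paths, iteration counts, and the sync hooks are all placeholders):

import glob
import os

import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver('solver.prototxt')

# Automatically restart from the most recent solverstate, if one exists.
states = sorted(glob.glob('snapshots/*.solverstate'), key=os.path.getmtime)
if states:
    solver.restore(states[-1])

# Step in chunks so snapshot syncing and test inference can be hooked in.
while solver.iter < 100000:
    solver.step(1000)
    # ... push new snapshots to s3 and run inference on the test data here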

Attempt to download .caffemodel from S3: Forbidden

After configuring my AWS account with access key + secret access key + region + output type, I ran:

aws s3 --no-verify-ssl cp s3://skynet-models/segnet-ts12-1/manifest.txt .

Which returned error:

A client error (403) occurred when calling the HeadObject operation: Forbidden

Completed 1 part(s) with ... file(s) remaining

Is there something needed to allow it to be accessed?

Branch:script-training:how to use segnet/demo examples

I am using the script-training branch and want to see the demo results in a browser. My steps were as follows:

(1) Start the prediction server:

python segnet/demo.py --model model/segnet_deploy.prototxt --weights del/test_weights.caffemodel --classes model/classes.json

This succeeds.

(2) Start the web server:

npm run bundle-demo
npm run bundle-viewer

Test 1:
node segnet/demo.js
error:
exports.easeCubicInOut = function(t: number): number ...

Test 2:
node segnet/static/demo.js
error: module.exports = self; ReferenceError: self is not defined.

Can you tell me what I should do? Can you describe this demo in more detail?

about share trained model

Following up on "Release of trained caffemodel?" (#7) and "Share experiments" (#22): the AWS S3 bucket does not contain the prototxt files. Just checking back to see if you could please release the prototxt files used to train the caffemodels in share-experiments?

Compute completeness and correctness on test results

The most common metrics for evaluating road detection systems are correctness and
completeness [17]. The completeness of a set of predictions is the fraction of true roads
that were correctly detected, while the correctness is the fraction of predicted roads that
are true roads. Since the road centreline locations that we used to generate ground truth
are often noisy we compute relaxed completeness and correctness scores. Namely, in
our experiments completeness represents the fraction of true road pixels that are within
ρ pixels of a predicted road pixel, while correctness measures the fraction of predicted
road pixels that are within ρ pixels of a true road pixel.

  • Mnih, Volodymyr, and Geoffrey E. Hinton. "Learning to detect roads in high-resolution aerial images." European Conference on Computer Vision. Springer Berlin Heidelberg, 2010.
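A minimal sketch of these relaxed metrics (assumes SciPy and boolean road masks; this is not the repo's own implementation):

from scipy.ndimage import distance_transform_edt

def relaxed_scores(truth, pred, rho=3):
    # truth, pred: 2-D boolean numpy arrays marking road pixels; rho: slack radius in pixels.
    completeness, correctness = 0.0, 0.0
    if truth.any() and pred.any():
        dist_to_pred = distance_transform_edt(~pred)    # distance to nearest predicted road pixel
        dist_to_truth = distance_transform_edt(~truth)  # distance to nearest true road pixel
        completeness = float((dist_to_pred[truth] <= rho).mean())
        correctness = float((dist_to_truth[pred] <= rho).mean())
    return completeness, correctness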

how to run docker

I tried to run the SpaceNet example and used these instructions to create the data, but the skynet-train docker container doesn't have access to the data in the spacenet-data container, and even if it did, the paths in train.txt don't actually exist, e.g.:
"/spacenet/processedData/3band/3band_013022232022_Public_img6005.tif /data/labels/grayscale/013022232022_Public_img6005.tif"
So what do I need to change to run the skynet-train docker image successfully? Thank you.
