OSSP

Open Source Sea-ice Processing

Open Source Algorithm for Detecting Sea Ice Surface Features in High Resolution Optical Imagery

Nicholas Wright and Christopher Polashenski

Introduction

Welcome to OSSP, a set of tools for detecting surface features in high resolution optical imagery of sea ice. The primary focus is on detecting and differentiating between open water, melt ponds, and snow/ice.

The Anaconda distribution of Python is recommended, but any distribution with the appropriate packages will work. You can download the Python 2.7 version of Anaconda here: https://www.continuum.io/downloads

Dependencies

Python 2.7
gdal (v2.0 or above)
numpy
scipy
h5py
skimage
sklearn
matplotlib
Tkinter

Usage

For detailed usage and installation instructions, see the PDF document 'Algorithm_Instructions.pdf'.

batch_process_mp.py

This script combines all steps of the image classification scheme. It finds all appropriately formatted files (.tif and .jpg) in the input directory and queues them for processing. Each image is processed as follows: Image Subdivision (Splitter.py) -> Segmentation (Watershed.py) -> Classification (RandomForest.py) -> Calculate statistics -> Recompile classified splits. batch_process_mp.py can use more than one processor core for the segmentation and classification phases.

Required Arguments

input directory: directory containing all of the images you wish to process. Note that all .jpg and .tif images in the input directory, as well as in all of its sub-directories, will be processed.
image type: {'srgb', 'wv02_ms', 'pan'}: the type of imagery you are processing.
1. 'srgb': RGB imagery taken by a typical camera
2. 'wv02_ms': DigitalGlobe WorldView 2 multispectral imagery
3. 'pan': High resolution panchromatic imagery
training dataset file: complete filepath of the training dataset you wish to use to analyze the input imagery

Optional Arguments

-s | --splits: The number of times to split the input image for improved processing speed. This is rounded to the nearest square number. Default = 9.
-p | --parallel: The number of parallel processes to run (i.e. number of cpu cores to utilize). Default = 1.
--training_label: The label of a custom training dataset. See advanced section for details. Default = image_type.
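The -s option is described as being rounded to the nearest square number. A minimal sketch of what that rounding could look like is shown here. The function name nearest_square is hypothetical, and the scripts may implement the rounding differently.

```python
import math


def nearest_square(n):
    """Round a positive integer to the nearest perfect square.

    Hypothetical helper for illustration; not taken from the OSSP source.
    """
    # Rounding the square root and squaring it again gives the nearest
    # perfect square for any positive integer n.
    root = int(round(math.sqrt(n)))
    # Never return zero splits: an image is always at least one piece.
    return max(root, 1) ** 2


if __name__ == "__main__":
    for requested in (1, 4, 9, 10, 13):
        print(requested, "->", nearest_square(requested))
```

Under this sketch, a request for 10 splits would be treated as 9 (a 3 x 3 grid), and a request for 13 as 16 (a 4 x 4 grid).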

Notes:

Example: batch_process_mp.py input_dir im_type training_dataset_file -s 4 -p 2

This example processes all .tif and .jpg files in the input directory with the training data found in training_dataset_file, using two processors and splitting each image into four sections.
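For readers who want to see how the documented arguments fit together, here is a sketch of an argparse parser that mirrors them. It is an illustration built from the argument list in this README, not the parser used in batch_process_mp.py. The destination names (input_dir, image_type, training_dataset) and the helper names build_parser and parse are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments documented above (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the batch_process_mp.py command line.")
    parser.add_argument("input_dir",
                        help="directory containing the images to process")
    parser.add_argument("image_type", choices=["srgb", "wv02_ms", "pan"],
                        help="type of imagery being processed")
    parser.add_argument("training_dataset",
                        help="complete filepath of the training dataset")
    parser.add_argument("-s", "--splits", type=int, default=9,
                        help="number of times to split the input image")
    parser.add_argument("-p", "--parallel", type=int, default=1,
                        help="number of parallel processes to run")
    parser.add_argument("--training_label", default=None,
                        help="label of a custom training dataset")
    return parser


def parse(argv):
    """Parse argv and apply the documented default for --training_label."""
    args = build_parser().parse_args(argv)
    # The documented default for the training label is the image type.
    if args.training_label is None:
        args.training_label = args.image_type
    return args


if __name__ == "__main__":
    example = ["input_dir", "srgb", "training_dataset_file", "-s", "4", "-p", "2"]
    print(parse(example))
```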

In general, images should be divided into parts small enough to load easily into RAM. The right size depends strongly on the computer running these scripts. For best results, segments should typically not exceed 1 to 2 GB in size. For the 5-7 MB RGB images provided as test subjects, subdivision is not required (use -s 1 as the optional argument). Processing speed can be increased by combining subdivision with multiple cores. For a full multispectral WorldView scene, which may be 16 GB or larger, 9 or 16 segments are typically needed. The number of parallel processes should be selected such that num_cores * subdivision filesize << available system RAM.
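The rule of thumb num_cores * subdivision filesize << available system RAM can be turned into a rough estimate. The sketch below is one way to do that. The function name max_parallel and the safety_factor parameter (how much headroom "much less than" should mean) are assumptions, and the on-disk file size is used only as a stand-in for memory use.

```python
def max_parallel(image_bytes, splits, available_ram_bytes, safety_factor=4):
    """Estimate how many splits can be processed at once.

    Hypothetical helper: keeps num_cores * split size below
    available RAM / safety_factor.
    """
    # Approximate size of one subdivision of the image.
    split_bytes = image_bytes / float(splits)
    # Only a fraction of RAM is budgeted, to honor the "<<" in the rule.
    budget = available_ram_bytes / float(safety_factor)
    # Always allow at least one process.
    return max(1, int(budget // split_bytes))


if __name__ == "__main__":
    GB = 1024 ** 3
    # A 16 GB scene cut into 16 pieces on a machine with 32 GB of RAM.
    print(max_parallel(16 * GB, 16, 32 * GB))
```

With the default safety factor of 4, a 16 GB scene split into 16 pieces on a 32 GB machine gives an estimate of 8 parallel processes. Treat the result as an upper bound and reduce it if memory pressure appears.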

Splitter.py

This script reads in a raw image, stretches the pixel intensity values to the full 8-bit range, and subdivides the image into s subimages, where s is the number of splits. The output file is in hdf5 format and is ready to be read by Watershed.py.
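The two operations described above, stretching intensities to the 8-bit range and cutting the image into a square grid, can be sketched with numpy as follows. This is a simplified illustration: it uses a plain linear min/max stretch, which may differ from the stretch in Splitter.py, and the function names stretch_to_8bit and split_grid are hypothetical.

```python
import numpy as np


def stretch_to_8bit(band):
    """Linearly rescale a single band to the full 0-255 range."""
    band = band.astype(np.float64)
    lo, hi = band.min(), band.max()
    if hi == lo:
        # A constant band has no contrast to stretch.
        return np.zeros(band.shape, dtype=np.uint8)
    return np.round((band - lo) / (hi - lo) * 255.0).astype(np.uint8)


def split_grid(image, splits):
    """Cut a 2-D image into a side x side grid, where side**2 is about splits."""
    side = int(round(np.sqrt(splits)))
    rows = np.array_split(image, side, axis=0)
    # Blocks are returned in row-major order: left to right, top to bottom.
    return [block for row in rows
            for block in np.array_split(row, side, axis=1)]


if __name__ == "__main__":
    raw = (np.arange(36, dtype=np.uint16) * 100).reshape(6, 6)
    stretched = stretch_to_8bit(raw)
    blocks = split_grid(stretched, 9)
    print(len(blocks), blocks[0].shape)
```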

Positional Arguments

input_dir: Directory path of the input image
filename: Name of the image to be split
image type: {'srgb', 'wv02_ms', 'pan'}: the type of imagery you are processing.
1. 'srgb': RGB imagery taken by a typical camera
2. 'wv02_ms': DigitalGlobe WorldView 2 multispectral imagery
3. 'pan': High resolution panchromatic imagery

Optional Arguments

--output_dir: Directory path for output images.
-s | --splits: The number of times to split the input image for improved processing speed. This is rounded to the nearest square number. Default = 9.
-v | --verbose: Display text information and progress of the script.

Watershed.py

This script loads the output of Splitter.py and segments the image using edge detection followed by watershed segmentation.
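As a rough illustration of edge detection followed by watershed segmentation, the sketch below uses scikit-image (listed above as skimage). It is not the procedure in Watershed.py: the Sobel filter and the threshold-based seed markers are assumptions chosen to keep the example short, and the function name segment_image and its low/high parameters are hypothetical.

```python
import numpy as np
from skimage.filters import sobel

try:
    # Location of watershed in recent scikit-image releases.
    from skimage.segmentation import watershed
except ImportError:
    # Older releases exposed it from the morphology module.
    from skimage.morphology import watershed


def segment_image(image, low, high):
    """Label an image by flooding an edge map from two kinds of seeds."""
    # Edge detection: the gradient magnitude is high at feature boundaries.
    edges = sobel(image.astype(np.float64))
    # Assumed seeding: clearly dark pixels are seed 1, clearly bright are seed 2.
    markers = np.zeros(image.shape, dtype=np.int32)
    markers[image < low] = 1
    markers[image > high] = 2
    # Watershed grows the seeds until they meet along the strongest edges.
    return watershed(edges, markers)


if __name__ == "__main__":
    img = np.zeros((20, 20))
    img[:, 9:11] = 100   # ambiguous transition zone, left unseeded
    img[:, 11:] = 200    # bright region
    labels = segment_image(img, 50, 150)
    print(np.unique(labels))
```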

Positional Arguments

input_dir: Directory path of the input image.
filename: Name of the segmented image file (Splitter output: .h5).

Optional Arguments

--output_dir: Directory path for output files.
--histogram: Display histogram of pixel intensity values before segmentation.
-c | --color: Save a color (rgb) version of the input image.
-t | --test: Inspect segmentation results before saving output files.
-v | --verbose: Display text information and progress of the script.

RandomForest.py

Classifies the segmented image (output of Watershed.py) using a Random Forest machine learning algorithm. Training data can be created on a segmented image using the GUI in training_gui.py.

Positional Arguments

input_filename: Directory and filename of image watersheds to be classified.
training_dataset: Directory and filename of the training dataset (.h5)
training_label: name of training classification list

Optional Arguments

-q | --quality: Display a quality assessment (OOB score and attribute importance) of the training dataset.
--debug: Display one classified subimage at a time, with the option of quitting after each.
-v | --verbose: Display text information and progress of the script.

training_gui.py

Positional Arguments:

input: In mode 1 this is a folder containing the training images. In mode 2, the input is a classified image file (.h5).
image type: {'srgb', 'wv02_ms', 'pan'}: the type of imagery you are processing.
1. 'srgb': RGB imagery taken by a typical camera
2. 'wv02_ms': DigitalGlobe WorldView 2 multispectral imagery
3. 'pan': High resolution panchromatic imagery
-m | --mode: {1,2}. How you would like to run the training GUI. 1: create a training dataset from a folder of raw images. 2: assess the accuracy of a classified image (output of RandomForest.py).

Optional arguments:

--tds_file: Only used for mode 1. Existing training dataset file. Will create a new one with this name if none exists. Default = <image_type>_training_data.h5.
-s | --splits: The number of times to split the input image for improved processing speed. This is rounded to the nearest square number. Default = 9.

Contact

Nicholas Wright
