odc-tools

DEA Prototype Code

  • AWS s3 tools
  • Rasterio from S3 investigations
  • Utilities for data visualizations in notebooks

Installation

This repository provides a number of small libraries and CLI tools.

Full list of libraries, and install instructions:

  • odc.ui: tools for data visualization in notebook/lab
  • odc.index: extra utils for working with the datacube database
  • odc.geom: geometry utils and prototypes
  • odc.algo: algorithms (the GeoMedian wrapper is here)
  • odc.io: common IO utilities, used mainly by apps
  • odc.aws: AWS/S3 utilities, used mainly by apps
  • odc.aio: faster concurrent fetching from S3 with async, used by apps
  • odc.dscache: experimental key-value store where key=UUID, value=Dataset
  • odc.dtools: tools/experiments in the area of dask.distributed/dask<>datacube integration
  • odc.ppt: parallel processing helper methods, internal lib

Installation requires the custom package repository at https://packages.dea.ga.gov.au.

pip install --extra-index-url="https://packages.dea.ga.gov.au" \
  odc-ui \
  odc-index \
  odc-geom \
  odc-algo \
  odc-io \
  odc-aws \
  odc-aio \
  odc-dscache \
  odc-dtools

NOTE: on Ubuntu 18.04 the default pip version is too old to support the --extra-index-url command-line option, so upgrade pip first: pip3 install --upgrade pip.
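
To avoid repeating the option on every call, the repository URL can also be set once through pip's environment-variable form of the option. The sketch below assumes a reasonably recent pip; the wrapper function install_odc_libs is a hypothetical name used only for this example, and the package list is the one shown above.

```shell
#!/bin/bash
# pip reads long options from PIP_<OPTION_NAME> environment variables,
# so this has the same effect as passing --extra-index-url on each call.
export PIP_EXTRA_INDEX_URL="https://packages.dea.ga.gov.au"

# Hypothetical wrapper (not part of odc-tools): installs the libraries
# listed above, relying on PIP_EXTRA_INDEX_URL for the extra repository.
install_odc_libs() {
  pip install odc-ui odc-index odc-geom odc-algo odc-io \
    odc-aws odc-aio odc-dscache odc-dtools
}
```

Calling install_odc_libs after sourcing this snippet should behave like the explicit pip command above.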

CLI Tools

Installation

  1. For cloud (AWS only)
    pip install --extra-index-url="https://packages.dea.ga.gov.au" odc-apps-cloud
    
  2. For cloud (GCP, THREDDS and AWS)
    pip install --extra-index-url="https://packages.dea.ga.gov.au" 'odc-apps-cloud[GCP,THREDDS]'
    
  3. For dc-index-from-tar (indexing to datacube from tar archive)
    pip install --extra-index-url="https://packages.dea.ga.gov.au" odc-apps-dc-tools
    

NOTE: the cloud tools depend on aiobotocore, which requires a specific version of botocore. boto3 also requires a specific version of botocore, so having both aiobotocore and boto3 in one environment can be tricky. The easiest solution is to install aiobotocore[awscli,boto3] before anything else, which pulls compatible versions of boto3 and awscli into the environment.

pip install -U 'aiobotocore[awscli,boto3]'
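
The install order described in the note can be captured in a small script. This is only a sketch: the run helper, the install_cloud_tools function, and the DRY_RUN switch are hypothetical names introduced for this example (DRY_RUN is not a pip feature), and odc-apps-cloud stands in for whichever tool package is needed.

```shell
#!/bin/bash
# Sketch: install aiobotocore first so pip selects a botocore version
# that boto3 and awscli can share, then install the cloud tools.

ODC_INDEX_URL="https://packages.dea.ga.gov.au"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: the second step only runs if the first succeeds.
install_cloud_tools() {
  run pip install -U 'aiobotocore[awscli,boto3]' &&
    run pip install --extra-index-url="${ODC_INDEX_URL}" odc-apps-cloud
}
```

Running DRY_RUN=1 install_cloud_tools prints the two pip commands in order without installing anything, which is a cheap way to check the sequence before touching an environment.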

Apps

  1. s3-find: list S3 bucket contents matching a wildcard
  2. s3-to-tar: fetch documents from S3 and dump them to a tar archive
  3. gs-to-tar: search Google Storage (GS) for documents and dump them to a tar archive
  4. dc-index-from-tar: read yaml documents from a tar archive and add them to datacube

Example:

#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage

The fastest way to list regularly placed files is a fixed-depth listing:

#!/bin/bash

# only works when all metadata files sit at the same depth and share a fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'

s3-find --skip-check "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
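
Whether a pattern qualifies for the fixed-depth form can be checked before choosing between the two examples above. The helper below is a hypothetical sketch, not part of odc-tools: it only inspects the pattern string, treating a pattern as fixed depth when it has no recursive '**' component and its final component is a literal file name.

```shell
#!/bin/bash
# Hypothetical helper (not part of odc-tools): succeeds when an S3
# pattern has no '**' component and ends in a literal file name.
is_fixed_depth() {
  local pattern="$1"
  local name="${pattern##*/}"   # last path component
  case "${pattern}" in
    *'**'*) return 1 ;;         # recursive wildcard: depth is not fixed
  esac
  case "${name}" in
    ''|*'*'*|*'?'*|*'['*) return 1 ;;   # empty or wildcard file name
  esac
  return 0
}
```

A script could then pick the faster listing only when the check passes, for example: if is_fixed_depth "${s3_src}"; then s3-find --skip-check "${s3_src}"; else s3-find "${s3_src}"; fi.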

When using Google Storage:

#!/bin/bash

# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz
