Serverless high-resolution aerial imagery map tiles from Cloud-Optimized GeoTIFFs for the U.S.

Home Page: https://kylebarron.dev/naip-cogeo-mosaic

License: MIT License

naip-imagery mosaic tiles lambda serverless aerial-imagery aws-lambda cogeo-mosaic cloud-optimized-geotiff python

naip-cogeo-mosaic's Introduction

naip-cogeo-mosaic

Serverless high-resolution NAIP map tiles from Cloud-Optimized GeoTIFFs for the lower 48 U.S. states.

Interactive example

60cm-resolution NAIP imagery of the Grand Canyon from 2017.

Overview

The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. All NAIP imagery between 2011 and 2018 is stored in an AWS S3 public dataset, and crucially the naip-visualization bucket stores images in Cloud-Optimized GeoTIFF (COG) format. Because this data format supports streaming portions of the image at a time, it enables serving a basemap of imagery on demand without needing to preprocess and store any imagery.
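Because clients can fetch just the byte ranges they need, a small window of a NAIP COG can be read directly from S3. Here's a minimal sketch using rasterio (not part of this repository); the object key is one real example from the naip-visualization bucket, and since the bucket is requester pays you're billed for the (small) transfer:

import os

import rasterio
from rasterio.windows import Window

# The bucket is requester pays; also avoid unnecessary S3 LIST requests
os.environ['AWS_REQUEST_PAYER'] = 'requester'
os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'EMPTY_DIR'

url = 's3://naip-visualization/al/2017/100cm/rgb/30085/m_3008501_ne_16_1_20171018.tif'
with rasterio.open(url) as src:
    print(src.profile)  # metadata comes from the GeoTIFF header via a range request
    # Read only a 512x512 window instead of downloading the whole image
    data = src.read(window=Window(col_off=0, row_off=0, width=512, height=512))
    print(data.shape)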

This repository is designed to create MosaicJSON files representing collections of NAIP imagery that can be used with titiler to serve map tiles on demand.

Using

If you'd like to get running quickly, you can use a prebuilt MosaicJSON file in the filled/ folder and skip down to "Deploy".

Otherwise, the following describes how to create a custom MosaicJSON file from specified years of NAIP imagery available on AWS.

Install

Clone the repository and install Python dependencies.

git clone https://github.com/kylebarron/naip-cogeo-mosaic
cd naip-cogeo-mosaic
conda env create -f environment.yml
source activate naip-cogeo-mosaic

If you prefer using pip, you can run

pip install awscli click pygeos 'rio-cogeo>=2.0' 'cogeo-mosaic>=3.0a5'

Select TIF URLs

This section outlines methods for selecting files that represent a country-wide mosaic of NAIP imagery, which can then be put into a MosaicJSON file for serving.

Download manifest.txt. This file lists every object stored in the naip-visualization bucket.

aws s3 cp s3://naip-visualization/manifest.txt ./manifest.txt --request-payer

In the NAIP program, different states are photographed in different years, with a target of imaging every state once every 3 years. Here's an interactive map of when each state was photographed (though it doesn't appear to include 2018 yet; this graphic shows which states were photographed in 2018).

All (lower 48) states were photographed between 2011-2013, and again in 2014-2015. All states except Maine were photographed in 2016-2017. All states except Oregon were photographed in 2017-2018.

Therefore, I'll generate four MosaicJSONs, spanning 2011-2013, 2014-2015, 2015-2017, and 2016-2018. For the last two, I include an extra start year just for Maine and Oregon, respectively, but set each mosaic to use the latest available imagery, so only Maine takes imagery from 2015 and only Oregon takes imagery from 2016.

The following code block selects imagery for each time span. You can run python code/naip.py --help for a full description of available options.

python code/naip.py manifest \
    -s 2011 \
    -e 2013 \
    --select-method last \
    manifest.txt \
    | sed -e 's|^|s3://naip-visualization/|' \
    > urls_2011_2013.txt
python code/naip.py manifest \
    -s 2014 \
    -e 2015 \
    --select-method last \
    manifest.txt \
    | sed -e 's|^|s3://naip-visualization/|' \
    > urls_2014_2015.txt
python code/naip.py manifest \
    -s 2015 \
    -e 2017 \
    --select-method last \
    manifest.txt \
    | sed -e 's|^|s3://naip-visualization/|' \
    > urls_2015_2017.txt
python code/naip.py manifest \
    -s 2016 \
    -e 2018 \
    --select-method last \
    manifest.txt \
    | sed -e 's|^|s3://naip-visualization/|' \
    > urls_2016_2018.txt

Each output file includes one filename for each quad identifier, deduplicated across years.

> head -n 5 urls_2016_2018.txt
s3://naip-visualization/al/2017/100cm/rgb/30085/m_3008501_ne_16_1_20171018.tif
s3://naip-visualization/al/2017/100cm/rgb/30085/m_3008501_nw_16_1_20171006.tif
s3://naip-visualization/al/2017/100cm/rgb/30085/m_3008502_ne_16_1_20170909.tif
s3://naip-visualization/al/2017/100cm/rgb/30085/m_3008502_nw_16_1_20170909.tif
s3://naip-visualization/al/2017/100cm/rgb/30085/m_3008503_ne_16_1_20171017.tif

Additionally, files along state borders are deduplicated. Cells on state borders often appear in both adjacent states' datasets, usually from different years. For example, this image is duplicated in both Texas's and Louisiana's datasets:

tx/2012/100cm/rgb/29093/m_2909302_ne_15_1_20120522.tif
la/2013/100cm/rgb/29093/m_2909302_ne_15_1_20130702.tif

As you can tell from the quad identifier in the filenames, these cover the same cell, captured in different years. I deduplicate these to reduce load on the Lambda function parsing the MosaicJSON (see the sketch below).
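Here's a hedged sketch of that deduplication idea, using a hypothetical helper rather than the actual code/naip.py implementation: the quad identifier is the filename minus the trailing capture date, and for each quad we keep the most recent acquisition.

import os

def dedupe_latest(keys):
    # Keep one key per quad identifier, preferring the most recent capture date
    best = {}
    for key in keys:
        # 'tx/2012/100cm/rgb/29093/m_2909302_ne_15_1_20120522.tif'
        # -> quad id 'm_2909302_ne_15_1', date '20120522'
        stem = os.path.basename(key).rsplit('.', 1)[0]
        qid, date = stem.rsplit('_', 1)
        if qid not in best or date > best[qid][0]:
            best[qid] = (date, key)
    return [key for _, key in best.values()]

keys = [
    'tx/2012/100cm/rgb/29093/m_2909302_ne_15_1_20120522.tif',
    'la/2013/100cm/rgb/29093/m_2909302_ne_15_1_20130702.tif',
]
print(dedupe_latest(keys))  # keeps only the 2013 Louisiana file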

To visualize the quads covered by a list of URLs, you can compute its footprint. For example, to get the mosaic footprint of Rhode Island from the 2011-2013 mosaic:

export AWS_REQUEST_PAYER="requester"
cat urls_2011_2013.txt \
    | grep "^s3://naip-visualization/ri/" \
    | cogeo-mosaic footprint - > footprint.geojson

And inspect it with kepler.gl:

kepler footprint.geojson

Here you can see that the tiles to be used in the mosaic of Rhode Island don't include the state's border. That's because the Python script that parses the manifest deduplicates tiles on the border when they're included in both states' datasets. If you looked at the footprint of Connecticut, you'd see the missing tiles along the border.

Total number of files

> wc -l urls_2011_2013.txt
  213197 urls_2011_2013.txt

Create MosaicJSON

NAIP imagery tiffs are in a requester pays bucket. In order to access them, you need to set the AWS_REQUEST_PAYER environment variable:

export AWS_REQUEST_PAYER="requester"

I also found that cogeo-mosaic create was failing on an AWS EC2 instance while it worked on my local computer. In general, if cogeo-mosaic create isn't working for some URL, run rio info <URL> and see what the error is, since cogeo-mosaic uses rasterio internally but doesn't currently print rasterio errors to stdout. In my case, I had to set the certificates path (see cogeotiff/rio-tiler#19, mapbox/rasterio#942).

export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

I don't know exactly how much data cogeo-mosaic create downloads (it only requests the GeoTIFF header of each file), but it might be wise to run the mosaic creation on an AWS EC2 instance in the us-west-2 region (the same region where the NAIP imagery is located), so that you don't have to pay for egress bandwidth on the requests. I found that creating a mosaic took about 1.5 GB of memory and finished in about 7 hours per mosaic on a t2.small instance.

Then create the MosaicJSON files. GET requests are priced at $0.0004 per 1,000 requests, so creating a MosaicJSON from roughly 200,000 URLs should cost 0.0004 * (200,000 / 1,000) = $0.08. 8 cents!

cat urls_2011_2013.txt \
    | cogeo-mosaic create - \
    > naip_2011_2013_mosaic.json
cat urls_2014_2015.txt \
    | cogeo-mosaic create - \
    > naip_2014_2015_mosaic.json
cat urls_2015_2017.txt \
    | cogeo-mosaic create - \
    > naip_2015_2017_mosaic.json
cat urls_2016_2018.txt \
    | cogeo-mosaic create - \
    > naip_2016_2018_mosaic.json
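For concreteness, the same cost arithmetic in Python, using the exact URL count from the 2011-2013 list above:

n_urls = 213_197                   # wc -l urls_2011_2013.txt
cost = 0.0004 * n_urls / 1000      # $0.0004 per 1,000 GET requests
print(f'~${cost:.2f} per mosaic')  # ~$0.09, i.e. roughly the 8 cents quoted above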

Fill in missing quadkeys

Some of these years have small missing areas. For example, in some years parts of Montana weren't photographed. fill_mosaic_holes.py is a simple script that fills quadkeys missing from one mosaic with assets from another year's mosaic.

The following tells the script to look at all the mosaics matching data/naip_201*.json and write new, filled mosaics to the filled/ folder.

python code/fill_mosaic_holes.py -o filled data/naip_201*.json

Note that this fills in entire quadkeys that are missing in one year but that exist in another. However, if a year is missing some areas, there will be quadkeys that exist but only have partial data. So without more effort there can still be some small holes in the data. See issue #8.
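The fill operation itself is conceptually simple. Here's a hedged sketch of the idea (not the actual code/fill_mosaic_holes.py, which also handles writing the output files): take the union of quadkeys across all mosaics, and for any quadkey missing from a given mosaic, borrow the assets from another year that has it.

import json

def fill_holes(mosaic_paths):
    mosaics = {}
    for path in mosaic_paths:
        with open(path) as f:
            mosaics[path] = json.load(f)

    # Union of quadkeys across all years
    all_quadkeys = set()
    for mosaic in mosaics.values():
        all_quadkeys.update(mosaic['tiles'])

    for mosaic in mosaics.values():
        for qk in all_quadkeys:
            if qk in mosaic['tiles']:
                continue
            # Borrow this quadkey's assets from the first year that covers it
            for other in mosaics.values():
                if qk in other['tiles']:
                    mosaic['tiles'][qk] = other['tiles'][qk]
                    break

    return mosaics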

Deploy

The older cogeo-mosaic-tiler is being deprecated in favor of the newer, more stable titiler. Refer to titiler's documentation for deployment instructions. Note that since NAIP images are in a requester-pays bucket, you'll need to set AWS_REQUEST_PAYER="requester" in the environment.

Upload MosaicJSON files

In order for titiler to create your tiles on demand, it needs to access a MosaicJSON file, which you need to host somewhere accessible by titiler (preferably in the same AWS region).

Generally the simplest method is uploading the JSON file to S3. However, since these files are so large (~64 MB uncompressed), I found it was taking about 2.5 s just to load and parse the JSON.

As of v3, cogeo-mosaic (and thus also titiler) support alternate backends, such as DynamoDB. DynamoDB is a serverless database that makes loading the MosaicJSON fast, because the tiler only needs one or two reads, which each take around 10ms (as long as the DynamoDB table is in the same region as the tiler).
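To illustrate why this is fast: resolving a tile request is essentially a single key-value lookup. The snippet below only conveys that access pattern; the table name is a placeholder, and the item layout (a quadkey partition key with an assets attribute) is an assumption about cogeo-mosaic's DynamoDB schema rather than a documented contract.

import boto3
import mercantile

table = boto3.resource('dynamodb', region_name='us-west-2').Table('naip20162018')

tile = mercantile.Tile(x=3105, y=6429, z=14)
quadkey = mercantile.quadkey(tile)

response = table.get_item(Key={'quadkey': quadkey})  # one ~10 ms read
assets = response.get('Item', {}).get('assets', [])
print(assets)  # COG URLs intersecting this tile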

For full backend docs, see cogeo-mosaic's documentation.

DynamoDB

If you wish to connect the tiler to one or more DynamoDB tables, you need to deploy with

sls deploy --bucket your-mosaic-bucket --aws-account-id your-aws-account-id

You can find your AWS account ID with

aws sts get-caller-identity

To upload a MosaicJSON to DynamoDB, run:

pip install -U 'cogeo-mosaic>=3.0a5'
cogeo-mosaic upload --url 'dynamodb://{aws_region}/{table_name}' mosaic.json

That uploads the MosaicJSON to the given table in the specified region, creating the table if necessary.

Proxy your endpoint with Cloudflare

I like to proxy through Cloudflare to take advantage of their free caching. You can read my blog post here to see how to do that.

Low Zoom Overviews

The NAIP imagery hosted on AWS in the naip-visualization bucket has full-resolution imagery plus 5 levels of internal overviews within each GeoTIFF. That means that it's fast to read image data for 6 or 7 zoom levels, but will slow down as you zoom out, since you'll necessarily need to read and combine many images, and perform downsampling on the fly.

The native zoom range for the source NAIP COG imagery is roughly 12-18. That means that to render a single zoom 6 tile, you'd have to combine imagery from 4^6 = 4,096 zoom 12 COGs on the fly.
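The same arithmetic with mercantile, counting the native zoom 12 tiles under a single zoom 6 tile:

import mercantile

z6_tile = mercantile.Tile(x=13, y=24, z=6)  # an arbitrary zoom 6 tile
native_zoom = 12
n_children = len(mercantile.children(z6_tile, zoom=native_zoom))
print(n_children, 4 ** (native_zoom - 6))   # 4096 4096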

A way to solve this issue is to pregenerate lower zoom overviews given a mosaic. This removes some flexibility, as you have to choose upfront how to combine the higher-resolution images, but enables performant serving of lower-resolution imagery.

If you don't want to generate your own overviews, pregenerated overviews are available in a requester pays bucket of mine, and overview mosaics are available in the filled/ directory. My overview COGs are available at s3://rp.kylebarron.dev/cog/naip/deflate/.

This section outlines how to create these overviews, which are themselves COGs. After creating the overviews, you'll have fast low-zoom imagery, as you can see in this screenshot of Colorado.

Note, this code is experimental.

I want to have a seamless downsampled map. To do this, I'll create overview COGs and then create another mosaic from these downsampled images. For zooms 12 and up, the map will fetch images using the full-resolution mosaic; for zooms 6 through 11, the map will use the lower-resolution mosaic.

To make this mosaic, I'll create a new overview COG for each zoom 6 mercator tile. Then within a zoom 6 tile, only one source COG will need to be read.

First, split the large, U.S.-wide mosaic into mosaics for each zoom 6 quadkey, using a script in code/overviews.py.

python code/overviews.py split-mosaic \
    -z 6 \
    -o overview-mosaics \
    --prefix naip_2011_2013_ \
    filled/naip_2011_2013_mosaic.json
python code/overviews.py split-mosaic \
    -z 6 \
    -o overview-mosaics \
    --prefix naip_2014_2015_ \
    filled/naip_2014_2015_mosaic.json
python code/overviews.py split-mosaic \
    -z 6 \
    -o overview-mosaics \
    --prefix naip_2015_2017_ \
    filled/naip_2015_2017_mosaic.json
python code/overviews.py split-mosaic \
    -z 6 \
    -o overview-mosaics \
    --prefix naip_2016_2018_ \
    filled/naip_2016_2018_mosaic.json

This creates about 50 temporary mosaics for each country-wide mosaic, and just over 200 total. Each of these mosaics defines the source input imagery necessary to create one overview COG.
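For reference, here's a hedged sketch of the splitting idea (not the actual code/overviews.py implementation, which also handles metadata): since these mosaics use zoom 12 quadkeys, a quadkey's first six characters identify its zoom 6 parent tile, so grouping on that prefix yields one sub-mosaic per zoom 6 mercator tile.

import json
import os
from collections import defaultdict

with open('filled/naip_2011_2013_mosaic.json') as f:
    mosaic = json.load(f)

# Group zoom 12 quadkeys by their zoom 6 parent (first six characters)
groups = defaultdict(dict)
for quadkey, assets in mosaic['tiles'].items():
    groups[quadkey[:6]][quadkey] = assets

os.makedirs('overview-mosaics', exist_ok=True)
for z6_quadkey, tiles in groups.items():
    # A real implementation would also recompute bounds/center metadata
    sub_mosaic = {**mosaic, 'tiles': tiles}
    with open(f'overview-mosaics/naip_2011_2013_{z6_quadkey}.json', 'w') as f:
        json.dump(sub_mosaic, f)

print(len(groups))  # roughly 50 zoom 6 tiles cover the lower 48 states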

Then we need a loop that will create an overview image for each of the above temporary mosaics. This step uses the overview code in cogeo-mosaic.

I highly recommend doing this step on an EC2 instance in the same region as the NAIP data (us-west-2). Since NAIP data is in a requester pays bucket, if you do this step elsewhere you'll have to pay egress fees for intermediate imagery taken out of the region. Additionally, this code is not yet memory-optimized, so you may need an instance with a good amount of memory.

# NAIP imagery is in a requester pays bucket
export AWS_REQUEST_PAYER="requester"
# Necessary on an EC2 instance
export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
# Necessary to prevent expensive, unnecessary S3 LIST requests!
export GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR
mkdir out && cd out
python ../code/overviews.py create-overviews \
    `# Number of processes to use in parallel` \
    -j 2 \
    `# Input directory of temporary mosaics created above` \
    `# Output dir is current directory` \
    ../overview-mosaics

The output images should already be COGs, but to make sure, I use rio-cogeo to convert them before final storage on S3. Use a --blocksize and --overview-blocksize that match the tile size your website requests.

mkdir ../cog_deflate/
for file in *.tif; do
    rio cogeo create \
        --blocksize 256 \
        --overview-blocksize 256 \
        "$file" "../cog_deflate/$file"
done

Then upload your output images to an S3 bucket of your own and, finally, create new overview mosaics from these images. For full information, see the command's help (python code/overviews.py create-overview-mosaic --help). As input, pass a list of S3 URLs to the overview COGs you just created. Don't rename the files before passing them to the script.
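If you prefer the Python API over the CLI, a minimal sketch of that last step might look like the following, assuming cogeo-mosaic >= 3.0 and a hypothetical overview_urls.txt listing the S3 URLs of your uploaded overview COGs:

from cogeo_mosaic.mosaic import MosaicJSON

with open('overview_urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

# Reads only each COG's header to compute footprints and quadkeys
mosaic = MosaicJSON.from_urls(urls)

with open('naip_2011_2013_overviews_mosaic.json', 'w') as f:
    f.write(mosaic.json())

Note that the repository's create-overview-mosaic command does more than this (it relies on the original filenames, which is why they shouldn't be renamed), so treat the above purely as a sketch.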

naip-cogeo-mosaic's People

Contributors

dependabot[bot], kylebarron


naip-cogeo-mosaic's Issues

Make sure generated mosaic is seamless

In the 2016-2018 mosaic, there are two sets of holes: one in the middle of Nevada, which is presumably off-limits to aerial photography, and one in Montana:
(screenshot)

This also shows up in the 2017 index/*.shp:
(screenshot)

In 2015, there are two shapefiles within the index folder. Together it looks like they span Montana completely.

(screenshots)

So you'll probably want to check each generated mosaic to make sure it has no holes. Here's the script:

from keplergl_cli import Visualize
import geojson
import mercantile
import json

with open('data/naip_2016_2018_mosaic.json') as f:
    data = json.load(f)

tiles = data['tiles']

# Build one GeoJSON feature per quadkey, tagged with its asset count
features = []
for k, v in tiles.items():
    feature = mercantile.feature(mercantile.quadkey_to_tile(k))
    feature['properties']['length'] = len(v)
    features.append(feature)

fc = geojson.FeatureCollection(features)

# Render the quadkey coverage in kepler.gl to spot holes
Visualize(fc)

tile/mosaic assets reproduction code

(screenshot)

from keplergl_cli import Visualize
import geojson
import json
import os

import mercantile
import pandas as pd
import rasterio
from pyproj import Transformer
from shapely.geometry import box, mapping

with open('data/naip_2016_2018_mosaic.json') as f:
    data = json.load(f)

tiles = data['tiles']

# Same quadkey-coverage visualization as in the issue above
features = []
for k, v in tiles.items():
    feature = mercantile.feature(mercantile.quadkey_to_tile(k))
    feature['properties']['length'] = len(v)
    features.append(feature)

fc = geojson.FeatureCollection(features)
Visualize(fc)

# Distribution of asset counts per quadkey
lengths = pd.Series([len(v) for v in tiles.values()])
print(lengths.describe())

# Inspect the assets behind one specific mercator tile
x, y, z = 701, 1635, 12
tile = mercantile.Tile(x, y, z)
qk = mercantile.quadkey(tile)
tile_feature = mercantile.feature(tile)

# NAIP imagery is in a requester pays bucket
os.environ['AWS_REQUEST_PAYER'] = 'requester'

meta = []
for fname in tiles[qk]:
    with rasterio.open(fname) as src:
        meta.append({
            'crs': src.crs,
            'bounds': src.bounds
        })


def bbox_to_geojson(bbox, src_crs):
    # Reproject a rasterio BoundingBox to WGS84 and return a GeoJSON box feature.
    # always_xy=True keeps lon/lat ordering and avoids the reversed x, y surprise.
    transformer = Transformer.from_crs(src_crs.to_string(), 'epsg:4326', always_xy=True)
    ll = transformer.transform(bbox.left, bbox.bottom)
    ur = transformer.transform(bbox.right, bbox.top)
    return {
        'type': 'Feature',
        'geometry': mapping(box(*ll, *ur)),
    }


# Compare each source COG's footprint against the mercator tile
features = [bbox_to_geojson(asset['bounds'], asset['crs']) for asset in meta]
fc = {'type': 'FeatureCollection', 'features': features}
Visualize([tile_feature, fc])

Struggling to deploy on my own endpoint

Hi,

Feel free to close this without answering; I'm also an open source maintainer and I sometimes don't have the time to provide support.

First, many thanks for all your open work on COGs and map-related tools. As a user without much experience in these things, it opens up lots of possibilities to be able to serve my own tiles from providers like NAIP 😎

I read your blog post and the README carefully, but I'm struggling and haven't managed to deploy my own endpoint for this.

It's the first time I've worked with Lambda and DynamoDB, so I might be doing something wrong. Maybe you have an idea of what it is?

Here is what I did:

  • cloned the cogeo-mosaic-tiler repo inside this one
  • make package
  • go back to root, deploy on Lambda with sls deploy --region us-west-2 --bucket mybucket --aws-account-id my-aws-account-id, replacing with my bucket and my AWS account ID

I get this back from the CLI, telling me the Lambda deploy went fine:

Service Information
service: naip-cogeo-mosaic
stage: production
region: us-west-2
stack: naip-cogeo-mosaic-production
resources: 9
api keys:
  None
endpoints:
  ANY - https://8hrrigck3c.execute-api.us-west-2.amazonaws.com/{proxy+}
functions:
  app: naip-cogeo-mosaic-production-app

Then I uploaded naip_2016_2018_mosaic.json (the one from the filled directory) to DynamoDB using this command:

cogeo-mosaic upload --url 'dynamodb://us-west-2/naip20162018' naip_2016_2018_mosaic.json

I took your live deployment as an example to see if I could request the same tiles. For example, on your deployment this request leads to a tile: https://us-west-2-lambda.kylebarron.dev/naip/14/3105/6429.jpg?url=dynamodb%3A%2F%2Fus-west-2%2F94c61bd217e1211db47cf7f8b95bbc8e5e7d68a26cd9099319cf15f9

So I constructed the same request adapted to my endpoint:

https://8hrrigck3c.execute-api.us-west-2.amazonaws.com/14/3105/6429.jpg?url=dynamodb://us-west-2/naip20162018

But the response is: {"message":"Internal Server Error"}

Then I went to look at the logs of the Lambda function, and it seems there is a Python error:

[ERROR] Runtime.ImportModuleError: Unable to import module 'cogeo_mosaic_tiler.handlers.app': cannot import name 'tile' from 'rio_tiler.io.cogeo' (/var/task/rio_tiler/io/cogeo.pyc)

I'm going to have a look at https://github.com/developmentseed/cogeo-mosaic-tiler directly, as I think the issue comes from there, but if this rings a bell, let me know.

Thanks again

Thibault

Performance

Test performance with

--tile-cover-sort --minzoom 13

(where you use 13 as the quadkey zoom, and then set the minzoom back to 12)

Check coverage in Montana

Am I missing tiles, or is the backend broken?

You know, I'm guessing this has to do with how I'm filling in tiles. I currently fill in a quadkey only when it doesn't exist at all, but here only part of the mercator tile is uncovered. So I think this is just due to some missing data in this part of Montana.

Not sure if there's a good way to fix it.

(screenshot)
