osio's People

Contributors

tbonfort, thomascoquet

osio's Issues

Bug in HTTP Handler

Hello Tom,
I tested the http+zip combination with osio+godal, but never plain HTTP. I think I misread the following VSIKeyReader comment:

// When registering a reader with
//  RegisterVSIHandler("scheme://",handler)
// calling Open("scheme://myfile.txt") will result in godal making calls to
//  VSIReader("myfile.txt")

In the HTTP handler, I assumed the full URL would be passed in. Instead, the handler is called without the scheme (i.e. www.google.com instead of https://www.google.com).

How would you like this to be fixed?

For example:

httpr, _ := osio.HTTPHandle(ctx)
httpsr, _ := osio.HTTPSHandle(ctx)

httpa, _ := osio.NewAdapter(httpr)
httpsa, _ := osio.NewAdapter(httpsr)

godal.RegisterVSIAdapter("http://", httpa)
godal.RegisterVSIAdapter("https://", httpsa)
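
One way a per-scheme handler could recover the full URL is to wrap the reader so that it restores the prefix before delegating. The following is only a sketch: keyReaderAt, prefixReader and recorder are hypothetical names for illustration, and the ReadAt signature is a simplified stand-in, not the osio definition.

```go
package main

import (
	"fmt"
	"strings"
)

// keyReaderAt is a hypothetical, simplified stand-in for a key-based reader
// interface. It is NOT the osio definition.
type keyReaderAt interface {
	ReadAt(key string, p []byte, off int64) (int, error)
}

// prefixReader restores the prefix that was stripped from the key
// before delegating to the wrapped reader.
type prefixReader struct {
	prefix string
	next   keyReaderAt
}

func (r prefixReader) ReadAt(key string, p []byte, off int64) (int, error) {
	return r.next.ReadAt(r.fullKey(key), p, off)
}

// fullKey returns key with the prefix restored. Keys that already carry the
// prefix are returned unchanged.
func (r prefixReader) fullKey(key string) string {
	if strings.HasPrefix(key, r.prefix) {
		return key
	}
	return r.prefix + key
}

// recorder is a test double that remembers the last key it was asked for.
type recorder struct{ last string }

func (r *recorder) ReadAt(key string, p []byte, off int64) (int, error) {
	r.last = key
	return 0, nil
}

func main() {
	rec := &recorder{}
	r := prefixReader{prefix: "https://", next: rec}
	_, _ = r.ReadAt("www.google.com", nil, 0)
	fmt.Println(rec.last) // https://www.google.com
}
```

With this approach, one wrapper instance would be registered per scheme, each configured with its own prefix.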

Thomas

reading img and bmp file from s3 failed

Hi, I ran into a problem reading ERDAS IMG and BMP files from Amazon S3 using godal (v0.0.3) and osio (v0.0.3): godal.Open returns an error that says "no such file or directory". However, reading tif/jpg/png from S3 and img/bmp from a local path both work, so I guess the osio library doesn't support reading img/bmp from S3.
Is there a plan to support img/bmp in the near future? If not, do I need to change the source code to make it work?
Thank you so much.

[improvement] Stream blocks

Hello Thomas,

I would like to suggest an improvement over the current main interface KeyReaderAt. Taking a look at the handler implementations, it appears most of them use io.ReadFull(r.Body, p) to return the buffer to the adapter.

Taking advantage of that, I think we could also define a KeyStreamerAt interface:

type KeyStreamerAt interface {
	StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error)
}

It is to io.SectionReader what osio.KeyReaderAt is to io.ReaderAt.
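
To make the analogy concrete, here is a minimal sketch of an in-memory implementation built on io.SectionReader. memStore is a hypothetical type for illustration only, and the meaning of the int64 return value (taken here to be the total object size) is an assumption, since the proposal does not spell it out.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// KeyStreamerAt mirrors the interface proposed above.
type KeyStreamerAt interface {
	StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error)
}

// memStore is a hypothetical in-memory object store, used only for illustration.
type memStore map[string]string

// StreamAt returns a reader over at most n bytes of the object, starting at off.
// The second return value is ASSUMED to be the total size of the object.
func (m memStore) StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error) {
	obj, ok := m[key]
	if !ok {
		return nil, 0, os.ErrNotExist
	}
	size := int64(len(obj))
	if off < 0 || off >= size {
		return nil, size, io.EOF
	}
	// SectionReader limits reads to the window [off, off+n).
	sr := io.NewSectionReader(strings.NewReader(obj), off, n)
	return io.NopCloser(sr), size, nil
}

func main() {
	var s KeyStreamerAt = memStore{"file": "ABCDEF"}
	rc, size, err := s.StreamAt("file", 1, 3)
	if err != nil {
		panic(err)
	}
	defer rc.Close()
	b, _ := io.ReadAll(rc)
	fmt.Println(string(b), size) // BCD 6
}
```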

The key idea is that when a range is fetched, the mutexes of all its blocks stay locked until the whole range has been read, whereas we could release them progressively, block by block, to reduce contention with other reads.

Example:

  • GDAL needs range .BCDEF for first read,
  • GDAL needs range AB for second read.

Today, the second read has to wait for the whole block range request of the first read to finish before it can be served, because both reads need block B. With the new implementation, the Adapter can return sooner for the second read.

It looks something like this in my current implementation of the adapter:

	if nToFetch == len(blocks) && a.canStream {
		r, err := a.srcStreamAt(key, rng.start*a.blockSize, (rng.end-rng.start+1)*a.blockSize)
		if err != nil {
			for i := rng.start; i <= rng.end; i++ {
				blockID := a.blockKey(key, i)
				a.blmu.Unlock(blockID)
			}
			return nil, err
		}
		defer r.Close()
		for bid := int64(0); bid <= rng.end-rng.start; bid++ {
			blockID := a.blockKey(key, bid+rng.start)
			buf := make([]byte, a.blockSize)
			n, err := io.ReadFull(r, buf)
			// A short last block means the stream ended: treat it as a regular EOF.
			if err == io.ErrUnexpectedEOF {
				err = io.EOF
			}
			if err == nil || err == io.EOF {
				blocks[bid] = buf[:n]
				a.cache.Add(key, uint(rng.start+bid), blocks[bid])
			}
			if err != nil {
				for i := rng.start + bid; i <= rng.end; i++ {
					a.blmu.Unlock(a.blockKey(key, i))
				}
				if err == io.EOF {
					break
				}
				return nil, err
			}
			// The block is cached: release its lock right away so other reads can proceed.
			a.blmu.Unlock(blockID)
		}
		return blocks, nil
	}
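
For readers who want to try the block-by-block idea in isolation, here is a self-contained sketch without the adapter, cache or mutexes. splitBlocks and collect are hypothetical helpers, not part of osio. The done callback marks the point where a real adapter could cache the block and release its lock.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// splitBlocks reads r in blockSize chunks and calls done(i, block) as soon as
// block i is available. A short final block is delivered and ends the stream
// without error. Any other read error is returned without delivering the block.
func splitBlocks(r io.Reader, blockSize, nBlocks int, done func(i int, block []byte)) error {
	for i := 0; i < nBlocks; i++ {
		buf := make([]byte, blockSize)
		n, err := io.ReadFull(r, buf)
		switch err {
		case nil:
			done(i, buf)
		case io.ErrUnexpectedEOF:
			// Fewer than blockSize bytes were left: last, short block.
			done(i, buf[:n])
			return nil
		case io.EOF:
			// Nothing left to read.
			return nil
		default:
			return err
		}
	}
	return nil
}

// collect is a demo helper: it splits data and returns the blocks in order.
func collect(data string, blockSize, nBlocks int) ([]string, error) {
	var out []string
	err := splitBlocks(strings.NewReader(data), blockSize, nBlocks, func(i int, b []byte) {
		out = append(out, string(b))
	})
	return out, err
}

func main() {
	blocks, err := collect("ABCDEFGH", 3, 4)
	fmt.Println(blocks, err) // [ABC DEF GH] <nil>
}
```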

It could be backward compatible by simply checking whether the handler implements the interface:

func NewAdapter(reader KeyReaderAt, opts ...AdapterOption) (*Adapter, error) {
	_, canStream := reader.(KeyStreamerAt)
	bc := &Adapter{
		...
		canStream:       canStream,
	}
	...
}
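
The optional-interface check can be tried on its own with the sketch below. The readerAt and streamerAt interfaces and the plain and streaming types are simplified, hypothetical stand-ins, not the osio types.

```go
package main

import (
	"fmt"
	"io"
)

// readerAt and streamerAt are simplified, hypothetical stand-ins for the
// KeyReaderAt and KeyStreamerAt interfaces discussed above.
type readerAt interface {
	ReadAt(key string, p []byte, off int64) (int, error)
}

type streamerAt interface {
	StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error)
}

// plain only supports ReadAt.
type plain struct{}

func (plain) ReadAt(string, []byte, int64) (int, error) { return 0, io.EOF }

// streaming supports both ReadAt (through embedding) and StreamAt.
type streaming struct{ plain }

func (streaming) StreamAt(string, int64, int64) (io.ReadCloser, int64, error) {
	return nil, 0, io.EOF
}

// canStream reports whether r also implements streamerAt.
func canStream(r readerAt) bool {
	_, ok := r.(streamerAt)
	return ok
}

func main() {
	fmt.Println(canStream(plain{}), canStream(streaming{})) // false true
}
```

Existing handlers that only implement the read method keep working unchanged, and handlers that add StreamAt are picked up automatically.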

Thomas

GDAL fallback to vsicurl

Hello Thomas,

I noticed that in some cases GDAL falls back to VSICurl when using osio. I am not sure where to start looking, and I had no success searching the GDAL code base for such a pattern.

In summary, when osio returns a 403 and the GeoJSON driver is enabled (I cannot explain why that matters), GDAL seems to fall back to vsicurl (or vsicurl_streaming) and downloads the entire GeoTIFF file.


Given the following:

func TestHead(t *testing.T) {
	ctx := context.Background()
	hdl, _ := osio.HTTPHandle(ctx)
	key := "https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx"

	// godal
	godal.RegisterRaster(godal.GTiff)
	//godal.RegisterVector(godal.GeoJSON)
	adp, _ := osio.NewAdapter(hdl)
	godal.RegisterVSIHandler("http://", adp)
	godal.RegisterVSIHandler("https://", adp)

	// open
	_, err := godal.Open(key)
	assert.Nil(t, err)
}

Here is what I see (adding some logs in http.go):

request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]

request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]

Basically, S3 returns 403 for the HEAD request and the open fails, as expected.


However, when including the GeoJSON driver:

func TestHead(t *testing.T) {
	ctx := context.Background()
	hdl, _ := osio.HTTPHandle(ctx)
	key := "https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx"

	// godal
	godal.RegisterRaster(godal.GTiff)
	godal.RegisterVector(godal.GeoJSON)
	adp, _ := osio.NewAdapter(hdl)
	godal.RegisterVSIHandler("http://", adp)
	godal.RegisterVSIHandler("https://", adp)

	// open
	_, err := godal.Open(key)
	assert.Nil(t, err)
}

Here is what I am seeing:

request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]

request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]

GDAL: HTTP: Fetch(https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx)
GDAL: HTTP: libcurl/7.58.0 GnuTLS/3.5.18 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
GDAL: HTTP: These HTTP headers were set: Accept: text/plain, application/json
* Couldn't find host sentinel-s1-l1c.s3.amazonaws.com in the .netrc file; using defaults
*   Trying 52.219.171.111...
* TCP_NODELAY set
* Connected to sentinel-s1-l1c.s3.amazonaws.com (52.219.171.111) port 443 (#0)
* found 128 certificates in /etc/ssl/certs/ca-certificates.crt
* found 387 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: *.s3.amazonaws.com (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=*.s3.amazonaws.com
*        start date: Wed, 15 Dec 2021 00:00:00 GMT
*        expire date: Sat, 03 Dec 2022 23:59:59 GMT
*        issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
*        compression: NULL
* ALPN, server did not agree to a protocol
> GET https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx HTTP/1.1
Host: sentinel-s1-l1c.s3.amazonaws.com
Accept-Encoding: gzip
Accept: text/plain, application/json

< HTTP/1.1 200 OK
< x-amz-id-2: xxx
< x-amz-request-id: xxx
< Date: Thu, 10 Feb 2022 10:32:57 GMT
< x-amz-request-charged: requester
< Last-Modified: Sat, 06 Mar 2021 01:34:22 GMT
< ETag: "xxx"
< x-amz-storage-class: INTELLIGENT_TIERING
< Accept-Ranges: bytes
< Content-Type: image/tiff
< Server: AmazonS3
< Content-Length: 684991437
< 

The issue here is that GDAL then downloads the whole file (there is no range request in the logs) even though osio returned a 403.
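
For comparison, a ranged read such as the "[off: 0, len: 131072]" request in the osio logs corresponds to a GET carrying a Range header, which the curl trace above lacks. The sketch below shows the difference against a local test server. fetchRange and demo are hypothetical helpers written for this illustration, not osio or GDAL code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange requests n bytes starting at off using an HTTP Range header.
// It returns the body bytes and the HTTP status code.
func fetchRange(url string, off, n int64) ([]byte, int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, 0, err
	}
	// Range end offsets are inclusive, hence the -1.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+n-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return body, resp.StatusCode, err
}

// demo serves a small in-memory "file" from a local test server and
// fetches 4 bytes of it starting at offset 2.
func demo() ([]byte, int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent honors Range headers.
		http.ServeContent(w, r, "file.bin", time.Time{}, strings.NewReader("ABCDEFGH"))
	}))
	defer srv.Close()
	return fetchRange(srv.URL, 2, 4)
}

func main() {
	body, status, err := demo()
	fmt.Println(string(body), status, err) // CDEF 206 <nil>
}
```

A server that honors the header answers 206 Partial Content with only the requested bytes, whereas the trace above shows 200 OK with the full Content-Length.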


Happy to file an issue at osgeo/gdal if you think it is more appropriate and not due to osio behavior.

Thomas
