airbusgeo / osio
Object Storage IO wrapper for golang
License: Apache License 2.0
Hello Tom,
I tested a combo of http+zip for osio+godal but never plain HTTP. I guess I misread the following VSIKeyReader comment:
// When registering a reader with
// RegisterVSIHandler("scheme://",handler)
// calling Open("scheme://myfile.txt") will result in godal making calls to
// VSIReader("myfile.txt")
In the HTTP handler, I assumed the full URL would be passed; instead the handler is called without the scheme (i.e. www.google.com instead of https://www.google.com).
I was wondering how you wanted this to be fixed?
For example:
httpr, _ := osio.HTTPHandle(ctx)
httpsr, _ := osio.HTTPSHandle(ctx)
httpa, _ := osio.NewAdapter(httpr)
httpsa, _ := osio.NewAdapter(httpsr)
godal.RegisterVSIAdapter("http://", httpa)
godal.RegisterVSIAdapter("https://", httpsa)
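Another option, sketched below with the standard library only, is a thin wrapper that puts the scheme back in front of the key before delegating to the real handler. The KeyReaderAt interface here is a local stand-in whose signature is an assumption, and schemeReader and mapReader are hypothetical names used for illustration; none of this is osio's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// KeyReaderAt is a local stand-in for the osio interface of the same name.
// The exact signature is an assumption made for this sketch.
type KeyReaderAt interface {
	ReadAt(key string, p []byte, off int64) (n int, size int64, err error)
}

// schemeReader (hypothetical) re-adds the scheme that was stripped from the
// key before the handler was called, then delegates to the wrapped reader.
type schemeReader struct {
	scheme string
	next   KeyReaderAt
}

func (s schemeReader) ReadAt(key string, p []byte, off int64) (int, int64, error) {
	return s.next.ReadAt(s.scheme+key, p, off)
}

// mapReader (hypothetical) is an in-memory backend indexed by full URL.
// It plays the role of the HTTP handler in this sketch.
type mapReader map[string]string

func (m mapReader) ReadAt(key string, p []byte, off int64) (int, int64, error) {
	data, ok := m[key]
	if !ok {
		return 0, -1, fmt.Errorf("%s: no such file or directory", key)
	}
	n, err := strings.NewReader(data).ReadAt(p, off)
	return n, int64(len(data)), err
}

func main() {
	backend := mapReader{"https://www.google.com/index.html": "hello"}
	r := schemeReader{scheme: "https://", next: backend}

	// The key arrives without its scheme, as described above.
	buf := make([]byte, 5)
	n, size, err := r.ReadAt("www.google.com/index.html", buf, 0)
	fmt.Println(string(buf[:n]), size, err)
}
```

With such a wrapper, one instance per scheme would be registered, which matches the two-handler registration shown above.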
Thomas
Hi, I ran into a problem when reading ERDAS IMG and BMP files from Amazon S3 using godal (v0.0.3) and osio (v0.0.3): godal.Open returns the error "no such file or directory". However, reading tif/jpg/png files from S3 and img/bmp files from a local path both work, so I guess the osio library may not support reading img/bmp from S3.
Is there a plan to support img/bmp in the near future? If not, do I need to change the source code to make it work?
Thank you so much.
Hello Thomas,
I would like to suggest an improvement over the current main interface, KeyReaderAt. Looking at the handler implementations, most of them use io.ReadFull(r.Body, p) to fill the buffer handed over by the adapter.
Taking advantage of that, I think we could also define a KeyStreamerAt interface:
type KeyStreamerAt interface {
	StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error)
}
It is to io.SectionReader what osio.KeyReaderAt is to io.ReaderAt.
The key idea is that, today, when a range of blocks is fetched, all of its block mutexes stay locked until the whole range has been read, whereas we could release them progressively to reduce contention with other reads.
Example: .BCDEF for the first read, AB for the second read. The second read has to wait for the first read's whole block range request to finish before it can be served. With the new implementation, the Adapter can return sooner for the second read.
In my current implementation of the adapter, it looks something like this:
if nToFetch == len(blocks) && a.canStream {
	// Every block of the range has to be fetched: use a single streamed request.
	r, err := a.srcStreamAt(key, rng.start*a.blockSize, (rng.end-rng.start+1)*a.blockSize)
	if err != nil {
		// The request failed: release every block lock before returning.
		for i := rng.start; i <= rng.end; i++ {
			blockID := a.blockKey(key, i)
			a.blmu.Unlock(blockID)
		}
		return nil, err
	}
	defer r.Close()
	for bid := int64(0); bid <= rng.end-rng.start; bid++ {
		blockID := a.blockKey(key, bid+rng.start)
		buf := make([]byte, a.blockSize)
		n, err := io.ReadFull(r, buf)
		if err == io.ErrUnexpectedEOF {
			// A short final block is treated as a plain end of stream.
			err = io.EOF
		}
		if err == nil || err == io.EOF {
			blocks[bid] = buf[:n]
			a.cache.Add(key, uint(rng.start+bid), blocks[bid])
		}
		if err != nil {
			// Release the current block and all remaining ones.
			for i := rng.start + bid; i <= rng.end; i++ {
				a.blmu.Unlock(a.blockKey(key, i))
			}
			if err == io.EOF {
				break
			}
			return nil, err
		}
		// Release this block as soon as it is cached so waiting readers can proceed.
		a.blmu.Unlock(blockID)
	}
	return blocks, nil
}
It could be backward compatible by simply checking whether the handler implements the interface:
func NewAdapter(reader KeyReaderAt, opts ...AdapterOption) (*Adapter, error) {
	_, canStream := reader.(KeyStreamerAt)
	bc := &Adapter{
		...
		canStream: canStream,
	}
	...
}
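The optional-interface check can be tried in isolation with the following sketch. The two interfaces are local copies (the KeyReaderAt signature is an assumption), and plainReader, streamingReader and supportsStreaming are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"io"
)

// KeyReaderAt is a local stand-in for the existing interface; its exact
// signature is an assumption made for this sketch.
type KeyReaderAt interface {
	ReadAt(key string, p []byte, off int64) (int, int64, error)
}

// KeyStreamerAt is the optional interface proposed above.
type KeyStreamerAt interface {
	StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error)
}

// plainReader (hypothetical) only implements KeyReaderAt.
type plainReader struct{}

func (plainReader) ReadAt(key string, p []byte, off int64) (int, int64, error) {
	return 0, 0, io.EOF
}

// streamingReader (hypothetical) implements both interfaces by embedding
// plainReader and adding StreamAt.
type streamingReader struct{ plainReader }

func (streamingReader) StreamAt(key string, off int64, n int64) (io.ReadCloser, int64, error) {
	return nil, 0, io.EOF
}

// supportsStreaming reports whether the handler also implements
// KeyStreamerAt, using the same type assertion as in NewAdapter above.
func supportsStreaming(r KeyReaderAt) bool {
	_, ok := r.(KeyStreamerAt)
	return ok
}

func main() {
	fmt.Println(supportsStreaming(plainReader{}))     // false
	fmt.Println(supportsStreaming(streamingReader{})) // true
}
```

Existing handlers keep working unchanged because the assertion simply fails for them and the adapter stays on the ReadAt path.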
Thomas
Can you support MinIO?
Hello Thomas,
I noticed that in some cases GDAL falls back to VSICurl when using osio. I am not sure exactly where to start looking; I had no success searching the GDAL code base for such a pattern.
In summary, when osio returns a 403 (and only with the GeoJSON driver enabled, which I cannot explain), GDAL seems to fall back to vsicurl (or vsicurl_streaming) and downloads the entire GeoTIFF file.
Given the following:
func TestHead(t *testing.T) {
	ctx := context.Background()
	hdl, _ := osio.HTTPHandle(ctx)
	key := "https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx"
	// godal
	godal.RegisterRaster(godal.GTiff)
	//godal.RegisterVector(godal.GeoJSON)
	adp, _ := osio.NewAdapter(hdl)
	godal.RegisterVSIHandler("http://", adp)
	godal.RegisterVSIHandler("https://", adp)
	// open
	_, err := godal.Open(key)
	assert.Nil(t, err)
}
Here is what I see (after adding some logs in http.go):
request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]
request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]
Basically, S3 returns 403 for the HEAD request and the open fails, as expected.
However, when including the GeoJSON driver:
func TestHead(t *testing.T) {
	ctx := context.Background()
	hdl, _ := osio.HTTPHandle(ctx)
	key := "https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx"
	// godal
	godal.RegisterRaster(godal.GTiff)
	godal.RegisterVector(godal.GeoJSON)
	adp, _ := osio.NewAdapter(hdl)
	godal.RegisterVSIHandler("http://", adp)
	godal.RegisterVSIHandler("https://", adp)
	// open
	_, err := godal.Open(key)
	assert.Nil(t, err)
}
Here is what I am seeing:
request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]
request [off: 0, len: 131072] https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx
head request [err: <nil>, status: 403]
GDAL: HTTP: Fetch(https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx)
GDAL: HTTP: libcurl/7.58.0 GnuTLS/3.5.18 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
GDAL: HTTP: These HTTP headers were set: Accept: text/plain, application/json
* Couldn't find host sentinel-s1-l1c.s3.amazonaws.com in the .netrc file; using defaults
* Trying 52.219.171.111...
* TCP_NODELAY set
* Connected to sentinel-s1-l1c.s3.amazonaws.com (52.219.171.111) port 443 (#0)
* found 128 certificates in /etc/ssl/certs/ca-certificates.crt
* found 387 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
* server certificate verification OK
* server certificate status verification SKIPPED
* common name: *.s3.amazonaws.com (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: CN=*.s3.amazonaws.com
* start date: Wed, 15 Dec 2021 00:00:00 GMT
* expire date: Sat, 03 Dec 2022 23:59:59 GMT
* issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
* compression: NULL
* ALPN, server did not agree to a protocol
> GET https://sentinel-s1-l1c.s3.amazonaws.com/GRD/2021/1/16/IW/DV/S1A_IW_GRDH_1SDV_20210116T002753_20210116T002818_036156_043D3C_9576/measurement/iw-vv.tiff?X-Amz-Signature=xxx HTTP/1.1
Host: sentinel-s1-l1c.s3.amazonaws.com
Accept-Encoding: gzip
Accept: text/plain, application/json
< HTTP/1.1 200 OK
< x-amz-id-2: xxx
< x-amz-request-id: xxx
< Date: Thu, 10 Feb 2022 10:32:57 GMT
< x-amz-request-charged: requester
< Last-Modified: Sat, 06 Mar 2021 01:34:22 GMT
< ETag: "xxx"
< x-amz-storage-class: INTELLIGENT_TIERING
< Accept-Ranges: bytes
< Content-Type: image/tiff
< Server: AmazonS3
< Content-Length: 684991437
<
The issue here is that GDAL downloads the whole file (there is no range request, as you can see in the logs) even though osio returned a 403.
Happy to file an issue at osgeo/gdal if you think it is more appropriate and not due to osio behavior.
Thomas