
earthspy's People

Contributors

adrienwehrle, lgtm-com[bot]


earthspy's Issues

Generalize reprojection with Sentinelhub to_utm_bbox

The default reprojection has so far been EPSG:3413 (because I initially developed earthspy for monitoring in the Arctic). Sinergise provides a method that detects the UTM zone of a bounding box and converts its coordinates to the matching EPSG. Too useful not to use!
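For reference, the zone detection behind such a conversion can be sketched in plain Python. This is an illustration of the logic only, not the sentinelhub implementation, and it ignores the Norway/Svalbard zone exceptions:

```python
def utm_epsg_for_point(lon: float, lat: float) -> int:
    """Return the EPSG code of the UTM zone containing (lon, lat).

    UTM zones are 6 degrees wide, numbered 1-60 starting at 180 W;
    EPSG 326xx is the northern-hemisphere series, 327xx the southern.
    """
    zone = int((lon + 180) // 6) + 1
    zone = min(max(zone, 1), 60)  # clamp the lon == 180 edge case
    return (32600 if lat >= 0 else 32700) + zone

# e.g. Ilulissat, Greenland (51.1 W, 69.2 N) falls in zone 22 north
print(utm_epsg_for_point(-51.1, 69.2))  # -> 32622
```

A bounding box would then be reprojected into the zone of its center point.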

Add Landsat constellation

At the moment, downloading is focused on the Sentinel constellation. Landsat data is also ready to use with the sentinelhub Python package, so let's add it to earthspy!

bounding_box

For bounding_box, the documentation should indicate the expected corner order (e.g. whether the parameters are upper left then lower right).
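The README example further down uses the [lon_min, lat_min, lon_max, lat_max] order (lower-left then upper-right corner). A small validator sketch that documents and enforces that order (function name is illustrative, not earthspy's API):

```python
def validate_bounding_box(bbox: list) -> None:
    """Check a bounding box in [lon_min, lat_min, lon_max, lat_max] order.

    Raise early instead of letting a swapped corner silently produce an
    empty or nonsensical request.
    """
    if len(bbox) != 4:
        raise ValueError("bounding_box needs exactly 4 values")
    lon_min, lat_min, lon_max, lat_max = bbox
    if not (-180 <= lon_min < lon_max <= 180):
        raise ValueError("expected lon_min < lon_max in [-180, 180]")
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError("expected lat_min < lat_max in [-90, 90]")

validate_bounding_box([12.341686, 41.751644, 12.682514, 42.015427])  # OK
```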

Only target request files for raster merge

At the moment, all files acquired on the same date are merged in the store folder. Only files belonging to the current request should be merged, as the folder may contain files from previous requests (which shouldn't be modified).

[WinError 32] The process cannot access the file because it is being used by another process

Hello,

I get this error after the download is done:

Traceback (most recent call last):

  File "C:\Users\dba.cihad.arslanoglu\.conda\envs\tbk_v2\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "c:\users\dba.cihad.arslanoglu\desktop\tarimsal_beyan_kontrol_v3\download_images.py", line 48, in <module>
    response = job.send_sentinelhub_requests()

  File "C:\Users\dba.cihad.arslanoglu\.conda\envs\tbk_v2\lib\site-packages\earthspy\earthspy.py", line 861, in send_sentinelhub_requests
    self.merge_rasters()

  File "C:\Users\dba.cihad.arslanoglu\.conda\envs\tbk_v2\lib\site-packages\earthspy\earthspy.py", line 1089, in merge_rasters
    os.remove(file)

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\cihad\\tmp/2022-04-01_SENTINEL2_L2A_0.tif'

Thanks for your help.

Return mosaic as local variables

Split boxes are currently returned as a 4D array in a local variable, not as the entire mosaic at once. Fix this to return the entire mosaic after creation.
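The reassembly step amounts to stitching the split boxes back into one grid. A dependency-free sketch of the idea (real outputs are geolocated rasters and should be merged via their geotransforms, not raw pixel offsets):

```python
def assemble_mosaic(split_boxes: dict, grid_shape: tuple) -> list:
    """Stitch split boxes (2D nested lists keyed by (row, col)) into a
    single 2D mosaic, row band by row band."""
    rows = []
    for r in range(grid_shape[0]):
        row_blocks = [split_boxes[(r, c)] for c in range(grid_shape[1])]
        # concatenate each pixel row across the blocks in this band of the grid
        for pixel_rows in zip(*row_blocks):
            rows.append([v for block_row in pixel_rows for v in block_row])
    return rows
```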

Fill in docstrings

Initial docstrings were automatically generated in Sphinx format; the descriptions now need to be filled out.

Creating just one image with time intervals

When we set a time interval like ["2022-04-01", "2022-04-15"], it returns a lot of scene images. However, I just want one image that covers the whole bounding box.

As you can see, there are many images. Can we merge them into just one image? Maybe we can do that when merging the patches as well?
[screenshot: many downloaded scene images]

Here is the code I used:

import earthspy.earthspy as es
import cv2

script = """

  //VERSION=3 (auto-converted from 1)

//Basic initialization setup function
function setup() {
  return {
	//List of all bands, that will be used in the script, either for visualization or for choosing best pixel
    input: [{
      bands: [
         "B04",
         "B08",
         "SCL"
      ]
    }],
	//This can always be the same if one is doing RGB images
    output: { bands: 1 },
    mosaicking: "ORBIT"
  }
}

/*
In this function we limit the scenes, which are used for processing. 
These are based also on input variables, coming from Playground. 
E.g. if one sets date "2017-03-01" ("TO date") and cloud coverage filter 30%, 
all scenes older than 2017-03-01 with cloud coverage 30% will be checked against
further conditions in this function.
The more scenes there are, the longer it will take to process the data.
After 60 seconds of processing, there will be a timeout.
*/

function filterScenes (scenes, inputMetadata) {
    return scenes.filter(function (scene) {
//Here we limit data between "(TO date - 1 month) to (TO date)
	  return scene.date.getTime()>=(inputMetadata.to.getTime()-1*31*24*3600*1000) ;
    });
}

function calcNDVI(sample) {
  var denom = sample.B04+sample.B08;
  switch(sample.SCL){
      
      // Unclassified (dark grey)
    case 7: return [0];
    
    // Cloud medium probability (grey)
    case 8: return [0];
        
    // Cloud high probability (white)
    case 9: return [0];
      }
  return ((denom!=0) ? (sample.B08-sample.B04) / denom : 0.0);
}
function evaluatePixel(samples) {
  var max = 0;
  for (var i = 0; i < samples.length; i++) {
    var ndvi = calcNDVI(samples[i]);
    max = ndvi > max ? ndvi : max;
  }
  if (max > 0.53) return [1];
  else return [0];
}

"""

job = es.EarthSpy("auth.txt")

job.set_query_parameters(
    bounding_box=[37.836908, 36.527295, 43.626709, 38.718198],
    time_interval=["2022-04-01", "2022-04-15"],
    evaluation_script=script,
    data_collection="SENTINEL2_L2A",
    download_mode="SM",
    store_folder="images",
    save_name="test",
)
image = job.send_sentinelhub_requests()

Unbound local variable error on readme example run

I'm getting an error when trying out a slightly modified version of the readme example:

Code

import earthspy.earthspy as es

job = es.EarthSpy("auth.txt")

job.set_query_parameters(
    bounding_box=[17.0055,77.8204,17.1421, 77.866],
    time_interval=30,
    data_collection="SENTINEL2_L2A"
)

job.send_sentinelhub_requests()

Exception

> python run.py

Initial bounding box split into a (2, 2) grid
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/erik/Projects/SmallProjects/earthspy/earthspy/earthspy.py", line 645, in sentinelhub_request
    self.outputs[f"{date_string}_{split_box_id}"] = outputs
UnboundLocalError: local variable 'outputs' referenced before assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/erik/Projects/SmallProjects/earthspy/run.py", line 7, in <module>
    job.send_sentinelhub_requests()
  File "/home/erik/Projects/SmallProjects/earthspy/earthspy/earthspy.py", line 723, in send_sentinelhub_requests
    for shb_requests in p.map(
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
UnboundLocalError: local variable 'outputs' referenced before assignment

Proposed solution

I'm pretty sure this is because self.outputs is referenced here:

self.outputs[f"{date_string}_{split_box_id}"] = outputs

but it's not initialized in the def __init__(...).
A solution is probably as simple as initializing the variable in __init__. If I understand correctly, it's supposed to be a dictionary, right? If so, just add:

self.outputs = {}

in this function:

def __init__(self, CLIENT_credentials_file: str) -> None:

For bonus points with mypy, add type annotations!

self.outputs: dict[something, something_else] = {}

This will need another line at the top to make it work with python<3.10:

from __future__ import annotations

Download data at point of interest or at a given radius around point

At the moment, a bounding box must be given by the user to download data. Some users have expressed the wish to download data at a given point and its surroundings, which requires computing coordinates at a given distance around the point of interest.

This could be implemented in earthspy by allowing bounding_box to be a list of two values (the coordinates of a point) instead of four (the four corners of the bounding box), together with a new optional argument for the distance around the point. If the distance equals 0 (default), only the very pixel is returned.
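The point-to-box conversion could look like the following sketch (function name is hypothetical; it uses the flat-earth approximation of roughly 111.32 km per degree of latitude, which is fine for small distances away from the poles):

```python
import math

def point_to_bounding_box(lon: float, lat: float, distance_m: float = 0) -> list:
    """Build a [lon_min, lat_min, lon_max, lat_max] box around a point.

    distance_m == 0 returns a degenerate box at the point itself,
    i.e. "only the very pixel".
    """
    if distance_m == 0:
        return [lon, lat, lon, lat]
    dlat = distance_m / 111_320.0                            # meters per degree latitude
    dlon = distance_m / (111_320.0 * math.cos(math.radians(lat)))  # shrinks with latitude
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]
```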

Replace `osgeo_utils.gdal_merge` by `rasterio.merge`

The full installation of GDAL can be painful and is OS-dependent. Let's remove this direct dependency by using rasterio.merge instead of osgeo_utils.gdal_merge. Of course rasterio also uses GDAL, but only the C library and not the Python bindings, which is more efficient. Simple is better than complex!
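rasterio.merge.merge takes a list of open datasets and returns the mosaic array plus its transform; conceptually it pastes each tile into a common output grid. A dependency-free sketch of that paste step, using integer pixel offsets in place of geotransforms:

```python
def paste_merge(tiles: list, offsets: list, out_shape: tuple, nodata=0) -> list:
    """Paste tiles (2D nested lists) into one grid at (row, col) offsets.

    First tile wins on overlap, mirroring the default 'first' strategy of
    rasterio.merge. Sketch only: real rasters carry geotransforms and a CRS.
    """
    out = [[nodata] * out_shape[1] for _ in range(out_shape[0])]
    filled = [[False] * out_shape[1] for _ in range(out_shape[0])]
    for tile, (r0, c0) in zip(tiles, offsets):
        for r, row in enumerate(tile):
            for c, value in enumerate(row):
                if not filled[r0 + r][c0 + c]:
                    out[r0 + r][c0 + c] = value
                    filled[r0 + r][c0 + c] = True
    return out
```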

Replace custom multiprocessing pipeline with `SentinelHubDownloadClient` multithreading

A multiprocessing strategy is currently chosen depending on the number of split boxes and days to download (because it doesn't make sense to download dates in parallel if there are only e.g. two days, but each has e.g. 10 split boxes that are downloaded in sequence!).
However, I realized I'm reinventing the wheel here: a multithreading capability already exists in sentinelhub.SentinelHubDownloadClient! See below. Let's use it instead; it will decrease code complexity and increase efficiency!

# create a list of requests
list_of_requests = [get_true_color_request(slot) for slot in slots]
list_of_requests = [request.download_list[0] for request in list_of_requests]

# download data with multiple threads
data = SentinelHubDownloadClient(config=config).download(list_of_requests, max_threads=5)

`validators` package not in install_requires

I'm trying out the package as I write this! I installed it with pip as recommended in the README, and I did not have validators installed, which raised a ModuleNotFoundError.

I suggest one of two things:

  1. Add validators to install_requires in setup.cfg, alongside existing entries such as:
    gdal;python_version>='3.3.1'
  2. If the package is not necessary for normal use (e.g. it's only needed for testing), move the import out of the top level so it doesn't have to be installed.

collection="LANDSAT_MSS_L1"

DownloadFailedException: Failed to download from:
https://services.sentinel-hub.com/api/v1/catalog/search
with HTTPError:
404 Client Error: Not Found for url: https://services.sentinel-hub.com/api/v1/catalog/search
Server response: "{"code":404,"description":"Collection not found."}"

region = "Qaleralik"; region2 = region
lat_n, lon_w = 61.09215496, -46.87107682
lat_s, lon_e = 60.99065626, -46.57056082

collection = "LANDSAT_MSS_L1"

# as simple as it gets
job.set_query_parameters(
    bounding_box=[lon_w, lat_s, lon_e, lat_n],
    time_interval=["1995-08-08", "1995-08-08"],  # MSS 1
    evaluation_script=eval_script,
    data_collection=collection,
    store_folder=path,
    remove_splitboxes=True
)

# and off it goes!
job.send_sentinelhub_requests()

eval_script = """

//VERSION=3

function evaluatePixel(samples) {
  return [2.5 * samples.B04,
          2.5 * samples.B02,
          2.5 * samples.B01,
          samples.dataMask];
}

function setup() {
  return {
    input: [{
      bands: ["B01", "B02", "B04", "dataMask"]
    }],
    output: {
      bands: 4
    }
  }
}

"""

Use `save_data()` or `get_data()`, not both

At the moment, get_data() is called by default during the download, then save_data() if store_folder is defined. But save_data() runs the data download again in the background and doesn't reuse the data already processed by get_data(). So use either get_data() or save_data() (or, even better, get_data(save_data=True)), but not both.

Patches don't merge correctly with specific parameters

Patches are not merged when store_folder is 'image_tmp'. However, if I set store_folder to 'tmp', there is no problem.

time_interval = ["2022-08-1", "2022-08-15"]
bounding_box = [12.341686, 41.751644, 12.682514, 42.015427]
data_collection = "SENTINEL2_L2A"
download_mode = 'SM'
script_name = 'min_ndvi_limited.txt'
tmp_store_folder = 'image_tmp'

job = earthspy.earthspy.EarthSpy("config/accounts/account.txt")

job.set_query_parameters(
    bounding_box=bounding_box,
    time_interval=time_interval,
    evaluation_script=script,
    data_collection=data_collection,
    download_mode=download_mode,
    store_folder=tmp_store_folder
)

response = job.send_sentinelhub_requests()


As you can see, there is just one mosaic image, but there should be 3.
[screenshot: store folder contents]

Add ROI JSON files

Initiate a small database of Regions of Interest (ROIs) so users can set a footprint simply by calling the ROI's name. For instance, adding Ilulissat.json containing a bounding box (format [lon_min, lat_min, lon_max, lat_max]) would allow passing bounding_box="Ilulissat" as shown below (example from README.org):

import earthspy.earthspy as es

# auth.txt should contain username and password (first and second row)
job = es.EarthSpy("/path/to/auth.txt")

# as simple as it gets
job.set_query_parameters(
    bounding_box="Ilulissat",
    time_interval=["2019-08-03", "2019-08-10"],
    evaluation_script="https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/true_color/script.js",
    data_collection="SENTINEL2_L2A",
)

# and off it goes!
job.send_sentinelhub_requests()

Add test files to check post download operations

The file download itself could be tested, but GitHub-hosted runners don't have several CPUs for the moment, so only a very small part of earthspy's capabilities would be covered. Instead, store small output examples and test the post-download operations on them.

Use the Catalog API to look for available scenes

Currently, the data set associated with a given request is:

  • assigned to the outputs attribute (a list)
  • then each element of the list is checked to remove empty ones (for the case where no scene was available in the bounding box at the selected date)
  • and finally stored as GeoTIFFs

This is relatively inefficient. It would be better to check for available data before sending the request and point directly at the existing scenes instead of scanning the entire period. This can be done with the Catalog API.
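The Catalog API follows the STAC API conventions; a sketch of the search body sent to its endpoint (collection id and dates here are illustrative, and authentication/pagination are omitted):

```python
def catalog_search_payload(collection: str, bbox: list,
                           start: str, end: str, limit: int = 100) -> dict:
    """Build a STAC-style search body for
    POST https://services.sentinel-hub.com/api/v1/catalog/search."""
    return {
        "collections": [collection],
        "bbox": bbox,  # [lon_min, lat_min, lon_max, lat_max]
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        "limit": limit,
    }

payload = catalog_search_payload(
    "sentinel-2-l2a", [12.341686, 41.751644, 12.682514, 42.015427],
    "2022-04-01", "2022-04-15",
)
```

Recent versions of the sentinelhub Python package also wrap this endpoint in a dedicated catalog class, which would avoid hand-building the request.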

Combine request in one method

Implement es.request as the current es.set_query_parameters, but also including es.configure_connection and es.send_requests. It would only require one more argument (the authentication file). __init__ could then be turned into a __call__.
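A hypothetical shape for the combined entry point (all names and internals are illustrative stand-ins, not earthspy's current code):

```python
class EarthSpy:
    """Sketch: fold configure_connection + set_query_parameters +
    send_requests into a single __call__."""

    def __init__(self, auth_file: str) -> None:
        self.auth_file = auth_file  # stands in for configure_connection()

    def __call__(self, **query) -> dict:
        # one-shot request: store the query, then (pretend to) send it
        self.query = query
        return {"status": "sent", "parameters": sorted(query)}

job = EarthSpy("auth.txt")
result = job(bounding_box=[12.34, 41.75, 12.68, 42.02],
             data_collection="SENTINEL2_L2A")
```

With this shape, one object construction plus one call replaces the current three-step sequence.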

Add png visuals

Write a method to produce simple, lightweight PNG visuals for quick superficial checks, e.g. for NRT monitoring.

Automatically detect data collection if custom script URL is specified

If a custom script is passed as a plain string, there is no way to guess the data collection it should be applied to. However, the custom scripts in the Sentinel Hub database are stored per satellite, so when a script URL is passed, it's possible to detect the data collection without specifying it!
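Since custom-scripts URLs embed the satellite in their path (e.g. .../custom-scripts/sentinel-2/true_color/script.js, as in the README example), a simple path scan can map a URL to a collection. The mapping below is a hypothetical subset to illustrate the idea:

```python
# hypothetical URL-segment-to-collection mapping; extend as needed
SATELLITE_COLLECTIONS = {
    "sentinel-1": "SENTINEL1_IW",
    "sentinel-2": "SENTINEL2_L2A",
    "sentinel-3": "SENTINEL3_OLCI",
}

def detect_data_collection(script_url: str) -> str:
    """Guess the data collection from a custom-script URL path segment."""
    for segment, collection in SATELLITE_COLLECTIONS.items():
        if f"/{segment}/" in script_url:
            return collection
    raise ValueError(f"could not detect a data collection in {script_url!r}")

detect_data_collection(
    "https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/true_color/script.js"
)  # -> "SENTINEL2_L2A"
```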
