
earthaccess's Introduction

earthaccess is a Python library to search for, download, or stream NASA Earth science data with just a few lines of code.

Visit our documentation to learn more!

Try it in your browser on Binder without installing anything!

Why earthaccess

During several workshops organized by NASA Openscapes, the need to provide easy-to-use tools to our users became evident. Open science is a collaborative effort; it involves people from different technical backgrounds, and the data analysis to solve the pressing problems we face cannot be limited by the complexity of the underlying systems. Therefore, providing easy access to NASA Earthdata regardless of the data storage location (hosted within or outside of the cloud) is the main motivation behind this Python library.

How to Get Started with earthaccess

How to install

To install earthaccess, open a terminal and run pip:

pip install earthaccess

Visit our quick start guide to learn how to install and see a simple example of using earthaccess.

Compatibility

Only Python 3.8+ is supported.

How to Contribute to earthaccess

If you want to contribute to earthaccess, check out the Contributing Guide.

earthaccess's People

Contributors

abarciauskas-bgse, andypbarrett, asteiker, battistowx, betolink, botanical, cgentemann, chuckwondo, danielfromearth, dependabot[bot], doug-newman-nasa, itcarroll, jessicas11, jhkennedy, jrbourbeau, jroebuck932, jules32, kvenkman, luzpaz, mfisher87, pre-commit-ci[bot], psarka, rsignell, rupesh2, sherwin-14, trey-stafford, vincentsarago

earthaccess's Issues

We need a cli (command line interface)

It would be handy to use the library in a stand-alone mode with a command line interface (CLI), so that we can cover more use cases. The idea is to support most, if not all, of the current features, including search and access.
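One way such a CLI could be laid out is sketched below with the standard-library argparse module. The subcommand names (search, download) and every flag are hypothetical, chosen only to mirror the search and access features mentioned above; none of this exists in earthaccess.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser with 'search' and 'download' subcommands."""
    parser = argparse.ArgumentParser(prog="earthaccess")
    sub = parser.add_subparsers(dest="command", required=True)

    # Options shared by both subcommands (names are illustrative only).
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--short-name", required=True)
    common.add_argument("--temporal", nargs=2, metavar=("START", "END"))
    common.add_argument("--count", type=int, default=-1)

    sub.add_parser("search", parents=[common], help="list matching granules")
    download = sub.add_parser(
        "download", parents=[common], help="fetch matching granules"
    )
    download.add_argument("--local-path", default="./data")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["search", "--short-name", "ATL06", "--temporal", "2020-02", "2020-03"]
)
print(args.command, args.short_name, args.temporal)
```

Each subcommand would then hand the parsed values to the corresponding library call; that wiring is left out here because it depends on the final API.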

Credentials for the cloud provider GES_DISC are not available

Hi!

I'm Jorge and I'm trying to download a few satellite retrievals. When I run the code, an error appears saying "Credentials for the cloud provider GES_DISC are not available". I'm registered in Earthdata Search and I activated the application NASA GESDISC DATA ARCHIVE; I don't know if I have to do something else to get those credentials.

This is the code I'm using:

from earthdata import Auth, DataGranules, Store

auth = Auth().login(strategy='interactive')

granules = DataGranules().concept_id("C1239966842-GES_DISC").temporal("2004-10-01", "2004-10-31").point(-3.64746,37.19949).get_all()

files = Store(auth).get(granules, local_path='./data')

and the code returns:

You're now authenticated with NASA Earthdata Login
Getting 36 granules, approx download size: 0.81 GB
Credentials for the cloud provider GES_DISC are not available

Thank you very much

Jorge

is there a detailed user manual?

such as:

  1. How do I set the contents of the .netrc file? EARTHDATA_USERNAME=[], EARTHDATA_PASSWORD=[]
  2. What are the optional parameters for variables? I want to search MOD021KM; how do I set the short_name or sensor?
  3. ...

Use Short_name to search/query

The demos use concept_id to identify a particular dataset. It would be handy to add short_name as an alternative identifier.

Add new S3 credentials endpoint for Giovanni zarr store

A new s3 credentials endpoint for the GES DISC's Giovanni Zarr store is now available:
https://api.giovanni.earthdata.nasa.gov/s3credentials

We should make sure this is discoverable through earthaccess. Initially, the associated data collection, GPM_3IMERGHH v6, will have a new RelatedURL that points to documentation on how to access the store. So we won't have a direct programmatic way of going from the collection's CMR record to the Zarr store S3 URI until CMR is extended to support this. But it is great progress in the right direction for end-to-end Zarr support.

search_data() function fails on unexpected keyword argument 'keyattr_enabled'

Attempting to use search_data() with the package readme example returns an unexpected keyword argument error. The granule search seems to work and the number of granules is returned, but the remainder of the function fails and the 'results' variable is not assigned. Any assistance here would be appreciated, thanks!

import earthaccess
auth = earthaccess.login()

some authentication lines hidden for privacy...

You're now authenticated with NASA Earthdata Login
Using token with expiration date: 06/17/2023
Using user provided credentials for EDL
results = earthaccess.search_data(
    short_name='ATL06',
    version="005",
    cloud_hosted=True,
    bounding_box=(-10, 20, 10, 50),
    temporal=("2020-02", "2020-03"),
    count=100
)

Granules found: 26

TypeError                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 results = earthaccess.search_data(
      2     short_name='ATL06',
      3     version="005",
      4     cloud_hosted=True,
      5     bounding_box=(-10, 20, 10, 50),
      6     temporal=("2020-02", "2020-03"),
      7     count=100
      8 )

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\api.py:107, in search_data(count, **kwargs)
    105 print(f"Granules found: {granules_found}")
    106 if count > 0:
--> 107     return query.get(count)
    108 return query.get_all()

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\search.py:541, in DataGranules.get(self, limit)
    539     else:
    540         cloud = False
--> 541     latest = list(
    542         DataGranule(granule, cloud_hosted=cloud)
    543         for granule in response.json()["items"]
    544     )
    545 else:
    546     latest = []

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\search.py:542, in <genexpr>(.0)
    539     else:
    540         cloud = False
    541     latest = list(
--> 542         DataGranule(granule, cloud_hosted=cloud)
    543         for granule in response.json()["items"]
    544     )
    545 else:
    546     latest = []

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\results.py:208, in DataGranule.__init__(self, collection, fields, cloud_hosted)
    202 def __init__(
    203     self,
    204     collection: Dict[str, Any],
    205     fields: Optional[List[str]] = None,
    206     cloud_hosted: bool = False,
    207 ):
--> 208     super().__init__(collection)
    209     self.cloud_hosted = cloud_hosted
    210     # TODO: maybe add area, start date and all that as an instance value

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\results.py:20, in CustomDict.__init__(self, collection, fields, cloud_hosted)
     14 def __init__(
     15     self,
     16     collection: Dict[str, Any],
     17     fields: Optional[List[str]] = None,
     18     cloud_hosted: bool = False,
     19 ):
---> 20     super().__init__(collection)
     21     self.cloud_hosted = cloud_hosted
     22     self.uuid = str(uuid.uuid4())

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\dicts\__init__.py:54, in benedict.__init__(self, *args, **kwargs)
     52     super().__init__(obj.dict(), **kwargs)
     53     return
---> 54 super().__init__(*args, **kwargs)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\dicts\keyattr\keyattr_dict.py:11, in KeyattrDict.__init__(self, *args, **kwargs)
      9 self._keyattr_enabled = kwargs.pop("keyattr_enabled", True)
     10 self._keyattr_dynamic = kwargs.pop("keyattr_dynamic", False)
---> 11 super().__init__(*args, **kwargs)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\dicts\keypath\keypath_dict.py:13, in KeypathDict.__init__(self, *args, **kwargs)
     11 super().__init__(*args, **kwargs)
     12 if check_keys:
---> 13     keypath_util.check_keys(self, self._keypath_separator)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\dicts\keypath\keypath_util.py:22, in check_keys(d, separator)
     17     if key and type_util.is_string(key) and separator in key:
     18         raise ValueError(
     19             f"Key should not contain keypath separator {separator!r}, found: {key!r}."
     20         )
---> 22 traverse(d, check_key)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\core\traverse.py:29, in traverse(d, callback)
     27 if not callable(callback):
     28     raise ValueError("callback argument must be a callable.")
---> 29 _traverse_collection(d, callback)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\core\traverse.py:6, in _traverse_collection(d, callback)
      4 def _traverse_collection(d, callback):
      5     if type_util.is_dict(d):
----> 6         _traverse_dict(d, callback)
      7     elif type_util.is_list_or_tuple(d):
      8         _traverse_list(d, callback)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\core\traverse.py:14, in _traverse_dict(d, callback)
     12 keys = list(d.keys())
     13 for key in keys:
---> 14     value = d.get(key, None)
     15     callback(d, key, value)
     16     _traverse_collection(value, callback)

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\dicts\__init__.py:158, in benedict.get(self, key, default)
    157 def get(self, key, default=None):
--> 158     return self._cast(super().get(key, default))

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\benedict\dicts\__init__.py:80, in benedict._cast(self, value)
     78 obj_type = type(self)
     79 if isinstance(value, dict) and not isinstance(value, obj_type):
---> 80     return obj_type(
     81         value,
     82         keyattr_enabled=self._keyattr_enabled,
     83         keyattr_dynamic=self._keyattr_dynamic,
     84         keypath_separator=self._keypath_separator,
     85         check_keys=False,
     86     )
     87 elif isinstance(value, list):
     88     for index, item in enumerate(value):

TypeError: DataGranule.__init__() got an unexpected keyword argument 'keyattr_enabled'

results


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 results

NameError: name 'results' is not defined
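The last frames of the traceback show benedict's _cast re-creating nested dicts by calling obj_type(value, keyattr_enabled=...), where obj_type is the subclass (here DataGranule), whose __init__ accepts only collection, fields, and cloud_hosted. The sketch below reproduces that pattern in plain Python so the failure mode is easier to see. All class names and the flag keyword are hypothetical stand-ins, and accepting **kwargs is only one possible remedy, not necessarily the fix the project adopted.

```python
class CastingDict(dict):
    """Stand-in for a dict base class that re-wraps nested dicts (hypothetical)."""

    def __init__(self, *args, **kwargs):
        # The base class consumes its own keyword before calling dict.__init__.
        self._flag = kwargs.pop("flag", True)
        super().__init__(*args, **kwargs)

    def get(self, key, default=None):
        value = super().get(key, default)
        if isinstance(value, dict) and not isinstance(value, type(self)):
            # Re-instantiate the *subclass*, forwarding a keyword argument.
            return type(self)(value, flag=self._flag)
        return value


class StrictGranule(CastingDict):
    """Subclass whose __init__ does not accept the forwarded keyword."""

    def __init__(self, collection, cloud_hosted=False):
        super().__init__(collection)
        self.cloud_hosted = cloud_hosted


class TolerantGranule(CastingDict):
    """Subclass that passes unknown keywords through to the base class."""

    def __init__(self, collection, cloud_hosted=False, **kwargs):
        super().__init__(collection, **kwargs)
        self.cloud_hosted = cloud_hosted


nested = {"meta": {"concept-id": "G1"}}

try:
    StrictGranule(nested).get("meta")
except TypeError as err:
    print("strict subclass failed:", err)

print("tolerant subclass:", TolerantGranule(nested).get("meta"))
```

Under this reading, the error appears only when a newer base-class release starts forwarding a keyword the subclass never declared, which would explain why pinning the dependency version also makes it go away.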

Cannot conda install on windows

Hi!

I can't install this package on Windows using conda (mamba). The culprit is the python-magic dependency, which is only available on osx & linux (https://anaconda.org/conda-forge/python-magic).

From the pyproject.toml it seems that python-magic is a dev dependency, so maybe it could be removed from the conda package dependencies?

Looking for: ["earthaccess[version='>=0.5.0']"]

conda-forge/win-64                                          Using cache
conda-forge/noarch                                          Using cache
Could not solve for environment specs
Encountered problems while solving:
  - nothing provides python-magic >=0.4 needed by earthaccess-0.5.0-pyhd8ed1ab_0

The environment can't be solved, aborting the operation

search_data() function does not recognize doi key

Attempting to use search_data() with a DOI string (here, using the function's own example of doi="10.5067/SLREF-CDRV2") returns an unknown key error. Any assistance here would be appreciated, thanks!

import earthaccess
auth = earthaccess.login()

some authentication lines hidden for privacy...

You're now authenticated with NASA Earthdata Login
Using token with expiration date: 06/17/2023
Using user provided credentials for EDL
datasets = earthaccess.search_data(
    doi="10.5067/SLREF-CDRV2",
    cloud_hosted=True,
    temporal=("2002-01-01", "2002-12-31")
)

ValueError                                Traceback (most recent call last)
Cell In[16], line 1
----> 1 datasets = earthaccess.search_data(
      2     doi="10.5067/SLREF-CDRV2",
      3     cloud_hosted=True,
      4     temporal=("2002-01-01", "2002-12-31")
      5 )

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\api.py:103, in search_data(count, **kwargs)
     65 def search_data(
     66     count: int = -1, **kwargs: Any
     67 ) -> List[earthaccess.results.DataGranule]:
     68     """Search dataset granules using NASA's CMR.
     69 
     70     [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)
   (...)
    101         ```
    102     """
--> 103     query = DataGranules().parameters(**kwargs)
    104     granules_found = query.hits()
    105     print(f"Granules found: {granules_found}")

File ~\Anaconda3\envs\earthaccess\Lib\site-packages\earthaccess\search.py:342, in DataGranules.parameters(self, **kwargs)
    339 for key, val in kwargs.items():
    340     # verify the key matches one of our methods
    341     if key not in methods:
--> 342         raise ValueError("Unknown key {}".format(key))
    344     # call the method
    345     if isinstance(val, tuple):

ValueError: Unknown key doi

Test top 50 datasets

Right now the integration tests use randomly sampled datasets from different DAACs. This results in intermittent failures due to EULAs, unavailability, etc. A better approach would probably be to test only the most popular datasets (+IS2 data). A list of the most popular datasets from NASA is attached.
collection_short_names.txt

Document why signed S3 URLs might be giving 400s when called from inside us-west-2

Sometimes when you make a request to a URL behind earthdata login, after a series of redirects, you get sent to a signed S3 URL. This should be transparent to the client, as the URL itself contains all the authentication needed for access.

However, sometimes, in some clients, you get a generic 403 Forbidden here without much explanation. It has something to do with other auth being sent alongside (see #187 for more vague info).

We should document what this is, and why you get the 403. This documentation would allow developing workarounds for various clients if needed.
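The issue does not pin down the cause. If the hypothesis is that an extra Authorization header travels along with the already-signed URL, a client-side workaround might look like the following sketch. The function names are hypothetical, and the approach is an assumption to be verified, not a documented fix; the only fact relied on is that presigned S3 URLs carry their signature in the query string (X-Amz-Signature for Signature Version 4, Signature for the older Version 2).

```python
from urllib.parse import parse_qs, urlsplit


def is_presigned_s3_url(url: str) -> bool:
    """Return True if the URL carries an AWS signature in its query string."""
    query = parse_qs(urlsplit(url).query)
    # SigV4 presigned URLs use X-Amz-Signature; SigV2 ones use Signature.
    return "X-Amz-Signature" in query or "Signature" in query


def headers_for_hop(url: str, headers: dict) -> dict:
    """Return a copy of headers, without Authorization if the hop is presigned."""
    if is_presigned_s3_url(url):
        return {k: v for k, v in headers.items() if k.lower() != "authorization"}
    return dict(headers)


# Example with made-up values: the bearer token is dropped for the signed hop.
signed = "https://bucket.s3.us-west-2.amazonaws.com/key?X-Amz-Signature=abc123"
print(headers_for_hop(signed, {"Authorization": "Bearer TOKEN", "Accept": "*/*"}))
```

A client that follows redirects manually could call headers_for_hop before each request; whether this removes the error would still need to be confirmed against the real redirect chain.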

Connection error to ipinfo.io

When I try using the Auth() object I get a connection error from ipinfo.io

import earthdata

ed_auth = earthdata.Auth()

Result:

ConnectionError: HTTPSConnectionPool(host='ipinfo.io', port=443): Max retries exceeded with url: /ip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f97ba644fa0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

My best guess is that JPL IT is blocking connections to ipinfo.io/ip from the JupyterLab instance I am trying to run from. I need an alternative way of providing credentials.

Document the exact HTTP flow when trying to access data behind earthdata login

We made an attempt at this in https://hackmd.io/T73AtFTnS4C_Ez9JfGNldA.

The goal is to try to describe exactly the HTTP requests and responses that happen whenever we try to access earthdata login protected data. This can and does differ based on the following criteria:

  1. Location of the client making the request (AWS us-west-2 or not seems to be the primary differentiator)
  2. Whether the data is on AWS or out of AWS
  3. Particular DAACs hosting the data and choices they have made
  4. The application responsible for doing the OAuth2 flow between the source of the data and earthdata - this could be TEA, Cumulus, or anything else really.

Doing this at the HTTP protocol layer really helps us understand where problems are and how we can solve them - not just in python, but in all other languages.

Running .search_data fails with TypeError

Hi, I've been using this API since it went by its old name. Now I'm moving to the new version and I get a TypeError. The example code is:

import earthaccess

auth = earthaccess.login(strategy="netrc")

results = earthaccess.search_data(
    short_name='MOD10A1F',
    bounding_box=(-72,-33,-69,-28),
    temporal=('2022-01-01','2023-01-01')
)

The error message I get is the following (including the print of the granules found):

Granules found: 1464
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 results = earthaccess.search_data(
      2     short_name='MOD10A1F',
      3     bounding_box=(-72,-33,-69,-28),
      4     temporal=('2022-01-01','2023-01-01')
      5 )

File [c:\ProgramData\Anaconda3\lib\site-packages\earthaccess\api.py:108](file:///C:/ProgramData/Anaconda3/lib/site-packages/earthaccess/api.py:108), in search_data(count, **kwargs)
    106 if count > 0:
    107     return query.get(count)
--> 108 return query.get_all()

File [c:\ProgramData\Anaconda3\lib\site-packages\cmr\queries.py:104](file:///C:/ProgramData/Anaconda3/lib/site-packages/cmr/queries.py:104), in Query.get_all(self)
     95 def get_all(self):
     96     """
     97     Returns all of the results for the query. This will call hits() first to determine how many
     98     results their are, and then calls get() with that number. This method could take quite
   (...)
    101     :returns: query results as a list
    102     """
--> 104     return self.get(self.hits())

File [c:\ProgramData\Anaconda3\lib\site-packages\earthaccess\search.py:541](file:///C:/ProgramData/Anaconda3/lib/site-packages/earthaccess/search.py:541), in DataGranules.get(self, limit)
...
     83     )
     84 elif isinstance(value, list):
     85     for index, item in enumerate(value):

TypeError: __init__() got an unexpected keyword argument 'keyattr_enabled'

I'm using Python 3.9.13 and earthaccess 0.5.0.

Error messages when user hasn't accepted dataset EULA can be confusing

We've identified two behaviors that make this user experience frustrating:

  • earthaccess.download() gives an unhelpful error when it encounters a 403 because the user has not accepted the EULA. @jessnicwelch showed a way to reproduce in this thread. When this error happens, we need to display a message like: Access to this data has been denied because it requires you to accept an End-User License Agreement (EULA). Follow this link to view and accept the EULA: https://{the rest of it}. The URL we need to display should be contained in the 403 response data. This is what this ticket is about.

  • earthaccess.download() will continue after an error. New ticket for this: #581

Original description:

If we get a 302 redirect for a particular dataset, see if we can detect if the reason is the lack of an EULA (NASA requires users to approve an EULA for some datasets) and explain what happened.

Currently, users can receive a confusing message:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: ...
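A minimal sketch of the friendlier message follows. It assumes, as the ticket suggests but does not confirm, that the EULA link appears somewhere in the 403 response body as a plain URL; the function name and the regular expression are hypothetical, and the real response format may call for different parsing.

```python
import re
from typing import Optional

# Matches the first http(s) URL, stopping at whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def explain_forbidden(status_code: int, body: str) -> Optional[str]:
    """Return a readable message for a 403 response, or None for other codes."""
    if status_code != 403:
        return None
    match = URL_PATTERN.search(body)
    if match is None:
        # No link found: fall back to a generic explanation.
        return "Access to this data has been denied (403 Forbidden)."
    return (
        "Access to this data has been denied because it requires you to accept "
        "an End-User License Agreement (EULA). "
        f"Follow this link to view and accept the EULA: {match.group(0)}"
    )


# Example with a made-up response body.
print(explain_forbidden(403, '{"resolution_url": "https://example.invalid/eula"}'))
```

The caller could raise or log this message in place of the bare HTTPError.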

Support pattern matching on arbitrary parameters for granule searches

Based on feedback from CryosphereVirtualLab/public-notebooks#13, it would be great to add pattern matching to any parameter. With this enabled, a user could search for patterns directly in any of the searchable fields. For example, the granule name pattern MODGRNLD.*.daily.* would return only the daily granules of the MODGRNLD dataset. There are other use cases where this would be handy, and currently only the keyword parameter supports it out of the box. To enable pattern matching, we need to add it explicitly to the options parameter: &options[readable_granule_name][pattern]=true, etc.
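To make the options parameter concrete, the sketch below builds such a query string with the standard library only and does not send any request. The helper name is hypothetical, and the granule search endpoint URL is an assumption based on the public CMR search API rather than something stated in this issue.

```python
from urllib.parse import urlencode

# Assumed CMR granule search endpoint (not taken from the issue text).
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def granule_pattern_url(short_name: str, name_pattern: str) -> str:
    """Build a granule search URL with pattern matching enabled on the name."""
    params = [
        ("short_name", short_name),
        ("readable_granule_name", name_pattern),
        # Pattern matching must be switched on per parameter via options[...].
        ("options[readable_granule_name][pattern]", "true"),
    ]
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


print(granule_pattern_url("MODGRNLD", "MODGRNLD.*.daily.*"))
```

Note that urlencode percent-encodes the brackets and asterisks, so the printed URL looks different from the literal form quoted above but decodes to the same parameters.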

Benedict throws "unexpected keyword argument 'keypath_separator'"

This was a clean install, using the tutorial as a base.

Google isn't giving me anything. I'm wondering if a dependency has moved somewhere?

search_period = datetime.timedelta(days=20)
start_date = datetime.datetime.now() - search_period - datetime.timedelta(days=1)
end_date = datetime.datetime.now() - datetime.timedelta(days=1)

# Define the bounding box using the lat/long coordinates
bbox = (long1, lat1, long2, lat2)

time = '{}T00:00:00Z/{}T23:59:59Z'.format(start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d')),
Query = earthaccess.collection_query()


Query.keyword('SAR').bounding_box(long1, lat1, long2, lat2).temporal(start_date.strftime('%Y-%m-%d'),end_date.strftime('%Y-%m-%d'))
print(f'Collections found: {Query.hits()}')
collections = Query.fields(['ShortName','Version'])

This throws the error:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[38], line 1
----> 1 collections.get(10)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/earthaccess/search.py:247, in DataCollections.get(self, limit)
    245     latest = response.json()["feed"]["entry"]
    246 elif self._format == "umm_json":
--> 247     latest = list(
    248         DataCollection(collection, self._fields)
    249         for collection in response.json()["items"]
    250     )
    251 else:
    252     latest = [response.text]

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/earthaccess/search.py:248, in <genexpr>(.0)
    245     latest = response.json()["feed"]["entry"]
    246 elif self._format == "umm_json":
    247     latest = list(
--> 248         DataCollection(collection, self._fields)
    249         for collection in response.json()["items"]
    250     )
    251 else:
    252     latest = [response.text]

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/earthaccess/results.py:20, in CustomDict.__init__(self, collection, fields, cloud_hosted)
     14 def __init__(
     15     self,
     16     collection: Dict[str, Any],
     17     fields: Optional[List[str]] = None,
     18     cloud_hosted: bool = False,
     19 ):
---> 20     super().__init__(collection)
     21     self.cloud_hosted = cloud_hosted
     22     self.uuid = str(uuid.uuid4())

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/dicts/__init__.py:50, in benedict.__init__(self, *args, **kwargs)
     48     super().__init__(obj.dict(), **kwargs)
     49     return
---> 50 super().__init__(*args, **kwargs)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/dicts/keypath/keypath_dict.py:14, in KeypathDict.__init__(self, *args, **kwargs)
     12 super().__init__(*args, **kwargs)
     13 if check_keys:
---> 14     keypath_util.check_keys(self, self._keypath_separator)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/dicts/keypath/keypath_util.py:22, in check_keys(d, separator)
     17     if key and type_util.is_string(key) and separator in key:
     18         raise ValueError(
     19             f"Key should not contain keypath separator '{separator}', found: '{key}'."
     20         )
---> 22 traverse(d, check_key)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/core/traverse.py:29, in traverse(d, callback)
     27 if not callable(callback):
     28     raise ValueError("callback argument must be a callable.")
---> 29 _traverse_collection(d, callback)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/core/traverse.py:6, in _traverse_collection(d, callback)
      4 def _traverse_collection(d, callback):
      5     if type_util.is_dict(d):
----> 6         _traverse_dict(d, callback)
      7     elif type_util.is_list_or_tuple(d):
      8         _traverse_list(d, callback)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/core/traverse.py:14, in _traverse_dict(d, callback)
     12 keys = list(d.keys())
     13 for key in keys:
---> 14     value = d.get(key, None)
     15     callback(d, key, value)
     16     _traverse_collection(value, callback)

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/dicts/__init__.py:140, in benedict.get(self, key, default)
    139 def get(self, key, default=None):
--> 140     return self._cast(super().get(key, default))

File ~/.pyenv/versions/3.9.16/lib/python3.9/site-packages/benedict/dicts/__init__.py:69, in benedict._cast(self, value)
     67 obj_type = type(self)
     68 if isinstance(value, dict) and not isinstance(value, obj_type):
---> 69     return obj_type(
     70         value, keypath_separator=self._keypath_separator, check_keys=False
     71     )
     72 return value

TypeError: __init__() got an unexpected keyword argument 'keypath_separator'

Here's the full dependency tree:

earthaccess==0.5.1
  - fsspec [required: >=2022.1, installed: 2023.4.0]
  - multimethod [required: >=1.8, installed: 1.9.1]
  - pqdm [required: >=0.1, installed: 0.2.0]
    - bounded-pool-executor [required: Any, installed: 0.0.3]
    - tqdm [required: Any, installed: 4.65.0]
    - typing-extensions [required: Any, installed: 4.5.0]
  - python-benedict [required: >=0.25,<0.28.3, installed: 0.28.2]
    - boto3 [required: >=1.24.89,<2.0.0, installed: 1.26.76]
      - botocore [required: >=1.29.76,<1.30.0, installed: 1.29.76]
        - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
        - python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - urllib3 [required: >=1.25.4,<1.27, installed: 1.26.15]
      - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
      - s3transfer [required: >=0.6.0,<0.7.0, installed: 0.6.0]
        - botocore [required: >=1.12.36,<2.0a.0, installed: 1.29.76]
          - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
          - python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
            - six [required: >=1.5, installed: 1.16.0]
          - urllib3 [required: >=1.25.4,<1.27, installed: 1.26.15]
    - ftfy [required: >=6.0.0,<7.0.0, installed: 6.1.1]
      - wcwidth [required: >=0.2.5, installed: 0.2.6]
    - mailchecker [required: >=4.1.0,<6.0.0, installed: 5.0.8]
    - openpyxl [required: >=3.0.0,<4.0.0, installed: 3.1.2]
      - et-xmlfile [required: Any, installed: 1.1.0]
    - phonenumbers [required: >=8.12.0,<9.0.0, installed: 8.13.8]
    - python-dateutil [required: >=2.8.0,<3.0.0, installed: 2.8.2]
      - six [required: >=1.5, installed: 1.16.0]
    - python-fsutil [required: >=0.9.2,<1.0.0, installed: 0.10.0]
    - python-slugify [required: >=6.0.1,<8.0.0, installed: 7.0.0]
      - text-unidecode [required: >=1.3, installed: 1.3]
    - pyyaml [required: >=6.0,<7.0, installed: 6.0]
    - requests [required: >=2.26.0,<3.0.0, installed: 2.28.2]
      - certifi [required: >=2017.4.17, installed: 2022.12.7]
      - charset-normalizer [required: >=2,<4, installed: 3.1.0]
      - idna [required: >=2.5,<4, installed: 3.4]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.15]
    - toml [required: >=0.10.2,<1.0.0, installed: 0.10.2]
    - xlrd [required: >=2.0.0,<3.0.0, installed: 2.0.1]
    - xmltodict [required: >=0.12.0,<1.0.0, installed: 0.13.0]
  - python-cmr [required: >=0.7, installed: 0.7.0]
    - requests [required: >=2.26.0,<3.0.0, installed: 2.28.2]
      - certifi [required: >=2017.4.17, installed: 2022.12.7]
      - charset-normalizer [required: >=2,<4, installed: 3.1.0]
      - idna [required: >=2.5,<4, installed: 3.4]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.15]
  - requests [required: >=2.26,<3.0.0, installed: 2.28.2]
    - certifi [required: >=2017.4.17, installed: 2022.12.7]
    - charset-normalizer [required: >=2,<4, installed: 3.1.0]
    - idna [required: >=2.5,<4, installed: 3.4]
    - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.15]
  - s3fs [required: >=2021.11,<2024, installed: 2023.4.0]
    - aiobotocore [required: ~=2.5.0, installed: 2.5.0]
      - aiohttp [required: >=3.3.1, installed: 3.8.4]
        - aiosignal [required: >=1.1.2, installed: 1.3.1]
          - frozenlist [required: >=1.1.0, installed: 1.3.3]
        - async-timeout [required: >=4.0.0a3,<5.0, installed: 4.0.2]
        - attrs [required: >=17.3.0, installed: 22.2.0]
        - charset-normalizer [required: >=2.0,<4.0, installed: 3.1.0]
        - frozenlist [required: >=1.1.1, installed: 1.3.3]
        - multidict [required: >=4.5,<7.0, installed: 6.0.4]
        - yarl [required: >=1.0,<2.0, installed: 1.8.2]
          - idna [required: >=2.0, installed: 3.4]
          - multidict [required: >=4.0, installed: 6.0.4]
      - aioitertools [required: >=0.5.1, installed: 0.11.0]
        - typing-extensions [required: >=4.0, installed: 4.5.0]
      - botocore [required: >=1.29.76,<1.29.77, installed: 1.29.76]
        - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
        - python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - urllib3 [required: >=1.25.4,<1.27, installed: 1.26.15]
      - wrapt [required: >=1.10.10, installed: 1.14.1]
    - aiohttp [required: !=4.0.0a1,!=4.0.0a0, installed: 3.8.4]
      - aiosignal [required: >=1.1.2, installed: 1.3.1]
        - frozenlist [required: >=1.1.0, installed: 1.3.3]
      - async-timeout [required: >=4.0.0a3,<5.0, installed: 4.0.2]
      - attrs [required: >=17.3.0, installed: 22.2.0]
      - charset-normalizer [required: >=2.0,<4.0, installed: 3.1.0]
      - frozenlist [required: >=1.1.1, installed: 1.3.3]
      - multidict [required: >=4.5,<7.0, installed: 6.0.4]
      - yarl [required: >=1.0,<2.0, installed: 1.8.2]
        - idna [required: >=2.0, installed: 3.4]
        - multidict [required: >=4.0, installed: 6.0.4]
    - fsspec [required: ==2023.4.0, installed: 2023.4.0]
  - tinynetrc [required: >=1.3.1,<2.0.0, installed: 1.3.1]
 

Add methods to return DOI and/or data citation

To make good data citation practices easier to follow, it would be great to have a method or methods that return the DOI and data citation for a dataset. This would allow citations to be printed in notebooks or exported to bibliographies.

>results.get_doi()
10.7265/jj4s-tq79
>results.get_citation(format="APA")
Walsh, J. E., W. L. Chapman, F. Fetterer, and J. S. Stewart. (2019). Gridded Monthly Sea Ice Extent and Concentration, 1850 Onward, Version 2 [Data Set]. Boulder, Colorado USA. National Snow and Ice Data Center. https://doi.org/10.7265/jj4s-tq79. Date Accessed 02-20-2023.
>results.get_citation(format="bibtex")
@misc{Walsh, J. E., W. L. Chapman, F. Fetterer, and J. S. Stewart._2019, title={Gridded Monthly Sea Ice Extent and Concentration, 1850 Onward, Version 2}, url={https://nsidc.org/data/G10010/versions/2}, DOI={10.7265/jj4s-tq79}, publisher={National Snow and Ice Data Center}, author={Walsh, J. E., W. L. Chapman, F. Fetterer, and J. S. Stewart.}, year={2019} }

auth.get_s3_credentials() requires a .netrc

I was experimenting with only using a Bearer token to authorize all requests (#188) but it seems s3 credentials endpoints like https://data.nsidc.earthdatacloud.nasa.gov/s3credentials do not recognize this so you get re-directed to URS servers that expect basic user:password authentication:

(on a machine without a ~/.netrc)

import earthaccess #0.5.0
auth = earthaccess.login('interactive')
auth.get_s3_credentials('NSIDC')
#You're now authenticated with NASA Earthdata Login
#Using token with expiration date: 04/22/2023
#Using user provided credentials for EDL
#https://data.nsidc.earthdatacloud.nasa.gov/s3credentials
#Authentication with Earthdata Login failed with:
#HTTP Basic: Access denied.

Implement simpler API signatures

The concept of collections vs. granules, and of instantiating those classes, can be confusing, especially for new users. A static method could make this simpler to use, e.g. (using the upcoming name):

import earthaccess
import xarray as xr

auth = earthaccess.login(strategy="netrc")

granules = earthaccess.get_granules(short_name="ATL06",
                                    cloud_hosted=True,
                                    polygon=((1,2),(3,4)... ),
                                    temporal=("2020-01-01","2020-12-31"))

ds = xr.open_mfdataset(earthaccess.open(granules, auth=auth))

This would be better suited for regional use cases since we'll be downloading the metadata from CMR in one go.
Workflows that may require bulk downloads could potentially use an iterator like

import earthaccess

auth = earthaccess.login(strategy="netrc")
query = earthaccess.search(short_name="ATL06",
                           cloud_hosted=True,
                           page_size=2000,
                           polygon=((1,2),(3,4)... ),
                           temporal=("2020-01-01","2020-12-31"))

for granules in query.items():
    # here granules is a resultset of up to 2000 granules.
    earthaccess.download(granules, "./data/ATL06/", auth=auth)

auth.login('netrc') not working

I'm running python 3.7.8 on Ubuntu 16.04 LTS.

auth.login(strategy='netrc')

returns False.

Some testing reveals that os.path.exists("~/.netrc") returns False.

The solution is to use os.path.exists(os.path.expanduser("~/.netrc"))

Alternatively, using netrc would do all of this.
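A minimal sketch of both suggestions, assuming "netrc" above refers to Python's standard-library `netrc` module. The function name `find_netrc_login` and the default host `urs.earthdata.nasa.gov` are assumptions made for illustration; this is not the library's actual implementation.

```python
import netrc
import os
from typing import Optional, Tuple


def find_netrc_login(host: str = "urs.earthdata.nasa.gov",
                     path: str = "~/.netrc") -> Optional[Tuple[str, str]]:
    """Return (login, password) for host, or None if nothing is configured."""
    # os.path.exists does not expand "~", so expand it explicitly first.
    expanded = os.path.expanduser(path)
    if not os.path.exists(expanded):
        return None
    # authenticators() returns (login, account, password) or None.
    entry = netrc.netrc(expanded).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

Returning None in both failure cases lets the caller fall back to another login strategy instead of failing silently with False.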

Use fsspec for all file operations.

There are 2 ways of accessing NASA datasets (not using back-end services) :

  • On-prem datasets: if a dataset is hosted by a DAAC, the only way to get the data is using auth HTTPS requests with EDL credentials.
  • Cloud datasets:
    • direct access: we are in us-west-2 and can use S3:// links (s3fs or boto3) with temporary DAAC credentials
    • HTTPS access: we are not in us-west-2, we need to use signed HTTPS requests (same as on prem)

Since both are covered by fsspec, we only need to do a little research to see whether the fsspec HTTPS adapter will work as is, given the proper EDL credentials, or whether we need to inject our SessionWithHeaderRedirection class.
For S3 we need to find out if we can do parallel gets (in case the user wants to actually download the data).

Finally, change the name of the Accessor class to Store.

Use generators/iterator for granule query

If we are querying hundreds of thousands of granules, we need to handle them in a more efficient way, perhaps by implementing an iterator like

links = []
for g in Datagranules().concept_id("c-some-datasets").items():
    links.append(g.data_links())

The idea is that we'll only load the current page into memory instead of every granule in a collection query.
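The paging idea can be sketched with a plain generator. Everything here is hypothetical: `iter_pages` is not an existing method, and `fetch_page` stands in for whatever function would request one page of results from CMR. The sketch assumes pages are numbered from 1 and that a short or empty page marks the end of the results.

```python
from typing import Callable, Dict, Iterator, List

Page = List[Dict]


def iter_pages(fetch_page: Callable[[int, int], Page],
               page_size: int = 2000) -> Iterator[Page]:
    """Yield one page of results at a time so only that page is in memory."""
    page_num = 1
    while True:
        page = fetch_page(page_num, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        page_num += 1


def fake_fetch(page_num: int, page_size: int) -> Page:
    """Stand-in for a CMR request: serves slices of a five-item catalog."""
    catalog = [{"id": i} for i in range(5)]
    start = (page_num - 1) * page_size
    return catalog[start:start + page_size]


for page in iter_pages(fake_fetch, page_size=2):
    print(len(page))
```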

Add method to search for data using doi, spatial domain, temporal range contained in data citation object

Problem: for reproducible workflows, users want to search for and download datasets, or subsets of datasets, used for analysis in a published paper or presentation. In most cases, I think this data can be referenced by some dataset identifier (e.g. a DOI), a spatial domain (bounding box, polygon, collection of points), and a temporal range or set of dates. This kind of reference could be contained in some dict-type or JSON-like object, e.g.

{
"doi": doi_for_dataset,
"bounding_box": [North, South, East, West],  # or polygon/shapefile or other standard vector form
"temporal_range": (datetime_start, datetime_end),
}

Simple and short objects can be cited directly in papers, or stored as files with their own doi. earthdata could take these objects as an input.

Maybe there is some standard format that can be used. Pangeo-forge has something like this as a recipe. Maybe this is what we should be using. But for users who do not want to use cloud resources and just want to use earthdata, this could provide a reproducible way of getting data.
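A rough sketch of how such an object could be turned into search arguments follows. The function `citation_to_query` is hypothetical, and the output keyword names (`doi`, `bounding_box`, `temporal`) and the (west, south, east, north) ordering are assumptions about what a search function would accept, so they should be checked against the actual API.

```python
from datetime import datetime
from typing import Any, Dict


def citation_to_query(citation: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a citation-style dict into keyword arguments for a search."""
    # The citation object above lists the box as [North, South, East, West].
    north, south, east, west = citation["bounding_box"]
    start, end = citation["temporal_range"]
    return {
        "doi": citation["doi"],
        "bounding_box": (west, south, east, north),
        "temporal": (start.isoformat(), end.isoformat()),
    }


query = citation_to_query({
    "doi": "10.7265/jj4s-tq79",
    "bounding_box": [80.0, 60.0, -10.0, -70.0],
    "temporal_range": (datetime(2020, 1, 1), datetime(2020, 12, 31)),
})
print(query)
```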

H:M:S for Date_To in the Temporal function defaults to 0:0:0

The H:M:S for the Date_To parameter in the Temporal function defaults to 0:0:0 when only the date is supplied (e.g. 2020-01-01 defaults to 2020-01-01T00:00:00). I think the H:M:S for the Date_To parameter should default to 23:59:59 unless specified by the user. This would match the behavior of NASA's Earthdata Search.
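The requested behavior could look roughly like this. `parse_date_to` is a hypothetical helper, not the library's parser, and the sketch assumes date-only input arrives as a `YYYY-MM-DD` string.

```python
from datetime import datetime, time


def parse_date_to(value: str) -> datetime:
    """Parse an end date, defaulting to 23:59:59 when no time is supplied."""
    if len(value) == 10:  # date only, e.g. "2020-01-01"
        day = datetime.strptime(value, "%Y-%m-%d")
        return datetime.combine(day.date(), time(23, 59, 59))
    # A time was supplied, so respect it as given.
    return datetime.fromisoformat(value)


print(parse_date_to("2020-01-01").isoformat())           # 2020-01-01T23:59:59
print(parse_date_to("2020-01-01T12:30:00").isoformat())  # 2020-01-01T12:30:00
```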

Connection error with NSIDC_ECS products

I'm having issues downloading data from the NSIDC_ECS data center. Any hints or fixes?

#!/usr/bin/env python
from earthdata import Auth, DataGranules, Store

auth = Auth().login(strategy="netrc") 
Query = DataGranules(auth).short_name('MOD10A2').bounding_box(-73.73,-55.01,-67.05,-17.63).downloadable(True)
granules = Query.get(5)
store = Store(auth)

store.get(granules, "/path/to/MOD10A2/")

Error message:

You're now authenticated with NASA Earthdata Login
Using token with expiration date: 08/16/2022
 Getting 5 granules, approx download size: 0.0 GB
SUBMITTING | : 100%|█████████████████████████████| 5/5 [00:00<00:00, 760.91it/s]
PROCESSING | :   0%|                                      | 0/5 [00:00<?, ?it/s]Error while downloading the file MOD10A2.A2000049.h11v10.061.2020041202751.hdf
Error while downloading the file MOD10A2.A2000049.h12v13.061.2020041203737.hdf
Error while downloading the file MOD10A2.A2000049.h14v14.061.2020041204245.hdf
Error while downloading the file MOD10A2.A2000049.h11v11.061.2020041202514.hdf
Error while downloading the file MOD10A2.A2000049.h12v12.061.2020041203240.hdf
Traceback (most recent call last):
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/earthdata/store.py", line 222, in _download_file
    r = session.head(url)
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/requests/sessions.py", line 564, in head
    return self.request('HEAD', url, **kwargs)
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/Users/aldotapia/opt/anaconda3/envs/ee/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

PROCESSING | :  20%|██████                        | 1/5 [00:00<00:02,  1.45it/s]
PROCESSING | : 100%|██████████████████████████████| 5/5 [00:00<00:00,  7.23it/s]
COLLECTING | : 100%|███████████████████████████| 5/5 [00:00<00:00, 97997.76it/s]

Document exactly why `curl` does not work by default when called from inside us-west-2 with data in AWS

If you make a request from inside us-west-2, the OAuth2 login process for earthdata login + redirect is a bit like this:

Original URL -> Earthdata Login OAuth2 -> Original URL (with a cookie for auth) -> Cloudfront -> Signed S3 URL.

The issue here is that some clients (notably curl, and probably aiohttp) do something that makes S3 reject the request with a 403 Forbidden and not much of an explanatory message. Information from Joseph H Kennedy and Luis (and others) is that the signed S3 URL is being rejected by AWS because it carries additional authentication information (in the form of a cookie or headers) that it does not understand.

So the issue is that the eventual signed S3 URL sometimes gets two bits of authentication:

  1. The signed S3 bits, part of the query string
  2. The earthdata cookie (maybe? Or is it a header? But cookies are also headers soooo)

And S3 rejects this. This does not happen with all HTTP clients, only with some. We should use curl as the reference client.

So we should figure out and document a sequence of curl commands, that when executed outside us-west-2, produce the data - and when executed inside us-west-2, do not produce the data. We can then go from here.
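If the hypothesis above is right, a client could avoid the 403 by dropping its own credentials before following the final redirect. The sketch below only illustrates that idea and has not been verified against Earthdata. It assumes the final URL is a SigV4 pre-signed URL (recognizable by an `X-Amz-Signature` query parameter) and that the offending information travels in the `Authorization` or `Cookie` header. Both function names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# Headers assumed to carry Earthdata Login credentials (an assumption).
CREDENTIAL_HEADERS = {"authorization", "cookie"}


def is_presigned_s3_url(url: str) -> bool:
    """True if the query string carries a SigV4 signature."""
    return "X-Amz-Signature" in parse_qs(urlsplit(url).query)


def headers_for_redirect(headers: Dict[str, str], redirect_url: str) -> Dict[str, str]:
    """Return the headers to send when following a redirect to redirect_url."""
    if not is_presigned_s3_url(redirect_url):
        return dict(headers)
    # The signature in the query string is the only authentication S3 expects.
    return {name: value for name, value in headers.items()
            if name.lower() not in CREDENTIAL_HEADERS}
```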

Add date formatter

CMR expects dates in ISO 8601 format. We need some way to accept dates like YYYY, YYYY-MM, or YYYY-MM-DD.
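A possible shape for such a formatter is sketched here. `to_iso_start` is a hypothetical name, and the sketch assumes partial dates should resolve to the start of the period and be treated as UTC (hence the trailing "Z").

```python
from datetime import datetime

# Accepted input formats, tried in order.
_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def to_iso_start(value: str) -> str:
    """Expand YYYY, YYYY-MM, or YYYY-MM-DD to a full ISO 8601 timestamp."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"Unsupported date format: {value!r}")


print(to_iso_start("2020"))        # 2020-01-01T00:00:00Z
print(to_iso_start("2020-03"))     # 2020-03-01T00:00:00Z
print(to_iso_start("2020-03-15"))  # 2020-03-15T00:00:00Z
```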

Validate UMM

Not even UMM guarantees that basic fields will be present in the metadata, so we need to make sure missing fields don't break the parsing methods. For example, C2021957657-LPCLOUD doesn't have the granule size where it should be.
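One defensive pattern is to walk the metadata with a helper that returns a default instead of raising. This is a sketch only: `safe_get` is a hypothetical helper, and the field path in the example is an illustrative guess at where a granule size might live, not a statement about the UMM schema.

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def safe_get(record: Any, path: Sequence[Key], default: Any = None) -> Any:
    """Follow path through nested dicts and lists, returning default on any gap."""
    current = record
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Illustrative path only; the real location should be checked against UMM.
SIZE_PATH = ["umm", "DataGranule", "ArchiveAndDistributionInformation", 0, "Size"]

complete = {"umm": {"DataGranule": {"ArchiveAndDistributionInformation": [{"Size": 12.5}]}}}
incomplete = {"umm": {}}

print(safe_get(complete, SIZE_PATH, default=0.0))
print(safe_get(incomplete, SIZE_PATH, default=0.0))
```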

Map DAACs to cloud providers when cloud_hosted is present.

Right now a user has to know the different cloud providers beforehand in order to search for cloud-hosted data. That would not be necessary if we had a mapping from each DAAC to its cloud provider.

Bonus points: print a list of DAACs and their info using the DataCollection class.
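A minimal sketch of such a mapping is shown below. The three entries are illustrative and should be verified against CMR before use; `cloud_providers_for` is a hypothetical helper name.

```python
from typing import Dict, List

# Illustrative entries only; verify provider IDs against CMR.
DAAC_CLOUD_PROVIDERS: Dict[str, List[str]] = {
    "NSIDC": ["NSIDC_CPRD"],
    "PODAAC": ["POCLOUD"],
    "LPDAAC": ["LPCLOUD"],
}


def cloud_providers_for(daac: str) -> List[str]:
    """Return the cloud provider IDs for a DAAC, or an empty list if unknown."""
    return list(DAAC_CLOUD_PROVIDERS.get(daac.upper(), []))


print(cloud_providers_for("nsidc"))
```

With a table like this, a query that sets cloud_hosted could fill in the provider automatically from the DAAC name.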

Consider not depending on benedict

The python-benedict dependency is pretty heavy in terms of sub-dependencies it pulls in:

> conda-tree depends python-benedict
python
requests
pyyaml
python-dateutil
ftfy
mailchecker
phonenumbers
python-fsutil
python-slugify
toml
xmltodict
openpyxl
xlrd

I don't think you are relying on most of these, so it would be nice to replace this by something more lightweight. Should you be open to this, I may find some time to make a PR, let me know :)

Relax pins for dependencies (especially calver packages)

Hi @betolink ! Great work here, I'm trying to install earthdata but having a bit of trouble with version pins:

Encountered problems while solving:
  - package earthdata-0.1.1a6-pyhd8ed1ab_0 requires s3fs >=2021.8.1,<2022.0.0, but none of the providers can be installed

This makes it hard to install earthdata into existing environments with other packages. Perhaps these could be changed to minimum constraints (s3fs >= "^2021.8.1" )?

Add method to check the status of NASA APIs

Currently there is no way to check the status of the APIs used by the library. It would be useful to have a status() function (maybe in the module init) to verify that CMR and EDL are working and available.
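A sketch of what a status() function might look like, using only the standard library. The two endpoint URLs are assumptions chosen for illustration and may not be the best health checks for these services. The `probe` parameter is a hypothetical hook that makes the function testable without network access.

```python
from typing import Callable, Dict
from urllib.error import URLError
from urllib.request import urlopen

# Assumed endpoints for illustration; confirm the right URLs before relying on them.
ENDPOINTS: Dict[str, str] = {
    "CMR": "https://cmr.earthdata.nasa.gov/search/health",
    "EDL": "https://urs.earthdata.nasa.gov/",
}


def _is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx response."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError, ValueError):
        return False


def status(probe: Callable[[str], bool] = _is_up) -> Dict[str, bool]:
    """Report which services responded, keyed by service name."""
    return {name: probe(url) for name, url in ENDPOINTS.items()}
```

Calling status() with no arguments performs the real requests, while passing a stub probe keeps tests offline.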

Improve unit testing and documentation

Right now we don't have a lot of unit tests around the CMR logic, especially for handling corner cases where the metadata is wrong or missing. We also need to add more documentation on how to use the query class and the results.

[proposed enhancement] AWS us-west-2 checking method

In some of our DAAC notebooks, we include the following Boto3 snippet to check whether the notebook is being executed inside us-west-2. It raises a ValueError (with emojis) if it is not, preventing the notebook from being fully executed:

import boto3
from IPython.display import Markdown, display

# Compare the region configured for the S3 client with us-west-2.
if boto3.client('s3').meta.region_name == 'us-west-2':
    display(Markdown('### us-west-2 Region Check: &#x2705;'))
else:
    display(Markdown('### us-west-2 Region Check: &#10060;'))
    raise ValueError('Your notebook is not running inside the AWS us-west-2 region, and will not be able to directly access NASA Earthdata S3 buckets')

It may be useful to include a method that the user can call to check if they are in us-west-2 for direct S3 access and will throw an error like this, possibly using an fsspec transitive dependency, or the existing authorization checks.
