icesat2py / icepyx Goto Github PK

View Code? Open in Web Editor NEW

205.0 17.0 101.0 247.77 MB

Python tools for obtaining and working with ICESat-2 data

Home Page: https://icepyx.readthedocs.io/en/latest/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

icesat-2 community-driven python3 hacktoberfest closember

icepyx's Introduction

icepyx

Python tools for obtaining and working with ICESat-2 data

Latest release (main branch):

Current development version (development branch):

Origin and Purpose

icepyx is both a software library and a community composed of ICESat-2 data users, developers, and the scientific community. We are working together to develop a shared library of resources - including existing resources, new code, tutorials, and use-cases/examples - that simplify the process of querying, obtaining, analyzing, and manipulating ICESat-2 and (via the QUEST module) relevant ancillary datasets to enable scientific discovery.

icepyx aims to provide a clearinghouse for code, functionality to improve interoperability, documentation, examples, and educational resources that tackle disciplinary research questions while minimizing the amount of repeated effort across groups utilizing similar datasets. icepyx also hopes to foster collaboration, open-science, and reproducible workflows by integrating and sharing resources.

Many of the underlying tools from which icepyx was developed began as Jupyter Notebooks developed for and during the cryosphere-themed ICESat-2 Hackweek at the University of Washington in June 2019 or as scripts written and used by the ICESat-2 Science Team members. Originally called icesat2py, the project combined and generalized these scripts into a unified framework, adding examples, documentation, and testing where necessary and making them accessible for everyone. Similar scripts were conceived at the 2021 Hackweek, which in turn led to the development of QUEST. icepyx is now a domain-agnostic, standalone software package and community (under the broader icesat2py GitHub organization) that continues to build functionality for obtaining and working with ICESat-2 data products locally and in the cloud. It also improves interoperability for ICESat-2 datasets with other open-source tools.

Installation

The simplest way to install icepyx is by using the conda package manager.

conda install icepyx

Alternatively, you can also install icepyx using pip.

pip install icepyx

More detailed instructions for installing icepyx can be found at https://icepyx.readthedocs.io/en/latest/getting_started/install.html

Examples (Jupyter Notebooks)

Listed below are example Jupyter notebooks for working with ICESat-2 (IS2).

IS2_data_access

IS2_data_access2_subsetting

IS2_data_variables

IS2_data_visualization

IS2_data_read-in

IS2_cloud_data_access

QUEST_Finding_Argo_IS2_data

Citing icepyx

This community and software is developed with the goal of supporting science applications. Thus, our contributors (including those who have developed the packages used within icepyx) and maintainers justify their efforts and demonstrate the impact of their work through citations. Please see CITATION.rst for additional citation information.

Contact

Working with ICESat-2 data and have ideas you want to share? Have a great suggestion or recommendation of something you'd like to see implemented and want to find out if others would like that tool too? Come join the conversation at: https://discourse.pangeo.io/. Search for "icesat-2" under the "science" topic to find us.

Contribute

We welcome and invite contributions to icepyx from anyone at any career stage and with any amount of coding experience! Check out our contribution guidelines to see how you can contribute.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Research notice

Please note that this repository is participating in a study into sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from June 2021.

Data collected will include number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.

For more information, please visit the informational page or download the participant information sheet.

icepyx's People

Contributors

Stargazers

Watchers

Forkers

dr-vibhora whigg norlandrhagen shashankbice fperez bigdatasciencegroup dshean liuzheng-arctic arrran stoormgeo scottyhq icetianli tsutterley dzhang-pmel jzmejia edoyle9 friedrichknuth joeymartin888 charliehewitt1 wallinb nicholas-kotlinski and03010 lindalchen jedbrown salvis2 lheagy facusapienza21 mayakbecker jhon-dong trevorskaggs dpyles97 rtilling tedmaksym mollymwieringa jessicas11 nabib romip kelseybisson aaarendt adrienwehrle carlos-alberto-silva sharadgupta27 benhills andypbarrett jamieizzard asteiker desireetreichler whyjz lsetiawan rgfell bludka beccakelly7 bsaadia wsauthoff zysal sarahwcooley shafeequeigsnrr yochannah jcbw fliphilipp sharanb02 gerardoku ajaysbsc jpswinski xinluo2018 betolink cryosphere-hydrology slhowardesr gjvanleeuwen saptisunil learn2phoenix nitin-ravinder noahcui zachghiaccio mlinds zhln-0608 zilhe karinazikan hdfeos udaykumar0711 rhospatial daisybby daisybby2 sudheesanamp2003 tian-dandan michaleaking andreazacchei sayedabdom huxainsen helianglen ihowat linxiongecu menakarevel edward1322 cloveryww wanwanliang licongren kristiankjeldsen dawenkaka

icepyx's Issues

outline desired functions

lets create a list of the types of tools/functions we'd all like to see developed, with inputs and outputs.

Tidier git history using squash merge

Awesome to see so many new contributors joining icepyx! I'm sure we'll all get some great science done, but if I could make one small request - which is to keep the git history nice looking going forward. This is what the graph looks like at the moment:

It's fine to do merge commits when it's just one or two people working on a repository, but bigger projects with tens if not hundreds of contributors use squash merge commits. This keeps the git history linear looking, and we don't have big diffs from every single commit (especially important since jupyter notebooks can run into several MBs).

update docs with Discourse info and meeting info

update and clarify our contact page and provide invitation and notice of regular meetings

add and populate release notes

example gallery

@norlandrhagen had the great idea to put our examples in a gallery, similar to scikit-learn.

Enhancement: Add netrc for authentication

Currently, authentication is handled via manual user input.

https://github.com/icesat2py/icepyx/blob/master/icepyx/core/Earthdata.py#L88 which uses https://docs.python.org/2/library/getpass.html

Suggest enhancement to use netrc library to read and pass credentials. This will enable programmatic authentication.

Simplify git workflow to just be on one 'main' branch

Could we just have one default 'main' branch, and delete the 'development' branch? I don't want everyone to spend time re-learning stuff, but since a major change is happening anyway at #130, now might be a good time to raise this issue.

Current state:

Contributor works on a PR at branch 'feature-X', which is reviewed and merged into 'development'
At release, a PR from 'development' to 'master/main' is created, with a very big diff to review (see e.g. #119).
Once merged, a git tag needs to be created, and that triggers an upload to PyPI.

Future state:

Contributor works on a PR at branch 'feature-X', which is reviewed and merged into 'master/main'
At release, we just create a tag on the 'master/main' branch (e.g. v0.3.0, v0.4.0, etc) and that gets uploaded to PyPI.

The intention here is to reduce the burden on maintainers (specifically @JessicaS11) having to review code twice, and make sure that everyone points to 'development' all the time.

To be honest, I'm probably lacking context on why the master/development workflow was setup like so. If someone is after a stable release, wouldn't it be more likely that they would just use pip install icepyx instead of installing from Github?

Problem selecting region that crosses the dateline (180)

When trying to select a subsetting region that crosses the dateline, i get an error : invalid bounding box longitudes.

the basic code:

from icepyx import icesat2data as ipd
import os
import shutil
from pprint import pprint
%matplotlib inline

short_name = 'ATL06'
spatial_extent = [162.0, -78.95, -156.5, -75.7]
date_range = ['2018-10-14','2019-01-30']

region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range)

the same code works fine for spatial_extent = [162.0, -78.95, 180 , -75.7] or spatial_extent = [-175.77, -79.2, -160.0, -75.88],

built-in visualization of bounding box breaks across datelin

When a bounding box that crosses the dateline is entered, the region_a.visualize_spatial_extent() function erroneously plots the box with the longitudes effectively switched. However, the information is correctly submitted to Earthdata for querying and ordering/downloading data.

helpful tips for contributors

As more people are starting submit code and examples, it's going to be increasingly important that things are independent and robust as far as links, file paths, etc. I think it would be helpful for newer contributors to have a set of helpful tips on some of these basic tasks that they can refer to. I'll work on getting a list together of some of the items I've encountered so far, but please feel free to post your ideas here. Let me know if you want to take the lead on setting up this document and adding it to the repo.

expand subsetting capabilities

Currently, it's possible to use NSIDC's subsetter to do temporal and spatial subsetting. ICESat-2 data has many other available subsetting options. If you're interested in expanding icepyx's subsetting capabilities, please let us know! Current next steps are:

making it easier for the user to see what subsetting options are available
letting the user know if they try to use a subsetting option that's not available for that dataset
sending other subsetting parameters to the subsetter in the correct format and adding relevant tests

Criteria and workflow for major changes and additions

Starting this thread for discussion on workflow and guidelines for major modifications to the package. Here are a few topics that came to mind when I started thinking on new feature implementations:

How to proceed when a feature addition requires significant modifications to stable classes/methods/submodules?
How to proceed when a new implementation changes/affects the overall strategy/approach used by the package?
How to proceed regarding suggestions of, for example, new data structures, file formats and strategies that have not been extensively tested or adopted?

In particular I want to bring attention to the distinction between what "seems" in principle an optimal way to operate with the data, and what actually "gets the job done" in practice (based on experience and use cases).

numpy scalars not accepted as validate inputs in /validate_inputs.py

During the hackweek, I observed that when I define the spatial extend using a bounding box, where each vertex is a numpy float, I receive the following error:

I can manually solve this issue if I convert spatial_extend in a list of floats. I think the issue is in line 66 of validate_inputs.py,

elif all(type(i) in [int, float] for i in spatial_extent):,

where we can see that in the conditional just Python integers and floats are allows. Maybe one way to solve this issue is to change this line by

all(isinstance(i, (np.int, np.float)) for i in spatial_extent) 

so now all types of float variables is admissible, including numpy floats. I also think line 37 has this same problem.

Happy to contunue the discussion!

Integrating with Valkyrie - ICESat + Operation IceBridge + ICESat-2

A single source of ice elevation data from 1993 to present?

Valkyrie was mentioned at one of the IcePyx team meetings last month, but I just stumbled on the work that @andypbarrett and others have been doing at https://github.com/nsidc/NSIDC-Data-Tutorials/tree/valkyrie.

For those who might not be so familiar with Valkyrie (I'm quite new to it too), here's some introductory links:

Introduce Valkyries before missions - nsidc/NSIDC-Data-Tutorials#4
https://nsidc.org/sites/nsidc.org/files/files/UWG%20Harmonized%20Products%20Final.pptx
https://github.com/nsidc/NSIDC-Data-Tutorials/blob/valkyrie/notebooks/Valkyrie/1_intro_valkyrie.ipynb

Question: Is there anyone in icepyx interested in this project? What are some common themes we can work on to benefit the cryosphere community?

Collate what's already been done and begin building documentation

Collect scripts from ICESat-2 Hackweek repos and ICESat-2 science team to build on

broken link

The link "ICESat-2_DAAC_DataAccess_Example" in README.rst (points to https://github.com/icesat2py/icepyx/blob/master/ICESat-2_DAAC_DataAccess_Example.ipynb) is broken.

spatial_subsetting_vis.ipynb

Hi there,
Im trying to get elevation data for a small polygon over all time available.

I've followed through spatial_subsetting_vis.ipynb and i'm wondering how you got to
In [14] temp_gdf.head()

ATL08_to_dict() gives me an empty list, not sure if that's bugs or my data. I just copied the dataset_dict from the tutorial.

Thanks very much! v useful module and notebooks

interacting with the file system

Currently, examples and code us os.path to interact with the file system. We'd like to transition to using pathlib instead. We're looking for someone interested in helping work with us on making this transition.

nodename nor servname provided, or not known?

Hello!

The following code works on the Hackweek Pangeo but not on my personal computer.

import os
import shutil
import h5py
import matplotlib.pyplot as plt
from icepyx import icesat2data as ipd

short_name = 'ATL06'
spatial_extent = [31.5, -70.56, 33.73, -69.29]
date_range = ['2020-03-30','2020-04-1']
region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range)

earthdata_uid = 'therealbradlipovsky'
email = '[email protected]'
region_a.earthdata_login(earthdata_uid, email)
region_a.download_granules('/home/jovyan/data')

When run on my personal computer, I get the following error. Am I doing something incorrectly?

Thanks!

Brad

---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
~/Google Drive/Science/icepyx/icepyx/core/Earthdata.py in login(self)
    106             self.uid,_,self.pswd = netrc.netrc(self.netrc).authenticators(url)
--> 107             session = self._start_session()
    108 

~/Google Drive/Science/icepyx/icepyx/core/Earthdata.py in _start_session(self)
     50         hostname = socket.gethostname()
---> 51         ip = socket.gethostbyname(hostname)
     52 

gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

gaierror                                  Traceback (most recent call last)
<ipython-input-13-0400ac4c047b> in <module>
     12 earthdata_uid = 'therealbradlipovsky'
     13 email = '[email protected]'
---> 14 region_a.earthdata_login(earthdata_uid, email)
     15 region_a.download_granules('/home/jovyan/data')

~/Google Drive/Science/icepyx/icepyx/core/icesat2data.py in earthdata_login(self, uid, email)
    572 
    573         capability_url = f'https://n5eil02u.ecs.nsidc.org/egi/capabilities/{self.dataset}.{self._version}.xml'
--> 574         self._session = Earthdata(uid,email,capability_url).login()
    575         self._email = email
    576 

~/Google Drive/Science/icepyx/icepyx/core/Earthdata.py in login(self)
    111             for i in range(5):
    112                 try:
--> 113                     session = self._start_session()
    114                     break
    115                 except KeyError:

~/Google Drive/Science/icepyx/icepyx/core/Earthdata.py in _start_session(self)
     49         token_api_url = 'https://cmr.earthdata.nasa.gov/legacy-services/rest/tokens'
     50         hostname = socket.gethostname()
---> 51         ip = socket.gethostbyname(hostname)
     52 
     53         data = {'token': {'username': self.uid, 'password': self.pswd,\

gaierror: [Errno 8] nodename nor servname provided, or not known

update and expand our automated testing functions

Functionality is rapidly being added to icepyx. We're looking for a team to make sure our test suite is fully capturing that functionality. No prior experience with automatic testing needed!

subsetting - removal of too many variable paths

When using order_vars.remove() and specifying a beam_list, keyword_list, and var_list, matching beams are removed from non-specified variables. See Ex 5 of the ICESat-2 Hackweek data-access subsetting example notebook. This issue only seems to apply to datasets characterized by beam, not those sorted into profiles.

polygon coordinate inputs in incorrect order

The NSIDC-API requires polygon vertices to be entered in a clockwise direction and contain only one pole. Currently, data objects can be created with polygons that don't meet these criteria, depending on how the polygon is input by the user. We need to add valid polygon checks and reordering capabilities, then add tests to make sure these new functions are performing correctly.

Allow installing with pip

Installing icepyx as a Python package will facilitate things for users. Currently this command to install from the master branch works, but dependencies are note listed so importing in a new environment would fail.
pip install git+https://github.com/icesat2py/icepyx.git

>>> import icepyx
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/scott/miniconda3/envs/icepyx/lib/python3.7/site-packages/icepyx/__init__.py", line 1, in <module>
    from .core import is2class
  File "/Users/scott/miniconda3/envs/icepyx/lib/python3.7/site-packages/icepyx/core/is2class.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

@JessicaS11 - I could follow up with a PR to fix this if you'd like

Additional steps needed for Jupyter notebook PYTHONPATH

Following the install guidance: "Provided the location of the repo is part of your $PYTHONPATH, you should simply be able to add import icepyx to your Python document" I ran into complications when trying to run ICESat-2_DAAC_DataAccess_Example.ipynb from my fork. I set the repo location via command line (Mac OSX):
open .bash_profile
export PYTHONPATH="${PYTHONPATH}:/path/to/repo"

And I successfully imported icepyx in python in a terminal window.

However once I loaded up Jupyter Lab and tried running the notebook, the install failed. Following this guidance I ended up creating an ipython config file:

$ ipython locate
`/Users/username/.ipython`
$ cd /Users/username/.ipython
$ vi profile_default/ipython_config.py
c.InteractiveShellApp.exec_lines = [
    'import sys; sys.path.append("/path/to/your/module")'
]

And this did the trick. I'm not sure if others have run into this problem or if this is the best approach but it may be worth updating the install instructions if this is a more common issue.

large files in git history

Hi @JessicaS11, can't help but notice that you added two Tandem X DEM files in aa6b5fb to the repository which are used for the 'ICESat-2_DEM_comparison_Colombia_working.ipynb' tutorial. However, they're rather big files (25.7MB and 8.65MB), and git isn't usually meant for storing data files like this, so I just wanted to make sure (before you make any new commits) that this wasn't accidental/intentional?

Originally posted by @weiji14 in #95 (comment)

Thanks for pointing this out, @weiji14. I think these files have been present for awhile, but they were moved in the commit you point out. In looking into it, there are a few hdf5 files that are much larger that were accidentally uploaded early on and obviously now show up in the git history too. From my experience, the only way to get rid of them is to rewrite the git history. While I think that would ultimately be a good move (maybe we could even clean up the early commit history in the process), I'm hesitant to do so because of the downstream effects if every user does not rebase and remove them from their history, risking reintroduction (I had to remove some large files from another repo, and this was a challenge with a much smaller group). Or if I force change the master and development branches would we be able to catch this at the PR level if someone submits a PR with a branch that contains the large files (though presumably this would require an extra reviewer step to check the git history of the submitted PR)? Do you know if there are any alternative options or other ways to proceed given the group nature of the project (and entry-level git usage)?

Contributor Guidelines

As our list of contributors is starting to grow, we wanted to take this opportunity to start soliciting feedback from the community on a set of contributor guidelines. I am currently putting together a draft document, which will soon be available for comments and edits in a pull request. In the interim, I wanted to provide a space for people to share their thoughts as I put the draft guidelines together.

An intake catalog for ICESat-2 ATLAS data

Just continuing the discussion from the recent Icepyx meeting where there was an interest on Intake catalogs. To make things more concrete on what an 'intake' catalog is, here's a link to an example for ATL06 data I've made, and a short snippet.

https://github.com/weiji14/deepicedrain/blob/17067ad3cb2c8165d2ea5af4dba3fd827da9e6ae/deepicedrain/atlas_catalog.yaml#L46-L105

metadata:
  description: 'An intake catalog for loading ICESat-2 ATLAS datasets'
  version: 1
sources:
  icesat2atl06:
    description: 'Reads in ICESat-2 ATL06 data into an xarray.Dataset, one date and one laser pair at a time'
    args:
      chunks:
        delta_time: 50000
      path_as_pattern: ATL06_{datetime:%Y%m%d%H%M%S}_{referencegroundtrack:4}{cyclenumber:2}{orbitalsegment:2}_{version:3}_{revision:2}.h5
      # urlpath: https://n5eil02u.ecs.nsidc.org/opendap/hyrax/ATLAS/ATL06.003/{{date.strftime("%Y.%m.%d")}}/ATL06_*_*{{orbitalsegment}}_003_01.h5
      urlpath: simplecache::https://n5eil01u.ecs.nsidc.org/ATLAS/ATL06.00{{version}}/{{date.strftime("%Y.%m.%d")}}/ATL06_*_{{referencegroundtrack}}*{{orbitalsegment}}_00{{version}}_01.h5
      xarray_kwargs:
        combine: by_coords
        engine: h5netcdf
        group: /{{laser}}/land_ice_segments
        mask_and_scale: true
        parallel: true
      storage_options:
        simplecache:
          cache_storage: ATL06.00{{version}}/{{date.strftime("%Y.%m.%d")}}
          same_names: True
    # https://intake.readthedocs.io/en/latest/catalog.html#parameter-definition
    parameters:
        date:
          description: Year, month, and day of data acquisition
          type: datetime
          default: 2020.03.06
          min: 2018.10.14
          max: 2020.04.04  # note missing 2018.12.09, and gap from 2019.06.27 to 2019.07.25 (inclusive)
        referencegroundtrack:
          description: ICESat-2 Reference Ground Track number
          type: str
          default: ""  # Default: "" (all), min: "0000", max: "1387"
        orbitalsegment:
          description: Orbital Segment
          type: str
          default: 11
          allowed: [10, 11, 12]
        laser:
          description: Name of laser
          type: str
          default: gt2l
          allowed: ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"]
        version:
          description: Version number
          type: int
          default: 3
          allowed: [1, 2, 3]
    driver: intake_xarray.netcdf.NetCDFSource

The code to read/plot data would look like this:

import intake

catalog = intake.open_catalog("atlas_catalog.yaml")
dataset = catalog.icesat2atl06.read()  # load data into xarray
catalog.icesat2atl06.hvplot.quickview()  # interactive hvplot

There's a couple of 'features':

Parsing the datetime, cycle number, reference ground track, orbital segment, etc directly from the filename (see also https://www.anaconda.com/blog/intake-parsing-data-from-filenames-and-paths)
Data is pulled from NSIDC directly if not available locally (requires netrc authentication) using local caching (see https://intake.readthedocs.io/en/latest/catalog.html#caching-source-files-locally)
Predefined intreactive plots with hvplot (see https://intake.readthedocs.io/en/latest/gui.html#plot)

Current issues to be ironed out:

Ideally we would just connect to OPeNDAP so users won't have to 'download' the data before they can view the metadata, but authentication is tricky (see pydap/pydap#188).
Can only read in one laser at a time, but this can be addressed by extending xarray or writing our own custom ATLxx reader (happy to submit a patch)
Alternatively, if we can download the ATL06 data and put everything into a flat HDF5 structure, this would work seamlessly

Note that intake is actively being developed at https://github.com/intake/intake, and there's a rich plugin ecosystem at https://intake.readthedocs.io/en/latest/plugin-directory.html to handle just about any filetype. I've been working with an 'out-of-the-box' solution so far, but it would be great to have an ICESat-2 specific intake plugin! Happy to answer and questions or listen to suggestions.

Provide guidance and improve usability of search and subset request over poles

From Hackweek tutorial feedback, it was suggested to add clarification on how to request data over the poles, either in helper notebooks or within icepyx itself, including guidance on the input coordinate order for icepyx/nsidc bounding_box. David Shean suggested adding support for something like the standard bounds object (or other bounding box object) in shapely/geopandas/rasterio? Or a dictionary object {‘minx’:-102, ‘miny’:-76…})

Automatic retrieval of multiple regions

the problem

The current implementation of icepyx is such that we need to access different regions independently, login into Earthdata, and then download individual h5 files for each request. Even when it is possible to use a for loop to make all these requests (see https://github.com/ICESAT-2HackWeek/data-access/blob/master/ICESat-2Hackweek_tutorial_locations.ipynb), this has the difficulties of (i) having to login into Earthdata for each individual request, (ii) having to download each request in individuals h5 files.

a sketch of a solution

We (@fperez, @lheagy, @espg, @tsnow03, @mrsiegfried, @alicecima, @jonathan-taylor) think that there is a way to partially bypass (i) if we store our credentials, but even so, we have to make multiple calls to NSIDC, which is time-consuming, and it does not solve (ii). Then, the bottleneck appears to be at the NSIDC API level (@asteiker May have some ideas here?), not just in the icepyx code.

A different workflow could be something like this:

import icepyx
icepyx.login(email, password)
request_list = []
for lat, lon, dat:
    request_list.append(
        icepyx.request(polygon(lat, lon, date))
    )
# this should do some smart parsing - figuring out 
# which files have common data 
data = icepyx.request(request_list)
# first loop: metadata query 
#   loop over and figure out which h5 files are needed
# then only request needed files

Here, we download the required h5 files with one single call to NSIDC and this is implemented efficiently such that different regions could be stored in the same h5 file. This will be an important contribution to the case where we want to take a look at ATL03 data in many different localized regions in a large area without having to retrieve the full dataset for the large area.

I am aware that there are many challenges in solving this problem, but it could be a great contribution to icepyx and I am happy to help on this front.

Surface login/service availability errors immediately to the user if they happen

During the 2020 hack week it seems there was a transient error with NSIDC access, but unfortunately icepyx didn't surface that info right at login time, when it probably received the error message.

Instead, the error info was stored and upon later running a regular line of code, such as

region_a.order_vars.avail(options=True)

the following came back:

After inspecting that stack frame interactively (with %debug), I saw this text was stored internally in the icepyx Data object:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Unavailable</title>
</head><body>
<h1>Service Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
</body></html>

My suspicion is that 503 code may have come in at some point but wasn't surfaced. Not sure?

I'm happy to try and help further track/debug this one...

interact with NSIDC API without paging

Paging is currently used to parse through granule search results, but it is not the recommended way to interact with the API. The Icesat2Data object and its methods should be modified to not use paging.

fix duplicate and ambiguous variable names

some attributes (e.g. source and _source) have different functionalities. These should be made consistent. Relatedly, files is an ambiguous variable name.

make links work on both RTD and GitHub

Automatic links can only work within RTD OR GitHub, but not both simultaneously. Hard links won't update automatically as new commits are added to the repo. A potential solution is suggested in this issue post.

installation issue

When attempting to install icepyx locally, I experience the following error,

! cd ..; [ -d icepyx ] || git clone https://www.github.com/icesat2py/icepyx.git
sys.path.append(os.path.join(os.getcwd(), '..'))
from icepyx import icesat2data as ipd

The download goes fine,

Cloning into 'icepyx'...
warning: redirecting to https://github.com/icesat2py/icepyx.git/
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 2214 (delta 3), reused 12 (delta 2), pack-reused 2199
Receiving objects: 100% (2214/2214), 11.02 MiB | 14.83 MiB/s, done.
Resolving deltas: 100% (1253/1253), done.

But then when I go to import,

ImportError Traceback (most recent call last)
in
1 get_ipython().system(' cd ..; [ -d icepyx ] || git clone https://www.github.com/icesat2py/icepyx.git')
2 sys.path.append(os.path.join(os.getcwd(), '..'))
----> 3 from icepyx import icesat2data as ipd

ImportError: cannot import name 'icesat2data'

downloaded file unzipping and sorting

Data files downloaded from NSIDC come in a series of directories and subdirectories with long names. Amy's tutorial included some code for managing this structure and simplifying it to put all of the data files in one place. This code has been copied into the icepyx class object, but is not yet implemented (currently commented out). It would be great if this functionality were available, either as a step in the download process (where it currently is) or as a standalone method for the data class (probably the more flexible option).

functions to pass data to other formats

One of icepyx's primary goals is to make it easier to work with ICESat-2 data. The present class object data structure aims to facilitate switching between multiple data structures to capitalize on other resources for data analysis. Thus, we need a set of functions for the class (e.g. to_dataframe, to_geodataframe, to_dict, to_xarray, to_netcdf, to_hdf5) that enable the user to easily put their data in these formats for further analysis. How this proceeds is part of ongoing conversations about development directions, so please contribute to the conversation here or on Discourse even if you don't think you'll actually work on this issue directly.

speed up Travis builds

How can we speed up our test builds, e.g. through parallelization and caching?

automatic documentation updates

We have [mostly] successfully pushed our documentation to be live on read the docs. Currently, any changes to the docs-testing branch of icepyx automatically result in a rebuilding of the docs; as we move forward we'll adjust this to track the master branch. However, we're still troubleshooting a few issues:

Whether or not the best route for getting the _icepyx directory files on the repo initially is simply by committing them after generating them with a local sphinx build
How to auto build/update the _icepyx directory files when changes are made so nobody has to manually update them.
I'm new to using these documentation tools, so any help is appreciated!

add upgrade instructions to docs

We have install instructions, but nothing for updating. We should add update info (or at least links) to the install instructions and include a link to the install/update instructions from the ChangeLog page.

have travis CI build badge correspond to each repo

Currently the build badge shown corresponds to the master branch for all branches (see the link in the readme). It would be ideal if the badge reflected the build status for each branch, which requires a dynamic, rather than static, link in the readme.

create class object from local data

Currently icepyx relies completely on user input to generate a class object, which is then used to query, order, and download data. We'd like to expand the acceptable inputs to the class to include locally available data files (either through a saved object from the previous steps or by parsing the available files as indicated by the user).

handling multiple files with different datasets/variables

Stemming from some of the discussion in #72, a key consideration is what will an icepyx.data object hold and what operations can be done on those files. A few key questions:

can a single data object "contain"/handle files for multiple datasets?
Can a single data object hold files that contain different sets of variables (even if they're all the same dataset)?
What information, if any, do we expect the object to be able to provide about those datasets, particularly if it was not created from an icepyx.query object (i.e. will it in any way iterate through the data files (names or metadata) and provide summary spatial, temporal, etc. information)?

All of these will need to be answered before we can fully implement any sort of data management and manipulation scheme, including more advanced filtering, input/output, and file reading.

Add quick plots from OpenAltimetry

During 2020 ICESat-2 Hackweek we explored the potential of reading unfiltered ICESat-2 directly from OpenAltimetry into memory, without downloading files onto local disk. This can be used for quickly reading and visualizing data prior to download from NSIDC. We think it would be useful to embed the OpenAltimetry API functions developed during the hackweek into Icepyx for built-in visualization and quick data query.

Link to example notebook:

https://github.com/ICESAT-2HackWeek/Assimilation/blob/master/contributors/icetianli/READ_ATL06.ipynb
-https://github.com/ICESAT-2HackWeek/Assimilation/blob/master/notebooks/Assimilation_presentation.ipynb

Link to preliminary code

https://github.com/ICESAT-2HackWeek/Assimilation/blob/master/simlib/icesatapi.py#L44

Feature request: Subset and download data by rgt

For some processing, it would be nice to be able to download all repeats of a single or a few rgts, without having to download anything else.

Handling multiple data order/download in the same directory

This might overwrite the orderIDs file and the id_download file and have conflicts. It kills the kernel directly in some cases. It will be useful to generate unique filenames for each order_granules call and give the users a key (something similar to hash maybe) to generate these filenames for restarting download.

standardize import statements

set up icepyx to use ipx abbreviation and alias imports for things like query.Query

structuring of docs (documentation) directory

@norlandrhagen and I are working on getting the documentation ready to put on readthedocs. We've come up with a few different ways to structure our documentation based on a number of other repos (including Pandas and altair). I'd like to propose the following directory structure and encourage thoughts and comments. A few of our conversation points that got us here are included as well.
|---doc
| |---examples (these are at a high level within docs to make it as easy as possible for new users to find them)
| | |---NSIDC.ipynb
| |---source
| | |---_icepyx
| | | |--- *.rst
| | |--- _static
| | |--- getting_started (formerly "usage")
| | | |--- install.rst
| | |--- user_guide
| | | |--- is2class.rst
| | |--- conf.py
| | |--- icepyx.rst
| | |--- index.rst
| | |--- development
| | | |--- cont_contact.rst (draft file in norlandrhagen/icepyx docs branch)
| |--- makefile
| |--- make.bat

front facing docs page

Turn the front-facing documentation page into a one-stop-shop of information about icepyx (both on GitHub and ReadTheDocs). @fspaolo suggested Jekyll's page as an example - everything is captured simply and easily without scrolling. We'd like to implement something analogous for icepyx - all the resources easily accessible in the first place people see!

create widget for more interactive variable subsetting

Variable subsetting is fully implemented, but the long list of available variables is cumbersome to interact with (especially for new ICESat-2 data users). JupyterHub enables the creation of widgets, providing users a more interactive interface. This issue marks our desire to create a variable selection widget within icepyx to make the process of variable selection more interactive.

local file operations for subsetting and file conversion

adding another layer of in-memory bytesIO objects for performing local operations on subsetted files from icepyx

subsetting to valid data points using provided or calculated quality flags
converting to different file formats not available from NSIDC (such as zarr)

basic addition to https://github.com/icesat2py/icepyx/blob/master/icepyx/core/granules.py#L390 will be like this (with the file operations coming after):

for zfile in z.filelist:
    # Remove the subfolder name from the filepath
    zfile.filename = os.path.basename(zfile.filename)
    fileID = io.BytesIO(z.read(zfile))
    fileID.seek(0)
    # open in-memory HDF5 file and perform operations
    with h5py.File(fileID,'r') as source: