hapi-server / servers
Catalogs of known HAPI servers
This repository will become very large because of the history of changes to the metadata. I think it would be better to store this information in a new repository named server-metadata.
Alternatively, we could keep the metadata out of a repository entirely and just push it to a directory visible from, say, hapi-server.org/server-metadata. The root directory would contain the latest metadata, with YYYY-MM-DD subdirectories for old metadata.
Hello,
I would like to use hapi-client in our project SciQLop/spwc to add more servers and perhaps deprecate our direct REST API usage.
For us, SPWC is an abstraction for accessing remote data with an efficient multi-layer caching system; it should also provide all the information needed to build a tree for each server, as shown here.
I'm not sure whether this is a client or server limitation, or whether I missed something, but I see no way to generate such a tree. First, the fact that we need to make multiple requests per server to list datasets, parameters, and parameter descriptions makes it quite inefficient. Second, information such as the spacecraft/observatory name appears to be missing.
It would be nice to have the same level of information as we get here for AMDA:
http://amda.irap.omp.eu/AMDA//data/WSRESULT/obsdatatree_impex_20201118_AmdaLocalDataBaseParameters.xml
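To make the request concrete, here is a minimal sketch of building a per-server tree from HAPI responses. The `/catalog` and `/info` endpoints come from the HAPI specification; the grouping key is hypothetical, since current HAPI metadata has no standard spacecraft/observatory field — which is exactly the gap this issue raises.

```python
# Sketch: build a dataset tree from HAPI /catalog and /info responses.
# The "group by first path component of the id" rule is an assumption;
# a real fix would need an observatory field in the info metadata.

import json
from urllib.request import urlopen

def fetch_json(url):
    """Fetch and decode one JSON document (one HTTP round trip per call)."""
    with urlopen(url) as resp:
        return json.load(resp)

def build_tree(catalog, infos):
    """Group datasets into {group: {dataset_id: [parameter names]}}.

    catalog -- the "catalog" list from /catalog (dicts with an "id" key)
    infos   -- dict mapping dataset id to its /info response
    """
    tree = {}
    for entry in catalog:
        dsid = entry["id"]
        info = infos.get(dsid, {})
        # No standard spacecraft key exists, so fall back to the first
        # path-like component of the dataset id (purely illustrative).
        group = dsid.split("/")[0] if "/" in dsid else "ungrouped"
        params = [p["name"] for p in info.get("parameters", [])]
        tree.setdefault(group, {})[dsid] = params
    return tree

# Usage with canned responses (no network needed):
catalog = [{"id": "ACE/MAG"}, {"id": "ACE/SWE"}, {"id": "dst"}]
infos = {
    "ACE/MAG": {"parameters": [{"name": "Time"}, {"name": "B_GSE"}]},
    "ACE/SWE": {"parameters": [{"name": "Time"}, {"name": "Np"}]},
    "dst": {"parameters": [{"name": "Time"}, {"name": "dst"}]},
}
tree = build_tree(catalog, infos)
```

Note that even this sketch needs one `/info` request per dataset, which illustrates the inefficiency complained about above.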
Last summer a student at APL wrote a script to test HAPI servers for aliveness. This
was meant to replace my ping test with one that digs a little deeper into each server while still completing within a couple of minutes, making it suitable to run hourly.
I've had a heck of a time dealing with servers that do not provide a sampleStartDate and sampleStopDate. There are really no rules about startDate and stopDate, which are required, and requesting the last day of the interval often returns no data. The script then enlarges the interval and tests again, repeating this several times before giving up and declaring the server broken. Because of this I've been unable to run the test reliably. I first resorted to fixed hashes (repeating the randomly picked dataset), but even that has proven unreliable.
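The fallback described above can be sketched as follows. The field names (`sampleStartDate`, `sampleStopDate`, `stopDate`) match the HAPI info schema; the widening factor and retry count are assumptions, not what the APL script actually uses.

```python
# Sketch: choose test intervals for an aliveness check. If the info
# response has sample dates, use them; otherwise start from the last
# day before stopDate and widen the window on each retry. The factor
# of 10 per retry and max_tries=4 are illustrative guesses.

from datetime import datetime, timedelta

def parse(t):
    # HAPI times are ISO 8601; truncate to day resolution for this sketch.
    return datetime.strptime(t[:10], "%Y-%m-%d")

def candidate_intervals(info, max_tries=4):
    """Yield (start, stop) datetime pairs to try, narrowest first."""
    if "sampleStartDate" in info and "sampleStopDate" in info:
        yield parse(info["sampleStartDate"]), parse(info["sampleStopDate"])
        return
    stop = parse(info["stopDate"])
    days = 1
    for _ in range(max_tries):
        yield stop - timedelta(days=days), stop
        days *= 10   # widen by an order of magnitude each retry

info = {"startDate": "1998-01-01T00:00:00Z", "stopDate": "2023-01-01T00:00:00Z"}
tries = list(candidate_intervals(info))
```

A caller would request data for each interval in turn and declare the server broken only after all of them come back empty.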
Bob and I agreed that the indexing should produce two files. The first lists datasets, units, and labels; the second holds only the reported coverage. This allows a bit more precision when glancing for changes: a dataset simply growing doesn't obscure a change caused by a new dataset being published. For example, here are the two files for LISIRD:
http_lasp.colorado.edu_lisird_hapi.json
http_lasp.colorado.edu_lisird_hapi_coverage.json
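The split might look roughly like this. The parameter field names mirror HAPI info responses, but the exact layout of the two files is an assumption; the real layout is whatever the linked LISIRD files contain.

```python
# Sketch: split collected /info responses into the two indexes described
# above -- one for names/units/labels, one for coverage -- so a growing
# stopDate only changes the coverage file.

def split_index(infos):
    """infos: {dataset_id: info dict} -> (catalog_index, coverage_index)"""
    catalog_index, coverage_index = {}, {}
    for dsid, info in infos.items():
        catalog_index[dsid] = [
            {"name": p.get("name"),
             "units": p.get("units"),
             "label": p.get("label")}
            for p in info.get("parameters", [])
        ]
        coverage_index[dsid] = {"startDate": info.get("startDate"),
                                "stopDate": info.get("stopDate")}
    return catalog_index, coverage_index

# Usage with an illustrative dataset:
infos = {"noaa_dst": {
    "startDate": "1957-01-01T00:00:00Z",
    "stopDate": "2023-01-01T00:00:00Z",
    "parameters": [{"name": "Time", "units": "UTC"},
                   {"name": "dst", "units": "nT", "label": "DST"}],
}}
catalog_index, coverage_index = split_index(infos)
```

Each dict would then be written out as its own JSON file, matching the two links above.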
This is implemented and will run for the first time automatically tomorrow (Sun Jan 15).
The cron job that scans the HAPI servers for their info responses and compiles the data into searchable JSON files (https://github.com/hapi-server/servers/blob/master/index/makeGiantCatalog.jy) causes problems for the CDAWeb server. I caused problems for them yesterday by running it twice while I was making changes, without any pauses between calls. Bernie suggested that any trivial pause is not going to fix the problem.
I've disabled this scan for now.
The script that builds the index for all servers should also compute a nominal cadence for datasets that don't have a cadence in their metadata.
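One plausible way to derive a nominal cadence, sketched below, is to take the median spacing of a short sample of timestamps and format it as an ISO 8601 duration (the format HAPI uses for `cadence`). The sampling strategy is an assumption, not what the indexing script does.

```python
# Sketch: estimate a nominal cadence from a small sample of timestamps
# (e.g. the first few records of a /data request). Median spacing is
# used so a single gap in the sample doesn't skew the estimate.

from datetime import datetime
from statistics import median

def nominal_cadence(times):
    """times: list of ISO 8601 strings -> ISO 8601 duration string."""
    ts = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in times]
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    seconds = median(gaps)
    return "PT{:g}S".format(seconds)

cadence = nominal_cadence([
    "2020-01-01T00:00:00Z",
    "2020-01-01T00:01:00Z",
    "2020-01-01T00:02:00Z",
    "2020-01-01T00:03:00Z",
])
```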
We should consider introducing all.json and dev.json to replace all.txt and dev.txt. It seems strange that we didn't use JSON, since it's self-documenting and supports schemas.
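A conversion could be as simple as the sketch below. The `{"servers": [{"url": ...}]}` shape is purely an assumption for illustration; the actual schema of all.json would be decided in this issue.

```python
# Sketch: turn the one-URL-per-line all.txt into a hypothetical
# all.json, skipping blank lines and comments. The output shape is an
# assumption, not an agreed schema.

import json

def txt_to_json(text):
    urls = [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    return json.dumps({"servers": [{"url": u} for u in urls]}, indent=2)

doc = txt_to_json(
    "https://cdaweb.gsfc.nasa.gov/hapi\n"
    "http://lasp.colorado.edu/lisird/hapi\n"
)
```

A JSON schema could then validate entries and allow extra per-server fields (a human-readable name, a contact address) that a flat text file can't carry.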
Bob and I were thinking it would be nice to have some way of aggregating all the labels for datasets on HAPI servers, which clients could use to locate data. I could write an Autoplot script to go through the known servers and create a JSON file containing the dataset identifiers and descriptions. This file would be posted here whenever changes are detected, so that we could see the history and evolution of the servers, and so there is a known place where clients can search for parameters. Bob's use case was DST, which might be "dst" on one server and "DST" on another, and it's not even clear who is hosting it.
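The client-side search over such an aggregate might look like the sketch below. It matches case-insensitively so "dst" and "DST" both hit, per Bob's use case; the server URLs and labels are illustrative.

```python
# Sketch: search an aggregated label index ({server: {dataset: [labels]}})
# case-insensitively, returning (server, dataset) pairs. The aggregate
# itself would be the JSON file the Autoplot script produces.

def search_labels(aggregate, query):
    q = query.lower()
    hits = []
    for server, datasets in aggregate.items():
        for dsid, labels in datasets.items():
            if any(q in lbl.lower() for lbl in labels):
                hits.append((server, dsid))
    return hits

# Illustrative aggregate: the same quantity labeled differently per server.
aggregate = {
    "http://serverA/hapi": {"indices/dst": ["DST index"]},
    "http://serverB/hapi": {"geomag": ["dst", "kp"]},
}
hits = search_labels(aggregate, "DST")
```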
The process that indexes the servers appears to be failing, and has been for maybe nine months. It looks like it fails on one of the servers and the entire process halts. I put in what I think is a fix, and also added printing of which server is being indexed, since the logging doesn't seem to indicate this.
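The shape of that fix is presumably a per-server try/except, sketched below so one broken server can no longer halt the whole run. `index_one` stands in for the real per-server work in makeGiantCatalog.jy.

```python
# Sketch: index each server independently, printing which server is
# being worked on (so a hang is attributable) and collecting failures
# instead of letting one exception abort the whole scan.

def index_all(servers, index_one):
    results, failures = {}, {}
    for url in servers:
        print("indexing", url)      # make the current server visible in logs
        try:
            results[url] = index_one(url)
        except Exception as exc:    # broad on purpose: keep the scan going
            failures[url] = str(exc)
    return results, failures

# Usage with a stand-in indexer (no network needed):
def fake_index(url):
    if "broken" in url:
        raise RuntimeError("timeout")
    return {"datasets": 3}

ok, bad = index_all(["http://good/hapi", "http://broken/hapi"], fake_index)
```

A failure report written alongside the index would also make a nine-month silent outage visible much sooner.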