
sondehub-infra's People

Contributors

darksidelemm, lukeprior, theskorm, xssfox


sondehub-infra's Issues

Launch site API not in swagger

The API URL is: https://api.v2.sondehub.org/sites

Response type is gzipped JSON:

{
  "-1": {
    "descent_std": 2.5,
    "rs_types": [
      "41"
    ],
    "@version": "1",
    "burst_samples": 52,
    "station": "-1",
    "position": [
      144.947375,
      -37.689883
    ],
    "descent_rate": 5.7,
    "@timestamp": "2021-11-23T22:08:57.372Z",
    "station_name": "Melbourne BoM Training Annex (Training and Ozonesondes) (Australia)",
    "alt": 119,
    "burst_std": 5382,
    "descent_samples": 48,
    "burst_altitude": 28068
  },
  etc...
}
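
A minimal sketch of consuming this response in Python (field names follow the sample above; note the sample suggests position is [longitude, latitude], since Melbourne is at roughly lat -37.69, lon 144.95):

```python
# Sketch: flatten the /sites mapping into launch-site records.
# Assumes the JSON structure shown above; the HTTP fetch itself is omitted.

def parse_sites(sites: dict) -> list:
    """Return (station, name, lat, lon) tuples from the /sites response."""
    records = []
    for station, site in sites.items():
        lon, lat = site["position"]  # sample data suggests [lon, lat] order
        records.append((station, site["station_name"], lat, lon))
    return records
```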

Allow multiple recovery successful reports

There have been a few cases of mistaken (or, less likely, malicious) recovery reports being submitted for sondes before they have actually landed.
At the moment a sonde that has already been marked as recovered cannot be marked again, which means the real recovery report cannot be submitted.

I would suggest removing any limits on marking sondes as recovered/not-recovered - the client for the /recovered API should be responsible for picking out the latest entry.
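
The client-side "pick the latest entry" logic could be sketched like this (field names such as "datetime" are assumptions about the report format, not confirmed):

```python
# Sketch: given all recovery reports for one or more serials, keep only the
# most recent report per serial and let that be the authoritative state.

def latest_reports(reports: list) -> dict:
    """Map each serial to its most recent report, comparing ISO timestamps."""
    latest = {}
    for report in reports:
        serial = report["serial"]
        if serial not in latest or report["datetime"] > latest[serial]["datetime"]:
            latest[serial] = report
    return latest
```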

Work out how to score top_hits better

We should be able to use function_score to better calculate which payload to show. Ideally this is the latest payload, followed by one with xdata, then pressure, then humidity.
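
One possible shape for this, as a sketch only (the weights are arbitrary and the field names are assumed from the telemetry format), is a function_score query that boosts documents by which optional fields they carry:

```json
{
  "function_score": {
    "query": { "match_all": {} },
    "functions": [
      { "filter": { "exists": { "field": "xdata" } }, "weight": 4 },
      { "filter": { "exists": { "field": "pressure" } }, "weight": 2 },
      { "filter": { "exists": { "field": "humidity" } }, "weight": 1 }
    ],
    "score_mode": "sum",
    "boost_mode": "replace"
  }
}
```

The top_hits sort could then use _score first with datetime descending as a tiebreaker, so recency still wins between equally-complete payloads.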

Scaling of websockets

Our websocket service doesn't scale with demand. We should be able to split it into a writer server -> reader server(s) architecture.

Recovered endpoint

It gives an immediate indication to people looking at the map that a sonde was recovered, and that they shouldn't go after something that might have landed a few hours ago.

as a user
I want to know if a sonde has been recovered
so that
I stop chasing the radiosonde

  • Create API spec for recovered endpoints
    • GET /recovered
    • GET /recovered/{serial}
    • PUT /recovered/{serial}
  • Update swagger, and swagger markdown
  • Add ingestion endpoint
  • Add output into datanew endpoint
        [
          {
            "serial": "S1234567",
            "lat": -34.0,
            "lon": 138.0,
            "alt": 100.0,
            "recovered": true,
            "recovered_by": "VK5QI",
            "description": "In a gigantic tree. But I had a pole."
          },
          {
            "serial": "S1112234",
            "lat": -34.1,
            "lon": 138.1,
            "alt": 100.0,
            "recovered": false,
            "recovered_by": "VK5FAIL",
            "description": "In a gigantic tree. But I didn't have a pole."
          }
        ]

Only permit uploads if the serial has been seen and the recovered position is within 5 km of the last seen data point.
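
The 5 km proximity check could be sketched with a haversine distance; the helper and field names below are illustrative, not part of the API:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def recovery_position_ok(report, last_seen, limit_km=5.0):
    """Accept a recovery report only if it is within limit_km of the last telemetry."""
    distance = haversine_km(report["lat"], report["lon"],
                            last_seen["lat"], last_seen["lon"])
    return distance <= limit_km
```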

Record sonde predictions

Actively run predictions and store in ElasticSearch so that we can display them on historic radiosonde data.

/listener/stats changes

Can we get total unique callsign and total telemetry count fields added to the API so that I can make badges for them?

Add block-list for chase car uploads.

We're seeing a lot of chase car position uploads from callsign 'CHANGEME_RDZTTGO', which are all from misconfigured rdz_ttgo_sonde devices. These jump around the map and just cause annoyance.

It would be useful if we had a block-list somewhere for chase-car callsigns. Something similar for station position and telemetry uploads would probably not be a bad idea either.
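
A minimal sketch of such a check at ingestion time; the storage (hard-coded here) and function name are placeholders, and in practice the list would likely live in configuration:

```python
# Sketch: reject chase-car uploads from known-bad or default callsigns.
BLOCKED_CALLSIGNS = {"CHANGEME_RDZTTGO"}

def chase_car_upload_allowed(callsign: str) -> bool:
    """Return False for callsigns on the block-list (case-insensitive)."""
    return callsign.strip().upper() not in BLOCKED_CALLSIGNS
```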

Uploaders Statistics

We can generate graphs showing the number and breakdown of receivers using the Visualise tool in ElasticSearch; however, it can be hard to make these publicly accessible. The solution would be to create an API that returns the data needed to generate these graphs.

The following ElasticSearch request gets the last 7 days of telemetry data (~44 million packets) and returns the software names and versions, ordered by the number of unique callsigns for each. The response also includes the raw number of packets matching each software name and version.

{
  "aggs": {
    "2": {
      "terms": {
        "field": "software_name.keyword",
        "order": {
          "1": "desc"
        },
        "size": 10
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "uploader_callsign.keyword"
          }
        },
        "3": {
          "terms": {
            "field": "software_version.keyword",
            "order": {
              "1": "desc"
            },
            "size": 10
          },
          "aggs": {
            "1": {
              "cardinality": {
                "field": "uploader_callsign.keyword"
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "datetime",
      "format": "date_time"
    },
    {
      "field": "time_received",
      "format": "date_time"
    },
    {
      "field": "time_server",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "datetime": {
              "gte": "2022-01-05T07:50:52.795Z",
              "lte": "2022-01-12T07:50:52.795Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

I'm not sure how easily this can be made into an API; it shouldn't require any input fields.

This is the ElasticSearch response from the above request:

{
  "took": 7063,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 44772779,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 541
          },
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1053782,
            "buckets": [
              {
                "1": {
                  "value": 312
                },
                "key": "1.5.8",
                "doc_count": 23576665
              },
              {
                "1": {
                  "value": 91
                },
                "key": "1.5.7",
                "doc_count": 4688112
              },
              {
                "1": {
                  "value": 53
                },
                "key": "1.5.6",
                "doc_count": 3088206
              },
              {
                "1": {
                  "value": 28
                },
                "key": "1.5.5",
                "doc_count": 1642882
              },
              {
                "1": {
                  "value": 22
                },
                "key": "1.5.3",
                "doc_count": 1510429
              },
              {
                "1": {
                  "value": 13
                },
                "key": "1.5.1",
                "doc_count": 909441
              },
              {
                "1": {
                  "value": 13
                },
                "key": "1.5.4",
                "doc_count": 435886
              },
              {
                "1": {
                  "value": 11
                },
                "key": "1.5.2",
                "doc_count": 608260
              },
              {
                "1": {
                  "value": 5
                },
                "key": "1.5.0",
                "doc_count": 390165
              },
              {
                "1": {
                  "value": 3
                },
                "key": "1.5.8-beta2",
                "doc_count": 273166
              }
            ]
          },
          "key": "radiosonde_auto_rx",
          "doc_count": 38176994
        },
        {
          "1": {
            "value": 147
          },
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 98
                },
                "key": "devel20211101",
                "doc_count": 4381623
              },
              {
                "1": {
                  "value": 36
                },
                "key": "master_v0.9.0",
                "doc_count": 1211087
              },
              {
                "1": {
                  "value": 4
                },
                "key": "devel20211003",
                "doc_count": 101988
              },
              {
                "1": {
                  "value": 3
                },
                "key": "devel20211005",
                "doc_count": 188027
              },
              {
                "1": {
                  "value": 3
                },
                "key": "devel20211010",
                "doc_count": 91851
              },
              {
                "1": {
                  "value": 2
                },
                "key": "devel20211013",
                "doc_count": 137199
              },
              {
                "1": {
                  "value": 2
                },
                "key": "devel20211016",
                "doc_count": 110658
              },
              {
                "1": {
                  "value": 2
                },
                "key": "devel20211023",
                "doc_count": 55280
              },
              {
                "1": {
                  "value": 1
                },
                "key": "devel20210913",
                "doc_count": 137183
              },
              {
                "1": {
                  "value": 1
                },
                "key": "devel20210914",
                "doc_count": 68819
              }
            ]
          },
          "key": "rdzTTGOsonde",
          "doc_count": 6483715
        },
        {
          "1": {
            "value": 8
          },
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 6
                },
                "key": "6.2.5.8",
                "doc_count": 44004
              },
              {
                "1": {
                  "value": 1
                },
                "key": "6.2.4.7",
                "doc_count": 66646
              },
              {
                "1": {
                  "value": 1
                },
                "key": "6.2.5.3",
                "doc_count": 1420
              }
            ]
          },
          "key": "SondeMonitor",
          "doc_count": 112070
        }
      ]
    }
  }
}

The API would need to return the following information:

{
  "time_generated": UTC,
  "total_packets": 44772779,
  "data": {
    "radiosonde_auto_rx": {"count": 38176994, "unique_count": 541, "versions": {"1.5.8":{"count": 23576665, "unique_count": 312}, etc}},
    "rdzTTGOsonde": {},
    "SondeMonitor": {}
  }
}

The ElasticSearch query takes about 7000 ms, so I would suggest caching the responses heavily using CloudFront.
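
Reshaping the aggregation response above into the proposed API output could look like this (bucket keys "1", "2" and "3" follow the query above; the output field names follow the proposed format):

```python
# Sketch: convert the ElasticSearch aggregation response into the proposed
# uploader-statistics API shape. time_generated is left to the caller.

def reshape_stats(es_response: dict) -> dict:
    data = {}
    for software in es_response["aggregations"]["2"]["buckets"]:
        versions = {
            v["key"]: {"count": v["doc_count"], "unique_count": v["1"]["value"]}
            for v in software["3"]["buckets"]
        }
        data[software["key"]] = {
            "count": software["doc_count"],
            "unique_count": software["1"]["value"],
            "versions": versions,
        }
    return {"total_packets": es_response["hits"]["total"], "data": data}
```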

API data using wget or curl

Under Linux, I'm trying to retrieve sounding telemetry using wget or curl. It creates a file,
but it's all hex. Doing it in a browser via the test page works fine. What wget or curl options do I need on the command line in Ubuntu?

Ray
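
For reference: the "hex" is most likely the gzip-compressed response body, as noted for the /sites endpoint above. With curl, adding the --compressed flag asks for and transparently decodes compressed responses; with wget the output can be piped through gunzip instead. The same decompression in Python, as a small sketch:

```python
import gzip
import json

def decode_gzipped_json(raw: bytes) -> dict:
    """Decompress a gzip-encoded API response body and parse it as JSON."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))
```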

Chase car support

Add chase cars to the datanew endpoint like normal sondes, but with _chase appended.

API's "Sonde Serial" vs. radiosonde_auto_rx "Sonde ID"

Hi,

I'm quite sure it was discussed in the past, but could not find the answer:
What's the recommended way to convert the sonde "ID" value used in the radiosonde_auto_rx packet summary to the "serial" key expected by the SondeHub API?
It seems that for some types they are the same (e.g. Vaisala), but for iMet, the ID also includes the type while the serial key shouldn't (so "IMET-A6C9113D" needs to be looked up as "A6C9113D").

Is there a generic conversion, or is it actually type-specific? Maybe I can simply use whatever comes after the last dash (if any) in the sonde ID as the API's serial?

Thanks.
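
The whatever-after-the-last-dash suggestion above could be sketched like this. Note this is an assumption, not a confirmed rule: it would misbehave for any sonde type whose serial itself contains dashes, so a per-type prefix strip may turn out to be safer.

```python
# Sketch: strip a type prefix such as "IMET-" from an auto_rx sonde ID;
# IDs without a dash (e.g. Vaisala serials) pass through unchanged.

def auto_rx_id_to_serial(sonde_id: str) -> str:
    """Return the text after the last dash, or the whole ID if there is none."""
    return sonde_id.rsplit("-", 1)[-1]
```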

Improve Listeners API

  • Stop stations with a lat/lon of 0 from being returned by the GET /listeners API endpoint.
  • Allow the PUT /listener API to accept a payload without the uploader_position field.
  • Process and show packets in the backend where uploader_position is set to null.

Add launch site data to historic data

This needs to be added both to the API and the S3 exporter. We'll have to add it to the telemetry format and metadata tags. Also considering a new S3 folder for specific launch sites.

Archive data from ES

  • Regularly check recent sonde launches
  • For each sonde launch dump all the data into an S3 bucket - serial/{serial}.json - make sure to download existing data first
  • Create a summary file with first, last and highest packets that is stored in date/{date}/{serial}.json

Notes:

  • ES queries need to be paginated
  • Use SQS to queue - this will make backloading easier
  • Also might be worth adding some of the summary data to S3 as metadata as well!
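
The pagination step can be kept testable by abstracting the actual ES call behind a callable; the sketch below assumes a hypothetical fetch_page(cursor) returning (docs, next_cursor), in the style of a search_after loop:

```python
# Sketch: generic cursor-based pagination over an ES-style API. fetch_page
# is a placeholder for the real query; it returns (docs, next_cursor) and a
# next_cursor of None on the final page.

def paginate(fetch_page):
    """Yield every document across all pages until the cursor is exhausted."""
    cursor = None
    while True:
        docs, cursor = fetch_page(cursor)
        for doc in docs:
            yield doc
        if cursor is None:
            break
```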

pytests

A lot of the lambda functions have a main handler to test functionality, we should convert these to pytests
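
The conversion could look like this; handler() below is a stand-in for a real Lambda entry point, not code from the repo:

```python
# Sketch: an ad-hoc "main handler" check rewritten as a pytest-style test.

def handler(event, context=None):
    """Illustrative Lambda handler that echoes the serial from the event."""
    return {"statusCode": 200, "body": event.get("serial", "")}

def test_handler_echoes_serial():
    response = handler({"serial": "S1234567"})
    assert response["statusCode"] == 200
    assert response["body"] == "S1234567"
```

Collected automatically by pytest, this replaces running the module directly to eyeball the output.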

Uploads from rdz_ttgo_sonde not saving?

Hi, I have encountered a very weird bug recently.

Some frames uploaded from an rdz_ttgo_sonde station are shown on SondeHub only if you are on the site at the time of upload (meaning they arrive via the websocket). After reloading SondeHub (which requests /telemetry), the recent frames uploaded by the rdz_ttgo_sonde are missing.

I assume it's a sondehub-infra issue and not an rdz_ttgo_sonde issue, since the data is ending up at the API.

I saw this today after recovering a sonde that was last received by an auto_rx station at high altitude. After I reached the landing zone, rdz_ttgo_sonde uploaded frames of the sonde on the ground, but those frames are missing after refreshing SondeHub.

Screenshot of SondeHub opened while the frames were being uploaded (showing the recent data from the rdz_ttgo_sonde station): [screenshot]
Screenshot after refreshing SondeHub (recent rdz_ttgo_sonde data missing): [screenshot]

I hope I was clear in describing the issue. Thanks.

/recovered API Changes Wishlist

Allow multiple serials as a list for input

  • The current API only accepts a single serial as a string; it would be handy for it to accept either a list of strings or a single string.
  • This will make it easier to implement future tracker features, including visualising launches from a specific site.

Recovery Statistics Endpoint

  • The recovery statistics used to be generated by downloading all the recoveries every time the site was loaded, but as the dataset grew this was no longer feasible.
  • It would be good to have a new endpoint that returns the information previously displayed: the total number of recoveries (with counts for success and no success), the total number of unique callsigns, and the top 5 callsigns.
  • It would be extra nice if days or a date range could be provided as an optional field, so statistics can be generated for the last week, month, etc.
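
Given a list of recovery reports in the format shown under the recovered endpoint issue above, the proposed statistics could be computed along these lines (a sketch only; the output field names are illustrative):

```python
from collections import Counter

# Sketch: compute the recovery statistics the tracker used to derive
# client-side from the full /recovered download.

def recovery_stats(reports: list) -> dict:
    callsigns = Counter(r["recovered_by"] for r in reports)
    successes = sum(1 for r in reports if r["recovered"])
    return {
        "total": len(reports),
        "successful": successes,
        "failed": len(reports) - successes,
        "unique_callsigns": len(callsigns),
        "top_callsigns": callsigns.most_common(5),
    }
```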

imet block

Block old versions of auto_rx for iMets due to the iMet-1 issue.
Something like this - wait for the auto_rx version cutoff from @darksidelemm:

from packaging import version

if sonde_type.startswith("IMET"):
    if software_name == "radiosonde_auto_rx":
        if version.parse(software_version) < version.parse("1.5.9"):
            # add error message, drop payload
            return (False, "auto_rx versions below 1.5.9 are blocked for iMet payloads")

Need to check whether "from packaging import version" is available on Lambda.

Ingest Recovery Data from Radiosondy.info?

We currently have a limited but growing number of users reporting recovery information directly to SondeHub via the tracker or Chasemapper. The tracker has recently been updated to handle increased amounts of recovery data, so it can easily scale. The Radiosondy.info database currently sees about 40x the daily reported recoveries, so incorporating this data could be highly beneficial.

I cannot find any public APIs for the Radiosondy.info data, so screen scraping will be required, as demonstrated in this proof-of-concept Python script that can be set to run automatically: https://gist.github.com/LukePrior/e02b62f35eccddea096b141ffd871e4f

The two main barriers moving forward are getting permission from SQ6KXY and adding checks to the tracker to limit the number of displayed recovery icons.

Add heading check

if "heading" in telemetry and telemetry["heading"] > 360:
    return (False, f"Heading {telemetry['heading']} is above 360")

amateur GET endpoints

Probably needs:
a) a summary like we have for normal sondes
b) a decimated history like normal sondes
c) raw data like the history endpoint

sondehubv1 to sondehubv2

The easiest way, I think, is to scale up both clusters, then use Logstash with dead letter queues to migrate the data overnight.

check performance of endpoints

It's possible that 6-hour and 1-day queries to ES during peak time are creating too many buckets, causing an internal server error on both the legacy endpoints and the new endpoints.
