
sondehub-infra's People

Contributors

darksidelemm, lukeprior, theskorm, xssfox


sondehub-infra's Issues

Launch site API not in swagger

The API URL is: https://api.v2.sondehub.org/sites

Response type is gzipped JSON:

{
  "-1": {
    "descent_std": 2.5,
    "rs_types": [
      "41"
    ],
    "@version": "1",
    "burst_samples": 52,
    "station": "-1",
    "position": [
      144.947375,
      -37.689883
    ],
    "descent_rate": 5.7,
    "@timestamp": "2021-11-23T22:08:57.372Z",
    "station_name": "Melbourne BoM Training Annex (Training and Ozonesondes) (Australia)",
    "alt": 119,
    "burst_std": 5382,
    "descent_samples": 48,
    "burst_altitude": 28068
  },
  etc...
}
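
A minimal sketch of consuming this response in Python (field names follow the sample above; note the sample suggests position is [longitude, latitude], since Melbourne is at roughly lat -37.69, lon 144.95):

```python
# Sketch: flatten the /sites mapping into launch-site records.
# Assumes the JSON structure shown above; the HTTP fetch itself is omitted.

def parse_sites(sites: dict) -> list:
    """Return (station, name, lat, lon) tuples from the /sites response."""
    records = []
    for station, site in sites.items():
        lon, lat = site["position"]  # sample data suggests [lon, lat] order
        records.append((station, site["station_name"], lat, lon))
    return records
```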

Allow multiple recovery successful reports

There have been a few cases of mistaken (or, less likely, malicious) recovery reports being submitted for sondes before they have actually landed.
At the moment a sonde that has already been marked as recovered cannot be marked again, which means the real recovery report cannot be submitted.

I would suggest removing any limits on marking sondes as recovered/not-recovered - the client for the /recovered API should be responsible for picking out the latest entry.
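
The client-side "pick the latest entry" logic could be sketched like this (field names such as "datetime" are assumptions about the report format, not confirmed):

```python
# Sketch: given all recovery reports for one or more serials, keep only the
# most recent report per serial and let that be the authoritative state.

def latest_reports(reports: list) -> dict:
    """Map each serial to its most recent report, comparing ISO timestamps."""
    latest = {}
    for report in reports:
        serial = report["serial"]
        if serial not in latest or report["datetime"] > latest[serial]["datetime"]:
            latest[serial] = report
    return latest
```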

Work out how to score top_hits better

We should be able to use function_score to better calculate which payload to show. Ideally this is the latest payload, followed by one with xdata, then pressure, then humidity.
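
One possible shape for this, as a sketch only (the weights are arbitrary and the field names are assumed from the telemetry format), is a function_score query that boosts documents by which optional fields they carry:

```json
{
  "function_score": {
    "query": { "match_all": {} },
    "functions": [
      { "filter": { "exists": { "field": "xdata" } }, "weight": 4 },
      { "filter": { "exists": { "field": "pressure" } }, "weight": 2 },
      { "filter": { "exists": { "field": "humidity" } }, "weight": 1 }
    ],
    "score_mode": "sum",
    "boost_mode": "replace"
  }
}
```

The top_hits sort could then use _score first with datetime descending as a tiebreaker, so recency still wins between equally-complete payloads.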

Scaling of websockets

Our websocket service doesn't scale with demand. We should be able to split it into a writer server -> reader server(s) architecture.

Recovered endpoint

It gives an immediate indication to people looking at the map that a sonde was recovered, and that they shouldn't go after something that might have landed a few hours ago.

as a user
I want to know if a sonde has been recovered
so that
I stop chasing the radiosonde

  • Create API spec for recovered endpoints
    • GET /recovered
    • GET /recovered/{serial}
    • PUT /recovered/{serial}
  • Update swagger, and swagger markdown
  • Add ingestion endpoint
  • Add output into datanew endpoint
        [
          {
            "serial": "S1234567",
            "lat": -34.0,
            "lon": 138.0,
            "alt": 100.0,
            "recovered": true,
            "recovered_by": "VK5QI",
            "description": "In a gigantic tree. But I had a pole."
          },
          {
            "serial": "S1112234",
            "lat": -34.1,
            "lon": 138.1,
            "alt": 100.0,
            "recovered": false,
            "recovered_by": "VK5FAIL",
            "description": "In a gigantic tree. But I didn't have a pole."
          }
        ]

Only permit uploads if the serial has been seen and the recovered position is within 5 km of the last seen data point.
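
The 5 km proximity check could be sketched with a haversine distance; the helper and field names below are illustrative, not part of the API:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def recovery_position_ok(report, last_seen, limit_km=5.0):
    """Accept a recovery report only if it is within limit_km of the last telemetry."""
    distance = haversine_km(report["lat"], report["lon"],
                            last_seen["lat"], last_seen["lon"])
    return distance <= limit_km
```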

Record sonde predictions

Actively run predictions and store in ElasticSearch so that we can display them on historic radiosonde data.

/listener/stats changes

Can we get total unique callsign and total telemetry count fields added to the API so that I can make badges for them?

Add block-list for chase car uploads.

We're seeing a lot of chase car position uploads from callsign 'CHANGEME_RDZTTGO', which are all from misconfigured rdz_ttgo_sonde devices. These jump around the map and just cause annoyance.

It would be useful if we had a block-list somewhere for chase-car callsigns. Something similar for station position and telemetry uploads would probably not be a bad idea either.
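
A minimal sketch of such a check at ingestion time; the storage (hard-coded here) and function name are placeholders, and in practice the list would likely live in configuration:

```python
# Sketch: reject chase-car uploads from known-bad or default callsigns.
BLOCKED_CALLSIGNS = {"CHANGEME_RDZTTGO"}

def chase_car_upload_allowed(callsign: str) -> bool:
    """Return False for callsigns on the block-list (case-insensitive)."""
    return callsign.strip().upper() not in BLOCKED_CALLSIGNS
```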

Uploaders Statistics

We can generate graphs showing the number and breakdown of receivers using the Visualise tool in ElasticSearch; however, it can be hard to make these publicly accessible. The solution would be to create an API that returns the data needed to generate these graphs.

The following ElasticSearch request gets the last 7 days of telemetry data (~44 million packets) and returns the software names and versions, ordered by the number of unique callsigns for each. The response also includes the raw number of packets matching each software name and version.

{
  "aggs": {
    "2": {
      "terms": {
        "field": "software_name.keyword",
        "order": {
          "1": "desc"
        },
        "size": 10
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "uploader_callsign.keyword"
          }
        },
        "3": {
          "terms": {
            "field": "software_version.keyword",
            "order": {
              "1": "desc"
            },
            "size": 10
          },
          "aggs": {
            "1": {
              "cardinality": {
                "field": "uploader_callsign.keyword"
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "datetime",
      "format": "date_time"
    },
    {
      "field": "time_received",
      "format": "date_time"
    },
    {
      "field": "time_server",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "datetime": {
              "gte": "2022-01-05T07:50:52.795Z",
              "lte": "2022-01-12T07:50:52.795Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

I'm not sure how easily this can be made into an API; it shouldn't require any input fields.

This is the ElasticSearch response from the above request:

{
  "took": 7063,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 44772779,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 541
          },
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1053782,
            "buckets": [
              {
                "1": {
                  "value": 312
                },
                "key": "1.5.8",
                "doc_count": 23576665
              },
              {
                "1": {
                  "value": 91
                },
                "key": "1.5.7",
                "doc_count": 4688112
              },
              {
                "1": {
                  "value": 53
                },
                "key": "1.5.6",
                "doc_count": 3088206
              },
              {
                "1": {
                  "value": 28
                },
                "key": "1.5.5",
                "doc_count": 1642882
              },
              {
                "1": {
                  "value": 22
                },
                "key": "1.5.3",
                "doc_count": 1510429
              },
              {
                "1": {
                  "value": 13
                },
                "key": "1.5.1",
                "doc_count": 909441
              },
              {
                "1": {
                  "value": 13
                },
                "key": "1.5.4",
                "doc_count": 435886
              },
              {
                "1": {
                  "value": 11
                },
                "key": "1.5.2",
                "doc_count": 608260
              },
              {
                "1": {
                  "value": 5
                },
                "key": "1.5.0",
                "doc_count": 390165
              },
              {
                "1": {
                  "value": 3
                },
                "key": "1.5.8-beta2",
                "doc_count": 273166
              }
            ]
          },
          "key": "radiosonde_auto_rx",
          "doc_count": 38176994
        },
        {
          "1": {
            "value": 147
          },
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 98
                },
                "key": "devel20211101",
                "doc_count": 4381623
              },
              {
                "1": {
                  "value": 36
                },
                "key": "master_v0.9.0",
                "doc_count": 1211087
              },
              {
                "1": {
                  "value": 4
                },
                "key": "devel20211003",
                "doc_count": 101988
              },
              {
                "1": {
                  "value": 3
                },
                "key": "devel20211005",
                "doc_count": 188027
              },
              {
                "1": {
                  "value": 3
                },
                "key": "devel20211010",
                "doc_count": 91851
              },
              {
                "1": {
                  "value": 2
                },
                "key": "devel20211013",
                "doc_count": 137199
              },
              {
                "1": {
                  "value": 2
                },
                "key": "devel20211016",
                "doc_count": 110658
              },
              {
                "1": {
                  "value": 2
                },
                "key": "devel20211023",
                "doc_count": 55280
              },
              {
                "1": {
                  "value": 1
                },
                "key": "devel20210913",
                "doc_count": 137183
              },
              {
                "1": {
                  "value": 1
                },
                "key": "devel20210914",
                "doc_count": 68819
              }
            ]
          },
          "key": "rdzTTGOsonde",
          "doc_count": 6483715
        },
        {
          "1": {
            "value": 8
          },
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 6
                },
                "key": "6.2.5.8",
                "doc_count": 44004
              },
              {
                "1": {
                  "value": 1
                },
                "key": "6.2.4.7",
                "doc_count": 66646
              },
              {
                "1": {
                  "value": 1
                },
                "key": "6.2.5.3",
                "doc_count": 1420
              }
            ]
          },
          "key": "SondeMonitor",
          "doc_count": 112070
        }
      ]
    }
  }
}

The API would need to return the following information:

{
  "time_generated": UTC,
  "total_packets": 44772779,
  "data": {
    "radiosonde_auto_rx": {"count": 38176994, "unique_count": 541, "versions": {"1.5.8":{"count": 23576665, "unique_count": 312}, etc}},
    "rdzTTGOsonde": {},
    "SondeMonitor": {}
  }
}

The ElasticSearch query takes about 7000 ms, so I would suggest caching the responses heavily using CloudFront.
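
Reshaping the aggregation response above into the proposed API output could look like this (bucket keys "1", "2" and "3" follow the query above; the output field names follow the proposed format):

```python
# Sketch: convert the ElasticSearch aggregation response into the proposed
# uploader-statistics API shape. time_generated is left to the caller.

def reshape_stats(es_response: dict) -> dict:
    data = {}
    for software in es_response["aggregations"]["2"]["buckets"]:
        versions = {
            v["key"]: {"count": v["doc_count"], "unique_count": v["1"]["value"]}
            for v in software["3"]["buckets"]
        }
        data[software["key"]] = {
            "count": software["doc_count"],
            "unique_count": software["1"]["value"],
            "versions": versions,
        }
    return {"total_packets": es_response["hits"]["total"], "data": data}
```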

API data using wget or curl

Under Linux, I'm trying to retrieve sounding telemetry using wget or curl. It creates a file,
but it's all hex. Doing it in a browser via the test page works fine. What wget or curl options do I need on the command line in Ubuntu?

Ray
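
For reference: the "hex" is most likely the gzip-compressed response body, as noted for the /sites endpoint above. With curl, adding the --compressed flag asks for and transparently decodes compressed responses; with wget the output can be piped through gunzip instead. The same decompression in Python, as a small sketch:

```python
import gzip
import json

def decode_gzipped_json(raw: bytes) -> dict:
    """Decompress a gzip-encoded API response body and parse it as JSON."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))
```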

Chase car support

Add chase cars to the datanew endpoint like normal sondes, but with _chase appended.

API's "Sonde Serial" vs. radiosonde_auto_rx "Sonde ID"

Hi,

I'm quite sure it was discussed in the past, but could not find the answer:
What's the recommended way to convert the sonde "ID" value used in the radiosonde_auto_rx packet summary to the "serial" key expected by the SondeHub API?
It seems that for some types they are the same (e.g. Vaisala), but for iMet, the ID also includes the type while the serial key shouldn't (so "IMET-A6C9113D" needs to be looked up as "A6C9113D").

Is there a generic conversion, or is it actually type-specific? Maybe I can simply use whatever comes after the last dash (if any) in the sonde ID as the API's serial?

Thanks.
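
The whatever-after-the-last-dash suggestion above could be sketched like this. Note this is an assumption, not a confirmed rule: it would misbehave for any sonde type whose serial itself contains dashes, so a per-type prefix strip may turn out to be safer.

```python
# Sketch: strip a type prefix such as "IMET-" from an auto_rx sonde ID;
# IDs without a dash (e.g. Vaisala serials) pass through unchanged.

def auto_rx_id_to_serial(sonde_id: str) -> str:
    """Return the text after the last dash, or the whole ID if there is none."""
    return sonde_id.rsplit("-", 1)[-1]
```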

Improve Listeners API

  • Stop stations with a lat/lon of 0 from being returned by the GET /listeners API endpoint.
  • Allow the PUT /listener API to accept a payload without the uploader_position field.
  • Process and show packets in the backend where uploader_position is set to null.

Add launch site data to historic data

This needs to be added both to the API and the S3 exporter. We'll have to add it to the telemetry format and metadata tags. Also considering a new S3 folder for specific launch sites.

Archive data from ES

  • Regularly check recent sonde launches
  • For each sonde launch dump all the data into an S3 bucket - serial/{serial}.json - make sure to download existing data first
  • Create a summary file with first, last and highest packets that is stored in date/{date}/{serial}.json

Notes:

  • ES queries need to be paginated
  • Use SQS to queue - this will make backloading easier
  • Also might be worth adding some of the summary data to S3 as metadata as well!
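
The pagination step can be kept testable by abstracting the actual ES call behind a callable; the sketch below assumes a hypothetical fetch_page(cursor) returning (docs, next_cursor), in the style of a search_after loop:

```python
# Sketch: generic cursor-based pagination over an ES-style API. fetch_page
# is a placeholder for the real query; it returns (docs, next_cursor) and a
# next_cursor of None on the final page.

def paginate(fetch_page):
    """Yield every document across all pages until the cursor is exhausted."""
    cursor = None
    while True:
        docs, cursor = fetch_page(cursor)
        for doc in docs:
            yield doc
        if cursor is None:
            break
```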

pytests

A lot of the lambda functions have a main handler to test functionality, we should convert these to pytests
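
The conversion could look like this; handler() below is a stand-in for a real Lambda entry point, not code from the repo:

```python
# Sketch: an ad-hoc "main handler" check rewritten as a pytest-style test.

def handler(event, context=None):
    """Illustrative Lambda handler that echoes the serial from the event."""
    return {"statusCode": 200, "body": event.get("serial", "")}

def test_handler_echoes_serial():
    response = handler({"serial": "S1234567"})
    assert response["statusCode"] == 200
    assert response["body"] == "S1234567"
```

Collected automatically by pytest, this replaces running the module directly to eyeball the output.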

Uploads from rdz_ttgo_sonde not saving?

Hi, I have encountered a very weird bug recently.

Some frames uploaded from an rdz_ttgo_sonde station are shown on SondeHub only if you are on the site at the time of upload (meaning they arrive via the websocket). After reloading SondeHub (which requests /telemetry), the recent frames uploaded by the rdz_ttgo_sonde are missing.

I assume it's a sondehub-infra issue and not an rdz_ttgo_sonde issue, since the data is ending up at the API.

I saw this today after recovering a sonde that was last received by an auto_rx station at high altitude. After I reached the landing zone, rdz_ttgo_sonde uploaded frames of the sonde on the ground, but those frames are missing after refreshing SondeHub.

Screenshot of SondeHub opened while the frames were being uploaded (showing the recent data from the rdz_ttgo_sonde station): [screenshot]
Screenshot after refreshing SondeHub (recent rdz_ttgo_sonde data missing): [screenshot]

I hope I was clear in describing the issue. Thanks.

/recovered API Changes Wishlist

Allow multiple serials as a list for input

  • The current API only accepts a single serial as a string; it would be handy for it to accept either a list of strings or a single string.
  • This will make it easier to implement future tracker features, including visualising launches from a specific site.

Recovery Statistics Endpoint

  • The recovery statistics used to be generated by downloading all the recoveries every time the site was loaded, but as the dataset grew this was no longer feasible.
  • It would be good to have a new endpoint that returns the information previously displayed: the total number of recoveries (with counts for success and no success), the total number of unique callsigns, and the top 5 callsigns.
  • It would be extra nice if days or a date range could be provided as an optional field, so statistics can be generated for the last week, month, etc.
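
Given a list of recovery reports in the format shown under the recovered endpoint issue above, the proposed statistics could be computed along these lines (a sketch only; the output field names are illustrative):

```python
from collections import Counter

# Sketch: compute the recovery statistics the tracker used to derive
# client-side from the full /recovered download.

def recovery_stats(reports: list) -> dict:
    callsigns = Counter(r["recovered_by"] for r in reports)
    successes = sum(1 for r in reports if r["recovered"])
    return {
        "total": len(reports),
        "successful": successes,
        "failed": len(reports) - successes,
        "unique_callsigns": len(callsigns),
        "top_callsigns": callsigns.most_common(5),
    }
```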

imet block

Block old versions of auto_rx for iMets due to the iMet-1 issue.
Something like this - wait for the auto_rx version cutoff from @darksidelemm:

from packaging import version

if sonde_type.startswith("IMET"):
    if software_name == "radiosonde_auto_rx":
        if version.parse(software_version) < version.parse("1.5.9"):
            # add error message, drop payload
            return (False, "auto_rx versions below 1.5.9 are blocked for iMet payloads")

Need to check whether "from packaging import version" is available on Lambda.

Ingest Recovery Data from Radiosondy.info?

We currently have a limited but growing number of users reporting recovery information directly to SondeHub via the tracker or Chasemapper. The tracker has recently been updated to handle increased amounts of recovery data, so it can easily scale. The Radiosondy.info database currently sees about 40x the daily reported recoveries, so incorporating this data could be highly beneficial.

I cannot find any public APIs for the Radiosondy.info data, so screen scraping will be required, as demonstrated in this proof-of-concept Python script that can be set to run automatically: https://gist.github.com/LukePrior/e02b62f35eccddea096b141ffd871e4f

The two main barriers moving forward are getting permission from SQ6KXY and adding checks to the tracker to limit the number of displayed recovery icons.

Add heading check

if "heading" in telemetry and telemetry["heading"] > 360:
    return (False, f"Heading {telemetry['heading']} is above 360")

amateur GET endpoints

Probably needs:
a) a summary like we have for normal sondes
b) a decimated history like normal sondes
c) raw data like the history endpoint

sondehubv1 to sondehubv2

The easiest way, I think, is to scale up both clusters, then use Logstash with dead letter queues to migrate the data overnight.

check performance of endpoints

It's possible that 6-hour and 1-day queries to ES during peak time are creating too many buckets, causing an internal server error on both the legacy endpoints and the new endpoints.
