Environmental Sensor Data Repository (ESDR)

ESDR is an open source data repository intended for storing and retrieving time series environmental data. Basically, if the data has a timestamp and a value, ESDR can store it and provide ways to retrieve it quickly and securely. Data is stored in a custom, open source datastore which provides extremely fast data inserts and fetches, making for fast and responsive visualizations. The ESDR web site (esdr.cmucreatelab.org) provides a REST API interface to the datastore, making it easy to read/write data. Metadata is stored in a MySQL database.

ESDR is pronounced like the female name, Esther.

Other Links

https://github.com/CMU-CREATE-Lab/esdr-explorer

Concepts and Terminology

If you're familiar with Xively's API, you'll see a lot of parallels with ESDR. Our intention is to use ESDR as the data repository not only for our own products and visualizations, but also for anyone else who wants a place to store data and have tools to easily visualize it.

First, some terminology: ESDR has clients, users, products, devices, feeds, channels, and tiles. Understanding how these entities relate will give a good understanding of how the data and metadata are structured and how the system works.

  • Client: ESDR uses OAuth2 for authentication, so a client in ESDR is simply an OAuth2 client.
  • User: no real surprise here...simply a person who has registered with ESDR and may own one or more products, devices, or feeds. When a user logs in, he/she does so on behalf of an OAuth2 client.
  • Product: a product is simply a certain kind of sensor, for example, the Speck particle sensor.
  • Device: a particular instantiation of a product, i.e. an actual sensor device--something you can put your hands on--typically with a unique serial number.
  • Feed: a particular installation of a device. For example, if I buy a Speck and register it, the behind-the-scenes registration process creates for me both an ESDR device instance (with my Speck's serial number), as well as a feed, for the location I specified during registration. For example, let's say I purchase a Speck and install it under the awning on my deck. During registration, I would give the location a name (e.g. "Deck"), set the latitude/longitude to my house, mark the exposure to outdoors, set visibility (public or private), etc. Data recorded and uploaded by the Speck would be associated with that particular feed. If I then move the Speck to my kitchen, I would re-register the Speck so it is associated with a new feed, because the environment/location has changed. If I accidentally drop the Speck in a sink full of water and replace it with a new one, that new one would be registered as a new device (it has a different serial number), but, at my option, could be associated with the existing feed from the old Speck (so that I have one continuous stream of data since it's the environment being measured which matters most, not the actual device doing the measurement). Similarly, if I sell the Speck, the new owner would register it under her account, and get a new feed for it.
  • Channel: a sensor device measures one or more aspects of its environment, such as temperature, humidity, particle count, battery voltage, etc. Each is considered a different channel. A feed comprises one or more channels.
  • Tiles: data from a particular feed's channel can be retrieved from ESDR in small chunks of JSON which we call tiles. A tile contains at most 512 data points, and is associated with a particular starting timestamp and duration. For example, a tile could represent a summary of a decade's worth of data, or it could contain the actual recorded data samples spanning, say, only 1 second (e.g. heart rate data). The grapher we use fetches tiles as the user pans and zooms the timeline--it requests only the small subset of data it needs to render the plot. The most appropriate analogy is panning/zooming in Google Maps--the browser only requests map tiles for the current small region of the Earth you're exploring at the time. ESDR also has support for a multi-tile fetch where you can fetch data from multiple channels from multiple feeds with a single GET request. This is essential for being able to do visualizations of lots of sensors simultaneously, e.g. air quality in cities all over the country.
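As a rough sketch of how tile addressing can work, here is one possible scheme, assuming the BodyTrack-style convention that a tile at a given level spans 512 * 2^level seconds and holds at most 512 samples. The exact scheme used by the datastore may differ; `tileDuration`, `tileOffset`, and `levelForRange` are illustrative names, not ESDR API functions.

```javascript
// Sketch of tile addressing, assuming a tile at `level` spans
// 512 * 2^level seconds and holds at most 512 samples.

// Duration in seconds of one tile at the given level.
function tileDuration(level) {
  return 512 * Math.pow(2, level);
}

// Tile offset (index) containing the given epoch timestamp at the given level.
function tileOffset(epochSeconds, level) {
  return Math.floor(epochSeconds / tileDuration(level));
}

// Smallest level at which a single tile can cover the whole [start, end]
// range, i.e. at most 512 points are needed to summarize it.
function levelForRange(startSecs, endSecs) {
  let level = 0;
  while (tileDuration(level) < endSecs - startSecs) level++;
  return level;
}
```

Under this convention, a grapher showing a wide time window requests tiles at a high level (coarse summaries), and zooming in lowers the level until the raw samples fit in a single tile.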

Again, the data samples themselves are all stored in the datastore; metadata for the entities above is stored in a MySQL database. The big win with the datastore is that it works with billions of samples, doing time aggregation upon insert (and yet inserts are still fast) and storing the data at a number of different summarization levels. Thus, it can return a summary of a year's worth (or more!) of data just as quickly as, say, five minutes' worth. No summarization computation is required when fetching tiles, so visualizations remain responsive and fast at any zoom level.

We don't yet do spatiotemporal aggregation, but it's on the TODO list.

Please see the HOW TO document for more details on how to use ESDR.

Setup

  1. Install the module dependencies:

     npm install
    
  2. Install the BodyTrack Datastore by doing the following:

    1. Fetch the BodyTrack Datastore. In your terminal window, set your working directory to the root of the ESDR repository and do the following:

          git clone https://github.com/BodyTrack/datastore.git
      
    2. Follow the BodyTrack Datastore's build and install instructions.

  3. Install MySQL if necessary. ESDR was tested with and assumes MySQL 5.6 (there are known issues with 5.5).

  4. Do the following to create the development MySQL database and user:

     CREATE DATABASE IF NOT EXISTS esdr_dev;
     GRANT ALL PRIVILEGES ON esdr_dev.* TO 'esdr_dev'@'localhost' IDENTIFIED BY 'password';
     GRANT SELECT,INSERT,UPDATE,DELETE,CREATE ON esdr_dev.* TO 'esdr_dev'@'localhost';
    

    If you choose to change the password, make sure it matches the password in config-dev.json.

  5. If you want to be able to run the tests, do the following to create the test database and user:

     CREATE DATABASE IF NOT EXISTS esdr_test;
     GRANT ALL PRIVILEGES ON esdr_test.* TO 'esdr_test'@'localhost' IDENTIFIED BY 'password';
     GRANT SELECT,INSERT,UPDATE,DELETE,CREATE ON esdr_test.* TO 'esdr_test'@'localhost';
    

    If you choose to change the password, make sure it matches the password in config-test.json.

  6. If running in production, do the following:

    1. Create the config-prod.json and mail-config-prod.json files. Just copy from the other configs, but you need only include the parts that differ from config.js.

    2. Do the following to create the production database and user:

       CREATE DATABASE IF NOT EXISTS esdr_prod;
       GRANT ALL PRIVILEGES ON esdr_prod.* TO 'esdr_prod'@'localhost' IDENTIFIED BY 'USE_A_GOOD_PASSWORD_HERE';
       GRANT SELECT,INSERT,UPDATE,DELETE,CREATE ON esdr_prod.* TO 'esdr_prod'@'localhost';
      

      Again, make sure the user and password you specify match those in config-prod.json.

  7. Make sure the datastore data directory defined in the config file exists.

Run

The NODE_ENV environment variable may be specified when running and must be one of dev, development, test, prod, or production. It defaults to dev if unspecified.

To run the server in development mode, do any of the following:

npm start
NODE_ENV=dev npm start
NODE_ENV=development npm start

To run the server in test mode, do:

NODE_ENV=test npm start

To run the server in production mode, do either of the following:

NODE_ENV=prod npm start
NODE_ENV=production npm start

Development

To generate the CSS from the SCSS template, do:

npm run-script gen-css

To compile the handlebars templates, do:

npm run-script gen-handlebars

esdr's People

Contributors

ajsmit24, chrisbartley, dognotdog, lzhang17, lzmunch, pdille, rsargent, sufyanabbasi


esdr's Issues

Include feed name, lat, and lon in feed export?

When we export data from ESDR, especially in the new multi-export setup, it would be great to have a way to get the corresponding lat,lon of each column without having to learn the ESDR API and write code.

Could we for example insert lat, lon rows in the exported CSV, before the time series measurements? CSV is important since a lot of our users aren't coders, and they're using spreadsheets.

My vote would be to either insert name, lat, lon as three new rows in the data download CSV, or to have an alternate CSV download with the same column headers and just the three rows name, lat, lon. The latter might be easier from node, I'm not sure?
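A minimal sketch of the first option: prepend name/lat/lon rows (one value per data column) above the time-series header. This assumes one feed per column for illustration; `prependFeedMetadata` is a hypothetical helper, not an existing ESDR function.

```javascript
// Sketch of the proposed alternate download: prepend name/lat/lon rows
// above the time-series CSV. Assumes (for illustration) that each data
// column after the time column corresponds to one feed, in order.
// This is a hypothetical helper, not an existing ESDR API.
function prependFeedMetadata(csv, feeds) {
  const header = csv.split("\n", 1)[0];
  const columns = header.split(",").slice(1); // drop the leading time column
  const row = (label, pick) =>
    [label].concat(columns.map((_, i) => pick(feeds[i]))).join(",");
  return [
    row("name", f => f.name),
    row("lat", f => f.latitude),
    row("lon", f => f.longitude),
    csv
  ].join("\n");
}
```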

Add option to get feed metadata as CSV

Some users want feed metadata as CSV instead of JSON. Add a format query string option to the feed metadata API. Defaults to JSON, but will accept csv (case insensitive) for CSV output of the metadata. Similar in spirit to the format option for export. More detail...

Currently, calling this:

http://esdr.cmucreatelab.org/api/v1/feeds/?fields=id,name,latitude,longitude&where=productId=69&limit=3&orderBy=id

Will return something like this:

{
   "code" : 200,
   "status" : "success",
   "data" : {
      "totalCount" : 48002,
      "rows" : [
         {
            "id" : 12668,
            "name" : "AQMD_NASA_17 B PurpleAir",
            "latitude" : 34.062618,
            "longitude" : -118.247131
         },
         {
            "id" : 12669,
            "name" : "AQMD_NASA_18 PurpleAir",
            "latitude" : 34.138439,
            "longitude" : -118.13797
         },
         {
            "id" : 12670,
            "name" : "AQMD_NASA_18 B PurpleAir",
            "latitude" : 34.138439,
            "longitude" : -118.13797
         }
      ],
      "offset" : 0,
      "limit" : 3
   }
}

This issue requests to extend the API so that calling this (note the addition of &format=csv at the end):

http://esdr.cmucreatelab.org/api/v1/feeds/?fields=id,name,latitude,longitude&where=productId=69&limit=3&orderBy=id&format=csv

would return this:

id,name,latitude,longitude
12668,AQMD_NASA_17 B PurpleAir,34.062618,-118.247131
12669,AQMD_NASA_18 PurpleAir,34.138439,-118.13797
12670,AQMD_NASA_18 B PurpleAir,34.138439,-118.13797

Things to deal with:

  • the CSV format loses the information about the total number of feeds available, the limit, and the offset. Maybe consider some other query string option which would cause a "commented-out" preamble to be included before the header with this information.
  • decide whether to allow selection of all fields. For example, does it really make sense to request the channelBounds JSON field in a CSV? Same for channelSpecs. Presumably, if you know how to deal with JSON, you'd want it all to be JSON.
  • strings with commas in them (but, are there any such fields? do we allow commas in feed names? the only other fields which could/would have commas are channelSpecs and channelBounds, but maybe we simply don't allow selection of those in CSV format?)
  • decide how to map null data. Present it as null or an empty string? For feed data export, the datastore uses empty string when the data is null, so maybe empty string here is nice for symmetry.
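The conversion itself could look something like the sketch below: flatten the `data.rows` array from the existing JSON response into CSV using the requested field list as the header, with nulls mapped to empty strings for symmetry with the data export. `rowsToCsv` is a hypothetical name, not an existing ESDR function.

```javascript
// Sketch of the proposed format=csv behavior: flatten the JSON response's
// data.rows into CSV using the requested field list as the header.
// Nulls become empty strings, mirroring the datastore's data export.
// Hypothetical helper, not an existing ESDR API.
function rowsToCsv(fields, rows) {
  const escape = v => {
    if (v === null || v === undefined) return "";
    const s = String(v);
    // Quote values containing commas, quotes, or newlines.
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const lines = [fields.join(",")];
  for (const row of rows) {
    lines.push(fields.map(f => escape(row[f])).join(","));
  }
  return lines.join("\n");
}
```

Quoting values that contain commas would also address the "strings with commas" concern above without having to forbid any fields outright.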

Can we minimize the impact of paging search results on /browse?

It's easy to not realize you're missing results in the lower-left pane.

How important is it to page -- could we always show all results?

Or could we start with and page by 10K or 25K results at a time instead of 100? Could we place the "Load next N" button at the bottom of the list when you've scrolled to the end, and make it less greyed-out looking?

[improvement] Place into a Docker container

Hello,

This is the perfect use-case for Docker. Being able to launch an EC2 instance and then simply run $ docker pull <something>/esdr would make the setup process much easier (especially if all the MySQL components are properly configured).

On browse, filtering on SO2 makes charts empty

Repro:

  1. Search for (v4) ramp

  2. Turn on Search Result Filter SO2

  3. Zoom into Clairton

  4. Click on Clairton Center (v4) RAMP and turn on SO2

  5. Observe that the chart at the bottom does not show any data and has the title "undefined"

  6. Turn off the SO2 Search Result Filter

  7. Click again on Clairton Center (v4) RAMP. Observe that SO2 is not checked despite not being turned off by the user

  8. Turn on (again) Clairton Center (v4) RAMP SO2

  9. Observe two charts now -- the correct one with data at the bottom

Click or double-click in browse often pans thousands of miles

Repro:

  1. Type a search in Channels, e.g. (v4) ramp

  2. Zoom in to greater Pittsburgh area

  3. Start single-clicking or double-clicking around the map

  4. You'll jump to lat=0 lon=0 off the coast of Africa

Mousing around, it seems like the cursor switches from hand to arrow sometimes on the map, and it seems like clicking or double-clicking when it's arrow causes the problem. I wonder if maybe the arrow appears when the mouse is over a sensor location that's been filtered-out by the search terms?

Add process which produces a JSON cache of all public feeds

Sites like environmentaldata.org suffer from painfully slow load times as they try to load all of ESDR's public feeds. It might be nice to have a JSON cache of all public feeds which gets updated regularly (every 1-5 minutes?) and contains some essential, but minimal info about each feed.

Perhaps the following:

  • id
  • name
  • latitude
  • longitude
  • lastUpload / maxTimeSecs
  • exposure?

A query like this is a decent start:

select productId,
       id,
       name,
       latitude,
       longitude,
       UNIX_TIMESTAMP(lastUpload) as lastUploadSecs,
       maxTimeSecs,
       deviceId
from Feeds
where isPublic = 1
order by productId, deviceId, id desc;

Ideas to consider:

  • Store the JSON under ESDR's public directory, maybe in some subdirectory denoting it as a cache.
  • Multiple versions, with differing amounts of info.
  • Abbreviated field names in the interest of file size OR using some more compact format such as an array of arrays.
  • Group by product ID?
  • Sort by productId asc, deviceId asc, feedId desc...so that the most recent feed for a device comes first?
  • Also generate separate JSON files per product?

Possible JSON format:

{
   "version" : 1,
   "fields" : ["id", "name", "latitude", "longitude", "lastUploadSecs", "maxTimeSecs", "deviceId"],
   "feedsByProductId" : {
      "1" : [
         [26087, "West Mifflin ACHD", 40.363144, -79.864837, 1576762626, 1576686600, 26017],
         [59665, "Pittsburgh ACHD", 40.4656, -79.9611, 1648222891, 1648218600, 56652]
      ],
      "8" : [
         [4268, "CREATE Lab Test", 40.44382296127876, -79.94647309184074, 1635191877, 1635189738.36, 4260],
         [4277, "Red Room", 40.34107763959465, -80.069620013237, 1484140498, 1484084287, 4265]
      ],
      "9" : [
         [4245, "Wifi Speck 6", 40.443738, -79.946481, 1422565308, 1422565433, 4230],
         [4313, "Outdoors", 40.50156314996969, -80.06125688552856, 1432395167, 1431359910, 4291]
      ]
   }
}
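On the client side, the compact array-of-arrays format above could be expanded back into objects by zipping each row with the top-level "fields" list. A sketch, with `expandFeedCache` as an illustrative name:

```javascript
// Sketch of a client-side reader for the proposed compact cache format:
// zip each row array with the top-level "fields" list to recover one
// object per feed. Hypothetical helper, not an existing ESDR API.
function expandFeedCache(cache) {
  const out = {};
  for (const [productId, rows] of Object.entries(cache.feedsByProductId)) {
    out[productId] = rows.map(row =>
      Object.fromEntries(cache.fields.map((field, i) => [field, row[i]]))
    );
  }
  return out;
}
```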

Improve search for /browse

Split tokens by whitespace and show feeds that contain all search terms (just not necessarily in order)

E.g. searching for

achd ramp

would include feeds from

ACHD - Lawrenceville (v4) RAMP

and searching for

achd pm

would include feeds from

Lawrenceville ACHD PM25B_UG_M3
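The proposed matching could be sketched as follows: split the query on whitespace and keep feeds whose name contains every term, case-insensitively and in any order. `matchesAllTerms` is an illustrative name, not existing /browse code.

```javascript
// Sketch of the proposed search: a feed matches when its name contains
// every whitespace-separated query term, ignoring case and term order.
// Illustrative helper, not existing /browse code.
function matchesAllTerms(feedName, query) {
  const haystack = feedName.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(term => term.length > 0)
    .every(term => haystack.includes(term));
}
```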

PM2.5 Visualization Broken due to Syntax Error

ReferenceError: assignment to undeclared variable dataPocessQueue browse:2431:7 
getTiles https://esdr.cmucreatelab.org/browse/:2431:7
getGrapherTimeTiles https://esdr.cmucreatelab.org/browse/:2451:7
reColor https://esdr.cmucreatelab.org/browse/:2376:7
onclick https://esdr.cmucreatelab.org/browse/:1:1

It looks like dataProcessQueue is misspelled in the following function:

    //beginTime and endTime are in Epoch
    function getTiles(beginTime, endTime, multifeedName) {
      dataPocessQueue = [];
      var result = offsetLimitCalculator(beginTime, endTime);
      var level = result.level;
      var offset = result.offset;
      pauseTimeline();
      currentLevel = level;
      timelineReady = false;
      disablePlayButton();
      spinnerTransform();
      var url1 = "https://esdr.cmucreatelab.org/api/v1/multifeeds/" + multifeedName + "/tiles/" + level + "." + offset;
      var url2 = "https://esdr.cmucreatelab.org/api/v1/multifeeds/" + multifeedName + "/tiles/" + level + "." + (offset + 1);
      var url3 = "https://esdr.cmucreatelab.org/api/v1/multifeeds/" + multifeedName + "/tiles/" + level + "." + (offset - 1);
      getData(url1, multifeedName);
      getData(url2, multifeedName);
      getData(url3, multifeedName);
    }
