Environmental Sensor Data Repository (ESDR)

ESDR is an open source data repository intended for storing and retrieving time series environmental data. Basically, if the data has a timestamp and a value, ESDR can store it and provide ways to retrieve it quickly and securely. Data is stored in a custom, open source datastore which provides extremely fast data inserts and fetches, making for fast and responsive visualizations. The ESDR web site (esdr.cmucreatelab.org) provides a REST API interface to the datastore, making it easy to read/write data. Metadata is stored in a MySQL database.

ESDR is pronounced like the female name, Esther.

Other Links

https://github.com/CMU-CREATE-Lab/esdr-explorer

Concepts and Terminology

If you're familiar with Xively's API, you'll see a lot of parallels with ESDR. Our intention is to use ESDR as the data repository not only for our own products and visualizations, but also for anyone else who wants a place to store data and have tools to easily visualize it.

First, some terminology: ESDR has clients, users, products, devices, feeds, channels, and tiles. Understanding how these entities relate will give a good understanding of how the data and metadata are structured and how the system works.

  • Client: ESDR uses OAuth2 for authentication, so a client in ESDR is simply an OAuth2 client.
  • User: no real surprise here...simply a person who has registered with ESDR and may own one or more products, devices, or feeds. When a user logs in, he/she does so on behalf of an OAuth2 client.
  • Product: a product is simply a certain kind of sensor, for example, the Speck particle sensor.
  • Device: a particular instantiation of a product, i.e. an actual sensor device--something you can put your hands on--typically with a unique serial number.
  • Feed: a particular installation of a device. For example, if I buy a Speck and register it, the behind-the-scenes registration process creates for me both an ESDR device instance (with my Speck's serial number), as well as a feed, for the location I specified during registration. For example, let's say I purchase a Speck and install it under the awning on my deck. During registration, I would give the location a name (e.g. "Deck"), set the latitude/longitude to my house, mark the exposure to outdoors, set visibility (public or private), etc. Data recorded and uploaded by the Speck would be associated with that particular feed. If I then move the Speck to my kitchen, I would re-register the Speck so it is associated with a new feed, because the environment/location has changed. If I accidentally drop the Speck in a sink full of water and replace it with a new one, that new one would be registered as a new device (it has a different serial number), but, at my option, could be associated with the existing feed from the old Speck (so that I have one continuous stream of data since it's the environment being measured which matters most, not the actual device doing the measurement). Similarly, if I sell the Speck, the new owner would register it under her account, and get a new feed for it.
  • Channel: a sensor device measures one or more aspects of its environment, such as temperature, humidity, particle count, battery voltage, etc. Each is considered a different channel. A feed comprises one or more channels.
  • Tiles: data from a particular feed's channel can be retrieved from ESDR in small chunks of JSON which we call tiles. A tile contains at most 512 data points, and is associated with a particular starting timestamp and duration. For example, a tile could represent a summary of a decade's worth of data, or it could contain the actual recorded data samples spanning, say, only 1 second (e.g. heart rate data). The grapher we use fetches tiles as the user pans and zooms the timeline--it requests only the small subset of data it needs to render the plot. The most appropriate analogy is panning/zooming in Google Maps--the browser only requests map tiles for the current small region of the Earth you're exploring at the time. ESDR also has support for a multi-tile fetch where you can fetch data from multiple channels from multiple feeds with a single GET request. This is essential for being able to do visualizations of lots of sensors simultaneously, e.g. air quality in cities all over the country.
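As a rough sketch of how tile addressing can work, here is one possible scheme, assuming the BodyTrack-style convention that a tile at a given level spans 512 * 2^level seconds and holds at most 512 samples. The exact scheme used by the datastore may differ; `tileDuration`, `tileOffset`, and `levelForRange` are illustrative names, not ESDR API functions.

```javascript
// Sketch of tile addressing, assuming a tile at `level` spans
// 512 * 2^level seconds and holds at most 512 samples.

// Duration in seconds of one tile at the given level.
function tileDuration(level) {
  return 512 * Math.pow(2, level);
}

// Tile offset (index) containing the given epoch timestamp at the given level.
function tileOffset(epochSeconds, level) {
  return Math.floor(epochSeconds / tileDuration(level));
}

// Smallest level at which a single tile can cover the whole [start, end]
// range, i.e. at most 512 points are needed to summarize it.
function levelForRange(startSecs, endSecs) {
  let level = 0;
  while (tileDuration(level) < endSecs - startSecs) level++;
  return level;
}
```

Under this convention, a grapher showing a wide time window requests tiles at a high level (coarse summaries), and zooming in lowers the level until the raw samples fit in a single tile.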

Again, the data samples themselves are all stored in the datastore; metadata for the entities above is stored in a MySQL database. The big win with the datastore is that it works with billions of samples, doing time aggregation upon insert (and yet inserts are still fast) and storing the data at a number of different summarization levels. Thus, it can return a summary of a year's worth (or more!) of data just as quickly as, say, five minutes' worth. No summarization computation is required when fetching tiles, so visualizations remain responsive and fast at any zoom level.

We don't yet do spatiotemporal aggregation, but it's on the TODO list.

Please see the HOW TO document for more details on how to use ESDR.

Setup

  1. Install the module dependencies:

     npm install
    
  2. Install the BodyTrack Datastore by doing the following:

    1. Fetch the BodyTrack Datastore. In your terminal window, set your working directory to the root of the ESDR repository and do the following:

          git clone https://github.com/BodyTrack/datastore.git
      
    2. Follow the BodyTrack Datastore's build and install instructions.

  3. Install MySQL if necessary. ESDR was tested with and assumes MySQL 5.6 (there are known issues with 5.5).

  4. Do the following to create the development MySQL database and user:

     CREATE DATABASE IF NOT EXISTS esdr_dev;
     GRANT ALL PRIVILEGES ON esdr_dev.* TO 'esdr_dev'@'localhost' IDENTIFIED BY 'password';
     GRANT SELECT,INSERT,UPDATE,DELETE,CREATE ON esdr_dev.* TO 'esdr_dev'@'localhost';
    

    If you choose to change the password, make sure it matches the password in config-dev.json.

  5. If you want to be able to run the tests, do the following to create the test database and user:

     CREATE DATABASE IF NOT EXISTS esdr_test;
     GRANT ALL PRIVILEGES ON esdr_test.* TO 'esdr_test'@'localhost' IDENTIFIED BY 'password';
     GRANT SELECT,INSERT,UPDATE,DELETE,CREATE ON esdr_test.* TO 'esdr_test'@'localhost';
    

    If you choose to change the password, make sure it matches the password in config-test.json.

  6. If running in production, do the following:

    1. Create the config-prod.json and mail-config-prod.json files. Just copy from the other configs, but you need only include the parts that differ from config.js.

    2. Do the following to create the production database and user:

       CREATE DATABASE IF NOT EXISTS esdr_prod;
       GRANT ALL PRIVILEGES ON esdr_prod.* TO 'esdr_prod'@'localhost' IDENTIFIED BY 'USE_A_GOOD_PASSWORD_HERE';
       GRANT SELECT,INSERT,UPDATE,DELETE,CREATE ON esdr_prod.* TO 'esdr_prod'@'localhost';
      

      Again, make sure the user and password you specify match those in config-prod.json.

  7. Make sure the datastore data directory defined in the config file exists.

Run

The NODE_ENV environment variable may be specified when running and must be one of dev, development, test, prod, or production. It defaults to dev if unspecified.

To run the server in development mode, do any of the following:

npm start
NODE_ENV=dev npm start
NODE_ENV=development npm start

To run the server in test mode, do:

NODE_ENV=test npm start

To run the server in production mode, do either of the following:

NODE_ENV=prod npm start
NODE_ENV=production npm start

Development

To generate the CSS from the SCSS template, do:

npm run-script gen-css

To compile the handlebars templates, do:

npm run-script gen-handlebars

esdr's People

Contributors

ajsmit24, chrisbartley, dognotdog, lzhang17, lzmunch, pdille, rsargent, sufyanabbasi


esdr's Issues

Include feed name, lat, and lon in feed export?

When we export data from ESDR, especially in the new multi-export setup, it would be great to have a way to get the corresponding lat,lon of each column without having to learn the ESDR API and write code.

Could we for example insert lat, lon rows in the exported CSV, before the time series measurements? CSV is important since a lot of our users aren't coders, and they're using spreadsheets.

My vote would be to either insert name, lat, lon as three new rows in the data download CSV, or to have an alternate CSV download with the same column headers and just the three rows name, lat, lon. The latter might be easier from node, I'm not sure?
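A minimal sketch of the first option: prepend name/lat/lon rows (one value per data column) above the time-series header. This assumes one feed per column for illustration; `prependFeedMetadata` is a hypothetical helper, not an existing ESDR function.

```javascript
// Sketch of the proposed alternate download: prepend name/lat/lon rows
// above the time-series CSV. Assumes (for illustration) that each data
// column after the time column corresponds to one feed, in order.
// This is a hypothetical helper, not an existing ESDR API.
function prependFeedMetadata(csv, feeds) {
  const header = csv.split("\n", 1)[0];
  const columns = header.split(",").slice(1); // drop the leading time column
  const row = (label, pick) =>
    [label].concat(columns.map((_, i) => pick(feeds[i]))).join(",");
  return [
    row("name", f => f.name),
    row("lat", f => f.latitude),
    row("lon", f => f.longitude),
    csv
  ].join("\n");
}
```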

Add option to get feed metadata as CSV

Some users want feed metadata as CSV instead of JSON. Add a format query string option to the feed metadata API. Defaults to JSON, but will accept csv (case insensitive) for CSV output of the metadata. Similar in spirit to the format option for export. More detail...

Currently, calling this:

http://esdr.cmucreatelab.org/api/v1/feeds/?fields=id,name,latitude,longitude&where=productId=69&limit=3&orderBy=id

Will return something like this:

{
   "code" : 200,
   "status" : "success",
   "data" : {
      "totalCount" : 48002,
      "rows" : [
         {
            "id" : 12668,
            "name" : "AQMD_NASA_17 B PurpleAir",
            "latitude" : 34.062618,
            "longitude" : -118.247131
         },
         {
            "id" : 12669,
            "name" : "AQMD_NASA_18 PurpleAir",
            "latitude" : 34.138439,
            "longitude" : -118.13797
         },
         {
            "id" : 12670,
            "name" : "AQMD_NASA_18 B PurpleAir",
            "latitude" : 34.138439,
            "longitude" : -118.13797
         }
      ],
      "offset" : 0,
      "limit" : 3
   }
}

This issue requests to extend the API so that calling this (note the addition of &format=csv at the end):

http://esdr.cmucreatelab.org/api/v1/feeds/?fields=id,name,latitude,longitude&where=productId=69&limit=3&orderBy=id&format=csv

would return this:

id,name,latitude,longitude
12668,AQMD_NASA_17 B PurpleAir,34.062618,-118.247131
12669,AQMD_NASA_18 PurpleAir,34.138439,-118.13797
12670,AQMD_NASA_18 B PurpleAir,34.138439,-118.13797

Things to deal with:

  • the CSV format loses the information about the total number of feeds available, the limit, and the offset. Maybe consider some other query string option which would cause a "commented-out" preamble to be included before the header with this information.
  • decide whether to allow selection of all fields. For example, does it really make sense to request the channelBounds JSON field in a CSV? Same for channelSpecs. Presumably, if you know how to deal with JSON, you'd want it all to be JSON.
  • strings with commas in them (but, are there any such fields? do we allow commas in feed names? the only other fields which could/would have commas are channelSpecs and channelBounds, but maybe we simply don't allow selection of those in CSV format?)
  • decide how to map null data. Present it as null or an empty string? For feed data export, the datastore uses empty string when the data is null, so maybe empty string here is nice for symmetry.
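The conversion itself could look something like the sketch below: flatten the `data.rows` array from the existing JSON response into CSV using the requested field list as the header, with nulls mapped to empty strings for symmetry with the data export. `rowsToCsv` is a hypothetical name, not an existing ESDR function.

```javascript
// Sketch of the proposed format=csv behavior: flatten the JSON response's
// data.rows into CSV using the requested field list as the header.
// Nulls become empty strings, mirroring the datastore's data export.
// Hypothetical helper, not an existing ESDR API.
function rowsToCsv(fields, rows) {
  const escape = v => {
    if (v === null || v === undefined) return "";
    const s = String(v);
    // Quote values containing commas, quotes, or newlines.
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const lines = [fields.join(",")];
  for (const row of rows) {
    lines.push(fields.map(f => escape(row[f])).join(","));
  }
  return lines.join("\n");
}
```

Quoting values that contain commas would also address the "strings with commas" concern above without having to forbid any fields outright.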

Can we minimize the impact of paging search results on /browse?

It's easy to not realize you're missing results in the lower-left pane.

How important is it to page -- could we always show all results?

Or could we start with and page by 10K or 25K results at a time instead of 100? Could we place the "Load next N" button at the bottom of the list when you've scrolled to the end, and make it less greyed-out looking?

[improvement] Place into a Docker container

Hello,

This is the perfect use-case for Docker. Being able to launch an EC2 instance and then simply run $ docker pull <something>/esdr would make the setup process much easier (especially if all the MySQL components are properly configured).

On browse, filtering on SO2 makes charts empty

Repro:

  1. Search for (v4) ramp

  2. Turn on Search Result Filter SO2

  3. Zoom into Clairton

  4. Click on Clairton Center (v4) RAMP and turn on SO2

  5. Observe that the chart at the bottom does not show any data and has the title "undefined"

  6. Turn off the SO2 Search Result Filter

  7. Click again on Clairton Center (v4) RAMP. Observe that SO2 is not checked despite not being turned off by the user

  8. Turn on (again) Clairton Center (v4) RAMP SO2

  9. Observe two charts now -- the correct one with data at the bottom

Click or double-click in browse often pans thousands of miles

Repro:

  1. Type a search in Channels, e.g. (v4) ramp

  2. Zoom in to greater Pittsburgh area

  3. Start single-clicking or double-clicking around the map

  4. You'll jump to lat=0 lon=0 off the coast of Africa

Mousing around, it seems like the cursor switches from hand to arrow sometimes on the map, and it seems like clicking or double-clicking when it's arrow causes the problem. I wonder if maybe the arrow appears when the mouse is over a sensor location that's been filtered-out by the search terms?

Add process which produces a JSON cache of all public feeds

Sites like environmentaldata.org suffer from painfully slow load times as they try to load all of ESDR's public feeds. It might be nice to have a JSON cache of all public feeds which gets updated regularly (every 1-5 minutes?) and contains some essential, but minimal info about each feed.

Perhaps the following:

  • id
  • name
  • latitude
  • longitude
  • lastUpload / maxTimeSecs
  • exposure?

A query like this is a decent start:

select productId,
       id,
       name,
       latitude,
       longitude,
       UNIX_TIMESTAMP(lastUpload) as lastUploadSecs,
       maxTimeSecs,
       deviceId
from Feeds
where isPublic = 1
order by productId, deviceId, id desc;

Ideas to consider:

  • Store the JSON under ESDR's public directory, maybe in some subdirectory denoting it as a cache.
  • Multiple versions, with differing amounts of info.
  • Abbreviated field names in the interest of file size OR using some more compact format such as an array of arrays.
  • Group by product ID?
  • Sort by productId asc, deviceId asc, feedId desc...so that the most recent feed for a device comes first?
  • Also generate separate JSON files per product?

Possible JSON format:

{
   "version" : 1,
   "fields" : ["id", "name", "latitude", "longitude", "lastUploadSecs", "maxTimeSecs", "deviceId"],
   "feedsByProductId" : {
      "1" : [
         [26087, "West Mifflin ACHD", 40.363144, -79.864837, 1576762626, 1576686600, 26017],
         [59665, "Pittsburgh ACHD", 40.4656, -79.9611, 1648222891, 1648218600, 56652]
      ],
      "8" : [
         [4268, "CREATE Lab Test", 40.44382296127876, -79.94647309184074, 1635191877, 1635189738.36, 4260],
         [4277, "Red Room", 40.34107763959465, -80.069620013237, 1484140498, 1484084287, 4265]
      ],
      "9" : [
         [4245, "Wifi Speck 6", 40.443738, -79.946481, 1422565308, 1422565433, 4230],
         [4313, "Outdoors", 40.50156314996969, -80.06125688552856, 1432395167, 1431359910, 4291]
      ]
   }
}
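On the client side, the compact array-of-arrays format above could be expanded back into objects by zipping each row with the top-level "fields" list. A sketch, with `expandFeedCache` as an illustrative name:

```javascript
// Sketch of a client-side reader for the proposed compact cache format:
// zip each row array with the top-level "fields" list to recover one
// object per feed. Hypothetical helper, not an existing ESDR API.
function expandFeedCache(cache) {
  const out = {};
  for (const [productId, rows] of Object.entries(cache.feedsByProductId)) {
    out[productId] = rows.map(row =>
      Object.fromEntries(cache.fields.map((field, i) => [field, row[i]]))
    );
  }
  return out;
}
```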

Improve search for /browse

Split tokens by whitespace and show feeds that contain all search terms (just not necessarily in order)

E.g. searching for

achd ramp

would include feeds from

ACHD - Lawrenceville (v4) RAMP

and searching for

achd pm

would include feeds from

Lawrenceville ACHD PM25B_UG_M3
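The proposed matching could be sketched as follows: split the query on whitespace and keep feeds whose name contains every term, case-insensitively and in any order. `matchesAllTerms` is an illustrative name, not existing /browse code.

```javascript
// Sketch of the proposed search: a feed matches when its name contains
// every whitespace-separated query term, ignoring case and term order.
// Illustrative helper, not existing /browse code.
function matchesAllTerms(feedName, query) {
  const haystack = feedName.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(term => term.length > 0)
    .every(term => haystack.includes(term));
}
```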

PM2.5 Visualization Broken due to Syntax Error

ReferenceError: assignment to undeclared variable dataPocessQueue browse:2431:7 
getTiles https://esdr.cmucreatelab.org/browse/:2431:7
getGrapherTimeTiles https://esdr.cmucreatelab.org/browse/:2451:7
reColor https://esdr.cmucreatelab.org/browse/:2376:7
onclick https://esdr.cmucreatelab.org/browse/:1:1

It looks like dataProcessQueue is misspelled in the following function:

    //beginTime and endTime are in Epoch
    function getTiles(beginTime, endTime, multifeedName) {
      dataPocessQueue = [];
      var result = offsetLimitCalculator(beginTime, endTime);
      var level = result.level;
      var offset = result.offset;
      pauseTimeline();
      currentLevel = level;
      timelineReady = false;
      disablePlayButton();
      spinnerTransform();
      var url1 = "https://esdr.cmucreatelab.org/api/v1/multifeeds/" + multifeedName + "/tiles/" + level + "." + offset;
      var url2 = "https://esdr.cmucreatelab.org/api/v1/multifeeds/" + multifeedName + "/tiles/" + level + "." + (offset + 1);
      var url3 = "https://esdr.cmucreatelab.org/api/v1/multifeeds/" + multifeedName + "/tiles/" + level + "." + (offset - 1);
      getData(url1, multifeedName);
      getData(url2, multifeedName);
      getData(url3, multifeedName);
    }
