Data Tracker

This application tracks a small set of Bitcoin and Ethereum cryptocurrency quotes across multiple currencies:

  • btceur
  • btcgbp
  • btcjpy
  • btcusd
  • btcusdc
  • btcusdt
  • etheur
  • ethgbp
  • ethjpy
  • ethusd
  • ethusdc
  • ethusdt

TODO: update the naming and the 'metrics' tracked to make more sense. 'metric' throughout the code base could be renamed 'priceToCurrency', and we may want to handle other metric types such as 'tradeCount'.

The current application can be accessed at:

https://jzf1aj3eu2.execute-api.us-east-1.amazonaws.com/metrics

Deploying btc-data-tracker to stage dev (us-east-1, "default" provider)

✔ Service deployed to stack btc-data-tracker-dev (63s)

us-east-1
endpoints:
  GET - https://jzf1aj3eu2.execute-api.us-east-1.amazonaws.com/metrics
  GET - https://jzf1aj3eu2.execute-api.us-east-1.amazonaws.com/metrics/{metric}
functions:
  rateHandler: btc-data-tracker-dev-rateHandler (34 MB)
  getMetrics: btc-data-tracker-dev-getMetrics (34 MB)

Endpoints Explained:

  • GET /metrics
    • returns the list of metrics the application tracks
  • GET /metrics/{metric}
    • returns the stored data and 24hr rank for a single metric (e.g. btcusd)
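For example, a minimal sketch of querying both endpoints with the requests package, assuming the endpoints return JSON (btcusd is one of the pairs listed above):

import requests

BASE_URL = "https://jzf1aj3eu2.execute-api.us-east-1.amazonaws.com"

# List every metric the tracker currently stores.
print(requests.get(f"{BASE_URL}/metrics").json())

# Fetch the data and 24hr rank for a single metric, e.g. btcusd.
print(requests.get(f"{BASE_URL}/metrics/btcusd").json())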

Up and Running

To run your own instance of this application, you will need:

  1. An AWS account
  2. A Serverless Framework account
    • the framework requires a programmatic user from your AWS account
    • TODO: set up a secrets manager
  3. An InfluxDB Cloud account
    • the code authenticates with an API token (see the sketch after this list)
    • The application expects two buckets to exist:
      • BTC
      • BTC_24hr_rank
      • no need to create any schema; the app takes care of that
      • TODO: name the buckets more appropriately
  4. A Cryptowatch account
    • the code authenticates with an API token
    • This may be optional, but the code currently uses an API token for a higher usage allowance
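As a rough sketch of how the InfluxDB token is typically wired up with the influxdb-client package (the environment variable names are assumptions, not necessarily what the code uses):

import os

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Assumed environment variable names -- adapt to however you store secrets.
client = InfluxDBClient(
    url=os.environ["INFLUX_URL"],
    token=os.environ["INFLUX_TOKEN"],
    org=os.environ["INFLUX_ORG"],
)

# The app expects the BTC and BTC_24hr_rank buckets to already exist.
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket="BTC", record=Point("btcusd").field("price", 0.0))  # placeholder point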

Install

With npm:

npm install -g serverless

Without npm:

curl -o- -L https://slss.io/install | bash
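Either way, you can confirm the CLI is installed with:

serverless --version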

Run

Create the AWS resources required to run the application:

serverless deploy

Follow the CLI output to the REST API URL, or invoke the function directly:

serverless invoke -f getMetrics

Frameworks & Services Used

The frameworks and services used to track this data:

  • Serverless Framework
    • a simple app-creation framework that hooks us into AWS with minimal setup/configuration
    • TODO: extract the infrastructure into Terraform or CloudFormation to gain functionality the framework does not support
  • AWS
    • the preferred cloud provider
    • TODO: support multi-cloud if required
  • Python
    • 3.8
    • packages:
      • requests
      • influxdb-client
    • TODO: extract the code into more modular components/classes
    • TODO: add unit tests
  • Datastore: InfluxDB
    • TODO: abstract the data logic so the code is more flexible about data providers (see the sketch after this list)
    • TODO: learn best practices for time-series data and implement them
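As a sketch of the datastore abstraction mentioned in the InfluxDB TODO above (all names here are illustrative, not the current code):

from abc import ABC, abstractmethod

class MetricStore(ABC):
    """Hypothetical interface so handlers never touch influxdb-client directly."""

    @abstractmethod
    def write_quote(self, metric: str, price: float) -> None:
        ...

    @abstractmethod
    def latest_rank(self, metric: str) -> int:
        ...

class InfluxMetricStore(MetricStore):
    """InfluxDB-backed implementation; another provider would get its own subclass."""

    def __init__(self, client):
        self._client = client  # an influxdb_client.InfluxDBClient

    def write_quote(self, metric: str, price: float) -> None:
        raise NotImplementedError  # build a Point and write it to the BTC bucket

    def latest_rank(self, metric: str) -> int:
        raise NotImplementedError  # query the BTC_24hr_rank bucket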

Optimizations

Every aspect of the current data tracker was chosen in order to ship quickly and inexpensively.

Scalability

If the priorities of this data tracker were to change, here are the scenarios that we could account for:

  1. What would you change if you needed to track many metrics?
    • Depending on how many metrics we want to track, our schema may need to be optimized
      • I am a little new to time-series data, so I am not sure exactly what an efficient schema/structure would look like for our use case
    • If the number of metrics grows enough, our ranking algorithm/query may need to be optimized, or we may need to run a computation after every write to the database
      • This could be a trigger; in InfluxDB it would be a task
  2. What if you needed to sample them more frequently?
    • Using serverless functions might be the wrong approach if we increase the sample frequency.
      • At 1 request/min we execute:
        • 43,800 requests/month
        • 525,600 requests/year
      • At 1 request/30 seconds we execute:
        • 87,600 requests/month
        • 1,051,200 requests/year
      • AWS Lambda charges per request and per request duration
        • depending on how many users we intend to support, hosting a traditional server may end up being more cost-effective
    • cryptowat.ch has a websocket API, which would allow us to maintain a persistent connection and stream data updates (see the sketch after this list)
      • depending on the frequency and the websocket API pricing, this might be the best approach to take
  3. What if you had many users accessing your dashboard to view metrics?
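Referring to the websocket option in scenario 2, a rough sketch of a streaming consumer using the third-party websockets package (the URL and message shape are placeholders; the real Cryptowatch stream has its own endpoint and authentication):

import asyncio
import json

import websockets  # third-party package, not among the current dependencies

STREAM_URL = "wss://example.invalid/markets"  # placeholder, NOT the real endpoint

async def consume_quotes():
    async with websockets.connect(STREAM_URL) as ws:
        async for raw in ws:
            quote = json.loads(raw)
            # Here we would write the quote to InfluxDB instead of printing it.
            print(quote)

asyncio.run(consume_quotes())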

Testing

Testing for our data tracker should involve:

  • Unit testing of the core logic in Python (see the sketch after this list)
    • getMetrics
      • inputs:
      • outputs: list of the metrics we track
    • getMetrics/{metric-id}
      • inputs: a metric id
      • outputs: appropriately modeled data with rank
        • handle BAD scenarios:
          • metrics that don't exist
          • script/code injected as input
        • handle GOOD scenarios:
          • metrics that do exist
    • uploadData
      • inputs:
      • outputs: a log statement or confirmation that data was or was NOT uploaded
        • handle BAD scenarios:
          • can't connect to the DB
          • can't connect to cryptowat.ch
  • Integration testing
    • make requests to the API endpoints and confirm they return appropriate data
      • compare outputs against what is in the database
      • compare outputs to cryptowat.ch
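Once the handlers are extracted into testable functions, the unit tests could look like the following (module and function names are hypothetical):

from handlers import get_metric  # hypothetical module and function

def test_unknown_metric_returns_not_found():
    response = get_metric("notarealpair")  # a metric we do not track
    assert response["statusCode"] == 404

def test_known_metric_includes_rank():
    response = get_metric("btcusd")
    assert "rank" in response["body"]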

Feature Proposal

To help the user identify opportunities in real time, the app will send an alert whenever a metric exceeds 3x its average over the last hour.

For example, if the volume of GOLD/BTC averaged 100 over the last hour, the app would send an alert when a new volume data point exceeds 300.

Using InfluxDB in the cloud allows us to create and manage "Checks". A Check runs an aggregate function on our data and sends an alert when we cross a threshold. In this case we would keep an hourly average of our metrics, updated either on each write to the database or on a schedule. The operations would look like:

  1. Get cryptowat.ch data (every 1 minute)
  2. Write the data to InfluxDB
    • On write, trigger the hourly_avg task
    • hourly_avg task (see the sketch after this list):
      • sum each metric across the records from the past hour
      • divide by 60 (because we sample every minute)
      • write those values to a separate table
  3. Create a Check that compares the latest record in the source table to the latest record in the hourly_avg table
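A sketch of the hourly_avg computation in Python (the same logic could live in an InfluxDB task instead; the record shape is an assumption):

def hourly_average(points):
    """points: the per-minute samples for one metric from the past hour."""
    # We sample once per minute, so a full hour holds 60 points; divide by 60
    # rather than len(points) to match the proposal above.
    return sum(p["price"] for p in points) / 60

def exceeds_threshold(latest_value, hourly_avg, factor=3):
    # Alert when the newest data point exceeds 3x the hourly average.
    return latest_value > factor * hourly_avg

# Example from the proposal: an average of 100 -> alert above 300.
assert exceeds_threshold(301, 100)
assert not exceeds_threshold(299, 100)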
