Git Product home page Git Product logo

datapoints-csv-extractor's Introduction

datapoints-csv-extractor

An extractor utilized to extract datapoints stored in CSV format

Overview of extractor

  • The extractor is composed of a Python script (Python v3.7.2) that extracts datapoints stored in CSV format (the specific format of the file is discussed below), formats them, and pushes them to the CDF
  • This script can be run for either live or historical CSVs in a given folder
    • Live: The script will process the 20 most recent files in the folder, and then proceed to check for any new files added to the folder every 8 seconds indefinitely
    • Historical: The script will process data from all files in the folder, newest first, and then stop
  • The script either removes the files from the source after successfully processing the data, or moves them to subfolder

File format

The data that the script is capable of processing is in the following format:

  • CSVs encoded in latin-1 that have a delimiter of ; and quote character of "
  • First row includes names of the time series (one per column) stored in the following format: external_id: time_series_name
    • external_id: the unique ID that the client attaches to a given time series. It is assumed that the external_id of the time series never changes
    • time_series_name: the name of the time series, and method of how the CDF maps datapoints of a time series to the metadata of that time series. It is assumed that the time_series_name may change on the client's end
  • Second row includes the unit of the time series. This is already stored in the time series metadata, and is thus ignored.
  • Following rows are stored in the following manner
    • First column: timestamps of the datapoints in Epoch time
    • Following columns: values of a datapoint at a given time for a given time series

Deploying the extractor

At the very minimum, the script needs to have an input folder that holds the CSV files that the user wants processed.

By default, the script runs assuming that the user desires to process historical data. This means that the script will not continue to check for new files, and instead will stop running after processing files detected when initially run.

If not utilizing the --api-key command line flag, then it is necessary to configure an environment variable called COGNITE_EXTRACTOR_API_KEY or COGNITE_API_KEY that stores the API key for CDF.

On an iOS device:

python datapoints-csv-extractor.py --input <INPUT_FOLDER>

On a Windows device:

<PYTHON PATH>\python.exe datapoints-csv-extractor.py --input <INPUT_FOLDER>
  • Where is the path to the Python interpreter on the device.

Command line arguments

Long Tag Short Tag Required Parameter Required Description
--input -i TRUE TRUE Specify the folder that holds the CSV files that the user wants processed.
--historical FALSE Flag to process historical data (redundant). By default, the script will process live data.
--live -l FALSE Flag required to process live data. If this flag is not used, then the script will process historical data.
--api-key -k FALSE TRUE If this flag is not use, the script will attempt to pull the API key from an environment variable called COGNITE_EXTRACTOR_API_KEY
--log -d FALSE TRUE Specify the folder that log files will be created.
--log-level FALSE TRUE Which log level should be logged. Default INFO.
--move-failed FALSE FALSE If this flag is used, the script will move CSV files failed to process into a subfolder called failed/
--keep-finished FALSE FALSE If this flag is used, the script will move finished CSV files into a subfolder called finished/

Contributing

Feel free to open issues / pull requests on GitHub.

Versioning

Changelog can be found on GitHub.

SemVer. See repository tags.

Tests

Follow command will run unit tests and generate coverage report:

pipenv run pytest

datapoints-csv-extractor's People

Contributors

eventh avatar pymishi avatar s38feng avatar simenl avatar vbmach avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

kozhevnykov

datapoints-csv-extractor's Issues

How do I get an API KEY?

I'd like to suggest some improvements to this and other libraries from cognitive, but all of them need the API KEY, is there one to use in a demo mode?

Thanks

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.