
Comments (3)

elasticsearchmachine commented on July 20, 2024

Pinging @elastic/es-analytical-engine (Team:Analytics)

from elasticsearch.

craigtaverner commented on July 20, 2024

This is related to the scheduled work for ST_DISTANCE, which covers at least the distance calculation part. However, calculating speed is a separate concern. At its simplest, speed is just distance/duration, which does not require a new function, so this could be considered complete once ST_DISTANCE is done. However, there are two further considerations:

  • The above assumes we have some way of getting the two points and two timestamps into the same row. That could be true of some datasets, but is far more likely not to hold, as is the case when each document contains only the current location and timestamp. So we need a way of getting both the current and previous document into the same row.
  • This feature feels suitable for TSDB. During the original TSDB work we built a feature involving optimized geo_line aggregations, which collected sequences of locations grouped by TSID into LineString geometries, ordered by time, including line simplification for very large geometries. There was a request to filter out outliers that deviate too much from the line, and the feature requested here sounds related, since we want speed outliers to be detected and highlighted. If the users of this feature are likely to use TSDB features, since they are working with time-ordered event data, perhaps we should consider a TSDB feature around outlier detection (spatial, temporal, and spatiotemporal/speed)?
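As a rough illustration of the "distance/duration" case, here is a minimal Python sketch that assumes both points and both timestamps are already available together. The function names, the haversine formula, and the mean Earth radius constant are assumptions made for this example; nothing here is taken from ST_DISTANCE or from any Elasticsearch implementation.

```python
import math

# Mean Earth radius in metres; an assumption of this sketch.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_m_per_s(lat1, lon1, t1, lat2, lon2, t2):
    """Speed between two observations; t1 and t2 are timestamps in seconds.

    Returns None when the duration is not positive, since speed is undefined then.
    """
    duration = t2 - t1
    if duration <= 0:
        return None
    return haversine_m(lat1, lon1, lat2, lon2) / duration
```

The arithmetic is the easy part; the harder problem raised above is getting the previous and current observation into the same row in the first place.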


craigtaverner commented on July 20, 2024

I also took a look at the linked enhancement request and the SPL query it uses, and have a few comments:

  • It looks like the main missing feature on our side is eventstats, which I believe we're working on (calling it 'inline stats' at the moment).
  • The SPL query seems to do a lot of unnecessary, inefficient work. In particular, it appears to use eventstats to associate every single event of a user with every other event of that same user, and then calculate the distance, duration and speed between every combination. This is extremely inefficient if we assume we only really need to consider consecutive events in time order.

It would be far more efficient to use a time-ordering or event-ordering approach and look at window functions. @alex-spies pointed out the SQL functions LEAD and LAG as a good approach to this. They also seem generally useful for event data, log data and the security use cases.
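To make the difference concrete, the following Python sketch mimics what `LAG(...) OVER (PARTITION BY user ORDER BY timestamp)` does in SQL: it groups events by user, sorts each group by time, and pairs each event with its immediate predecessor. The event tuple layout and the function names are hypothetical, chosen only for illustration; this is not ES|QL syntax or a proposed implementation.

```python
from collections import defaultdict
from operator import itemgetter


def consecutive_pairs(events):
    """Pair each event with the previous event of the same user, in time order.

    `events` is an iterable of (user, timestamp, location) tuples; this layout
    is an assumption of the sketch. Returns a list of (previous, current) pairs.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event[0]].append(event)

    pairs = []
    for user_events in by_user.values():
        user_events.sort(key=itemgetter(1))  # order by timestamp
        # zip with a one-step offset plays the role of LAG
        pairs.extend(zip(user_events, user_events[1:]))
    return pairs


def all_pairs_count(n):
    """Number of unordered pairs among n events of one user (the all-combinations approach)."""
    return n * (n - 1) // 2
```

For a user with n events, the consecutive approach yields n - 1 pairs, while comparing every combination yields n(n - 1)/2, so the gap grows quadratically with the number of events per user.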
