py-st-dbscan's Introduction

py-st-dbscan

A Python implementation of the ST-DBSCAN algorithm. For more information, see the paper:

Birant, D. and Kut, A. (2007). St-dbscan: An algorithm for clustering spatial–temporal data. Data & Knowledge Engineering, 60(1):208–221. Intelligent Data Mining.

Related work

DBSCAN is a density-based spatial clustering algorithm for applications with noise. It does not require the number of clusters as input; that number is determined by how many density-connected components are found. The required parameters are the radius and the minimum number of neighbors.

From these parameters, clusters of different shapes but similar density are found [Sander et al., 1998]. This algorithm can be applied in several contexts in which the identification of densely connected components is desired (e.g., delimitation of deforested regions and identification of areas of organs affected by tumors). In all these contexts, clusters are identified considering the spatial characteristics of the elements.

Many variations of DBSCAN exist in the literature. One of them is ST-DBSCAN, which takes into account both spatial and non-spatial aspects (e.g., temperature, color, or time) of the elements [Birant and Kut, 2007]. In this work, ST-DBSCAN is used to identify traffic congestion from geospatial data provided by a taxi-hailing application. Several other efforts have applied clustering algorithms to this problem. [Kianfar and Edara, 2013] compared the efficiency of K-means, hierarchical clustering, and Gaussian Mixture Models (GMM) in identifying congested and free roads. [Liu et al., 2010], in contrast, developed an algorithm called Mobility-Based Clustering to find agglomerations of mobile objects (e.g., taxis) in cities; they argue that vehicle speed can be predicted from vehicle accumulation. Finally, [Niu et al., 2015] proposed a deep learning method, the Restricted Boltzmann Machine, to predict traffic conditions from taxi GPS data in order to recommend roads to taxi drivers.
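The core difference from plain DBSCAN is the neighborhood test: a point counts as a neighbor only if it is close in space and also close in the non-spatial dimension (here, time). The following is a minimal sketch of that test, not the code in this repository. The function name st_neighbors, the (x, y, t) tuple layout, and the sample values are assumptions made for illustration; x and y are taken to be projected coordinates in meters and t a timestamp in seconds.

```python
import math


def st_neighbors(points, index, spatial_threshold, temporal_threshold):
    """Return indices of points close to points[index] in both space and time.

    Each point is a hypothetical (x, y, t) tuple: x and y in meters
    (projected coordinates), t in seconds.
    """
    x0, y0, t0 = points[index]
    neighbors = []
    for i, (x, y, t) in enumerate(points):
        if i == index:
            continue
        close_in_space = math.hypot(x - x0, y - y0) <= spatial_threshold
        close_in_time = abs(t - t0) <= temporal_threshold
        if close_in_space and close_in_time:
            neighbors.append(i)
    return neighbors


# Illustrative data only.
points = [
    (0.0, 0.0, 0.0),       # reference point
    (30.0, 40.0, 60.0),    # 50 m away, 60 s later: a neighbor
    (30.0, 40.0, 7200.0),  # 50 m away, 2 h later: too far in time
    (900.0, 0.0, 30.0),    # 900 m away: too far in space
]
print(st_neighbors(points, 0, spatial_threshold=100.0, temporal_threshold=600.0))
# prints [1]
```

The linear scan keeps the sketch short; for large datasets a spatial index would usually replace it.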

Sample application: Traffic Congestion Detection using Taxi Position

Traffic congestion is a frequent situation in urban centers nowadays. It occurs because urban infrastructure cannot keep up with the growing number of vehicles, and it causes many drawbacks, such as stress, delays, and excessive fuel consumption. This application aims to identify traffic congestion using the geographic positions of taxis provided by GPS.

We assume that a vehicle's speed can be estimated from its positions at different times. Thus, we applied a density-based clustering algorithm, which takes into account both spatial and non-spatial aspects [Birant and Kut, 2007], to identify traffic congestion from taxi positions.
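To make the speed assumption concrete, the sketch below estimates an average speed from two consecutive GPS fixes of the same vehicle. It is an illustration under stated assumptions rather than part of this project: the function names, the (lat, lon, unix_seconds) tuple layout, and the sample coordinates are all hypothetical, and the haversine formula treats the Earth as a sphere.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def estimate_speed_kmh(fix_a, fix_b):
    """Average speed in km/h between two (lat, lon, unix_seconds) fixes."""
    lat1, lon1, t1 = fix_a
    lat2, lon2, t2 = fix_b
    elapsed = t2 - t1
    if elapsed <= 0:
        raise ValueError("fixes must be in strictly increasing time order")
    meters_per_second = haversine_m(lat1, lon1, lat2, lon2) / elapsed
    return meters_per_second * 3.6


# Two illustrative fixes one minute apart, about 111 m apart in latitude.
fix_a = (-25.4284, -49.2733, 1_000_000)
fix_b = (-25.4274, -49.2733, 1_000_060)
speed = estimate_speed_kmh(fix_a, fix_b)
print(f"{speed:.1f} km/h")
```

A low estimated speed for many taxis that are close in both space and time is the kind of pattern the clustering step is meant to pick up.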

Sample result for Curitiba data

References

[Birant and Kut, 2007] Birant, D. and Kut, A. (2007). St-dbscan: An algorithm for clustering spatial–temporal data. Data & Knowledge Engineering, 60(1):208–221. Intelligent Data Mining.

[Sander et al., 1998] Sander, J., Ester, M., Kriegel, H.-P., and Xu, X. (1998). Density-based clustering in spatial databases: The algorithm gdbscan and its applications. Data Min. Knowl. Discov., 2(2):169–194.

[Liu et al., 2010] Liu, S., Liu, Y., Ni, L. M., Fan, J., and Li, M. (2010). Towards mobility-based clustering. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pages 919–928, New York, NY, USA. ACM.

py-st-dbscan's Issues

Vertical Spatial Shapes

Hi,

I am probably making a mistake somewhere but the resulting spatial clusters seemingly fit into vertical lines which just don't make any sense (see image from QGIS showing 3 of the largest clusters).

image

I have not modified any of your scripts, I have basically just supplied another dataset. The parameters I have used are:

spatial_threshold = 5000 # meters
temporal_threshold = 4320 # seconds
min_neighbors = 5

When it comes to GIS conversion, my lon/lat coordinates are in EPSG:4326 and I have used the convert_to_utm() method from src_epsg=4326 to dst_epsg=27700 (I have also tried other dst types).

Do you have any idea why such clusters could occur? Is this a known issue? Or any thoughts on what I am doing wrong?

Any comments much appreciated!

David

Query about documentation

Nice project and really useful for many applications.
The comments in stdbscan.py at line 20 indicate:
:param temporal_threshold: Maximum non-spatial distance value (meters);

I think the units could be wrong for a temporal threshold. Should it be minutes instead of meters?

SettingWithCopyWarning

I didn't check whether it influences the results, but it does sound a bit dangerous.

$ /st_dbscan/python/src/stdbscan.py:22: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df['cluster'] = unmarked
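
This warning typically appears when a column is assigned on a DataFrame that was itself obtained by filtering or slicing another DataFrame. Whether that is what happens in the caller here is an assumption. A minimal sketch of one common way to avoid it is to take an explicit copy before adding the column; the column names, data, and the UNMARKED value below are hypothetical.

```python
import pandas as pd

UNMARKED = -2  # hypothetical sentinel for "not yet visited"

# Illustrative data only.
df_all = pd.DataFrame(
    {"x": [0.0, 10.0, 500.0], "y": [0.0, 5.0, 800.0], "speed": [3.0, 60.0, 4.0]}
)

# Boolean filtering returns a new object that pandas may still track as
# derived from df_all; .copy() makes it an independent DataFrame.
slow = df_all[df_all["speed"] < 10.0].copy()

# Assigning on the explicit copy does not touch df_all.
slow["cluster"] = UNMARKED

print(slow)
```

Taking the copy at the point where the subset is created makes it unambiguous which object the new column belongs to.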

Incorrectly labels border points as noise

According to the original ST-DBSCAN publication, "The points which have been marked to be noise may be changed later, if they are not directly density-reachable but they are density-reachable from some other point of the database". Thus the source code:

if all([neig_cluster != noise, neig_cluster == unmarked])

should change to

if neig_cluster in (noise, unmarked):  # i.e., the neighbor is not in any cluster yet
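
To illustrate the behavior described in the quoted passage, here is a simplified, self-contained sketch of a cluster expansion loop in which points provisionally marked as noise are relabeled as border points when a later core point reaches them. It is not the code in stdbscan.py: the function names, the NOISE and UNMARKED values, and the (x, y, t) point layout are assumptions, and the paper's additional check on the cluster's average non-spatial value is omitted.

```python
import math

NOISE = -1      # hypothetical label values
UNMARKED = -2


def st_neighbors(points, index, eps_space, eps_time):
    """Indices of points within eps_space (meters) and eps_time (seconds)."""
    x0, y0, t0 = points[index]
    return [
        i for i, (x, y, t) in enumerate(points)
        if i != index
        and math.hypot(x - x0, y - y0) <= eps_space
        and abs(t - t0) <= eps_time
    ]


def st_dbscan(points, eps_space, eps_time, min_neighbors):
    labels = [UNMARKED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNMARKED:
            continue
        neighbors = st_neighbors(points, i, eps_space, eps_time)
        if len(neighbors) < min_neighbors:
            labels[i] = NOISE  # provisional: may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neighbors)
        while stack:
            j = stack.pop()
            if labels[j] not in (NOISE, UNMARKED):
                continue  # already belongs to a cluster
            was_unmarked = labels[j] == UNMARKED
            labels[j] = cluster  # a noise point becomes a border point here
            if was_unmarked:
                j_neighbors = st_neighbors(points, j, eps_space, eps_time)
                if len(j_neighbors) >= min_neighbors:
                    stack.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Point 0 is visited first, has only one neighbor, and is marked as noise.
# Point 1 is a core point that later reaches it, so it ends up in cluster 1.
points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0),
          (20.0, 0.0, 0.0), (1000.0, 0.0, 0.0)]
print(st_dbscan(points, eps_space=10.0, eps_time=60.0, min_neighbors=2))
# prints [1, 1, 1, 1, -1]
```

If the inner check accepted only UNMARKED neighbors, point 0 in this example would keep its noise label even though it is density-reachable from point 1.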
