fhirschmann / rdp
Python/Numpy implementation of the Ramer-Douglas-Peucker algorithm
Home Page: https://pypi.python.org/pypi/rdp
License: MIT License
I have a dataframe and want to reduce the number of points in every feature.
This is my dataframe:
  | recency | frequency | money |
---|---|---|---|
0 | 8.353111 | 1.625226 | 20.943134 |
1 | 2.934699 | 4.013015 | 26.170988 |
2 | 5.703040 | 4.013015 | 31.328091 |
3 | 4.958268 | 4.529335 | 42.014511 |
4 | 4.291614 | 5.502551 | 31.964992 |
The shape is (793, 3) before RDP.
After using RDP:
[ 8.35311112, 1.62522615, 20.94313392],
[ 2.93469946, 4.01301524, 26.17098815],
[ 5.70303988, 4.01301524, 31.32809134],
...,
The shape is still (793, 3) after RDP, so the output is unchanged.
The link to https://0x0b.de/the-ramer-douglas-peucker-algorithm.html redirects to https://hirschmann.blog/. I'm not sure whether the post is gone or has been rehosted at a different URL on your new blog.
Dear all,
Is there a way to limit the number of simplified points in addition to the maximum distance (epsilon)?
Or, to be more precise, to pick only the N most relevant points for a given dmax?
Thanks in advance!
Cheers,
Ivica
Python: 3.9
RDP: 0.8
Minor issue: in the rdp function signature, epsilon defaults to 0, which makes Pylance infer its type as int.
Hello and thank you for your package.
I am using it for signal processing, and each sample has a timestamp. Is it possible to recover the timestamps, or the indices, of the points that the rdp algorithm keeps?
Many thanks
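One way to answer this is to have the simplification return indices instead of points, and then use those indices to look up the timestamps. The package documentation describes a return_mask argument for this purpose. The sketch below does not depend on it and is not the library's code: rdp_indices and perpendicular_dists are hypothetical helpers written only for this illustration, and they assume 2D input with the timestamp as the first column.

```python
import numpy as np


def perpendicular_dists(points, start, end):
    """Distance of each 2D point to the line through start and end."""
    vec = end - start
    norm = np.linalg.norm(vec)
    if norm == 0:
        return np.linalg.norm(points - start, axis=1)
    rel = points - start
    # 2D cross product written out by hand, so np.cross is not needed.
    return np.abs(vec[0] * rel[:, 1] - vec[1] * rel[:, 0]) / norm


def rdp_indices(points, epsilon, first=0, last=None):
    """Return the sorted indices of the points kept by RDP (hypothetical helper)."""
    if last is None:
        last = len(points) - 1
    if last - first < 2:
        # No interior points left between first and last.
        return list(range(first, last + 1))
    dists = perpendicular_dists(points[first + 1:last], points[first], points[last])
    offset = int(np.argmax(dists))
    if dists[offset] > epsilon:
        split = first + 1 + offset
        left = rdp_indices(points, epsilon, first, split)
        right = rdp_indices(points, epsilon, split, last)
        return left[:-1] + right  # the split index appears in both halves
    return [first, last]


# Example signal: column 0 is the timestamp, column 1 the measured value.
timestamps = np.arange(6, dtype=float)
values = np.array([0.0, 0.1, 5.0, 0.1, 0.0, 0.0])
signal = np.column_stack((timestamps, values))

kept = rdp_indices(signal, epsilon=1.0)
kept_timestamps = timestamps[kept]
print(kept, kept_timestamps)
```

Note that the distance here mixes time and value units, so epsilon only has a clear meaning if the two columns are scaled comparably.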
Thanks,
Justin
This code is much faster. Instead of a for loop, it uses NumPy vectorization:

import numpy as np


def line_dists(points, start, end):
    # Degenerate segment: fall back to the distance from the single point.
    if np.all(start == end):
        return np.linalg.norm(points - start, axis=1)

    vec = end - start
    # Note: np.cross on 2D vectors is deprecated as of NumPy 2.0.
    cross = np.cross(vec, start - points)
    return np.divide(abs(cross), np.linalg.norm(vec))


def rdp(M, epsilon=0):
    M = np.array(M)
    start, end = M[0], M[-1]
    # Distances of all points to the start-end line, computed in one call.
    dists = line_dists(M, start, end)

    index = np.argmax(dists)
    dmax = dists[index]

    if dmax > epsilon:
        result1 = rdp(M[:index + 1], epsilon)
        result2 = rdp(M[index:], epsilon)
        result = np.vstack((result1[:-1], result2))
    else:
        result = np.array([start, end])

    return result
Although I love recursion and Ramer-Douglas-Peucker is naturally defined recursively, this is quite inefficient in Python and often leads to errors like "RecursionError: maximum recursion depth exceeded in comparison". In my case I used a GPS track from a GPS sport watch with 12072 points. Python's default recursion limit is 1000. It can be raised with sys.setrecursionlimit, but that should be avoided because it only postpones the problem.
Luckily, RDP can also be defined iteratively as shown here.
Since your RDP implementation seems to be the only one on PyPI, it would be nice to reformulate it as an iterative algorithm.
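As a rough illustration of what such a reformulation could look like, here is a minimal sketch that replaces the call stack with an explicit list of index ranges. It is not the library's implementation: rdp_iter and line_dists are names chosen for this example, and the distance helper assumes 2D points.

```python
import numpy as np


def line_dists(points, start, end):
    """Distance of each 2D point to the line through start and end."""
    vec = end - start
    norm = np.linalg.norm(vec)
    if norm == 0:
        return np.linalg.norm(points - start, axis=1)
    rel = points - start
    return np.abs(vec[0] * rel[:, 1] - vec[1] * rel[:, 0]) / norm


def rdp_iter(M, epsilon=0.0):
    """Stack-based RDP for an (n, 2) array; returns the kept points."""
    M = np.asarray(M, dtype=float)
    n = len(M)
    if n < 3:
        return M.copy()
    keep = np.zeros(n, dtype=bool)
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]  # index ranges that still need to be examined
    while stack:
        first, last = stack.pop()
        if last - first < 2:
            continue  # no interior points in this range
        dists = line_dists(M[first + 1:last], M[first], M[last])
        offset = int(np.argmax(dists))
        if dists[offset] > epsilon:
            split = first + 1 + offset
            keep[split] = True
            stack.append((first, split))
            stack.append((split, last))
    return M[keep]


# A zigzag with more points than Python's default recursion limit of 1000.
n = 5000
x = np.arange(n, dtype=float)
track = np.column_stack((x, x % 2))  # y alternates 0, 1, 0, 1, ...
simplified = rdp_iter(track, epsilon=0.1)
print(len(track), len(simplified))
```

Because the pending ranges live in an ordinary list, the depth of the subdivision is bounded by available memory and not by the interpreter's recursion limit.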
The simplify_coords function, which uses the RDP algorithm, causes my script to abruptly (and silently) exit under both 0.3.9 and 0.4.4 (the two versions I tested).
It was passed a list of approximately 212k LineStrings (i.e. [x, y, z] elements) and exited seconds later to command prompt.
ccount = len(coordlist)
simplified = simplify_coords(coordlist, 1)
print ccount, len(simplified)
I was running my script under Python 2.7.10 on Windows 10.
https://github.com/cubao/pybind11-rdp
It's much faster than the pure-Python version.
The current pldist function returns the distance between a point and the infinite line through two other points (i.e. the distance to the orthogonal projection), which is incorrect. It should return the distance between a point and a line segment.
Here is a fix based on this answer, https://stackoverflow.com/a/54442561:
import numpy as np


def pldist(point, start, end):
    """
    Calculates the distance from ``point`` to the line segment given
    by the points ``start`` and ``end``.

    :param point: a point
    :type point: numpy array
    :param start: a point of the line
    :type start: numpy array
    :param end: another point of the line
    :type end: numpy array
    """
    if np.all(start == end):
        return np.linalg.norm(point - start)

    # normalized tangent vector
    d = np.divide(end - start, np.linalg.norm(end - start))

    # signed parallel distance components
    s = np.dot(start - point, d)
    t = np.dot(point - end, d)

    # clamped parallel distance
    h = np.max([s, t, 0])

    # perpendicular distance component, as before
    # (np.cross on 2D vectors is deprecated as of NumPy 2.0)
    c = np.cross(point - start, d)

    # use hypot for Pythagoras to improve accuracy
    return np.hypot(h, c)
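To make the difference between the two distances concrete, here is a small self-contained sketch. It uses a different but equivalent formulation (clamping the projection parameter to the segment), and segment_dist is a hypothetical name for this example, not part of the package.

```python
import numpy as np


def segment_dist(point, start, end):
    """Distance from point to the segment [start, end] (hypothetical helper)."""
    point, start, end = (np.asarray(a, dtype=float) for a in (point, start, end))
    seg = end - start
    length_sq = np.dot(seg, seg)
    if length_sq == 0.0:
        # Degenerate segment: start and end coincide.
        return float(np.linalg.norm(point - start))
    # Parameter of the orthogonal projection, clamped to the segment.
    t = np.clip(np.dot(point - start, seg) / length_sq, 0.0, 1.0)
    return float(np.linalg.norm(point - (start + t * seg)))


# This point lies on the infinite line through the segment but beyond its end,
# so the line distance would be 0 while the segment distance is 2.
d_beyond = segment_dist([3.0, 0.0], [0.0, 0.0], [1.0, 0.0])

# Here the projection falls inside the segment, so both notions agree.
d_inside = segment_dist([0.5, 2.0], [0.0, 0.0], [1.0, 0.0])

print(d_beyond, d_inside)
```

This version only uses dot products, so it works for points of any dimension and avoids np.cross.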
Hi fhirschmann,
Thanks for writing this library, it's immensely useful for me. I'm trying to use it in some neural network applications where I'm simplifying the data I am analyzing - mainly vector images.
Occasionally the rdp code throws an odd error from the np.linalg.norm calls: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". It works about 95% of the time, though. Have you encountered this issue before? If not, I'll do more digging to see what is going on.
Cheers.