aikubo / linkinglines Goto Github PK

View Code? Open in Web Editor NEW

1.0 2.0 3.0 8.43 MB

Link and Cluster Cartesian Line Segments

License: MIT License

Python 96.50% TeX 3.50%

linkinglines's Introduction

Welcome to LinkingLines!

Read the Full documentation on ReadtheDocs!

1. Introduction

Welcome to the documentation for our Python module that performs the Hough Transform on lines from geospatial data, clusters it using Agglomerative Clustering. This module also includes custom plotting scripts and feature extraction methods to help you analyze and visualize your data effectively.

This code was used to create the results published in Kubo Hutchison et al., 2023. Initially, it was designed to link together mapped dike segments in Cartesian space to find their true lengths. This code can be applied to any linear features including roads, fractures, and other types of linear data.

Data Clustering: Apply Agglomerative Clustering to group similar data points, this can be used for data reduction, analysis, and mapping.
Data Visualization: Custom plotting scripts help you visualize and analyze your data, making it easier to identify patterns and anomalies.
Feature Extraction: Extract meaningful features from clustered data to perform further analysis, such as linear or radial type features.

Full documentation can be found on ReadTheDocs

2. Installation

To use this module, make sure you have Python installed (preferably Python 3.x). You can install the required packages using pip:

pip install linkinglines

3. Quick Start

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import linkinglines as ll

data=ll.readFile('path/to/data')

dtheta=2 #degrees
drho=500 #meters

dikeset, Z=ll.AggCluster(data)
lines,evaluation=examineCluster(data)
fig,ax=DotsLines(lines, ColorBy='AvgTheta')

We have three examples:

In-depth tutorial with Hough transform, clustering, and feature extraction using Spanish Peaks Data CSV file.
Hough Transform and feature extraction on Venus lineament data shape file.
Hough Transform on fracture data geoJSON.

Data from:

4. Contributing Guidelines

If you find bugs or issues please open up an issues. We also welcome requests for additional features.

If you would like to contribute code please do so in a separate branch and open up an issue describing your contribution.

git clone [email protected]:USER/LinkingLines.git
git checkout my-development-branch

We recommend using a virtual environment to manage packages. We use poetry to manage dependencies and building. See more about poetry.

pipx install poetry # you may need to install pipx first

cd linkinglines # go to the repo

poetry install --with test,dev # install in editable mode

# add your code

poetry run pytest  # test code locally

Before submitting your pull request please verify the following:

Code is documented in NumPy Docstring Style
Code is tested and passes test
- To run the tests please go to "/tests" and run poetry run pytest or pytest
- Add your test code to any file with the name test
- More here on pytest and testing practices
Open an issue and pull request
After your pull request the code will be reviewed by maintainer (namely @aikubo).
After passing review and automated tests it will be added to the next release and published to pypi.

linkinglines's People

Contributors

Stargazers

Watchers

Forkers

qiangf nialov hugoledoux

linkinglines's Issues

Document the version of LinkingLines sent for review with tag

Part of openjournals/joss-reviews#6147

You have stated that version v.2.1.0 of LinkingLines is to be reviewed. You have created a GitHub release accordingly. However, there is no git tag representing this revision of the LinkingLines repository. I would suggest creating a git tag that represents the correct revision. If the latest commit on master represents the version to be reviewed, you can tag it with git tag and push the tag with git push --tags.

Clean up repo with a .gitignore

Part of openjournals/joss-reviews#6147

The repository has files and folders that probably don't need to be checked into version control. A non-exhaustive list:

.virtual_documents
dist
__pycache__ (several folders)
src/linkinglines.egg-info
docs/.ipynb_checkpoints
docs/.virtual_documents
docs/_build

You could remove these. In the future you can ignore these files with a gitignore. See:
https://git-scm.com/docs/gitignore
https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files

Add example with data other than dykes (e.g. fracture trace data)

Just to preface: use of this specific data is just a suggestion, feel free to use any other source! I do believe, however, that adding some (small) example(s) with data other than dyke data is needed to make evident the general usability of the software. No complex analysis is required for this purpose, just show how it is loaded (in csv form or with geopandas) and a simple analysis with e.g. the Cartesian space - Hough Transform space plot.

My data suggestions

I have personally been involved with gathering fracture trace data and some of our data is publicly available:

https://zenodo.org/records/7077846

The data consists of ESRI Shapefile data that can be loaded with geopandas (See #20) or with QGIS and transformed to csv to work with the current API.

https://zenodo.org/records/7919843

This dataset is contains code and data. Fracture trace data is in data/trace_data/traces/20m/ folder in geojson data type.

You can freely add individual files/parts of files to this repository as the datasets are openly and freely licensed. Just add a mention and link to Zenodo or the DOI found on the Zenodo pages in e.g. the README.md or some other suitable place.

Documentation example requires local modules

Part of openjournals/joss-reviews#6147

import sys
sys.path.append("../src")
sys.path.append("/home/akh/myprojects/Dikes_Linking_Project/dikedata/spanish peaks")

# Packages written for the paper
from htMOD import HoughTransform
from clusterMod import AggCluster
from plotmod import DotsLines, plotScatterHist, pltRec, DotsHT
from PrePostProcess import *
from examineMod import examineClusters, checkoutCluster
from fitRadialCenters import RadialFit, NearCenters, CenterFunc

should probably become something like

import linkinglines as ll

from ll.htMOD import HoughTransform
...  # etc

I believe a similar thing needs to be done for your tests to work automatically.

Document development dependencies in pyproject.toml

I would document the development dependencies i.e. pytest in pyproject.toml according to the specification (https://packaging.python.org/en/latest/specifications/pyproject-toml/#dependencies-optional-dependencies).

Documentation links don't work and examples are not formatted

Part of openjournals/joss-reviews#6147

The documentation contains unresolved links like so: Follow this indepth tutorial <>_ to get started!

If you build the documentation locally, you probably should see warnings with more details to fix these.

Furthermore, the API documentation can use some more formatting. For example in
https://linkinglines.readthedocs.io/en/latest/linkinglines.html#module-linkinglines.SyntheticLines there are multiple unformatted multiline examples that are put on a single line.

More generic API

Part of openjournals/joss-reviews#6147

The codebase is currently quite fixed towards a specific research workflow. It would help re-use if it becomes more generic. Two good examples:

On the use of WKT

Currently the package only reads and writes csv files with a (undocumented) column containing WellKnownText LineStrings. A generic version would take in any geospatial dataframe, only needing to check whether the geometries are lines. The great package geopandas, fits this use case perfectly, and automatically gives you a range of input and output files, and probably more generic methods to read coordinates from geometry.

On the use of scripting

Ideally package code is object oriented, with methods being useful on their own. Here you use DataFrames as intermediate objects, but they still require scripting steps. This is easier to explain in an example:

Current code:

data=pd.read_csv('path/to/data')
theta,rho,xc,yc=ll.HoughTransform(data)
data['theta']=theta
data['rho']=rho

dtheta=2 #degrees <-- unused
drho=500 #meters <-- unused

dikeset, Z=ll.AggCluster(data)

The above code requires setting data['theta'], before AggCluster can work. As such, it would be easier to let ll.HoughTransform(data) return a new Dataframe with these columns added, that can be used directly, like so:

data = pd.read_csv('path/to/data')
ndata = ll.HoughTransform(data)  # returns dataframe with columns theta, rho, xc, yc added
dikeset, Z=ll.AggCluster(ndata)

You can go even one step further, and let AggCluster call HoughTransform if the required columns theta, rho etc are missing from it.

Add short explanation inside `DemoFractures.ipynb`

I think DemoFractures.ipynb serves as a good example of using the software for other data than dykes but I think it looks like there are no radial patterns in the fracture data (not sure if I was expecting them when suggesting the data 😆 ). Or that is how I interpret the results. I think you should add a few words to the notebook about there apparently not being any radial patterns according to the analysis in this data, which is a result in itself, so there are not any confusion in someone trying to interpret the results.

Issue with running `DemoLinkingLines.ipynb`

Trying to run docs/DemoLinkingLines.ipynb, I get the following error in cell number 2:

➜ poetry run ipython DemoLinkingLines.ipynb
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 5
      1 # Load the example dataset
      2
      3 #load it using built in reading function which also does the preprocessing
      4 # the CSV must have a column called "WKT" and can have any other data
----> 5 dikeset=ll.readFile('../data/SpanishPeaks_3857.csv', preprocess=True)

File ~/projects/LinkingLines/src/linkinglines/PrePostProcess.py:84, in readFile(name, preprocess)
     82 # if preprocess is True, preprocess the data
     83 if preprocess:
---> 84     data = WKTtoArray(data)
     85     data = preProcess(data)
     87 return data

File ~/projects/LinkingLines/src/linkinglines/PrePostProcess.py:196, in WKTtoArray(df, plot)
    194 if not ("WKT" in df.columns ):
    195     if not ("geometry" in df.columns):
--> 196      raise ValueError("No geometry present")
    198 xstart=[]
    199 ystart=[]

ValueError: No geometry present

While making new data formats work, maybe you made a change that made the old data not work 😅

There is also an error in the path to the file, it should be '../data/SpanishPeaks_3857.csv' rather than '/../data/SpanishPeaks_3857.csv' i.e. without the leading forward slash.

aikubo / linkinglines Goto Github PK

linkinglines's Introduction

Welcome to LinkingLines!

Read the Full documentation on ReadtheDocs!

1. Introduction

2. Installation

3. Quick Start

4. Contributing Guidelines

linkinglines's People

Contributors

Stargazers

Watchers

Forkers

linkinglines's Issues

My data suggestions

On the use of WKT

On the use of scripting

Recommend Projects

Recommend Topics

Recommend Org