tdaverse / ggtda

ggplot2 extension to visualize persistent homology

Home Page: https://tdaverse.github.io/ggtda/

License: GNU General Public License v3.0

R 94.71% TeX 5.29%

topological-data-analysis r tidyverse ggplot2 persistent-homology visualization ggplot-extension tda persistence-data rstats simplicial-complex

ggtda's Introduction

ggtda

Overview

The ggtda package provides ggplot2 layers for the visualization of constructions and statistics arising from topological data analysis.

Installation

The development version can be installed using the remotes package:

# install from GitHub
remotes::install_github("tdaverse/ggtda", vignettes = TRUE)

For an introduction to package functionality, read the vignettes:

# read vignettes
vignette(topic = "visualize-persistence", package = "ggtda")
vignette(topic = "illustrate-constructions", package = "ggtda")
vignette(topic = "grouped-list-data", package = "ggtda")

We aim to submit to CRAN in Spring 2024!

Example

# attach {ggtda}
library(ggtda)
#> Loading required package: rlang
#> Loading required package: ggplot2

Sample data set

This example illustrates ggtda features using an artificial point cloud $X$ sampled with noise from a circle:

# generate a noisy circle
n <- 36
set.seed(0)
t <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
  x = cos(t) + stats::rnorm(n = n, mean = 0, sd = .2),
  y = sin(t) + stats::rnorm(n = n, mean = 0, sd = .2)
)
# plot the data
ggplot(d, aes(x, y)) + geom_point() + coord_equal() + theme_bw()

Topological constructions

ggtda provides stat and geom layers for common TDA constructions. To illustrate, pick a proximity, or threshold, to consider points in the cloud to be neighbors:

# choose a proximity threshold
prox <- 2/3

The homology $H_k(X)$ of a point cloud is uninteresting ($H_0(X) = \lvert X \rvert$ and $H_k(X) = 0$ for $k > 0$). The most basic space of interest to the topological data analyst is the union of a ball cover $B_r(X)$ of $X$—a ball of common radius $r$ around each point. The common radius will be $r =$ prox / 2.

The figure below compares the ball cover (left) with the Vietoris (or Rips) complex ${VR}_r(X)$ constructed using the same proximity (right). The complex comprises a simplex at each subset of points having diameter at most prox—that is, each pair of which are within prox of each other. A key result in TDA is that the homology of the ball union is “very close” to that of the complex.

# visualize disks of fixed radii and the Vietoris complex for this proximity
p_d <- ggplot(d, aes(x = x, y = y)) +
  coord_fixed() +
  geom_disk(radius = prox/2, fill = "aquamarine3") +
  geom_point() +
  theme_bw()
p_sc <- ggplot(d, aes(x = x, y = y)) +
  coord_fixed() +
  stat_simplicial_complex(diameter = prox, fill = "darkgoldenrod") +
  theme_bw() +
  theme(legend.position = "none")
# combine the plots
gridExtra::grid.arrange(
  p_d, p_sc,
  layout_matrix = matrix(c(1, 2), nrow = 1)
)
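The definition of the Vietoris complex can also be checked by hand. The following is a minimal base R sketch, not how ggtda or its engines compute the complex: it regenerates the sample above (renaming `t` to `theta` for readability) and lists the 1- and 2-simplices directly from the pairwise distances. The names `dm`, `edges`, `triples`, and `faces` are illustrative only.

```r
# regenerate the noisy circle from the example above
n <- 36
set.seed(0)
theta <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
  x = cos(theta) + stats::rnorm(n = n, mean = 0, sd = .2),
  y = sin(theta) + stats::rnorm(n = n, mean = 0, sd = .2)
)
prox <- 2/3
# pairwise Euclidean distances between the points
dm <- as.matrix(stats::dist(d))
# 1-simplices: pairs of distinct points within `prox` of each other
edges <- which(dm <= prox & upper.tri(dm), arr.ind = TRUE)
# 2-simplices: triples whose three pairwise distances are all at most `prox`
triples <- t(utils::combn(n, 3))
is_face <- dm[triples[, c(1, 2)]] <= prox &
  dm[triples[, c(1, 3)]] <= prox &
  dm[triples[, c(2, 3)]] <= prox
faces <- triples[is_face, , drop = FALSE]
# simplex counts by dimension (0, 1, 2)
c(vertices = n, edges = nrow(edges), faces = nrow(faces))
```

Enumerating all triples is fine for 36 points but scales poorly, which is one reason to rely on a dedicated engine for larger samples.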

This cover and complex clearly contain a non-trivial 1-cycle (loop), which makes $H_1(B_r(X)) = H_1({VR}_r(X)) = 1$. But detecting this feature depended crucially on the choice of prox, and there’s no guarantee with new data that this choice will be correct or even that a single best choice exists. Instead, we tend to be interested in considering those features that persist across many values of prox. The GIF below¹ illustrates this point: observe how features appear and disappear as the disk covers grow:

Persistent homology

Persistent homology (PH) encodes the homology group ranks across the full range $0 \leq r < \infty$, corresponding to the full filtration of simplicial complexes constructed on the point cloud. We use ripserr to compute the PH of the point cloud $X$:

# compute the persistent homology
ph <- ripserr::vietoris_rips(as.matrix(d), dim = 1)
print(ph)
#>    dimension     birth      death
#> 1          0 0.0000000 0.02903148
#> 2          0 0.0000000 0.05579919
#> 3          0 0.0000000 0.05754819
#> 4          0 0.0000000 0.06145429
#> 5          0 0.0000000 0.10973364
#> 6          0 0.0000000 0.11006440
#> 7          0 0.0000000 0.11076601
#> 8          0 0.0000000 0.12968679
#> 9          0 0.0000000 0.14783527
#> 10         0 0.0000000 0.15895889
#> 11         0 0.0000000 0.16171041
#> 12         0 0.0000000 0.16548606
#> 13         0 0.0000000 0.18487403
#> 14         0 0.0000000 0.22148925
#> 15         0 0.0000000 0.22522546
#> 16         0 0.0000000 0.23144975
#> 17         0 0.0000000 0.23833826
#> 18         0 0.0000000 0.24347463
#> 19         0 0.0000000 0.24621173
#> 20         0 0.0000000 0.24877669
#> 21         0 0.0000000 0.25618632
#> 22         0 0.0000000 0.27190520
#> 23         0 0.0000000 0.28831319
#> 24         0 0.0000000 0.30491738
#> 25         0 0.0000000 0.30827205
#> 26         0 0.0000000 0.31391084
#> 27         0 0.0000000 0.32506632
#> 28         0 0.0000000 0.32904677
#> 29         0 0.0000000 0.33786610
#> 30         0 0.0000000 0.34781266
#> 31         0 0.0000000 0.37754559
#> 32         0 0.0000000 0.41597062
#> 33         0 0.0000000 0.43867939
#> 34         0 0.0000000 0.46645722
#> 35         0 0.0000000 0.47084765
#> 36         1 0.6282155 1.39626210

The loop is detected, though we do not yet know whether its persistence stands out from that of other features. To prepare for ggplot(), we convert the result to a data frame and its numeric dimension column to a factor:

pd <- as.data.frame(ph)
pd <- transform(pd, dimension = as.factor(dimension))
head(pd)
#>   dimension birth      death
#> 1         0     0 0.02903148
#> 2         0     0 0.05579919
#> 3         0     0 0.05754819
#> 4         0     0 0.06145429
#> 5         0     0 0.10973364
#> 6         0     0 0.11006440
tail(pd)
#>    dimension     birth     death
#> 31         0 0.0000000 0.3775456
#> 32         0 0.0000000 0.4159706
#> 33         0 0.0000000 0.4386794
#> 34         0 0.0000000 0.4664572
#> 35         0 0.0000000 0.4708476
#> 36         1 0.6282155 1.3962621
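The degree-0 part of this table can be cross-checked without ripserr. In a Vietoris filtration indexed by diameter, connected components merge at exactly the heights of a single-linkage dendrogram, so the sorted merge heights should agree with the dimension-0 death values above, up to numerical precision. The sketch below uses only base R; that ripserr reports diameters rather than radii is inferred from the output above, and `n_components()` is an illustrative helper, not a ggtda function.

```r
# regenerate the noisy circle from the example above
n <- 36
set.seed(0)
theta <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
  x = cos(theta) + stats::rnorm(n = n, mean = 0, sd = .2),
  y = sin(theta) + stats::rnorm(n = n, mean = 0, sd = .2)
)
prox <- 2/3
# single-linkage clustering merges components at the Vietoris death diameters
hc <- stats::hclust(stats::dist(d), method = "single")
h0_deaths <- sort(hc$height)
# n points yield n - 1 merges, hence n - 1 finite degree-0 features
length(h0_deaths)
# number of connected components of the complex at a given diameter
n_components <- function(thr) n - sum(h0_deaths <= thr)
n_components(prox)
```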

Persistence plots

ggtda also provides stat and geom layers for common visualizations of persistence data. We visualize these data using a barcode (left) and a persistence diagram (right). In the barcode, the dotted line indicates the cutoff at the proximity prox; in the persistence diagram, the fundamental box contains the features that are detectable at this cutoff.

# visualize the persistence data, indicating cutoffs at this proximity
p_bc <- ggplot(pd, aes(start = birth, end = death)) +
  geom_barcode(linewidth = 1, aes(color = dimension, linetype = dimension)) +
  labs(x = "Diameter", y = "Homological features",
       color = "Dimension", linetype = "Dimension") +
  geom_vline(xintercept = prox, color = "darkgoldenrod", linetype = "dotted") +
  theme_barcode()
max_prox <- max(pd$death)
p_pd <- ggplot(pd) +
  coord_fixed() +
  stat_persistence(aes(start = birth, end = death,
                       colour = dimension, shape = dimension)) +
  geom_abline(slope = 1) +
  labs(x = "Birth", y = "Death", color = "Dimension", shape = "Dimension") +
  lims(x = c(0, max_prox), y = c(0, max_prox)) +
  geom_fundamental_box(
    t = prox,
    fill = "darkgoldenrod", color = "transparent"
  ) +
  theme_persist()
# combine the plots
gridExtra::grid.arrange(
  p_bc, p_pd,
  layout_matrix = matrix(c(1, 2), nrow = 1)
)

The barcode lines are color- and linetype-coded by feature dimension: the 0-dimensional features, i.e. the gaps between connected components, versus the 1-dimensional feature, i.e. the loop. These groups of lines do not overlap, which means that the loop exists only in the persistence domain where all the data points are part of the same connected component. Our choice of prox is between the birth and death of the loop, which is why the complex above recovers it.

The persistence diagram shows that the loop persists for longer than any of the gaps. This is consistent with the gaps being artifacts of the sampling procedure but the loop being an intrinsic property of the underlying space.
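The comparison of persistence values can be made numerically as well. This small sketch copies four rows from the printed output above (the shortest and the two longest-lived gaps, plus the loop) and computes persistence as death minus birth; the `persistence` column name is an illustrative choice.

```r
# four rows copied from the printed persistence data above
pd <- data.frame(
  dimension = factor(c(0, 0, 0, 1)),
  birth = c(0, 0, 0, 0.6282155),
  death = c(0.02903148, 0.4664572, 0.4708476, 1.3962621)
)
# persistence (lifetime) of each feature
pd$persistence <- pd$death - pd$birth
# longest-lived feature in each dimension
aggregate(persistence ~ dimension, data = pd, FUN = max)
```

On these rows the loop's persistence (about 0.77) exceeds that of the longest-lived gap (about 0.47).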

Multiple data sets

TDA usually involves comparisons of topological data between spaces. To illustrate such a comparison, we construct a larger sample and examine the persistence of its cumulative subsets:

# larger point cloud sampled from a noisy circle
set.seed(0)
n <- 180
t <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
  x = cos(t) + stats::rnorm(n = n, mean = 0, sd = .2),
  y = sin(t) + stats::rnorm(n = n, mean = 0, sd = .2)
)
# list of cumulative point clouds
ns <- c(12, 36, 60, 180)
dl <- lapply(ns, function(n) d[seq(n), ])

First we construct a grouped data frame containing these subsets and plot their Vietoris complexes. (We specify the simplextree engine and restrict to simplices of dimension at most 2 to reduce runtime.)

# formatted as grouped data
dg <- do.call(rbind, dl)
dg$n <- rep(ns, vapply(dl, nrow, 0L))
# faceted plots of cumulative simplicial complexes
ggplot(dg, aes(x, y)) +
  coord_fixed() +
  facet_wrap(facets = vars(n), labeller = label_both) +
  stat_simplicial_complex(
    diameter = prox, dimension_max = 2L,
    engine = "simplextree",
    fill = "darkgoldenrod"
  ) +
  theme_bw() +
  theme(legend.position = "none")

The Vietoris complexes on these subsets for the fixed proximity are not a filtration; instead they show us how increasing the sample affects the detection of homology at that threshold. Notice that, while a cycle exists at $n = 36$, the “true” cycle is only detected at $n = 60$.
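One ingredient of this comparison is easy to tabulate in base R: the number of 1-simplices at the fixed proximity in each cumulative subset. The sketch below assumes the same seed and sample as above and does not detect cycles; it only shows how the 1-skeleton fills in as points are added. `count_edges()` is an illustrative helper.

```r
# regenerate the larger noisy circle from the example above
set.seed(0)
n <- 180
theta <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
  x = cos(theta) + stats::rnorm(n = n, mean = 0, sd = .2),
  y = sin(theta) + stats::rnorm(n = n, mean = 0, sd = .2)
)
ns <- c(12, 36, 60, 180)
prox <- 2/3
dm <- as.matrix(stats::dist(d))
# count pairs within `prox` among the first m points
count_edges <- function(m) {
  sub <- dm[seq(m), seq(m), drop = FALSE]
  sum(sub <= prox & upper.tri(sub))
}
edge_counts <- vapply(ns, count_edges, integer(1))
data.frame(n = ns, edges = edge_counts, max_possible = choose(ns, 2))
```

Because the subsets are cumulative, the edge counts can only grow with n, even though the complexes are not indexed by a growing proximity.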

We can also conveniently plot the persistence diagrams from all four cumulative subsets, this time using a list-column of data sets passed to the dataset aesthetic:

# nested data frame of samples of different cumulative sizes
ds <- data.frame(n = ns, d = I(dl))
print(ds)
#>     n            d
#> 1  12 c(1.0565....
#> 2  36 c(1.0565....
#> 3  60 c(1.0565....
#> 4 180 c(1.0565....
# faceted plot of persistence diagrams
ggplot(ds, aes(dataset = d)) +
  coord_fixed() +
  facet_wrap(facets = vars(n), labeller = label_both) +
  stat_persistence(aes(colour = after_stat(factor(dimension)),
                       shape = after_stat(factor(dimension)))) +
  geom_abline(slope = 1) +
  labs(x = "Birth", y = "Death", color = "Dimension", shape = "Dimension") +
  lims(x = c(0, max_prox), y = c(0, max_prox)) +
  theme_persist()
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).

The diagrams reveal that a certain sample size is necessary to distinguish bona fide features from noise, which first occurs here at $n = 36$. While the true feature retains about the same persistence (death value less birth value) from diagram to diagram, the persistence of the noise gradually lowers.

Contribute

To contribute to ggtda, you can create issues for any bugs you find or any suggestions you have on the issues page.

If you have a feature in mind you think will be useful for others, you can also fork this repository and create a pull request.

¹ The GIF and many features of ggtda were originally developed in the separate package TDAvis.


ggtda's Issues

add real-world persistence data to ggtda

?PH of mosquito records to predict Zika prevalence in Brazil

Add `autoplot()` methods for classes of persistence and other topological data

e.g. fortify_ripserr() for persistence data output by ripserr

make complex parameter case-insensitive

Note: This issue refers to the underway merge of the StatSimplicialComplex and main branches.

I think the simplicial_complex layer should accept any-case spellings of "Cech", "alpha", etc. This can be done using tolower().

I raise it as an issue rather than unilaterally making the change because i don't know whether it makes more sense to apply tolower() in the internal helper function assign_complex_engine or in the ggproto function StatSimplicialComplex$setup_data(). I'd welcome advice.
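For illustration, here is a minimal sketch of the normalization itself, independent of where it ends up being called. The helper name `normalize_complex()` and its list of choices are hypothetical and are not ggtda's internal API; only `tolower()` and `match.arg()` are base R.

```r
# hypothetical helper: accept any-case spellings of a complex name
normalize_complex <- function(
    complex,
    choices = c("vietoris", "rips", "cech", "alpha")  # illustrative list only
) {
  stopifnot(is.character(complex), length(complex) == 1L)
  # lower-case first, then validate against the allowed values
  match.arg(tolower(complex), choices)
}
normalize_complex("Cech")
normalize_complex("ALPHA")
```

Note that `match.arg()` also accepts unambiguous partial matches (for example "Viet"), which may or may not be desirable here.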

missing aesthetics documentation

see convo in #21

Warnings with stat_vietoris2 and stat_cech2

The following code creates an equilateral triangle and uses stat_vietoris* and stat_cech* to visualize the faces of the resulting simplicial complex at a diameter of 1, but returns warnings that the computation failed in stat_vietoris2 and stat_cech2 due to argument length of 0. Any thoughts on what's going on, @corybrunson? (using this example in a vignette under cran-prep branch of this repo)

library("ggplot2")
library("ggtda")

# create equilateral triangle point cloud
eq_tri <- data.frame(x = c(0, 1, 0.5),
                     y = c(0, 0, sqrt(3)/2))

# VR complex
ggplot(eq_tri, aes(x = x, y = y)) +
  coord_fixed() +
  stat_vietoris2(diameter = 2, fill = "darkgoldenrod", alpha = 0.1) +
  stat_vietoris1(diameter = 2, alpha = 0.25) +
  stat_vietoris0() +
  theme_minimal()

# Čech complex
ggplot(eq_tri, aes(x = x, y = y)) +
  coord_fixed() +
  stat_cech2(diameter = 2) +
  stat_cech1(diameter = 2, alpha = 0.25) +
  stat_cech0() +
  theme_minimal()

drop theme_tda

While reviewing the exported objects, i noticed that theme_persist() is a copy of theme_tda(). Since there will eventually be many more plot layers, some of which may be better suited to theme elements different from those of persistence diagrams, i would suggest dropping the generically-named theme_tda() in favor of defining only more precise themes as needed.

If agreed to, it would be good to do this as part of #21, to avoid future deprecation.

shading for persistence diagram

could we consider adding the option to shade the bottom-right triangular half of persistence diagrams? (example image above)

vignettes

add vignettes for new users; tentatively the following ones:

introduction: basic use of most/all visualizations
detailed: one detailed vignette for each group of geoms/stats (e.g. 1 for simplicial complex visualizations, 1 for persistent homology, etc.)

R 3.6.2 messes up stat_disk drawing

Hey @corybrunson, just updated R to 3.6.2 and using stat_disk results in improperly rendered circles (below is example from ggtda GitHub README); possible that this is because of how circles are rendered as polygons instead? (geom_point from ggplot2 seems to be working okay)

Question: shiny module support warranted?

Shiny modules seem to provide the most supported solution of encapsulating blocks of R code geared for building shiny apps quicker (another example article).

I was wondering if this package would be interested in incorporating some degree of modularization?

As an example, I forked this repo and added a few example shiny modules. Collectively, they create a set of UI buttons for choosing between persistence data plot types. There are also buttons for doing things like adding the frontier, filtering by some noise threshold, choosing between H0, H1, etc., automatic reactive callbacks for detecting the nearest feature a user clicked, etc.

An example shiny app, also shown below, demonstrates the simplification modularization enables:

library(shiny)
library(ggtda)
library(TDAstats)

## Example from TDAstats calculate_homology function help page
data("circle2d") # unit circle (Betti-1 number = 1)
circ.phom <- as.data.frame(calculate_homology(circle2d))
circ.phom$dimension <- as.factor(circ.phom$dimension)

## assume persistence data is named 'dgm' 
dgm <- circ.phom

ui <- fluidPage(
  titlePanel("Persistence shiny plot demo"), 
  sidebarLayout(
    sidebarPanel(
      dmg_select_ui(id = "select_pers"), ## selects the type of persistence plot to show
      dgm_var_UI(id = "var_pers")        ## parameters to tweak like e.g. noise threshold 
    ), 
    mainPanel(
      linkedPersistenceOutput("pers_plot"), 
      verbatimTextOutput("text_out")
    )
  )
)

server <- function(input, output, session){
  ## Get the type of plot to interact with  
  dgm_selection <- callModule(dgm_select, "select_pers")
  
  ## Get the selected parameters associated with the UI elements
  dgm_vars <- callModule(dgm_var, "var_pers", dgm)
  
  ## Create the persistence plot
  res <- callModule(linkedPersistence, "pers_plot", dgm, dgm_selection, dgm_vars)
  
  ## Show the index of the selected feature
  output$text_out <- renderText({ res$selected_feature_idx() })
}
shinyApp(ui, server)

The modules aren't entirely meant for the package as-is. I'm sure there's lots of different ways to improve them, just wondering the scope of this package, since I plan to do this otherwise with some of my other packages.

Implement visualizing extended persistence

Potentially useful reference here

plot 0-simplices only when radius or diameter is too small

Note: This issue refers to the underway merge of the StatSimplicialComplex and main branches.

Currently, not only does the simplicial_complex layer not use a default radius or diameter, it fails when either fails to obtain at least one 1-simplex (as in the examples below, which cannot be reprexed because the branch merge is not complete). Instead, this should result in the 0-simplices being plotted, equivalently to using the point layer. Moreover, i suggest that the default be set to 0, perhaps with a warning to the user that a positive radius or diameter must be specified in order for higher-dimensional simplices to be plotted.

# create annulus point cloud
t <- runif(n = 60, min = 0, max = 2*pi)
ann <- data.frame(
  x = cos(t) + rnorm(n = 60L, mean = 0, sd = .1),
  y = sin(t) + rnorm(n = 60L, mean = 0, sd = .1)
)
# plot 0-simplices using the `point` layer
ggplot(ann, aes(x = x, y = y)) +
  coord_fixed() +
  geom_point()
# try to plot 0-simplices using the `simplicial_complex` layer
ggplot(ann, aes(x = x, y = y)) +
  coord_fixed() +
  stat_simplicial_complex(radius = 0)
# try again with a too-small radius
ggplot(ann, aes(x = x, y = y)) +
  coord_fixed() +
  stat_simplicial_complex(radius = 0.01)
# succeed only with a sufficiently large radius
ggplot(ann, aes(x = x, y = y)) +
  coord_fixed() +
  stat_simplicial_complex(radius = 0.1)

Bug in `StatLandscape$compute_group`

While working on the refactor from #55 I discovered what I believe to be a bug in the current implementation of the StatLandscape$compute_group method. When plotting the persistence landscape for a slightly modified version of the toy data used in our examples, one of the frontiers has two incorrect peaks. In the reprex below we see that the third frontier function has peaks at (1, 3) and (4, 5). However, there are no (birth, death) values corresponding to these peaks -- I think it's grabbing these from the previous frontier.

I'm going to fix this in the branch I'm working on, just wanted to bring attention to it in case we end up not merging the refactor to main.

library("ggtda")
#> Loading required package: rlang
#> Loading required package: ggplot2

# toy example
toy.data <- data.frame(
  birth = c(0, 0, 1, 3, 4, 1.5),
  death = c(5, 3, 5, 4, 6, 4),
  dim = factor(c(0, 0, 0, 0, 0, 0))
  # dim = factor(c(0, 0, 1, 1, 2, 2))
)

# persistence diagram with landscape overlaid
ggplot(toy.data,
       aes(start = birth, end = death, shape = dim)) +
  theme_persist() +
  coord_equal() +
  stat_persistence() +
  stat_landscape(aes(color = after_stat(factor(level, 1:4))), n_levels = 4, diagram = "diagonal") +
  lims(x = c(0, 8), y = c(0, NA)) +
  guides(alpha = "none")

Created on 2024-05-05 by the reprex package (v2.0.1)
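As a reference point for checking the peaks, the landscape levels can be computed directly from the standard definition: each feature (b, d) contributes a tent function max(0, min(t - b, d - t)), and level k at t is the k-th largest tent value. The base R sketch below is not the StatLandscape implementation; it works in (t, height) coordinates rather than the "diagonal" layout, and the names `tent`, `grid`, and `landscape` are illustrative.

```r
# toy persistence data from the reprex above
birth <- c(0, 0, 1, 3, 4, 1.5)
death <- c(5, 3, 5, 4, 6, 4)
# tent function of one feature, evaluated at the values in `t`
tent <- function(t, b, d) pmax(0, pmin(t - b, d - t))
# evaluation grid over the range of the data
grid <- seq(0, 6, by = 0.25)
# rows = grid points, columns = features
tents <- sapply(seq_along(birth), function(i) tent(grid, birth[i], death[i]))
# sort each row in decreasing order: column k then holds landscape level k
landscape <- t(apply(tents, 1, sort, decreasing = TRUE))
# first three levels at t = 2.5
landscape[grid == 2.5, 1:3]
#> [1] 2.5 1.5 1.0
```

Every peak of every level lies on a tent of some (birth, death) pair, so a peak with no corresponding pair indicates an error in the level assignment.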

fixed-radius disk geom

The geometric construction underpinning persistent homology is usefully illustrated by drawing disks of fixed radius around points in a Euclidean point cloud. Since this illustrative use does not require a variable radius, and since the x0 and y0 aesthetics of ggforce::geom_circle() can create confusion when used in tandem with x and y aesthetics, let's create a new geom_disk() layer for this purpose. By default, it should render disks (interiors) but not circles (boundaries), i.e. its defaults should include fill = <favorite color> and colour = NA.

Implement persistent terraces

Introduced in this arXiv article

prevent inconsistency in diagram type across layers

A major problem with the diagram parameter is that different values can be applied to different layers, which results in nonsense. Instead, new coordinate system ggproto classes should be used, perhaps CoordFlat and CoordLandscape, which would transform the entire plot.

computed variable 'dim' should be made factor (not ordinal) at end of statistical transformation

Note: This issue refers to the underway merge of the StatSimplicialComplex and main branches.

The computed variable 'dim' is assigned ordered(...) in several places, which seems to result in them occasionally being combined. This results in a warning, "Combining variables of class and was deprecated in ggplot2 3.4.0." This suggests to me that the variable should be returned as an unordered factor (which still preserves the order of the values).

Even then, however, there is a possibility of unexpected behavior, since the smallest dimension may not always be 2—whether the combination is being done in base R or in {vctrs} (see example below).

I agree that the stat should output the variable as a factor rather than an integer, but it should be possible to store it as an integer throughout the computations and convert it to a factor just before returning.

c(factor(3:5),factor(1:3))
#> [1] 3 4 5 1 2 3
#> Levels: 3 4 5 1 2
vctrs::vec_c(factor(3:5),factor(1:3))
#> [1] 3 4 5 1 2 3
#> Levels: 3 4 5 1 2

Created on 2023-12-30 with reprex v2.0.2
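A minimal sketch of the suggested approach, in base R only: keep the dimensions as integers while combining, then convert once at the end with explicitly sorted levels. The variable names are illustrative and this is not the stat's actual code.

```r
# dimensions kept as integers throughout the computation
dims_a <- 3:5
dims_b <- 1:3
dims <- c(dims_a, dims_b)
# convert to an unordered factor only at the end, with sorted levels
dim_factor <- factor(dims, levels = sort(unique(dims)))
levels(dim_factor)
#> [1] "1" "2" "3" "4" "5"
```

Unlike the combined factors above, the levels here come out in numeric order regardless of the order in which the pieces were combined.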

implement new plot layers

This issue consolidates #17, #18, and #53 and adds additional layers of interest. They are listed in perceived order of difficulty.

graphical element for fundamental boxes

The authors of this preprint define the fundamental box for any real value t. Ignoring multiplicities, it is the region of birth–death pairs (b, d) defined by b ≤ t < d; it can be thought of as the space of features whose lifetimes include t. Like the colleague who introduced me to it, i think i've seen it before under a different name.

It would be natural to highlight this region in certain plots, as the authors do in the preprint, using bounding rays and shading within the limits of the plotting window. It should be easy to implement as a new graphical element (e.g. geom_fundamental_box()) with a single additional non-aesthetic parameter t that also requires the same diagram parameter used by stat_persistence().
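To make the definition concrete, here is a small base R sketch of the membership test b ≤ t < d, applied to three rows copied from the README's persistence output at t = 2/3. The function name `in_fundamental_box()` is hypothetical and separate from any graphical element.

```r
# membership test for the fundamental box at t: birth <= t < death
in_fundamental_box <- function(birth, death, t) birth <= t & t < death
# the two longest-lived gaps and the loop, copied from the README output
pd <- data.frame(
  dimension = c(0, 0, 1),
  birth = c(0, 0, 0.6282155),
  death = c(0.4664572, 0.4708476, 1.3962621)
)
pd$in_box <- in_fundamental_box(pd$birth, pd$death, t = 2/3)
pd
```

Only the loop falls in the box at this t, since both gaps have already died.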

Duplicate barcodes not rendered

I think there's an issue with the barcode geom when two given bars are not unique. I have 28 barcodes, however 3 of them have exactly the same birth/death times, so only 25 get rendered.

Here's an example with the embedded toy data I was working with:

## Generated data 
dim <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1)
birth <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.213226,0.213225)
death <- c(0.00718,0.008658,0.00866,0.033702,0.033704,0.041683,0.041683,0.049228,0.049229,0.103626,0.103626,0.108313,0.108314,0.116322,0.116323,0.122399,0.1224,0.145653,0.145654,0.169471,0.169472,0.207892,0.207892,0.208636,0.208637,Inf,Inf,Inf)
bar <- data.frame(birth=birth, death=death, dim=dim)

## Plot barcodes w/ ggtda
library("ggtda")
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 3.5.2
barcodes <- data.frame(appear=bar$birth, disappear=bar$death, dim=as.character(bar$dim))
p1 <- ggplot(barcodes, aes(start = appear, end = disappear, colour = dim, shape = dim)) +
      geom_barcode() +
      theme_tda()

## Working linerange w/ id_ dimension
barcodes_id <- cbind(id_ = seq(length(bar$birth)), barcodes)
p2 <- ggplot(barcodes_id) + geom_linerange(aes(x = id_, ymin = appear, ymax=disappear, colour = dim))

Created on 2019-05-16 by the reprex package (v0.3.0)

If you inspect p1 vs p2, you'll see that p1 plots 25 bars while all 28 get plotted w/ p2.
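The count of missing bars can be checked against the data without plotting. This base R sketch rebuilds the data frame from the values above and counts exactly repeated rows; it should report three duplicates, matching the difference between 28 and 25.

```r
# rebuild the persistence data from the report above
dim <- c(rep(0, 26), 1, 1)
birth <- c(rep(0, 26), 0.213226, 0.213225)
death <- c(0.00718, 0.008658, 0.00866, 0.033702, 0.033704, 0.041683,
           0.041683, 0.049228, 0.049229, 0.103626, 0.103626, 0.108313,
           0.108314, 0.116322, 0.116323, 0.122399, 0.1224, 0.145653,
           0.145654, 0.169471, 0.169472, 0.207892, 0.207892, 0.208636,
           0.208637, Inf, Inf, Inf)
bar <- data.frame(birth = birth, death = death, dim = dim)
# rows that exactly repeat an earlier row
sum(duplicated(bar))
# the repeated (birth, death, dim) triples themselves
bar[duplicated(bar), ]
# number of distinct bars
nrow(unique(bar))
```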

Implement persistence images

Introduced in this JMLR article

set default max_dimension to 2

Note: This issue refers to the underway merge of the StatSimplicialComplex and main branches.

While experimenting with examples in the old vignette using the new simplicial_complex layer, it quickly became clear that unexpectedly long runtimes can result from innocuous-looking code because Vietoris complexes accrue large numbers of high-dimensional simplices. Since (a) i think the vast majority of illustrations only bother using one layer of shading and (b) for quick-and-dirty pictures this should suffice anyway, i suggest that we change max_dimension = 10 to max_dimension = 2L. This will prevent long runtimes without letting the user believe that all dimensions have been layered on (as might be wrongly suggested by setting max_dimension = 3L).
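To give a sense of scale, the number of k-simplices is bounded above by choose(n, k + 1), a bound attained when every pair of points lies within the chosen diameter. The arithmetic below uses n = 36 as in the README example; it is a worst-case count, not a measurement of any particular complex.

```r
# worst-case simplex counts for a Vietoris complex on n points
n <- 36
k <- 0:10
max_simplices <- choose(n, k + 1)
data.frame(
  dimension = k,
  max_simplices = max_simplices,
  cumulative = cumsum(max_simplices)
)
```

Capping at dimension 2 bounds the count at 36 + 630 + 7140 = 7806 simplices, whereas each additional dimension up to 10 multiplies the worst case several times over.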

finish persistence landscape implementation

The basic persistence landscape feature (stat_persistence(diagram = "landscape")) has been implemented; the following would help complete the implementation:

inclusion of diagonal lines within stat/geom to mimic a landscape?
example added to README
use described in more detail in a ggtda vignette

Edit:

add multiple persistence landscapes (exported function + shortcut stat)

nearest neighbors geom (and nearest neighbors option for disk geom)

To my knowledge, no {ggplot2} extension provides a statistical transformation layer to compute the nearest neighbors (NN) of (a subset of) the points in a 2-D cloud. For the following reasons, i think this should be added to {ggtda}:

NN selection is a topological operation, so NN prediction is a topological tool (it is "within scope").
A primary purpose of this package is to illustrate topological tools on 2-D data (it would fit in with existing functionality).
NN engines might support future illustrative layers like neighborhood-based disks and witness complexes or even computational layers like witness complex--based persistent homology (it would support additional more obvious extensions).

I think this can wait until after the first CRAN submission, so i raise the issue as a reminder and to invite feedback.

first CRAN submission

We are close to our first submission to CRAN. I'd like to check with everyone on a few points first:

Does the package look complete, with respect to tests, examples, and vignettes?
Are there any open issues that you think should be resolved before submission?
Do you have any other concerns that should be discussed first?

Thanks! I'll tentatively plan to submit at the end of April, but we can take what time we need to address anything that comes up.

implement Betti curves

Betti curves are known to be unstable but are still commonly used to summarize patterns.

It might also make sense to provide the stabilized Betti curve of Johnson and Jung.
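For reference, a plain (unstabilized) Betti curve can be sketched in a few lines of base R: at each parameter value r, count the features with birth ≤ r < death. The function `betti_curve()` is hypothetical, not a ggtda function, and the data are four rows copied from the README's persistence output, which omits the one component that never dies.

```r
# Betti curve: number of features alive at each value of `grid`
betti_curve <- function(birth, death, grid) {
  vapply(grid, function(r) sum(birth <= r & r < death), integer(1))
}
# four rows copied from the README's persistence data
dimension <- c(0, 0, 0, 1)
birth <- c(0, 0, 0, 0.6282155)
death <- c(0.02903148, 0.4664572, 0.4708476, 1.3962621)
grid <- seq(0, 1.5, by = 0.25)
b0 <- betti_curve(birth[dimension == 0], death[dimension == 0], grid)
b1 <- betti_curve(birth[dimension == 1], death[dimension == 1], grid)
data.frame(r = grid, betti0 = b0, betti1 = b1)
```

A stat layer would presumably compute this per dimension and draw it as a step function; the stabilized variant is not attempted here.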

BibTeX file for vignette references

The vignettes include a great deal of front matter, the vast majority of it bibliographic. To make reading and editing the source code easier, would it make sense to maintain a single BibTeX (.bib) file with all references, which can then be cited in the vignettes? I haven't done this before so it may incur new issues, but i'd be glad to give it a try on a new branch. My point of reference is this chapter in the R Markdown Cookbook.

convenience geom layers

The persistence stat layers are currently paired by default to basic geom layers whose aesthetics are not ideal. We can create some convenience geom layers that simply use better default aesthetics but otherwise pass to the basic geoms. Start with an illustrative geom_disk() to pair with stat_disk().
