
Comments (8)

bwiernik commented on June 8, 2024

Neat! Looks straightforward!

from correlation.

mattansb commented on June 8, 2024

From here

ksaai <- function(X, Y, ties = TRUE){
  n <- length(X)
  # Seed before ranking so the random tie-breaking is reproducible
  set.seed(42)
  # Ranks of Y after sorting the pairs by X
  r <- rank(Y[order(X)], ties.method = "random")
  if(ties){
    l <- rank(Y[order(X)], ties.method = "max")
    return( 1 - n*sum( abs(r[-1] - r[-n]) ) / (2*sum(l*(n - l))) )
  } else {
    return( 1 - 3 * sum( abs(r[-1] - r[-n]) ) / (n^2 - 1) )
  }
}

I don't like that it's not symmetrical - shouldn't correlation coefficients be symmetrical?

x <- rnorm(100, sd = 4)
y <- sin(x) + rnorm(100, sd = 0.2)

plot(x, y)

ksaai(x, y)
#> [1] 0.6306631
ksaai(y, x)
#> [1] -0.1710171

Also the maximal value isn't 1 and seems to depend on the sample size?

z10 <- runif(10)
z100 <- runif(100)
z1000 <- runif(1000)

ksaai(z10, z10)
#> [1] 0.7272727
ksaai(z100, z100)
#> [1] 0.970297
ksaai(z1000, z1000)
#> [1] 0.997003

Created on 2024-04-14 with reprex v2.1.0
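
The sample-size dependence of the maximum has a closed form when there are no ties. For a perfectly monotone relationship the sorted ranks are 1, ..., n, so the sum of absolute rank differences is n - 1 and the no-ties formula reduces to 1 - 3(n - 1)/(n^2 - 1) = 1 - 3/(n + 1). A minimal sketch that checks this; `xi_no_ties()` is a hypothetical helper that mirrors the `ties = FALSE` branch of `ksaai()`, not code from the paper or the XICOR package:

```r
# Hypothetical helper: the no-ties formula, as in the `ties = FALSE` branch of ksaai()
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])                 # ranks of y after sorting the pairs by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

n <- c(10, 100, 1000)

# Perfect monotone dependence: y is identical to x
observed <- sapply(n, function(k) xi_no_ties(seq_len(k), seq_len(k)))

# Closed-form maximum for the no-ties case
expected <- 1 - 3 / (n + 1)

round(observed, 7)
#> [1] 0.7272727 0.9702970 0.9970030
all.equal(observed, expected)
#> [1] TRUE
```

These are the same values as in the reprex above, so the maximum approaches 1 only as n grows.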


vincentarelbundock commented on June 8, 2024

Your note about sample size is presumably what he means by "converges to a limit" in point 4 of the screenshot in my original post. Since there's theory to provide confidence intervals, maybe that's not a big deal? Maybe even good?

And on symmetry:

(1) Unlike most coefficients, ξn is not symmetric in X and Y. But that is intentional. We would like to keep it that way because we may want to understand if Y is a function of X, and not just if one of the variables is a function of the other. If we want to understand whether X is a function of Y, we should use ξn(Y, X) instead of ξn(X, Y). A symmetric measure of dependence, if required, can be easily obtained by taking the maximum of ξn(X, Y) and ξn(Y, X).
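
The symmetric variant described in that passage could be sketched as follows. Both `xi_no_ties()` and `xi_symmetric()` are hypothetical helper names, and the estimator is the simplified no-ties formula from the `ties = FALSE` branch of `ksaai()`, so this is an illustration rather than the XICOR implementation:

```r
# Hypothetical helper: no-ties estimator, as in the `ties = FALSE` branch of ksaai()
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])                 # ranks of y after sorting the pairs by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

# Symmetric measure: the larger of the two directional coefficients
xi_symmetric <- function(x, y) {
  max(xi_no_ties(x, y), xi_no_ties(y, x))
}

set.seed(123)
x <- rnorm(100, sd = 4)
y <- sin(x) + rnorm(100, sd = 0.2)

xi_no_ties(x, y)     # is y a function of x?
xi_no_ties(y, x)     # is x a function of y?
xi_symmetric(x, y)   # same value whichever argument comes first
```

Taking the maximum discards the direction information, so it probably makes sense to report both directional values alongside it.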


mattansb commented on June 8, 2024

Cool 👍

  1. I don't see any mention of a confidence interval - should we just use Fisher's Z?
  2. In theory, xi is non-negative, but the estimate is sometimes negative - should we return 0 in such cases?


vincentarelbundock commented on June 8, 2024

I don’t see any mention of a confidence interval

Sorry, I misread about the CI. The XICOR package does provide a SD, but it feels wrong to just compute a symmetric interval using that.

should we just use Fisher’s Z?

I’ve only really skimmed the paper, and don’t truly understand it. Until I grok this better (realistically: never), I would be reluctant to report a quantity not explicitly endorsed by the author.

In theory, xi is non-negative, but the estimate is sometimes negative - should we return 0 in such cases?

“In the limit” != “In theory”. I’d say report the actual output of the equation, rather than an ad hoc hack.
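
To see how often the raw estimate goes below zero, here is a small simulation with independent x and y, where the population value is 0. It is only a sketch: `xi_no_ties()` is a hypothetical helper using the simplified no-ties formula from `ksaai()`, and the sample size and number of replications are arbitrary choices.

```r
# Hypothetical helper: no-ties estimator, as in the `ties = FALSE` branch of ksaai()
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])                 # ranks of y after sorting the pairs by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

set.seed(1)
n <- 50

# x and y are drawn independently, so any apparent dependence is noise
sims <- replicate(1000, xi_no_ties(rnorm(n), rnorm(n)))

mean(sims)       # average estimate, close to 0
mean(sims < 0)   # share of negative estimates
range(sims)
```

Truncating at 0 would push the average estimate under independence above 0, which is one argument for reporting the raw value.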

I ran into some errors with your ksaai() function with large N. However, the paper authors have published a XICOR package on CRAN. It seems fast and is published under Apache License which, I believe, is compatible with GPL3.

library(XICOR)
N <- 100
x <- rnorm(N, sd = 4)
y <- sin(x) + rnorm(N, sd = 0.2)
xicor(y, x, pvalue = TRUE)

    $xi
    [1] 0.03840384

    $sd
    [1] 0.06325978

    $pval
    [1] 0.2718984


mattansb commented on June 8, 2024

In theory == I mean the estimand is non-negative.

I'll run some simulations to see if the Fisher Z CIs work well enough.


bwiernik commented on June 8, 2024

The author did a small simulation in section 4.2 and concluded that sqrt(n) * xi is asymptotically normal (when n = 1000). That's not unexpected, but also not very helpful for more realistic sample sizes.

The author's XICOR package defaults to using the specified mean and SD values with a normal distribution. They also offer a permutation test.
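
The two testing approaches could be sketched roughly as below. This is not XICOR's code: `xi_no_ties()` is a hypothetical helper using the simplified no-ties formula, the null SD of sqrt(2 / (5 * n)) is assumed from the paper's asymptotic result under independence with no ties, and the one-sided test and the number of permutations are choices made for this example.

```r
# Hypothetical helper: no-ties estimator, as in the `ties = FALSE` branch of ksaai()
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])                 # ranks of y after sorting the pairs by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

set.seed(1)
n <- 100
x <- rnorm(n, sd = 4)
y <- sin(x) + rnorm(n, sd = 0.2)
xi_obs <- xi_no_ties(x, y)

# Normal-theory p-value (one-sided); ASSUMES null mean 0 and SD sqrt(2 / (5 * n))
p_normal <- pnorm(xi_obs, mean = 0, sd = sqrt(2 / (5 * n)), lower.tail = FALSE)

# Permutation p-value: shuffling y breaks any dependence on x
xi_perm <- replicate(999, xi_no_ties(x, sample(y)))
p_perm <- (1 + sum(xi_perm >= xi_obs)) / (1 + length(xi_perm))

c(xi = xi_obs, p_normal = p_normal, p_perm = p_perm)
```

The permutation version does not rely on the asymptotic approximation, which may matter at small n, at the cost of extra computation.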

I'd be okay with reporting normal-theory intervals and p values to start given that's what the author does, but we should ideally do some simulations to confirm good performance of the intervals at smaller n (or use a z transform if that works nicely).

I didn't compare the code above from the blog post with the XICOR package to be sure they align, but we should follow XICOR: https://github.com/cran/XICOR/blob/master/R/xicor.R


TarandeepKang commented on June 8, 2024

Hi All,

Just to mention that this preprint has now been published:

Chatterjee, S. (2021). A New Coefficient of Correlation. Journal of the American Statistical Association, 116(536), 2009–2022. https://doi.org/10.1080/01621459.2020.1758115
