
Comments (17)

jdramsey commented on August 15, 2024

First FastICA needs to be recoded and recalibrated. Then we need to get Matlab code for LiNGAM and calibrate a Java version of that to the Matlab code. We need an actual unit test of this so it doesn't break.
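
To pin down the shape of such a test, here is a minimal sketch; the Lingam class and search call below are placeholders, not the actual Tetrad API:

// Minimal sketch of such a unit test: simulate a known two-variable model
// with non-Gaussian errors, then assert that the search orients the edge
// correctly. "Lingam" and "search" below are placeholders, not the actual
// Tetrad API.
import java.util.Random;

public class LingamSmokeTest {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 1000;
        double[][] data = new double[n][2];
        for (int i = 0; i < n; i++) {
            double e1 = rng.nextDouble() - 0.5;  // uniform => non-Gaussian
            double e2 = rng.nextDouble() - 0.5;
            data[i][0] = e1;                     // X1 = e1
            data[i][1] = 0.8 * data[i][0] + e2;  // X2 = 0.8 * X1 + e2
        }
        // Placeholder for the real call, e.g.:
        //   Graph g = new Lingam().search(dataSetFrom(data));
        //   assert that g has the single edge X1 --> X2
        System.out.println("Simulated " + n + " rows from X1 --> X2");
    }
}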

jdramsey commented on August 15, 2024

Kun has a better idea.

jdramsey commented on August 15, 2024

OK, I am looking at LiNGAM. After some work I'm getting pretty good results orienting a 3-variable collider model, which wasn't happening before. I went back to the original paper and once again tried to put the code in line with that paper. I think I did a better job this time. This is a trace. X2 is the child; the order of the variables here is not randomized; I'll try that next.

Here is the paper.

shimizu06a.pdf

Here is the trace (for the algorithm in Section 4, Algorithm A):

Simulating dataset #1
W =
V1 V2 V3
-0.9724 0.0972 -0.4970
-0.5890 -0.0349 0.9793
-0.4109 1.1341 0.7451

** 30.992942547104565 [0, 1, 2]
** 2.9312993708222947 [0, 2, 1]
perm1 = [0, 2, 1]
WTilde before normalization =
V1 V2 V3
-0.9724 0.0972 -0.4970
-0.4109 1.1341 0.7451
-0.5890 -0.0349 0.9793

WPrime =
V1 V2 V3
1.0000 -0.1000 0.5111
-0.3623 1.0000 0.6570
-0.6014 -0.0357 1.0000

BHat =
V1 V2 V3
0.0000 0.1000 -0.5111
0.3623 0.0000 -0.6570
0.6014 0.0357 0.0000

*** 0.4942973056906238 [0, 1, 2]
*** 0.924698559828673 [0, 2, 1]
BTilde =
V1 V2 V3
0.0000 -0.5111 0.1000
0.6014 0.0000 0.0357
0.3623 -0.6570 0.0000

perm2 = [0, 2, 1]
[X1, X2, X3]
/knowledge
addtemporal

1 X1
2 X3
3 X2

forbiddirect

requiredirect
graph
Returning this graph:

Graph Nodes:
X1;X2;X3

Graph Edges:

  1. X1 --> X2
  2. X3 --> X2

Graph Attributes:
BIC: 698.522617

Graph Node Attributes:
BIC: [X1: 200.177757;X2: 238.222708;X3: 260.122153]

I found some index errors and other such silly errors in the code. (Sorry, this was coded very badly originally, not by me. I had fixed a lot but not everything, apparently.) As you can see, if W is right (that comes from ICA), WTilde is doing the right thing (permuting the rows of W so that the largest absolute values are along the diagonal), WPrime is right (normalizing each row by its diagonal element), BHat is right (I - WPrime), and BTilde is right (permuting both rows and columns of BHat so that the sum of squared elements in the lower triangle is largest). So that part of the algorithm is right.
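
For reference, here is a compact sketch of those post-ICA steps, brute-forcing the permutations (fine at these sizes). The class and method names are mine, not the Tetrad ones; the ** and *** numbers in the trace above are exactly the two scores computed below.

import java.util.ArrayList;
import java.util.List;

// Sketch of the post-ICA steps of Algorithm A from the paper.
public class LingamSteps {

    // WTilde: permute the rows of W so the largest absolute values land on
    // the diagonal, by minimizing sum_i 1/|W[perm[i]][i]| (the ** scores).
    static int[] rowPerm(double[][] w) {
        int p = w.length;
        int[] best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int[] perm : perms(p)) {
            double s = 0;
            for (int i = 0; i < p; i++) s += 1.0 / Math.abs(w[perm[i]][i]);
            if (s < bestScore) { bestScore = s; best = perm; }
        }
        return best;
    }

    // WPrime: divide each row of WTilde by its diagonal element; then
    // BHat = I - WPrime, which has zeros on the diagonal.
    static double[][] bHat(double[][] wTilde) {
        int p = wTilde.length;
        double[][] b = new double[p][p];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < p; j++)
                b[i][j] = (i == j ? 1.0 : 0.0) - wTilde[i][j] / wTilde[i][i];
        return b;
    }

    // BTilde: permute rows and columns of BHat by the same permutation,
    // maximizing the sum of squared lower-triangular elements (the ***
    // scores); equivalently, minimizing the upper triangle. The winning
    // permutation is the estimated causal order.
    static int[] causalOrder(double[][] bHat) {
        int p = bHat.length;
        int[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int[] perm : perms(p)) {
            double s = 0;
            for (int i = 0; i < p; i++)
                for (int j = 0; j < i; j++) {
                    double c = bHat[perm[i]][perm[j]];
                    s += c * c;
                }
            if (s > bestScore) { bestScore = s; best = perm; }
        }
        return best;
    }

    // All permutations of {0, ..., p-1}.
    static List<int[]> perms(int p) {
        List<int[]> out = new ArrayList<>();
        permute(new int[p], new boolean[p], 0, out);
        return out;
    }

    private static void permute(int[] cur, boolean[] used, int k, List<int[]> out) {
        if (k == cur.length) { out.add(cur.clone()); return; }
        for (int v = 0; v < cur.length; v++)
            if (!used[v]) {
                used[v] = true; cur[k] = v;
                permute(cur, used, k + 1, out);
                used[v] = false;
            }
    }
}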

So then it comes down to ICA. The problem with ICA is that it represents data internally in the "European" way, with variables as rows and cases as columns, whereas Tetrad has that flipped. Some transpositions had been done internally to try to adjust for this. I took all of those out, so it just uses the European convention; I'll document that. So to use it you need to transpose the data first, which is fine; transposing doesn't make a copy of the data, so it's not expensive. So now all the matrices should be in the right orientation.
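
Concretely, the cheap transpose is just an index swap on access; a minimal sketch (illustrative code, not the actual Tetrad classes):

// A cases-x-variables matrix viewed as variables-x-cases (the "European"
// layout ICA expects) without copying: swap the indices on access.
final class TransposedView {
    private final double[][] data;            // data[case][variable]
    TransposedView(double[][] data) { this.data = data; }
    int numRows() { return data[0].length; }  // variables
    int numCols() { return data.length; }     // cases
    double get(int var, int cas) { return data[cas][var]; }
}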

I just tried scrambling the order of the variables in the dataset; it still has a pretty good signal, but it does screw up more often than when the variables are unscrambled. I think that's a problem with ICA. I'll try to think whether there's any bias in ICA.

jdramsey commented on August 15, 2024

Current performance, compared to FASK.

N = 1000
10 nodes, 10 edges, DAG
Linear Fisher simulation using non-Gaussian errors.
Columns in data set randomly reordered.

For ICA:

Max iterations = 1000
Deflation algorithm
Log cosh function
Tolerance 1e-6.

Algorithms:

  1. LiNGAM (Linear Non-Gaussian Acyclic Model)
  2. FASK using Fisher Z test

Graphs are being compared to the true DAG.

AVERAGE STATISTICS

All edges

  Sim  Alg  sampleSize  penaltyDiscount    alpha  thresholdAlpha    AP    AR   AHP   AHR  F1All  ExactlyRight     E
    1    1     1000.00             1.00        *               *  0.92  1.00  0.86  0.93   0.80          0.30  2.14
    1    2     1000.00                *  1.00E-3               *  1.00  1.00  0.99  0.99   0.99          0.90  0.01

(Key: AP/AR = adjacency precision/recall; AHP/AHR = arrowhead precision/recall; F1All = F1 over all edges; ExactlyRight = proportion of runs where the graph was exactly right; E = elapsed time.)

There's clearly a signal for LiNGAM; it does pretty well. But it's much, much slower, and this is only 10 nodes. It probably can't go much more than 12 nodes. Also, it's not as accurate as FASK.

@cg09

jdramsey commented on August 15, 2024

New clue: raising the coefficient range to +/-(0.5, 1.2). Now (our) LiNGAM and FASK are very close in performance. So (our) LiNGAM just doesn't do well with smaller coefficients. (Or FASK does worse with larger coefficients.)

10 nodes, 10 edges, random DAG
10 runs averaged
Linear Fisher simulation using non-Gaussian errors.
Columns in data set randomly reordered.

For ICA:

Max iterations = 1000
Deflation algorithm
Log cosh function
Tolerance 1e-6.

Algorithms:

  1. LiNGAM (Linear Non-Gaussian Acyclic Model)
  2. FASK using Fisher Z test

Graphs are being compared to the true DAG.

AVERAGE STATISTICS

All edges

  Sim  Alg  sampleSize  penaltyDiscount    alpha  thresholdAlpha    AP    AR   AHP   AHR     E
    1    1     1000.00             1.00        *               *  0.85  1.00  0.74  0.86  2.40
    1    2     1000.00                *  1.00E-3               *  0.94  0.91  0.75  0.73  0.01

cg09 commented on August 15, 2024

jdramsey commented on August 15, 2024

Here's an improvement. Same conditions; I've added RSkew (which does terribly here, not sure why) and R3 (which soars):

Algorithms:

  1. LiNGAM (Linear Non-Gaussian Acyclic Model)
  2. R3, entropy based pairwise orientation with initial graph from Fast adjacency search (FAS) using Fisher Z test
  3. RSkew with initial graph from Fast adjacency search (FAS) using Fisher Z test
  4. FASK using Fisher Z test

Graphs are being compared to the true DAG.

AVERAGE STATISTICS

All edges

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.94  1.00  0.92  0.97     1.66
    1    2                *      *  0.97  0.94  0.97  0.94     0.04
    1    3                *      *  0.97  0.94  0.51  0.49  7.00E-3
    1    4                *   0.01  0.93  0.97  0.91  0.95  8.30E-3

Here, LiNGAM does pretty well. I've run this several times and it does pretty well each time. 10 nodes, 10 edges, N = 1000, variables randomly reordered each time, coefficients in +/-(0.2, 0.7), average of 10 runs, so the same problem as before.

I'm not sure what all is different, but I did change a detail in the data generating process. Before, standard deviations were being chosen (erroneously) for each sample, randomly. Now I pick a standard deviation and (correctly) apply it to the entire column. I've also played with various ways of doing the LiNGAM procedure--e.g. minimizing the upper triangle versus maximizing the lower triangle (they should be the same!). I've played with the quantity being minimized or maximized, though the suggested c^2 seems to work the best.
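
To make the data-generation fix concrete, here is a sketch of the corrected error simulation (the sd range is illustrative):

import java.util.Random;

class ErrorSimulation {
    // Wrong: re-drawing the standard deviation for every sample makes each
    // column a scale mixture rather than i.i.d. Right: draw one sd per
    // variable and apply it to the entire column, as below.
    static double[][] simulateErrors(int n, int p, Random rng) {
        double[][] e = new double[n][p];
        for (int j = 0; j < p; j++) {
            double sd = 1.0 + rng.nextDouble();     // one sd per column
            for (int i = 0; i < n; i++) {
                double u = rng.nextDouble() - 0.5;  // non-Gaussian error
                e[i][j] = sd * u;
            }
        }
        return e;
    }
}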

Sometimes LiNGAM does as well as FASK here, but usually it's still a little off. It's possible this is just a problem with running FASK in the 10-node, 10-edge regime; FASK usually gets more than a couple of edges wrong in this case. The order is being fixed, but maybe these mistakes are within the causal order. Maybe the original LiNGAM adjacency search should be explored.

Still, FASK beats LiNGAM to an extent on accuracy but handily on elapsed time.

But--yay! Progress!

Here are some more runs:

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.94  1.00  0.89  0.94     1.57
    1    2                *      *  1.00  0.99  1.00  0.99     0.04
    1    3                *      *  1.00  0.99  0.48  0.47  8.80E-3
    1    4                *   0.01  0.94  0.99  0.93  0.98  8.00E-3

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.95  0.99  0.92  0.94     1.58
    1    2                *      *  0.99  0.95  0.99  0.95     0.04
    1    3                *      *  0.99  0.95  0.44  0.42  7.00E-3
    1    4                *   0.01  0.95  0.95  0.88  0.89  7.60E-3

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.91  0.99  0.82  0.89     1.61
    1    2                *      *  0.98  0.97  0.98  0.97     0.04
    1    3                *      *  0.98  0.97  0.44  0.44  7.30E-3
    1    4                *   0.01  0.91  0.97  0.89  0.95  8.50E-3

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.95  1.00  0.91  0.96     1.56
    1    2                *      *  1.00  0.98  1.00  0.98     0.04
    1    3                *      *  1.00  0.98  0.51  0.50  7.00E-3
    1    4                *   0.01  0.98  0.98  0.98  0.98  8.60E-3

jdramsey commented on August 15, 2024

This is the ICA book, in case anyone wants to spend some delightful hours reading by the fire.

https://www.cs.helsinki.fi/u/ahyvarin/papers/bookfinal_ICA.pdf

jdramsey commented on August 15, 2024

Update. I discovered the alpha ("a") parameter for Fast ICA! I knew it was there and that it took values between 1 and 2, and I tried 1 and I tried 2, unsuccessfully, but I didn't try 1.1! Here is 1.1:

AVERAGE STATISTICS

All edges

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.97  1.00  0.95  0.97     1.67
    1    2                *      *  0.99  1.00  0.99  1.00     0.04
    1    3                *      *  0.99  1.00  0.50  0.50  7.20E-3
    1    4                *   0.01  0.96  1.00  0.94  0.98  8.50E-3

This is stunning! Details. This is N = 1000 as usual, with 10 nodes, 10 edges, DAG, with variables randomly reordered, alpha = 1.1. I tried several runs of this; it almost always comes out great. I can try some more values and come up with a good default for that.

Caveats: (a) This is with the Log Cosh function (tanh??). Using the exp formula in the book decimates accuracy. (b) This is with the deflationary procedure; the parallel procedure needs to be cleaned up--in any case, currently it's way inferior.

jdramsey commented on August 15, 2024

Sorry, the algorithms for that last table were the same as before. Here it is again with the algorithms:

Algorithms:

  1. LiNGAM (Linear Non-Gaussian Acyclic Model)
  2. R3, entropy based pairwise orientation with initial graph from Fast adjacency search (FAS) using Fisher Z test
  3. RSkew with initial graph from Fast adjacency search (FAS) using Fisher Z test
  4. FASK using Fisher Z test

Graphs are being compared to the true DAG.

AVERAGE STATISTICS

All edges

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.97  1.00  0.95  0.97     1.67
    1    2                *      *  0.99  1.00  0.99  1.00     0.04
    1    3                *      *  0.99  1.00  0.50  0.50  7.20E-3
    1    4                *   0.01  0.96  1.00  0.94  0.98  8.50E-3

So now LiNGAM, FASK, and R3 are pretty much on par (Yay!!!). RSkew sucks (Boo!). Not sure why.

cg09 commented on August 15, 2024

jdramsey commented on August 15, 2024

Update: The EXP function (g(y) = y exp(-y^2 / 2)) now works as promised, at least this once. I have:

(a) Taken the alpha out of the formula, as suggested by the book.

(b) Increased the maximum number of iterations to 1500.

(c) Decreased the tolerance to 1e-4. (It was 1e-2.)

Same setup as last report. N = 1000 as usual, with 10 nodes, 10 edges, DAG, with variables randomly reordered, alpha = 1.1.

Algorithms:

  1. LiNGAM (Linear Non-Gaussian Acyclic Model)
  2. R3, entropy based pairwise orientation with initial graph from Fast adjacency search (FAS) using Fisher Z test
  3. RSkew with initial graph from Fast adjacency search (FAS) using Fisher Z test
  4. FASK using Fisher Z test

Graphs are being compared to the true DAG.

AVERAGE STATISTICS

All edges

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.98  0.98  0.94  0.94     1.67
    1    2                *      *  1.00  0.94  1.00  0.94     0.04
    1    3                *      *  1.00  0.94  0.54  0.52  5.90E-3
    1    4                *   0.01  0.99  0.96  0.99  0.96  7.10E-3

This is stunning (as usual; I'm getting used to being stunned). EXP has to date been recalcitrant.

I will expose these parameters when I get a chance.

More runs:

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.95  1.00  0.90  0.95     1.65
    1    2                *      *  0.98  0.99  0.98  0.99     0.04
    1    3                *      *  0.98  0.99  0.48  0.49  7.80E-3
    1    4                *   0.01  0.91  0.99  0.91  0.99  9.20E-3

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.97  1.00  0.95  0.98     1.66
    1    2                *      *  0.98  0.96  0.98  0.96     0.04
    1    3                *      *  0.98  0.96  0.59  0.58  7.20E-3
    1    4                *   0.01  0.96  0.98  0.94  0.95  7.60E-3

This looks reliable.

jdramsey commented on August 15, 2024

Update: The PARALLEL code for the log cosh case is now working also, so I got rid of the separate code for the exp case and just used the g function in the book, which switches between "log cosh" (tanh(y)) and "exp" (y exp(-y^2/2)) and so covers both cases.
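
For the record, that unified g looks essentially like this (my paraphrase of the book's formulas, not the exact Tetrad code; a is the Fast ICA alpha, used only in the log cosh case):

class ContrastFunctions {
    enum Contrast { LOG_COSH, EXP }

    // g(y): tanh(a*y) for log cosh, y*exp(-y^2/2) for exp.
    static double g(Contrast c, double y, double a) {
        return c == Contrast.LOG_COSH
                ? Math.tanh(a * y)
                : y * Math.exp(-y * y / 2.0);
    }

    // g'(y), needed for the fixed-point update.
    static double gPrime(Contrast c, double y, double a) {
        if (c == Contrast.LOG_COSH) {
            double t = Math.tanh(a * y);
            return a * (1.0 - t * t);
        }
        return (1.0 - y * y) * Math.exp(-y * y / 2.0);
    }
}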

Here's the log cosh case:

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.99  1.00  0.99  1.00     1.48
    1    2                *      *  0.99  0.93  0.99  0.93     0.04
    1    3                *      *  0.99  0.93  0.53  0.50  8.40E-3
    1    4                *   0.01  0.91  0.97  0.89  0.95  8.50E-3

Here's the "exp case:

All edges

  Sim  Alg  penaltyDiscount  alpha    AP    AR   AHP   AHR        E
    1    1             1.00      *  0.97  1.00  0.97  1.00     1.47
    1    2                *      *  0.97  0.96  0.97  0.96     0.04
    1    3                *      *  0.97  0.96  0.53  0.52  6.70E-3
    1    4                *   0.01  0.91  0.96  0.90  0.95  8.40E-3

As usual, LiNGAM is the first algorithm. So these are working now, blessing of blessings. This means all of the options for this implementation of ICA are now working. Nothing was done here except to adjust the parameters as indicated earlier. Tolerance is 1e-6 (I guess that's a little lower); max iterations is 2000; a is 1.1, as before.

I did make a change to LiNGAM. There are two optimizations over permutations. The first tries to maximize the absolute coefficients along the diagonal by permuting the rows of W. That was being done by minimizing the sum of 1/|c| over the diagonal elements c; I now just directly maximize the sum of |c|. The second optimization tries to make the upper triangle as small as possible; there I likewise maximize the sum of |c| over the lower-triangular elements. This seems to work better. So far I see no point in doing it the original way. I might change my mind on that; it's easy to change up.
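
In code terms the change is just in the objectives; these are drop-in replacements for the two scores in the earlier LingamSteps sketch (again, names are mine):

// Old: minimize sum_i 1/|W[perm[i]][i]| over row permutations.
// New: directly maximize the sum of |c| along the diagonal.
static double diagonalScore(double[][] w, int[] perm) {
    double s = 0;
    for (int i = 0; i < w.length; i++) s += Math.abs(w[perm[i]][i]);
    return s;  // choose the permutation that maximizes this
}

// Old: minimize the squared upper triangle of the permuted BHat.
// New: maximize the sum of |c| over the lower-triangular elements.
static double lowerTriangleScore(double[][] bHat, int[] perm) {
    double s = 0;
    for (int i = 0; i < bHat.length; i++)
        for (int j = 0; j < i; j++)
            s += Math.abs(bHat[perm[i]][perm[j]]);
    return s;  // choose the permutation that maximizes this
}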

jdramsey commented on August 15, 2024

OK, endgame for this. We have a choice of deflation or parallel, and we have a choice of log cosh (tanh) or exp. Just playing around with it, parallel with exp and a low tolerance seems (??) to be just about perfect, so I'll choose that. That also has the advantage of taking the alpha parameter out of the interface, so the user doesn't have to tune it. I am configuring the algorithm for these choices.

I think it's fine to expose the max iterations and tolerance parameters to the users; they have good defaults, but the user may want to adjust those.

That leaves the penalty discount parameter. That's being exposed because we're using FGES for the pruning step. It seems to work fine. It works so well that I'm not going to code up the original pruning step unless someone complains for principled reasons.
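
So, roughly, the final configuration looks like this (a sketch with placeholder names; the actual Tetrad classes and setters may differ):

// Illustrative configuration only -- class and method names here are
// placeholders for whatever the Tetrad API actually exposes.
void configureAndRun() {
    FastIca ica = new FastIca(transposedData, numComponents);
    ica.setAlgorithmType(FastIca.PARALLEL);  // parallel, not deflation
    ica.setFunction(FastIca.EXP);            // exp contrast; no alpha to tune
    ica.setTolerance(1e-6);                  // exposed, with a good default
    ica.setMaxIterations(2000);              // exposed, with a good default

    Lingam lingam = new Lingam();
    lingam.setPenaltyDiscount(1.0);          // used by the FGES pruning step
    Graph graph = lingam.search(dataSet);
}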

I think that's it. LiNGAM is now fixed, AFAIK.

jdramsey commented on August 15, 2024

I'm treating the problem with RSkew as a separate, unconnected issue. @cg09 In any case, RSkew hasn't changed, and I've noticed it is somewhat less general than the other non-Gaussian algorithms.

jdramsey commented on August 15, 2024

Also, @cg09, I don't think the low accuracies above for LiNGAM (which I think are now fixed) had to do with low sample size. When you suggested that, the first thing I tried was raising the sample size; it didn't help. I think there were some coding mistakes, and the parameter choices were unfortunate.

jdramsey commented on August 15, 2024

Yay, we can finally close this issue! It's in development and has been checked!
