acd_stats's Issues

Hangs on clustering

The following data, given to the clustering call prechi.cluster_neighbors, causes the cluster process to run for a long time with no sign of terminating:

$range
[1] 42 47 52 57 62 67 72 77 82
$counts
[1] 1 0 0 0 0 0 0 0 0
$grades
 [1] 79 70 84 74 79 70 70 42 84 76 76 84 76 76

First, the counts aren't correct given the grades. (The ranges are correct.) Second, the cluster algorithm ought to reject this input as unsolvable more or less immediately.
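Both points can be checked with a small sketch. It is not the project's code: count_grades and obviously_unsolvable are hypothetical helpers. It assumes each $range value is the lower edge of a bin five wide, and that a solution needs at least three partitions of at least five grades each.

```python
def count_grades(lower_edges, width, grades):
    """Count grades per bin, assuming each edge is the inclusive lower bound."""
    counts = [0] * len(lower_edges)
    for grade in grades:
        for i, low in enumerate(lower_edges):
            if low <= grade < low + width:
                counts[i] += 1
                break
    return counts


def obviously_unsolvable(counts, target=5, min_partitions=3):
    """Necessary condition only: too few grades to fill min_partitions bins."""
    return sum(counts) < target * min_partitions


lower_edges = [42, 47, 52, 57, 62, 67, 72, 77, 82]
grades = [79, 70, 84, 74, 79, 70, 70, 42, 84, 76, 76, 84, 76, 76]

recounted = count_grades(lower_edges, 5, grades)
print(recounted)                                    # [1, 0, 0, 0, 0, 3, 5, 2, 3]
print(obviously_unsolvable([1, 0, 0, 0, 0, 0, 0, 0, 0]))  # True
print(obviously_unsolvable(recounted))              # True: only 14 grades in total
```

Under these assumptions the reported counts fail the check at once, and so do the recounted ones, since fourteen grades cannot fill three partitions of five.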

The contest that raised this is #652, which is unusual in that its grades are given in tenths rather than rounded to the nearest 0.5. This contest and contest #641 were manually added to the processed list so that they are skipped.

Allow prechi to accept minimum count of partitions

The prechi algorithm currently fixes the minimum number of partitions at three. The ChiSq test we are using has n-3 degrees of freedom. Fixing three means doing tests with 3-3=0 degrees of freedom.

Parameterize the fixed number three, so that we can pass four or five and not be doing meaningless ChiSq tests. This will mean fewer distributions that can be tested with ChiSq; however, the tests aren't really useful with so few partitions.
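A minimal sketch of the parameterization, assuming the n-3 degrees of freedom stated above. The names chi_square_dof, is_testable and min_partitions are hypothetical and are not taken from the prechi code.

```python
def chi_square_dof(n_partitions, lost_degrees=3):
    """Degrees of freedom for the test described above: n - 3."""
    return n_partitions - lost_degrees


def is_testable(n_partitions, min_partitions=4):
    """Accept a partitioning only if it meets the caller's minimum
    and leaves at least one degree of freedom."""
    return n_partitions >= min_partitions and chi_square_dof(n_partitions) > 0


print(chi_square_dof(3))                   # 0, the meaningless case
print(is_testable(3, min_partitions=3))    # False, even if the caller asks for three
print(is_testable(4))                      # True, one degree of freedom
print(is_testable(4, min_partitions=5))    # False, caller requires five
```

Passing the minimum as an argument keeps the old behavior available for comparison while letting callers ask for four or five.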

prechi too slow

Prechi does not terminate in a reasonable time given:

2, 0, 0, 0, 3, 0, 0, 0, 2, 3, 5, 1, 4, 2, 8

(on intervals of five from twenty up through ninety)

It arrives at 5, 5, 6, 6, 8 quite quickly, but then runs on indefinitely, trying the large number of unproductive combinations. "Indefinitely" means hours or days; I haven't seen it terminate. Another solution is 5, 5, 5, 7, 8; however, I think that's worse than the one found first.

There are two things that can be done:

  1. strengthen the lower bound - half the number of items with a value less than the target count of 5 (the half rounded up) can be added to the current solution length. That will cut back all of the bad starts that occur after having made a good start with the heuristic.
  2. strengthen the heuristic - after sorting by increasing value, add a secondary sort on the minimum sum after a join. I'm not sure how much that will help, but it will ensure that the strings of zeros get merged together first.
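The two ideas above can be sketched as follows. This is an illustration under assumptions, not the prechi implementation: "solution length" is read as the number of joins made so far, and merges_lower_bound and join_order are hypothetical names. The bound holds because every cell below the target must take part in at least one join, and one join can cover at most two such cells.

```python
import math


def merges_lower_bound(counts, merges_so_far, target=5):
    """Joins made so far plus half the deficient cells, rounded up."""
    deficient = sum(1 for c in counts if c < target)
    return merges_so_far + math.ceil(deficient / 2)


def join_order(counts, target=5):
    """Order deficient cells by value, then by the smallest sum a join
    with either neighbor would give."""
    if len(counts) < 2:
        return []

    def best_join_sum(i):
        neighbors = []
        if i > 0:
            neighbors.append(counts[i - 1])
        if i < len(counts) - 1:
            neighbors.append(counts[i + 1])
        return counts[i] + min(neighbors)

    candidates = [i for i, c in enumerate(counts) if c < target]
    return sorted(candidates, key=lambda i: (counts[i], best_join_sum(i)))


counts = [2, 0, 0, 0, 3, 0, 0, 0, 2, 3, 5, 1, 4, 2, 8]
print(merges_lower_bound(counts, 0))  # 7: thirteen cells are below five
print(join_order(counts)[:6])         # [1, 2, 3, 5, 6, 7], the zero cells first
```

A branch whose bound already exceeds the best solution found can be abandoned without exploring it further.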

If those don't help, I'll have to think of allowing it to be heuristic but not optimal.

prechi runs of zeros

The data contains many distributions with long runs of zero counts. This makes for an explosion of fruitless combinations, and the prechi algorithm is timing out frequently.

A zero run can be merged with its left or right neighbor (there are always both), whichever is smaller. Add a preprocessing step that does this.
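A minimal sketch of such a preprocessing step, under stated assumptions: bins are (low, high, count) tuples in ascending order, ties go to the left neighbor, and merge_zero_runs is a hypothetical name. A run at either end, which the note above says does not occur, falls back to its only neighbor.

```python
def merge_zero_runs(bins):
    """Fold each run of zero-count bins into its smaller neighbor."""
    out = []
    i, n = 0, len(bins)
    while i < n:
        low, high, count = bins[i]
        if count != 0:
            out.append([low, high, count])
            i += 1
            continue
        j = i
        while j < n and bins[j][2] == 0:
            j += 1                      # bins[i:j] is the zero run
        run_low, run_high = bins[i][0], bins[j - 1][1]
        has_left, has_right = bool(out), j < n
        if has_left and (not has_right or out[-1][2] <= bins[j][2]):
            out[-1][1] = run_high       # extend the left neighbor upward
            i = j
        elif has_right:
            out.append([run_low, bins[j][1], bins[j][2]])  # extend right neighbor down
            i = j + 1
        else:
            out.append([run_low, run_high, 0])  # every bin was zero
            i = j
    return [tuple(b) for b in out]


counts = [2, 0, 0, 0, 3, 0, 0, 0, 2, 3, 5, 1, 4, 2, 8]
bins = [(20 + 5 * k, 25 + 5 * k, c) for k, c in enumerate(counts)]
merged = merge_zero_runs(bins)
print([c for _, _, c in merged])  # [2, 3, 2, 3, 5, 1, 4, 2, 8]
```

The counts are unchanged apart from the zeros disappearing; what the step decides is which neighbor's interval grows to cover the empty range.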
