topotext's People

Contributors

adolenc, ptrus, rokivansek

topotext's Issues

Check distances between two diagrams

Use bottleneck distance to compute distance between two diagrams (in all dimensions).

Also apply clustering to check if persistence diagrams make sense?
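A minimal sketch of the bottleneck distance, assuming each diagram is a list of finite (birth, death) pairs for a single dimension. The function names are hypothetical, not part of the project. It augments both diagrams with diagonal slots, then binary-searches the smallest threshold for which a perfect matching exists; this is a simple approach suited to small diagrams, not an optimized one.

```python
import numpy as np


def _has_perfect_matching(allowed):
    """Check for a perfect matching in a square boolean matrix
    using simple augmenting paths (Kuhn's algorithm)."""
    size = allowed.shape[0]
    match_right = [-1] * size  # match_right[v] = left vertex matched to v

    def try_augment(u, seen):
        for v in np.flatnonzero(allowed[u]):
            if seen[v]:
                continue
            seen[v] = True
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return all(try_augment(u, [False] * size) for u in range(size))


def bottleneck_distance(diag_a, diag_b):
    """Bottleneck distance between two diagrams of finite (birth, death) pairs."""
    a = np.asarray(diag_a, dtype=float).reshape(-1, 2)
    b = np.asarray(diag_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    if n + m == 0:
        return 0.0

    # Rows: points of A followed by m diagonal slots.
    # Columns: points of B followed by n diagonal slots.
    cost = np.zeros((n + m, n + m))
    if n and m:
        # Point to point: L-infinity distance.
        cost[:n, :m] = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2)
    # Point to diagonal: half the persistence of the point.
    cost[:n, m:] = ((a[:, 1] - a[:, 0]) / 2)[:, None]
    cost[n:, :m] = ((b[:, 1] - b[:, 0]) / 2)[None, :]
    # Diagonal to diagonal stays at cost 0.

    # The answer is one of the entries of the cost matrix.
    candidates = np.unique(cost)
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if _has_perfect_matching(cost <= candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return float(candidates[lo])
```

To cover all dimensions, one option is to keep the diagrams in a dict keyed by dimension and call `bottleneck_distance` once per key.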

Collect data

Put the data in folder ./data/<folder_name>

Names specified below:

  • Sports articles - "sport"
  • Movie reviews - "movie"
  • Science article abstracts - "abstracts"

Computing R for alpha_shapes in 3d

I am having serious trouble figuring out how the .data[0] component for 3D alpha_shapes is computed and what its maximum value could be. This breaks our division of the interval into 10 equal parts and produces a handful of erroneously large birth/death pairs in each domain.

You can play around with this by changing line 14 in main.py to cx_method = alpha_shapes, which will print the matrix of points (the PCA-projected X) on which the alpha shapes are built, along with all the "bad" simplices, the square root of their data[0], and the estimated maximum R for this domain. Note that the number of such simplices is very small, so in the worst case we could just ignore them. Note also that changing line 15 to dims = 2 now works correctly for alpha_shapes.

Any help is appreciated.

Reference: http://www.mrzv.org/software/dionysus/python/alphashapes.html#alphashapes

Create a toy example on which our method should work.

Our method doesn't work even on the linearly separable iris classes. But that doesn't necessarily mean anything is wrong, as we are comparing the structure of the examples of each class individually. We should create a custom set of points with a clear structural difference between the classes (e.g. samples from one class would lie on a circle, and samples from the other on a straight line).
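One possible way to generate such a toy set, as a sketch only: class 0 is sampled from the unit circle and class 1 from a horizontal segment, both with optional Gaussian noise. The function name, the segment [-1, 1], and the default noise level are assumptions chosen for illustration.

```python
import numpy as np


def make_toy_dataset(n_per_class=100, noise=0.05, seed=0):
    """Return (X, y): class 0 lies on a circle, class 1 on a straight line."""
    rng = np.random.default_rng(seed)

    # Class 0: points on the unit circle.
    theta = rng.uniform(0.0, 2.0 * np.pi, n_per_class)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])

    # Class 1: points on the segment from (-1, 0) to (1, 0).
    t = rng.uniform(-1.0, 1.0, n_per_class)
    line = np.column_stack([t, np.zeros(n_per_class)])

    X = np.vstack([circle, line])
    X = X + rng.normal(scale=noise, size=X.shape)  # jitter both classes
    y = np.concatenate([
        np.zeros(n_per_class, dtype=int),
        np.ones(n_per_class, dtype=int),
    ])
    return X, y
```

The circle should produce one long-lived 1-dimensional feature (a loop) while the line should produce none, which gives the method a clear structural difference to pick up.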

Plotting diagrams

Write code to draw plots of both persistence diagrams and bar diagrams.
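A rough sketch of both plots with matplotlib, assuming a diagram is given as a list of finite (birth, death) pairs. The function names and styling are hypothetical placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_persistence_diagram(pairs, ax=None, title="Persistence diagram"):
    """Scatter (birth, death) pairs together with the diagonal birth = death."""
    pairs = np.asarray(pairs, dtype=float).reshape(-1, 2)
    ax = ax if ax is not None else plt.gca()
    ax.scatter(pairs[:, 0], pairs[:, 1], s=15)
    top = pairs.max() if pairs.size else 1.0
    ax.plot([0.0, top], [0.0, top], linestyle="--", color="gray")
    ax.set_xlabel("birth")
    ax.set_ylabel("death")
    ax.set_title(title)
    return ax


def plot_barcode(pairs, ax=None, title="Barcode"):
    """Draw one horizontal bar per pair, ordered by birth value."""
    pairs = np.asarray(pairs, dtype=float).reshape(-1, 2)
    ax = ax if ax is not None else plt.gca()
    order = np.argsort(pairs[:, 0], kind="stable")
    births, deaths = pairs[order, 0], pairs[order, 1]
    ax.hlines(np.arange(len(pairs)), births, deaths)
    ax.set_xlabel("filtration value")
    ax.set_yticks([])
    ax.set_title(title)
    return ax
```

A typical call would create two axes with `plt.subplots(1, 2)`, pass one to each function, and finish with `plt.show()` or `fig.savefig(...)`.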

Implement PCA and alpha shapes in 2(3) dimensions

Since implementations of alpha shapes in more than 3 dimensions are rare or non-existent, use PCA to project the text features into 2(3) dimensions and then build alpha shapes on top of that.

Use sklearn, Dionysus...
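The projection step could look roughly like this. This is a NumPy-only sketch of PCA via the SVD, swapped in to keep the example dependency-free; `sklearn.decomposition.PCA` with `n_components=2` or `3` is the library route the issue points to. The function name is hypothetical, and the alpha-shape construction itself is not shown.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their first n_components principal axes."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

The output has one row per text and 2 or 3 columns, which is the point set the alpha shapes would be built on.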

Try using different R

Instead of defining R as max distance between two points in domain, define it as max distance in filtration.
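The two definitions of R can be compared with a short sketch. It assumes the filtration values are available as a plain list of numbers and, following the alpha-shapes issue above, that they may be squared radii; all function names are hypothetical.

```python
import numpy as np


def max_pairwise_distance(points):
    """R as the largest Euclidean distance between any two points."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=2)).max())


def max_filtration_radius(filtration_values, squared=True):
    """R as the largest value in the filtration.

    If squared is True the values are treated as squared radii
    (an assumption), so the square root of the maximum is returned.
    """
    top = float(np.max(np.asarray(filtration_values, dtype=float)))
    return float(np.sqrt(top)) if squared else top


def interval_edges(r, n_parts=10):
    """Split [0, r] into n_parts equal sub-intervals and return the edges."""
    return np.linspace(0.0, r, n_parts + 1)
```

With the second definition every birth/death value falls inside [0, R] by construction, so no pair can land outside the last sub-interval.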

Make generic preprocessor

Preprocessor should accept:

  • a path to a directory of files (which it loads and sends to the pre-preprocessor);
  • a pre-preprocessor, which takes as input a text and produces a list of words;
  • a list of functions which generate features;
  • optional parameter which makes it also include tf/idf features;

and combines all of the features into matrices X and y.
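A sketch of what such a preprocessor might look like. Several details are assumptions not stated in the issue: the directory is taken to contain one subfolder per class (as in ./data/<folder_name>), labels are the index of the subfolder in sorted order, and the tf-idf option is modelled as an optional callable `tfidf_fn` rather than a boolean flag. All names are hypothetical.

```python
import os
import numpy as np


def build_dataset(data_dir, tokenize, feature_functions, tfidf_fn=None):
    """Load texts from data_dir/<class>/<file> and build (X, y, class_names).

    tokenize: the pre-preprocessor, maps a text to a list of words.
    feature_functions: each maps a word list to a number or 1-D sequence.
    tfidf_fn: optional, maps a list of word lists to an (m x n) matrix.
    """
    class_names = sorted(
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )

    token_lists, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(data_dir, name)
        for filename in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, filename)
            if not os.path.isfile(path):
                continue
            with open(path, encoding="utf-8") as handle:
                token_lists.append(tokenize(handle.read()))
            labels.append(label)

    # One row per text, with the outputs of all feature functions concatenated.
    rows = []
    for tokens in token_lists:
        feats = [np.atleast_1d(fn(tokens)) for fn in feature_functions]
        rows.append(np.hstack(feats) if feats else np.empty(0))
    X = np.array(rows, dtype=float)

    if tfidf_fn is not None:
        # Append the tf-idf columns to the hand-written features.
        X = np.hstack([X, np.asarray(tfidf_fn(token_lists), dtype=float)])

    return X, np.array(labels), class_names
```

For example, `build_dataset("./data", str.split, [len])` would use whitespace tokenization and a single feature, the number of words per text.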

tf-idf code

Change and upload the code for calculating the tf-idf matrix.

It should accept an array of string arrays, where each row represents one text. Rows will generally be of different lengths.

It will return a tf-idf matrix (m x n), where m is the number of samples and n is the number of words selected as features.
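A minimal sketch of such a function, using one common tf-idf variant: tf is the word count divided by the text length, and idf is log(m / df). The feature selection rule (keep the `max_features` words with the highest document frequency) and the function name are assumptions; the issue does not say how the words are selected. Besides the matrix, the sketch also returns the vocabulary so the columns can be interpreted.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(documents, max_features=None):
    """Return (X, vocab) where X[i, j] is the tf-idf of vocab[j] in text i."""
    m = len(documents)

    # Document frequency: in how many texts each word appears.
    doc_freq = Counter()
    for words in documents:
        doc_freq.update(set(words))

    # Keep the most widespread words; ties are broken alphabetically.
    vocab = sorted(doc_freq, key=lambda w: (-doc_freq[w], w))[:max_features]
    index = {word: j for j, word in enumerate(vocab)}

    X = np.zeros((m, len(vocab)))
    for i, words in enumerate(documents):
        if not words:
            continue  # an empty text keeps an all-zero row
        for word, count in Counter(words).items():
            j = index.get(word)
            if j is not None:
                tf = count / len(words)
                idf = math.log(m / doc_freq[word])
                X[i, j] = tf * idf
    return X, vocab
```

Note that with this idf, a word occurring in every text gets weight 0, so selecting purely by document frequency may favour uninformative columns; a different selection rule may suit the project better.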
