
pyramid's Introduction

Pyramid

A Java Machine Learning Library

Pyramid is a Java machine learning library that implements many state-of-the-art machine learning algorithms.

At the moment, not all algorithms are released. We are actively working on tidying up the source files and adding documentation. We will release a few algorithms at a time as they become ready, and we hope to have all algorithms released soon!

Requirements

If you just want to use pyramid as a command line tool (which is very simple), all you need is Java 8.

If you are also a Java developer and wish to call Pyramid Java APIs, you will also need Maven.

Setup

Pyramid requires no installation. Just download the latest pre-compiled package (with a name like pyramid-x.x.x.zip) and decompress it. Then move into the created folder and type

./pyramid config/welcome.properties

If you see a welcome message, everything is working.

Windows users, please see the notes.

Command Line Usage

All algorithms/functions implemented in Pyramid can be run through a simple command, with the following syntax:

./pyramid <properties_file>

Example:

./pyramid config/welcome.properties

or

./pyramid config/cbm.properties

pyramid is a launcher script and <properties_file> is a file specifying the name of the algorithm and all necessary parameters, such as the input data, output folder, and learning algorithm hyperparameters. The <properties_file> can be specified by either an absolute or a relative path.
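To make the idea of a properties file concrete, here is a minimal sketch of how such a file can be read in Java. It assumes the file follows the standard Java key=value properties format, which the .properties extension suggests but the text above does not confirm. The keys shown (algorithm, input.folder, output.folder) are hypothetical and chosen only for illustration; the real keys are in the templates on the Wiki.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesSketch {
    // Parse key=value text using the standard java.util.Properties loader.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical keys, for illustration only (not Pyramid's actual keys).
        String text = String.join("\n",
                "# lines starting with # are comments",
                "algorithm=example",
                "input.folder=/path/to/data",
                "output.folder=/path/to/output");
        Properties props = parse(text);
        System.out.println(props.getProperty("algorithm"));     // prints "example"
        System.out.println(props.getProperty("output.folder")); // prints "/path/to/output"
    }
}
```

In practice the text would come from the file passed on the command line rather than from an in-memory string.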

To run different algorithms, you just need to invoke the program with different properties files. The list of available algorithms and their corresponding properties file templates can be found in the Wiki.

Building from Source

If you are a Java developer who prefers working with the source code or wants to contribute to the Pyramid package:

Pyramid uses Maven for its build system.

To compile and package the project from the source code, simply run the mvn clean package -DskipTests command in the cloned directory. The compressed package will be created under the core/target/releases directory.

Feedback

We welcome your feedback on the package. To ask questions, request new features or report bugs, please contact Cheng Li via [email protected].

Answers to some commonly asked questions can be found in FAQ.

pyramid's People

Contributors

apete, cheng-li, dependabot[bot], deyb, henry-yan, jiliny, jinlongfan, maoqiuzi, rainicy, virgilpavlu, yuyuxu, zhenmingbi


pyramid's Issues

Compilation Error "maven-assembly-plugin:2.5.5:single failed: user id '16779829' is too big ( > 2097151 )"

[INFO] pyramid 0.12.6 ..................................... SUCCESS [ 0.238 s]
[INFO] phrase-count-plugin 1.0 ............................ SUCCESS [ 4.893 s]
[INFO] pyramid 0.12.6 ..................................... FAILURE [ 15.589 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20.851 s
[INFO] Finished at: 2020-03-07T20:08:16-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single (default) on project pyramid: Execution default of goal org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single failed: user id '16779829' is too big ( > 2097151 ). -> [Help 1]

I have attached the error message. Do you know how to deal with it?

CBM: java.lang.OutOfMemoryError: GC overhead limit exceeded

When I run CBM with 10K training samples, 10K features and 12 labels (70 label sets), it works fine. But when I increase the number of training samples to 150K, it throws the following error. Most of the features are real-valued. The dataset is around 95% sparse. The code uses around 50GB of memory out of 128GB.

Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at edu.neu.ccs.pyramid.application.AppLauncher.invokeMain(AppLauncher.java:72)
at edu.neu.ccs.pyramid.application.AppLauncher.launch(AppLauncher.java:39)
at edu.neu.ccs.pyramid.application.AppLauncher.main(AppLauncher.java:24)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.split(String.java:2338)
at java.lang.String.split(String.java:2410)
at edu.neu.ccs.pyramid.dataset.TRECFormat.fillMultiLabelClfDataSet(TRECFormat.java:332)
at edu.neu.ccs.pyramid.dataset.TRECFormat.loadMultiLabelClfDataSet(TRECFormat.java:159)
at edu.neu.ccs.pyramid.dataset.TRECFormat.loadMultiLabelClfDataSet(TRECFormat.java:106)
at edu.neu.ccs.pyramid.application.App5.train(App5.java:58)
at edu.neu.ccs.pyramid.application.App5.main(App5.java:38)
... 7 more

From what I found on the web, this error is raised when most of the run time is spent in garbage collection and progress becomes too slow. Java reports this error because it suspects the program may never finish. According to the paper, CBM is able to handle the TMC2007 dataset, which is relatively large, so I am hoping there is a solution to this issue.

Any idea how to fix this issue?
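One thing that may be worth checking in a case like this is how much heap the JVM is actually allowed to use, since the maximum heap (set with the standard -Xmx JVM option) can be well below the machine's physical memory. The sketch below only reports the limits of the JVM it runs in. It is not part of Pyramid, the class and method names are made up for illustration, and it is an assumption that the heap cap is the cause here. How the pyramid launcher script passes JVM options is not shown in this document.

```java
public class HeapCheck {
    // Convert a byte count to whole mebibytes.
    static long bytesToMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound on the heap the JVM will attempt to use (governed by -Xmx).
        long max = rt.maxMemory();
        // Heap currently occupied: reserved heap minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap (MiB):  " + bytesToMiB(max));
        System.out.println("used heap (MiB): " + bytesToMiB(used));
    }
}
```

If the reported maximum is far below the available RAM, raising it would be a natural first experiment.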

Confidence scores of the Predictions

Hi @cheng-li ,
Is it possible to save the confidence scores of the predictions? That is, for a test data point, when a label subset is predicted, what is the confidence (or probability) of that label subset? This would help us identify the hard-to-classify data points.

Thanks in advance.

the visualizer folder parameter has to end with "/"

[chengli@fiji11 pyramid-0.1.0]$ python visualization/visualizer.py /huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports
Traceback (most recent call last):
File "visualization/visualizer.py", line 1624, in <module>
main()
File "visualization/visualizer.py", line 1602, in main
f1 = open(directoryName + configName, 'r')
IOError: [Errno 2] No such file or directory: '/huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/data_config.json'
[chengli@fiji11 pyramid-0.1.0]$ python visualization/visualizer.py /huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports/
Json:/huge1/people/chengli/projects/pyramid/archives/app3/ohsumed_20000/8/reports/train_reports/report_1.json load successfully.
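The failing path in the first run (.../reports/data_config.json rather than .../reports/train_reports/data_config.json) suggests that the directory is derived by cutting the argument at its last "/", so without a trailing slash the final folder name is dropped. This is an inference from the traceback, not a reading of visualizer.py. The sketch below reproduces that behaviour and one defensive fix; it is written in Java only to match the rest of this document, and the class and method names are hypothetical.

```java
public class TrailingSlash {
    // Keep everything up to and including the last '/'; the rest is dropped.
    static String cutAtLastSlash(String path) {
        int i = path.lastIndexOf('/');
        return i < 0 ? "" : path.substring(0, i + 1);
    }

    // Defensive variant: normalise the folder argument so it always ends with '/'.
    static String ensureTrailingSlash(String folder) {
        return folder.endsWith("/") ? folder : folder + "/";
    }

    public static void main(String[] args) {
        String arg = "reports/train_reports";
        // Without the trailing slash the last folder name is lost.
        System.out.println(cutAtLastSlash(arg));                      // prints "reports/"
        // Normalising first keeps the full folder.
        System.out.println(cutAtLastSlash(ensureTrailingSlash(arg))); // prints "reports/train_reports/"
    }
}
```

Normalising the argument once at the entry point would make both invocations above behave the same.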
