
rnlp's Introduction

srlearn

Repository preview image: "srlearn. Python wrappers around BoostSRL with a scikit-learn-style interface. pip install srlearn."


srlearn is a Python package for learning statistical relational models, and wraps BoostSRL (and other implementations) with a scikit-learn interface.

Getting Started

Prerequisites:

  • Java (1.8, 11)
  • Python (3.7, 3.8, 3.9, 3.10)

Installation

pip install srlearn

Basic Usage

The general setup should be similar to scikit-learn, but there are a few extra requirements for setting background knowledge and formatting the data.

A minimal working example (using the Toy-Cancer data set imported with 'load_toy_cancer') is:

from srlearn.rdn import BoostedRDNClassifier
from srlearn import Background
from srlearn.datasets import load_toy_cancer
train, test = load_toy_cancer()
bk = Background(modes=train.modes)
clf = BoostedRDNClassifier(
    background=bk,
    target='cancer',
)
clf.fit(train)
clf.predict_proba(test)
# array([0.88079619, 0.88079619, 0.88079619, 0.3075821 , 0.3075821 ])
print(clf.classes_)
# array([1., 1., 1., 0., 0.])

train and test are each srlearn.Database objects, so this example hides some of the complexity behind the scenes in exchange for compactness. For more examples, see the Example Gallery.
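
The background knowledge and data can also be set up by hand. The following is a minimal sketch, assuming the srlearn Database object exposes pos, neg, and facts lists of predicate strings and that modes are passed to Background; the people and predicates here are illustrative stand-ins rather than a shipped data set:

from srlearn import Background, Database
from srlearn.rdn import BoostedRDNClassifier

# Hand-built database; these examples and facts are illustrative.
train = Database()
train.pos = ["cancer(alice)."]
train.neg = ["cancer(bob)."]
train.facts = ["smokes(alice).", "friends(alice, bob)."]

# Modes constrain how predicates may be chained while learning clauses.
bk = Background(
    modes=[
        "cancer(+person).",
        "smokes(+person).",
        "friends(+person, -person).",
    ],
)

clf = BoostedRDNClassifier(background=bk, target="cancer")
clf.fit(train)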

Citing

If you find this helpful in your work, please consider citing:

@misc{hayes2019srlearn,
  title={srlearn: A Python Library for Gradient-Boosted Statistical Relational Models},
  author={Alexander L. Hayes},
  year={2019},
  eprint={1912.08198},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Contributing

Many thanks to those who have already made contributions, and to the known and unknown contributors to WILL/BoostSRL/SRLBoost, including: Navdeep Kaur, Nandini Ramanan, Srijita Das, Mayukh Das, Kaushik Roy, Devendra Singh Dhami, Shuo Yang, Phillip Odom, Tushar Khot, Gautam Kunapuli, Sriraam Natarajan, Trevor Walker, and Jude W. Shavlik.

We have adopted the Contributor Covenant Code of Conduct, version 1.4. Please read and follow it, and report any incidents that violate it.

Questions, Issues, and Pull Requests are welcome. Please refer to CONTRIBUTING.md for information on submitting issues and pull requests.

Versioning and Releases

We use SemVer for versioning. See Releases for the available stable versions, or the project page on PyPI.


rnlp's Issues

Output files

The present implementation appends facts, blocks, sentences, etc. to files in the user's current directory.

Ideally, there should be an output flag (-o) with which users may specify where these files are written.
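
A minimal sketch of such a flag, assuming an argparse-based command-line entry point (the flag name, default, and file name below are illustrative assumptions):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("-o", "--output", default=os.getcwd(),
                    help="Directory where facts, blocks, and sentences are written.")
args = parser.parse_args()

# Write outputs under the requested directory instead of the current working directory.
os.makedirs(args.output, exist_ok=True)
with open(os.path.join(args.output, "facts.txt"), "a") as f:
    f.write("example_fact(block1, sentence1).\n")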

Parallelism for makeIdentifiers

A large amount of the running time tends to be spent in parse.makeIdentifiers(), which is essentially a triple-nested for loop over blocks, sentences, and words.

Previously this was "resolved" by wrapping the outer loop with tqdm to estimate how long the process would take. This did not actually change the running time, but it likely made the wait feel more bearable.


joblib may be a viable way to execute the outer loop in parallel:

from joblib import Parallel, delayed
from tqdm import tqdm

def foo(block, blockID):
    """
    :param block: The current block to be processed (list of lists).
    :param blockID: Index of the current block (int).
    """
    return [blockID]

Blocks = list(range(5000))
facts = Parallel(n_jobs=-1)(delayed(foo)(Blocks[i], i) for i in tqdm(range(len(Blocks))))

In the short example above, "Blocks" would in reality be the list of blocks generated earlier. foo(block, blockID) would be similar to the current parse.makeIdentifiers() method, but blockID is passed as a parameter rather than being an integer incremented at the end of the outer loop.

Logging should be optional

The current implementation sets logging from __main__.py and appends both errors and logs to a file named rnlp_log.log.

If the user experiences errors when running the code (for example, nltk is not installed), these errors end up in the log file rather than the console, so it may not be obvious what went wrong.
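
One way to make this optional is to configure logging only when the user explicitly asks for it. The sketch below assumes an argparse-based entry point and a hypothetical --log flag; neither is part of the current rnlp interface:

import argparse
import logging

parser = argparse.ArgumentParser()
# Hypothetical flag: "--log" alone writes to rnlp_log.log; "--log other.log" picks a different file.
parser.add_argument("--log", nargs="?", const="rnlp_log.log", default=None,
                    help="Write logs to a file instead of the console.")
args = parser.parse_args()

if args.log:
    # Only attach a file handler when logging was explicitly requested.
    logging.basicConfig(filename=args.log, level=logging.INFO)
else:
    # Otherwise keep warnings and errors on the console so failures are visible.
    logging.basicConfig(level=logging.WARNING)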
