
rnlp's Introduction

srlearn

Repository preview image: "srlearn. Python wrappers around BoostSRL with a scikit-learn-style interface. pip install srlearn."


srlearn is a Python package for learning statistical relational models, and wraps BoostSRL (and other implementations) with a scikit-learn interface.

Getting Started

Prerequisites:

  • Java (1.8, 11)
  • Python (3.7, 3.8, 3.9, 3.10)

Installation

pip install srlearn

Basic Usage

The general setup should be similar to scikit-learn, but there are a few extra requirements for setting background knowledge and formatting the data.

A minimal working example (using the Toy-Cancer data set imported with 'load_toy_cancer') is:

from srlearn.rdn import BoostedRDNClassifier
from srlearn import Background
from srlearn.datasets import load_toy_cancer
train, test = load_toy_cancer()
bk = Background(modes=train.modes)
clf = BoostedRDNClassifier(
    background=bk,
    target='cancer',
)
clf.fit(train)
clf.predict_proba(test)
# array([0.88079619, 0.88079619, 0.88079619, 0.3075821 , 0.3075821 ])
print(clf.classes_)
# array([1., 1., 1., 0., 0.])

train and test are each srlearn.Database objects, so this example hides some of the complexity behind the scenes in exchange for compactness. For more examples, see the Example Gallery.
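
The background knowledge and data can also be set up by hand. The following is a minimal sketch, assuming the srlearn Database object exposes pos, neg, and facts lists of predicate strings and that modes are passed to Background; the people and predicates here are illustrative stand-ins rather than a shipped data set:

from srlearn import Background, Database
from srlearn.rdn import BoostedRDNClassifier

# Hand-built database; these examples and facts are illustrative.
train = Database()
train.pos = ["cancer(alice)."]
train.neg = ["cancer(bob)."]
train.facts = ["smokes(alice).", "friends(alice, bob)."]

# Modes constrain how predicates may be chained while learning clauses.
bk = Background(
    modes=[
        "cancer(+person).",
        "smokes(+person).",
        "friends(+person, -person).",
    ],
)

clf = BoostedRDNClassifier(background=bk, target="cancer")
clf.fit(train)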

Citing

If you find this helpful in your work, please consider citing:

@misc{hayes2019srlearn,
  title={srlearn: A Python Library for Gradient-Boosted Statistical Relational Models},
  author={Alexander L. Hayes},
  year={2019},
  eprint={1912.08198},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Contributing

Many thanks to those who have already made contributions, and to the known and unknown contributors to WILL/BoostSRL/SRLBoost, including: Navdeep Kaur, Nandini Ramanan, Srijita Das, Mayukh Das, Kaushik Roy, Devendra Singh Dhami, Shuo Yang, Phillip Odom, Tushar Khot, Gautam Kunapuli, Sriraam Natarajan, Trevor Walker, and Jude W. Shavlik.

We have adopted the Contributor Covenant Code of Conduct, version 1.4. Please read and follow it, and report any incidents that violate it.

Questions, Issues, and Pull Requests are welcome. Please refer to CONTRIBUTING.md for information on submitting issues and pull requests.

Versioning and Releases

We use SemVer for versioning. See Releases for the available stable versions, or the project page on PyPI.


rnlp's Issues

Output files

The present implementation appends facts, blocks, sentences, etc. to files in the user's current directory.

Ideally, there should be an output flag (-o) with which users may specify where these files are written.
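
A minimal sketch of such a flag, assuming an argparse-based command-line entry point (the flag name, default, and file name below are illustrative assumptions):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("-o", "--output", default=os.getcwd(),
                    help="Directory where facts, blocks, and sentences are written.")
args = parser.parse_args()

# Write outputs under the requested directory instead of the current working directory.
os.makedirs(args.output, exist_ok=True)
with open(os.path.join(args.output, "facts.txt"), "a") as f:
    f.write("example_fact(block1, sentence1).\n")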

Parallelism for makeIdentifiers

A large amount of the running time tends to be spent in parse.makeIdentifiers(), which is essentially a triple-nested for loop over blocks, sentences, and words.

Previously this was "resolved" by wrapping the outer loop with tqdm to estimate how long the process would take. This did not actually change the running time, but it likely made the wait feel more bearable.


joblib may be a viable way to execute the outer loop in parallel:

from joblib import Parallel, delayed
from tqdm import tqdm

def foo(block, blockID):
    """
    :param block: The current block to be processed (list of lists).
    :param blockID: Index of the current block (int).
    """
    return [blockID]

Blocks = list(range(5000))
facts = Parallel(n_jobs=-1)(delayed(foo)(Blocks[i], i) for i in tqdm(range(len(Blocks))))

In the short example above, "Blocks" would in reality be the list of blocks generated earlier. foo(block, blockID) would be similar to the current parse.makeIdentifiers() method, but blockID is passed as a parameter rather than being an integer incremented at the end of the outer loop.

Logging should be optional

The current implementation sets logging from __main__.py and appends both errors and logs to a file named rnlp_log.log.

If the user experiences errors when running the code (for example, nltk is not installed), these errors end up in the log file rather than the console, so it may not be obvious what went wrong.
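
One way to make this optional is to configure logging only when the user explicitly asks for it. The sketch below assumes an argparse-based entry point and a hypothetical --log flag; neither is part of the current rnlp interface:

import argparse
import logging

parser = argparse.ArgumentParser()
# Hypothetical flag: "--log" alone writes to rnlp_log.log; "--log other.log" picks a different file.
parser.add_argument("--log", nargs="?", const="rnlp_log.log", default=None,
                    help="Write logs to a file instead of the console.")
args = parser.parse_args()

if args.log:
    # Only attach a file handler when logging was explicitly requested.
    logging.basicConfig(filename=args.log, level=logging.INFO)
else:
    # Otherwise keep warnings and errors on the console so failures are visible.
    logging.basicConfig(level=logging.WARNING)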
