Talismane

Talismane is a natural language processing framework with a sentence detector, tokeniser, pos-tagger and dependency syntax parser. Currently available language packs include French (standard and Universal Dependencies) and English.

Sample input:

Les amoureux qui se bécotent sur les bancs publics ont des petites gueules bien sympathiques.

Sample output: a syntax tree, shown below in CoNLL-X format, also available as a Java object for manipulation in code.

1	Les	les	DET	DET	n=p|	2	det	2	det
2	amoureux	amoureux	NC	NC	g=m|	10	suj	10	suj
3	qui	qui	PROREL	PROREL	n=s|	5	suj	5	suj
4	se	se	CLR	CLR	n=p|p=3|	5	aff	5	aff
5	bécotent	bécoter	V	V	n=p|t=PS|p=3|	2	mod_rel	2	mod_rel
6	sur	sur	P	P		5	mod	5	mod
7	les	les	DET	DET	n=p|	8	det	8	det
8	bancs	banc	NC	NC	n=p|g=m|	6	prep	6	prep
9	publics	public	ADJ	ADJ	n=p|g=m|	8	mod	8	mod
10	ont	avoir	V	V	n=p|t=P|p=3|	0	root	0	root
11	des	des	DET	DET	n=p|	13	det	13	det
12	petites	petit	ADJ	ADJ	n=p|g=f|	13	mod	13	mod
13	gueules	gueule	NC	NC	n=p|	10	obj	10	obj
14	bien	bien	ADV	ADV		15	mod	15	mod
15	sympathiques	sympathique	ADJ	ADJ	n=p|	13	mod	13	mod
16	.	.	PONCT	PONCT		15	ponct	15	ponct
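
For readers who want to post-process the text output rather than use the Java object, the following is a minimal sketch of reading these lines in plain Java. It assumes the columns are tab-separated in the standard CoNLL-X order (index, form, lemma, coarse pos-tag, pos-tag, features, head, label, projective head, projective label) and that features are `key=value` pairs separated by `|`, as in the sample above. `ConllXReader` and `Token` are hypothetical names invented for this illustration and are not part of the Talismane API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Talismane API. */
public class ConllXReader {

    /** One token line of CoNLL-X output (only the columns used here). */
    public static final class Token {
        public final int index;
        public final String form;
        public final String lemma;
        public final String posTag;
        public final Map<String, String> features;
        public final int head; // 0 means the token is the root
        public final String label;

        Token(int index, String form, String lemma, String posTag,
              Map<String, String> features, int head, String label) {
            this.index = index;
            this.form = form;
            this.lemma = lemma;
            this.posTag = posTag;
            this.features = features;
            this.head = head;
            this.label = label;
        }
    }

    /** Parses a single tab-separated line into a Token. */
    public static Token parseLine(String line) {
        // The -1 limit keeps empty columns, e.g. an empty features column.
        String[] cols = line.split("\t", -1);
        if (cols.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        return new Token(Integer.parseInt(cols[0]), cols[1], cols[2], cols[4],
                parseFeatures(cols[5]), Integer.parseInt(cols[6]), cols[7]);
    }

    /** Splits a features column such as "n=p|g=m|" into key/value pairs. */
    static Map<String, String> parseFeatures(String column) {
        Map<String, String> features = new LinkedHashMap<>();
        for (String pair : column.split("\\|")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                features.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return features;
    }

    /** Parses the lines of one sentence, skipping blank lines. */
    public static List<Token> parseSentence(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                tokens.add(parseLine(line));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A few lines copied from the sample output above.
        List<String> lines = List.of(
                "6\tsur\tsur\tP\tP\t\t5\tmod\t5\tmod",
                "8\tbancs\tbanc\tNC\tNC\tn=p|g=m|\t6\tprep\t6\tprep",
                "10\tont\tavoir\tV\tV\tn=p|t=P|p=3|\t0\troot\t0\troot");
        for (Token token : parseSentence(lines)) {
            System.out.println(token.index + " " + token.form + " -> head "
                    + token.head + " (" + token.label + ") " + token.features);
        }
    }
}
```

Keeping the head as an integer index, rather than resolving it to a `Token` immediately, lets the lines be read in a single pass even when a token's governor appears later in the sentence.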

Downloads: The latest release and language packs can be downloaded from the releases page.

Wiki: Simple instructions for use can be found on the Talismane wiki.

Command-line usage: follow the setup instructions, and then run a command similar to the following:

java -Xmx1G -Dconfig.file=talismane-fr-X.X.X.conf -jar talismane-core-X.X.X.jar --analyse --sessionId=fr --encoding=UTF8 --inFile=data/frTest.txt --outFile=data/frTest.tal

Calling from Java: For syntax analysis within Java code via the API, see this Java code example.

JavaDoc API: You may also consult the full JavaDoc API online.

User's manual: An out-of-date user's manual can be found on the GitHub Talismane project page. For up-to-date documentation, consult the wiki or the JavaDoc API instead.

Additional information on the project can be found on the CLLE-ERSS laboratory Talismane project home page.

Language pack usage

  • The French language pack can be used for research purposes provided that you have a license for the French Treebank. The included model is not optimised: it uses a Maximum Entropy model (which requires only about 1 GB of RAM) rather than a Linear SVM model (which requires about 24 GB of RAM). If you would like the more optimised Linear SVM model, please contact Assaf Urieli.

  • The English language pack can be used for research purposes provided that you have a license for the Penn Treebank. WARNING: the English model is only an initial version, with no attempts at optimisation.
