bigodatamining / brown-cluster
This project forked from percyliang/brown-cluster
C++ implementation of the Brown word clustering algorithm.
Implementation of the Brown hierarchical word clustering algorithm.
Percy Liang
Release 1.3
2012.07.24

Input: a sequence of words separated by whitespace (see input.txt for an example).
Output: for each word type, its cluster (see output.txt for an example).
In particular, each line is:
  <cluster represented as a bit string> <word> <number of times word occurs in input>

Runs in $O(N C^2)$, where $N$ is the number of word types and $C$ is the number of clusters.

References:

  Brown, et al.: Class-Based n-gram Models of Natural Language
  http://acl.ldc.upenn.edu/J/J92/J92-4003.pdf

  Liang: Semi-supervised learning for natural language processing
  http://cs.stanford.edu/~pliang/papers/meng-thesis.pdf

Compile:

  make

Run:

  # Clusters input.txt into 50 clusters:
  ./wcluster --text input.txt --c 50
  # Output in input-c50-p1.out/paths

============================================================
Change Log

1.3: compatibility updates for newer versions of g++ (courtesy of Chris Dyer).
1.2: make compatible with MacOS (replaced timespec with timeval and changed order of linking).
1.1: Removed deprecated operators so it works with GCC 4.3.

============================================================
(C) Copyright 2007-2012, Percy Liang

http://cs.stanford.edu/~pliang

Permission is granted for anyone to copy, use, or modify these programs and
accompanying documents for purposes of research or education, provided this
copyright notice is retained, and note is made of any changes that have been
made.

These programs and documents are distributed without any warranty, express or
implied. As the programs were written for research purposes only, they have
not been tested to the degree that would be advisable in any important
application. All use of these programs is entirely at the user's own risk.
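The output format described above (one line per word type: bit string, word, count) can be read back with a short script. The following is a minimal Python sketch, not part of this release. It assumes the three fields are separated by whitespace, and the function names `parse_paths` and `group_by_prefix` and the sample lines are hypothetical. Because the clustering is hierarchical, truncating each bit string to a fixed-length prefix is one common way to get coarser clusters; that step is shown here as an optional illustration.

```python
from collections import defaultdict


def parse_paths(lines):
    """Parse lines of the form '<bit string> <word> <count>' into tuples.

    Assumption: fields are whitespace-separated; other lines are skipped.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        bits, word, count = fields
        entries.append((bits, word, int(count)))
    return entries


def group_by_prefix(entries, prefix_len=None):
    """Group (word, count) pairs by cluster bit string.

    If prefix_len is given, each bit string is truncated to that many
    leading bits, which merges clusters sharing an ancestor in the hierarchy.
    """
    clusters = defaultdict(list)
    for bits, word, count in entries:
        key = bits if prefix_len is None else bits[:prefix_len]
        clusters[key].append((word, count))
    return dict(clusters)


# Hypothetical sample lines, made up for illustration (not real program output).
sample = [
    "0010 the 120",
    "0011 a 75",
    "010 cat 12",
    "011 dog 9",
]

entries = parse_paths(sample)
full = group_by_prefix(entries)                  # one group per distinct bit string
coarse = group_by_prefix(entries, prefix_len=2)  # groups keyed by the first 2 bits

for key in sorted(coarse):
    print(key, [word for word, _ in coarse[key]])
```

To use this on a real run, pass an open file handle for the paths file (for example the `input-c50-p1.out/paths` file mentioned under Run) to `parse_paths` in place of `sample`.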