
tshrdlu's Introduction

tshrdlu
=======

Author: Jason Baldridge ([email protected])

This is the parent repository for project-related code for the Applied NLP course taught by Jason Baldridge at UT Austin. The projects involve creating applications that consume Twitter streams and take automated actions as Twitter users, using natural language processing and machine learning.

The name "tshrdlu" comes from Twitter+SHRDLU.

For more information, updates, etc., follow @appliednlp on Twitter. The @tshrdlu account is now doing some tweeting of its own (by which I mean automated tweeting, based on the code in this repository).

Requirements

Configuring your environment variables

The easiest thing to do is to set the environment variables JAVA_HOME and TSHRDLU_DIR to the relevant locations on your system. Set JAVA_HOME to match the top level directory containing the Java installation you want to use.

Next, add the directory TSHRDLU_DIR/bin to your path. For example, you can set the path in your .bashrc file as follows:

export PATH=$PATH:$TSHRDLU_DIR/bin

Once you have taken care of these three things, you should be able to build and use tshrdlu.

You will also need to configure twitter4j Twitter access; see http://twitter4j.org/en/configuration.html for details.

If you plan to index and search objects using the provided Lucene-based code, you can customize the directory where on-disk indexes are stored by setting the environment variable TSHRDLU_INDEX_DIR. By default, indexes go in the system temp directory; look for the directory named tshrdlu there.

Some functionality depends on GeoNames API access (free to sign up and use). To take advantage of it, create an account and set the environment variable TSHRDLU_GEONAMES_USERNAME to your GeoNames username. tshrdlu should still run without this, but some of the repliers will not function.
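As a rough illustration of how the two optional variables above might be consumed, here is a minimal Scala sketch. The object `TshrdluEnv` and its methods are hypothetical names invented for this example, not code from the repository, and the fallback location (a `tshrdlu` directory under `java.io.tmpdir`) is an assumption based on the description above.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not part of tshrdlu: resolves optional settings
// from a map of environment variables (pass sys.env in real use).
object TshrdluEnv {

  // Use TSHRDLU_INDEX_DIR if it is set and non-empty; otherwise fall back
  // to <java.io.tmpdir>/tshrdlu (assumed default, see the text above).
  def indexDir(env: Map[String, String]): Path =
    env.get("TSHRDLU_INDEX_DIR") match {
      case Some(dir) if dir.nonEmpty => Paths.get(dir)
      case _ => Paths.get(System.getProperty("java.io.tmpdir"), "tshrdlu")
    }

  // None means GeoNames-dependent features should be skipped.
  def geoNamesUsername(env: Map[String, String]): Option[String] =
    env.get("TSHRDLU_GEONAMES_USERNAME").filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(s"index dir: ${indexDir(sys.env)}")
    println(s"GeoNames username set: ${geoNamesUsername(sys.env).isDefined}")
  }
}
```

Taking the environment as a parameter rather than reading `sys.env` directly keeps the lookup easy to test with a plain `Map`.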

Building the system from source

tshrdlu uses SBT (Simple Build Tool) with a standard directory structure. To build tshrdlu, type (in the TSHRDLU_DIR directory):

$ ./build update compile

This will compile the source files and put them in ./target/classes. If this is your first time running it, you will see messages about Scala being downloaded -- this is fine and expected. Once that is over, the tshrdlu code will be compiled.

To try out other build targets, do:

$ ./build

This will drop you into the SBT interface. To see the actions that are possible, hit the TAB key. (In general, you can do auto-completion on any command prefix in SBT, hurrah!)

To make sure all the tests pass, do:

$ ./build test

Documentation for SBT is at http://www.scala-sbt.org/

Note: if you have SBT already installed on your system, you can also just call it directly with "sbt" in TSHRDLU_DIR.

Questions or suggestions?

Email Jason Baldridge: [email protected]

Or, create an issue: https://github.com/utcompling/tshrdlu/issues

tshrdlu's People

Contributors

jasonbaldridge, ericlatimer, njwilson, reactormonk, nazneenrajani, hassaanm, jimsevans, myall86, treadstone90, drdub, jmattfong


tshrdlu's Issues

Regex change needed in Bot.scala

In Bot.scala, the StripLeadMentionRE regex will not match if the username in the mention contains any character other than lowercase [a-z]. We need to add at least "_", and possibly digits and capital letters.
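One possible shape for the widened pattern is sketched below. The regex shown is an assumption written for illustration, not the actual definition in Bot.scala, and the `MentionStripping` object and `stripLeadMention` helper are hypothetical. The character class covers letters in both cases, digits, and underscore.

```scala
// Hypothetical sketch; the real StripLeadMentionRE in Bot.scala may differ.
object MentionStripping {

  // (?s) lets '.' match newlines, so multi-line tweets are handled too.
  // [A-Za-z0-9_]+ accepts uppercase letters, digits, and underscores in
  // the username, not just lowercase a-z.
  val StripLeadMentionRE = """(?s)^@[A-Za-z0-9_]+\s+(.*)$""".r

  // Returns the text after a leading @mention, or the input unchanged
  // when the text does not start with a mention.
  def stripLeadMention(text: String): String = text match {
    case StripLeadMentionRE(rest) => rest
    case _ => text
  }

  def main(args: Array[String]): Unit = {
    println(stripLeadMention("@Jason_B42 hello there"))
    println(stripLeadMention("no mention here"))
  }
}
```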

RFC: Store the tweets inside Lucene?

Lucene supports storing inside a field (Field.Store.YES), with the cost of a bit more memory used. The gain would be simpler code because we don't have to serialize the tweets separately. Given the low efficiency constraints, this should not be a problem. If it becomes one, I suggest switching to a "real" key-value store.

Question Regarding User Stream Listener

Hi,

I was going through the code.

I have questions regarding the code in this file:

src/main/scala/tshrdlu/util/bridge.scala

  1. I understand that "def onStatus(status: Status) = actor ! status" is handled by the "Bot" actor.

Which actor handles events like the ones below?

def onDeletionNotice(notice: StatusDeletionNotice) = actor ! notice
def onStallWarning(warning: StallWarning) = actor ! warning

  2. You have case classes defined for a few other events. Would it not have worked just as well to leave those implementations empty?

For example:

Instead of

def onTrackLimitationNotice(int: Int) = actor ! TrackLimitationNotice(int)

This would have worked fine too.
def onTrackLimitationNotice(int: Int) { /* Purposefully left Empty */ }

Is there something that I am missing?
Good Practice, maybe?
