
aloha's People

Contributors

amirziai, davidgev, deaktator, jmorra, pferrel, rickbw, ryan-deak-zefr

aloha's Issues

Simple walkthrough docs

Write documentation for a first-time user who doesn't yet know why this is cool and may not know Java or Scala.

Polish Docs

Clean up links, fill in appropriate pages

Add exploration library

When doing contextual bandits, it's essential to record the probability with which the action was shown. To do this, we should use explore-java to get these probabilities.

Add support for arg max (or min) models

Add the ability to do arg max models. A naive implementation would identify one (or more) parameters that can vary at score time. At training time these parameters are known, but at test time their candidate values are tried combinatorially, one combination at a time, and the combination that produces the max (or min) is returned.
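
The naive approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ArgMaxSketch, combinations, argMax and the score function are hypothetical names and are not part of Aloha.

```scala
// Hypothetical sketch; `score` stands in for any model and is not an Aloha API.
object ArgMaxSketch {
  /** Cartesian product of the candidate values for each free parameter. */
  def combinations[A](candidates: Seq[Seq[A]]): Seq[Seq[A]] =
    candidates.foldRight(Seq(Seq.empty[A])) { (values, acc) =>
      for (v <- values; rest <- acc) yield v +: rest
    }

  /** Scores every combination and returns the best one with its score, if any exists. */
  def argMax[A](candidates: Seq[Seq[A]])(score: Seq[A] => Double): Option[(Seq[A], Double)] = {
    val scored = combinations(candidates).map(c => (c, score(c)))
    if (scored.isEmpty) None else Some(scored.maxBy(_._2))
  }
}
```

An arg min variant could reuse the same code by negating the score. The cost grows with the product of the candidate list sizes, which is why the issue calls the approach naive.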

Allow VW multiclass to specify class mapping

Allow VW multi-class predictions to specify a class mapping. This will enable the models to output different prediction class types.

In dataset code

  • Array will be optional and default to Int if not supplied
  • Determine type of classes array based on JSON values
  • If the array is provided and of mixed type, throw an Exception
  • Types of classes supported: Boolean, Long, Double, String
  • classLookup(cbAction).toIndex + 1 (this will be a HashMap constructed from the array)

Make sure to reuse code for determining type of JSON array / prediction class type
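
A minimal sketch of the dataset-side steps listed above, assuming the JSON array has already been parsed into plain Scala values. ClassLookupSketch, classType and vwLabels are hypothetical names, and Seq[Any] is a stand-in for whatever JSON AST the real code uses.

```scala
// Hypothetical sketch; Seq[Any] stands in for an already-parsed JSON array.
object ClassLookupSketch {
  sealed trait ClassType
  case object BooleanClasses extends ClassType
  case object LongClasses extends ClassType
  case object DoubleClasses extends ClassType
  case object StringClasses extends ClassType

  /** Maps one parsed value to its class type; unsupported values are rejected. */
  private def typeOf(v: Any): ClassType = v match {
    case _: Boolean => BooleanClasses
    case _: Long    => LongClasses
    case _: Double  => DoubleClasses
    case _: String  => StringClasses
    case other      => throw new IllegalArgumentException(s"Unsupported class value: $other")
  }

  /** Returns the single type shared by all values; throws on an empty or mixed array. */
  def classType(values: Seq[Any]): ClassType =
    values.map(typeOf).distinct match {
      case Seq(t) => t
      case _      => throw new IllegalArgumentException("classes array is empty or of mixed type")
    }

  /** Lookup from class value to the 1-based label VW expects (index + 1). */
  def vwLabels[A](classes: Seq[A]): Map[A, Int] =
    classes.zipWithIndex.map { case (c, i) => c -> (i + 1) }.toMap
}
```

The case where the array is absent (defaulting to Int) is left out here, since it is decided before any values exist to inspect.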

In model code

  • type T = typeOf[class lookup]
  • Double => Int => T => B, where B is the model output type
// If the prediction is not an Int 
// (doubleVal.isValidInt), throw exception at predict time
// throw vs return error?
val classIndex: Double => Int = (vwPred: Double) => {
  if (!vwPred.isValidInt) throw new Exception("...")
  vwPred.toInt - 1
}
val classLookup: Int => T = 
  (vwPredClass: Int) =>  lookup(vwPredClass)
val tToB: T => B = TypeCoercion[T, B] getOrElse {
  throw new DeserializationException(
    s"Couldn't find conversion to ${RefInfoOps.toString[B]}"
  )
}
val finalizerFn: Double => B = 
  classIndex andThen classLookup andThen tToB

In VW CLI code

If it exists, copy classLookup in spec to classLookup in model.

VW model creation forces VFS2

When creating an Aloha VW model and embedding the native model as a base64-encoded string, VFS2 is forced. This should not be the case; the format should be specified by the caller.

As part of this also check the H2O branch to make sure the same thing isn't happening over there.

Ability to transform business rules into ModelDecisionTree

This would be a tool that ingests a number of business rules specified w.r.t. features in a dataset and a set of other models. It would return a model decision tree that encodes all those business rules and leads to either predetermined outcomes or to the other models specified.

Support action based features

A user should be able to specify features that are based on the actions. This is tricky as it requires the model to know all possible actions and represent these features differently.

Error handling page

Create a separate page describing aloha's philosophy on errors and how it deals with them. Include SemanticsUdfException, ErrorSwallowingModel and information about the Model API for apply, scoreAsEither and score.

Create an Option to Seq pimp

Create an easy, generic, fast way to convert Option[A] to Seq[(String, Double)]. This should NOT be an implicit conversion but rather a pimp on Option, implemented with a value class and an implicit method. An implicit method is chosen instead of an implicit value class because value classes can't be embedded in a trait, and the pimp needs to live in a stackable trait architecture: we want to update com.eharmony.aloha.feature.BasicFunctions so that the standard import com.eharmony.aloha.feature.BasicFunctions._ can be used to get the new functionality.

The value class will go in the package object for the regression model.

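package com.eharmony.aloha.models

// A package object must have a simple name, so the enclosing package is declared first.
package object reg {
  // Value class: wraps the Option without allocating a wrapper at runtime.
  final class OptToKv[A](val a: Option[A]) extends AnyVal {
    def toKv(implicit f: A => Double): Seq[(String, Double)] = 
      if (a.nonEmpty) ("", f(a.get)) :: Nil else Nil 
  }
}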

The implicit conversion will go in RegressionModelValueToTupleConversions.

implicit def toKv[A](a: Option[A]): OptToKv[A] = new OptToKv(a)

Then, specifications like the following can be written in Aloha models:

"features": {
  "height_cm": "Option(180).toKv",
  "weight_lbs": "Option(${profile.weight}).filter(_ <= 200).toKv"
}

This would create the regression model features for someone over 200 lbs as:

IndexedSeq(
  List(("height_cm", 180.0)),
  Nil
)

This is exactly what we want. Don't screw up type inference but make the conversion fast and pretty painless for the model authors.

Aloha VW JNI model is no longer serializable

I'm trying to serialize the Aloha VW JNI model after updating to the latest code and I'm seeing the following exception:

java.io.NotSerializableException: org.apache.commons.vfs2.provider.local.LocalFile

This is being triggered by a call to SparkContext.broadcast of an Aloha VW JNI model. After looking at the code, the culprit appears to be the vwModel object. Can this please be changed so that it is serializable again?

h2o docs

Update model JSON formats with JSON specifics for the h2o model.

Make auditor output type

Remove the notion of a ScoreProto and replace that with a recursive data type that allows representation of all the subscores.

Create a constant time decision tree selector

Create a JSON entity and matching Selector type that allows the JSON author to provide a function that returns the index of the child to traverse. This may look something like:

{
  "type": "constant",
  "selector": "${reg.hour_of_day}",
  "children": [ 3, 4, 5 ]
}
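
On the Scala side, a selector of this kind might look roughly like the sketch below. ConstantTimeSelectorSketch and IndexSelector are hypothetical names, not existing Aloha types, and returning None for an out-of-range index is an assumption about how errors would be surfaced.

```scala
// Hypothetical sketch; not an existing Aloha Selector type.
object ConstantTimeSelectorSketch {
  /**
   * Picks the child at the position returned by `index`.
   * With an indexed collection such as Vector, the lookup does not scan the children.
   */
  final case class IndexSelector[A, C](children: IndexedSeq[C], index: A => Int) {
    /** Returns None when the computed index is negative or past the last child. */
    def select(a: A): Option[C] = children.lift(index(a))
  }
}
```

For example, IndexSelector[Int, Int](Vector(3, 4, 5), hour => hour / 8) would route hours 0 to 7 to child 3, hours 8 to 15 to child 4, and hours 16 to 23 to child 5. The division by 8 is only an illustration of an index function.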

Integrate Aloha with R

Add the ability to build R Aloha models. This should have all the same support that VW currently has.

Automatic Salts

I recently ran into an issue where the random node selector wasn't working because I was using the same features at two different levels of multiple model decision trees. This process is avoidable if Aloha takes care of the salts automatically.

This can be done by having a flag which defaults to true called automaticSalts. If this is set to true, then Aloha will append the depth in the model decision tree to the set of features for the hash. This will prevent the same set of features from being used twice.

This feature should be extended to CategoricalDistribution models, and any other place randomization is needed.

In order for this to work, something about the model should be included in the salt. An idea that came up is a hash of the JSON.
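
One way the salt could be derived is sketched below, assuming the model's JSON text and the node depth are both available where the hash is computed. AutomaticSaltSketch, salt and saltedHash are hypothetical names; the use of MurmurHash3 from the Scala standard library is a choice made for this sketch, not necessarily what Aloha uses.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch; the hashing scheme is an assumption, not Aloha's implementation.
object AutomaticSaltSketch {
  /** Salt combining a hash of the model JSON with the depth in the decision tree. */
  def salt(modelJson: String, depth: Int): Int =
    MurmurHash3.orderedHash(Seq(MurmurHash3.stringHash(modelJson), depth), 0)

  /** Hashes the feature values using the automatic salt as the seed. */
  def saltedHash(features: Seq[Any], modelJson: String, depth: Int): Int =
    MurmurHash3.orderedHash(features, salt(modelJson, depth))
}
```

With this arrangement the same features hashed at two different depths, or in two different models, start from different seeds, so the random node selector no longer sees identical hashes.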

Fix Bag of Words

The bag function in com.eharmony.aloha.feature.BagOfWords doesn't sum the weights associated with a duplicate key. Essentially, it's doing the map phase but not the reduce phase of a word count.

Need to aggregate these values.

Also check skipGrams and nGrams for similar problems.
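
The missing reduce step could be sketched as follows. BagSketch and sumByKey are hypothetical names; this is not the actual BagOfWords code, only an illustration of aggregating weights per key.

```scala
// Hypothetical sketch of the missing aggregation; not the actual BagOfWords code.
object BagSketch {
  /** Sums the weights of duplicate keys, i.e. the reduce phase of a word count. */
  def sumByKey(pairs: Seq[(String, Double)]): Map[String, Double] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
}
```

For instance, the pairs ("a", 1.0), ("b", 1.0), ("a", 1.0) would aggregate to a -> 2.0 and b -> 1.0.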

Simplify dataset error messages

Make it something like:

Couldn't create dataset.  Errors: 
    0)  CsvRowCreator.Producer: Object is missing required member 'imports'
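
A message in this shape could be produced from a list of error strings with something like the sketch below. DatasetErrorSketch and format are hypothetical names, and how the individual error strings are collected is not covered.

```scala
// Hypothetical sketch; only the layout of the message is illustrated.
object DatasetErrorSketch {
  /** Renders a header followed by one zero-indexed, indented line per error. */
  def format(errors: Seq[String]): String =
    errors.zipWithIndex
      .map { case (e, i) => s"    $i)  $e" }
      .mkString("Couldn't create dataset.  Errors: \n", "\n", "")
}
```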

Model Descriptions

Add a detailed description of every model type and how to construct them by hand.

Alternative site deploy plugin

Currently, the com.github.github maven-site-plugin is slow because each resource is uploaded individually and the uploads are rate-limited. Look for an alternative plugin that uploads all at once.

For instance: mojohaus

Integrate Aloha with H2O

Add the ability to build H2O Aloha models. This should have all the same support that VW currently has.

aloha-cli doesn't handle quoted strings correctly

The aloha-cli bash script doesn't process quoted strings correctly. Quoted strings with spaces are exploded into multiple arguments.

Only the shell script needs to be fixed, though unit tests might have to be added.

VwJniModel Serialization is broken

VwJniModel serialization only works when the model is serialized and deserialized on the same machine. This appears to be because the serialization only captures the VW model file's location and not the contents of the file.

Model Serialization

Some types of models are not serializable and therefore cannot be used with Spark. The one I see is: Exception in thread "main" java.io.NotSerializableException: com.eharmony.aloha.util.rand.HashedCategoricalDistribution.

For this ticket please go through all model types and ensure they are all serializable.

more descriptive cli error messages

I was working with Aloha and receiving the below error every now and then after altering the spec file. A more descriptive error message would be helpful.

" Couldn't create writer for dataset type csv. Error: No applicable producer found. Given Producer"
