Git Product home page Git Product logo

hoeffdingtree's Introduction

HoeffdingTree

Implementation of algorithms for incremental classification [1,2] and regression [3,6] tree learning from time-changing data streams. See [8] for detailed description of the main ideas behind these algorithms.

As of this writing this is still work in progress. The plan is to integrate this into QMiner soon. (Note that this code requires GLib.)

Note: If you'll try to build this, note that GLib uses v0.10 version of libuv; so run git clone -b v0.10 https://github.com/joyent/libuv.git in case you need to build it

Note: We started integrating this code into QMiner.

Note: I will no longer maintain this repository. As of now the learner will be availble through QMiner. I might clean up the code so others can use it, but I believe it will be much, much nicer when exposed through QMiner Javascript API. It will be easy to use, assume zero C++ knowledge, require no recompilation, etc. See QMiner documentation for details.

Note: Check out incremental classification and regression tree learning in QMiner here. User code is in src/ht.js. (This is a working version on my fork. For the stable one go here.)

Simple usage example

Configuration file describes data stream. Below is a simple config example.

dataFormat: (status,age,sex,survived)
status: discrete(first,second,third,crew)
age: discrete(adult,child)
sex: discrete(male,female)
survived: discrete(yes,no)

Future version of the HoeffdingTree --- to be available in QMiner --- will accept JSON configuration.

{
	"dataFormat": ["status", "sex", "age", "survived"],
	"sex": {
		"type": "discrete",
		"values": ["male", "female"]
	},
	"status": {
		"type": "discrete",
		"values": ["first", "second", "third", "crew"]
	},
	"age": {
		"type": "discete",
		"values": ["child", "adult"]
	},
	"survived": {
		"type": "discrete",
		"values": ["yes", "no"]
	}
}

The goal is to expose it through QMiner Javascript API. (This will happen the next few days.)

Simple usage example.

// ... 
int main(int argc, char** argv) {
	// create environment
	Env = TEnv(argc, argv, TNotify::StdNotify);
	// pass some parameters 
	const bool ConceptDriftP = Env.IsArgStr("drift"); // use version that handles concept-drift? 
	const double SplitConfidence = Env.GetIfArgPrefixFlt("-splitConfidence:", 1e-6, "Split confidence"); // 1e-6 
	const double TieBreaking = Env.GetIfArgPrefixFlt("-tieBreaking:", 0.01, "Tie breaking"); // 1e-2 
	const TStr ConfigFNm = Env.GetIfArgPrefixStr("-config:", "titanic.config", "Config file");
	const TStr AttrHeuristic= Env.GetIfArgPrefixStr("-attrEval:", "InfoGain", "Attribute evaluation heuristic");
	const int GracePeriod = Env.GetIfArgPrefixInt("-gracePeriod:", 300, "Grace period"); // 3e2 
	const int DriftCheck = Env.GetIfArgPrefixInt("-driftCheck:", 10000, "Drift check"); // 1e4 
	const int WindowSize = Env.GetIfArgPrefixInt("-windowSize:", 100000, "Window size"); // 1e5 
	
	// usage example 
	PHoeffdingTree ht = THoeffdingTree::New("docs/" + ConfigFNm, GracePeriod, SplitConfidence, TieBreaking,
		DriftCheck, WindowSize);
	ht->SetAdaptive(ConceptDriftP);
	ProcessData("data/titanic-220M.dat", ht);
	ht->Export("exports/titanic-"+TInt(ht->ExportN).GetStr()+".gv", TExportType::DOT);
	return 0;
}
// process data line-by-line 
void ProcessData(const TStr& FileNm, PHoeffdingTree HoeffdingTree) {
	Assert(TFile::Exists(FileNm));
	TFIn FIn(FileNm);
	TStr Line;
	while (FIn.GetNextLn(Line)) { HoeffdingTree->Process(Line, ','); }
}

References

hoeffdingtree's People

Contributors

blazs avatar

Stargazers

Afaf Alwan avatar Qifan Wang avatar

Watchers

James Cloos avatar Joanne Sun avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.