
elki-project / elki

779 stars · 56 watchers · 321 forks · 56.05 MB

ELKI Data Mining Toolkit

Home Page: https://elki-project.github.io/

License: GNU Affero General Public License v3.0

Java 99.85% Python 0.10% GLSL 0.02% Batchfile 0.02% Shell 0.01%
data-mining java machine-learning clustering outlier-detection anomalydetection visualization data-mining-algorithms indexing index

elki's Issues

No data type found satisfying: NumberVector,field AND NumberVector,variable

Hi everyone,

I'm currently trying to use ELKI for clustering some rather big data... or better said, I'd like to use it, if it would let me.
I've used it before, where it worked, but now something is going wrong.
I've cut the data down to a file consisting of 10 rows with 7 columns of float numbers (attached,
absolute_counts_per_contig.csv.percentages_per_row.head_10_columns_7.txt), and I also used the latest ELKI version (just cloned it a minute ago), and I still get an error.

The command + error is the following:

14:30:44 bastian@computer:~$ java -Xmx11G -jar /exports/mm-hpc/bacteriologie/bastian/tools/elki/elki-bundle-0.7.2-SNAPSHOT.jar KDDCLIApplication -dbc.in /exports/mm-hpc/bacteriologie/bastian/data/absolute_counts_per_contig.csv.percentages_per_row.head_10_columns_7.csv -out /exports/mm-hpc/bacteriologie/bastian/data/elki_results/kmeans_perc_per_row_maxiter_10000/2/ -algorithm clustering.kmeans.KMeansLloyd -kmeans.k 2 -kmeans.maxiter 10000   -evaluator clustering.internal.EvaluateDaviesBouldin,clustering.internal.EvaluatePBMIndex,clustering.internal.EvaluateSquaredErrors,clustering.internal.EvaluateVarianceRatioCriteria,clustering.internal.EvaluateSimplifiedSilhouette -parser.colsep \\t -resulthandler ResultWriter 
No data type found satisfying: NumberVector,field AND NumberVector,variable
Available types: DBID DoubleVector,variable,mindim=5,maxdim=7 LabelList
de.lmu.ifi.dbs.elki.data.type.NoSupportedDataTypeException: No data type found satisfying: NumberVector,field AND NumberVector,variable
Available types: DBID DoubleVector,variable,mindim=5,maxdim=7 LabelList
        at de.lmu.ifi.dbs.elki.database.AbstractDatabase.getRelation(AbstractDatabase.java:123)
        at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:79)
        at de.lmu.ifi.dbs.elki.workflow.AlgorithmStep.runAlgorithms(AlgorithmStep.java:100)
        at de.lmu.ifi.dbs.elki.KDDTask.run(KDDTask.java:109)
        at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.run(KDDCLIApplication.java:58)
        at de.lmu.ifi.dbs.elki.application.AbstractApplication.runCLIApplication(AbstractApplication.java:184)
        at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.main(KDDCLIApplication.java:93)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at de.lmu.ifi.dbs.elki.application.ELKILauncher.main(ELKILauncher.java:77)

It looks like there is some issue with parsing the columns, but I really cannot see it; all columns in all rows have values.
Any advice on what could be going wrong?

Thanks,
Bastian

Extracting hierarchy from DBSCAN

I use ELKI within my Java code and have been trying to export the cluster hierarchy generated with HDBSCAN; however, this just results in a single root cluster with the child clusters all being leaves.

In order to "fix" this I changed the collectChildren method in the HDBSCANHierarchyExtraction class.
Replacing
collectChildren(temp, clustering, child, clus, flatten);
with
finalizeCluster(child, clustering, clus, flatten);

This does seem to result in a proper hierarchy, although does return all clusters (including those with fewer than minPts data points). However, my understanding of the code is not enough to know whether this is in any way sensible or correct.

I use the following code to output the hierarchy:

{
        ...create clustering code...

        Relation<NumberVector> coords = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD_2D);
        List<Cluster<DendrogramModel>> topClusters = clustering.getToplevelClusters();
        Hierarchy<Cluster<DendrogramModel>> hierarchy = clustering.getClusterHierarchy();

        for (Cluster<DendrogramModel> cluster : topClusters) {
            System.out.println("---------------------------------");
            outputHierarchy(cluster, hierarchy, coords, "");
        }
}

private static void outputHierarchy(Cluster<DendrogramModel> cluster,
                                    Hierarchy<Cluster<DendrogramModel>> hierarchy,
                                    Relation<NumberVector> coords,
                                    String indent) {
    final DBIDs ids = cluster.getIDs();
    DendrogramModel model = cluster.getModel();
    System.out.format("%s%s: %d : %.3f%n", indent, cluster.getName(), ids.size(), model.getDistance());
    if (!ids.isEmpty()) {
        System.out.print(indent);
        for (DBIDIter iter = ids.iter(); iter.valid(); iter.advance()) {
            System.out.print(Arrays.toString(coords.get(iter).toArray()));
        }
        System.out.println();
    }
    if (hierarchy != null) {
        if (hierarchy.numChildren(cluster) > 0) {
            for (It<Cluster<DendrogramModel>> iter = hierarchy.iterChildren(cluster); iter.valid();
                 iter.advance()) {
                outputHierarchy(iter.get(), hierarchy, coords, indent + "  ");
            }
        }
    }
}

MaximumMatchingAccuracy Index out of Bounds Exception

Hey,
I'm new to ELKI and might be doing something wrong.
When I run:
for k in $( seq 3 40 ); do java -jar elki-bundle-0.7.6-SNAPSHOT.jar KDDCLIApplication -dbc.in data/synthetic/Vorlesung/mouse.csv -algorithm clustering.kmeans.LloydKMeans -kmeans.k $k -resulthandler ResultWriter -out.gzip -out output/k-$k ; done
I get a lot of:

Index 6 out of bounds for length 6
java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds for length 6
at elki.evaluation.clustering.MaximumMatchingAccuracy.(MaximumMatchingAccuracy.java:69)
at elki.evaluation.clustering.ClusterContingencyTable.getMaximumMatchingAccuracy(ClusterContingencyTable.java:246)
at elki.evaluation.clustering.EvaluateClustering$ScoreResult.(EvaluateClustering.java:245)
at elki.evaluation.clustering.EvaluateClustering.evaluteResult(EvaluateClustering.java:173)
at elki.evaluation.clustering.EvaluateClustering.processNewResult(EvaluateClustering.java:159)
at elki.evaluation.AutomaticEvaluation.autoEvaluateClusterings(AutomaticEvaluation.java:148)
at elki.evaluation.AutomaticEvaluation.processNewResult(AutomaticEvaluation.java:67)
at elki.workflow.EvaluationStep$Evaluation.update(EvaluationStep.java:106)
at elki.workflow.EvaluationStep$Evaluation.(EvaluationStep.java:95)
at elki.workflow.EvaluationStep.runEvaluators(EvaluationStep.java:72)
at elki.KDDTask.run(KDDTask.java:109)
at elki.application.KDDCLIApplication.run(KDDCLIApplication.java:58)
at elki.application.AbstractApplication.runCLIApplication(AbstractApplication.java:175)
at elki.application.KDDCLIApplication.main(KDDCLIApplication.java:91)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at elki.application.ELKILauncher.main(ELKILauncher.java:80)

If I do
for k in $(seq 3 6) it works. When I go from 6 to 7, or up to 40 as above, it starts to throw the exceptions.

Timeout on instantiating de.lmu.ifi.dbs.elki.gui.util.TreePopup

I have tried to build v0.7.1 on OS X using Java 1.8.0_74 and Maven 3.3.9, and got the error in the subject. What does it depend on?

Here is the trace:

[DEBUG] Executing command line: [java, -cp, /Users/me/Downloads/elki-release0.7.1/elki/target/classes:/Users/me/.m2/repository/net/sf/trove4j/trove4j/3.0.3/trove4j-3.0.3.jar, de.lmu.ifi.dbs.elki.application.internal.DocumentParameters, /Users/me/Downloads/elki-release0.7.1/elki/target/apidocs/parameters-byclass.html, /Users/me/Downloads/elki-release0.7.1/elki/target/apidocs/parameters-byopt.html]
Timeout on instantiating de.lmu.ifi.dbs.elki.gui.util.TreePopup
java.util.concurrent.TimeoutException
java.lang.RuntimeException: java.util.concurrent.TimeoutException
    at de.lmu.ifi.dbs.elki.application.internal.DocumentParameters.buildParameterIndex(DocumentParameters.java:317)
    at de.lmu.ifi.dbs.elki.application.internal.DocumentParameters.main(DocumentParameters.java:149)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at de.lmu.ifi.dbs.elki.application.internal.DocumentParameters.buildParameterIndex(DocumentParameters.java:312)
    at de.lmu.ifi.dbs.elki.application.internal.DocumentParameters.main(DocumentParameters.java:149)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] ELKI Data Mining Framework - Parent Project ........ SUCCESS [  3.659 s]
[INFO] ELKI Data Mining Framework ......................... FAILURE [01:31 min]
[INFO] ELKI Data Mining Framework - Batik Visualization ... SKIPPED
[INFO] ELKI Data Mining Framework - Tutorial Algorithms ... SKIPPED
[INFO] ELKI Data Mining Framework - LibSVM based extensions SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:35 min
[INFO] Finished at: 2016-04-15T11:53:41+02:00
[INFO] Final Memory: 36M/682M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (generate-javadoc-parameters) on project elki: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (generate-javadoc-parameters) on project elki: Command execution failed.
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution failed.
    at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
    ... 20 more
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:746)
    at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:292)
    ... 22 more
[ERROR] 
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :elki

How to fix: No 'by label' reference outlier found, which is needed for weighting!

I'm trying to visualize an R-tree, but I am getting an error:

Task failed
de.lmu.ifi.dbs.elki.utilities.exceptions.AbortException: No 'by label' reference outlier found, which is needed for weighting!
	at de.lmu.ifi.dbs.elki.application.greedyensemble.VisualizePairwiseGainMatrix.run(VisualizePairwiseGainMatrix.java:140)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI$2.doInBackground(MiniGUI.java:600)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI$2.doInBackground(MiniGUI.java:591)
	at javax.swing.SwingWorker$1.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at javax.swing.SwingWorker.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

[screenshot]

I tried adding a field called bylabel:

[screenshot]

[doc] missing variable c

In the documentation:

// Relation containing the number vectors:
Relation<NumberVector> rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
// We know that the ids must be a continuous range:
DBIDRange ids = (DBIDRange) rel.getDBIDs();

int i = 0;
for(Cluster<KMeansModel> clu : c.getAllClusters()) {

The variable c used in c.getAllClusters() is undefined.
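For readers hitting this: the snippet only compiles if c is a Clustering<KMeansModel> produced earlier, e.g. by running k-means on the database. Below is a minimal hedged sketch of that missing step, reusing the ELKIBuilder pattern that appears elsewhere on this page; the class name KMeansLloyd and the parameter constant KMeans.K_ID are assumptions about the 0.7.x API and may need adjusting to your version.

// Hedged sketch of the missing step: obtain the clustering "c" by running k-means on db.
// KMeansLloyd and KMeans.K_ID are assumed names from the 0.7.x API line.
Clustering<KMeansModel> c = new ELKIBuilder<>(KMeansLloyd.class) //
    .with(KMeans.K_ID, 3) // example: three clusters
    .build().run(db);

With such a c in scope, the loop over c.getAllClusters() in the documentation snippet works as written.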

Imprecise variance calculation in MeanVariance.java

Hi,

I ported some of your numerically stable methods to C++ and C# and noticed some weird results.

Consider the following series:

x = [
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47,
            150494407424305.47
]

Since x is constant the variance should be zero or very close to zero. However, using the MeanVariance class, I get:

naive variance = 5.425347222222222e-05
sample variance = 5.9185606060606055e-05

When the series values are low:

x = [
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47,
            305.47
]

then the precision is good:

naive variance = 6.788729774740758e-28
sample variance = 7.405887026989918e-28

Unfortunately I can't test the original Java code, as I don't have experience with Java and my attempt at building ELKI failed, but I think this should be fairly easy to confirm. My code is basically identical to this.

I realize the values are big, but they are not that big. The same issue concerning variance is also present in the PearsonCorrelation class.
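As a side note for anyone reproducing this: the classic illustration of the effect is the difference between the textbook E[X^2] - E[X]^2 formula, which cancels catastrophically for large, nearly constant values, and Welford's online update, which stays at zero for constant input. The standalone demo below only illustrates that contrast on the series above; it is not the ELKI MeanVariance code itself.

// Standalone illustration (plain Java, not ELKI): naive vs. Welford variance
// on the constant series reported above.
public class VarianceDemo {
  public static void main(String[] args) {
    double[] x = new double[12];
    java.util.Arrays.fill(x, 150494407424305.47);

    // Textbook formula E[X^2] - E[X]^2: subject to catastrophic cancellation;
    // the absolute error can be on the order of the rounding error of x*x.
    double sum = 0, sumsq = 0;
    for(double v : x) { sum += v; sumsq += v * v; }
    double naive = sumsq / x.length - (sum / x.length) * (sum / x.length);

    // Welford's online algorithm: for constant input, m2 stays exactly 0.
    double mean = 0, m2 = 0;
    int n = 0;
    for(double v : x) {
      n++;
      double delta = v - mean;
      mean += delta / n;
      m2 += delta * (v - mean);
    }
    System.out.println("naive population variance:   " + naive);
    System.out.println("Welford population variance: " + (m2 / n));
  }
}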

Null Pointer dereference in PAMInitialMeans.java

Geodetic mindist

Hello,

I was reading your paper "Geodetic Distance Queries on R-Trees for Indexing Geographic Data" and I wasn't sure what you meant by all those subscript-360 functions, so I checked your code defined here: https://github.com/elki-project/elki/blob/e8f3c6fdf54e1e0aa8444b94ad5374ad518dfc0c/elki-core-math/src/main/java/de/lmu/ifi/dbs/elki/math/geodesy/SphereUtil.java, and it doesn't follow the pseudocode. For instance, lines 806-813:

// Determine whether going east or west is shorter.
    double lngE = rminlng - plng;
    if(lngE < 0) {
      lngE += MathUtil.TWOPI;
    }
    double lngW = rmaxlng - plng; // we keep this negative!
    if(lngW > 0) {
      lngW -= MathUtil.TWOPI;
}

where I think you meant to do

//mod360(rminlng-plng) <= mod360(plng-rmaxlng)
// Determine whether going east or west is shorter.
        double lngW = fmod(rminlng - plng, 2 * M_PI);
        if (lngW < 0)
        {
            lngW += 2 * M_PI;
        }
        double lngE = fmod(plng - rmaxlng, 2 * M_PI);
        if (lngE < 0)
        {
            lngE += 2 * M_PI;
}

Also, my theory is that the 360 subscript means we are working in radians, not degrees. In fact, when I followed the pseudocode in your paper exactly, it worked for me in C++, but I am still not sure what you meant by the sentence "In order to distinguish the other cases, we first need to test whether we are on the left or on the right side by rotating the mean longitude of the rectangle by 180°, the meridian opposite of the rectangle." Do I understand correctly that you wanted to do modulo 360 (2 pi) to normalize the angle to the range 0-360 [0..2 pi]?

Also, I have found out that the haversine formula should use the absolute difference of latitudes and longitudes. Correct me if I am wrong, but if you don't use absolute values there, you may get negative distances. I attach my version of the algorithm, including the haversine formula:

static const double wgs84_radius_m = 6378137;
static const double wgs84_flattening = 1.0 / 298.257223563;
static const double earth_radius_m = wgs84_radius_m * (1 - wgs84_flattening);
static const double earth_radius_km = earth_radius_m / 1000.0;
 struct WGS84Mindist
{
    //input is in radians
    //returns distance in kilometers
    static double haversineFormulaRad(double lat1, double lon1, double lat2, double lon2)
    {
        double d_lat = abs(lat1 - lat2);
        double d_lon = abs(lon1 - lon2);

        double a = pow(sin(d_lat / 2), 2) + cos(lat1) * cos(lat2) * pow(sin(d_lon / 2), 2);

        //double d_sigma = 2 * atan2(sqrt(a), sqrt(1 - a));
        double d_sigma = 2 * asin(sqrt(a));

        return earth_radius_km * d_sigma;
    }

    //input is in radians
    //returns angular distance on unit sphere
    static double haversineFormulaRadAngular(double lat1, double lon1, double lat2, double lon2)
    {
        double d_lat = abs(lat1 - lat2);
        double d_lon = abs(lon1 - lon2);

        double a = pow(sin(d_lat / 2), 2) + cos(lat1) * cos(lat2) * pow(sin(d_lon / 2), 2);

        //double d_sigma = 2 * atan2(sqrt(a), sqrt(1 - a));
        double d_sigma = 2 * asin(sqrt(a));

        return d_sigma;
    }

    //input is in radians
    //returns angle in radians
    static double getBearingRad(double lat1, double lon1, double lat2, double lon2)
    {
        double dLon = lon2 - lon1;
        double y = sin(dLon) * cos(lat2);
        double x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dLon);
        double radiansBearing = atan2(y, x);

        return radiansBearing;
    }

    //start, end, point - coordinates in radians
    static double getCrossTrackDistanceRad(double lat1, double lon1, double lat2, double lon2, double lat3, double lon3)
    {
        double angDist1Q = haversineFormulaRadAngular(lat1, lon1, lat3, lon3);

        double cos_lat1 = cos(lat1);
        double sin_lat1 = sin(lat1);
        double cos_lat3 = cos(lat3);
        double cos_lat2 = cos(lat2);

        //double theta1Q = getBearing(lat1, lon1, lat3, lon3);
        // double dLon = lon3 - lon1;
        // double y = sin(dLon) * cos(lat3);
        // double x = cos(lat1) * sin(lat3) - sin(lat1) * cos(lat3) * cos(dLon);
        // double radiansBearing = atan2(y, x);
        double dLon1 = lon3 - lon1;
        double y1 = sin(dLon1) * cos_lat3;
        double x1 = cos_lat1 * sin(lat3) - sin_lat1 * cos_lat3 * cos(dLon1);
        double theta1Q = atan2(y1, x1);

        //double theta12 = getBearing(lat1, lon1, lat2, lon2);
        // double dLon = lon2 - lon1;
        // double y = sin(dLon) * cos(lat2);
        // double x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dLon);
        // double radiansBearing = atan2(y, x);
        double dLon2 = lon2 - lon1;
        double y2 = sin(dLon2) * cos_lat2;
        double x2 = cos_lat1 * sin(lat2) - sin_lat1 * cos_lat2 * cos(dLon2);
        double theta12 = atan2(y2, x2);

        return asin(sin(angDist1Q) * sin(theta1Q - theta12)) * earth_radius_km;
    }

    static double latlngMinDistDeg(double &plat, double &plng, double &rminlat, double &rminlng, double &rmaxlat, double &rmaxlng)
    {
        return latlngMinDistRad(GeoDistance::deg2rad(plat), GeoDistance::deg2rad(plng),
                                GeoDistance::deg2rad(rminlat), GeoDistance::deg2rad(rminlng),
                                GeoDistance::deg2rad(rmaxlat), GeoDistance::deg2rad(rmaxlng));
    }

    //returns distance in kilometers
    static double latlngMinDistRad(double plat, double plng, double rminlat, double rminlng, double rmaxlat, double rmaxlng)
    {
        // The simplest case is when the query point is in the same "slice":
        if (rminlng <= plng && plng <= rmaxlng)
        {
            if (plat < rminlat)
            {
                return (rminlat - plat) * earth_radius_km; //South of MBR
            }
            else if (plat > rmaxlat)
            {
                return (plat - rmaxlat) * earth_radius_km; //North of MBR
            }
            return 0.0; // INSIDE MBR
        }

        // Determine whether going east or west is shorter.
        double lngW = fmod(rminlng - plng, 2 * M_PI);
        if (lngW < 0)
        {
            lngW += 2 * M_PI;
        }
        double lngE = fmod(plng - rmaxlng, 2 * M_PI);
        if (lngE < 0)
        {
            lngE += 2 * M_PI;
        }

        if (lngW <= lngE)
        {
            // West of MBR
            double tau = tan(plat);

            if (lngW >= 0.5 * M_PI) // Large delta of longitude
            {
                if (tau <= tan((rmaxlat + rminlat) * 0.5) * cos(rminlng - plng))
                {
                    return haversineFormulaRad(plat, plng, rmaxlat, rminlng); //North-West
                }
                else
                {
                    return haversineFormulaRad(plat, plng, rminlat, rminlng); //South-West
                }
            }

            if (tau >= tan(rmaxlat) * cos(rminlng - plng))
            {
                return haversineFormulaRad(plat, plng, rmaxlat, rminlng); //North-West
            }

            if (tau <= tan(rminlat) * cos(rminlng - plng))
            {
                return haversineFormulaRad(plat, plng, rminlat, rminlng); //South-West
            }

            return abs(getCrossTrackDistanceRad(rminlat, rminlng, rmaxlat, rminlng, plat, plng)); // West
        }     
        else
        {
             // East of MBR
             double tau = tan(plat);

             if (lngE >= 0.5 * M_PI) // Large delta of longitude
             {
                 if (tau <= tan((rmaxlat + rminlat) * 0.5) * cos(rmaxlng - plng))
                 {
                    return haversineFormulaRad(plat, plng, rmaxlat, rmaxlng); //North-East
                 }
                 else
                 {
                    return haversineFormulaRad(plat, plng, rminlat, rmaxlng); //South-East
                 }
             }

             if (tau >= tan(rmaxlat) * cos(rmaxlng - plng))
             {
                 return haversineFormulaRad(plat, plng, rmaxlat, rmaxlng); //North-East
             }

             if (tau <= tan(rminlat) * cos(rmaxlng - plng))
             {
                 return haversineFormulaRad(plat, plng, rminlat, rmaxlng); //South-East
             }

             return abs(getCrossTrackDistanceRad(rmaxlat, rmaxlng, rminlat, rmaxlng, plat, plng)); // East
        }
    }
};

The last thing I don't understand is why, in the pseudocode, you return c * (rminlat - plat)/360 for the N/S case, when I think it should be c * (rminlat - plat), because in the text you say we are calculating the length of a meridian arc, where c is the radius of the earth.

Thank you for your answer.

Use a @cite doclet to use inline citations in Javadoc

But as we are currently targeting JDK 8, and the new doclet API only arrived in JDK 9, it does not make sense to do this yet. The next long-term support Java version, 11, is scheduled for the end of September 2018.
So for ELKI 0.8 it is an option to target JDK 11 and use the new API then.

How can I get outliers from an OutlierResult?

I get an OutlierResult and scores. How can I judge whether an object is an outlier?
Scores:
1 1.017915661656192
2 1.0608605021777988
3 1.171651951509847
4 1.0359532383112164
5 0.9946463130241695
6 1.0021667682045214
7 1.0664994726755364
8 1.0163041670169992
9 1.0792733520499878
10 1.0654301407031426
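For what it's worth, LOF-style scores have no universal cutoff: values near 1 indicate inliers and clearly larger values indicate outliers, so a threshold (or a top-n selection) has to be chosen from the score distribution. A hedged sketch of thresholding, reusing only the accessors that appear elsewhere on this page (getScores(), iterDBIDs(), doubleValue()); the DoubleRelation type, the outlierResult variable, and the cutoff 1.2 are assumptions for illustration:

// Hedged sketch: count objects whose outlier score exceeds a user-chosen cutoff.
double threshold = 1.2; // example value only; pick it from your score distribution
DoubleRelation scores = outlierResult.getScores();
int flagged = 0;
for(DBIDIter it = scores.iterDBIDs(); it.valid(); it.advance()) {
  if(scores.doubleValue(it) > threshold) {
    flagged++; // treat this object as an outlier candidate
  }
}
System.out.println(flagged + " objects exceed the threshold " + threshold);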

Implementation of ODIN does not comply with the original definition

In the original ODIN paper, ODIN is defined as a global outlier detection approach. The ODIN outlier score is calculated as the indegree of an observation in the weighted kNN graph, where the edge weights are calculated from the distances between the particular observations.

The implementation in ELKI, however, uses the non-weighted kNN graph to calculate the indegree for the ODIN score (more precisely, all weights in the kNN graph are equal to 1/k). This change not only fails to comply with the original paper; it also makes ODIN a local outlier detection approach.

Source:
V. Hautamaki, I. Karkkainen and P. Franti, "Outlier detection using k-nearest neighbour graph," Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., Cambridge, 2004, pp. 430-433 Vol.3.
doi: 10.1109/ICPR.2004.1334558
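To make the difference concrete, here is a small standalone sketch (not the ELKI code) that computes both readings side by side: the unweighted indegree with 1/k per incoming edge, as described for the ELKI implementation above, and a distance-weighted indegree, as described for the paper. The toy data, the brute-force kNN search, and the choice of the raw distance as edge weight are all illustrative assumptions.

// Standalone illustration: unweighted vs. distance-weighted indegree in the kNN graph.
public class OdinIndegreeDemo {
  public static void main(String[] args) {
    double[][] data = { {0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10} };
    int k = 2, n = data.length;
    double[] unweighted = new double[n], weighted = new double[n];
    for(int p = 0; p < n; p++) {
      // find the k nearest neighbors of p (excluding p itself) by brute force
      Integer[] idx = new Integer[n];
      for(int i = 0; i < n; i++) idx[i] = i;
      final int pf = p;
      java.util.Arrays.sort(idx, (a, b) -> Double.compare(dist(data[pf], data[a]), dist(data[pf], data[b])));
      int taken = 0;
      for(int j = 0; j < n && taken < k; j++) {
        int q = idx[j];
        if(q == p) continue;
        unweighted[q] += 1.0 / k;              // edge p -> q counts 1/k (unweighted reading)
        weighted[q] += dist(data[p], data[q]); // edge p -> q weighted by distance (weighted reading)
        taken++;
      }
    }
    System.out.println("unweighted indegree: " + java.util.Arrays.toString(unweighted));
    System.out.println("weighted indegree:   " + java.util.Arrays.toString(weighted));
  }

  static double dist(double[] a, double[] b) {
    double dx = a[0] - b[0], dy = a[1] - b[1];
    return Math.sqrt(dx * dx + dy * dy);
  }
}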

Clusters of size 0 (with HDBSCAN/DeLiClu/OPTICS)

Perhaps this is related to #41 and therefore expected.

However, my (naive) understanding is that for some of these algorithms (HDBSCAN) a cluster is defined by the enclosed data points; therefore a zero-sized cluster is meaningless.

ParallelGeneralizedDBSCAN missing from latest release?

As the title suggests, I was wondering if it's expected that the parallelized DBSCAN implementation is not present in the release 0.7.1 jar file. The Javadocs on elki-project.github.io still mention it.

For reference:

$  jar tf elki-0.7.1.jar | grep -i dbscan | grep -i parallel | wc -l
       0

INFLO does not compute RNN correctly

The RNN computed is based only on the point's own neighbors and does not include reverse neighbors that are not among the current point's k nearest neighbors.

This is based on de.lmu.ifi.dbs.elki.algorithm.outlier.lof.INFLO#computeNeighborhoods.

As currently written, the RNN will always be a subset of the kNN.
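For reference, RkNN(q) is the set of points p that have q among their k nearest neighbors, so computing it needs a pass over everyone's kNN lists rather than only over kNN(q). A minimal standalone sketch of that inversion (plain Java over precomputed neighbor lists, not the ELKI data structures):

// Invert precomputed kNN lists into reverse-kNN lists: p is in RkNN(q) exactly when q is in kNN(p).
public class ReverseKNN {
  // knn[p] holds the indices of p's k nearest neighbors.
  public static java.util.List<java.util.List<Integer>> reverseKNN(int[][] knn) {
    java.util.List<java.util.List<Integer>> rknn = new java.util.ArrayList<>(knn.length);
    for(int i = 0; i < knn.length; i++) {
      rknn.add(new java.util.ArrayList<>());
    }
    for(int p = 0; p < knn.length; p++) {
      for(int q : knn[p]) {
        rknn.get(q).add(p); // q gains p as a reverse neighbor, even if p is not in kNN(q)
      }
    }
    return rknn;
  }
}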

Incorrect processing of column names in NumberVectorLabelParser#getTypeInformation()

When using NumberVectorLabelParser and supplying labelIndices, getTypeInformation stops after the desired number of column names has been reached, even though some columns have been skipped.
I used this data file: https://www.niss.org/sites/default/files/ScotchWhisky01.txt
I designated RowID and Distillery as label indices.
Before the change in PR #78 I see this (note the column names):
[screenshot]
After the change I see this:
[screenshot]

CLIQUE - Connected dense units

Inspecting the results of ELKI's CLIQUE clustering implementation revealed an issue in the implementation. The original paper (AGGR98), paragraph 2.1, states that two dense units are connected if they have a common face (they are identical in n-1 dimensions and neighboring in the remaining dimension).

According to this, the following output should be impossible:

# Cluster: cluster_1
# Cluster name: cluster_1
# Cluster noise flag: false
# Cluster size: 500
# Model class: de.lmu.ifi.dbs.elki.data.model.SubspaceModel
# Cluster Mean: 0.5058922207711539, 0.5997058865585326
# Subspace: Dimensions: [1]
# Coverage: 500
# Units: 
#    d1-[0.04; 0.33[    127 objects
#    d1-[0.33; 0.62[    207 objects
#    d1-[0.62; 0.92[    166 objects

In this case all dense units are neighboring in both of the dimensions, yet the current implementation considers them one cluster.
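For clarity, the AGGR98 condition quoted above can be checked directly on the grid coordinates of two dense units: they must agree in every dimension except one, and differ by exactly one cell in that remaining dimension. A small standalone sketch of that check (illustrative only, not the ELKI data structures):

// Two dense units, given as the grid cell index per dimension, share a common face
// iff they are identical in all but one dimension and adjacent (differ by 1) in that one.
public class CommonFace {
  public static boolean haveCommonFace(int[] unitA, int[] unitB) {
    int neighboring = 0;
    for(int d = 0; d < unitA.length; d++) {
      int diff = Math.abs(unitA[d] - unitB[d]);
      if(diff == 1) {
        neighboring++;   // neighboring in this dimension
      }
      else if(diff != 0) {
        return false;    // more than one cell apart in some dimension: no common face
      }
    }
    return neighboring == 1; // exactly one neighboring dimension, identical elsewhere
  }
}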

PAM Algorithm Cluster Centroids

I use the PAM algorithm in ELKI, where the axes represent coordinates. The visualization shows the axes, but I need the exact values of the centroids. Is it possible to compute the exact numbers?
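Since PAM uses medoids, the cluster "centroids" are actual data points, so their exact coordinates can be read back from the relation via the medoid id stored in the cluster model. A hedged sketch under the assumption that the model class is MedoidModel with a getMedoid() accessor (names recalled from the 0.7.x API; they may differ in your version):

// Hedged sketch: print the exact coordinates of each PAM medoid.
// "clustering" is assumed to be the Clustering<MedoidModel> returned by PAM,
// "relation" the data relation; MedoidModel/getMedoid() are API assumptions.
for(Cluster<MedoidModel> cluster : clustering.getAllClusters()) {
  DBID medoidID = cluster.getModel().getMedoid();
  NumberVector medoid = relation.get(medoidID);
  System.out.println(cluster.getName() + ": " + java.util.Arrays.toString(medoid.toArray()));
}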

Task failed java.lang.OutOfMemoryError: Java heap space

I am trying to cluster word2vec vectors which came from text documents. These are 15 decimal numbers. I tried using DBSCAN, FastOPTICS, etc.; however, I get the error below. Can anyone help me with this? I tried using parser-vector-type as SparseFloatVector, FloatVector, and the default one too, however I end up getting the error below every time.

Task failed
java.lang.OutOfMemoryError: Java heap space
	at gnu.trove.set.hash.TIntHashSet.rehash(TIntHashSet.java:410)
	at gnu.trove.impl.hash.THash.ensureCapacity(THash.java:175)
	at de.lmu.ifi.dbs.elki.database.ids.integer.TroveHashSetModifiableDBIDs.addDBIDs(TroveHashSetModifiableDBIDs.java:88)
	at de.lmu.ifi.dbs.elki.index.preprocessed.fastoptics.RandomProjectedNeighborsAndDensities.getNeighs(RandomProjectedNeighborsAndDensities.java:400)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.optics.FastOPTICS.run(FastOPTICS.java:159)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:91)
	at de.lmu.ifi.dbs.elki.workflow.AlgorithmStep.runAlgorithms(AlgorithmStep.java:105)
	at de.lmu.ifi.dbs.elki.KDDTask.run(KDDTask.java:112)
	at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.run(KDDCLIApplication.java:61)
	at [...]

Re-run of DeLiClu causes Exception

There seems to be a bug in the implementation: multiple runs of DeLiClu cause the exception "DeLiClu heap was empty when it shouldn't have been.".

This does not happen if the index is rebuilt at each iteration, and there is no issue using other OPTICS algorithms.

for (int minPoints : new int[]{5, 10, 15, 20}) {
    if (rebuildIndex) {
            db = new StaticArrayDatabase(dbc, Collections.singletonList(indexFactory));
            db.initialize();
            relations = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
    }

    clustering = new ELKIBuilder<>(OPTICSXi.class) //
       .with(DeLiClu.Parameterizer.MINPTS_ID, minPoints) //
       .with(OPTICSXi.Parameterizer.XI_ID, xi) //
       .with(OPTICSXi.Parameterizer.XIALG_ID, DeLiClu.class) //
       .build().run(db);
}

It seems that running DeLiClu may be altering the data index. I'm unsure if it's related, but using DeLiClu I can also get an ObjectNotFoundException with the following...

  for (Cluster<? extends Model> cluster : clustering.getAllClusters()) {
      for (DBIDIter it = cluster.getIDs().iter(); it.valid(); it.advance()) {
            try {
                double[] latlng = relations.get(it).toArray();
            }
            catch(ObjectNotFoundException e) {
                logger.error(e.getLocalizedMessage());
            }
        }
  }

Distance-based cluster evaluation algorithms will fail, if input numbers are too big

Hi everyone,

I'm currently trying to cluster a matrix, and did some back and forth on what I did.
The values in the matrix are pretty big, the biggest being 10e+300, and the matrix is also pretty dense.
I did clustering with k-means, which also produced results, but all internal cluster evaluation algorithms failed to produce anything.
This is a result from k-means with k=4:

Distance-based Davies Bouldin Index 0.0
Distance-based Density Based Clustering Validation NaN
Distance-based C-Index 1.0
Distance-based PBM-Index NaN
Distance-based Silhouette +-NaN NaN
Distance-based Simp. Silhouette +-NaN NaN
Distance-based Mean distance Infinity
Distance-based Sum of Squares Infinity
Distance-based RMSD Infinity
Distance-based Variance Ratio Criteria NaN
# Concordance
Concordance Gamma 0.9999772178605122
Concordance Tau 0.04571359658825246

In the meantime I do the clustering only on the exponents (so 10e+300 converts to 300), and I now get useful output.
So... no idea what is causing this, but I guess something should warn the user.
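For what it's worth, the Infinity and NaN values above are consistent with plain double overflow: squaring values near 1e300 exceeds Double.MAX_VALUE (about 1.8e308), and once a sum of squared distances becomes Infinity, differences and ratios of such sums become NaN. A tiny standalone illustration:

// Standalone illustration of double overflow producing Infinity and NaN.
public class OverflowDemo {
  public static void main(String[] args) {
    double big = 1e300;
    double squared = big * big;              // 1e600 exceeds Double.MAX_VALUE: Infinity
    double sumOfSquares = squared + squared; // still Infinity
    System.out.println(squared);                     // Infinity
    System.out.println(sumOfSquares - squared);      // Infinity - Infinity = NaN
    System.out.println(sumOfSquares / sumOfSquares); // Infinity / Infinity = NaN
  }
}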

signed long overflow in Xoroshiro128NonThreadsafeRandom

I noticed a signed long overflow issue in elki-core-util/src/main/java/elki/utilities/random/Xoroshiro128NonThreadsafeRandom.java:

 @Override
  public void setSeed(long seed) {
    long xor64 = seed != 0 ? seed : 4101842887655102017L;
    // XorShift64* generator to seed:
    xor64 ^= xor64 >>> 12; // a
    xor64 ^= xor64 << 25; // b
    xor64 ^= xor64 >>> 27; // c
    s0 = xor64 * 2685821657736338717L;
    xor64 ^= xor64 >>> 12; // a
    xor64 ^= xor64 << 25; // b
    xor64 ^= xor64 >>> 27; // c
    s1 = xor64 * 2685821657736338717L;
  }

I've traced the computed results and the following code has an overflow issue:

Is it on purpose? The comment shows the source is from http://xoroshiro.di.unimi.it/, where I can't find the above code. Could you provide a reference to the original code? Thanks!
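One note on the overflow itself: in Java, long arithmetic is defined to wrap around modulo 2^64, so the overflowing multiplications in setSeed() are well defined and deterministic (unlike signed overflow in C/C++, which is undefined behaviour); XorShift64*-style seed scrambling relies on exactly this wrapping. A tiny standalone check that the wrapped product equals the exact product reduced mod 2^64:

import java.math.BigInteger;

// Standalone check: Java long multiplication wraps mod 2^64, matching exact modular arithmetic.
public class LongWrapDemo {
  public static void main(String[] args) {
    long seed = 4101842887655102017L;
    long wrapped = seed * 2685821657736338717L; // "overflows", i.e. wraps mod 2^64
    BigInteger exact = BigInteger.valueOf(seed).multiply(BigInteger.valueOf(2685821657736338717L));
    long reduced = exact.mod(BigInteger.ONE.shiftLeft(64)).longValue(); // low 64 bits of the exact product
    System.out.println(wrapped == reduced); // true: the wrapping is well-defined, not a bug
  }
}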

Anderberg Hierarchical clustering - maximum array size reached

I am trying to run hierarchical clustering with around 170k vectors of size 768 each, and ELKI throws the message:

This implementation does not scale to data sets larger than 65536 instances (~16 GB RAM), at which point the Java maximum array size is reached.

My command looks like this:

java -cp "src/dependency/*:src/elki/*" de.lmu.ifi.dbs.elki.application.KDDCLIApplication -dbc.in input.tsv -algorithm clustering.hierarchical.AnderbergHierarchicalClustering

Installation Errors

ant -f C:\Users\336943\Documents\NetBeansProjects\Clustering -Dnb.internal.action.name=build jar
init:
Deleting: C:\Users\336943\Documents\NetBeansProjects\Clustering\build\built-jar.properties
deps-jar:
Updating property file: C:\Users\336943\Documents\NetBeansProjects\Clustering\build\built-jar.properties
Compiling 1 source file to C:\Users\336943\Documents\NetBeansProjects\Clustering\build\classes
C:\Users\336943\Documents\NetBeansProjects\Clustering\src\tutorial\clustering\SameSizeKMeansAlgorithm.java:111: error: method chooseInitialMeans in interface KMeansInitialization<V#2> cannot be applied to given types;
means = initializer.chooseInitialMeans(database, relation, k, getDistanceFunction());
required: Database,Relation,int,NumberVectorDistanceFunction<? super T>,Factory
found: Database,Relation<V#1>,int,NumberVectorDistanceFunction<CAP#2>
reason: cannot infer type-variable(s) T,O
(actual and formal argument lists differ in length)
where T,O,V#1,V#2 are type-variables:
T extends CAP#1 declared in method <T,O>chooseInitialMeans(Database,Relation,int,NumberVectorDistanceFunction<? super T>,Factory)
O extends NumberVector declared in method <T,O>chooseInitialMeans(Database,Relation,int,NumberVectorDistanceFunction<? super T>,Factory)
V#1 extends NumberVector declared in class SameSizeKMeansAlgorithm
V#2 extends NumberVector declared in interface KMeansInitialization
where CAP#1,CAP#2 are fresh type-variables:
CAP#1 extends NumberVector super: V#1 from capture of ? super V#1
CAP#2 extends Object super: V#1 from capture of ? super V#1
C:\Users\336943\Documents\NetBeansProjects\Clustering\src\tutorial\clustering\SameSizeKMeansAlgorithm.java:123: error: incompatible types: double[][] cannot be converted to List<? extends NumberVector>
means = means(clusters, means, relation);
C:\Users\336943\Documents\NetBeansProjects\Clustering\src\tutorial\clustering\SameSizeKMeansAlgorithm.java:130: error: incompatible types: double[] cannot be converted to Vector
result.addToplevelCluster(new Cluster<>(clusters.get(i), new MeanModel(means[i])));
C:\Users\336943\Documents\NetBeansProjects\Clustering\src\tutorial\clustering\SameSizeKMeansAlgorithm.java:151: error: cannot find symbol
final double d = c.dists[i] = df.distance(fv, DoubleVector.wrap(means[i]));
symbol: method wrap(double[])
location: class DoubleVector
C:\Users\336943\Documents\NetBeansProjects\Clustering\src\tutorial\clustering\SameSizeKMeansAlgorithm.java:237: error: cannot find symbol
c.dists[i] = df.distance(fv, DoubleVector.wrap(means[i]));
symbol: method wrap(double[])
location: class DoubleVector
C:\Users\336943\Documents\NetBeansProjects\Clustering\src\tutorial\clustering\SameSizeKMeansAlgorithm.java:340: error: incompatible types: double[][] cannot be converted to List<? extends NumberVector>
means = means(clusters, means, relation);
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
6 errors
C:\Users\336943\Documents\NetBeansProjects\Clustering\nbproject\build-impl.xml:930: The following error occurred while executing this line:
C:\Users\336943\Documents\NetBeansProjects\Clustering\nbproject\build-impl.xml:270: Compile failed; see the compiler error output for details.
BUILD FAILED (total time: 1 second)

How can I access class "description" if GNOME error is thrown all the time?

I am trying to list the parameters which I can pass to e.g. DBSCAN, but there is no way to do that since a GNOME error is blocking everything:

java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.lmu.ifi.dbs.elki.application.ELKILauncher.main(ELKILauncher.java:72)
Caused by: java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
	at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807)
	at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886)
	at de.lmu.ifi.dbs.elki.gui.GUIUtil.setLookAndFeel(GUIUtil.java:73)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI.main(MiniGUI.java:497)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.lmu.ifi.dbs.elki.application.ELKILauncher.main(ELKILauncher.java:72)

I have my ELKI package located in src and I am running:

java -jar src/elki/elki-0.7.0.jar -description de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.DBSCAN

I also tried

java -cp "src/elki/*:src/dependency/*" -description de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN

but the description option does not exist.

hello [Parser-ing]

& thanks for ELKI

Dont have any mighty algos but if yr interested in a parser for numerical matlab (.mat) files i made one wrapping this jmatio

It is based on yours arff parser (i copied everything;) but it works

Naming JUnit tests

It's a fairly common convention to name JUnit test files with the suffix Test, not a prefix.

It makes it much easier to read the code, as most IDEs support switching between a class and its test.

Would you accept a PR renaming the test files?

Cluster of size 0 (with EM algorithm)

Hi everyone,

currently I'm trying to cluster some artificial data. While I'm horribly failing at it (due to noise, I think), I stumbled upon a weird occurrence in one of my results.
I clustered my data with the EM algorithm. In multiple instances I get clusters of size 0.
How can that happen?
I don't think this is intended behavior, right?

I've attached 2 results, EM algorithm with k set to 7 and 8, with the former having 4 empty clusters and the latter 1 empty cluster.
I've also attached the input table which I've used to get these results.

My ELKI version is 0.7.2 from December 4, cloned from here.
It's running with Java version "1.8.0_144" on Ubuntu 14.04 (I cannot update that; it's a cluster).

Hope someone can have a look at this :).

Regards,
Bastian

EM7.tar.gz
EM8.tar.gz
absolute_counts_per_contig.csv.percentages_per_row.tar.gz

PrimsMinimumSpanningTree ArrayIndexOutOfBoundsException

I'm getting an exception, probably because my data set is too small for HDBSCAN (there are only two data points in the particular data set when the exception is thrown). The data set works fine with the other clustering algorithms.

I can catch the exception in my code, but perhaps it would be best if it were caught within ELKI.

  clustering = new ELKIBuilder<>(HDBSCANHierarchyExtraction.class) //
   .with(HDBSCANHierarchyExtraction.Parameterizer.MINCLUSTERSIZE_ID, minPoints) //
   .with(HDBSCANLinearMemory.Parameterizer.MIN_PTS_ID, minPoints) //
   .with(AbstractAlgorithm.ALGORITHM_ID, HDBSCANLinearMemory.class) //
   .build().run(db);


Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
	at de.lmu.ifi.dbs.elki.math.geometry.PrimsMinimumSpanningTree.processDense(PrimsMinimumSpanningTree.java:170)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.HDBSCANLinearMemory.run(HDBSCANLinearMemory.java:122)
	at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:87)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.HDBSCANLinearMemory.run(HDBSCANLinearMemory.java:79)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.extraction.HDBSCANHierarchyExtraction.run(HDBSCANHierarchyExtraction.java:129)
	at uk.ac.shef.wit.active10.CreateStaypoints.cluster(CreateStaypoints.java:708)
	at uk.ac.shef.wit.active10.CreateStaypoints.main(CreateStaypoints.java:1151)

How can I cluster data using a distance matrix with the ELKI library?

I have a distance matrix and I want to use that distance matrix when clustering my data.

I've read the ELKI documentation, and it states that I can override the distance method by extending the AbstractNumberVectorDistanceFunction class.

The distance function, however, works on the coordinates, i.e. it computes the distance from coordinate x to coordinate y. This is troublesome because the distance matrix is filled only with distance values, and we use the indexes to find the distance value from index x to index y. Here's the code from the documentation:

public class TutorialDistanceFunction extends AbstractNumberVectorDistanceFunction {
  @Override
  public double distance(NumberVector o1, NumberVector o2) {
    double dx = o1.doubleValue(0) - o2.doubleValue(0);
    double dy = o1.doubleValue(1) - o2.doubleValue(1);
    return dx * dx + Math.abs(dy);
  }
}

My question is how to correctly use the distance matrix when clustering with ELKI.
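One workaround that stays within the API shown above: give each object a single "coordinate" that is just its row index in the distance matrix, and let the distance function use those indices to look the value up. The sketch below is a hedged illustration of that idea (the matrix field and the index encoding are my own, not part of ELKI); it only makes sense for algorithms that access the data purely through the distance function (e.g. PAM, DBSCAN, OPTICS), not for k-means-style algorithms that compute means. ELKI also ships precomputed/external distance support, but the exact class names vary between versions, so I will not guess them here.

// Hedged sketch: each "vector" carries only the row index of its object, and the actual
// dissimilarities come from a user-supplied precomputed matrix.
public class MatrixLookupDistanceFunction extends AbstractNumberVectorDistanceFunction {
  private final double[][] matrix; // precomputed pairwise distances, matrix[i][j] = d(i, j)

  public MatrixLookupDistanceFunction(double[][] matrix) {
    this.matrix = matrix;
  }

  @Override
  public double distance(NumberVector o1, NumberVector o2) {
    // By construction, the single coordinate of each vector is its matrix row index.
    int i = (int) o1.doubleValue(0);
    int j = (int) o2.doubleValue(0);
    return matrix[i][j];
  }
}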

How to get the p-value of the Anderson-Darling test?

How can I get the p-value of the two-sample Anderson-Darling test?

I used

		StandardizedTwoSampleAndersonDarlingTest ad = new StandardizedTwoSampleAndersonDarlingTest();
		pi.AndersonDarlingValue = ad.unstandardized(d1, d2);		
		pi.AndersonDarlingPValue = ad.deviation(d1, d2); //p-value ???

Eclipse Mars launch of MiniGUI fails

I followed the Eclipse configuration instructions for ELKI, and the console indicates the build was successful.
But when I try to run the ELKI MiniGUI from Run Configurations, it fails with:

An internal error occurred during: "Launching ELKI MiniGUI".
Model not available for elki

EuclideanRStarTreeKNNQuery NPE

Possibly related to #46.

An NPE is thrown, I think because I only have a single data point in the data set.

java.lang.NullPointerException
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.query.EuclideanRStarTreeKNNQuery.expandNode(EuclideanRStarTreeKNNQuery.java:105)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.query.EuclideanRStarTreeKNNQuery.getKNNForObject(EuclideanRStarTreeKNNQuery.java:87)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.query.EuclideanRStarTreeKNNQuery.getKNNForObject(EuclideanRStarTreeKNNQuery.java:56)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.query.RStarTreeKNNQuery.getKNNForDBID(RStarTreeKNNQuery.java:94)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.AbstractHDBSCAN.computeCoreDists(AbstractHDBSCAN.java:110)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.HDBSCANLinearMemory.run(HDBSCANLinearMemory.java:116)
	at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:87)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.HDBSCANLinearMemory.run(HDBSCANLinearMemory.java:79)
	at de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.extraction.HDBSCANHierarchyExtraction.run(HDBSCANHierarchyExtraction.java:129)
	at uk.ac.shef.wit.active10.CreateStaypoints.cluster(CreateStaypoints.java:707)
	at uk.ac.shef.wit.active10.CreateStaypoints.main(CreateStaypoints.java:1151)

When will the next official version be released? Release 0.7.1 throws ClassCastException in ELKIServiceRegistry

After downloading the new release and trying to launch it via the console, a ClassCastException was thrown due to line 53 in ELKIServiceRegistry:

private static final URLClassLoader CLASSLOADER = (URLClassLoader) ClassLoader.getSystemClassLoader();

The issue apparently is fixed in the current repository:

private static final ClassLoader CLASSLOADER = ELKIServiceRegistry.class.getClassLoader();

Stack trace:
java.lang.ExceptionInInitializerError
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI.setupAppChooser(MiniGUI.java:247)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI.(MiniGUI.java:198)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI$9.run(MiniGUI.java:737)
	at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:313)
	at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:770)
	at java.desktop/java.awt.EventQueue.access$600(EventQueue.java:97)
	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:87)
	at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:740)
	at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
Caused by: java.lang.ClassCastException: java.base/jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to java.base/java.net.URLClassLoader
	at de.lmu.ifi.dbs.elki.utilities.ELKIServiceRegistry.(ELKIServiceRegistry.java:53)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI.setupAppChooser(MiniGUI.java:247)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI.(MiniGUI.java:198)
	at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI$9.run(MiniGUI.java:737)
	at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:313)
	at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:770)
	at java.desktop/java.awt.EventQueue.access$600(EventQueue.java:97)
	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:87)
	at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:740)
	at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

Java 10, Windows 10.

I would like to know when the next pre-built release will be rolled out.

UnsupportedOperationException when using DBSCAN with RStarTree

I am using ELKI for clustering, and I have tried it more than 1k times on many datasets and it was fine :D
But when I started it on one of my files (the big one), I saw an error while initializing the tree.
The whole command and result is here:
java -jar elki-bundle-0.7.1.jar KDDCLIApplication -verbose -verbose -enableDebug true -dbc.in my_input -parser.labelIndices 0 -db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory -time -algorithm clustering.DBSCAN -algorithm.distancefunction geo.LngLatDistanceFunction -geo.model SphericalHaversineEarthModel -dbscan.epsilon 50.0 -dbscan.minpts 446 -resulthandler ResultWriter,ExportVisualizations -out my_output -vis.output my_visOutput

de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.load: 5716 ms
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.directory.capacity: 95
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.directory.minfill: 38
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.leaf.capacity: 153
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.leaf.minfill: 61
Node is not a directory node!
java.lang.UnsupportedOperationException: Node is not a directory node!
	at de.lmu.ifi.dbs.elki.index.tree.AbstractNode.addDirectoryEntry(AbstractNode.java:240)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.insertDirectoryEntry(AbstractRStarTree.java:194)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.reInsert(AbstractRStarTree.java:655)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.overflow.LimitedReinsertOverflowTreatment.handleOverflow(LimitedReinsertOverflowTreatment.java:97)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.overflowTreatment(AbstractRStarTree.java:571)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.adjustTree(AbstractRStarTree.java:676)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.adjustTree(AbstractRStarTree.java:705)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.insertLeafEntry(AbstractRStarTree.java:175)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.reInsert(AbstractRStarTree.java:649)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.overflow.LimitedReinsertOverflowTreatment.handleOverflow(LimitedReinsertOverflowTreatment.java:97)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.overflowTreatment(AbstractRStarTree.java:571)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.adjustTree(AbstractRStarTree.java:676)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.insertLeafEntry(AbstractRStarTree.java:175)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.AbstractRStarTree.insertLeaf(AbstractRStarTree.java:151)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.insert(RStarTreeIndex.java:104)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.insertAll(RStarTreeIndex.java:129)
	at de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeIndex.initialize(RStarTreeIndex.java:94)
	at de.lmu.ifi.dbs.elki.database.StaticArrayDatabase.initialize(StaticArrayDatabase.java:168)
	at de.lmu.ifi.dbs.elki.workflow.InputStep.getDatabase(InputStep.java:63)
	at de.lmu.ifi.dbs.elki.KDDTask.run(KDDTask.java:108)
	at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.run(KDDCLIApplication.java:61)
	at de.lmu.ifi.dbs.elki.application.AbstractApplication.runCLIApplication(AbstractApplication.java:194)
	at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.main(KDDCLIApplication.java:96)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.lmu.ifi.dbs.elki.application.ELKILauncher.main(ELKILauncher.java:60)

multiple calling of LOF from Scala leads to AbortException: DBID range allocation error

Hello,
I am using the LOF implementation from ELKI for some experiments. Since I work in Scala, I have written a wrapper to call it from the jar (version 0.7.2). The wrapper is the following:

import de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF
import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection
import de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski

import scala.collection.mutable.ListBuffer

/**
  * Created by fouchee on 04.09.17.
  */
case class ElkiLOF(k: Int) extends OutlierDetector {
  def computeScores(instances: Array[Array[Double]]): Array[(Int, Double)] = {
    val distance = new minkowski.EuclideanDistanceFunction
    val lof = new LOF(k, distance)

    val dbc = new ArrayAdapterDatabaseConnection(instances) // Adapter to load data from an existing array.
    val db = new StaticArrayDatabase(dbc, null) // Create a database (which may contain multiple relations!)
    db.initialize()
    val result = lof.run(db).getScores()

    var scoreList = new ListBuffer[Double]()
    val DBIDs = result.iterDBIDs()
    while ( {
      DBIDs.valid
    }) {
      scoreList += result.doubleValue(DBIDs)
      DBIDs.advance
    }
    scoreList


    val corrected = scoreList.map {
      case d if d.isNaN => 1.0 // Or whatever value you'd prefer.
      case d if d.isNegInfinity => 1.0 // Or whatever value you'd prefer.
      case d if d.isPosInfinity => 1.0 // Or whatever value you'd prefer.
      case d => d
    }
    corrected.toArray.zipWithIndex.map(x => (x._2, x._1))
  }
}

It works well, and the implementation is really fast, I must say. However, if I run it multiple times in parallel (and I mean with a lot of different data sets and repetitions), at some point I run into the following error:

[error] Exception in thread "main" de.lmu.ifi.dbs.elki.utilities.exceptions.AbortException: DBID range allocation error - too many objects allocated!
[error] 	at de.lmu.ifi.dbs.elki.database.ids.integer.TrivialDBIDFactory.generateStaticDBIDRange(TrivialDBIDFactory.java:72)
[error] 	at de.lmu.ifi.dbs.elki.database.ids.DBIDUtil.generateStaticDBIDRange(DBIDUtil.java:196)
[error] 	at de.lmu.ifi.dbs.elki.database.StaticArrayDatabase.initialize(StaticArrayDatabase.java:129)
[error] 	at com.edouardfouche.detectors.ElkiLOF$.computeScores(ElkiLOF.scala:29)

I don't know how to correct this error. When I run sequentially, it occurs at some point as well. It seems that the underlying TrivialDBIDFactory does not deallocate the DBIDs that are no longer in use. I found the piece of code that throws the error here: http://www.massapi.com/class/de/lmu/ifi/dbs/elki/utilities/exceptions/AbortException-5.html

Any idea how to avoid that?

Thank you,
Edouard

if -dbc.in does not exist, KDDCLIApplication will throw non-useful error

Hi everyone,

this time I'm not reporting a big issue.
If I run ELKI as KDDCLIApplication, and if I accidentally get -dbc.in wrong so that the file does not exist, I get this error:

ERROR: The following configuration errors prevented execution:
Error instantiating internal class: elki.workflow.InputStep Path component should be '/'
The following parameters were not processed: [/home/bastian/data/nonexisting_file]
Stopping execution because of configuration errors above.

I think it would be nice if that could be caught with a useful error message :)
(because it took me a while to figure out what's wrong)

java.lang.ClassCastException when running elki 7.0.1 on OpenJDK

I switched from Oracle JDK 8.x to OpenJDK 11.x recently; now my simulation based on ELKI doesn't work any more.
The problem seems to be located in ELKIServiceRegistry:

java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
at de.lmu.ifi.dbs.elki.utilities.ELKIServiceRegistry.(ELKIServiceRegistry.java:53)

I guess I need to switch back to Oracle Java in the meantime.

Naive quantiles for exponentially modified gaussian

I needed the 99% and 99.9% quantiles for an EMG (exponentially modified Gaussian) and was able to get decent results with something as simple as

double x = emg.getMean() + emg.getStddev() - Math.log(1 - qt) / emg.getLambda(); // initial guess from the Gaussian and exponential parts
for (int i = 0; i < 10; i++)
    x -= (emg.cdf(x) - qt) / emg.pdf(x); // Newton steps: solve cdf(x) = qt

I'm wondering: is that worthy of a PR, or too stupid?

Per category evaluation of a clustering

Apart from the evaluations of the complete clustering, it would be nice to be able to get the per-label statistics, to understand how the individual quality of the categories affects the global clustering quality.

What do you think?

Implementation of code in ELKI

I am new to ELKI. I have gone through the ELKI documentation and found a huge collection of clustering algorithms; thanks to the contributors. Since I am new to ELKI, I find it difficult to use the algorithms through the MiniGUI. Is there another way to get up to speed quickly, so that contributions can be made faster? Please suggest one. Thank you.

Fastutil >8.5.3 not supported

FastUtil removed the Int2Float components in 8.5.3 and 8.5.4; therefore these versions cannot be used. The latest working version is 8.5.2.
