Hello! As I told you on mloss, I would like to modify the CMIM to ac

Add pre given feautes to CMIM. about feast HOT 19 CLOSED

craigacp commented on August 21, 2024

Add pre given feautes to CMIM.

from feast.

Comments (19)

mbq commented on August 21, 2024 2

FYI: https://github.com/mbq/praznik

from feast.

Craigacp commented on August 21, 2024

You need to update the score as well, as otherwise the CMIM algorithm doesn't realise those features have been selected so doesn't use their mi values to help select new features (and you might not pop out of the while loop properly). I'd add a loop that added the features before the for(i=1) loop and updated the score function.

from feast.

alecuba16 commented on August 21, 2024

Thank you for your fast reply, I continued today working on it.

True, I forgot to update the score.

So, changing this is enougth to make it possible?

I don't have to change to modify the mitoolbox entropy function and so on?

So it finally can looks like:

// Added pre-selected score updater and output features adder.
for( my pre-selected features ){
  if( classMI[j] > score ) {
    score = classMI[j];
  }
  outputFeatures[i] = j;
}
//Your original loop line 105
for (i=1 ; i <k;..)
   {
   for(j=0; ...)
      {
     **  if( j is not in my pre-selected features)**
           {
            while()
                  {
                  }
           }
       }
     }

thanks!

from feast.

alecuba16 commented on August 21, 2024

Sorry for bother you again,

They told me if I can mod in the same way the condMi.

As it looks almost the same structure that CMIM but I think that is easier to mod:

Mod lines 94 to 109

And replace by:

j=0;
for (i = 0; i < MyPreselectedFeatures;i++)
  {    
    /*calculate mutual info
    **double calculateMutualInformation(double *firstVector, double *secondVector, int vectorLength);
    */
    classMI[i] = calculateMutualInformation(feature2D[i], classColumn, noOfSamples);

    if (classMI[i] > maxMI)
    {
      maxMI = classMI[i];
      maxMICounter = i;
    }/*if bigger than current maximum*/
   selectedFeatures[i] = 1;
   outputFeatures[j] = i;
   j++;
  }/*for noOfFeatures - filling classMI*/

from feast.

Craigacp commented on August 21, 2024

Not quite. You're updating the scores with the class MI which is I(X;Y), but in CMIM you need to update it with min_{s \in preselected features} I(X;Y|X_s), and for CondMI you need to call mergeArrays repeatedly to merge all the preselected features into the condition vector, then start the loop at L119 with i=0 (skipping the loop at L94).

As I said on MLOSS unless you have a very large amount of data then CondMI will perform poorly with any reasonable number of selected features.

You won't have to modify any MIToolbox functions, these operations all happen at the feature selection algorithmic level, it doesn't use different information theoretic operations.

from feast.

alecuba16 commented on August 21, 2024

Thank for your fast reply,

About CondMI, true, I forgot to add all the preselected in the condition vector, I will fix and let you know.

About CMIM , So as far as I understood I have to find the minimal Mutual information of my preselected, that is almost copy the code from 85 to 94 but with my preselected variables and look for the minimal MI, save the position and use it as the maxMiCounter, isn't it?

from feast.

Craigacp commented on August 21, 2024

For CMIM you need to make the classMI array have the minimal value so classMI[j] = min_{s \in preselected_features} I(X_j;Y|X_s), and set the lastUsedFeature[j] = s. That ensures the invariant is setup properly for the loop at line 104 to work properly. You should also start that loop with i = |preselected_features| + 1.

from feast.

alecuba16 commented on August 21, 2024

I think that I catch it.

For the loop I will have to just skip because preselected_features might come interleaved, I mean, maybe the selected features are 2, 4,7 from a total feature list of 1 to 9.

So I will have to skip inside the loop if I find that is already preselected.

I have already told him about the CondMi performance, but they measured different algorithms of your toolbox with 36 to 50 features and the best performing in terms of accuracy of his model is CondMi. But with low number of features the exhaustive method that they have implemented give better results (obvious) but is pretty slow when are selecting more than 10 features with the dataset that they are working.

Since they are using a different type of models and dynamically requieres different features, they asked me to modify your toolbox.

thanks for your help.

from feast.

Craigacp commented on August 21, 2024

You don't need to skip a feature in the loop for CMIM (or CondMI) as I(X;Y|X) = 0 (for CMIM) and I(X;Y|{S,X}) = 0 (for CondMI). This means they won't be selected multiple times, though it will do slightly more computation.

If it's working for them and they are happy with the performance then by all means use CondMI.

from feast.

alecuba16 commented on August 21, 2024

Thank you for your support, you almost gave me the code modified 👍

Can I fork your code on my github with my modifications?

from feast.

Craigacp commented on August 21, 2024

Sure.

from feast.

alecuba16 commented on August 21, 2024

Hello again!

I'm planning make a port of FEAST to R , as a .C import, there is is some work already done by someone?

Greetings!

from feast.

Craigacp commented on August 21, 2024

I'm not aware of anyone writing an R wrapper. A couple of people from Drexel University wrote a Python wrapper, you might find that a helpful starting point. Be careful as the FEAST C API expects Fortran style matrices rather than C style ones (due to it's history as a Matlab library).

from feast.

alecuba16 commented on August 21, 2024

I have implemented a general C interface of your library (callable from any other platform) and packaged into a R package and it is working well, I will upload to github later.

I think that I might have found a "bug" or limitation about your original matrix normalization into arrayoperations at the MItoolbox, that let me unable to use your library with my real-data datasets.

When the range that gets the values of the features are pretty big it reserves lots of memory because the vector normalize reserves the multiplication of the numer of states,

My computer has about 12GB free Ram and I cannot run any feature selection without get out of memory even with your original matlab library.

I was waiting to have enough time to provide a full detailed debug report of memory space but meanwhile I send you some details and a subset of the dataset that I'm using (real data) that generates that problem, as an example.

Message on matlab your original library :
The targets -> alarms.csv.zip
The features -> features.csv.zip

Finally, I would ask how much will change the feature selection result if I discretize and normalize previously the features with Zscore.

from feast.

Craigacp commented on August 21, 2024

That's expected behaviour, you should remap the feature values so they are dense and increasing from 0. It won't change the output of FEAST as the information calculation doesn't actually care about the feature values, it only uses equality not the ordering. I can't get around the memory usage without writing a hashmap and using that to store the probability distributions, or doing linear probing for every single value in the data (which would be incredibly slow).

What else did you have to add to make it useful in C (apart from flipping the arrays so they are columnwise rather than rowwise)?

If you normalize the features in a different way then you'll get a slightly different answer. How different is very difficult to describe as I don't know what your current discretisation is, and figuring out the transform between the two is hard.

from feast.

Craigacp commented on August 21, 2024

@mbq That R library looks good. I'm not sure exactly what the compilation path is for a C extension in R, is there a reason you have to break apart the header files like that? Would adding more flags to the header files help? Also, we should move this discussion to a different issue.

from feast.

mbq commented on August 21, 2024

Not really; praznik started as a fast&dirty project to compare FEAST with my own fs methods, so I aimed to avoid touching your code (not to break anything), statically bundle in a single object (to avoid installing libraries on the system level) and to reduce compilation into a single make call (for convenience, in principle R can even handle an entire autotools build, provided that it will make a single so at the end).

For CRAN (official R library repository) compatibility, though, you cannot have any links to stuff like printf and exit, removing them would require appropriate flags. Also some more detailed documentation and some examples would be adviced.

And thumbs up for moving the discussion about an R port elsewhere.

from feast.

alecuba16 commented on August 21, 2024

@Craigacp Well, the values are continuous ,so if I map to a range from 0 to N I will be bound by the number of samples if each sample is different, Once I return from holidays I will try to map the values and think a way to reduce the number of possible states.

If you want ,you can check my code here:

https://github.com/alecuba16/FEASTR

I have reworked your C-R interface to be as general as posible for others uses and added some additional functions to manage the row by column vectorization or vice versa.

from feast.

Craigacp commented on August 21, 2024

@alecuba16 FEAST doesn't support continuous values. You need to discretise them first, which will account for the extreme memory usage on your test data. In our experiments we found 10 bins of equal width to work pretty well for most things, but there are plenty of more complicated strategies which might work better. Unfortunately the C & MATLAB interfaces expect doubles for historical reasons rather than the ints they should accept.

from feast.

Add pre given feautes to CMIM. about feast HOT 19 CLOSED

Comments (19)

Related Issues (11)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent