Comments (6)
The optimized model is available in the large data download (https://pubdata.endgame.com/ember/ember_dataset_2018_2.tar.bz2). The LGB text file named ember_model_2018.txt is in there with the jsonl files:
$ tar xvf ember_dataset_2018_2.tar
x ember2018/
x ember2018/train_features_1.jsonl
x ember2018/train_features_0.jsonl
x ember2018/train_features_3.jsonl
x ember2018/test_features.jsonl
x ember2018/ember_model_2018.txt
x ember2018/train_features_5.jsonl
x ember2018/train_features_4.jsonl
x ember2018/train_features_2.jsonl
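For anyone who only wants to score samples with the released model rather than retrain it, the text file from the listing above can be loaded directly with LightGBM's Booster. A minimal sketch (the path ember2018/ember_model_2018.txt comes from the archive listing; the import is deferred so the existence check works even without lightgbm installed):

```python
import os

def load_ember_model(path):
    """Load the released LightGBM model if the file exists; else return None."""
    if not os.path.exists(path):
        return None
    import lightgbm as lgb  # deferred so the existence check needs no lightgbm
    return lgb.Booster(model_file=path)

model = load_ember_model("ember2018/ember_model_2018.txt")
```

The returned Booster can then be passed to model.predict() on vectorized feature rows.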
from ember.
Thanks for the quick reply! I assumed that ember_model_2018.txt was the "unoptimized" version. Seems like I've been using the right one all along!
To clarify: your research paper, EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models, mentions that:
From the vectorized features, we trained a gradient-boosted decision tree (GBDT) model using LightGBM with
default parameters (100 trees, 31 leaves per tree), resulting in fewer than 10K tunable parameters [14]. Model training
took 3 hours. Baseline model performance may be much improved with appropriate hyper-parameter optimization,
which is of less interest to us in this work.
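The paper's "fewer than 10K tunable parameters" figure can be sanity-checked with a back-of-envelope calculation (a sketch; the exact count depends on how one counts split thresholds, but the leaf outputs alone give the order of magnitude):

```python
def max_leaf_params(num_trees: int, leaves_per_tree: int) -> int:
    """Upper bound on the tunable leaf-output parameters of a GBDT:
    one learned output value per leaf, per tree."""
    return num_trees * leaves_per_tree

# LightGBM defaults cited in the EMBER paper: 100 trees, 31 leaves per tree
print(max_leaf_params(100, 31))  # 3100, comfortably under 10K
```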
Furthermore, in the ember source code, the optimized version would have been saved via lgbm_model.save_model(os.path.join(args.datadir, "optimised_model.txt")). Therefore, I assumed that ember_model_2018.txt was the original "unoptimized" version. Hence the clarification!
Ah. That makes sense. That quote was written and released with EMBER 2017. In that case, we only released the default model and I don't have any optimized model anymore to post. We did not release a new paper for the EMBER 2018 release. For this one, the model is already optimized with this grid search:
https://docs.google.com/presentation/d/1A13tsUkgWeujTy9SD-vDFfQp9fnIqbSE_tCihNPlArQ/edit#slide=id.g6318784c2c_0_1131
I see. So just to confirm: the ember_model_2018.txt inside https://pubdata.endgame.com/ember/ember_dataset_2018_2.tar.bz2 is the unoptimized LGB model?
My problem is that I am unable to train the LGB model locally due to Out-Of-Memory (OOM) issues, hence I am asking around for the optimized_model.txt so I can just load it in.
Once again, I am wondering if anyone out there has successfully trained LGB with the --optimize flag, arrived at the following best params, and is able to share the resulting optimized_model.txt.
From the slides shared by Phil:
best_params = {
"boosting": "gbdt",
"objective": "binary",
"num_iterations": 1000,
"learning_rate": 0.05,
"num_leaves": 2048,
"feature_fraction": 0.5,
"bagging_fraction": 1.0,
"max_depth": 15,
"min_data_in_leaf": 50,
}
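For anyone stuck at the same point, a minimal training sketch with those parameters follows. It assumes ember's read_vectorized_features helper, which memory-maps the vectorized .dat files and so keeps peak RAM far below re-parsing the jsonl files (this may also help with the OOM problem); imports are deferred so the module stays importable without lightgbm/ember installed:

```python
BEST_PARAMS = {  # from Phil Roth's CAMLIS 2019 slides
    "boosting": "gbdt",
    "objective": "binary",
    "num_iterations": 1000,
    "learning_rate": 0.05,
    "num_leaves": 2048,
    "feature_fraction": 0.5,
    "bagging_fraction": 1.0,
    "max_depth": 15,
    "min_data_in_leaf": 50,
}

def train_optimized(data_dir: str):
    """Train an EMBER 2018 model with the grid-search parameters above."""
    import lightgbm as lgb
    import ember

    # read_vectorized_features memory-maps the vectorized feature files
    X_train, y_train = ember.read_vectorized_features(data_dir, subset="train")
    train_rows = y_train != -1  # drop unlabeled rows, as ember's train script does
    dataset = lgb.Dataset(X_train[train_rows], y_train[train_rows])
    return lgb.train(BEST_PARAMS, dataset)
```

Training at num_leaves=2048 and 1000 iterations is still compute-heavy, but memory use is dominated by the dataset construction, not the booster.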
After taking another look at the ember source code for the train script, I realized that the default set of parameters is already the optimized set. Compare what Phil had in his CAMLIS 2019 presentation (posted in the comment above) with the original params:
Default train script:
params = {
"boosting": "gbdt",
"objective": "binary",
"num_iterations": 1000,
"learning_rate": 0.05,
"num_leaves": 2048,
"max_depth": 15,
"min_data_in_leaf": 50,
"feature_fraction": 0.5
}
The only difference is the addition of "bagging_fraction": 1.0, which according to the LightGBM documentation is already the default value.
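This claim is easy to check mechanically (a sketch; 1.0 is indeed LightGBM's documented default for bagging_fraction):

```python
camlis_best_params = {  # from the CAMLIS 2019 slides
    "boosting": "gbdt", "objective": "binary", "num_iterations": 1000,
    "learning_rate": 0.05, "num_leaves": 2048, "feature_fraction": 0.5,
    "bagging_fraction": 1.0, "max_depth": 15, "min_data_in_leaf": 50,
}
train_script_params = {  # defaults in ember's train script
    "boosting": "gbdt", "objective": "binary", "num_iterations": 1000,
    "learning_rate": 0.05, "num_leaves": 2048, "max_depth": 15,
    "min_data_in_leaf": 50, "feature_fraction": 0.5,
}

# keys where the slides differ from the train script
diff = {k: v for k, v in camlis_best_params.items()
        if train_script_params.get(k) != v}
print(diff)  # {'bagging_fraction': 1.0} -- LightGBM's default value anyway
```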