Hi, is it possible to get feature importances in TabularNLPAutoML for regular features

Feature importances in TabularNLPAutoML about lightautoml HOT 3 OPEN

fingoldo commented on May 24, 2024

Feature importances in TabularNLPAutoML

Comments (3)

alexmryzhkov commented on May 24, 2024

Hi @fingoldo,

Thanks for the issue. Could you also share the code showing how you set up the task, roles, and TabularNLPAutoML, along with the full training log?

Alex

fingoldo commented on May 24, 2024

Thanks for the quick reply, Alex! Sure.
Basically, it's this:

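import multiprocessing

import psutil
from lightautoml.automl.presets.text_presets import TabularNLPAutoML
from lightautoml.tasks import Task

N_THREADS = multiprocessing.cpu_count()
MEMORY_LIMIT = psutil.virtual_memory().total * 0.9 / 1024 ** 3  # in GB; not passed to the preset below
verbose = 1
task = Task("reg", loss="mse", metric="mae")
timeout = 60 * 60 * 3  # 3 hours, in seconds
automl = TabularNLPAutoML(task=task, timeout=timeout, cpu_limit=N_THREADS, gpu_ids="all", text_params={"lang": "en"})

# X is the training DataFrame and TARGET_COLUMN is the target's name; both are defined elsewhere.
automl.fit_predict(X, roles={"text": ["title"], "drop": [], "target": TARGET_COLUMN})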

The log:

[14:43:54] Stdout logging level is INFO.

2022-03-27 14:43:54,513 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - set_verbosity_level-line:267 - Stdout logging level is INFO.
2022-03-27 14:43:54,535 - INFO3 - MainProcess[19272]-MainThread[19072]-text_presets.py-lightautoml.automl.presets.text_presets - infer_auto_params-line:230 - Model language mode: en

[14:43:54] Task: reg

2022-03-27 14:43:54,556 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:196 - Task: reg

[14:43:54] Start automl preset with listed constraints:

2022-03-27 14:43:54,558 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:198 - Start automl preset with listed constraints:

[14:43:54] - time: 10800.00 seconds

2022-03-27 14:43:54,559 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:199 - - time: 10800.00 seconds

[14:43:54] - CPU: 32 cores

2022-03-27 14:43:54,561 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:200 - - CPU: 32 cores

[14:43:54] - memory: 16 GB

2022-03-27 14:43:54,563 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:201 - - memory: 16 GB

[14:43:54] Train data shape: (9000, 290)

2022-03-27 14:43:54,565 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.reader.base - fit_read-line:274 - Train data shape: (9000, 290)

2022-03-27 14:43:57,354 - INFO3 - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.reader.base - advanced_roles_guess-line:607 - Feats was rejected during automatic roles guess: []

[14:43:57] Layer 1 train process start. Time left 10797.12 secs

2022-03-27 14:43:57,443 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:213 - Layer 1 train process start. Time left 10797.12 secs

[14:44:02] Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

2022-03-27 14:44:02,316 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:245 - Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

[14:44:05] Fitting Lvl_0_Pipe_0_Mod_0_LinearL2 finished. score = -940.749755859375

2022-03-27 14:44:05,244 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:293 - Fitting Lvl_0_Pipe_0_Mod_0_LinearL2 finished. score = -940.749755859375

[14:44:05] Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed

2022-03-27 14:44:05,246 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:296 - Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed

[14:44:05] Time left 10789.31 secs

2022-03-27 14:44:05,257 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:223 - Time left 10789.31 secs

2022-03-27 14:44:06,717 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'params': 'FastText(vocab=0, vector_size=64, alpha=0.025)', 'datetime': '2022-03-27T14:44:06.717633', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'created'}
2022-03-27 14:44:06,725 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - scan_vocab-line:578 - collecting all words and their counts
2022-03-27 14:44:06,726 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _scan_vocab-line:561 - PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-03-27 14:44:06,745 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - scan_vocab-line:584 - collected 10828 word types from a corpus of 46369 raw words and 9000 sentences
2022-03-27 14:44:06,746 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:633 - Creating a fresh vocabulary
2022-03-27 14:44:06,824 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'effective_min_count=1 retains 10828 unique words (100.0%% of original 10828, drops 0)', 'datetime': '2022-03-27T14:44:06.824618', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:06,825 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'effective_min_count=1 leaves 46369 word corpus (100.0%% of original 46369, drops 0)', 'datetime': '2022-03-27T14:44:06.825618', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:06,968 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:741 - deleting the raw counts dictionary of 10828 items
2022-03-27 14:44:06,969 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:744 - sample=0.001 downsamples 40 most-common words
2022-03-27 14:44:06,970 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'downsampling leaves estimated 40640.463918984155 word corpus (87.6%% of prior 46369)', 'datetime': '2022-03-27T14:44:06.970622', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:07,295 - INFO - MainProcess[19272]-MainThread[19072]-fasttext.py-gensim.models.fasttext - estimate_memory-line:493 - estimated required memory for 10828 words, 2000000 buckets and 64 dimensions: 525048308 bytes
2022-03-27 14:44:07,296 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - init_weights-line:859 - resetting layer weights
2022-03-27 14:44:09,287 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'update': False, 'trim_rule': 'None', 'datetime': '2022-03-27T14:44:09.287742', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'build_vocab'}
2022-03-27 14:44:09,289 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'training model with 3 workers on 10828 vocabulary and 64 features, using sg=0 hs=0 sample=0.001 negative=5 window=3 shrink_windows=True', 'datetime': '2022-03-27T14:44:09.289723', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'train'}
2022-03-27 14:44:09,376 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 2 more threads
2022-03-27 14:44:09,409 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 1 more threads
2022-03-27 14:44:09,414 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 0 more threads
2022-03-27 14:44:09,414 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_end-line:1629 - EPOCH - 1 : training on 46369 raw words (40640 effective words) took 0.1s, 404546 effective words/s
2022-03-27 14:44:09,500 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 2 more threads
2022-03-27 14:44:09,531 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 1 more threads
2022-03-27 14:44:09,544 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 0 more threads
2022-03-27 14:44:09,545 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_end-line:1629 - EPOCH - 2 : training on 46369 raw words (40644 effective words) took 0.1s, 350692 effective words/s
2022-03-27 14:44:09,546 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'training on 92738 raw words (81284 effective words) took 0.3s, 317320 effective words/s', 'datetime': '2022-03-27T14:44:09.546730', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'train'}
100%|████████████████████████████████████████████████████████████████████████████| 9000/9000 [00:07<00:00, 1273.13it/s]
2022-03-27 14:44:18,279 - INFO3 - MainProcess[19272]-MainThread[19072]-text.py-lightautoml.transformers.text - fit-line:788 - Feature concated__title fitted
2022-03-27 14:44:24,936 - INFO3 - MainProcess[19272]-MainThread[19072]-text.py-lightautoml.transformers.text - transform-line:834 - Feature concated__title transformed

[14:44:24] Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...

2022-03-27 14:44:24,992 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:245 - Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...

[14:44:36] Fitting Lvl_0_Pipe_1_Mod_0_LightGBM finished. score = -924.1246948242188

2022-03-27 14:44:36,807 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:293 - Fitting Lvl_0_Pipe_1_Mod_0_LightGBM finished. score = -924.1246948242188

[14:44:36] Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed

2022-03-27 14:44:36,809 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:296 - Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed

[14:44:36] Time left 10757.75 secs

2022-03-27 14:44:36,816 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:223 - Time left 10757.75 secs

[14:44:36] Layer 1 training completed.

2022-03-27 14:44:36,818 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:241 - Layer 1 training completed.

[14:44:36] Blending: optimization starts with equal weights and score -924.7379150390625

2022-03-27 14:44:36,827 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:370 - Blending: optimization starts with equal weights and score -924.7379150390625

[14:44:36] Blending: iteration 0: score = -922.67333984375, weights = [0.25724643 0.74275357]

2022-03-27 14:44:36,850 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:395 - Blending: iteration 0: score = -922.67333984375, weights = [0.25724643 0.74275357]

[14:44:36] Blending: iteration 1: score = -922.67333984375, weights = [0.25724643 0.74275357]

2022-03-27 14:44:36,873 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:395 - Blending: iteration 1: score = -922.67333984375, weights = [0.25724643 0.74275357]

[14:44:36] Blending: no score update. Terminated

2022-03-27 14:44:36,875 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:402 - Blending: no score update. Terminated

[14:44:36] Automl preset training completed in 42.32 seconds

2022-03-27 14:44:36,883 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:214 - Automl preset training completed in 42.32 seconds

[14:44:36] Model description:
Final prediction for new objects (level 0) = 
	 0.25725 * (3 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
	 0.74275 * (3 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM) 

2022-03-27 14:44:36,885 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:215 - Model description:
Final prediction for new objects (level 0) = 
	 0.25725 * (3 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
	 0.74275 * (3 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM)
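
The model description at the end of the log is a plain weighted average of the two level-0 models. A minimal sketch of that arithmetic, where the weights come from the log and the per-model predictions are invented placeholders:

```python
# Blend weights as reported in the log's model description.
weights = {"LinearL2": 0.25725, "LightGBM": 0.74275}

# Hypothetical per-model predictions for two rows (illustrative values only).
preds = {
    "LinearL2": [100.0, 200.0],
    "LightGBM": [120.0, 180.0],
}

n_rows = len(preds["LinearL2"])
# Final prediction per row = sum over models of weight * model prediction.
blended = [sum(weights[m] * preds[m][i] for m in weights) for i in range(n_rows)]
print(blended)
```

Because the weights sum to 1, each blended value lies between the two models' predictions for that row.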

alexmryzhkov commented on May 24, 2024

Hi @fingoldo,

I have checked, and the TabularNLPAutoML preset does not use a feature selector (it would be pretty slow in this case), which is why the fast feature importances are not available. Could you please try the accurate method instead of the fast one?

Alex
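
For context on the suggestion above: the "accurate" mode is, to my understanding, a permutation-style importance, which needs no feature selector. It shuffles one column at a time and measures how much the metric degrades. The sketch below illustrates that general idea in plain Python with MAE (the metric from the task setup). It is not LightAutoML's implementation, and `permutation_importance`, `toy_predict`, and the toy data are hypothetical names made up for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    rows: List[Row],
    y_true: Sequence[float],
    features: Sequence[str],
    seed: int = 0,
) -> Dict[str, float]:
    """Return the MAE increase caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mae(y_true, predict(rows))
    scores: Dict[str, float] = {}
    for feat in features:
        column = [row[feat] for row in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        permuted = [{**row, feat: value} for row, value in zip(rows, column)]
        scores[feat] = mae(y_true, predict(permuted)) - baseline
    return scores


# Hypothetical stand-in for a fitted model: uses "price", ignores "noise".
def toy_predict(rows: List[Row]) -> List[float]:
    return [3.0 * row["price"] for row in rows]


rows = [{"price": float(i), "noise": float(i * 7 % 5)} for i in range(50)]
target = [3.0 * row["price"] for row in rows]
scores = permutation_importance(toy_predict, rows, target, ["price", "noise"])
# "noise" scores exactly 0 because the toy model never reads it;
# "price" scores above 0 because shuffling it worsens the predictions.
print(scores)
```

In LightAutoML's tabular preset this mode is, as far as I know, requested with `get_feature_scores(calc_method="accurate", data=...)`; whether the NLP preset accepts the same call is an assumption to check against the installed version.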
