
Comments (3)

alexmryzhkov commented on May 24, 2024

Hi @fingoldo,

Thanks for the issue. Could you also share the code showing how you set up the task, roles and TabularNLPAutoML, along with the full training log?

Alex

from lightautoml.

fingoldo commented on May 24, 2024

Thanks for the quick reply, Alex! Sure.
Basically, it's this:

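import multiprocessing

import psutil
from lightautoml.automl.presets.text_presets import TabularNLPAutoML
from lightautoml.tasks import Task

N_THREADS = multiprocessing.cpu_count()
MEMORY_LIMIT = psutil.virtual_memory().total * 0.9 / 1024 ** 3  # in GB; computed but not passed to the preset below
verbose = 1
task = Task("reg", loss="mse", metric="mae")
timeout = 60 * 60 * 3  # 3 hours, in seconds
automl = TabularNLPAutoML(task=task, timeout=timeout, cpu_limit=N_THREADS, gpu_ids="all", text_params={"lang": "en"})

# X (the training DataFrame) and TARGET_COLUMN are defined elsewhere.
automl.fit_predict(X, roles={"text": ["title"], "drop": [], "target": TARGET_COLUMN})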

The log:

[14:43:54] Stdout logging level is INFO.

2022-03-27 14:43:54,513 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - set_verbosity_level-line:267 - Stdout logging level is INFO.
2022-03-27 14:43:54,535 - INFO3 - MainProcess[19272]-MainThread[19072]-text_presets.py-lightautoml.automl.presets.text_presets - infer_auto_params-line:230 - Model language mode: en

[14:43:54] Task: reg

2022-03-27 14:43:54,556 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:196 - Task: reg

[14:43:54] Start automl preset with listed constraints:

2022-03-27 14:43:54,558 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:198 - Start automl preset with listed constraints:

[14:43:54] - time: 10800.00 seconds

2022-03-27 14:43:54,559 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:199 - - time: 10800.00 seconds

[14:43:54] - CPU: 32 cores

2022-03-27 14:43:54,561 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:200 - - CPU: 32 cores

[14:43:54] - memory: 16 GB

2022-03-27 14:43:54,563 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:201 - - memory: 16 GB

[14:43:54] Train data shape: (9000, 290)

2022-03-27 14:43:54,565 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.reader.base - fit_read-line:274 - Train data shape: (9000, 290)

2022-03-27 14:43:57,354 - INFO3 - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.reader.base - advanced_roles_guess-line:607 - Feats was rejected during automatic roles guess: []

[14:43:57] Layer 1 train process start. Time left 10797.12 secs

2022-03-27 14:43:57,443 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:213 - Layer 1 train process start. Time left 10797.12 secs

[14:44:02] Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

2022-03-27 14:44:02,316 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:245 - Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

[14:44:05] Fitting Lvl_0_Pipe_0_Mod_0_LinearL2 finished. score = -940.749755859375

2022-03-27 14:44:05,244 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:293 - Fitting Lvl_0_Pipe_0_Mod_0_LinearL2 finished. score = -940.749755859375

[14:44:05] Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed

2022-03-27 14:44:05,246 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:296 - Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed

[14:44:05] Time left 10789.31 secs

2022-03-27 14:44:05,257 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:223 - Time left 10789.31 secs

2022-03-27 14:44:06,717 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'params': 'FastText(vocab=0, vector_size=64, alpha=0.025)', 'datetime': '2022-03-27T14:44:06.717633', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'created'}
2022-03-27 14:44:06,725 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - scan_vocab-line:578 - collecting all words and their counts
2022-03-27 14:44:06,726 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _scan_vocab-line:561 - PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-03-27 14:44:06,745 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - scan_vocab-line:584 - collected 10828 word types from a corpus of 46369 raw words and 9000 sentences
2022-03-27 14:44:06,746 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:633 - Creating a fresh vocabulary
2022-03-27 14:44:06,824 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'effective_min_count=1 retains 10828 unique words (100.0%% of original 10828, drops 0)', 'datetime': '2022-03-27T14:44:06.824618', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:06,825 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'effective_min_count=1 leaves 46369 word corpus (100.0%% of original 46369, drops 0)', 'datetime': '2022-03-27T14:44:06.825618', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:06,968 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:741 - deleting the raw counts dictionary of 10828 items
2022-03-27 14:44:06,969 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:744 - sample=0.001 downsamples 40 most-common words
2022-03-27 14:44:06,970 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'downsampling leaves estimated 40640.463918984155 word corpus (87.6%% of prior 46369)', 'datetime': '2022-03-27T14:44:06.970622', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:07,295 - INFO - MainProcess[19272]-MainThread[19072]-fasttext.py-gensim.models.fasttext - estimate_memory-line:493 - estimated required memory for 10828 words, 2000000 buckets and 64 dimensions: 525048308 bytes
2022-03-27 14:44:07,296 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - init_weights-line:859 - resetting layer weights
2022-03-27 14:44:09,287 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'update': False, 'trim_rule': 'None', 'datetime': '2022-03-27T14:44:09.287742', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'build_vocab'}
2022-03-27 14:44:09,289 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'training model with 3 workers on 10828 vocabulary and 64 features, using sg=0 hs=0 sample=0.001 negative=5 window=3 shrink_windows=True', 'datetime': '2022-03-27T14:44:09.289723', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'train'}
2022-03-27 14:44:09,376 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 2 more threads
2022-03-27 14:44:09,409 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 1 more threads
2022-03-27 14:44:09,414 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 0 more threads
2022-03-27 14:44:09,414 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_end-line:1629 - EPOCH - 1 : training on 46369 raw words (40640 effective words) took 0.1s, 404546 effective words/s
2022-03-27 14:44:09,500 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 2 more threads
2022-03-27 14:44:09,531 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 1 more threads
2022-03-27 14:44:09,544 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 0 more threads
2022-03-27 14:44:09,545 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_end-line:1629 - EPOCH - 2 : training on 46369 raw words (40644 effective words) took 0.1s, 350692 effective words/s
2022-03-27 14:44:09,546 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'training on 92738 raw words (81284 effective words) took 0.3s, 317320 effective words/s', 'datetime': '2022-03-27T14:44:09.546730', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'train'}
100%|████████████████████████████████████████████████████████████████████████████| 9000/9000 [00:07<00:00, 1273.13it/s]
2022-03-27 14:44:18,279 - INFO3 - MainProcess[19272]-MainThread[19072]-text.py-lightautoml.transformers.text - fit-line:788 - Feature concated__title fitted
2022-03-27 14:44:24,936 - INFO3 - MainProcess[19272]-MainThread[19072]-text.py-lightautoml.transformers.text - transform-line:834 - Feature concated__title transformed

[14:44:24] Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...

2022-03-27 14:44:24,992 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:245 - Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...

[14:44:36] Fitting Lvl_0_Pipe_1_Mod_0_LightGBM finished. score = -924.1246948242188

2022-03-27 14:44:36,807 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:293 - Fitting Lvl_0_Pipe_1_Mod_0_LightGBM finished. score = -924.1246948242188

[14:44:36] Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed

2022-03-27 14:44:36,809 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:296 - Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed

[14:44:36] Time left 10757.75 secs

2022-03-27 14:44:36,816 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:223 - Time left 10757.75 secs

[14:44:36] Layer 1 training completed.

2022-03-27 14:44:36,818 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:241 - Layer 1 training completed.

[14:44:36] Blending: optimization starts with equal weights and score -924.7379150390625

2022-03-27 14:44:36,827 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:370 - Blending: optimization starts with equal weights and score -924.7379150390625

[14:44:36] Blending: iteration 0: score = -922.67333984375, weights = [0.25724643 0.74275357]

2022-03-27 14:44:36,850 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:395 - Blending: iteration 0: score = -922.67333984375, weights = [0.25724643 0.74275357]

[14:44:36] Blending: iteration 1: score = -922.67333984375, weights = [0.25724643 0.74275357]

2022-03-27 14:44:36,873 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:395 - Blending: iteration 1: score = -922.67333984375, weights = [0.25724643 0.74275357]

[14:44:36] Blending: no score update. Terminated

2022-03-27 14:44:36,875 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:402 - Blending: no score update. Terminated

[14:44:36] Automl preset training completed in 42.32 seconds

2022-03-27 14:44:36,883 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:214 - Automl preset training completed in 42.32 seconds

[14:44:36] Model description:
Final prediction for new objects (level 0) = 
	 0.25725 * (3 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
	 0.74275 * (3 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM) 

2022-03-27 14:44:36,885 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:215 - Model description:
Final prediction for new objects (level 0) = 
	 0.25725 * (3 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
	 0.74275 * (3 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM) 
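
For readers unfamiliar with the "Model description" lines, the final prediction reported in the log is a weighted average of the two level-0 models. A minimal sketch of that arithmetic is below; the weights are taken from the log, while the `blend` helper and the per-model predictions are made-up illustration values, not LightAutoML internals.

```python
def blend(predictions_per_model, weights):
    """Weighted average of per-model predictions (one list per model)."""
    n_rows = len(predictions_per_model[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions_per_model))
        for i in range(n_rows)
    ]


# Weights as printed in the log for LinearL2 and LightGBM.
weights = [0.25725, 0.74275]

# Hypothetical predictions for two rows, for illustration only.
linear_l2_pred = [100.0, 200.0]
lightgbm_pred = [120.0, 180.0]

blended = blend([linear_l2_pred, lightgbm_pred], weights)
print(blended)
```

Because the weights sum to 1, each blended value lies between the two models' predictions for that row.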


alexmryzhkov commented on May 24, 2024

Hi @fingoldo,

I looked into it: the TabularNLPAutoML preset doesn't use a feature selector (it would be quite slow in this case), which is why the fast feature importances can't be shown. Could you please try the accurate method instead of the fast one?

Alex
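
To illustrate the suggestion above: as far as I know, the accurate method in LightAutoML is a permutation-style importance, requested roughly as `automl.get_feature_scores("accurate", data)`. Treat both that call shape and this description as assumptions to check against your installed version. The sketch below is not LightAutoML code. It shows the general permutation idea in plain Python with a toy model, where `permutation_importance`, `toy_predict` and the data are all hypothetical: shuffle one feature at a time and measure how much the MAE gets worse.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_features, seed=0):
    """Return the MAE increase caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mae(y_true, [predict(r) for r in rows])
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled_rows = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        permuted = mae(y_true, [predict(r) for r in shuffled_rows])
        scores.append(permuted - baseline)
    return scores


def toy_predict(row):
    # Toy model for illustration: uses feature 0 only and ignores feature 1.
    return 3.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * r[0] for r in rows]

importances = permutation_importance(toy_predict, rows, y, n_features=2)
print(importances)
```

Feature 1 gets an importance of exactly zero because the toy model never reads it, while shuffling feature 0 degrades the error. Since this approach only needs predictions, it works without a fitted feature selector, at the cost of one extra prediction pass per feature.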
