Comments (3)
Hi @fingoldo,
Thanks for the issue. Could you also share the code showing how you set up the task, the roles and TabularNLPAutoML, along with the full training log?
Alex
Thanks for the quick reply, Alex! Sure.
Basically, it's this:
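import multiprocessing
import psutil
from lightautoml.automl.presets.text_presets import TabularNLPAutoML
from lightautoml.tasks import Task
# X (the training DataFrame) and TARGET_COLUMN are defined elsewhere.
N_THREADS = multiprocessing.cpu_count()
MEMORY_LIMIT = psutil.virtual_memory().total * 0.9 / 1024 ** 3  # in GB; not passed to the preset in this snippet
verbose = 1
task = Task("reg", loss="mse", metric="mae")
timeout = 60 * 60 * 3  # 3 hours, in seconds
automl = TabularNLPAutoML(task=task, timeout=timeout, cpu_limit=N_THREADS, gpu_ids="all", text_params={"lang": "en"})
automl.fit_predict(X, roles={"text": ["title"], "drop": [], "target": TARGET_COLUMN})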
The log:
[14:43:54] Stdout logging level is INFO.
2022-03-27 14:43:54,513 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - set_verbosity_level-line:267 - Stdout logging level is INFO.
2022-03-27 14:43:54,535 - INFO3 - MainProcess[19272]-MainThread[19072]-text_presets.py-lightautoml.automl.presets.text_presets - infer_auto_params-line:230 - Model language mode: en
[14:43:54] Task: reg
2022-03-27 14:43:54,556 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:196 - Task: reg
[14:43:54] Start automl preset with listed constraints:
2022-03-27 14:43:54,558 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:198 - Start automl preset with listed constraints:
[14:43:54] - time: 10800.00 seconds
2022-03-27 14:43:54,559 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:199 - - time: 10800.00 seconds
[14:43:54] - CPU: 32 cores
2022-03-27 14:43:54,561 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:200 - - CPU: 32 cores
[14:43:54] - memory: 16 GB
2022-03-27 14:43:54,563 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:201 - - memory: 16 GB
[14:43:54] Train data shape: (9000, 290)
2022-03-27 14:43:54,565 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.reader.base - fit_read-line:274 - Train data shape: (9000, 290)
2022-03-27 14:43:57,354 - INFO3 - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.reader.base - advanced_roles_guess-line:607 - Feats was rejected during automatic roles guess: []
[14:43:57] Layer 1 train process start. Time left 10797.12 secs
2022-03-27 14:43:57,443 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:213 - Layer 1 train process start. Time left 10797.12 secs
[14:44:02] Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...
2022-03-27 14:44:02,316 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:245 - Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...
[14:44:05] Fitting Lvl_0_Pipe_0_Mod_0_LinearL2 finished. score = -940.749755859375
2022-03-27 14:44:05,244 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:293 - Fitting Lvl_0_Pipe_0_Mod_0_LinearL2 finished. score = -940.749755859375
[14:44:05] Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed
2022-03-27 14:44:05,246 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:296 - Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed
[14:44:05] Time left 10789.31 secs
2022-03-27 14:44:05,257 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:223 - Time left 10789.31 secs
2022-03-27 14:44:06,717 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'params': 'FastText(vocab=0, vector_size=64, alpha=0.025)', 'datetime': '2022-03-27T14:44:06.717633', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'created'}
2022-03-27 14:44:06,725 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - scan_vocab-line:578 - collecting all words and their counts
2022-03-27 14:44:06,726 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _scan_vocab-line:561 - PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-03-27 14:44:06,745 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - scan_vocab-line:584 - collected 10828 word types from a corpus of 46369 raw words and 9000 sentences
2022-03-27 14:44:06,746 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:633 - Creating a fresh vocabulary
2022-03-27 14:44:06,824 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'effective_min_count=1 retains 10828 unique words (100.0%% of original 10828, drops 0)', 'datetime': '2022-03-27T14:44:06.824618', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:06,825 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'effective_min_count=1 leaves 46369 word corpus (100.0%% of original 46369, drops 0)', 'datetime': '2022-03-27T14:44:06.825618', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:06,968 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:741 - deleting the raw counts dictionary of 10828 items
2022-03-27 14:44:06,969 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - prepare_vocab-line:744 - sample=0.001 downsamples 40 most-common words
2022-03-27 14:44:06,970 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'downsampling leaves estimated 40640.463918984155 word corpus (87.6%% of prior 46369)', 'datetime': '2022-03-27T14:44:06.970622', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'prepare_vocab'}
2022-03-27 14:44:07,295 - INFO - MainProcess[19272]-MainThread[19072]-fasttext.py-gensim.models.fasttext - estimate_memory-line:493 - estimated required memory for 10828 words, 2000000 buckets and 64 dimensions: 525048308 bytes
2022-03-27 14:44:07,296 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - init_weights-line:859 - resetting layer weights
2022-03-27 14:44:09,287 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'update': False, 'trim_rule': 'None', 'datetime': '2022-03-27T14:44:09.287742', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'build_vocab'}
2022-03-27 14:44:09,289 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'training model with 3 workers on 10828 vocabulary and 64 features, using sg=0 hs=0 sample=0.001 negative=5 window=3 shrink_windows=True', 'datetime': '2022-03-27T14:44:09.289723', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'train'}
2022-03-27 14:44:09,376 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 2 more threads
2022-03-27 14:44:09,409 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 1 more threads
2022-03-27 14:44:09,414 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 0 more threads
2022-03-27 14:44:09,414 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_end-line:1629 - EPOCH - 1 : training on 46369 raw words (40640 effective words) took 0.1s, 404546 effective words/s
2022-03-27 14:44:09,500 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 2 more threads
2022-03-27 14:44:09,531 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 1 more threads
2022-03-27 14:44:09,544 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_progress-line:1288 - worker thread finished; awaiting finish of 0 more threads
2022-03-27 14:44:09,545 - INFO - MainProcess[19272]-MainThread[19072]-word2vec.py-gensim.models.word2vec - _log_epoch_end-line:1629 - EPOCH - 2 : training on 46369 raw words (40644 effective words) took 0.1s, 350692 effective words/s
2022-03-27 14:44:09,546 - INFO - MainProcess[19272]-MainThread[19072]-utils.py-gensim.utils - add_lifecycle_event-line:447 - FastText lifecycle event {'msg': 'training on 92738 raw words (81284 effective words) took 0.3s, 317320 effective words/s', 'datetime': '2022-03-27T14:44:09.546730', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.17763-SP0', 'event': 'train'}
100%|████████████████████████████████████████████████████████████████████████████| 9000/9000 [00:07<00:00, 1273.13it/s]
2022-03-27 14:44:18,279 - INFO3 - MainProcess[19272]-MainThread[19072]-text.py-lightautoml.transformers.text - fit-line:788 - Feature concated__title fitted
2022-03-27 14:44:24,936 - INFO3 - MainProcess[19272]-MainThread[19072]-text.py-lightautoml.transformers.text - transform-line:834 - Feature concated__title transformed
[14:44:24] Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...
2022-03-27 14:44:24,992 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:245 - Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...
[14:44:36] Fitting Lvl_0_Pipe_1_Mod_0_LightGBM finished. score = -924.1246948242188
2022-03-27 14:44:36,807 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:293 - Fitting Lvl_0_Pipe_1_Mod_0_LightGBM finished. score = -924.1246948242188
[14:44:36] Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed
2022-03-27 14:44:36,809 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.ml_algo.base - fit_predict-line:296 - Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed
[14:44:36] Time left 10757.75 secs
2022-03-27 14:44:36,816 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:223 - Time left 10757.75 secs
[14:44:36] Layer 1 training completed.
2022-03-27 14:44:36,818 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.base - fit_predict-line:241 - Layer 1 training completed.
[14:44:36] Blending: optimization starts with equal weights and score -924.7379150390625
2022-03-27 14:44:36,827 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:370 - Blending: optimization starts with equal weights and score -924.7379150390625
[14:44:36] Blending: iteration 0: score = -922.67333984375, weights = [0.25724643 0.74275357]
2022-03-27 14:44:36,850 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:395 - Blending: iteration 0: score = -922.67333984375, weights = [0.25724643 0.74275357]
[14:44:36] Blending: iteration 1: score = -922.67333984375, weights = [0.25724643 0.74275357]
2022-03-27 14:44:36,873 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:395 - Blending: iteration 1: score = -922.67333984375, weights = [0.25724643 0.74275357]
[14:44:36] Blending: no score update. Terminated
2022-03-27 14:44:36,875 - INFO - MainProcess[19272]-MainThread[19072]-blend.py-lightautoml.automl.blend - _optimize-line:402 - Blending: no score update. Terminated
[14:44:36] Automl preset training completed in 42.32 seconds
2022-03-27 14:44:36,883 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:214 - Automl preset training completed in 42.32 seconds
[14:44:36] Model description:
Final prediction for new objects (level 0) =
0.25725 * (3 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
0.74275 * (3 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM)
2022-03-27 14:44:36,885 - INFO - MainProcess[19272]-MainThread[19072]-base.py-lightautoml.automl.presets.base - fit_predict-line:215 - Model description:
Final prediction for new objects (level 0) =
0.25725 * (3 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
0.74275 * (3 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM)
Hi @fingoldo,
I have looked into this. The TabularNLPAutoML preset does not use a feature selector, because it would be quite slow in this case, so the fast feature importances are not available. Could you please try the accurate method instead of the fast one?
Alex
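
For readers unfamiliar with the distinction, here is a minimal sketch of the idea that the accurate method is assumed to rely on: permutation importance. Each feature is shuffled in turn and the resulting increase in the error metric is recorded. It is not LightAutoML's implementation. The names `permutation_importance`, `toy_predict` and the toy data are hypothetical and only for illustration. In LightAutoML itself the call would presumably look like `automl.get_feature_scores(calc_method="accurate", data=X)`, but please check the signature in your installed version.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE, the metric used in the task definition of this thread."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    rows: List[Row],
    y_true: Sequence[float],
    features: Sequence[str],
    seed: int = 0,
) -> Dict[str, float]:
    """Return the increase in MAE caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, predict(rows))
    scores: Dict[str, float] = {}
    for name in features:
        # Shuffle one column while leaving every other column untouched.
        shuffled = [row[name] for row in rows]
        rng.shuffle(shuffled)
        permuted = [{**row, name: value} for row, value in zip(rows, shuffled)]
        # A larger error increase means the model relies more on this feature.
        scores[name] = mean_absolute_error(y_true, predict(permuted)) - baseline
    return scores


def toy_predict(rows: List[Row]) -> List[float]:
    """Hypothetical stand-in for a fitted model: it only looks at feature 'a'."""
    return [2.0 * row["a"] for row in rows]


rows = [{"a": float(i), "b": float(i % 3)} for i in range(50)]
y = [2.0 * row["a"] for row in rows]
scores = permutation_importance(toy_predict, rows, y, ["a", "b"])
print(scores)
```

Because this approach only needs predictions on data, it works without a feature selector or a tree model's built-in importances, at the cost of one extra prediction pass per feature.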