hundredblocks / ml-powered-applications
Companion repository for the book Building Machine Learning Powered Applications
Home Page: https://mlpowered.com/
License: MIT License
Not sure if I have to update something, but I am currently having an issue running the code in the "Training a simple model" notebook, specifically with importing joblib. When I run the code as it ships (the notebook currently imports with "from sklearn.externals import joblib"), I get the first error listed above. So I rewrote the line as "import joblib" and get the second error above. Not sure what to do from here.
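For context: scikit-learn removed the vendored `sklearn.externals.joblib` in favor of the standalone `joblib` package, which is why the original import fails on newer releases. A minimal version-tolerant import, assuming the standalone `joblib` package is installed (this addresses only the import itself, not the second error, which isn't shown here):

```python
try:
    # older scikit-learn (< 0.23) vendored joblib under sklearn.externals
    from sklearn.externals import joblib
except ImportError:
    # newer releases removed it; fall back to the standalone package
    import joblib
```

After this, `joblib.load(...)` and `joblib.dump(...)` behave the same regardless of which branch ran.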
I cloned the repo and followed the instructions. The venv seems to have been installed alright. Following along the book, I wanted to play with the v1. When I click on the "Get Recommendations" button, I see this on the server console:
127.0.0.1 - - [01/Jan/2020 18:52:38] "GET /v1 HTTP/1.1" 200 -
Exception on /v1 [POST]
Traceback (most recent call last):
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/me/gh/ml-powered-applications/app.py", line 15, in v1
return handle_text_request(request, "v1.html")
File "/Users/me/gh/ml-powered-applications/app.py", line 31, in handle_text_request
suggestions = get_recommendations_from_input(question)
File "/Users/me/gh/ml-powered-applications/ml_editor/ml_editor.py", line 279, in get_recommendations_from_input
tokenized_sentences = preprocess_input(processed)
File "/Users/me/gh/ml-powered-applications/ml_editor/ml_editor.py", line 45, in preprocess_input
sentences = nltk.sent_tokenize(text)
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 105, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/nltk/data.py", line 868, in load
opened_resource = _open(resource_url)
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/nltk/data.py", line 993, in _open
return find(path_, path + ['']).open()
File "/Users/me/gh/ml-powered-applications/ml_editor/lib/python3.7/site-packages/nltk/data.py", line 701, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/Users/me/nltk_data'
- '/Users/me/gh/ml-powered-applications/ml_editor/bin/../nltk_data'
- '/Users/me/gh/ml-powered-applications/ml_editor/bin/../share/nltk_data'
- '/Users/me/gh/ml-powered-applications/ml_editor/bin/../lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
127.0.0.1 - - [01/Jan/2020 18:52:44] "POST /v1 HTTP/1.1" 500 -
Is there an instruction missing?
I made sure nltk is installed in the venv, which is activated.
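The traceback's own hint is the missing instruction: the `punkt` tokenizer models are downloaded separately from the `nltk` package itself, so having nltk in the venv is not enough. A minimal sketch, run once inside the activated venv:

```python
import nltk

# the punkt sentence tokenizer models ship separately from the nltk package;
# download them if they are not already on the search path
try:
    nltk.data.find("tokenizers/punkt")
except LookupError:
    nltk.download("punkt")
```

Once the download succeeds, `nltk.sent_tokenize` in `preprocess_input` should find the resource in one of the searched directories.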
Hello
I am trying to run the project from the book. However, it says it needs Python 3.6 or 3.7. In Anaconda I don't see these versions offered anymore; the oldest one is 3.8. Additionally, the code uses Pandas 0.24.2, which is not compatible with the latest versions of Python.
I am having problems following the project of the book for these reasons. Is there a more recent repo?
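One workaround, short of a newer repo: the Anaconda installer's default interpreter being 3.8+ doesn't prevent pinning an older Python per environment, since conda can still fetch 3.7 when creating a new env. A sketch of the setup (the environment name `mlpowered` is my own choice):

```shell
# create an environment pinned to the Python version the book targets,
# then install the repo's pinned dependencies into it
conda create -n mlpowered python=3.7
conda activate mlpowered
pip install -r requirements.txt
```

With Python pinned this way, the old Pandas 0.24.2 pin in the requirements should also resolve.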
On my Windows 10 laptop with a GPU, 14 of this book's 15 notebook files completed successfully without errors. One notebook, vectorizing_text.ipynb, gives the error message "Kernel Restarting. Kernel appears to have died. It will restart automatically".
It dies on cell 4,
umap_embedder = umap.UMAP()
umap_bow = umap_embedder.fit_transform(bag_of_words)
Is this error due to too little RAM on my laptop?
Any suggestions or workarounds to fix this error, please? Thanks.
Can this program be run by lowering the size of bag_of_words?
How can I estimate the memory requirements of a notebook?
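To the two questions above: UMAP fits can be memory-hungry on wide sparse input, and both the matrix size and its footprint can be checked cheaply before calling `fit_transform`. A sketch with toy documents (the `max_features` cap of 5000 is an arbitrary example, not a value from the book):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["a toy document", "another toy document about memory use"]

# cap the vocabulary to shrink the matrix handed to umap.UMAP()
vectorizer = CountVectorizer(max_features=5000)
bag_of_words = vectorizer.fit_transform(docs)

# rough footprint of the CSR sparse matrix, in bytes: its three
# underlying arrays (values, column indices, row pointers)
footprint = (
    bag_of_words.data.nbytes
    + bag_of_words.indices.nbytes
    + bag_of_words.indptr.nbytes
)
print(bag_of_words.shape, footprint)
```

If the sparse matrix is already large relative to your 16 GB of RAM, lowering `max_features` (or filtering rare terms with `min_df`) before UMAP is a reasonable first workaround.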
My PC has an Intel i7-9750H CPU @ 2.60 GHz and an
NVIDIA GeForce RTX 2070 with Max-Q Design, with 16 GB of RAM.
What kind of hardware did you use to test these programs?
How much RAM do you have on your computer?
Is it a UNIX system?
I have UNIX on a very low clock speed laptop (no GPU); I don't think it will work there.
I do not know how to set up my Ubuntu 18.04 laptop for machine learning use.
Any suggestions to get this program working, please?
Thanks,
SSJ
This is not an issue, but a nice-to-have feature.
This book's notebook file names could be prefixed with the numbers of the chapters in which each file is discussed and used.
I have listed preliminary prefixed notebook file names below.
The author can check whether these prefixes are correct and, if so, update the file names.
ch04_ch05_clustering_data.ipynb
ch04_ch06_ch07_vectorizing_text.ipynb
ch04_ch06_dataset_exploration.ipynb
ch04_exploring_data_to_generate_features.ipynb
ch04_tabular_data_vectorization.ipynb
ch04_third_model.ipynb
ch05_black_box_explainer.ipynb
ch05_comparing_data_to_predictions.ipynb
ch05_feature_importance.ipynb
ch05_splitting_data.ipynb
ch05_top_k.ipynb
ch05_train_simple_model.ipynb
ch07_ch08_second_model.ipynb
ch07_ch11_comparing_models.ipynb
ch07_generating_recommendations.ipynb
Could you please provide a Docker image with all of this app's dependencies installed, for running the sample code?
We always hit many kinds of dependency errors while preparing the run environment.
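No official image exists for this repo as far as I know, but a hypothetical starting point could pin the interpreter and bake in the NLTK data; the base image tag, port, and paths below are assumptions, not something the repo ships:

```dockerfile
# hypothetical Dockerfile: pin the Python version the book targets
FROM python:3.7-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt \
 && python -m nltk.downloader punkt

COPY . .
EXPOSE 5000
CMD ["flask", "run", "--host=0.0.0.0"]
```

Baking the `punkt` download into the image also avoids the NLTK LookupError reported in another issue here.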
Try 'flask run --help' for help.
Error: While importing 'app', an ImportError was raised:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/flask/cli.py", line 218, in locate_app
__import__(module_name)
File "/Users/xingvoong/github/ml-study/ml-powered-applications/app.py", line 5, in
from ml_editor.ml_editor import get_recommendations_from_input
File "/Users/xingvoong/github/ml-study/ml-powered-applications/ml_editor/ml_editor.py", line 5, in
import pyphen
ModuleNotFoundError: No module named 'pyphen'
I have already installed pyphen.
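A common cause of this pattern (a ModuleNotFoundError right after a seemingly successful install) is that pip installed pyphen into a different interpreter than the one `flask run` is using; note the traceback shows the system Python 3.9 under `/usr/local`, not a venv. A quick diagnostic sketch:

```python
import sys

# the interpreter actually running this code; pyphen must be installed
# into this same environment, e.g. with: python -m pip install pyphen
print(sys.executable)
print(sys.prefix)
```

Run this from the same shell you use for `flask run`; if `sys.executable` is not the interpreter you installed pyphen into, install it with `python -m pip install pyphen` using that exact interpreter.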
Hello,
Would you mind adding a licence to this project code?
Personally I'm not going to use it directly for anything, but it might be a good starter for a project, and without a licence I'd hesitate to use it for that purpose.
Apache 2, MIT, CC-BY-SA, etc. would be great.
Thanks
I'm trying to run train_simple_model.ipynb and already saw ImportError: cannot import name 'joblib' from 'sklearn.externals', so I installed and upgraded joblib to 1.0.1 and did import joblib directly, which cleared that error.
However, in the first cell, the code fails along this path: from ml_editor.model_v1 import get_model_probabilities_for_input_texts --> VECTORIZER = joblib.load(curr_path / vectorizer_path) --> obj = _unpickle(fobj, filename, mmap_mode) --> obj = unpickler.load() --> dispatch[key[0]] --> klass = self.find_class(module, name) --> __import__(module, level=0) --> ModuleNotFoundError: No module named 'sklearn.externals.joblib'.
I suspect this has to do with how vectorizer_1.pkl was created. Is it because vectorizer_1.pkl was saved with the old joblib, so when loading, it asks for the old joblib library?
I was trying to recreate the 3 models and 2 vectorizers using my new joblib, hoping this error would go away, then realized from searching for joblib.dump that I can't find where the models and vectorizers are created. It seems that vectorizer_1.pkl is only created at the end of train_simple_model.ipynb with joblib.dump(vectorizer, vectorizer_path), but it is already being used in the first cell of the notebook, leading to the error in this issue.
Are the artifacts in the models folder pre-trained somewhere already? If not, which notebooks generated them? (So I can run those notebooks on new libraries to create loadable versions of the pickles.) I hope to go through these notebooks without downgrading libraries or pinning to old versions, as that is not sustainable in the long run.
P.S. I also saw ModuleNotFoundError: No module named 'sklearn.ensemble.forest' when loading models; it's probably related to the pickled models being trained on an older scikit-learn API.
Attached is the error message, which I copied and pasted into a Microsoft Word doc.
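The diagnosis above matches the symptoms: the pickles reference module paths from the scikit-learn version they were saved under. One common workaround, short of regenerating the artifacts, is to alias the old module paths before unpickling; a sketch assuming the standalone `joblib` package and a recent scikit-learn are installed:

```python
import sys

import joblib

# alias the removed path so pickles that reference
# sklearn.externals.joblib can still resolve it when unpickled
sys.modules.setdefault("sklearn.externals.joblib", joblib)

# similarly for models pickled against sklearn.ensemble.forest,
# which was renamed to sklearn.ensemble._forest in scikit-learn 0.22
try:
    import sklearn.ensemble._forest as _forest
    sys.modules.setdefault("sklearn.ensemble.forest", _forest)
except ImportError:
    pass
```

With both aliases in place, `joblib.load` on the old pickles (e.g. vectorizer_1.pkl in the models folder) has a chance of resolving; it is a stopgap, though, since unpickled estimators can still misbehave across scikit-learn versions.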