Thanks, I found the problem! When I installed NumPy 1.8.2 and SciPy 0.13.3, I got the same problem as you did. However, when I upgraded the packages to NumPy 1.9.3 and SciPy 0.14.0, it worked just fine.
Although I listed the minimum version requirements on page 15 (end of chapter 1), I'd even recommend installing the latest versions, e.g., NumPy 1.11.0 and SciPy 0.17.0.
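If you want a script to fail early on outdated packages, a small check like the one below may help. It is a sketch, not code from the book: the thresholds are simply the versions that worked in the comment above, and the names `MINIMUM_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical. Versions are compared as integer tuples because plain string comparison gets them wrong ("1.11.0" sorts before "1.9.3").

```python
import importlib
import re

# Versions that worked in the comment above (hypothetical constant name)
MINIMUM_VERSIONS = {"numpy": "1.9.3", "scipy": "0.14.0"}


def parse_version(version):
    """Turn '1.11.0' or '1.26.0rc1' into a tuple of ints such as (1, 11, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numerically, padding with zeros so '1.9' equals '1.9.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        status = "ok" if meets_minimum(module.__version__, minimum) else "too old"
        print(f"{name} {module.__version__} (need >= {minimum}): {status}")
```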
from python-machine-learning-book.
Is this problem related to this?
python recognize.py --file p364_001.wav
Traceback (most recent call last):
  File "recognize.py", line 53, in <module>
    mfcc = np.transpose(np.expand_dims(librosa.feature.mfcc(wav, 16000), axis=0), [0, 2, 1])
  File "/usr/local/lib/python2.7/dist-packages/librosa/feature/spectral.py", line 1279, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/librosa/feature/spectral.py", line 1371, in melspectrogram
    mel_basis = filters.mel(sr, n_fft, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/librosa/filters.py", line 238, in mel
    lower = -ramps[i] / fdiff[i]
ValueError: operands could not be broadcast together with shapes (1,1025) (0,)
Did you get a fix for this?
If yes, please help me. I need some help urgently.
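The ValueError in that traceback is NumPy's broadcasting check. The dummy arrays below (not the librosa code itself) reproduce it: NumPy aligns shapes from the trailing dimension, and 1025 vs. 0 are neither equal nor 1, so (1, 1025) and (0,) cannot be combined. Here `fdiff[i]` is empty, so there is effectively nothing to divide by; one way an empty difference array arises is `np.diff` over an axis of length 1. The `can_broadcast` helper is a hypothetical re-implementation of the rule for illustration; recent NumPy versions also provide `np.broadcast_shapes`.

```python
import numpy as np


def can_broadcast(shape_a, shape_b):
    """NumPy's rule: compare trailing dimensions pairwise; each pair must be equal or contain a 1."""
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


ramp = np.ones((1, 1025))          # same shape as ramps[i] in the traceback
empty_denominator = np.ones((0,))  # same shape as fdiff[i] in the traceback

print(can_broadcast(ramp.shape, empty_denominator.shape))  # False
try:
    -ramp / empty_denominator
except ValueError as err:
    print("ValueError:", err)

# np.diff over an axis of length 1 yields an empty axis, one way to get a shape like (0,)
print(np.diff(np.array([[3.0]]), axis=-1).shape)  # (1, 0)
```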
Sorry to hear that you are having problems with that! I just reran the code from the IPython notebook (https://github.com/rasbt/python-machine-learning-book/tree/master/code/ch12) and it worked fine for me. Here is what I ran for the gradient check:
nn_check = MLPGradientCheck(n_output=10,
                            n_features=X_train.shape[1],
                            n_hidden=10,
                            l2=0.0,
                            l1=0.0,
                            epochs=10,
                            eta=0.001,
                            alpha=0.0,
                            decrease_const=0.0,
                            minibatches=1,
                            shuffle=False,
                            random_state=1)
Then run
nn_check.fit(X_train[:5], y_train[:5], print_progress=False)
which should yield something like:
Ok: 2.55068505986e-10
Ok: 2.93547837023e-10
Ok: 2.37449571314e-10
Ok: 3.08194323691e-10
Ok: 3.38249440642e-10
Ok: 3.57890221135e-10
Ok: 2.19231256383e-10
Ok: 2.36583740198e-10
Ok: 3.43584860701e-10
Ok: 2.13345208113e-10
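Each `Ok:` line above reports a relative error between the analytical gradient from backpropagation and a numerical approximation. The sketch below illustrates that idea on a tiny logistic-regression loss rather than the book's multilayer perceptron. It uses centered finite differences and a common relative-error formula, ||g_num - g|| / (||g_num|| + ||g||); the function names, the toy data, and the `epsilon` value are illustrative choices, not taken from `MLPGradientCheck`.

```python
import numpy as np


def loss(w, X, y):
    """Mean logistic (cross-entropy) loss for weights w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def analytical_grad(w, X, y):
    """Closed-form gradient of the mean logistic loss."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)


def numerical_grad(f, w, epsilon=1e-5):
    """Centered finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = epsilon
        grad[i] = (f(w + step) - f(w - step)) / (2 * epsilon)
    return grad


rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))  # five toy samples, analogous to X_train[:5]
y = np.array([0, 1, 1, 0, 1], dtype=float)
w = rng.normal(scale=0.1, size=3)

g_num = numerical_grad(lambda v: loss(v, X, y), w)
g_ana = analytical_grad(w, X, y)
relative_error = np.linalg.norm(g_num - g_ana) / (
    np.linalg.norm(g_num) + np.linalg.norm(g_ana)
)
print(f"relative error: {relative_error:.3e}")
```

A tiny value means the two gradients agree; if the loss itself becomes `nan`, the relative error is `nan` as well, which is why the discussion below turns to the inputs.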
When I inspected the input samples and labels, they looked like this:
Can you maybe double-check that the X and y arrays have matching dimensions? Maybe something went wrong while parsing MNIST.
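A quick way to do that double-check is a small assertion helper like the sketch below. The name `check_xy` is hypothetical, and the default of 784 features assumes 28x28 MNIST images flattened into rows.

```python
import numpy as np


def check_xy(X, y, n_features=784):
    """Raise AssertionError with a readable message if X and y do not line up.

    n_features=784 assumes flattened 28x28 MNIST images.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    assert X.ndim == 2, f"X should be 2-D (samples, features), got shape {X.shape}"
    assert y.ndim == 1, f"y should be 1-D (samples,), got shape {y.shape}"
    assert X.shape[0] == y.shape[0], (
        f"{X.shape[0]} samples in X but {y.shape[0]} labels in y"
    )
    assert X.shape[1] == n_features, f"expected {n_features} features, got {X.shape[1]}"
    return X.shape, y.shape
```

Calling `check_xy(X_train, y_train)` returns the two shapes if everything lines up and otherwise says which dimension is off.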
In #9, you mentioned that a3 is practically always a "nan" matrix. Hm, I think the easiest things to check are the input arrays X and y. Can you please print the X_train and y_train you use to fit the model? I think that will give us some useful clues! Also, have you tried to run the IPython notebook (ch12) from this GitHub repo? I am curious whether it's an OS-related issue or whether there may be a typo in the script. Let me know how it goes! :)
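When printing the arrays, it can also help to count non-finite values, because a NaN that is already in the input can propagate through the network and end up in `a3`. The helper below is a hypothetical sketch; pass it whichever arrays you want to inspect (for example `X_train`, `y_train`, or the activations). The demo arrays are made up.

```python
import numpy as np


def report_nonfinite(arrays):
    """Return {name: (dtype, shape, n_nan, n_inf)} for each array."""
    report = {}
    for name, values in arrays.items():
        values = np.asarray(values)
        if np.issubdtype(values.dtype, np.floating):
            n_nan = int(np.isnan(values).sum())
            n_inf = int(np.isinf(values).sum())
        else:
            n_nan = n_inf = 0  # integer arrays cannot hold NaN or inf
        report[name] = (str(values.dtype), values.shape, n_nan, n_inf)
    return report


X_demo = np.array([[0.0, 1.0], [np.nan, 2.0]])
y_demo = np.array([3, 7])
for name, info in report_nonfinite({"X": X_demo, "y": y_demo}).items():
    print(name, info)
```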
Hi,
Thank you for the quick response.
I just checked X_train and y_train, and they seem fine.
Hm, I think it could be a floating-point precision problem. It would be nice if you could run the IPython notebook from this repository. It works fine for me, and it has worked for other readers as well. If the notebook doesn't run properly on your system either, we could narrow the problem down further and look at floating-point precision and related issues.
I ran the IPython notebook, and the result is the same.
Sorry, I have never seen this problem before. I will run it on my other Linux machine tomorrow to see if I get the same problem. If you execute
>>> import numpy as np
>>> np.finfo(np.float64).precision
does it return 15 or a value higher than that?
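For reference, `np.finfo` reports the approximate number of decimal digits a floating-point type can represent: 15 corresponds to 64-bit floats (`np.float64`, the type behind Python's built-in `float`), while 32-bit floats give 6. On recent NumPy releases, use `np.float64` or `float` here, since the old `np.float` alias has been removed. The loop below is just a small illustration.

```python
import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    # precision: approximate decimal digits; eps: gap between 1.0 and the next representable float
    print(dtype.__name__, info.precision, info.eps)
```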
Thank you in advance for your help.
I executed the code, and it returns 15.
Oh, that's very useful info! I just noticed that you are running the code on Python 2.7 instead of Python 3. I guess that's what causes the problem (I will check it out on Python 2.7 later and edit the code to make it compatible).
Thank you
Hm, it worked fine for me when I ran it via Python 2.7.9, and it worked fine on my Linux machine (CentOS) as well. Sorry that I don't have a good solution for you at hand ...
a) I am curious whether this is related to Python 2.7: does it work for you if you run the notebook via Python 3.5?
b) If you are using Python 2.7, maybe an additional
from __future__ import division
could help in your case (however, as mentioned before, it worked fine for me on Py27 without it)
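Why the `__future__` import can matter: in Python 2, `/` between two integers performs floor division unless `from __future__ import division` is in effect, which can silently turn small ratios into 0. The snippet below runs on Python 3, where `/` is always true division, and emulates the old behavior with `//`; the arrays are made-up values.

```python
import numpy as np

counts = np.array([1, 2, 3])
total = 4

# What Python 2 did for int / int without the __future__ import: floor division
floored = counts // total
print(floored)  # [0 0 0]

# What `from __future__ import division` (and Python 3) does: true division, giving 0.25, 0.5, 0.75
true_ratio = counts / total
print(true_ratio)
```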
c) Can you please try to run the following?
nn_check.fit(X_train[:5].astype(float), y_train[:5], print_progress=False)
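The `.astype(float)` suggestion matters when the pixel arrays have an integer dtype. MNIST is distributed in IDX files that store pixels as unsigned bytes, and NumPy arithmetic on `uint8` arrays wraps around silently instead of raising an error. Whether your `X_train` is actually `uint8` depends on how it was loaded, so check `X_train.dtype` first; the values below are only an illustration.

```python
import numpy as np

pixels = np.array([200, 250], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256 without an error
wrapped = pixels + pixels
print(wrapped, wrapped.dtype)  # [144 244] uint8

# Casting first gives the mathematically expected result
as_float = pixels.astype(float)
print(as_float + as_float)  # [400. 500.]
```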
d) By the way, did the rest of the code that came before gradient checking work okay on your machine?
Thank you for the information.
I tried the code you provided, but the problem is the same.
By the way, the code before gradient checking seems to work fine.
Hm, I am really curious to find out what causes this on your machine. Would you mind attaching the .py script here so that I can run it on my different machines?
In addition, it would be helpful to know which Python version and package versions you are using so that I can recreate your environment. E.g., by executing
$ python -V
Python 3.5.1 :: Continuum Analytics, Inc.
$ python -c 'import numpy; print(numpy.__version__)'
1.11.0
$ python -c 'import scipy; print(scipy.__version__)'
0.17.0
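The same information can also be collected with one short script, which is handy for pasting into an issue. This is a sketch that only assumes the package names discussed in this thread (`sklearn` is the import name of scikit-learn); packages that are not installed are reported instead of crashing the script. The function name `environment_report` is hypothetical.

```python
import importlib
import platform
import sys


def environment_report(packages=("numpy", "scipy", "sklearn")):
    """Return a list of 'name version' strings for Python and the given packages."""
    lines = [f"Python {platform.python_version()} ({sys.platform})"]
    for name in packages:
        try:
            module = importlib.import_module(name)
            lines.append(f"{name} {getattr(module, '__version__', 'unknown')}")
        except ImportError:
            lines.append(f"{name} not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```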
Please remove the ".txt" extension before using the files.
By the way, the information you need is as follows:
MLPGradientCheck.py.txt
mynewpic.py.txt
Thank you for the important information
You are welcome. Sorry, but updating the packages seems to be the only solution to this problem. Aside from this chapter's code, I would recommend it anyway, since many bugs and problems have been fixed in the latest NumPy and SciPy releases; I'd always try to stay up to date with these packages.
What are you trying to do?
I just noticed in the error output that you are using Python 2.7. The problem could also be related to that, because I am not sure whether recent versions of NumPy and scikit-learn still support Python 2.7 properly.
I ran into these issues while working on audio-to-text. Later I found that they were caused by a version mismatch between the required packages, so I suggest working with the latest packages. The error came from a division in which the denominator was giving 0.
@rasbt @zhangzfmail @atanumandal0491 @YashBangera7 @SumitBando,
Hi, I am getting an error like ValueError: operands could not be broadcast together with shapes (2336,122) (121,). I am using Python 3 and scikit-learn==0.22.1. Could anyone help me resolve this issue?
Thanks and Regards,
Manikantha Sekhar.
Hi there,
could you share in which chapter this is happening? Is it an error in Ch12 similar to this original issue?
@rasbt,
It happens with the code below:
from flask import Flask, render_template, request
import os
import pickle
import random
from pathlib import Path

import pandas as pd
import PyPDF2
import spacy
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

app = Flask(__name__)
app.secret_key = "super secret key"

# Load the spaCy model once at startup instead of on every request
nlp = spacy.load('en_core_web_sm')


def dumpPickle(fileName, content):
    with open(fileName, 'wb') as pickleFile:
        pickle.dump(content, pickleFile, -1)


def loadPickle(fileName):
    with open(fileName, 'rb') as file:
        content = pickle.load(file)
    print("content:", content)
    return content


def pickleExists(fileName):
    return Path(fileName).is_file()


# Extract answers and the sentence they are in
def extractAnswers(qas, doc):
    answers = []
    senStart = 0
    senId = 0
    for sentence in doc.sents:
        senLen = len(sentence.text)
        for answer in qas:
            answerStart = answer['answers'][0]['answer_start']
            if senStart <= answerStart < senStart + senLen:
                answers.append({'sentenceId': senId, 'text': answer['answers'][0]['text']})
        senStart += senLen
        senId += 1  # was "enId", which raised a NameError
    return answers


# TODO - Clean answers from stopwords?
def tokenIsAnswer(token, sentenceId, answers):
    for i in range(len(answers)):
        if answers[i]['sentenceId'] == sentenceId and answers[i]['text'] == token:
            return True
    return False


# Save named entities start points
def getNEStartIndexs(doc):
    neStarts = {}
    for ne in doc.ents:
        neStarts[ne.start] = ne
    return neStarts


def getSentenceStartIndexes(doc):
    senStarts = []
    for sentence in doc.sents:
        senStarts.append(sentence[0].i)
    return senStarts


def getSentenceForWordPosition(wordPos, senStarts):
    for i in range(1, len(senStarts)):
        if wordPos < senStarts[i]:
            return i - 1
    # Words in the last sentence used to fall through and return None
    return len(senStarts) - 1


def addWordsForParagrapgh(newWords, text):
    doc = nlp(text)
    neStarts = getNEStartIndexs(doc)
    senStarts = getSentenceStartIndexes(doc)
    # index of word in spacy doc text
    i = 0
    while i < len(doc):
        # If the token is the start of a Named Entity, add it and push the index to the end of the NE
        if i in neStarts:
            word = neStarts[i]
            currentSentence = getSentenceForWordPosition(word.start, senStarts)
            wordLen = word.end - word.start
            shape = ''
            for wordIndex in range(word.start, word.end):
                shape += (' ' + doc[wordIndex].shape_)
            newWords.append([word.text, 0, 0, currentSentence, wordLen,
                             word.label_, None, None, None, shape])
            i = neStarts[i].end - 1
        # If not a NE, add the word if it's not a stopword or a non-alpha (not regular letters)
        elif not doc[i].is_stop and doc[i].is_alpha:
            word = doc[i]
            currentSentence = getSentenceForWordPosition(i, senStarts)
            wordLen = 1
            newWords.append([word.text, 0, 0, currentSentence, wordLen,
                             None, word.pos_, word.tag_, word.dep_, word.shape_])
        i += 1


def oneHotEncodeColumns(df):
    columnsToEncode = ['NER', 'POS', 'TAG', 'DEP']
    for column in columnsToEncode:
        one_hot = pd.get_dummies(df[column])
        one_hot = one_hot.add_prefix(column + '_')
        df = df.drop(column, axis=1)
        df = df.join(one_hot)
    return df


def generateDf(text):
    words = []
    addWordsForParagrapgh(words, text)
    wordColums = ['text', 'titleId', 'paragrapghId', 'sentenceId', 'wordCount', 'NER', 'POS', 'TAG', 'DEP', 'shape']
    df = pd.DataFrame(words, columns=wordColums)
    return df


def prepareDf(df):
    # One-hot encoding
    wordsDf = oneHotEncodeColumns(df)
    # Drop unused columns
    columnsToDrop = ['text', 'titleId', 'paragrapghId', 'sentenceId', 'shape']
    wordsDf = wordsDf.drop(columnsToDrop, axis=1)
    # The 121 feature columns of the predictor ("TAG_''" is spaCy's closing-quote tag;
    # the former 'TAG_''' literal evaluated to 'TAG_')
    predictorColumns = ['wordCount','NER_CARDINAL','NER_DATE','NER_EVENT','NER_FAC','NER_GPE','NER_LANGUAGE','NER_LAW','NER_LOC','NER_MONEY','NER_NORP','NER_ORDINAL','NER_ORG','NER_PERCENT','NER_PERSON','NER_PRODUCT','NER_QUANTITY','NER_TIME','NER_WORK_OF_ART','POS_ADJ','POS_ADP','POS_ADV','POS_CCONJ','POS_DET','POS_INTJ','POS_NOUN','POS_NUM','POS_PART','POS_PRON','POS_PROPN','POS_PUNCT','POS_SYM','POS_VERB','POS_X',"TAG_''",'TAG_-LRB-','TAG_.','TAG_ADD','TAG_AFX','TAG_CC','TAG_CD','TAG_DT','TAG_EX','TAG_FW','TAG_IN','TAG_JJ','TAG_JJR','TAG_JJS','TAG_LS','TAG_MD','TAG_NFP','TAG_NN','TAG_NNP','TAG_NNPS','TAG_NNS','TAG_PDT','TAG_POS','TAG_PRP','TAG_PRP$','TAG_RB','TAG_RBR','TAG_RBS','TAG_RP','TAG_SYM','TAG_TO','TAG_UH','TAG_VB','TAG_VBD','TAG_VBG','TAG_VBN','TAG_VBP','TAG_VBZ','TAG_WDT','TAG_WP','TAG_WRB','TAG_XX','DEP_ROOT','DEP_acl','DEP_acomp','DEP_advcl','DEP_advmod','DEP_agent','DEP_amod','DEP_appos','DEP_attr','DEP_aux','DEP_auxpass','DEP_case','DEP_cc','DEP_ccomp','DEP_compound','DEP_conj','DEP_csubj','DEP_csubjpass','DEP_dative','DEP_dep','DEP_det','DEP_dobj','DEP_expl','DEP_intj','DEP_mark','DEP_meta','DEP_neg','DEP_nmod','DEP_npadvmod','DEP_nsubj','DEP_nsubjpass','DEP_nummod','DEP_oprd','DEP_parataxis','DEP_pcomp','DEP_pobj','DEP_poss','DEP_preconj','DEP_predet','DEP_prep','DEP_prt','DEP_punct','DEP_quantmod','DEP_relcl','DEP_xcomp']
    # Add missing columns as 0, drop columns the model never saw (e.g. an unseen spaCy
    # label, which caused the (2336,122) vs (121,) error) and enforce the column order.
    # Assumes the predictor was trained on exactly these columns in this order.
    wordsDf = wordsDf.reindex(columns=predictorColumns, fill_value=0)
    return wordsDf


def predictWords(wordsDf, df):
    predictorPickleName = 'data/pickles/nb-predictor.pkl'
    predictor = loadPickle(predictorPickleName)
    print("predictor:", predictor)
    y_pred = predictor.predict_proba(wordsDf)
    print("y_pred:", y_pred)
    labeledAnswers = []
    for i in range(len(y_pred)):
        labeledAnswers.append({'word': df.iloc[i]['text'], 'prob': y_pred[i][0]})
    return labeledAnswers


def blankAnswer(firstTokenIndex, lastTokenIndex, sentStart, sentEnd, doc):
    leftPartStart = doc[sentStart].idx
    leftPartEnd = doc[firstTokenIndex].idx
    rightPartStart = doc[lastTokenIndex].idx + len(doc[lastTokenIndex])
    rightPartEnd = doc[sentEnd - 1].idx + len(doc[sentEnd - 1])
    question = doc.text[leftPartStart:leftPartEnd] + '_____' + doc.text[rightPartStart:rightPartEnd]
    return question


def addQuestions(answers, text):
    doc = nlp(text)
    currAnswerIndex = 0
    qaPair = []
    # Check whether each token is the next answer
    for sent in doc.sents:
        for token in sent:
            # If all the answers have been found, stop looking
            if currAnswerIndex >= len(answers):
                break
            # If the answer consists of more than one token, check the following tokens as well
            answerDoc = nlp(answers[currAnswerIndex]['word'])
            answerIsFound = True
            for j in range(len(answerDoc)):
                if token.i + j >= len(doc) or doc[token.i + j].text != answerDoc[j].text:
                    answerIsFound = False
            # If the current token corresponds to the answer, add it
            if answerIsFound:
                question = blankAnswer(token.i, token.i + len(answerDoc) - 1, sent.start, sent.end, doc)
                qaPair.append({'question': question, 'answer': answers[currAnswerIndex]['word'], 'prob': answers[currAnswerIndex]['prob']})
                currAnswerIndex += 1
    return qaPair


def sortAnswers(qaPairs):
    orderedQaPairs = sorted(qaPairs, key=lambda qaPair: qaPair['prob'])
    return orderedQaPairs


# Convert and load the GloVe vectors once at startup instead of on every request
glove_file = 'data/embeddings/glove.6B.300d.txt'
tmp_file = 'data/embeddings/word2vec-glove.6B.300d.txt'
if not os.path.isfile(tmp_file):
    glove2word2vec(glove_file, tmp_file)
model = KeyedVectors.load_word2vec_format(tmp_file)


def generate_distractors(answer, count):
    answer = str.lower(answer)
    # Extract the closest words for the answer
    try:
        closestWords = model.most_similar(positive=[answer], topn=count)
    except Exception:
        # In case the word is not in the vocabulary, or another problem with the embeddings
        return []
    # Return count many distractors
    distractors = list(map(lambda x: x[0], closestWords))[0:count]
    return distractors


def addDistractors(qaPairs, count):
    for qaPair in qaPairs:
        distractors = generate_distractors(qaPair['answer'], count)
        qaPair['distractors'] = distractors
    return qaPairs


def generateQuestions(text, count):
    final = {}
    # Extract words
    df = generateDf(text)
    wordsDf = prepareDf(df)
    # Predict
    labeledAnswers = predictWords(wordsDf, df)
    # Transform questions
    qaPairs = addQuestions(labeledAnswers, text)
    # Pick the best questions
    orderedQaPairs = sortAnswers(qaPairs)
    # Generate distractors
    questions = addDistractors(orderedQaPairs[:count], 3)
    # Iterate over the questions actually found; range(count) raised an
    # IndexError whenever fewer than count questions were generated
    for i, question in enumerate(questions):
        options = question['distractors'] + [question['answer']]
        random.shuffle(options)
        final[i] = {
            'A--Question': question['question'],
            'B--Answer': question['answer'],
            'C--Options': options,
        }
    return final


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/dict_output', methods=['POST'])
def render():
    file = request.form['file']
    file_name, file_ext = os.path.splitext(file)
    if file_ext == '.pdf':
        data = ""
        with open(file, 'rb') as pdfFileObj:
            pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
            for page in range(pdfReader.numPages):
                pageObj = pdfReader.getPage(page)
                data += pageObj.extractText()
    else:
        with open(file, 'r') as textfile:
            data = textfile.read()
    final = generateQuestions(data, 5)
    return render_template("output.html", final_data=final)


if __name__ == "__main__":
    app.run(debug=True)
When I upload a .txt file, I get output like this:
When I upload a PDF file, I get output like this:
I am a little confused about what is causing the issue. Could you help me? If you need more information, I will provide it.
Thanks & Regards,
Manikantha Sekhar.
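The shapes in that error suggest a feature-count mismatch: the data passed to `predict_proba` has 122 columns, while the model expects 121 (the `predictorColumns` list in the code above also has 121 entries). Adding only the missing one-hot columns is not enough, because a single spaCy label that the model never saw still produces an extra column. A sketch of a fix is to reindex the one-hot DataFrame to the training columns, which adds missing ones, drops unknown ones, and fixes the order; this assumes the predictor was trained on exactly those columns in that order. The short column list and the words below are made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened stand-in for predictorColumns (the training feature order)
training_columns = ["wordCount", "POS_NOUN", "POS_VERB", "TAG_NN"]

words = pd.DataFrame({"wordCount": [1, 1, 2], "POS": ["NOUN", "VERB", "SPACE"]})
one_hot = pd.get_dummies(words["POS"]).add_prefix("POS_")
features = words.drop(columns="POS").join(one_hot)
print(list(features.columns))  # includes POS_SPACE, which the model never saw

# Keep exactly the training columns, in training order; missing ones become 0
aligned = features.reindex(columns=training_columns, fill_value=0)
print(aligned.shape)  # (3, 4)
```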
Hi there,
it doesn't look like any code from the book?