
data-wrangling's People

Contributors

dependabot[bot], jackiekazil, kjam, samedhaa

data-wrangling's Issues

"Table 9" sheet does not exist in "SOWC 2014 Stat Tables_Table 9.xlsx"

But a "Table 9 " sheet does exist (note the trailing space in the name).
Data used: https://github.com/jackiekazil/data-wrangling/blob/master/data/chp4/SOWC%202014%20Stat%20Tables_Table%209.xlsx

Sample code:

import xlrd
book = xlrd.open_workbook('SOWC 2014 Stat Tables_Table 9.xlsx')
sheet = book.sheet_by_name('Table 9')
print sheet

result:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    sheet = book.sheet_by_name('Table 9')
  File "/env27/lib/python2.7/site-packages/xlrd/book.py", line 441, in sheet_by_name
    raise XLRDError('No sheet named <%r>' % sheet_name)
xlrd.biffh.XLRDError: No sheet named <'Table 9'>

Code using the sheet name that actually exists:

import xlrd
book = xlrd.open_workbook('SOWC 2014 Stat Tables_Table 9.xlsx')
sheet = book.sheet_by_name('Table 9 ')  # <- MODIFIED
print sheet

result:

$ python test.py
<xlrd.sheet.Sheet object at 0x102a50f90>
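
One way to guard against stray whitespace in sheet names is to compare the names after stripping them. The following is a minimal sketch, assuming the list of names comes from xlrd's `book.sheet_names()`; the helper `find_sheet_name` is hypothetical and not part of xlrd or the book's code.

```python
def find_sheet_name(sheet_names, wanted):
    """Return the real sheet name matching `wanted`, ignoring surrounding whitespace."""
    for name in sheet_names:
        if name.strip() == wanted.strip():
            return name
    raise KeyError('No sheet matching %r in %r' % (wanted, sheet_names))


# Example with the sheet name as it appears in the workbook from this issue.
# With xlrd, the list would come from book.sheet_names().
actual = find_sheet_name(['Table 9 '], 'Table 9')
print(repr(actual))
```

The returned name can then be passed to `book.sheet_by_name(...)`, so the script keeps working whether or not the trailing space is present.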

UnicodeDecodeError: 'gbk' codec can't decode byte when running parse_pdf_text.py

Hi, thank you for your wonderful book on data wrangling.
I encountered an issue when running parse_pdf_text.py from Chapter 5 in Anaconda (Python 3.5).
The IDE shows the following error message:

Traceback (most recent call last):

  File "<ipython-input-10-957ab6bc6f5e>", line 39, in <module>
    for line in openfile:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 46: illegal multibyte sequence

It looks like the code opened the file in text mode with a "gbk" encoding. Should it be opened in binary mode instead? I'm not sure. How can I fix this problem? Thank you.
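
In Python 3, `open()` in text mode decodes with the platform's default encoding unless one is given, which is where "gbk" comes from on some Windows setups. A possible workaround is to pass the encoding explicitly. This is only a sketch: the issue does not say which encoding the text file uses, so the `('utf-8', 'cp1252')` candidates and the helper `read_text_lines` are assumptions.

```python
import os
import tempfile


def read_text_lines(path, encodings=('utf-8', 'cp1252')):
    """Try each candidate encoding in turn and return the decoded lines."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, 'r', encoding=encoding) as openfile:
                return openfile.read().splitlines()
        except UnicodeDecodeError as error:
            last_error = error
    raise last_error


# Demo file: byte 0x93 is a left curly quote in cp1252 and is not valid UTF-8 here,
# so the first candidate fails and the second one is used.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'wb') as handle:
    handle.write(b'\x93Table 9\x94\n')

lines = read_text_lines(demo_path)
print(lines)
```

If the real encoding is known, passing it directly, as in `open(path, 'r', encoding='...')`, is simpler than trying candidates.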

raw download issue

GitHub displayed error: "(Sorry about that, but we can’t show files that are this big right now.)" but allowed option to View Raw.

The raw view doesn't include tabs, so the code provided in the book throws an error.

Here is alternate (Python 3) code from pydocs that worked for me:

from xml.etree import ElementTree as ET

# this opens data-text.xml and reads it as a string into the variable read_xml
read_xml = open('data-text.xml').read()

# this finds the root from xml stored as a string
root = ET.fromstring(read_xml)
print(root) # for Python 2.7 use > print root

OK, u's guys. What's going on?

While running the code in Chapter 3, importing the .json data file, the script worked fine, but the output was thus:

{u'Indicator': u'Healthy life expectancy (HALE) at birth (years)', u'Country': u'Zambia', u'Comments': u'', u'Display Value': 36, u'World Bank income group': u'Low-income', u'Numeric': 36.0, u'Sex': u'Both sexes', u'High': u'', u'Low': u'', u'Year': 2000, u'WHO region': u'Africa', u'PUBLISH STATES': u'Published'}
{u'Indicator': u'Healthy life expectancy (HALE) at birth (years)', u'Country': u'Zimbabwe', u'Comments': u'', u'Display Value': 51, u'World Bank income group': u'Low-income', u'Numeric': 51.0, u'Sex': u'Female', u'High': u'', u'Low': u'', u'Year': 2012, u'WHO region': u'Africa', u'PUBLISH STATES': u'Published'}

Any idea why all the "u's"?
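
The `u` prefix is how Python 2 displays a unicode string when it prints a container such as a dictionary; it is not part of the data. A small sketch that prints individual values instead of the whole dictionary follows. The record is a shortened sample in the shape of the output above, not the full data file.

```python
import json

raw = '{"Country": "Zambia", "Year": 2000, "Sex": "Both sexes"}'
record = json.loads(raw)

# Printing the whole dict shows each string's repr (u'...' in Python 2).
# Printing or formatting a single value shows only the text itself.
for key in sorted(record):
    print('{}: {}'.format(key, record[key]))
```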

python code error

(null): can't open file 'python.py': [Errno 2] No such file or directory
How can I solve this problem?

The data can't be downloaded

The data can't be downloaded from this website, which makes it very difficult for me to follow along. Could you please help solve this issue?

CH4 Page 76 - Parse Excel Setup

I've created a folder on the desktop, inserted the SOWC 2014 Stat Tables_Table 9.xlsx along with parse_excel.py.

It says to now run 'python parse_script.py' from the command line, which gives the following:
C:\>python parse_script.py
python: can't open file 'parse_script.py': [Errno 2] No such file or directory

Also, I cannot store the opened file in the book variable:

>>> book = xlrd.open_workbook('SOWC 2014 Stat Tables_Table 9.xlsx')
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    book = xlrd.open_workbook('SOWC 2014 Stat Tables_Table 9.xlsx')
  File "C:\Python\Python36\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'SOWC 2014 Stat Tables_Table 9.xlsx'
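
Both messages are `[Errno 2]`, which means Python could not find the file relative to the directory it was started from. Two things in the report are worth checking: the prompt shows `C:\` rather than the desktop folder, and the script was saved as parse_excel.py but run as parse_script.py. The sketch below shows one way to check this; the temporary folder stands in for the folder on the desktop, and `describe_path` is a hypothetical helper.

```python
import os
import tempfile


def describe_path(filename, folder=None):
    """Return (full_path, exists) for filename inside folder (default: current directory)."""
    folder = folder or os.getcwd()
    full_path = os.path.join(folder, filename)
    return full_path, os.path.isfile(full_path)


# A temporary folder standing in for the folder created on the desktop.
folder = tempfile.mkdtemp()
open(os.path.join(folder, 'parse_excel.py'), 'w').close()

print(os.getcwd())                                  # where relative paths are resolved
print(sorted(os.listdir(folder)))                   # what is actually in the folder
print(describe_path('parse_excel.py', folder)[1])   # the name the script was saved under
print(describe_path('parse_script.py', folder)[1])  # the name typed at the prompt
```

Changing into the folder first (`cd` at the command line) or passing a full path to `xlrd.open_workbook` avoids relying on the current directory.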

Is a Python 3 version of this book in the works or available?

This isn't an issue per se. I love the style of your writing; it is clear that you wanted to focus on educating the reader, instead of making sure you delved into all the gory details of each method.

However, Python 2.7 is now getting very old, especially considering your book was fairly recently published. I was curious if a Python 3 version is either in the works or if you already have one available somewhere (perhaps provided to folks who can prove they bought the book).

Thank you!
Regards,
Mike

PDFSyntaxError('No /Root object! - Is this really a PDF?')

Code like this:

import slate
with open('xxx.pdf') as f:
    doc = slate.PDF(f)

raises this problem:

Traceback (most recent call last):
  File "", line 2, in
  File "C:\Python27\lib\site-packages\slate\slate.py", line 38, in __init__
    self.doc.set_parser(self.parser)
  File "C:\Python27\lib\site-packages\pdfminer\pdfparser.py", line 333, in set_parser
    raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?
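
PDFs are binary files, and on Windows a file opened in the default text mode can have its bytes altered while reading, so one thing worth trying is `open('xxx.pdf', 'rb')`. It may also help to confirm that the file on disk is a PDF at all (and not, for example, a saved HTML page). The sketch below checks only the leading `%PDF-` marker; `looks_like_pdf` is a hypothetical helper, and the two demo files are made up.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file starts with the %PDF- marker, reading in binary mode."""
    with open(path, 'rb') as handle:
        return handle.read(5) == b'%PDF-'


# Two made-up files: one with a PDF header, one that is really HTML.
folder = tempfile.mkdtemp()
pdf_path = os.path.join(folder, 'real.pdf')
html_path = os.path.join(folder, 'fake.pdf')
with open(pdf_path, 'wb') as handle:
    handle.write(b'%PDF-1.4\n')
with open(html_path, 'wb') as handle:
    handle.write(b'<html><body>Not found</body></html>')

print(looks_like_pdf(pdf_path))
print(looks_like_pdf(html_path))
```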

"How to import XML Data" on page 57 with Python 3.6.3.

I tried to run "How to import XML Data" on page 57 with Python 3.6.3. However, I get the following errors:
TypeError: unhashable type: 'dict_keys'
TypeError: 'dict_keys' object does not support indexing
I tried rewriting the code with a 'list' in Python 3.6.3, without success.
Could you please give me the code for Python 3.6.3?
Most appreciated ... Johannes
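
Both errors typically appear when code written for Python 2 indexes or hashes the result of `dict.keys()`. In Python 3 that call returns a view object rather than a list, so wrapping it in `list()` restores indexing. The sketch below assumes the failing code indexes an element's attribute keys; the `<Dim>` element is made up for illustration and is not copied from the book's data file.

```python
from xml.etree import ElementTree as ET

# A made-up element standing in for one item from the XML data.
item = ET.fromstring('<Dim Category="COUNTRY" Code="ZMB"/>')

keys = list(item.attrib.keys())   # dict_keys view -> list, which supports indexing
lookup_key = keys[0]
value = item.attrib[lookup_key]
print(lookup_key, value)
```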

Problem with XML exercise, Chapter 3

Hello,

I was working through the exercises and had a problem when I tried:

from xml.etree import ElementTree as ET

tree = ET.parse('data-text.xml')
root = tree.getroot()


print list(root) 

(pages 57-61). It wouldn't list the elements in the list. I ended up writing a for loop to get it to look like the example in the book (bottom of page 60, top of page 61):

import xml.etree.ElementTree as ET

tree = ET.parse('data-text.xml')
root = tree.getroot()

# data = root.find('Data')

for element in root:
	print element

Not sure if I did that correctly or efficiently, but this worked!

Thank you for writing the book.

Edit:
Python version:

Python 2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Mac OS: Sierra 10.12.3

Thanks!

Chapter 4. Working with Excel Files

Hi Jackie:

I think this code has a small bug on the line that says 'if count < 10:', because I got an empty dictionary as the result.
count = 0
data = {}
for i in xrange(sheet.nrows):
    if count < 10:
        if i >= 14:
            row = sheet.row_values(i)
            country = row[1]
            data[country] = {}
        count += 1

print data

After doing some investigation, I found that it should be written as 'if count < 20' as follows:
count = 0
data = {}
for i in xrange(sheet.nrows):
    if count < 20:
        if i >= 14:
            row = sheet.row_values(i)
            country = row[1]
            data[country] = {}
        count += 1

print data
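
One possible explanation, assuming `count` is incremented on every pass through the loop rather than only for rows at index 14 and above: the counter reaches 10 before the first country row is reached, so nothing is stored. A sketch that avoids the separate counter by slicing the rows directly is shown below. The `rows` list is made-up data standing in for the values returned by `sheet.row_values(i)`, and `first_countries` is a hypothetical helper.

```python
def first_countries(rows, start=14, limit=10):
    """Collect up to `limit` country names, beginning at row index `start`."""
    data = {}
    for row in rows[start:start + limit]:
        country = row[1]
        data[country] = {}
    return data


# Made-up rows standing in for sheet.row_values(i); column 1 holds the country name.
rows = [['', 'Country %d' % i] for i in range(30)]
data = first_countries(rows)
print(len(data))
```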

noob question regarding JSON

I noticed on Chap 3 that the JSON structure on the sample data starts with [ (open-bracket) on the first row and ends with a ] (close-bracket); however, the original WHO JSON data only starts with { and ends with }. Can Python handle JSON data that does not start with a bracket? If I run the book example using the original WHO data, I don't get a full listing of the contents of the data, but just the 5 top-level objects.

Thanks.
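
Python's `json` module accepts either form: a document that starts with `[` loads as a list, and one that starts with `{` loads as a dictionary. Iterating over a dictionary yields only its top-level keys, which would explain seeing just a handful of entries. A short sketch of the difference, where the key name `fact` and both documents are made up for illustration:

```python
import json

as_list = json.loads('[{"Country": "Zambia"}, {"Country": "Zimbabwe"}]')
as_object = json.loads('{"fact": [{"Country": "Zambia"}, {"Country": "Zimbabwe"}]}')

for item in as_list:       # iterating a list yields each record
    print(item)

for key in as_object:      # iterating a dict yields only its top-level keys
    print(key, len(as_object[key]))
```

With a top-level object, the records have to be reached through whichever key holds them before looping.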

Chapter 8 our_first_script_with_functions.py issue

When the script runs the following function I get the following error.

def find_missing_data(zipped_data):
    missing_count = 0
    for question, answer in zipped_data:
        if not answer:
            missing_count += 1
    return missing_count

in find_missing_data
    for question, answer in zipped_data:
ValueError: too many values to unpack
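
"Too many values to unpack" means each item in `zipped_data` holds more than two elements, so it cannot be split into `question, answer`. The issue does not show how `zipped_data` is built, so the following is only a sketch under the assumption that it is a list of rows, each row being a list of (question, answer) pairs; the sample data is made up.

```python
def find_missing_data(zipped_data):
    """Count empty answers, assuming each row is a list of (question, answer) pairs."""
    missing_count = 0
    for row in zipped_data:
        for question, answer in row:
            if not answer:
                missing_count += 1
    return missing_count


# Made-up data: two rows, each with three (question, answer) pairs.
zipped_data = [
    [('Q1', 'yes'), ('Q2', ''), ('Q3', 'no')],
    [('Q1', ''), ('Q2', ''), ('Q3', 'yes')],
]
print(find_missing_data(zipped_data))
```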

Chapter 5, pg. 97 possible error

@jackiekazil I am trying to work through chapter 5 and having lots of problems with slate, pdfminer, etc. Specifically, I get no print results from this (p. 97)--any recommendations? I've exhausted google searches and stackoverflow for possible solutions.

`pdf_txt = 'en-final-table9.txt'
openfile = open(pdf_txt, 'r')

for line in openfile:
    print (line)`

An error on chapter 7.2.1

The code in this chapter that reads CSV data and uses a list comprehension raises an error on my Windows machine:
"Error: iterator should return strings, not bytes (did you open the file in text mode?)"
BTW, the code is:
`from csv import DictReader
data_rdr = DictReader(open(r'F:\Learn\python data wrangling\data-wrangling-master\data\unicef\mn.csv','rb'))
header_rdr = DictReader(open(r'F:\Learn\python data wrangling\data-wrangling-master\data\unicef\mn_headers.csv','rb'))

data_rows = [d for d in data_rdr]
header_rows = [h for h in header_rdr]

print(data_rows[:5])
print(header_rows[:5])`
I searched Stack Overflow, which says:

You need to wrap the file in a io.TextIOWrapper() instance, and you need to figure out the encoding

Is this correct?
I am using Python 3.5.2
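
In Python 3 the `csv` module expects text, so the `'rb'` mode used in Python 2 code triggers this error. When the file is opened by path, opening it in text mode with `newline=''` is usually enough; `io.TextIOWrapper` is mainly needed when only a binary stream is available. The sketch below writes a tiny made-up CSV to a temporary file and reads it back. The `utf-8` encoding and the column names are assumptions, since the issue does not state the encoding of the UNICEF files.

```python
import csv
import os
import tempfile

# A tiny made-up CSV standing in for mn.csv.
path = os.path.join(tempfile.mkdtemp(), 'mn_sample.csv')
with open(path, 'w', newline='', encoding='utf-8') as handle:
    handle.write('HH1,HH2\n1,17\n1,20\n')

# Text mode with newline='' lets the csv module handle line endings itself.
with open(path, 'r', newline='', encoding='utf-8') as handle:
    data_rows = [dict(row) for row in csv.DictReader(handle)]

print(data_rows[:5])
```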
