
Importing Data Using Pandas

Introduction

We've already used Pandas to import data in previous lessons, but in this lesson we'll take a little longer to dive into what's actually going on.

Objectives

You will be able to:

  • Import data from csv files and Excel files
  • Understand and explain key arguments for imports
  • Save information to csv and Excel files
  • Access data within a Pandas DataFrame (print() and .head())

Loading Pandas

As usual, we import pandas under its standard alias, pd:

import pandas as pd

Importing Data

There are a few main functions for importing data into a pandas DataFrame, including:

  • pd.read_csv()
  • pd.read_excel()
  • pd.read_json()
  • pd.DataFrame.from_dict()

Most of these methods are fairly straightforward: you use .read_csv() for csv files, .read_excel() for Excel files (both the older .xls and the newer .xlsx formats), and .read_json() for json files. That said, there are a few nuances you should know about. The first is that .read_csv() can be used for any plain-text delimited file. This may include (but is not limited to) pipe-delimited (|) files (.psv) and tab-separated files (.tsv).

Let's look at an example by investigating a file, bp.txt, stored in the Data folder.

df = pd.read_csv('Data/bp.txt', delimiter='\t')
df.head()
Pt BP Age Weight BSA Dur Pulse Stress
0 1 105 47 85.4 1.75 5.1 63 33
1 2 115 49 94.2 2.10 3.8 70 14
2 3 116 49 95.3 1.98 8.2 72 10
3 4 117 50 94.7 2.01 5.8 73 99
4 5 112 51 89.4 1.89 7.0 72 95

This example shows that the data was tab delimited (\t), so an appropriate file extension could have also been .tsv. Once we've loaded the dataset, we can export it to any format we would like with the related methods:

  • df.to_csv()
  • df.to_excel()
  • df.to_json()
  • df.to_dict()

There are also several other options available, but these are the most common.
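To make the delimiter handling above concrete, here is a minimal sketch that round-trips a pipe-delimited sample through pandas; it uses an in-memory buffer and made-up data rather than a real file on disk:

```python
import io
import pandas as pd

# A small pipe-delimited sample (hypothetical data), standing in for a .psv file
raw = "name|age\nAda|36\nGrace|45\n"

# .read_csv() parses any single-character delimiter via the sep (or delimiter) argument
df = pd.read_csv(io.StringIO(raw), sep="|")

# Round-trip: export the same data as tab-separated text instead
tsv_text = df.to_csv(sep="\t", index=False)
```

The same pattern applies to files on disk: pass the path instead of the StringIO buffer.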

Skipping and Limiting Rows

Another feature you may need is skipping rows when metadata is stored at the top of a file. You can do this with the optional skiprows parameter. Similarly, if you only want to load a portion of a large file as an initial preview, you can use the nrows parameter.

df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', nrows=100)
df.head()
GEO.id GEO.id2 GEO.display-label HD01_VD01 HD02_VD01 HD01_VD02 HD02_VD02 HD01_VD03 HD02_VD03 HD01_VD04 ... HD01_VD32 HD02_VD32 HD01_VD33 HD02_VD33 HD01_VD34 HD02_VD34 HD01_VD35 HD02_VD35 HD01_VD36 HD02_VD36
0 Id Id2 Geography Estimate; Total: Margin of Error; Total: Estimate; Total: - Management, business, scien... Margin of Error; Total: - Management, business... Estimate; Total: - Management, business, scien... Margin of Error; Total: - Management, business... Estimate; Total: - Management, business, scien... ... Estimate; Total: - Natural resources, construc... Margin of Error; Total: - Natural resources, c... Estimate; Total: - Production, transportation,... Margin of Error; Total: - Production, transpor... Estimate; Total: - Production, transportation,... Margin of Error; Total: - Production, transpor... Estimate; Total: - Production, transportation,... Margin of Error; Total: - Production, transpor... Estimate; Total: - Production, transportation,... Margin of Error; Total: - Production, transpor...
1 0500000US01001 01001 Autauga County, Alabama 33267 2306 48819 1806 55557 4972 63333 ... 31402 5135 35594 3034 36059 3893 47266 13608 19076 4808
2 0500000US01003 01003 Baldwin County, Alabama 31540 683 49524 1811 57150 6980 63422 ... 35603 3882 30549 1606 29604 4554 35504 6260 24182 3580
3 0500000US01005 01005 Barbour County, Alabama 26575 1653 41652 2638 51797 5980 52775 ... 37847 11189 26094 4884 25339 4900 37282 6017 16607 3497
4 0500000US01007 01007 Bibb County, Alabama 30088 2224 40787 2896 50069 12841 67917 ... 45952 5622 28983 3401 31881 2317 26580 2901 23479 4942

5 rows × 75 columns

Notice that the first row contains descriptions of the variables rather than data.

We could remove it manually:

df = df.drop(0)
df.head(2)
GEO.id GEO.id2 GEO.display-label HD01_VD01 HD02_VD01 HD01_VD02 HD02_VD02 HD01_VD03 HD02_VD03 HD01_VD04 ... HD01_VD32 HD02_VD32 HD01_VD33 HD02_VD33 HD01_VD34 HD02_VD34 HD01_VD35 HD02_VD35 HD01_VD36 HD02_VD36
1 0500000US01001 01001 Autauga County, Alabama 33267 2306 48819 1806 55557 4972 63333 ... 31402 5135 35594 3034 36059 3893 47266 13608 19076 4808
2 0500000US01003 01003 Baldwin County, Alabama 31540 683 49524 1811 57150 6980 63422 ... 35603 3882 30549 1606 29604 4554 35504 6260 24182 3580

2 rows × 75 columns

Or, if we knew this from the start, we could use the skiprows argument:

df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', skiprows=1, nrows=100)
df.head()
Id Id2 Geography Estimate; Total: Margin of Error; Total: Estimate; Total: - Management, business, science, and arts occupations: Margin of Error; Total: - Management, business, science, and arts occupations: Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: Margin of Error; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: - Management occupations ... Estimate; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations Margin of Error; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations Estimate; Total: - Production, transportation, and material moving occupations: Margin of Error; Total: - Production, transportation, and material moving occupations: Estimate; Total: - Production, transportation, and material moving occupations: - Production occupations Margin of Error; Total: - Production, transportation, and material moving occupations: - Production occupations Estimate; Total: - Production, transportation, and material moving occupations: - Transportation occupations Margin of Error; Total: - Production, transportation, and material moving occupations: - Transportation occupations Estimate; Total: - Production, transportation, and material moving occupations: - Material moving occupations Margin of Error; Total: - Production, transportation, and material moving occupations: - Material moving occupations
0 0500000US01001 1001 Autauga County, Alabama 33267 2306 48819 1806 55557 4972 63333 ... 31402 5135 35594 3034 36059 3893 47266 13608 19076 4808
1 0500000US01003 1003 Baldwin County, Alabama 31540 683 49524 1811 57150 6980 63422 ... 35603 3882 30549 1606 29604 4554 35504 6260 24182 3580
2 0500000US01005 1005 Barbour County, Alabama 26575 1653 41652 2638 51797 5980 52775 ... 37847 11189 26094 4884 25339 4900 37282 6017 16607 3497
3 0500000US01007 1007 Bibb County, Alabama 30088 2224 40787 2896 50069 12841 67917 ... 45952 5622 28983 3401 31881 2317 26580 2901 23479 4942
4 0500000US01009 1009 Blount County, Alabama 34900 2063 46593 2963 47003 6189 50991 ... 42489 7176 32969 3767 31814 4551 41375 5280 26755 2963

5 rows × 75 columns
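skiprows also accepts a list of specific row indices, which is handy when the junk rows are not all at the top. A minimal sketch with made-up metadata lines, using an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# Two metadata lines above the real header, mimicking the ACS file's layout
raw = "source: hypothetical survey\ndownloaded: 2018\ncol_a,col_b\n1,2\n3,4\n"

# skiprows accepts either a count or a list of row indices to skip
df = pd.read_csv(io.StringIO(raw), skiprows=[0, 1])
```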

Header

Related to skiprows is the header parameter. It specifies the row that contains the column names, and the load starts from that point:

df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', header=1)
df.head()
---------------------------------------------------------------------------

UnicodeDecodeError                        Traceback (most recent call last)

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14858)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)()


pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)()


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte


During handling of the above exception, another exception occurred:


UnicodeDecodeError                        Traceback (most recent call last)

<ipython-input-6-a2d61668e2c4> in <module>()
----> 1 df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', header=1)
      2 df.head()


~\Anaconda3wenv\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654 
--> 655         return _read(filepath_or_buffer, kwds)
    656 
    657     parser_f.__name__ = name


~\Anaconda3wenv\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    409 
    410     try:
--> 411         data = parser.read(nrows)
    412     finally:
    413         parser.close()


~\Anaconda3wenv\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1003                 raise ValueError('skipfooter not supported for iteration')
   1004 
-> 1005         ret = self._engine.read(nrows)
   1006 
   1007         if self.options.get('as_recarray'):


~\Anaconda3wenv\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1746     def read(self, nrows=None):
   1747         try:
-> 1748             data = self._reader.read(nrows)
   1749         except StopIteration:
   1750             if self._first_chunk:


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:12175)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data (pandas\_libs\parsers.c:14136)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14972)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)()


pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)()


pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)()


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte

Encoding

Encoding errors like the one above are always frustrating. They occur when the bytes in the file are stored in a different character encoding than the one the parser expects (utf-8 by default). The most common encoding other than utf-8 that you are likely to come across is latin-1:

df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', header=1, encoding='latin-1')
df.head()
Id Id2 Geography Estimate; Total: Margin of Error; Total: Estimate; Total: - Management, business, science, and arts occupations: Margin of Error; Total: - Management, business, science, and arts occupations: Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: Margin of Error; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: - Management occupations ... Estimate; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations Margin of Error; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations Estimate; Total: - Production, transportation, and material moving occupations: Margin of Error; Total: - Production, transportation, and material moving occupations: Estimate; Total: - Production, transportation, and material moving occupations: - Production occupations Margin of Error; Total: - Production, transportation, and material moving occupations: - Production occupations Estimate; Total: - Production, transportation, and material moving occupations: - Transportation occupations Margin of Error; Total: - Production, transportation, and material moving occupations: - Transportation occupations Estimate; Total: - Production, transportation, and material moving occupations: - Material moving occupations Margin of Error; Total: - Production, transportation, and material moving occupations: - Material moving occupations
0 0500000US01001 1001 Autauga County, Alabama 33267 2306 48819 1806 55557 4972 63333 ... 31402 5135 35594 3034 36059 3893 47266 13608 19076 4808
1 0500000US01003 1003 Baldwin County, Alabama 31540 683 49524 1811 57150 6980 63422 ... 35603 3882 30549 1606 29604 4554 35504 6260 24182 3580
2 0500000US01005 1005 Barbour County, Alabama 26575 1653 41652 2638 51797 5980 52775 ... 37847 11189 26094 4884 25339 4900 37282 6017 16607 3497
3 0500000US01007 1007 Bibb County, Alabama 30088 2224 40787 2896 50069 12841 67917 ... 45952 5622 28983 3401 31881 2317 26580 2901 23479 4942
4 0500000US01009 1009 Blount County, Alabama 34900 2063 46593 2963 47003 6189 50991 ... 42489 7176 32969 3767 31814 4551 41375 5280 26755 2963

5 rows × 75 columns

Selecting Specific Columns

You can also use the usecols parameter to load only a subset of the columns, either by position or by name.

df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', usecols=[0,1,2,5,6], encoding='latin-1')
df.head(2)
GEO.id GEO.id2 GEO.display-label HD01_VD02 HD02_VD02
0 Id Id2 Geography Estimate; Total: - Management, business, scien... Margin of Error; Total: - Management, business...
1 0500000US01001 01001 Autauga County, Alabama 48819 1806

Or by column name:

df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', usecols=['GEO.id', 'GEO.id2'], encoding='latin-1')
df.head(2)
GEO.id GEO.id2
0 Id Id2
1 0500000US01001 01001
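usecols also accepts a callable that is applied to each column name, which is convenient for wide files like this one where the columns you want share a prefix. A sketch with a made-up two-row sample modeled on the ACS column names:

```python
import io
import pandas as pd

# Column names modeled on the ACS file's GEO.* identifiers (hypothetical row)
raw = "GEO.id,GEO.id2,HD01_VD01,HD02_VD01\n0500000US01001,01001,33267,2306\n"

# usecols can be a callable: keep only columns whose name starts with "GEO"
df = pd.read_csv(io.StringIO(raw), usecols=lambda name: name.startswith("GEO"))
```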

Selecting Specific Sheets

You can also specify which sheet to load when working with Excel files!

By default, pd.read_excel() loads the first sheet. To load a different sheet, pass its index number to the sheet_name parameter:

df1 = pd.read_excel('Data/Yelp_Selected_Businesses.xlsx')
df1.head()
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 business_id cool date funny review_id stars text useful user_id
1 RESDUcs7fIiihp38-d6_6g 0 2015-09-16 0 gkcPdbblTvZDMSwx8nVEKw 5 Got here early on football Sunday 7:30am as I ... 0 SKteB5rgDlkkUa1Zxe1N0Q
2 RESDUcs7fIiihp38-d6_6g 0 2017-09-09 0 mQfl6ci46mu0xaZrkRUhlA 5 This buffet is amazing. Yes, it is expensive,... 0 f638AHA_GoHbyDB7VFMz7A
3 RESDUcs7fIiihp38-d6_6g 0 2013-01-14 0 EJ7DJ8bm7-2PLFB9WKx4LQ 3 I was really looking forward to this but it wa... 0 -wVPuTiIEG85LwTK46Prpw
4 RESDUcs7fIiihp38-d6_6g 0 2017-02-08 0 lMarDJDg4-e_0YoJOKJoWA 2 This place....lol our server was nice. But fo... 0 A21zMqdN76ueLZFpmbue0Q
df2 = pd.read_excel('Data/Yelp_Selected_Businesses.xlsx', sheet_name=2)
df2.head()
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 business_id cool date funny review_id stars text useful user_id
1 YJ8ljUhLsz6CtT_2ORNFmg 1 2013-04-25 0 xgUz0Ck4_ciNaeIk-H8GBQ 5 I loved this place. Easily the most hipsters p... 1 6cpo8iqgnW3jnozhmY7eAA
2 YJ8ljUhLsz6CtT_2ORNFmg 0 2014-07-07 0 Au7MG4QlAxqq9meyKSQmaw 5 So my boyfriend and I came here for my birthda... 0 8bFE3u1dMoYXkS7ORqlssw
3 YJ8ljUhLsz6CtT_2ORNFmg 0 2015-12-04 0 8IQnZ54nenXjlK-FGZ82Bg 5 I really enjoyed their food. Went there for th... 1 bJmE1ms0MyZ6KHjmfZDWGw
4 YJ8ljUhLsz6CtT_2ORNFmg 2 2016-07-06 1 XY42LMhKoXzwtLoku4mvLA 5 A complete Vegas experience. We arrived right ... 3 PbccpC-I-8rxzF2bCDh8YA

Or you can pass the name of the sheet itself:

df = pd.read_excel('Data/Yelp_Selected_Businesses.xlsx', sheet_name='Biz_id_RESDU')
df.head()
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 business_id cool date funny review_id stars text useful user_id
1 RESDUcs7fIiihp38-d6_6g 0 2015-09-16 0 gkcPdbblTvZDMSwx8nVEKw 5 Got here early on football Sunday 7:30am as I ... 0 SKteB5rgDlkkUa1Zxe1N0Q
2 RESDUcs7fIiihp38-d6_6g 0 2017-09-09 0 mQfl6ci46mu0xaZrkRUhlA 5 This buffet is amazing. Yes, it is expensive,... 0 f638AHA_GoHbyDB7VFMz7A
3 RESDUcs7fIiihp38-d6_6g 0 2013-01-14 0 EJ7DJ8bm7-2PLFB9WKx4LQ 3 I was really looking forward to this but it wa... 0 -wVPuTiIEG85LwTK46Prpw
4 RESDUcs7fIiihp38-d6_6g 0 2017-02-08 0 lMarDJDg4-e_0YoJOKJoWA 2 This place....lol our server was nice. But fo... 0 A21zMqdN76ueLZFpmbue0Q

Loading a Full Workbook and Previewing Sheetnames

You can also load an entire Excel workbook (a collection of spreadsheets) with pd.ExcelFile(). The resulting object lists its sheet names and can parse individual sheets:

workbook = pd.ExcelFile('Data/Yelp_Selected_Businesses.xlsx')
workbook.sheet_names
['Biz_id_RESDU',
 'Biz_id_4JNXU',
 'Biz_id_YJ8lj',
 'Biz_id_ujHia',
 'Biz_id_na4Th']
df = workbook.parse(sheet_name=1)
df.head()
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 business_id cool date funny review_id stars text useful user_id
1 4JNXUYY8wbaaDmk3BPzlWw 0 2012-06-10 0 wl8BO_I-is-JaMwMW5c_gQ 4 I booked a table here for brunch and it did no... 0 fo4mpUqgXL2mJqALc9AvbA
2 4JNXUYY8wbaaDmk3BPzlWw 0 2012-01-20 0 cf9RrqHY9eQ9M53OPyXLtg 4 Came here for lunch after a long night of part... 0 TVvTtXwPXsvrg2KJGoOUTg
3 4JNXUYY8wbaaDmk3BPzlWw 0 2017-05-10 0 BvmhSQ6WFm2Jxu01G8OpdQ 5 Loved the fried goat cheese in tomato sauce al... 0 etbAVunw-4kwr6VTRweZpA
4 4JNXUYY8wbaaDmk3BPzlWw 0 2014-05-03 0 IoKp9n1489XohTV_-EJ0IQ 5 Love the outdoor atmosphere. Price was right, ... 0 vKXux2Xx3xcicTgYZoR0pg
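A related trick: passing sheet_name=None to pd.read_excel() loads every sheet at once, returning a dictionary mapping sheet names to DataFrames. The sketch below builds a small two-sheet workbook in memory (with made-up data, standing in for Yelp_Selected_Businesses.xlsx) so the example is self-contained; it assumes an Excel engine such as openpyxl is installed:

```python
import io
import pandas as pd

# Build a small two-sheet workbook in an in-memory buffer (hypothetical data)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"stars": [5, 4]}).to_excel(writer, sheet_name="Biz_A", index=False)
    pd.DataFrame({"stars": [3]}).to_excel(writer, sheet_name="Biz_B", index=False)
buffer.seek(0)

# sheet_name=None loads every sheet, as a {sheet_name: DataFrame} dictionary
sheets = pd.read_excel(buffer, sheet_name=None)
```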

Saving Data

Once we have loaded and manipulated data, we can export it back out using the .to_csv() or .to_excel() methods of any DataFrame object.

df.to_csv('NewSavedView.csv', index=False) # Pass index=False if you do not want the index included in the output
df.to_excel('NewSavedView.xlsx')
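The index=False argument matters on round-trips. A quick sketch of what happens with and without it, using an in-memory buffer and a tiny made-up frame:

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Without index=False, the row index is written as an extra, unnamed column...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# ...while index=False keeps the reloaded columns identical to the original
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
```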

Summary

We've spent some time looking at how data importing with pandas works, along with some of the key arguments you can use to manage imports and access the data. In the next lesson, let's get some hands-on practice!
