For today's warmup, we will practice our matplotlib skills by visualizing bird sighting data gathered from the Cornell Lab of Ornithology's eBird API.
import pickle
import matplotlib.pyplot as plt
with open('data/central_park_birds', 'rb') as read_file:
    central_park_birds = pickle.load(read_file)
The dictionary loaded above contains recent counts of bird species sighted in Central Park.
Recent visitors to the park include this little dude:
Your first task is to make a bar chart visualizing the counts of the 10 most commonly sighted birds in the park. You should aspire to use the fig, ax = plt.subplots() syntax, but if you're not there yet, use the stateful plt interface.
The first challenge is to sort the dictionary. Google how to do so.
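If the search turns up too many options, here is a minimal sketch of one common approach on toy data. The dictionary below is made up for illustration (it is not the real Central Park data), and the result is a list of (name, count) tuples, which may or may not match the structure of the pickled sorted_birds object.

```python
# Toy data: hypothetical species counts, not the real Central Park numbers.
toy_counts = {"Blue Jay": 12, "Mallard": 40, "Northern Cardinal": 25}

# sorted() over .items() yields (name, count) tuples. The key function picks
# the count out of each tuple, and reverse=True puts the largest counts first.
toy_sorted = sorted(toy_counts.items(), key=lambda item: item[1], reverse=True)

print(toy_sorted)
# [('Mallard', 40), ('Northern Cardinal', 25), ('Blue Jay', 12)]
```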
# Your code here
sorted_birds = None
# If you are spending too much time on that, run this cell to load the sorted object.
with open('data/sorted_birds', 'rb') as read_file:
    sorted_birds = pickle.load(read_file)
Next, select the first 10 birds in the sorted_birds object. Then create two separate lists: one list containing the bird names, and one list containing the counts. These lists should preserve the relationship between name and count by aligned indices.
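As a rough sketch of the slicing and splitting step, the example below assumes the sorted object is a list of (name, count) tuples. That structure is an assumption, and the toy names and values are invented for illustration.

```python
# Toy stand-in for a sorted list of (name, count) tuples (assumed structure).
toy_sorted = [
    ("Mallard", 40),
    ("Northern Cardinal", 25),
    ("Blue Jay", 12),
    ("Rock Pigeon", 9),
]

# Slice off the first three entries (the exercise asks for ten).
toy_top = toy_sorted[:3]

# Build two lists from the same slice so index i refers to the same bird in both.
toy_names = [name for name, _ in toy_top]
toy_values = [count for _, count in toy_top]

print(toy_names)   # ['Mallard', 'Northern Cardinal', 'Blue Jay']
print(toy_values)  # [40, 25, 12]
```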
top_10_bird_names = None
top_10_bird_counts = None
Now, create a bar chart with these two lists. You may run into some tick-label problems; I'll let you work out your own solution. Don't forget plt.subplots().
Also, don't forget a title and axis labels.
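For reference, here is a minimal sketch of the object-oriented pattern on toy data. The names, values, and label text are placeholders, and rotating the tick labels is only one possible way to handle overlapping ticks.

```python
import matplotlib.pyplot as plt

# Toy data for illustration only.
toy_names = ["Mallard", "Northern Cardinal", "Blue Jay"]
toy_values = [40, 25, 12]

# plt.subplots() returns the figure and an axes object to draw on.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(toy_names, toy_values)

# Title and axis labels are set through the axes object.
ax.set_title("Toy bird sightings")
ax.set_xlabel("Species")
ax.set_ylabel("Count")

# One option for crowded x tick labels: rotate them.
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```

With ten long species names, a horizontal bar chart (ax.barh) is another option worth trying.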
# Your code here
Now we will broaden our territory, but focus on a single species: the American Crow.
with open('data/crow_counts', 'rb') as read_file:
    cp_crows, loop_crows, seattle_crows = pickle.load(read_file)
The above cell loaded 3 lists of crow sighting counts near the central latitude/longitude of Central Park, Chicago's Loop, and Seattle.
One could say each element represents a murder.
The first part of the task is to make a histogram visualizing the distribution of number of crows per sighting.
So, firstly, you must create a new list that merges all 3 lists.
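A quick sketch of list concatenation on toy lists, in case it helps. The three lists here are placeholders, not the real crow counts.

```python
# Three toy lists standing in for the per-city counts.
toy_a = [1, 2]
toy_b = [3]
toy_c = [4, 5, 6]

# The + operator builds a new list and leaves the originals unchanged.
toy_merged = toy_a + toy_b + toy_c

print(toy_merged)  # [1, 2, 3, 4, 5, 6]
```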
# Your code here
all_crows = None
Now, create a histogram that shows the distribution of crow counts.
# Your code here
Your histogram should have looked pretty boring: 1 large spike near zero, with an x-axis range from 0 to around 10000.
Pray for the people of Seattle if that major outlier is not an input error.
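The spike-near-zero shape can be reproduced on toy data, which may help explain what is happening. In this sketch (invented values, arbitrary choice of 20 bins), a single extreme value stretches the x-axis so far that every other point lands in the first bin.

```python
import matplotlib.pyplot as plt

# Toy sightings: mostly small counts plus one extreme value.
toy_sightings = [1, 1, 2, 2, 2, 3, 4, 6, 500]

fig, ax = plt.subplots()
# ax.hist returns the per-bin counts, the bin edges, and the drawn patches.
bin_counts, bin_edges, _ = ax.hist(toy_sightings, bins=20)
ax.set_title("Toy crow counts per sighting")
ax.set_xlabel("Crows per sighting")
ax.set_ylabel("Number of sightings")

# Eight of the nine points fall in the first bin; the outlier sits alone in the last.
print(bin_counts[0], bin_counts[-1])
```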
Next, let's plot a boxplot to see what it tells us.
# Your code here
The boxplot should also look odd. Those little circles are the outliers. Recreate the boxplot, but this time remove the outliers via the showfliers argument.
With the fliers removed, we can see the components of the boxplot: the median line at 3 crows, the bottom and top edges of the box at the first and third quartiles (the height of the box is the interquartile range, or IQR), and the whiskers, which extend to the most extreme data points lying within 1.5 times the IQR of the box edges.
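To make those components concrete, here is a sketch that computes them by hand with NumPy on toy data. The values are invented, and the 1.5 multiplier is matplotlib's default whisker setting.

```python
import numpy as np

# Toy sightings, invented for illustration.
toy_sightings = np.array([1, 2, 2, 3, 3, 4, 5, 6, 40])

# Box edges are the first and third quartiles; the line inside is the median.
q1, median, q3 = np.percentile(toy_sightings, [25, 50, 75])
iqr = q3 - q1

# Points beyond these limits are drawn as fliers (outliers).
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Each whisker stops at the most extreme data point still inside its limit.
lower_whisker = toy_sightings[toy_sightings >= lower_limit].min()
upper_whisker = toy_sightings[toy_sightings <= upper_limit].max()

print(q1, median, q3)                # 2.0 3.0 5.0
print(lower_whisker, upper_whisker)  # 1 6
```

Here the value 40 lies above the upper limit of 9.5, so it would be plotted as a flier rather than extending the whisker.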
If you assign the return value of the boxplot call to a variable (perhaps bp), you can then access information about the plot: it is a dictionary of the lines that make up the boxplot. We can get the value at the end of the upper whisker, i.e. the value beyond which matplotlib designates points as outliers, via the following code:
bp['whiskers'][1].get_ydata()
array([ 5., 11.])
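The same lookup on toy data, as a sketch, shows how to read that output. It assumes a default vertical boxplot of a single dataset, where bp['whiskers'][0] is the lower whisker and bp['whiskers'][1] is the upper one. The toy values are invented.

```python
import matplotlib.pyplot as plt

# Toy sightings, invented for illustration.
toy_sightings = [1, 2, 2, 3, 3, 4, 5, 6, 40]

fig, ax = plt.subplots()
# ax.boxplot returns a dictionary of the line objects it drew.
bp = ax.boxplot(toy_sightings, showfliers=False)

# The upper whisker runs from the top of the box (third quartile) to the whisker end.
upper_whisker_y = bp["whiskers"][1].get_ydata()
toy_whisker_end = upper_whisker_y[1]

print(upper_whisker_y)  # first value: top of the box, second value: whisker end
```

Read the same way, the output above says the box tops out at 5 crows and the upper whisker ends at 11.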
Given the above output, reduce the original all_crows object to contain only non-outlier elements.
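One way to do this kind of filtering, sketched on toy data with a made-up threshold, is a list comprehension. The sketch only trims the upper end, on the assumption that there are no outliers below the lower whisker.

```python
# Toy sightings and a hypothetical whisker end, both invented for illustration.
toy_sightings = [1, 2, 2, 3, 3, 4, 5, 6, 40]
toy_whisker_end = 6

# Keep only the counts at or below the end of the upper whisker.
toy_no_outliers = [count for count in toy_sightings if count <= toy_whisker_end]

print(toy_no_outliers)  # [1, 2, 2, 3, 3, 4, 5, 6]
```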
# your code here
Lastly, create a histogram without outliers.
# your code here