This short lesson summarizes the topics we covered in this section and why they'll be important to you as a data scientist.
This section was all about building further on your statistics foundations by introducing the Central Limit Theorem and confidence intervals. Some of the key takeaways include:
- The Central Limit Theorem states that the sum (or mean) of many independent random variables tends toward a normal distribution as the number of variables increases, regardless of the underlying distribution
- Using the Central Limit Theorem, we can work with non-normally distributed data sets as if they were normally distributed
- The Standard Error is a measure of spread - it is the standard deviation of the sampling distribution of the sample mean, estimated as the sample standard deviation divided by the square root of the sample size
- If you take repeated samples and compute the 95% confidence interval for a given parameter for each sample, 95% of the intervals would contain the population parameter.
- The $z$-critical value is the number of standard deviations you'd have to go from the mean of the normal distribution to capture the proportion of the data associated with the desired confidence level.
- If you don't know the standard deviation for a population, you need to use t-distributions to compute the margin of error for calculating a confidence interval.
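The Central Limit Theorem and standard error takeaways above can be sketched in a short simulation. This is a minimal illustration (the population and sample sizes are arbitrary choices): we repeatedly sample from a skewed exponential population and check that the sample means cluster around the population mean with a spread close to the theoretical standard error $\sigma / \sqrt{n}$.

```python
import random
import statistics

random.seed(0)

n = 50              # observations per sample (arbitrary choice)
num_samples = 2000  # number of repeated samples

# Exponential population with rate 1: population mean = 1, std dev = 1
# (clearly non-normal, which is the point of the demonstration)
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

observed_se = statistics.stdev(sample_means)
theoretical_se = 1.0 / n ** 0.5  # sigma / sqrt(n)

print(round(statistics.mean(sample_means), 2))          # close to 1.0
print(round(observed_se, 3), round(theoretical_se, 3))  # close to each other
```

Even though each individual observation is drawn from a heavily skewed distribution, a histogram of `sample_means` would look approximately normal, which is what lets us treat the sample mean as normally distributed when building confidence intervals.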
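The last takeaway - using a t-distribution when the population standard deviation is unknown - can be sketched as follows. This example assumes SciPy is available for the t-distribution's quantile function; the data values are made up for illustration.

```python
from scipy import stats  # assumed available; provides the t-distribution

# Hypothetical sample of measurements (made-up values)
data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1, 4.9, 5.0]
n = len(data)

mean = sum(data) / n
# Sample standard deviation (n - 1 in the denominator) and standard error
sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
se = sd / n ** 0.5

# t-critical value for 95% confidence with n - 1 degrees of freedom,
# used because the population standard deviation is unknown
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * se

ci = (mean - margin, mean + margin)
print(ci)  # 95% confidence interval for the population mean
```

With a larger sample, the t-critical value approaches the familiar $z$-critical value of about 1.96 for 95% confidence; with small samples it is noticeably larger, widening the interval to reflect the extra uncertainty from estimating the standard deviation.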