- Understand a new dataset.
- Process it by applying exploratory data analysis (EDA).
- Model the data using linear regression (plain and Lasso-regularized).
- Analyze the results and optimize the model.
You will not be forking this time. Please take some time to read these instructions:
- Create a new repository based on the machine learning project by clicking here.
- Open the newly created repository in Codespace using the Codespace button extension.
- Once the Codespace VSCode has finished opening, start your project by following the instructions below.
Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
Sociodemographic and health resource data have been collected by county in the United States and we want to find out if there is any relationship between health resources and sociodemographic data.
To do this, you need to choose a health-related target variable to conduct the analysis.
The dataset can be found in this project folder under the name demographic_health_data.csv. You can load it into the code directly from the link (https://raw.githubusercontent.com/4GeeksAcademy/regularized-linear-regression-project-tutorial/main/demographic_health_data.csv) or download it and add it by hand to your repository. This dataset contains a large number of variables, which are defined here.
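As a minimal sketch of the loading step, assuming pandas is available in the Codespace: the helper names `load_dataset` and `summarize` are hypothetical, and the two-row demo table with columns `a` and `b` is made up purely to show the calls without touching the network.

```python
import io

import pandas as pd

DATA_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "regularized-linear-regression-project-tutorial/main/"
    "demographic_health_data.csv"
)


def load_dataset(source=DATA_URL):
    """Read the CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


def summarize(df):
    """Return a small overview of a DataFrame (hypothetical helper)."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "n_numeric": df.select_dtypes(include="number").shape[1],
        "cols_with_missing": df.columns[df.isna().any()].tolist(),
    }


# Made-up two-row table, only to demonstrate the helpers offline.
demo = load_dataset(io.StringIO("a,b\n1,x\n,y\n"))
print(summarize(demo))
```

Calling `load_dataset()` with no arguments would fetch the real file from the link above; passing a local path works the same way if you downloaded the CSV into the repository.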
This second step is vital to ensure that we keep the variables that are strictly necessary and eliminate those that are not relevant or do not provide information. Use the example Notebook we worked on and adapt it to this use case.
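One possible way to prune variables during EDA is sketched below. It is not the method from the example notebook: the function `drop_uninformative`, the 0.95 correlation threshold, and the toy columns are all assumptions chosen for illustration. The idea is to drop numeric columns that never vary and one column out of each highly correlated pair.

```python
import numpy as np
import pandas as pd


def drop_uninformative(df, corr_threshold=0.95):
    """Drop constant columns and one column of each highly correlated pair.

    Hypothetical helper; the threshold is an arbitrary illustrative choice.
    Apply it to the feature columns only, not to the target.
    """
    numeric = df.select_dtypes(include="number")

    # Columns with a single distinct value carry no information.
    constant = [c for c in numeric.columns if numeric[c].nunique(dropna=True) <= 1]
    kept = numeric.drop(columns=constant)

    # Keep only the upper triangle so each pair is inspected once.
    corr = kept.corr().abs()
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    redundant = [c for c in upper.columns if (upper[c] > corr_threshold).any()]

    return kept.drop(columns=redundant), constant, redundant


# Toy data (made up): x2 duplicates x, const never changes.
toy = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "x2": [2, 4, 6, 8, 10],
        "const": [1, 1, 1, 1, 1],
        "noise": [5, 1, 4, 2, 3],
    }
)
reduced, constant_cols, redundant_cols = drop_uninformative(toy)
print(list(reduced.columns), constant_cols, redundant_cols)
```

With many county-level variables expressed both as counts and as percentages, a filter like this can shrink the feature set quickly, but which column of a correlated pair to keep is still a judgment call.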
Be sure to split the dataset into train and test sets, as we have seen in previous lessons.
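The split can be sketched as follows, assuming scikit-learn is installed. The synthetic `X` and `y` stand in for the real features and target, and the 80/20 ratio and the scaling step are assumptions, not requirements of the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix and target.
rng = np.random.default_rng(42)
X = rng.normal(loc=10.0, scale=3.0, size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, -2.0, 0.0]) + rng.normal(size=200)

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the test rows,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Scaling matters for the later Lasso step because the L1 penalty treats all coefficients alike, so features on very different scales would be penalized unevenly.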
Start solving the problem by implementing a linear regression model and analyzing the results. Then, using the same data and default attributes, build a Lasso model and compare its results with the baseline linear regression.
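A minimal sketch of that comparison, assuming scikit-learn and using synthetic data in place of the real dataset: only three of the ten made-up features actually influence the target, which makes the effect of the L1 penalty easy to see.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: 10 features, only the first 3 matter (an assumption
# made for illustration, not a property of the real dataset).
rng = np.random.default_rng(0)
n_samples, n_features = 300, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: ordinary least squares.
baseline = LinearRegression().fit(X_train, y_train)
# Lasso with default attributes (alpha=1.0 in scikit-learn).
lasso = Lasso().fit(X_train, y_train)

baseline_r2 = r2_score(y_test, baseline.predict(X_test))
lasso_r2 = r2_score(y_test, lasso.predict(X_test))
n_zeroed = int(np.sum(lasso.coef_ == 0))

print(f"Linear regression R2: {baseline_r2:.3f}")
print(f"Lasso (default) R2:   {lasso_r2:.3f}")
print(f"Coefficients set to zero by Lasso: {n_zeroed} of {n_features}")
```

Besides the test score, it is worth comparing the coefficient vectors: Lasso tends to set some coefficients exactly to zero, which acts as a form of feature selection.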
Analyze how
After training the Lasso model, if the results are not satisfactory, optimize it using one of the techniques seen above.
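The text does not say which optimization technique to use, so the sketch below assumes a cross-validated grid search over the Lasso `alpha` hyperparameter with scikit-learn's `GridSearchCV`. The candidate values and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: 10 features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate regularization strengths (arbitrary illustrative grid).
alphas = [0.001, 0.01, 0.1, 1.0, 10.0]

# 5-fold cross-validation on the training set, scored by R2.
search = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid={"alpha": alphas},
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["alpha"]
# score() evaluates the refitted best estimator with the same R2 scorer.
test_r2 = search.score(X_test, y_test)

print(f"Best alpha: {best_alpha}")
print(f"Test R2 with best alpha: {test_r2:.3f}")
```

The test set is used only once, at the end, so the reported score is not biased by the hyperparameter search.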
NOTE: Solution: https://github.com/4GeeksAcademy/regularized-linear-regression-project-tutorial/blob/main/solution.ipynb