OpenFoodFacts data analysis

The repository you are looking at is a data science project. The main content is a Python notebook that handles data loading, transformation, analysis and visualisation.

Installation

The following libraries are required in addition to the standard distribution of Python 3.*:

Project Motivation

For this project, I was interested in analysing some features of the OpenFoodFacts data available from https://world.openfoodfacts.org/data and in understanding the geographical and nutritional information it contains.

I was also interested in combining this information with additional data sources, such as the World Bank life expectancy data. I wanted to find out whether the product selection per country and its nutritional value (measured through the nutri-score) were in any way correlated with the health of the population (measured through life expectancy).

File Descriptions

The main file in the repository is openfoodfacts.ipynb, the Python notebook that reproduces the calculations and generates the visualisations. The first cells of the notebook download additional data that is not included in this repository; these files are listed in the .gitignore file so that git does not track them.
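
The download cells themselves are not reproduced here. As a hedged sketch of the usual "download once, reuse the local copy" pattern, the snippet below uses a hypothetical helper `download_if_missing`; the URL and destination path are left to the caller and are assumptions, not the notebook's actual values. The destination would normally sit in a location listed in .gitignore.

```python
import os
import shutil
import urllib.request


def download_if_missing(url: str, dest_path: str) -> str:
    """Hypothetical helper: download `url` to `dest_path` unless it already exists.

    Returns the local path so it can be passed straight to a reader.
    """
    if os.path.exists(dest_path):
        # Already downloaded on a previous run: skip the (large) download.
        return dest_path
    # Create the parent directory if needed (e.g. a local, git-ignored data folder).
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Write to a temporary file first so an interrupted download does not
    # leave a truncated file that would later be mistaken for a complete one.
    tmp_path = dest_path + ".part"
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    # Atomically move the finished download into place.
    os.replace(tmp_path, dest_path)
    return dest_path
```

Re-running the notebook then costs nothing once the files are present, which matters for a multi-gigabyte export.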

Results

The main findings are discussed in the post available here.

The notebook shows that the majority of products in the OpenFoodFacts database are sold in France. Products available in other countries are present in the database, but they are much less well represented. Data quality is not exceptional, probably mainly because crowdsourcing is the primary mechanism for creating new data.
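
A per-country breakdown like this can be approximated with the standard library alone. The sketch below assumes the tab-separated CSV export and a `countries_en` column holding comma-separated country names; both the column name and the hypothetical `country_shares` helper are assumptions, not code taken from the notebook.

```python
import csv
from collections import Counter


def country_shares(lines, column="countries_en"):
    """Return (country, share of products) pairs, most frequent first.

    `column` is an assumed field name. A product sold in several countries
    counts once for each of them, so shares are relative to the number of
    products and can sum to more than 1.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter()
    n_products = 0
    for row in reader:
        n_products += 1
        value = row.get(column) or ""
        # Several countries can be stored comma-separated in one field;
        # a set avoids counting the same country twice for one product.
        countries = {c.strip() for c in value.split(",") if c.strip()}
        counts.update(countries)
    if n_products == 0:
        return []
    return [(country, n / n_products) for country, n in counts.most_common()]
```

Usage would look like `with open(path, newline="", encoding="utf-8") as f: shares = country_shares(f)`. Reading row by row keeps memory usage flat even for the full export; very long fields may require raising `csv.field_size_limit`.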

This project also shows that it is possible to predict the nutri-score with a random forest classifier, with almost 90% accuracy, using only nutrient information. Finally, the available data does not show a correlation between the nutri-score of products sold in a country and life expectancy in that country. This does not mean the two statistics are uncorrelated, only that the quality and quantity of available data were not sufficient to demonstrate any link.
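
The country-level comparison can be sketched as follows. This is not the notebook's method, only an illustration under stated assumptions: nutri-score letters are mapped to 1 (a) through 5 (e), life expectancy is given as a dict keyed by the same country names used in the product data (in practice, names must be harmonised between the two sources), and the Pearson coefficient is used. A rank-based coefficient such as Spearman's would be an equally reasonable choice for ordinal grades. All function names are hypothetical.

```python
import math
from statistics import mean

# Assumed numeric encoding of nutri-score letters (a = best, e = worst).
GRADE_TO_SCORE = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}


def mean_grade_by_country(products):
    """products: iterable of (country, grade letter) pairs."""
    scores = {}
    for country, grade in products:
        score = GRADE_TO_SCORE.get(grade.strip().lower())
        if score is not None:  # skip missing or unknown grades
            scores.setdefault(country, []).append(score)
    return {country: mean(vals) for country, vals in scores.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def grade_vs_life_expectancy(products, life_expectancy):
    """Correlate mean nutri-score with life expectancy over shared countries.

    Returns (coefficient, number of countries compared).
    """
    grades = mean_grade_by_country(products)
    # Only countries present in both sources can be compared.
    shared = sorted(set(grades) & set(life_expectancy))
    if len(shared) < 3:
        raise ValueError("too few countries in common to correlate")
    xs = [grades[c] for c in shared]
    ys = [life_expectancy[c] for c in shared]
    return pearson(xs, ys), len(shared)
```

Returning the number of countries alongside the coefficient is deliberate: with products concentrated in France, few countries have enough data for a meaningful average, and a coefficient computed over a handful of points says little either way.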

Licensing, Authors, Acknowledgements

Credit goes to OpenFoodFacts for making the data available and for providing the platform through which users contribute. The licensing terms for the data and other descriptive information can be found at the link available here.

Credit also goes to the World Bank for making world development data available. Licensing for the World Bank datasets is available here.

Feel free to use the code from the notebook for your own purposes.
