boavizta / environmental-footprint-data

💾 Boavizta.org Data repository

Python 100.00%

ghg-emissions footprint environment carbon-emissions carbon-footprint sustainability digital-sustainability

environmental-footprint-data's Introduction

Boavizta Project - Environmental Footprint Data

This data repository is maintained by Boavizta and is complementary to Boavizta's environmental footprint evaluation methodology. It aims to reference as much data as possible to help organizations evaluate the environmental footprint of their information systems, applications and digital services.

The Boavizta database is almost exclusively derived from PCF (Product Carbon Footprint) sheets provided by the manufacturers. The methodologies used by manufacturers are not transparent and have very large margins of error. The purpose of making these data available is mainly to give orders of magnitude and to compare different models from the same manufacturer.

Therefore WE RECOMMEND NOT USING THESE DATA TO MAKE ACCURATE IMPACT EVALUATIONS or to compare the impacts of devices from different manufacturers.

In addition, most manufacturers rely on the PAIA evaluation method developed by MIT. This method is based on data from non-public studies and Boavizta was therefore unable to evaluate its relevance.

To browse data, you can use https://dataviz.boavizta.org.

License

This dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/

This data can be freely used for any purpose, including without using Boavizta's methodology.

Data sets

At this time, we provide two CSV files grouping together publicly available data collected from manufacturers (mainly Product Carbon Footprint reports):

boavizta-data-fr.csv: French version (; used as a delimiter, comma as a decimal separator)
boavizta-data-us.csv: English version (, used as a delimiter, dot as a decimal separator)
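
The two layouts differ only in their field delimiter and decimal separator, so both can be read with the same code. The following is a minimal sketch using Python's standard csv module; the helper name read_boavizta_csv and the sample rows are made up for illustration and are not part of the repository.

```python
import csv
import io
from typing import Dict, List


def read_boavizta_csv(text: str, french: bool = False) -> List[Dict[str, str]]:
    """Parse CSV text in either the French or the US layout (hypothetical helper)."""
    delimiter = ";" if french else ","
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if french:
        # Only the numeric column used in this sketch is converted;
        # a real loader would handle every numeric column.
        for row in rows:
            row["gwp_total"] = row["gwp_total"].replace(",", ".")
    return rows


# Made-up sample rows, not real entries from the dataset.
FR_SAMPLE = "manufacturer;name;gwp_total\nExampleCorp;Laptop X;312,5\n"
US_SAMPLE = "manufacturer,name,gwp_total\nExampleCorp,Laptop X,312.5\n"

fr_rows = read_boavizta_csv(FR_SAMPLE, french=True)
us_rows = read_boavizta_csv(US_SAMPLE)
print(fr_rows == us_rows)
```

With real files, the same function could be fed the contents of boavizta-data-fr.csv or boavizta-data-us.csv after reading them from disk.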

We encourage all manufacturers to provide us with similar data or to correct potential errors in these files.

The Boavizta working group actively works to enrich these files with new data:

from manufacturers
resulting from its analyses and intended to provide ratios or average values that would simplify the evaluation

Please refer to sources.md for a complete list of sources.

Contribute

People are encouraged to contribute to these files.

You can easily contribute by:

forking this repo and submitting PRs
sending us an email to [email protected]
submitting data through the dedicated form on Boavizta's website

If any manufacturers wish to share data with us, we will be happy to discuss with them how we can efficiently synchronize this data.

Running the code

Download ChromeDriver for your version of Chrome: https://chromedriver.chromium.org/downloads and move it to a folder that is on your PATH. On macOS you can also run

brew install chromedriver

and restart Chrome.

Then create a Python 3.9 virtual environment, run

pip install -r tools/requirements.txt

to install the required packages, and follow the instructions in the spiders README.md to run a spider and parse the PDFs of the associated brand.

When developing a new parser, you can also follow the instructions in the parsers README.md.

Data format

manufacturer: Manufacturer name, e.g. "Dell" or "HP"
name: Product name
category:
- Workplace: product commonly used in a workplace
- Datacenter: product commonly used in a data center (e.g. server, network switch, etc.)
gwp_total: GHG emissions (estimated as CO2 equivalent, the unit is kgCO2eq) through the total lifecycle of the product (Manufacturing, Transportation, Use phase and Recycling)
gwp_use_ratio: part of the GHG emissions coming from the use phase (the assumptions for the use phase are detailed in the other columns, especially lifetime and use_location)
yearly_tec: Yearly estimated energy demand in kWh
lifetime: Expected lifetime (in years)
use_location: The region of the world in which the device usage footprint has been estimated.
- US: United States of America
- EU: Europe
- DE: Germany
- CN: China
- WW: Worldwide
report_date: the date at which the Product Carbon Footprint report of the device was published
sources: the original URLs from which the data for this row was sourced
gwp_error_ratio: the datasheets commonly come with a diagram that shows the error margin for the footprint
gwp_manufacturing_ratio: part of the GHG emissions coming from the manufacturing phase
weight: product weight in kg
assembly_location: The region of the world in which the device is assembled
- US: United States of America
- EU: Europe
- CN: China
- Asia: Asia
screen_size: in inches
server_type: the type of server
hard_drive: the hard drive of the device if any
memory: RAM in GB
number_cpu: number of CPUs
height: the height of the device in a datacenter rack, in U
added_date: the date at which this row was added
add_method: how the data for this row was collected
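
To illustrate how gwp_total, the ratio columns and lifetime relate to each other, here is a small sketch that derives per-phase emissions. It assumes the ratios are stored as fractions between 0 and 1 (not percentages); the function names and the numbers are hypothetical and not taken from the dataset.

```python
def phase_emissions(gwp_total: float, ratio: float) -> float:
    """Emissions (kgCO2eq) attributed to one lifecycle phase.

    Assumes `ratio` is a fraction between 0 and 1.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio must be between 0 and 1, got {ratio}")
    return gwp_total * ratio


def yearly_use_emissions(gwp_total: float, gwp_use_ratio: float, lifetime: float) -> float:
    """Use-phase emissions per year (kgCO2eq/year) over the expected lifetime."""
    return phase_emissions(gwp_total, gwp_use_ratio) / lifetime


# Made-up device: 400 kgCO2eq in total, 25% from use, 50% from manufacturing, 4-year lifetime.
use = phase_emissions(400.0, 0.25)
manufacturing = phase_emissions(400.0, 0.5)
per_year = yearly_use_emissions(400.0, 0.25, 4.0)
print(use, manufacturing, per_year)
```

Given the large error margins mentioned above, such figures are only meaningful as orders of magnitude.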

About Boavizta.org

Boavizta.org is a working group:

Working to improve and generalize environmental footprint evaluation in organizations
Federating and connecting stakeholders of the "environmental footprint evaluation" ecosystem
Helping members to improve their skills and to carry out their own projects
Leveraging group members initiatives

environmental-footprint-data's Issues

Merge Boavizta and Ecodiag parsers

Ecodiag collects data with parsers similar to Boavizta's:
https://gitlab.inria.fr/guenneba/ecodiag-data/-/tree/main/pyscripts

Boavizta parsers could be improved, at the very least by reusing Ecodiag's reverse_piechart code in all parsers that rely on OCR.

Then Boavizta and Ecodiag parser results could be compared to identify other improvements.

More Apple numbers available

I think you are missing some values for Apple products, there are some in this report:
https://www.apple.com/environment/pdf/products/desktops/Mac_Pro_PER_Dec2019.pdf

3.5GHz (8-core) processor, Radeon Pro 580X, 32GB memory, and 256GB storage
2765 kg CO2e
2.5GHz (28-core) processor, dual Radeon Pro Vega II Duo with Infinity Fabric Link,
1.5TB memory, Afterburner card, and 4TB storage
6994 kg CO2e

Amazing work!

Add IBM PCF data to the database

The following IBM PCF could be added to the database
IBM Power E1080 https://www.ibm.com/downloads/cas/VGL0LLMZ
IBM z16 https://www.ibm.com/downloads/cas/KLMA1MPR
IBM LinuxONE Emperor 4 https://www.ibm.com/downloads/cas/2JBPXBMK

Unify location names

Same locations are sometimes spelled with long names (China) or as two letters (CN). This needs to be unified.
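
One possible way to unify the names is a small lookup table applied to every row. This is only a sketch: the codes come from the Data format section, but the list of long spellings and the function name normalize_location are assumptions, and a real fix would need to cover every spelling that actually occurs in the CSV files.

```python
# Codes listed in the README's "Data format" section.
KNOWN_CODES = ("US", "EU", "DE", "CN", "WW", "Asia")

# Assumed long spellings; extend with whatever actually appears in the data.
LONG_NAMES = {
    "united states of america": "US",
    "united states": "US",
    "europe": "EU",
    "germany": "DE",
    "china": "CN",
    "worldwide": "WW",
}


def normalize_location(value: str) -> str:
    """Map a location spelling to its short code; unknown values are returned stripped."""
    cleaned = value.strip()
    for code in KNOWN_CODES:
        if cleaned.lower() == code.lower():
            return code
    return LONG_NAMES.get(cleaned.lower(), cleaned)


print(normalize_location("China"), normalize_location(" cn "))
```

Unknown values are passed through unchanged so that they can be spotted and added to the table rather than silently rewritten.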

HP vs HPE

Hewlett Packard Inc. (HP) and Hewlett Packard Enterprise (HPE) are two separate legal entities, and have been since 2015.

It would therefore be better if the products were correctly designated as such in the list.

Automate monitoring of all manufacturers

Monitoring of all manufacturers' web pages could be automated with GitHub Actions to:

run spiders to regularly check for new PDFs to analyse and launch parsers if needed
run generate-gh-pr.py to generate Pull Requests for new devices to add to the database

Warning: generate-gh-pr.py may need some improvements, as it has not been tested since November 2021.

Add non regression testing on parsers

Non-regression testing should be implemented to automatically check that parser results are the same before and after a commit.
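
A minimal sketch of such a check, assuming parser results are available as lists of row dictionaries keyed by manufacturer and name. The function diff_results and the sample rows are hypothetical and do not correspond to existing code in the repository.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def diff_results(before: List[Row], after: List[Row],
                 key: Sequence[str] = ("manufacturer", "name")) -> List[Tuple[tuple, str]]:
    """List the rows that were added, removed or changed between two parser runs."""
    index_before = {tuple(row[k] for k in key): row for row in before}
    index_after = {tuple(row[k] for k in key): row for row in after}
    changes = []
    for row_key in sorted(set(index_before) | set(index_after)):
        if row_key not in index_after:
            changes.append((row_key, "removed"))
        elif row_key not in index_before:
            changes.append((row_key, "added"))
        elif index_before[row_key] != index_after[row_key]:
            changes.append((row_key, "changed"))
    return changes


# Made-up rows standing in for parser output before and after a commit.
old = [{"manufacturer": "A", "name": "X", "gwp_total": "300"},
       {"manufacturer": "A", "name": "Y", "gwp_total": "150"}]
new = [{"manufacturer": "A", "name": "X", "gwp_total": "310"},
       {"manufacturer": "A", "name": "Z", "gwp_total": "90"}]
print(diff_results(old, new))
```

A test suite could then fail whenever diff_results returns a non-empty list for a reference set of PDFs.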

Erroneous Dell's subcategory

Dell's parser assumes that 'Precision' models are desktops, whereas Precision laptops also exist.

In ecodiag, I extract the sub-categories from the main html file itself rather than from the PCF file.

wrong manufacturer name in boavizta-data-us.csv

At line 451
the correct name for manufacturer is fujitsu instead of fujistu

Add manufacturing breakdown details

Most PCF files provide breakdown details for the manufacturing part. They are, however, not always fully consistent in how they partition it. Here is the list I ended up with on ecodiag's side:

packaging (PAIA)
chassis (PAIA desktop+laptops, a very very few HP monitors -> their mistake ?)
mainboard (PAIA+HPE)
daughterboard (HPE)
power supply unit (PAIA)
HDD (PAIA)
SSD (PAIA+HPE)
optical drive (PAIA)
display (PAIA laptops & AiO, but also a very few HP monitors -> their mistake ?)
battery (PAIA)
housing (PAIA monitors)
electronics (PAIA monitors + wise-thin-client)
panel (PAIA monitors)
assembly (Dell wise-thin-client, a very few HP laptops, many Lenovo laptops, HPE)
materials (Dell wise-thin-client, a very few HP laptops)
LCD assembly (1 dell + 2 HP laptops)
PWBs (2 HP laptops)
Integrated circuits (1 dell + 3 HP laptops)
chassis+PSU (HPE)
others (various HP)

This long list is conservative, but that's a lot! So maybe some components could be merged together?

For instance, when the PSU is combined with the chassis, maybe we could just put it to "others" since this does not provide much information.

Some other propositions:

Merge housing and chassis (their use is exclusive)
Merge display and panel (their use is exclusive)
Merge mainboard, daughterboard, IC and PWB within electronics ?

Check memory unit

The memory attribute is expected to be a float number in GB. This means that 1) the "GB" suffix must be removed, and 2) the parsers have to guarantee that they parsed a number in the right unit, which is not the case yet.
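
A possible shape for such a conversion is sketched below. The function name memory_to_gb is hypothetical, and the choice of binary multiples (1 TB = 1024 GB) is an assumption; values without an explicit unit are rejected rather than guessed.

```python
import re

# Assumption: binary multiples, i.e. 1 TB = 1024 GB and 1 GB = 1024 MB.
_UNIT_TO_GB = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}
_MEMORY_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*(MB|GB|TB)\s*$", re.IGNORECASE)


def memory_to_gb(raw: str) -> float:
    """Convert a string such as '32GB' or '1.5 TB' to a float number of GB."""
    match = _MEMORY_RE.match(raw)
    if match is None:
        # No recognizable unit: refuse to guess instead of storing a wrong value.
        raise ValueError(f"cannot parse memory value: {raw!r}")
    number = float(match.group(1).replace(",", "."))
    return number * _UNIT_TO_GB[match.group(2).upper()]


print(memory_to_gb("32GB"), memory_to_gb("1.5TB"), memory_to_gb("512 MB"))
```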

add new available data from Samsung

New multicriteria data is available on https://images.samsung.com/is/content/samsung/assets/latin_en/sustainability/environment/environment-data/2022_Life-Cycle_Assessment_for_HHP_220613.pdf and could be added to the database

🙏 Please use comma for boavizta-data-fr.csv (instead of semicolons)

❔ Context

Currently, the separator used is ; whereas using , gives a very nice preview on GitHub.
This is already the case for boavizta-data-us.csv.

🙏 Action

Please use , on boavizta-data-fr.csv

Where is the data for calculating the cloud impact?

I see this repository contains manufacturer data, but what about the data for the multi-criteria cloud instance impacts in the API?

Unify added date format

The date format used for the initial parsing is 01-11-2020, and manually added rows use the same format, but the automatic parsers use a different one (2022-10-18).
I think it would be easier to change the initial parsing and manually added rows, as this avoids modifying all the spiders.
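
The conversion could look like the sketch below. It assumes that 01-11-2020 is written day-month-year, which the issue does not state explicitly; the function name to_iso_date is hypothetical.

```python
from datetime import datetime


def to_iso_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, accepting either format seen in added_date.

    Assumption: the older format is day-month-year (DD-MM-YYYY).
    """
    cleaned = raw.strip()
    for fmt in ("%Y-%m-%d", "%d-%m-%Y"):
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


print(to_iso_date("01-11-2020"), to_iso_date("2022-10-18"))
```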

Unify screen size unit

For Apple's smartphones and the likes, screen_size corresponds to the screen resolution in pixels, whereas for monitors and laptops it corresponds to inches.

Monitoring new data sources on Internet

The objective is to create a monitoring tool to detect publication of unknown product environmental footprint reports.
We could regularly search for specific keywords and alert when new reports are found.

The search could be built with a combination of:

Name of manufacturer not already monitored by Boavizta's spiders (ex: ibm, cisco, samsung...)
Typical keywords found in known PCF (ex: PAIA, Product Carbon Footprint, Product Environmental Report, kgCO2eq, kgCO2e)
Typical filetype of document (ex: "type:pdf")
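
As a rough sketch of how the three ingredients above could be combined into search queries: the function build_queries is hypothetical, the filter string is taken verbatim from the example above and may need adapting to the search engine used, and no actual search API is called here.

```python
from itertools import product
from typing import Iterable, List


def build_queries(manufacturers: Iterable[str], keywords: Iterable[str],
                  filetype_filter: str = "type:pdf") -> List[str]:
    """Build one query string per (manufacturer, keyword) pair."""
    return [f'{manufacturer} "{keyword}" {filetype_filter}'
            for manufacturer, keyword in product(manufacturers, keywords)]


queries = build_queries(["ibm", "cisco"], ["Product Carbon Footprint", "kgCO2eq"])
for query in queries:
    print(query)
```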

New data from Framework Laptop and VR Headsets

Hello,

Framework Laptops published their LCA : https://downloads.frame.work/resources/Framework-Life-Cycle-Report.pdf
And CEPIR published data for VR Headsets in the ADEME footprint dataset : https://base-empreinte.ademe.fr/documentation/base-impact?idDocument=167

Could it be interesting to add this into boavizta ?

Thx

Add a script to automatically merge multiple .csv files and deal with duplicates

We need a dedicated tool to merge multiple .csv files while detecting and merging duplicates.

I've started to implement it through a new static method of DeviceCarbonFootprint:

@staticmethod
def merge(device1: 'DeviceCarbonFootprint', device2: 'DeviceCarbonFootprint',
          conflict: Literal['keep2nd','interactive'] = 'keep2nd', verbose: bool = False) -> 'DeviceCarbonFootprint':

and a merge_csv.py file1 file2 standalone script written on top of the above merge function.

By default, priority is given to device2/file2.

Conflicts are detected only for attributes that are provided for both devices and when the values are clearly different. If they are close enough, then merge only prints a warning in verbose mode.

Then, there are two modes to resolve the conflicts:

Simply keep device2 (and print the differences in verbose mode)
Ask the user which version should be kept.

TODO:

Add a non-regression mode only testing that device2 is consistent with device1 and that device1 does not contain more information.
Clean up and unify some entries prior to merging to avoid false negatives (e.g., CN versus China, issue #64)
Find a way to deal with PCF files reporting the same model name whereas they are not the same (in ecodiag I also extract the model name from the main html files)
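
To make the conflict rule concrete, here is a simplified sketch of the 'keep2nd' behaviour on plain dictionaries. It is not the actual DeviceCarbonFootprint.merge implementation: the function name merge_devices, the use of dictionaries and the 1% tolerance for "close enough" are all assumptions.

```python
import math
from typing import Any, Dict, List, Tuple

EMPTY = (None, "")


def _differs(a: Any, b: Any, rel_tol: float) -> bool:
    """Numbers differ only beyond the tolerance; other values must be equal."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return not math.isclose(a, b, rel_tol=rel_tol)
    return a != b


def merge_devices(device1: Dict[str, Any], device2: Dict[str, Any],
                  rel_tol: float = 0.01) -> Tuple[Dict[str, Any], List[Tuple[str, Any, Any]]]:
    """Merge two device records, giving priority to device2 ('keep2nd')."""
    merged = dict(device1)
    conflicts = []
    for key, value2 in device2.items():
        if value2 in EMPTY:
            continue  # nothing provided by device2: keep device1's value
        value1 = device1.get(key)
        if value1 not in EMPTY and _differs(value1, value2, rel_tol):
            conflicts.append((key, value1, value2))
        merged[key] = value2
    return merged, conflicts


# Made-up records: gwp_total is close enough, weight clearly differs.
d1 = {"name": "X", "gwp_total": 300.0, "weight": 2.0, "memory": None}
d2 = {"name": "X", "gwp_total": 301.0, "weight": 3.0, "memory": 16}
merged, conflicts = merge_devices(d1, d2)
print(merged, conflicts)
```

An interactive mode would use the returned conflicts list to ask the user which value to keep instead of always taking device2's.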

Create Philips spiders and parsers

All Philips PCF are available here: https://www.philips.fr/c-w/search.html#q=Philips%20Product%20Carbon%20Footprint&cq=%40ps_contenttype_key%3C%3Eproduct

The Lenovo parser should be a good starting point for creating the Philips one.

Names do not match model names in the device system

Hello,
We encounter issues when using this database, as the name does not always match the model name in the device system.
Example:
In this database --> EliteBook ...
In the device registry --> HP EliteBook ...

This makes automation harder, as all inventory software uses the device registry to get that information.

Is this issue known?
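
Until names are harmonized, one possible workaround on the consumer side is to compare names after stripping a leading manufacturer prefix. The sketch below is only an illustration; the function name comparable_name and the model names are hypothetical.

```python
def comparable_name(manufacturer: str, model: str) -> str:
    """Lower-case the model name and drop a leading manufacturer prefix, if any."""
    prefix = manufacturer.strip().lower()
    cleaned = model.strip()
    if prefix and cleaned.lower().startswith(prefix + " "):
        cleaned = cleaned[len(prefix) + 1:]
    return cleaned.strip().lower()


# Made-up model names following the pattern described in the issue.
database_name = comparable_name("HP", "EliteBook 840")
registry_name = comparable_name("HP", "HP EliteBook 840")
print(database_name == registry_name)
```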

Outdated/incorrect data

Hi,
First of all, thank you for this very useful data !

Some of the data reported in the csv file do not correspond to the data in the sources. For example,
HP ProLiant DL360 Gen10 server reports a gwp_total of 1710 kgeqCO2 (with 77% of this caused by the server usage). However, the corresponding source document (https://assets.ext.hpe.com/is/content/hpedam/a50002430enw) reports 6270 kgeqCO2 (with 87% caused by the server usage).

Some other HP servers have the same problem (e.g. "ProLiant ML30 Gen10 server", "ProLiant DL160 Gen10 server"). HPE probably updated their datasheets.

License

Hi and thanks for sharing this project :)

It is currently without license, so it would be difficult for people to contribute to it.

There is this sentence in the README:

This data can be freely used for any purpose including without using Boavizta's methodology.

Then I'd advise you to use a Creative Commons public license. If you agree, I can make the PR.

Thanks and have a nice day!

New Spider and parser for HPE

HPE hardware PCF documents could be downloaded here

Spider should :

retrieve all PDF links :
- get number of documents
- create empty list of links
- While number of links < number of documents
  - for all links (href) on element with class="gsr-result-head-link"
    - Click link
    - Get link (href) on element with id="downloadPdfLink"
    - add link to list of links
  - if number of links < number of documents
    - Go to next result page by clicking element with class="gsr-pagination-button next"
launch HPE parser

The parser could be built on the existing HP Workplace parser.
No need for OCR to analyse pie charts as all data is available as text.

New spider and parser for Apple

Apple hardware PCF documents could be downloaded here

Spider should get all pdf links on the page as tools/monitoring/apple_check.py does and simply launch the parser for each of these links.

The parser could be built on existing parsers such as tools/parsers/hp_workplace.py
ECODIAG parser could also be used to find all needed regex.

Duplicated entries

Hello,
I found 4 duplicates in this database:

Apple Watch SE 44mm Aluminum Case with Sport Band --> L40 and L41
Apple Watch Series 8 45mm Aluminum Case with Sport Band --> L42 and L43
Apple Watch Ultra (GPS + Cellular) Titanium Case with Ocean Band --> L44 and L45
iPad (9th generation) Wi-Fi + Cellular with 64GB --> L50 and L51
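
Duplicates like these could be detected automatically by grouping rows on manufacturer and name. This is a minimal sketch with made-up rows; the function find_duplicates is hypothetical, and it assumes line 1 of the CSV is the header so that data rows start at line 2.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def find_duplicates(rows: List[Dict[str, str]],
                    key: Sequence[str] = ("manufacturer", "name"),
                    first_line: int = 2) -> Dict[tuple, List[int]]:
    """Map each duplicated key to the CSV line numbers where it appears."""
    seen = defaultdict(list)
    for line_no, row in enumerate(rows, start=first_line):
        seen[tuple(row[k].strip().lower() for k in key)].append(line_no)
    return {k: lines for k, lines in seen.items() if len(lines) > 1}


# Made-up rows, not actual entries from the database.
rows = [
    {"manufacturer": "Apple", "name": "Watch A"},
    {"manufacturer": "Apple", "name": "Watch A"},
    {"manufacturer": "Apple", "name": "Watch B"},
]
print(find_duplicates(rows))
```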

Add multi criteria impact data from available LCA

Currently the database only focuses on carbon footprint, whereas other impacts such as abiotic depletion, primary energy, water and human toxicity should be assessed and are available in several Life Cycle Assessments provided by manufacturers.
We already identified the following :

Apple Vision Pro LCA

Hello,

Apple published GHG emissions from their LCA for the Apple Vision Pro: https://www.apple.com/environment/pdf/products/vision-pro/Apple_Vision_Pro_PER_Feb2024.pdf
335kg CO2e for the whole life cycle.

Interesting to add into the dataset ?

Alex

list all prerequisites to run parsers and spiders
explain how to run parsers in standalone
explain how to run spiders

Nvidia numbers

If you find any data about Nvidia GPUs, can you please let me know? This is something I'm really interested in!
Thank you :)
