
environmental-footprint-data's Introduction

Boavizta Project - Environmental Footprint Data

This data repository is maintained by Boavizta and is complementary to Boavizta's environmental footprint evaluation methodology. It aims to reference as much data as possible to help organizations evaluate the environmental footprint of their information systems, applications and digital services.

The Boavizta database is derived almost exclusively from PCF (Product Carbon Footprint) sheets provided by manufacturers. The methodologies used by manufacturers are not transparent and have very large margins of error. The purpose of making these data available is mainly to give orders of magnitude and to compare different models from the same manufacturer.

Therefore WE RECOMMEND NOT USING THESE DATA TO MAKE ACCURATE IMPACT EVALUATIONS or to compare the impacts of devices from different manufacturers.

In addition, most manufacturers rely on the PAIA evaluation method developed by MIT. This method is based on data from non-public studies and Boavizta was therefore unable to evaluate its relevance.

To browse data, you can use https://dataviz.boavizta.org.

License

This dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/. This data can be freely used for any purpose, including without using Boavizta's methodology.

Data sets

At this time, we provide two CSV files grouping together publicly available data collected from manufacturers (mainly Product Carbon Footprint reports):

  • boavizta-data-fr.csv: French version (; used as a delimiter, comma as a decimal separator)
  • boavizta-data-us.csv: English version (, used as a delimiter, dot as a decimal separator)
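Because the two files differ only in delimiter and decimal separator, a single reader can handle both. The following is a minimal sketch using only the standard library; the function name `read_boavizta_csv`, the choice of numeric columns, and the sample rows are illustrative assumptions and are not part of the repository.

```python
import csv
import io

# Column names come from the "Data format" section; treating exactly these
# four as numeric is an assumption made for this sketch.
NUMERIC_COLUMNS = ("gwp_total", "gwp_use_ratio", "yearly_tec", "lifetime")


def read_boavizta_csv(stream, french=False):
    """Yield one dict per row, converting known numeric columns to float.

    french=True  -> ';' delimiter and ',' decimal separator (boavizta-data-fr.csv)
    french=False -> ',' delimiter and '.' decimal separator (boavizta-data-us.csv)
    """
    delimiter = ";" if french else ","
    for row in csv.DictReader(stream, delimiter=delimiter):
        for column in NUMERIC_COLUMNS:
            raw = (row.get(column) or "").strip()
            if not raw:
                row[column] = None  # keep missing values explicit
                continue
            if french:
                raw = raw.replace(",", ".")
            row[column] = float(raw)
        yield row


# Made-up sample rows, only to show that both layouts yield the same values.
FR_SAMPLE = "manufacturer;name;gwp_total;gwp_use_ratio\nExampleCorp;Model X;300,5;0,25\n"
US_SAMPLE = "manufacturer,name,gwp_total,gwp_use_ratio\nExampleCorp,Model X,300.5,0.25\n"

fr_rows = list(read_boavizta_csv(io.StringIO(FR_SAMPLE), french=True))
us_rows = list(read_boavizta_csv(io.StringIO(US_SAMPLE), french=False))
print(fr_rows[0]["gwp_total"], us_rows[0]["gwp_total"])  # prints: 300.5 300.5
```

To read a real file, pass an open file object instead of `io.StringIO`, for example `open("boavizta-data-fr.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.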

We encourage all manufacturers to provide us with similar data or to correct potential errors in these files.

The Boavizta working group actively works to enrich these files with new data:

  • from manufacturers
  • resulting from its analyses and intended to provide ratios or average values that would simplify the evaluation

Please refer to sources.md for a complete list of sources.

Contribute

People are encouraged to contribute to these files.

You can easily contribute by :

If any manufacturers wish to share data with us, we will be happy to discuss with them how we can efficiently synchronize this data.

Running the code

Download Chromedriver for your version of Chrome: https://chromedriver.chromium.org/downloads and move it to a folder that belongs to your path. For Mac you can also run

brew install chromedriver

and restart Chrome.

Then create a Python 3.9 virtual environment and run

pip install -r tools/requirements.txt

to install the required packages, then follow the instructions in the spiders' README.md to run a spider and parse the PDFs of the associated brand.

When developing a new parser, you can also follow the instructions in the parsers' README.md.

Data format

  • manufacturer: Manufacturer name, e.g. "Dell" or "HP"
  • name: Product name
  • category:
    • Workplace: product commonly used in a workplace
    • Datacenter: product commonly used in a data center (e.g. server, network switch, etc.)
  • gwp_total: GHG emissions (estimated as CO2 equivalent, the unit is kgCO2eq) through the total lifecycle of the product (Manufacturing, Transportation, Use phase and Recycling)
  • gwp_use_ratio: part of the GHG emissions coming from the use phase (the assumptions for this use phase are detailed in the other columns, especially lifetime and use_location)
  • yearly_tec: Yearly estimated energy demand in kWh
  • lifetime: Expected lifetime (in years)
  • use_location: The region of the world in which the device usage footprint has been estimated.
    • US: United States of America
    • EU: Europe
    • DE: Germany
    • CN: China
    • WW: Worldwide
  • report_date: the date at which the Product Carbon Footprint report of the device was published
  • sources: the original URLs from which the data for this row was sourced
  • gwp_error_ratio: the datasheets commonly come with a diagram that shows the error margin for the footprint
  • gwp_manufacturing_ratio: part of the GHG emissions coming from the manufacturing phase
  • weight: product weight in kg
  • assembly_location: The region of the world in which the device is assembled
    • US: United States of America
    • EU: Europe
    • CN: China
    • Asia: Asia
  • screen_size: in inches
  • server_type: the type of server
  • hard_drive: the hard drive of the device if any
  • memory: RAM in GB
  • number_cpu: number of CPUs
  • height: the height of the device in a datacenter rack, in U
  • added_date: the date at which this row was added
  • add_method: how the data for this row was collected
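As a worked example of how these columns combine, the sketch below derives absolute per-phase emissions from gwp_total and the ratio columns. It assumes the ratios are stored as fractions between 0 and 1 (the column descriptions only say "part of"), and the function names and input values are illustrative, not taken from the dataset.

```python
def split_footprint(gwp_total, gwp_use_ratio, gwp_manufacturing_ratio=None):
    """Return absolute emissions in kgCO2eq for the phases with a known ratio.

    Assumes ratios are fractions in [0, 1]; adjust if they are percentages.
    """
    result = {"use": gwp_total * gwp_use_ratio}
    if gwp_manufacturing_ratio is not None:
        result["manufacturing"] = gwp_total * gwp_manufacturing_ratio
    return result


def yearly_use_emissions(gwp_total, gwp_use_ratio, lifetime):
    """Use-phase emissions spread evenly over the expected lifetime (years)."""
    return gwp_total * gwp_use_ratio / lifetime


# Made-up device: 1000 kgCO2eq in total, a quarter of it from use, 4-year lifetime.
print(split_footprint(1000.0, 0.25, 0.5))     # {'use': 250.0, 'manufacturing': 500.0}
print(yearly_use_emissions(1000.0, 0.25, 4))  # 62.5
```

Given the large error margins stated in the introduction, such figures are only meaningful as orders of magnitude.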

About Boavizta.org

Boavizta.org is a working group:

  • Working to improve and generalize environmental footprint evaluation in organizations
  • Federating and connecting stakeholders of the "environmental footprint evaluation" ecosystem
  • Helping members to improve their skills and to carry out their own projects
  • Leveraging group members initiatives

environmental-footprint-data's People

Contributors

airloren, boavizta-gh-api, bpetit, elenaaab, nitot, pabluk, pcorpet, redapengam, sbaudoin, vincentvillet


environmental-footprint-data's Issues

Unify location names

The same locations are sometimes spelled with long names (China) and sometimes as two-letter codes (CN). This needs to be unified.
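One possible way to unify them is a small lookup that maps long names to the codes listed in the "Data format" section. This is a sketch only: the function name `normalize_location` is hypothetical, and raising an error on unknown values is a design choice made here so that unexpected spellings get reviewed rather than silently kept.

```python
# Long names and codes as listed for use_location and assembly_location.
LONG_NAME_TO_CODE = {
    "united states of america": "US",
    "europe": "EU",
    "germany": "DE",
    "china": "CN",
    "worldwide": "WW",
}
TWO_LETTER_CODES = set(LONG_NAME_TO_CODE.values())


def normalize_location(value):
    """Return the short code for a location given as a code or a long name."""
    cleaned = value.strip()
    if cleaned.upper() in TWO_LETTER_CODES:
        return cleaned.upper()
    if cleaned.lower() == "asia":
        return "Asia"  # assembly_location uses "Asia" as-is
    code = LONG_NAME_TO_CODE.get(cleaned.lower())
    if code is None:
        raise ValueError(f"Unknown location: {value!r}")
    return code


print(normalize_location("China"), normalize_location(" cn "))  # CN CN
```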

HP vs HPE

Hewlett Packard Inc. (HP) and Hewlett Packard Enterprise (HPE) are two separate legal entities, and have been since 2015.

It would therefore be better if the products were correctly designated as such in the list.

Automate monitoring of all manufacturers

Monitoring of all manufacturers' web pages could be automated with GitHub Actions to:

  • run spiders to regularly check for new PDFs to analyse and launch parsers if needed
  • run generate-gh-pr.py to generate Pull Requests for new devices to add to the database

Warning: some improvements may be needed in generate-gh-pr.py, as it has not been tested since November 2021.

Erroneous Dell's subcategory

Dell's parser assumes that 'Precision' models are desktops, whereas Precision laptops also exist.

In ecodiag, I extract the sub-categories from the main html file itself rather than from the PCF file.

Add manufacturing breakdown details

Most PCF files provide breakdown details for the manufacturing part. They are, however, not always fully consistent in how they partition it. Here is the list I ended up with on ecodiag's side:

  • packaging (PAIA)
  • chassis (PAIA desktop+laptops, a very very few HP monitors -> their mistake ?)
  • mainboard (PAIA+HPE)
  • daughterboard (HPE)
  • power supply unit (PAIA)
  • HDD (PAIA)
  • SSD (PAIA+HPE)
  • optical drive (PAIA)
  • display (PAIA laptops & AiO, but also a very few HP monitors -> their mistake ?)
  • battery (PAIA)
  • housing (PAIA monitors)
  • electronics (PAIA monitors + wise-thin-client)
  • panel (PAIA monitors)
  • assembly (Dell wise-thin-client, a very few HP laptops, many Lenovo laptops, HPE)
  • materials (Dell wise-thin-client, a very few HP laptops)
  • LCD assembly (1 dell + 2 HP laptops)
  • PWBs (2 HP laptops)
  • Integrated circuits (1 dell + 3 HP laptops)
  • chassis+PSU (HPE)
  • others (various HP)

This long list is conservative, but it is a lot! So maybe some components could be merged together?

For instance, when the PSU is combined with the chassis, maybe we could just put it to "others" since this does not provide much information.

Some other propositions:

  • Merge housing and chassis (their use is exclusive)
  • Merge display and panel (their use is exclusive)
  • Merge mainboard, daughterboard, IC and PWB within electronics?

Check memory unit

The memory attribute is expected to be a float number in GB. This means that 1) the "GB" suffix must be removed, and 2) the parsers have to guarantee that they parsed a number in the right unit, which is not the case yet.
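A parser-side helper could look like the sketch below. The function name `parse_memory_gb`, the accepted units, and the binary conversion factor (1 GB = 1024 MB) are assumptions made for illustration; the existing parsers may need different rules.

```python
import re

# Accepts values such as "16", "16GB", "8 gb", "512 MB" or "1,5 TB".
_MEMORY_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*(TB|GB|MB)?\s*$", re.IGNORECASE)
# Assumption: binary multiples. Swap in 1000-based factors if preferred.
_TO_GB = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_memory_gb(raw, default_unit="GB"):
    """Return the memory size as a float in GB, or raise ValueError."""
    match = _MEMORY_RE.match(raw)
    if match is None:
        raise ValueError(f"Cannot parse memory value: {raw!r}")
    number = float(match.group(1).replace(",", "."))
    unit = (match.group(2) or default_unit).upper()
    return number * _TO_GB[unit]


print(parse_memory_gb("16GB"), parse_memory_gb("512 MB"))  # 16.0 0.5
```

Raising on unparseable input, rather than returning a guess, keeps wrong units from slipping into the CSV unnoticed.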

Unify added date format

Rows from the initial parsing use the date format 01-11-2020, and manually added rows use the same format, but the automatic parsers use a different one (2022-10-18).
I think it would be easier to change the initial parsing and manually added rows, since this avoids modifying all the spiders.
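A one-off conversion of the older rows could be sketched as follows. It assumes the legacy values are day-month-year (01-11-2020 read as 1 November 2020), which the issue does not state explicitly; the function name is hypothetical.

```python
from datetime import datetime


def normalize_added_date(raw, legacy_format="%d-%m-%Y"):
    """Return the date as YYYY-MM-DD, accepting either of the two formats.

    legacy_format is an assumption (day first); pass "%m-%d-%Y" if the
    older rows turn out to be month first.
    """
    raw = raw.strip()
    for fmt in ("%Y-%m-%d", legacy_format):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")


print(normalize_added_date("01-11-2020"))  # 2020-11-01
print(normalize_added_date("2022-10-18"))  # 2022-10-18
```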

Unify screen size unit

For Apple's smartphones and the likes, screen_size corresponds to the screen resolution in pixels, whereas for monitors and laptops it corresponds to inches.

Monitoring new data sources on Internet

The objective is to create a monitoring tool to detect publication of unknown product environmental footprint reports.
We could regularly search for specific keywords and alert when new reports are found.

Searches could be built from a combination of:

  1. Name of manufacturer not already monitored by Boavizta's spiders (ex: ibm, cisco, samsung...)
  2. Typical keywords found in known PCF (ex: PAIA, Product Carbon Footprint, Product Environmental Report, kgCO2eq, kgCO2e)
  3. Typical filetype of document (ex: "type:pdf")
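The combination of these three elements can be sketched as a simple cross product. The function name `build_queries` is hypothetical, and the exact file-type operator depends on the search engine used; "type:pdf" is simply the example given above.

```python
from itertools import product


def build_queries(manufacturers, keywords, filetype_filter="type:pdf"):
    """Return one search string per (manufacturer, keyword) pair."""
    return [
        f'{manufacturer} "{keyword}" {filetype_filter}'
        for manufacturer, keyword in product(manufacturers, keywords)
    ]


for query in build_queries(["ibm", "cisco"], ["PAIA", "Product Carbon Footprint"]):
    print(query)
```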

Add a script to automatically merge multiple .csv files and deal with duplicates

We need a dedicated tool to merge multiple .csv files while detecting and merging duplicates.

I've started to implement it through a new static method of DeviceCarbonFootprint:

@staticmethod
def merge(device1: 'DeviceCarbonFootprint', device2: 'DeviceCarbonFootprint',
          conflict: Literal['keep2nd', 'interactive'] = 'keep2nd', verbose: bool = False) -> 'DeviceCarbonFootprint':

and a merge_csv.py file1 file2 standalone script written on top of the above merge function.

By default, priority is given to device2/file2.

Conflicts are detected only for attributes that are provided for both devices and when the values are clearly different. If they are close enough, merge only prints a warning in verbose mode.

Then, there are two modes to resolve the conflicts:

  1. Simply keep device2 (and print the differences in verbose mode)
  2. Ask the user which version should be kept.
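The behaviour described above can be illustrated on plain dictionaries. This is not the actual DeviceCarbonFootprint.merge implementation: the function name `merge_rows`, the 5% tolerance used for "close enough", and the injectable `ask` callback are assumptions made for this sketch.

```python
import math


def merge_rows(row1, row2, conflict="keep2nd", rel_tol=0.05, ask=input, verbose=False):
    """Merge two rows describing the same device, giving priority to row2."""
    merged = dict(row1)
    for key, value2 in row2.items():
        value1 = row1.get(key)
        if value2 in (None, ""):
            continue  # row2 has nothing for this attribute: keep row1
        if value1 in (None, "") or value1 == value2:
            merged[key] = value2  # no conflict possible
            continue
        both_numeric = isinstance(value1, (int, float)) and isinstance(value2, (int, float))
        if both_numeric and math.isclose(value1, value2, rel_tol=rel_tol):
            if verbose:
                print(f"warning: {key} differs slightly ({value1!r} vs {value2!r})")
            merged[key] = value2
        elif conflict == "interactive":
            answer = ask(f"{key}: keep 1) {value1!r} or 2) {value2!r}? ")
            merged[key] = value1 if answer.strip() == "1" else value2
        else:  # 'keep2nd'
            if verbose:
                print(f"conflict on {key}: {value1!r} replaced by {value2!r}")
            merged[key] = value2
    return merged


old = {"name": "Model X", "gwp_total": 100.0, "weight": 2.0}
new = {"name": "Model X", "gwp_total": 300.0, "weight": None}
print(merge_rows(old, new))  # {'name': 'Model X', 'gwp_total': 300.0, 'weight': 2.0}
```

Passing `ask` as a parameter keeps the interactive mode testable without a terminal.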

TODO:

  1. Add a non-regression mode only testing that device2 is consistent with device1 and that device1 does not contain more information.
  2. Clean up and unify some entries prior to merging to avoid false negatives (e.g., CN versus China, issue #64)
  3. Find a way to deal with PCF files reporting the same model name whereas they are not the same (in ecodiag I also extract the model name from the main html files)

Names do not match model names in the device system

Hello,
We encounter an issue when using this database: the name does not always match the model name in the device system.
Example:
In this database --> EliteBook ...
In the device registry --> HP EliteBook ...

This makes automation harder, as all inventory software uses the device registry to get that information.

Is this issue known?
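Until names are aligned, one workaround on the consumer side is to compare names after stripping the manufacturer prefix. The sketch below assumes the only difference is an optional leading manufacturer name plus case and spacing; the function name and the model string used in the example are hypothetical.

```python
def names_match(manufacturer, database_name, registry_name):
    """Compare two product names, ignoring case, spacing and a brand prefix."""

    def canonical(text):
        return " ".join(text.lower().split())

    def strip_brand(name, brand):
        prefix = brand + " "
        return name[len(prefix):] if name.startswith(prefix) else name

    brand = canonical(manufacturer)
    return strip_brand(canonical(database_name), brand) == strip_brand(
        canonical(registry_name), brand
    )


# "EliteBook Example" is a placeholder model name, not a real entry.
print(names_match("HP", "EliteBook Example", "HP EliteBook Example"))  # True
```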

Outdated/incorrect data

Hi,
First of all, thank you for this very useful data !

Some of the data reported in the CSV file does not match the data in the sources. For example,
the HP ProLiant DL360 Gen10 server is listed with a gwp_total of 1710 kgCO2eq (with 77% of this caused by server usage). However, the corresponding source document (https://assets.ext.hpe.com/is/content/hpedam/a50002430enw) reports 6270 kgCO2eq (with 87% caused by server usage).

Some other HP servers have the same problem (e.g. "ProLiant ML30 Gen10 server", "ProLiant DL160 Gen10 server"). HPE probably updated their datasheets.

License

Hi and thanks for sharing this project :)

It currently has no license, which makes it difficult for people to contribute to it.

There is this sentence in the README:

This data can be freely used for any purpose including without using Boavizta's methodology.

So I'd advise you to use a Creative Commons public license. If you agree, I can make the PR.

Thanks and have a nice day!

New Spider and parser for HPE

HPE hardware PCF documents could be downloaded here

Spider should :

  • retrieve all PDF links :
    • get number of documents
    • create empty list of links
    • While number of links < number of documents
      • for all links (href) on element with class="gsr-result-head-link"
        • Click link
        • Get link (href) on element with id="downloadPdfLink"
        • add link to list of links
      • if number of links < number of documents
        • Go to next result page by clicking element with class="gsr-pagination-button next"
  • launch HPE parser
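The pagination loop above can be sketched independently of any browser automation library. In this sketch the three callbacks are hypothetical stand-ins for the real page interactions (for example, Selenium lookups on class="gsr-result-head-link", id="downloadPdfLink" and class="gsr-pagination-button next"), and `max_pages` is an added safety limit that the issue does not mention.

```python
def collect_pdf_links(expected_count, result_links, open_result, next_page, max_pages=1000):
    """Collect PDF links page by page until expected_count links are found.

    result_links(): hrefs of the results on the current page
    open_result(href): opens a result and returns its PDF download href
    next_page(): moves to the next result page
    """
    links = []
    pages_seen = 0
    while len(links) < expected_count and pages_seen < max_pages:
        for href in result_links():
            links.append(open_result(href))
        pages_seen += 1
        if len(links) < expected_count:
            next_page()
    return links


# Fake two-page result set standing in for the real website.
fake_pages = [["r1", "r2"], ["r3"]]
state = {"page": 0}


def fake_next_page():
    state["page"] += 1


pdf_links = collect_pdf_links(
    expected_count=3,
    result_links=lambda: fake_pages[state["page"]],
    open_result=lambda href: href + ".pdf",
    next_page=fake_next_page,
)
print(pdf_links)  # ['r1.pdf', 'r2.pdf', 'r3.pdf']
```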

The parser could be built on the existing HP Workplace parser.
No need for OCR to analyse pie charts as all data is available as text.

New spider and parser for Apple

Apple hardware PCF documents could be downloaded here

Spider should get all pdf links on the page as tools/monitoring/apple_check.py does and simply launch the parser for each of these links.

The parser could be built on existing parsers such as tools/parsers/hp_workplace.py.
The ECODIAG parser could also be used to find all the needed regexes.

Duplicated entries

Hello,
I found 4 duplicates in this database:

  • Apple Watch SE 44mm Aluminum Case with Sport Band --> L40 and L41
  • Apple Watch Series 8 45mm Aluminum Case with Sport Band --> L42 and L43
  • Apple Watch Ultra (GPS + Cellular) Titanium Case with Ocean Band --> L44 and L45
  • iPad (9th generation) Wi-Fi + Cellular with 64GB --> L50 and L51
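Duplicates like these could be detected automatically. The sketch below treats two rows as duplicates when manufacturer and name match, ignoring case and surrounding spaces; that criterion and the function name are assumptions, and reported line numbers are only accurate if no field contains a line break.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows, key_columns=("manufacturer", "name")):
    """Map each duplicated key to the file line numbers where it appears."""
    seen = defaultdict(list)
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        key = tuple((row.get(column) or "").strip().lower() for column in key_columns)
        seen[key].append(line_number)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


# Made-up sample with one repeated product.
SAMPLE = (
    "manufacturer,name,gwp_total\n"
    "ExampleCorp,Watch A,30\n"
    "ExampleCorp,Watch A,30\n"
    "ExampleCorp,Tablet B,80\n"
)
duplicates = find_duplicates(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicates)  # {('examplecorp', 'watch a'): [2, 3]}
```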

Add multi criteria impact data from available LCA

Currently the database focuses only on carbon footprint, whereas other impacts such as Abiotic Depletion, Primary Energy, Water and Human Toxicity should also be assessed and are available in several Life Cycle Assessments provided by manufacturers.
We already identified the following :

Improve tools documentation

README files should:

  • list all prerequisites to run parsers and spiders
  • explain how to run parsers in standalone
  • explain how to run spiders

Nvidia numbers

If you find any data about Nvidia GPUs, can you please let me know? This is something I'm really interested in!
Thank you :)
