genai-impact / ecologits

🌱 EcoLogits tracks the energy consumption and environmental footprint of using generative AI models through APIs.

Home Page: https://ecologits.ai/

License: Mozilla Public License 2.0

Python 99.74% Makefile 0.26%
genai generative-ai green-ai green-software llm llm-inference python sustainability sustainable-ai

ecologits's People

Contributors

adrienbanse, aqwvinh, d4gtech, lucberton, samuelrince

ecologits's Issues

Support for stream and async functions

Description

We currently do not support the "advanced" chat completion modes: async and/or streaming.

Solution

Maybe we can look at what openllmetry does (again). 😄

Additional context

Examples for OpenAI SDK:

Streaming:

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Async:

import os
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say this is a test"}]
    )

asyncio.run(main())

Async + Streaming:

import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main():
    stream = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say this is a test"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(main())
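
One possible direction for streaming, sketched under stated assumptions: wrap the stream in a generator that passes chunks through unchanged, counts them, and hands the count to a callback once the stream is exhausted. The names `count_chunks`, `acount_chunks` and `on_complete` are hypothetical, and treating one chunk as roughly one output token is an assumption that would need checking for each provider.

```python
from typing import AsyncIterable, AsyncIterator, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def count_chunks(stream: Iterable[T], on_complete: Callable[[int], None]) -> Iterator[T]:
    """Yield every chunk unchanged, then report how many chunks were seen."""
    count = 0
    for chunk in stream:
        count += 1
        yield chunk
    # Reached only once the caller has consumed the whole stream.
    on_complete(count)


async def acount_chunks(
    stream: AsyncIterable[T], on_complete: Callable[[int], None]
) -> AsyncIterator[T]:
    """Async counterpart of count_chunks, for `async for` consumers."""
    count = 0
    async for chunk in stream:
        count += 1
        yield chunk
    on_complete(count)


# Usage with a stand-in stream (no API call is made here).
totals = []
chunks = list(count_chunks(iter(["Say", " this", " is"]), totals.append))
```

The callback is where an impact estimate could be computed from the final count; if the caller stops iterating early, the callback is never invoked, which a real implementation would have to handle.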

Energy consumption of using OpenAI

Description

Enable energy consumption estimation of using the chat-completions-api with the python SDK.

Solution

Create a wrapper of the OpenAI client and insert energy consumption to the response when calling the chat-completions-api.

from genai_impact import OpenAI

client = OpenAI()

response = client.chat.completions.create(
	model="gpt-3.5-turbo",
	messages=[
		{"role": "user", "content": "Hello World!"}
	]
)

print(response.impacts) 	# Outputs an impact object containing the estimated energy consumption of the query.

Considerations

Model size

Parameter count of OpenAI models is unknown, so we will need to guesstimate it in the methodology working group. For now, I propose to follow this convention:

| Model name | Parameter count | Additional information |
|---|---|---|
| gpt-3.5-turbo | ~20B | Potentially leaked in a paper; similar performance to mistral-small. |
| gpt-4-turbo | ~70B | Similar performance to mistral-medium, a previous version of which leaked. |
| gpt-4 | ~200B | ? |
| gpt-4-vision | ? | ? |

Other variations of these models are considered equal in terms of the number of parameters.

Energy estimation

Based on methodology v0, we can estimate the energy consumption of using an LLM with the following formula:

$$ Energy(model\_size, output\_tokens) = A * model\_size * output\_tokens $$

With:

  • $A = 1.17 \times 10^{-4}\ Wh$, a constant;
  • $model\_size$, the number of model parameters counted in billions;
  • $output\_tokens$, the number of tokens generated.
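
As a minimal sketch of the formula above (the function name `estimate_energy_wh` and the input validation are assumptions, not part of the methodology), the estimate is a single multiplication. The example reuses the ~20B guesstimate for gpt-3.5-turbo from the table in this issue.

```python
A_WH = 1.17e-4  # Wh, constant A from methodology v0


def estimate_energy_wh(model_size: float, output_tokens: int) -> float:
    """Estimated energy in Wh for a model of `model_size` billion parameters."""
    if model_size < 0 or output_tokens < 0:
        raise ValueError("model_size and output_tokens must be non-negative")
    return A_WH * model_size * output_tokens


# gpt-3.5-turbo guesstimated at ~20B parameters, 100 generated tokens.
energy = estimate_energy_wh(20, 100)
print(f"{energy:.3f} Wh")
```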

Open discussion on licenses

Description

As stated in the last weekly meeting, let's take the opportunity of the GitHub organization migration to review the license we use. Currently, we use the default from Data For Good which is MIT, a very permissive license.

The main issue I have with a very permissive license like MIT is that the distribution or modification of the library can be done without using the same license.

Example: a third party using a modified version of EcoLogits has no obligation to distribute and open source the modified version.

This has at least two consequences:

  • The third party is not encouraged (forced) to contribute to the open-source project; thus modifications are not available for everyone to enjoy.
  • The third party can use a degraded version of the library without having to specify it. For instance, impacts calculation could be altered and lead to wrong results, without the end users knowing that it is not running the original EcoLogits library. This reduces the transparency of the methodology and impact reporting when used in other projects.

Solution

A solution to this issue is to use a copyleft license that can force an individual or a company to redistribute the modifications that have been made, for instance. Here is a list of the well-known copyleft licenses that we could use, with their pros and cons.

LGPL 3.0

This license is mainly applied to libraries. You may copy, distribute and modify the software provided that modifications are described and licensed for free under LGPL. Derivative works (including modifications or anything statically linked to the library) can only be redistributed under LGPL, but applications that use the library don't have to be.

Sources: tldrlegal.com, fossa.com

Well-known license that is especially designed for libraries. In theory, anyone can install and use the library in any project (including proprietary and commercial) without releasing the project under the same license. One caveat is that it is usually a red flag for some people / companies to have dependencies with GPL-like code because GPLv3 or AGPLv3 licenses force you to redistribute your code under a GPLv3 compatible license (even if you use one small function in an enormous stack).

Another implication of this license is that the end user must be able to replace the library with another one. For instance, a company that uses EcoLogits in its dashboard should give the user the option to substitute an equivalent library for the computation of the impacts. This can be a burden, and it is probably another reason why the LGPLv3 is not commonly liked.

MPL 2.0

MPL is a copyleft license that is easy to comply with. You must make the source code for any of your changes available under MPL, but you can combine the MPL software with proprietary code, as long as you keep the MPL code in separate files. Version 2.0 is, by default, compatible with LGPL and GPL version 2 or greater. You can distribute binaries under a proprietary license, as long as you make the source available under MPL.

Sources: tldrlegal.com, fossa.com

Less-known but actively used license created by Mozilla. The main difference I see between MPL 2.0 and LGPL 3.0 is that MPL 2.0 is completely permissive when you only use the library. This holds when the library is in its own separate files, which is always the case in Python if you "pip install" the library. I think the separate-file requirement mostly matters for low-level programming languages. Otherwise, if you modify the library, you are required to make the modified version available under the same license.

Other licenses

There are of course other licenses available that are copyleft or not. I have only listed the two I consider for this project. We should consider that it is possible to change the license again in the future if required. It is generally poorly regarded to change to a more restrictive license, but ok or encouraged to go for a more permissive license.

Other considerations

Another issue that I have mentioned is the commercial exploitation of the library with little or no added value. This is a classic issue with open-source projects, and two solutions are possible:

  • Use a very restrictive license that makes the software only usable in other open-source projects. That can be an issue if we want to address organizations that produce proprietary code (which we do).
  • Use dual licensing: (1) a very restrictive open-source license and (2) a business license sold to companies. This is not easily doable as a non-profit, and it is usually viewed not as an open-source practice but as bait to sell paid software.

So, we will not try to address this issue here and with a license. We need to find other innovative ways to fund the project and encourage companies to do so.

Other resources

As you @LucBERTON @aqwvinh @adrienbanse @AndreaLeylavergne have contributed to the project (made a commit or created a file) you are concerned by this change and I would be glad to hear your opinion on this.

Package documentation

Description

Write clear documentation explaining how to install and use the package.

Solution

  • update the README.md file
  • create a full package documentation

Considerations

Technology and format for package documentation to be determined.
e.g. :

Additional context

N/A

Unit tests

Description

Implement unit tests in this package.

Solution

All features and cases should be explicitly verified through unit testing.

Considerations

As a reference :
https://github.com/dataforgoodfr/12_genai_impact/blob/main/tests/test_compute_impacts.py

import pytest

from genai_impact.compute_impacts import compute_llm_impact


@pytest.mark.parametrize('model_size,output_tokens', [
    (130, 1000),
    (7, 150)
])
def test_compute_impacts_is_positive(model_size: float, output_tokens: int) -> None:
    impacts = compute_llm_impact(model_size, output_tokens)
    assert impacts.energy >= 0

Additional context

N/A

Package install with optional dependencies

Description

Allow installing this package with extra/optional dependencies.
The goal is that users install only the dependencies useful for their use case, which reduces install size.

Solution

This package could be installed with only some of its dependencies.

e.g. the package with the Mistral client and without OpenAI

https://python-poetry.org/docs/pyproject#extras

poetry install --extras "mysql pgsql"
poetry install -E mysql -E pgsql

Considerations

Should work for pip and poetry package managers.
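
With optional extras, the package cannot assume that every provider client is present at import time. A minimal sketch of one way to handle this, assuming a hypothetical mapping from extra name to the client's importable module (`OPTIONAL_CLIENTS` and `available_clients` are illustrative names, not existing package APIs):

```python
from importlib.util import find_spec

# Hypothetical mapping: extra name -> top-level module of the provider client.
OPTIONAL_CLIENTS = {
    "openai": "openai",
    "mistralai": "mistralai",
}


def available_clients(candidates: dict) -> list:
    """Return the extras whose client module is importable in this environment."""
    return [
        extra
        for extra, module in candidates.items()
        if find_spec(module) is not None  # None means the module is not installed
    ]


print(available_clients(OPTIONAL_CLIENTS))
```

Only the providers returned here would then be instrumented, so a user who installed a single extra never triggers an import of the others.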

Additional context

N/A

Add Google generative AI models

Description

Add Google generative AI models, e.g. Gemini.

Solution

Google has its own Python package for its generative AI models.

Python package: google-generativeai

Documentation:

Code example:

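import os

import google.generativeai as genai

# Read the API key from the environment rather than hard-coding it
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]
genai.configure(api_key=GOOGLE_API_KEY)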

# Print models
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

model = genai.GenerativeModel('gemini-pro')

# Non-streamed response
response = model.generate_content("What is the meaning of life?")

# Streamed response
streamed_response = model.generate_content("What is the meaning of life?", stream=True)
for chunk in streamed_response:
  print(chunk.text)
  print("_"*80)

Implement methodology v1

Description

Implement the new release of the methodology that includes impacts (gwp, adpe, pe) and multisteps (usage + embodied)

Methodology v1 is on Notion.

Solution

Update the compute_llm_impact function.

Recurrent `ModuleNotFoundError`

It's probably very basic, but any idea why I have to run poetry install --all-extras --with dev,docs every time I make a change in the code to be able to run my __main__.py script? Otherwise I very often get ModuleNotFoundError: No module named 'genai_impact'.

A new name for gen AI impact package

Description

We need to find a fancy and easy to remember package name :)

Solution

List and discuss package name ideas.

Considerations

Should be easy to remember.
Should be in line with Data For Good values and ethical standards.

Additional context

N/A

Support regional energy mixes

For now, we only use the world mix to compute GWP, ADPE and PE:

IF_ELECTRICITY_MIX_GWP = 5.90478e-1     # kgCO2eq / kWh (World)
IF_ELECTRICITY_MIX_ADPE = 7.37708e-8    # kgSbeq / kWh (World)
IF_ELECTRICITY_MIX_PE = 9.988           # MJ / kWh (World)

But we know that some providers only have servers in a specific region (e.g. OpenAI in the US).

We could add a column in data/models.csv indicating a zone (country, continent, ...), and then query another .csv (or query some database in another way) to ask for an average energy mix of the zone, or keep the World mix if unspecified.
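
A minimal sketch of the lookup-with-fallback idea, assuming a hypothetical in-memory registry keyed by zone code (`MIXES`, `get_mix`, `usage_gwp` and the `"WOR"` key are illustrative names). Only the world values quoted above are included; regional values are deliberately left out and would come from the extra .csv or database. The sketch also assumes that usage GWP is energy multiplied by the impact factor.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ElectricityMix:
    gwp: float   # kgCO2eq / kWh
    adpe: float  # kgSbeq / kWh
    pe: float    # MJ / kWh


# World values copied from the constants above.
WORLD_MIX = ElectricityMix(gwp=5.90478e-1, adpe=7.37708e-8, pe=9.988)

# Hypothetical registry; regional entries would be loaded from a data file.
MIXES: Dict[str, ElectricityMix] = {"WOR": WORLD_MIX}


def get_mix(zone: Optional[str]) -> ElectricityMix:
    """Return the mix for `zone`, or the world mix if unknown or unspecified."""
    if zone is None:
        return WORLD_MIX
    return MIXES.get(zone, WORLD_MIX)


def usage_gwp(energy_kwh: float, zone: Optional[str] = None) -> float:
    """Usage-phase GWP in kgCO2eq, assuming GWP = energy * impact factor."""
    return energy_kwh * get_mix(zone).gwp
```

Falling back silently keeps existing behaviour for models without a zone, at the cost of hiding missing data; a real implementation might want to log the fallback.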

Unit test GitHub workflow

Description

Create a new GitHub workflow to run unit tests

Solution

Add a new workflow in .github to run tests with pytest.

Add Cohere provider

Description

Add Cohere LLM provider.

Solution

Cohere has its own API and python client.

Available models: https://docs.cohere.com/docs/models

Code example:

import cohere

client = cohere.Client('<<apiKey>>')

response = client.chat(
	chat_history=[
		{"role": "USER", "message": "Who discovered gravity?"},
		{"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
	],
	message="What year was he born?",
	connectors=[{"id": "web-search"}]
)

print(response)

Automatic model update for Hugging Face

Description

We've recently added support for Hugging Face Inference Endpoints through the huggingface_hub python package. We now need to reference models that are available on Hugging Face (model name, number of parameters).

Solution

Possible solutions:

  • Scraping at runtime (possibly not a good idea)
  • Telemetry (update in a github action triggered by package telemetry?)
    ...

Ping @aqwvinh

Dependencies are too restrictive

Describe the bug

Hello, I'd like to use EcoLogits in a project that uses LangChain, but it is impossible because EcoLogits pins packaging to >=24.0,<25.0, while langchain-core requires packaging <24.0.

If you do not need a specific version of a package, I think it is better to remain open to all versions.

To Reproduce
Steps to reproduce the behavior:
poetry add ecologits langchain-core

Because no versions of ecologits match >0.1.5,<0.2.0
 and ecologits (0.1.5) depends on packaging (>=24.0,<25.0), ecologits (>=0.1.5,<0.2.0) requires packaging (>=24.0,<25.0).
Because langchain-core (0.1.52) depends on packaging (>=23.2,<24.0)
 and no versions of langchain-core match >0.1.52,<0.2.0, langchain-core (>=0.1.52,<0.2.0) requires packaging (>=23.2,<24.0).
Thus, ecologits (>=0.1.5,<0.2.0) is incompatible with langchain-core (>=0.1.52,<0.2.0).
And because langchain-experimental (0.0.58) depends on langchain-core (>=0.1.52,<0.2.0)
 and no versions of langchain-experimental match >0.0.58,<0.0.59, ecologits (>=0.1.5,<0.2.0) is incompatible with langchain-experimental (>=0.0.58,<0.0.59).
So, because fiscal-qa depends on both langchain-experimental (^0.0.58) and ecologits (^0.1.5), version solving failed.
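
The resolver output boils down to two half-open version ranges for packaging that share no version. A small illustration, simplified to (major, minor) tuples and not how Poetry's solver is actually implemented (`ranges_overlap` is a hypothetical helper):

```python
from typing import Tuple

Version = Tuple[int, int]
Range = Tuple[Version, Version]  # (inclusive lower bound, exclusive upper bound)


def ranges_overlap(a: Range, b: Range) -> bool:
    """True if the two half-open ranges have a non-empty intersection."""
    return max(a[0], b[0]) < min(a[1], b[1])


ecologits_needs = ((24, 0), (25, 0))       # packaging >=24.0,<25.0
langchain_core_needs = ((23, 2), (24, 0))  # packaging >=23.2,<24.0

print(ranges_overlap(ecologits_needs, langchain_core_needs))
```

Widening the lower bound on the EcoLogits side (for example to 23.2) would make the ranges intersect, which is the relaxation this issue asks for.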

Decorator mode vs client wrapper

Description

Investigate ways to implement this package through wrappers or decorators.

Solution

Client wrappers

from genai_impact import OpenAI  # wrapper around the OpenAI client

client = OpenAI()

response = client.chat.completions.create(
	model="gpt-3.5-turbo",
	messages=[...]
)

impacts = response.impacts
print(impacts)
# Outputs: Impacts(energy=XX, energy_unit='kWh', gwp=XX ...) 

Patch decorator

from openai import OpenAI
from genai_impact import impact_estimator


@impact_estimator()
def summarize(client: OpenAI, doc: str):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[...]  # prompt summarize document
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    client = OpenAI()
    document = "Lorem ipsum..."
    summarize(client, document)
    # After running this function the impacts are dumped in a CSV file

Maybe Patchy project could be useful :
https://pypi.org/project/patchy/
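
To make the client-wrapper option more concrete, here is a minimal sketch that wraps a `create`-style callable and attaches an `impacts` attribute to its response. Everything here is an assumption for illustration: `with_impacts`, `Impacts` and `fake_create` are hypothetical, the energy formula is the methodology v0 one from the OpenAI issue, and a stand-in replaces the real SDK call. Whether an attribute can be set on the real SDK response object is not verified here.

```python
import functools
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Dict

A_WH = 1.17e-4  # Wh, constant A from methodology v0


@dataclass
class Impacts:
    energy: float
    energy_unit: str = "Wh"


def with_impacts(create: Callable[..., Any], model_sizes: Dict[str, float]) -> Callable[..., Any]:
    """Wrap a `create`-style callable so its response carries an `impacts` attribute."""

    @functools.wraps(create)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        response = create(*args, **kwargs)
        size = model_sizes[kwargs["model"]]  # billions of parameters
        tokens = response.usage.completion_tokens
        response.impacts = Impacts(energy=A_WH * size * tokens)
        return response

    return wrapper


def fake_create(**kwargs: Any) -> SimpleNamespace:
    """Stand-in for client.chat.completions.create; makes no network call."""
    return SimpleNamespace(usage=SimpleNamespace(completion_tokens=50))


create = with_impacts(fake_create, {"gpt-3.5-turbo": 20})
response = create(model="gpt-3.5-turbo", messages=[])
print(response.impacts)
```

The decorator option could reuse the same wrapper internally by patching the client's method for the duration of the decorated function.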

Considerations

N/A

Additional context

N/A

This package should work with most LLM clients

Description

This package should work with most LLM clients :

  • OpenAI
  • Mistral AI
  • Anyscale
  • Anthropic
  • transformers
  • Cloud APIs

Solution

The first versions of this package may focus on the OpenAI client only, but future versions should allow all features to be used with any LLM client.

Considerations

N/A

Additional context

N/A

Model characteristics dataset

Description

To compute the impacts of a query, we need some characteristics of the model that was used. Especially in the case of LLMs we need the total count of parameters.

Solution

A CSV or JSON file to store all known models and metadata like the total parameters count.

Considerations

Proprietary models

In many cases, we don't know the underlying architecture of models, so we will need to guesstimate it (see issue #1 for OpenAI). The estimation can be based on the performance achieved by these models in various leaderboards compared to open-weight models. It is crucial to keep the source of this assessment because it strongly influences the impacts.
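
A minimal sketch of what such a file and its loader could look like. The column names (`provider`, `name`, `total_parameters_b`, `source`) and the `load_models` helper are assumptions; the single row reuses the gpt-3.5-turbo guesstimate from the OpenAI issue, and the `source` column keeps its justification.

```python
import csv
import io

# Hypothetical file layout, inlined as a string to keep the sketch self-contained.
MODELS_CSV = """provider,name,total_parameters_b,source
openai,gpt-3.5-turbo,20,Guesstimate: potentially leaked in a paper
"""


def load_models(text: str) -> dict:
    """Index rows by (provider, name) and parse the parameter count as a float."""
    models = {}
    for row in csv.DictReader(io.StringIO(text)):
        row["total_parameters_b"] = float(row["total_parameters_b"])
        models[(row["provider"], row["name"])] = row
    return models


models = load_models(MODELS_CSV)
print(models[("openai", "gpt-3.5-turbo")]["total_parameters_b"])
```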

Total parameters vs active parameters

In the case of mixture-of-experts models, we can define the active parameter count as the sum of all parameters actually used to run the computation (example with mixtral).

Units

We should screen the documentation to make sure that all units are specified. For example, I couldn't find the units of the latency.

Add Amazon Bedrock provider

Description

Add Amazon Bedrock provider.

Solution

Amazon Bedrock uses its own API and Python package.

Python package: boto3

Amazon Bedrock:

Code example:

import boto3
import json
brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completion'))

OpenAI cassette bug

Bug description
OpenAI test fails even with the cassette, I get the following error: "FAILED tests/test_openai.py::test_openai_chat - openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"

Am I doing something wrong?

To Reproduce
Update deps poetry install --all-extras --with dev,docs
Run tests poetry run pytest

System information
OS: macOS Ventura 13.5.2

Add Perplexity provider

Description

Add perplexity.ai LLM provider.

Solution

Perplexity uses the same API as OpenAI, meaning the OpenAI python client is compatible with their service, and it only requires changing the API endpoint.

Client example: https://docs.perplexity.ai/docs/getting-started
Supported models: https://docs.perplexity.ai/docs/model-cards

We need to identify when another provider is used through the OpenAI client. We also need to support and register the models that they provide through their API.
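
One way to tell providers apart, sketched under assumptions: inspect the host of the base URL the client was configured with. The host-to-provider mapping and the `detect_provider` helper below are illustrative, and the function takes the URL as a plain string rather than reading it from a client object.

```python
from urllib.parse import urlparse

# Assumed host-to-provider mapping, for illustration only.
KNOWN_HOSTS = {
    "api.openai.com": "openai",
    "api.perplexity.ai": "perplexity",
}


def detect_provider(base_url: str, default: str = "unknown") -> str:
    """Guess the provider from the host part of the client's base URL."""
    host = urlparse(base_url).hostname or ""
    return KNOWN_HOSTS.get(host, default)


print(detect_provider("https://api.perplexity.ai"))
```

An unknown host falls back to `default`, which lets the caller skip impact estimation rather than attribute the request to the wrong provider's models.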
