What kind of environmental impacts are AI companies disclosing? (And can we compare them?)
In recent months, several major AI companies have released initial numbers about their models' energy use, water consumption and carbon emissions. But since their methodologies differ (and are unclear!), we can't compare the numbers they disclose. In this blog post, we explain why that's the case and why we need more standardized approaches to measuring and comparing AI's environmental impacts.
Introduction
This summer, OpenAI, Mistral and Google all released analyses of the environmental impacts of their generative AI tools. All three organizations report metrics regarding the energy, water and carbon emissions of their tools, as summarized below:
| | Methodology | Type of query | Answer length | Energy/query | Water/query | CO₂e/query |
|---|---|---|---|---|---|---|
| OpenAI | Not provided | Not provided | Not provided | 0.34 Wh | 0.32 mL | Not provided |
| Mistral | LCA | Text only | 400 tokens | Not provided | 45 mL | 1.14 g |
| Google | Custom | Text only | Not provided | 0.24 Wh | 0.26 mL | 0.03 g |
While this appears to be a win for transparency, it is more problematic than it seems. Because each company uses different methodologies and assumptions, and none provides enough methodological detail to enable outside evaluation, these numbers cannot be meaningfully compared with one another.
In this blog post, we will describe what each organization reports and assign a score* to each company based on the clarity and soundness of its methodology, as well as the degree of transparency of its disclosures:
OpenAI
Score: D
As part of his “Gentle Singularity” blog post published in June 2025, OpenAI CEO Sam Altman casually slipped in that “the average query uses about 0.34 watt-hours [and] about 0.000085 gallons of water”, a statement quickly picked up by users and media alike. Based on estimates that ChatGPT processes 2.5 billion prompts daily, this would represent approximately 310 million kWh of energy per year, as much as the annual energy use of 30,000 average US homes or up to 100,000 electric vehicles.
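The annualization above is simple arithmetic; a quick sketch (using the disclosed 0.34 Wh/query figure and the third-party estimate of 2.5 billion daily prompts) makes the scaling explicit:

```python
# Back-of-the-envelope annualization of OpenAI's disclosed per-query figures.
# The 2.5B prompts/day figure is a third-party estimate, not an OpenAI number.
WH_PER_QUERY = 0.34            # disclosed by Sam Altman
GALLONS_PER_QUERY = 0.000085   # disclosed water use per query
QUERIES_PER_DAY = 2.5e9        # estimated daily ChatGPT prompts

annual_kwh = WH_PER_QUERY * QUERIES_PER_DAY * 365 / 1_000
annual_water_l = GALLONS_PER_QUERY * 3.78541 * QUERIES_PER_DAY * 365  # gal -> L

print(f"Energy: {annual_kwh / 1e6:.0f} million kWh/year")  # ~310 million kWh
print(f"Water:  {annual_water_l / 1e6:.0f} million L/year")
```

Note that the per-query water figure alone (about 0.32 mL) sounds negligible, which is exactly why the cumulative view matters.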
While this is the first official environmental metric reported by OpenAI, what exactly that number represents is unclear: we don’t have any details about what constitutes an average ChatGPT query in terms of input/output length or modality, nor which model it refers to (GPT-4.5, GPT-4o, o4-mini, etc.). For this reason, while it gives us an order of magnitude for ChatGPT’s energy and water use, it is impossible to compare it with other text generation models, whether closed- or open-source, or even to use it to choose between the different models OpenAI offers.
Mistral
Score: B
In their first environmental report, Mistral reported that the training of Mistral Large 2 and the subsequent 18 months of usage generated 20.4 kt of CO₂e, consumed 281,000 m³ of water, and was responsible for 660 kg Sb eq of resource depletion. In terms of the marginal costs of inference, they also report that prompting Le Chat for a 400-token response emits 1.14 g of CO₂e and uses 45 mL of water and 0.16 mg of Sb eq.
While Mistral AI does not release precise numbers of daily users for Le Chat, some estimates report that it has over 4.2 million active users. If each user queried the service only once per day, this would add up to 1,748 metric tons of CO₂e and 68,985,000 liters of water per year, roughly equivalent to 408 average gasoline-powered cars driven for one year and 28 Olympic-sized swimming pools of water. Given that this represents only one of Mistral’s models, it doesn’t capture the totality of their resource usage and emissions, nor the split between training and inference (although in a recent, paywalled article on the topic, the company stated that, for Large 2, training represented 95% of the model’s lifecycle emissions).

While Mistral’s environmental assessment is laudable for differentiating upstream from marginal impacts and for providing more detail about the characteristics of an average prompt (i.e., the length of the response), it fails to share the energy used per prompt, an important factor for comparing different LLM providers and models. Also, while the authors say that “AI companies ought to publish the environmental impacts of their models using standardized, internationally recognized frameworks,” such frameworks already exist and are not used in the disclosure. For example, international standards like the ISSB’s IFRS S2 or the EU’s CSRD require disclosure of total carbon emissions and energy consumption. For the latter, Mistral falls within scope as a “large undertaking” (public sources put its headcount above 250 and revenue in the €60m–€100m range) based in the EU, which means ESRS E1 would require disclosure of gross Scope 1, 2 and 3 emissions and a total GHG figure, as well as total energy consumption in MWh and the energy mix used.
While these directives are not specific to AI, they do provide guidance regarding the categories of impacts that companies should report, which include metrics like total energy and total carbon, both missing from Mistral’s report.
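The annualized totals quoted above can be reproduced from the disclosed per-query figures and the (third-party) active-user estimate:

```python
# Scaling Mistral's disclosed per-query impacts to a year of usage, assuming
# (as in the text) 4.2M active users each making one Le Chat query per day.
G_CO2E_PER_QUERY = 1.14    # disclosed, for a 400-token response
ML_WATER_PER_QUERY = 45    # disclosed
USERS = 4.2e6              # third-party estimate of active users

queries_per_year = USERS * 365
annual_tco2e = G_CO2E_PER_QUERY * queries_per_year / 1e6      # grams -> tonnes
annual_water_l = ML_WATER_PER_QUERY * queries_per_year / 1e3  # mL -> L
olympic_pools = annual_water_l / 2.5e6  # one pool holds ~2,500 m^3 = 2.5M L

# Matches the ~1,748 t CO2e, 68,985,000 L and ~28 pools quoted above.
print(f"{annual_tco2e:,.0f} t CO2e, {annual_water_l:,.0f} L (~{olympic_pools:.0f} pools)")
```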
Google

Score: B-
Google’s study is the first to report environmental metrics for serving customer-facing AI tools in a production environment. Using a methodology that accounts for full-system dynamic power, idle machines, and data center overhead, they estimate that the median Gemini Apps text prompt uses 0.24 Wh of energy, emits 0.03 g of CO₂e, and consumes 0.26 mL of water (compared to 0.10 Wh, 0.02 g CO₂e, and 0.12 mL based on TPU/GPU consumption alone). External estimates suggest Google Gemini averages nearly 400 million monthly visitors. Assuming each visitor submits only one query per day, this would add up to 35,040 MWh of energy, 4,380 metric tons of CO₂e and roughly 38 million L of water annually. To put this in perspective: in 2024, Google as an organization consumed 32,727,800 MWh of energy and 26.56 billion litres of freshwater, while reporting 11.5 million tons of CO₂e in “ambition-based emissions”.
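The same annualization can be sketched for Google’s comprehensive per-prompt figures, under the assumption (an external estimate, not a Google number) of 400 million visitors each querying once per day:

```python
# Annualizing Google's comprehensive per-prompt figures. The 400M visitors
# and one-query-per-day assumption are third-party estimates, not Google's.
WH_PER_PROMPT = 0.24       # comprehensive energy figure
G_CO2E_PER_PROMPT = 0.03   # market-based emissions figure
ML_WATER_PER_PROMPT = 0.26 # comprehensive (not TPU-only) water figure

prompts_per_year = 400e6 * 365
annual_mwh = WH_PER_PROMPT * prompts_per_year / 1e6            # Wh -> MWh
annual_tco2e = G_CO2E_PER_PROMPT * prompts_per_year / 1e6      # g  -> t
annual_water_l = ML_WATER_PER_PROMPT * prompts_per_year / 1e3  # mL -> L

print(f"{annual_mwh:,.0f} MWh, {annual_tco2e:,.0f} t CO2e, "
      f"{annual_water_l / 1e6:.0f}M L per year")
```

Using the TPU-only 0.12 mL figure instead would roughly halve the water total, which illustrates how sensitive these estimates are to the system boundary chosen.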
Google’s study is the first of its kind to detail interesting elements of the consumption of hardware infrastructure and the challenges of deploying generative AI tools at scale. They describe their methodology in detail and stress the importance of using similar approaches to reflect the constraints and context of in situ AI deployment, including optimization strategies and distributed processing, as opposed to more theoretical studies that fail to take these factors into account. However, several factors are still missing to allow for a fundamental comparison with the other companies’ numbers as well as with studies done by researchers. Notably, similar to OpenAI’s disclosure, there is no information about what a median Gemini Apps text prompt represents and what it includes (e.g., whether it is a simple Web search or also includes AI summaries). Also, the emissions numbers are market-based, which does not reflect the true carbon intensity of the energy used to serve AI tools, but instead includes offsets and renewable energy credits, which have been shown to underrepresent the true emissions of energy generation. Furthermore, they only report onsite water usage, which fails to account for total water consumption, such as the water used for energy generation, prompting some researchers to question the exactness of the reported numbers. Finally, given that there is no programmatic interface for real-time energy information on TPUs (as opposed to NVML for NVIDIA GPUs or RAPL for Intel CPUs), it is impossible to validate the energy numbers provided by Google, and AI developers are precluded from gathering similar numbers for AI systems deployed on TPUs.
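For comparison, here is what the GPU-side tooling mentioned above looks like in practice: a minimal sketch (assuming the `nvidia-ml-py` bindings for NVML are installed) that reads instantaneous board power on an NVIDIA GPU; no comparable public interface exists for TPUs.

```python
# Sketch: NVML (via the nvidia-ml-py package) exposes real-time GPU power
# draw. The fallback path is what you are left with on TPUs today.
def gpu_power_watts(index: int = 0):
    """Return the current board power draw in watts, or None if unavailable."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)  # reported in mW
        pynvml.nvmlShutdown()
        return milliwatts / 1000.0
    except Exception:  # no driver, no GPU, or pynvml not installed
        return None

power = gpu_power_watts()
print(f"GPU power: {power} W" if power is not None else "NVML unavailable")
```

Sampling this value over the duration of a request is the basis of open measurement tools; the absence of a TPU equivalent is what makes Google’s numbers unverifiable.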
Can we compare these numbers?
While reporting the environmental impacts of queries made to generative AI tools can be seen as a step in the right direction, the incomplete nature of these disclosures is counterproductive at best and misleading at worst. Without more methodological detail (e.g., what constitutes a median query, how much energy is being used, and how that energy is generated), and without third-party verification and validation of results, these disclosures are intrinsically incomparable with one another and with open-source models.
Also, any information about individual queries is hard to interpret outside of the context of cumulative usage and emissions; while this information can be useful for comparing and managing optimizations, it can never be the sole indicator used for informed decision-making. Publishing metrics regarding the impacts of individual queries without the total number of queries masks things like rebound effects and exponential increases in usage, which are liable to outweigh any potential optimization gains.
Metrics at the level of an individual query are often called intensity metrics. But publishing an intensity without publishing the total is widely considered across many industries to be greenwashing, and can lead to fines: EasyJet, for example, was banned in the UK from running ads about its g CO₂ per passenger-kilometre intensity, promoting itself as the “lowest-emissions” airline while its absolute emissions were rising. TotalEnergies is being sued for greenwashing for highlighting a 16.5% drop in the carbon intensity of the energy mix it sells since 2015 while planning a massive increase in oil and gas production. This is the problem with intensity-only storytelling: it hides the compounding effect of scale, since intensity per unit can fall while the number of units explodes and the total balloons. In the case of Google, for instance, while they report a 33x reduction in the consumption of a “median” query, they have increased their emissions by at least 51% since 2019, as per their own ESG report.
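A toy calculation (with entirely hypothetical numbers) shows how a steep drop in per-query intensity can coexist with a rising total:

```python
# Hypothetical illustration of the intensity-vs-total trap: per-query energy
# drops 33x while query volume grows 60x, so total consumption still rises.
wh_per_query_before, queries_before = 8.0, 1e9       # hypothetical baseline
wh_per_query_after = wh_per_query_before / 33        # 33x intensity reduction
queries_after = queries_before * 60                  # usage explodes

total_before_mwh = wh_per_query_before * queries_before / 1e6
total_after_mwh = wh_per_query_after * queries_after / 1e6

print(f"Intensity fell {wh_per_query_before / wh_per_query_after:.0f}x, "
      f"total rose {total_after_mwh / total_before_mwh:.2f}x")
```

Only the total reveals the trend; the intensity metric alone tells a story of improvement.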
The carbon/energy intensity of a product or service should always be accompanied by the absolute total impact; this is in fact mandated by most sustainability disclosure frameworks. For instance, under the EU’s CSRD (ESRS E1), companies must disclose gross Scope 1, 2 and 3 emissions and a total, and report a GHG-intensity metric per unit of net revenue. The ISSB’s IFRS S2 likewise requires disclosure of absolute gross Scope 1, 2 and 3 emissions, measured in accordance with the GHG Protocol. And the GHG Protocol Corporate Standard makes the hierarchy explicit in practice: totals are required content, while ratio performance indicators (i.e., intensity metrics like tCO₂e per kWh or per tonne) sit under optional information: supplements, not substitutes. This insistence on totals makes sense because CO₂ is a stock pollutant: according to the IPCC, global warming scales roughly linearly with cumulative CO₂, so the constraint is a carbon budget, not a yearly flow.
While OpenAI and Google both focus on text-based queries, usage of systems such as ChatGPT and Gemini is increasingly multimodal, with users prompting systems to interpret or generate images. These are much more energy- and compute-intensive than text-only queries (Luccioni et al., 2024). The growing popularity of video generation is equally worrisome, since it represents orders of magnitude more energy usage compared to image and text, and yet we have no information regarding these systems and usages (apart from our own initial work on the subject).
Finally, as highlighted in Mistral’s environmental impact report, standardization is the name of the game for “helping buyers and users identify the least carbon-, water- and material-intensive models”, and we have to continue building upon existing corporate disclosure standards such as the Corporate Sustainability Reporting Directive, bridging the gap between these and AI-specific initiatives such as the AI Energy Score project. To do so, we need standard metrics that can be reported for AI systems with similar functionalities (e.g., text-only chatbots or video-generation tools), as well as a formal definition of input and output constraints to enable meaningful comparisons. Developing tools to measure the energy consumption of different types of hardware, including GPUs and TPUs, is important to let AI developers contribute to this effort, and government regulation can incentivize these efforts, especially in places like the EU, where AI-specific regulations are already in place.
In conclusion, more work needs to be done to collectively advance standardized and transparent reporting on the environmental impacts of AI-based tools and services, and to move beyond the current fragmented state of disclosure.
*The scores that we provide are subjective and based on our personal opinions regarding the disclosures made.
Citation:
@inproceedings{ai_environmental_disclosures,
  author    = {Sasha Luccioni and Theo Alves da Costa},
  title     = {What kind of environmental impacts are AI companies disclosing? (And can we compare them?)},
  booktitle = {Hugging Face Blog},
  year      = {2025},
  url       = {https://huggingface.co/blog/sasha/environmental-impact-disclosures},
}
Thank you to Boris Gamazaychikov for providing relevant information and Brigitte Tousignant for editorial feedback and suggestions.