diff --git a/contents/benchmarking/benchmarking.qmd b/contents/benchmarking/benchmarking.qmd
index 8df0adb6..114ffe1d 100644
--- a/contents/benchmarking/benchmarking.qmd
+++ b/contents/benchmarking/benchmarking.qmd
@@ -32,7 +32,7 @@ This chapter will provide an overview of popular ML benchmarks, best practices f
 :::
 
-## Introduction
+## Introduction {#sec-benchmarking-ai}
 
 Benchmarking provides the essential measurements needed to drive progress in machine learning and to truly understand system performance. As the physicist Lord Kelvin famously said, "To measure is to know." Benchmarks give us the ability to know the capabilities of different models, software, and hardware quantitatively. They allow ML developers to measure the inference time, memory usage, power consumption, and other metrics that characterize a system. Moreover, benchmarks create standardized processes for measurement, enabling fair comparisons across different solutions.
 
diff --git a/contents/sustainable_ai/sustainable_ai.bib b/contents/sustainable_ai/sustainable_ai.bib
index d169aaa6..1b6c67fe 100644
--- a/contents/sustainable_ai/sustainable_ai.bib
+++ b/contents/sustainable_ai/sustainable_ai.bib
@@ -681,3 +681,25 @@ @inproceedings{schwartzDeploymentEmbeddedEdgeAI2021
   url = {https://doi.org/10.1109/icmla52953.2021.00170},
   year = {2021}
 }
+
+@inproceedings{zeus-nsdi23,
+  author = {Jie You and Jae-Won Chung and Mosharaf Chowdhury},
+  title = {Zeus: Understanding and Optimizing {GPU} Energy Consumption of {DNN} Training},
+  booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
+  year = {2023},
+  isbn = {978-1-939133-33-5},
+  address = {Boston, MA},
+  pages = {119--139},
+  url = {https://www.usenix.org/conference/nsdi23/presentation/you},
+  publisher = {USENIX Association},
+  month = apr
+}
+
+@article{perseus-arxiv23,
+  author = {Jae-Won Chung and Yile Gu and Insu Jang and Luoxi Meng and Nikhil Bansal and Mosharaf Chowdhury},
+  journal = {ArXiv preprint},
+  title = {Perseus: Removing Energy Bloat from Large Model Training},
+  url = {https://arxiv.org/abs/2312.06902},
+  volume = {abs/2312.06902},
+  year = {2023}
+}
diff --git a/contents/sustainable_ai/sustainable_ai.qmd b/contents/sustainable_ai/sustainable_ai.qmd
index 4b880cab..261e369c 100644
--- a/contents/sustainable_ai/sustainable_ai.qmd
+++ b/contents/sustainable_ai/sustainable_ai.qmd
@@ -141,6 +141,16 @@ The environmental impact of data centers is not only caused by direct energy con
 Next to electricity usage, there are many more aspects to the environmental impacts of these data centers. The water usage of the data centers can lead to water scarcity issues, increased water treatment needs and proper wastewater discharge infrastructure. Also raw materials required for construction and network transmission pose considerable impacts on the environment. Finally, components in data centers need to be upgraded and maintained. Where almost 50 percent of servers were refreshed within 3 years of usage, refresh cycles have shown to slow down [@uptime]. Still, this generates a significant amount of e-waste which can be hard to recycle.
 
+### Energy Optimization {#energy-optimization}
+
+Ultimately, measuring and understanding the energy consumption of AI facilitates its optimization.
+
+One way to reduce the energy consumption of a given amount of computational work is to run it on more energy-efficient hardware.
+For instance, TPU chips can be more energy-efficient than CPUs when running large tensor computations for AI, as TPUs can complete such computations much faster without drawing significantly more power than CPUs.
+Another way is to build software systems that are aware of energy consumption and application characteristics.
+Good examples are systems such as Zeus [@zeus-nsdi23] and Perseus [@perseus-arxiv23], both of which characterize the trade-off between computation time and energy consumption at various levels of an ML training system to achieve energy reduction without end-to-end slowdown.
+In reality, combining the benefits of energy-efficient hardware and energy-aware software is a promising direction, supported by open-source frameworks (e.g., [Zeus](https://ml.energy/zeus)) that facilitate community efforts.
+
 ## Carbon Footprint {#carbon-footprint}
 
 The massive electricity demands of data centers can lead to significant environmental externalities absent an adequate renewable power supply. Many facilities rely heavily on non-renewable energy sources like coal and natural gas. For example, data centers are estimated to produce up to [2% of total global $\textrm{CO}_2$ emissions](https://www.independent.co.uk/climate-change/news/global-warming-data-centres-to-consume-three-times-as-much-energy-in-next-decade-experts-warn-a6830086.html) which is [closing the gap with the airline industry](https://www.computerworld.com/article/3431148/why-data-centres-are-the-new-frontier-in-the-fight-against-climate-change.html). As mentioned in previous sections, the computational demands of AI are set to increase. The emissions of this surge are threefold. First, data centers are projected to increase in size [@EnergyCons_Emission]. Secondly, emissions during training are set to increase significantly [@Carbon_LNN]. Thirdly, inference calls to these models are set to increase dramatically as well.
 
@@ -312,7 +322,7 @@ For example, little public data from companies exists quantifying energy use and
 While electronic waste generation levels can be estimated, specifics on hazardous material leakage, recycling rates, and disposal methods for the complex components are hugely uncertain without better corporate documentation or regulatory reporting requirements.
 
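The time-versus-energy trade-off that systems like Zeus characterize can be sketched in a few lines. In the sketch below, the GPU power limits, epoch times, and energy figures are hypothetical profiling results, and the blended cost function is only an illustration of the idea of trading a small amount of time for energy savings, not a definitive reproduction of Zeus's implementation.

```python
# Illustrative sketch: choosing a GPU power limit on the time-energy
# trade-off curve, in the spirit of Zeus. All numbers are hypothetical
# profiling results for one training epoch, not real measurements.

# (power_limit_W, epoch_time_s, epoch_energy_J) -- hypothetical
profiles = [
    (300, 100.0, 28_000.0),
    (250, 104.0, 24_500.0),
    (200, 115.0, 22_000.0),
    (150, 140.0, 23_500.0),
]

def pick_power_limit(profiles, eta=0.5, max_power=300):
    """Minimize a blended cost: eta * energy + (1 - eta) * max_power * time.

    eta = 1 optimizes purely for energy; eta = 0 purely for time.
    """
    def cost(profile):
        _, time_s, energy_j = profile
        return eta * energy_j + (1 - eta) * max_power * time_s
    return min(profiles, key=cost)[0]

print(pick_power_limit(profiles, eta=0.5))  # balance energy and time
print(pick_power_limit(profiles, eta=1.0))  # energy only
```

Note that the lowest power limit is not necessarily the most energy-efficient: at 150 W the epoch runs so much longer that total energy rises again, which is why profiling the curve matters.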
-Even for the usage phase, the lack of fine-grained data on computational resource consumption for training different model types makes reliable per-parameter or per-query emissions calculations difficult. Attempts to create lifecycle inventories estimating average energy needs for key AI tasks exist [@henderson2020towards; @anthony2020carbontracker] but variability across hardware setups, algorithms, and input data uncertainty remains extremely high.
+Even for the usage phase, the lack of fine-grained data on computational resource consumption for training different model types makes reliable per-parameter or per-query emissions calculations difficult. Attempts to create lifecycle inventories estimating average energy needs for key AI tasks exist [@henderson2020towards; @anthony2020carbontracker], but variability across hardware setups, algorithms, and input data uncertainty remains extremely high. Furthermore, real-time carbon intensity data, which is critical for accurately tracking the operational carbon footprint, is lacking in many geographic locations, rendering existing tools for estimating operational carbon emissions mere approximations based on annual average carbon intensity values. Tools like [CodeCarbon](https://codecarbon.io/) and [ML $\textrm{CO}_2$](https://mlco2.github.io/impact/#compute) attempt to fill this gap, but they remain ad hoc approaches at best.
 
 Bridging the real data gaps with more rigorous corporate sustainability disclosures and mandated environmental impact reporting will be key for AI’s overall climatic impacts to be understood and managed.
 
@@ -407,6 +417,20 @@ Beyond energy efficiency, sustainability assessment tools help evaluate the broa
 The availability and ongoing development of Green AI frameworks and tools are critical for advancing sustainable AI practices.
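The gap between annual-average and real-time carbon intensity can be made concrete with a small calculation: operational carbon is energy multiplied by grid carbon intensity, so scoring the same job with an annual average versus hourly values can give noticeably different answers. The job profile and intensity figures below are invented for illustration; real intensities vary by grid and hour.

```python
# Hypothetical sketch: operational carbon footprint = energy * carbon
# intensity. The same 24-hour job is scored with an annual-average grid
# intensity versus made-up hourly values.

energy_kwh_per_hour = 1.0          # assumed constant draw of the job
annual_avg_gco2_per_kwh = 400.0    # assumed annual-average grid intensity

# Assumed hourly grid intensity (gCO2e/kWh): cleaner during midday solar.
hourly_gco2_per_kwh = [450.0] * 8 + [200.0] * 8 + [450.0] * 8

avg_estimate = 24 * energy_kwh_per_hour * annual_avg_gco2_per_kwh
realtime_estimate = sum(energy_kwh_per_hour * ci for ci in hourly_gco2_per_kwh)

print(f"annual-average estimate: {avg_estimate:.0f} gCO2e")
print(f"real-time estimate:      {realtime_estimate:.0f} gCO2e")
```

Under these assumed numbers the annual-average estimate overstates the footprint of a midday-heavy workload, which is exactly the approximation error that real-time intensity data would remove.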
 By providing the necessary resources for developers and researchers, these tools facilitate the creation of more environmentally friendly AI systems and encourage a broader shift towards sustainability in the tech community. As Green AI continues to evolve, these frameworks and tools will play a vital role in shaping a more sustainable future for AI.
+
+### Benchmarks and Leaderboards
+
+Benchmarks and leaderboards are important for driving progress in Green AI by providing standardized ways to measure and compare different methods. Well-designed benchmarks that capture relevant metrics around energy efficiency, carbon emissions, and other sustainability factors enable the community to track advancements in a fair and meaningful way.
+
+Extensive benchmarks exist for tracking AI model performance, such as those discussed in the [Benchmarking](./contents/benchmarking.qmd) chapter, but there is a clear and pressing need for additional standardized benchmarks focused on sustainability metrics like energy efficiency, carbon emissions, and overall ecological impact. Understanding the environmental costs of AI is currently hampered by a lack of transparency and standardized measurement around these factors.
+
+Emerging efforts such as the [ML.ENERGY Leaderboard](https://ml.energy/leaderboard), which provides performance and energy consumption benchmarking results for large language model (LLM) text generation, help build an understanding of the energy cost of GenAI deployment.
+
+As with any benchmark, it is important that Green AI benchmarks represent realistic usage scenarios and workloads. Benchmarks that focus narrowly on easily gamed metrics may lead to short-term gains but fail to reflect actual production environments where more holistic measures of efficiency and sustainability are needed. The community should continue expanding benchmarks to cover diverse use cases.
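One metric such a leaderboard can surface, energy per generated token, is straightforward to compute once energy is measured. The model names and figures in the sketch below are made up for illustration; consult the leaderboard itself for real measurements.

```python
# Hypothetical sketch of an energy-per-token comparison for LLM text
# generation. Model names and measurements are invented for illustration.

measurements = {
    # model: (total_energy_J, tokens_generated) -- hypothetical
    "model-a-7b": (1800.0, 10_000),
    "model-b-13b": (4200.0, 10_000),
}

def joules_per_token(energy_j, tokens):
    """Average energy cost of generating one token."""
    return energy_j / tokens

ranking = sorted(
    (joules_per_token(e, t), name) for name, (e, t) in measurements.items()
)
for jpt, name in ranking:
    print(f"{name}: {jpt:.2f} J/token")
```

Normalizing by tokens generated, rather than reporting raw energy, is what makes models of different sizes and serving configurations comparable on one axis.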
+
+Wider adoption of common benchmark suites by industry players will accelerate innovation in Green AI by allowing easier comparison of techniques across organizations. Shared benchmarks lower the barrier for demonstrating the sustainability benefits of new tools and best practices. However, care must be taken around issues like intellectual property, privacy, and commercial sensitivity when designing industry-wide benchmarks. Initiatives to develop open reference datasets for Green AI evaluation may help drive broader participation.
+
+As methods and infrastructure for Green AI continue to mature, the community also needs to revisit benchmark design to ensure existing suites capture new techniques and scenarios well. Tracking the evolving landscape through regular benchmark updates and reviews will be important to maintain representative comparisons over time. Community efforts for benchmark curation can enable sustainable benchmark suites that stand the test of time. Comprehensive benchmark suites owned by research communities or neutral third parties like [MLCommons](https://mlcommons.org) may encourage wider participation and standardization.
+
 ## Case Study: Google’s 4Ms {#case-study-google-4ms}
 
 Over the past decade, AI has rapidly moved from the realm of academic research to large-scale production systems powering numerous Google products and services. As AI models and workloads have grown exponentially in size and computational demands, concerns have emerged about their energy consumption and carbon footprint. Some researchers predicted runaway growth in ML's energy appetite that could outweigh efficiencies gained from improved algorithms and hardware [@9563954].