International Conference on Architectural Support for Programming Languages and Operating Systems / 2021

Breaking the computation and communication abstraction barrier in distributed machine learning workloads

Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, M. Musuvathi, Todd Mytkowicz, Olli Saarikivi

Foundation Models, Large Language Models, ML Systems, Popular and Landmark Papers

Recent trends towards large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain the best performance. However, the current logical separation between computation and communication kernels in machine learning frameworks misses optimization opportunities across this barrier. Breaking this abstraction can provide many optimizations to improve the performance of distributed workloads, but manually applying these optimizations requires modifying the underlying computation and communication libraries for each scenario, which is both time-consuming and error-prone. Therefore, we present CoCoNet, which contains (i) a domain-specific language to express a distributed machine learning program in the form of computation and communication operations, (ii) a set of semantics-preserving transformations to optimize the program, and (iii) a compiler to generate jointly optimized communication and computation GPU kernels. Providing both computation and communication as first-class constructs allows users to work on a high-level abstraction and apply powerful optimizations, such as fusion or overlapping of communication and computation. CoCoNet enabled us to optimize data-, model- and pipeline-parallel workloads in large language models with only a few lines of code. Our experiments show that CoCoNet significantly outperforms state-of-the-art distributed machine learning implementations.
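
To make the idea of a semantics-preserving transformation across the computation/communication boundary concrete, here is a minimal sketch in plain Python. It is not CoCoNet's DSL and generates no GPU kernels. It only simulates ranks as Python lists to show why splitting an AllReduce into a ReduceScatter followed by an AllGather, and moving an elementwise computation between the two, gives the same result as the baseline while letting each rank compute on only its own shard. All function names, the `scale` stand-in computation, and the sample data are assumptions made for illustration.

```python
# Illustrative simulation only: ranks are modelled as plain Python lists,
# and none of these helper names come from CoCoNet or any real library.

def all_reduce(buffers):
    """Every rank ends up with the elementwise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]

def reduce_scatter(buffers):
    """Rank r ends up with only the r-th slice of the elementwise sum."""
    n = len(buffers)
    total = [sum(vals) for vals in zip(*buffers)]
    shard = len(total) // n  # assumes the buffer length divides evenly
    return [total[r * shard:(r + 1) * shard] for r in range(n)]

def all_gather(shards):
    """Every rank ends up with the concatenation of all ranks' shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

def scale(values, factor):
    """Stand-in for an elementwise computation (hypothetical example)."""
    return [v * factor for v in values]

def baseline(buffers, factor):
    # Communication first, then every rank redundantly computes
    # on the full reduced buffer.
    return [scale(buf, factor) for buf in all_reduce(buffers)]

def split_and_reordered(buffers, factor):
    # AllReduce split into ReduceScatter + AllGather, with the elementwise
    # computation moved in between so each rank processes only its shard.
    shards = reduce_scatter(buffers)
    shards = [scale(s, factor) for s in shards]
    return all_gather(shards)

# Two simulated ranks, each holding a four-element buffer.
grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
result_baseline = baseline(grads, 0.5)
result_optimized = split_and_reordered(grads, 0.5)
```

Both variants leave every rank with the same buffer, but in the second one each rank applies `scale` to only its share of the elements. On real hardware, a rewrite of this shape is also what makes it possible to fuse the computation into the communication kernel or overlap the two, which is the class of optimization the abstract refers to.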

95 citations, 7 influential

Learning resources:
- PDF: arXiv PDF (https://arxiv.org/pdf/2105.05720)
- arXiv: arXiv abstract page (https://arxiv.org/abs/2105.05720)
- Google Scholar: Google Scholar references (https://scholar.google.com/scholar?q=Breaking%20the%20computation%20and%20communication%20abstraction%20barrier%20in%20distributed%20machine%20learning%20workloads)
- Papers with Code: Papers with Code search (https://paperswithcode.com/search?q=Breaking%20the%20computation%20and%20communication%20abstraction%20barrier%20in%20distributed%20machine%20learning%20workloads)
- Semantic Scholar: Semantic Scholar paper page (https://www.semanticscholar.org/paper/91b29761840442005da39bc258e2298b528f31aa)
- YouTube: YouTube explanations (https://www.youtube.com/results?search_query=Breaking%20the%20computation%20and%20communication%20abstraction%20barrier%20in%20distributed%20machine%20learning%20workloads+paper+explained)