Alluxio
Original author(s) | Haoyuan Li |
---|---|
Developer(s) | UC Berkeley AMPLab |
Initial release | April 8, 2013 |
Stable release | v2.9.4 / June 11, 2024[1] |
Repository | https://github.com/Alluxio/alluxio |
Written in | Java |
Operating system | macOS, Linux |
Available in | Java |
License | Apache License 2.0 |
Website | www |
Alluxio is a software company whose products are used by AI platform engineering and infrastructure teams to improve the performance of AI/ML model training, model distribution, and model inference serving. Alluxio’s distributed caching software, originally named Tachyon, was created in 2013 at the University of California, Berkeley’s AMPLab by Haoyuan (HY) Li, Alluxio’s founder and CEO.
Overview
At the heart of Alluxio’s products is its large-scale distributed caching software. Built on a Decentralized Object Repository Architecture (DORA),[2] the software provides low-latency, high-throughput data access to AI/ML workloads by caching their data close to the GPU clusters that run the AI/ML code. Shorter data access times improve the performance of AI/ML workloads and increase GPU utilization.[3] The caching software scales horizontally and supports caching billions of objects and petabytes of data.
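A decentralized architecture needs a way for each client to find the worker that caches a given object without asking a central metadata server. The sketch below assumes consistent hashing for this, a common technique for decentralized placement; the class `HashRing`, the worker names, and the number of virtual nodes are hypothetical and are not taken from Alluxio’s code.

```python
import bisect
import hashlib
from typing import Iterable


def _hash(key: str) -> int:
    # Stable 64-bit hash from MD5, so every client computes the same placement.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping object paths to cache workers."""

    def __init__(self, workers: Iterable[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash point, worker name)
        for worker in workers:
            self.add_worker(worker)

    def add_worker(self, worker: str) -> None:
        # Each worker owns many virtual points, which spreads objects evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{worker}#{i}"), worker))

    def remove_worker(self, worker: str) -> None:
        self._ring = [point for point in self._ring if point[1] != worker]

    def worker_for(self, path: str) -> str:
        if not self._ring:
            raise LookupError("no workers in the ring")
        # Move clockwise to the first point at or after the path's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(path), ""))
        return self._ring[idx % len(self._ring)][1]
```

For example, `HashRing(["worker-a", "worker-b", "worker-c"]).worker_for("/data/shard-00042")` returns the same worker on every client. The design choice is that adding or removing a worker only moves the objects whose hashes fall next to that worker’s points, so most cached data stays where it is as the cluster scales horizontally.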
Alluxio is not a persistent storage system. Its distributed caching software is deployed between the GPU cluster and the persistent storage system that holds the AI/ML data. Alluxio loads data from persistent storage into the cache either on demand, when an application requests it, or in batch through a cache-loading API. It manages cache contents automatically, keeping frequently and recently accessed data in the cache and evicting less frequently used data as needed.
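On-demand loading with automatic eviction follows the general read-through cache pattern. The sketch below illustrates that pattern with a least-recently-used (LRU) policy; the LRU choice is an assumption, and Alluxio’s actual eviction policy may differ. `ReadThroughCache`, `fetch`, and `load` are hypothetical names, and a plain dictionary stands in for the persistent storage system.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class ReadThroughCache:
    """Hypothetical byte-capacity cache placed in front of slower persistent storage."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch = fetch  # reads one object from persistent storage
        self.used = 0
        self.hits = 0
        self.misses = 0
        self._entries = OrderedDict()  # key -> bytes, least recently used first

    def _insert(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity:
            return  # too large to cache; the caller still gets the data
        if key in self._entries:
            self.used -= len(self._entries.pop(key))
        # Evict least recently used entries until the new object fits.
        while self.used + len(data) > self.capacity:
            _, evicted = self._entries.popitem(last=False)
            self.used -= len(evicted)
        self._entries[key] = data
        self.used += len(data)

    def read(self, key: str) -> bytes:
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self.fetch(key)  # on-demand load from persistent storage
        self._insert(key, data)
        return data

    def load(self, keys: Iterable[str]) -> None:
        # Batch preload, e.g. before a training job starts reading.
        for key in keys:
            if key not in self._entries:
                self._insert(key, self.fetch(key))

    def __contains__(self, key: str) -> bool:
        return key in self._entries
```

As a usage example, with a 100-byte cache and three 40-byte objects, reading `a`, `b`, `a`, `c` in that order evicts `b`, because the repeated read of `a` made `b` the least recently used entry.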
How Alluxio is Used
Alluxio is used to accelerate AI workloads, particularly model training and model distribution. In machine learning and AI workflows, model training often requires access to large datasets stored across multiple platforms, including on-premises and cloud storage. Alluxio addresses this by providing a unified data layer that caches frequently accessed data, which reduces data retrieval latency and relieves I/O bottlenecks, making model training faster and more efficient. Alluxio also supports model distribution by providing uniform access to data across heterogeneous storage systems, which makes it easier to share models and datasets between different computational frameworks and environments. Through integrations with popular AI frameworks such as TensorFlow and PyTorch, Alluxio gives these frameworks fast access to data regardless of where it is physically stored, which is important for training and distributing AI models at scale.
Storage Integrations
Alluxio integrates with a wide range of storage systems, allowing it to function as a caching layer that bridges compute frameworks and storage backends. Supported storage integrations include:[4]
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- HDFS
- Network Attached Storage (NAS)
- Baidu Object Storage (BOS)
- Aliyun OSS
- Tencent COS
- Volcengine TOS
Together with Alluxio’s global unified namespace, these integrations give AI/ML engineers and applications a single point of access to data across different storage platforms.
Client Interfaces
Alluxio provides various interfaces for AI/ML engineers and applications to access data:[5]
POSIX API: The Alluxio POSIX API is based on the Filesystem in Userspace (FUSE) project. This client-side interface allows an Alluxio file system to be mounted as a standard file system on most Unix variants. Alluxio’s approach differs from projects such as s3fs or mountable HDFS, which mount one specific storage service, such as S3 or HDFS, to the local file system. The Alluxio POSIX API is a generic solution that covers the many storage systems Alluxio supports, and Alluxio’s data orchestration and caching features speed up access to frequently used data.
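Because a FUSE mount appears to applications as an ordinary directory tree, reading Alluxio-managed data through the POSIX API needs only standard file I/O. The sketch below assumes a hypothetical mount point `/mnt/alluxio-fuse`; the helper `iter_files` is illustrative and contains nothing Alluxio-specific, which is the point of a POSIX interface.

```python
import os
from typing import Iterator, Tuple

# Hypothetical mount point; the real path is chosen when the FUSE mount is set up.
ALLUXIO_MOUNT = "/mnt/alluxio-fuse"


def iter_files(root: str = ALLUXIO_MOUNT, suffix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (relative path, contents) for every file under root, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in order
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # A plain POSIX read; any caching happens below the mount point.
            with open(path, "rb") as f:
                yield os.path.relpath(path, root), f.read()
```

A training loop could call `iter_files(suffix=".parquet")` unchanged whether the directory is a local disk or an Alluxio FUSE mount.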
S3 API Endpoint: Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.
Python SDK: The Alluxio Python SDK (alluxiofs[6]) is based on fsspec[7] and lets Python applications access Alluxio-managed data through the fsspec filesystem interface. It is compatible with popular data and AI frameworks, including PyTorch, PyArrow, and Ray.
Global Unified Namespace
Alluxio’s global unified namespace gives AI/ML engineers and applications a single interface for accessing data stored on heterogeneous storage systems. This reduces the need to customize application code to support multiple storage protocols.
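A unified namespace can be pictured as a mount table that maps paths in one logical tree to locations in the underlying storage systems. The sketch below resolves a logical path by longest-prefix match against such a table; the class `MountTable`, the mount points, and the bucket and host names are hypothetical, and the sketch leaves out everything else a real namespace handles, such as metadata, permissions, and consistency.

```python
import posixpath


class MountTable:
    """Hypothetical mapping from logical namespace paths to storage URIs."""

    def __init__(self):
        self._mounts = {}  # normalized mount point -> storage URI prefix

    def mount(self, mount_point: str, storage_uri: str) -> None:
        self._mounts[posixpath.normpath(mount_point)] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        path = posixpath.normpath(path)
        # The longest matching mount point wins, so nested mounts override parents.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
                rest = path[len(mount_point):].lstrip("/")
                base = self._mounts[mount_point]
                return f"{base}/{rest}" if rest else base
        raise FileNotFoundError(f"no mount covers {path}")
```

For example, after mounting `/training` on an S3 prefix and `/models` on an HDFS directory, an application reads `/training/images/0001.jpg` and `/models/v1` with the same path syntax, and only the table knows which storage system holds each one.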
Enterprises Using Alluxio
The following is a list of notable enterprises that have used or are using Alluxio:
See also
References
- ^ "Releases · Alluxio/alluxio". github.com. Retrieved 2025-02-09.
- ^ "Introducing DORA: The Next-generation Alluxio Architecture". www.alluxio.io. Retrieved 2025-04-30.
- ^ "GPU Utilization: What Is It and How to Maximize It". www.alluxio.io. Retrieved 2025-04-30.
- ^ "Storage Integrations | Alluxio". documentation.alluxio.io. Archived from the original on 2025-02-19. Retrieved 2025-04-30.
- ^ "Client APIs | Alluxio". documentation.alluxio.io. Archived from the original on 2025-02-19. Retrieved 2025-04-30.
- ^ "fsspec/alluxiofs". python filesystem spec. 2025-03-27. Retrieved 2025-04-30.
- ^ "fsspec: Filesystem interfaces for Python — fsspec 2025.3.0.post3+g6b85a47.d20250314 documentation". filesystem-spec.readthedocs.io. Retrieved 2025-04-30.
- ^ "This New Open Source Project Is 100X Faster than Spark SQL In Petabyte-Scale Production".
- ^ "Making the Impossible Possible with Tachyon: Accelerate Spark Jobs from Hours to Seconds".
- ^ "China Unicom's big bet on open source".
- ^ "Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions".
- ^ "Cray Analytics and Alluxio – Wrangling Enterprise Storage". Archived from the original on 2019-07-14. Retrieved 2019-02-19.
- ^ "Alluxio's Use and Practice in Didi".
- ^ "Data Transformation in Financial Services".
- ^ "ArcGIS and Alluxio - Using Alluxio to enhance ArcGIS data capability and get faster insights from all your data".
- ^ "Huawei hugs open-sourcey Alluxio: Thanks for the memories". The Register.
- ^ "How Alluxio is Accelerating Apache Spark Workloads". Archived from the original on 2019-07-14. Retrieved 2019-02-19.
- ^ "Getting Started with Tachyon by Use Cases".
- ^ "Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks".
- ^ "World's Largest Computer Maker Lenovo Selects Alluxio for Data Management of Worldwide Smartphone Data".
- ^ "RedNote Accelerates Model Training & Distribution with Alluxio".
- ^ "Enhancing the Value of Alluxio with Samsung NVMe SSDs".
- ^ "Tencent Delivering Customized News to Over 100 Million Users per Month with Alluxio".
- ^ "Millions Saved Annually: Unleashing the Power of Alluxio HDFS at Uber".
- ^ "The Practice of Alluxio in Near Real-Time Data Platform at VIPShop".
- ^ "Bringing Data to Life - Data Management and Visualization Techniques".