Sora is here! The "troika" of Quantum's storage systems meets the rigid demands of AI

Mondo Technology Updated on 2024-03-01

AI is evolving at a breathtaking pace. Before people have fully absorbed the convenience brought by text-to-text and text-to-image large models, OpenAI's Sora has arrived. Its debut marks AI's entry into a more "advanced" text-to-video era: with the Sora model, anyone can type a simple prompt and produce a detailed, visually appealing video.

It is foreseeable that a large number of text-to-video models will be launched in quick succession, which is bound to greatly accelerate the development of film and television, advertising, media, short video, and other industries. At the same time, the volume of unstructured data will grow even faster, placing higher requirements and challenges on storage systems.

In addition to its huge demand for computing power, AI also places very high demands on storage. What exactly are AI's "rigid requirements" for storage?

Training data storage: Training an AI model usually requires a large amount of data of various types, including images, text, audio, and video. This data needs storage space to be saved and read during preprocessing, feature extraction, and model training.

Model parameter storage: The parameter scale of AI models such as deep learning networks is often very large; large language models such as GPT can have billions or even hundreds of billions of parameters. Trained model parameters need to be stored long term for subsequent use or further optimization.
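A back-of-envelope sketch shows why parameter storage adds up quickly. The function below is illustrative only; the 175-billion-parameter figure and fp16 precision (2 bytes per parameter) are assumptions for the example, not a statement about any specific model.

```python
def checkpoint_size_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Raw size of one copy of the weights (fp16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param

# A hypothetical 175-billion-parameter model stored in fp16:
size = checkpoint_size_bytes(175_000_000_000)
print(f"{size / 1e9:.0f} GB")  # -> 350 GB for a single copy of the weights
```

Note that optimizer state and periodic training checkpoints typically multiply this footprint several times over, which is why long-term model storage is a real capacity-planning concern.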

Intermediate result storage: Intermediate results, log information, and version iteration records generated during training also occupy storage space.

High-speed access requirements: AI training places high demands on I/O performance. For massively parallel computing tasks in particular, an efficient distributed storage system is indispensable, one that can read and write large amounts of data quickly to improve training efficiency.
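The benefit of parallel I/O can be sketched in a few lines. This toy example creates some dummy "training shards" in a temp directory and reads them concurrently with a thread pool; the file names and sizes are made up for illustration, and a real training pipeline would read from the distributed file system's mount point instead.

```python
import concurrent.futures
import pathlib
import tempfile

def read_file(path: pathlib.Path) -> int:
    """Read one shard and return the number of bytes read."""
    return len(path.read_bytes())

# Create a few sample "training shards" in a temp directory (illustrative).
tmp = pathlib.Path(tempfile.mkdtemp())
for i in range(8):
    (tmp / f"shard_{i}.bin").write_bytes(b"x" * 1024)

# Issue the reads in parallel: distributed storage systems reward concurrency,
# since many clients or threads can stream data at once.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(read_file, tmp.glob("shard_*.bin")))
print(total)  # 8192 bytes read across 8 concurrent workers
```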

Real-time and near-real-time data processing: In the AI inference stage, especially for real-time or near-real-time applications such as intelligent security monitoring and autonomous driving, a steady stream of new data must be stored and processed quickly.

The development of AI is driving demand for high-capacity, high-performance storage technologies, including SSDs, distributed file systems, object storage services, and other solutions. It is also spurring innovation in storage architecture, such as cache acceleration and tiered data storage, to meet different levels of storage requirements.

To meet AI's storage needs, Quantum has already prepared a comprehensive portfolio, launching the "troika" of the Myriad, StorNext, and ActiveScale storage systems.

Quantum's Myriad all-flash scale-out file and object storage platform

Quantum Myriad is a high-performance storage solution designed for modern data center and AI workloads. With an emphasis on its speed, scalability, and hardware-agnostic design, this all-flash storage platform addresses the needs of data-intensive workloads, especially those involving AI model storage, training data management, and high-performance computing use cases.

All-flash design: Built for flash media, the Myriad platform leverages the high performance and low latency of flash technology, making it ideal for the unstructured data storage needs of modern enterprises.

Scale-out architecture: A shared-nothing scalable architecture means the system can scale horizontally as data grows without sacrificing performance, while maintaining extremely low latency.

Multi-protocol support: Client components include support for NFS (Network File System), SMB (Server Message Block, the Windows file-sharing protocol), and the S3 object storage protocol, and possibly proprietary services and GPU direct-connection services, to meet diverse data access needs.
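Multi-protocol access means the same data can be reached as a file path over NFS/SMB or as an object key over S3. The mapping sketched below is purely illustrative; the mount point `/mnt/myriad` and the path-to-key convention are assumptions for the example, not the platform's actual scheme.

```python
def s3_key_for(nfs_path: str, mount_root: str = "/mnt/myriad") -> str:
    """Map an NFS-visible file path to the S3 object key for the same data.

    Illustrative convention only: the object key is the path relative to
    the shared namespace root. Real platforms define their own mapping.
    """
    if not nfs_path.startswith(mount_root + "/"):
        raise ValueError("path is outside the shared namespace")
    return nfs_path[len(mount_root) + 1:]

print(s3_key_for("/mnt/myriad/datasets/train/img_0001.jpg"))
# -> datasets/train/img_0001.jpg
```

The point of such a unified namespace is that a preprocessing job can write files over NFS while a cloud-native training job reads the same bytes through the S3 API, with no copy in between.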

File and object storage convergence: The platform operates as both a file system and an object store, supporting mixed workloads and enabling unified management and access for different types of data.

Feature-rich data service layer: Supports inline deduplication and compression to reduce storage consumption; provides snapshot and clone capabilities to accelerate backup and recovery; and is optimized for AI/ML data processing to speed up the training of machine learning and deep learning models.

Kubernetes orchestration: Built as microservices orchestrated by Kubernetes, Myriad achieves better resource allocation and fault recovery, further reducing latency and improving the system's concurrent processing capacity.

High performance and low latency: Based on flash memory and RDMA (Remote Direct Memory Access) technology, it ensures that a high level of IO performance can be maintained even under high loads.

Quantum StorNext shared storage file system

Quantum StorNext is a highly scalable shared storage file system and data management platform known for its superior performance, fast large-scale data transfers, and ability to consolidate multiple storage media, especially in the media and entertainment industries. Several key features make StorNext an effective way to support the needs of AI applications.

High-performance storage: AI workloads, especially deep learning and machine learning training, require fast, sustained access to data at scale. The high-speed file system and data migration capabilities provided by StorNext ensure the efficient flow of training data, thereby shortening the training cycle.

Data management and tiered storage: StorNext supports data lifecycle management, automatically migrating data between different tiers of storage media based on importance, access frequency, and cost. This helps optimize cost-effectiveness across the AI pipeline, from raw data ingestion and preprocessing through model training to model deployment.
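A lifecycle policy like the one described above can be sketched as a simple recency rule. The tier names and day thresholds below are hypothetical placeholders; real policies are configured in the storage platform and can also weigh file size, project, or cost.

```python
def choose_tier(days_since_access: int) -> str:
    """Pick a storage tier by access recency (illustrative thresholds only)."""
    if days_since_access <= 7:
        return "flash"   # hot: active training data
    if days_since_access <= 90:
        return "disk"    # warm: occasionally re-read
    return "tape"        # cold: archived raw data and old checkpoints

print(choose_tier(2), choose_tier(30), choose_tier(400))
# -> flash disk tape
```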

Cloud integration: StorNext integrates with public cloud services such as AWS, Azure, and Google Cloud, making it easy for users to store and process data at scale in the cloud. This is especially useful for AI training and analysis that draws on cloud computing resources.

Large-scale data processing capabilities: AI applications often involve processing petabytes or even exabytes of unstructured data, and StorNext can manage and process data at this scale while ensuring its integrity and availability.

Cross-platform support: Thanks to StorNext's POSIX compatibility and extensive API support, it connects seamlessly with various operating systems and application environments, making it easy for AI developers to work with different AI frameworks.

Quantum ActiveScale object storage system

Quantum ActiveScale is a highly scalable object storage system for storing and managing massive amounts of unstructured data. For AI workloads, its value is mainly reflected in the following aspects:

Large-scale data storage: AI and machine learning projects often involve the collection, storage, and processing of massive amounts of data. ActiveScale provides an object-based storage architecture that scales easily to the exabyte level, meeting the large-scale storage needs of AI training.

Data lake foundation: As data lake infrastructure, ActiveScale can centrally store all kinds of unstructured data, such as images and log files, which are important inputs for AI model training.

Data analysis friendly: By integrating with big data processing frameworks such as Hadoop and Spark, ActiveScale can support AI-related data preprocessing and feature extraction, simplifying data preparation in the AI development process.
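The kind of preprocessing mentioned above can be illustrated with a toy example: turning raw log lines from a data lake into simple count features. The log format and feature choice here are invented for illustration; at scale this step would run in a framework such as Spark rather than plain Python.

```python
import re
from collections import Counter

def extract_features(log_lines):
    """Toy feature extraction: count log lines per severity level."""
    levels = Counter()
    for line in log_lines:
        m = re.match(r"\[(\w+)\]", line)  # expects lines like "[INFO] message"
        if m:
            levels[m.group(1)] += 1
    return dict(levels)

print(extract_features(["[INFO] start", "[ERROR] boom", "[INFO] done"]))
# -> {'INFO': 2, 'ERROR': 1}
```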

Cost-effective: ActiveScale enables cost-effective management of hot and cold data through a tiered storage strategy that combines disk, tape, and other low-cost storage media, helping enterprises absorb the huge storage pressure of AI applications while controlling costs.
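The economics of hot/cold tiering are easy to see with rough arithmetic. The per-terabyte prices below are hypothetical round numbers chosen for the example; actual costs vary widely by vendor, configuration, and scale.

```python
# Hypothetical list prices in $/TB/month -- illustrative only.
COST_PER_TB = {"flash": 50.0, "disk": 10.0, "tape": 2.0}

def monthly_cost(tb_by_tier: dict) -> float:
    """Total monthly storage cost for a capacity split across tiers."""
    return sum(COST_PER_TB[tier] * tb for tier, tb in tb_by_tier.items())

# 1 PB split 5% flash / 25% disk / 70% tape, versus 1 PB all on disk:
tiered = monthly_cost({"flash": 50, "disk": 250, "tape": 700})
flat = monthly_cost({"disk": 1000})
print(tiered, flat)  # -> 6400.0 10000.0
```

Even with most capacity on cheap cold media, the small flash tier keeps the actively trained-on data fast, which is the core trade-off tiering exploits.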

API integration: ActiveScale supports the S3 interface and other standard APIs, making it easy to integrate with various AI development platforms and services, so that data can flow directly into the AI training pipeline.
