Pure Storage is a global storage company that provides all-flash storage solutions for AI and machine learning. As a Technology Alliance Partner (TAP) of Pure Storage, Lablup provides a dedicated Pure Storage FlashBlade storage plug-in in Backend.AI, helping AI developers and data scientists focus on AI development in an optimal environment. Yifeng Jiang works as a Principal Solutions Architect for Data Science at Pure Storage. Drawing on his years of experience in big data and machine learning, he has written the following article on how to operate GPU servers and storage optimally within an AI infrastructure.
Address the challenges of managing and sharing GPUs and data in AI infrastructure with a couple of clicks.
Let’s say your company bought you and your fellow data scientists several powerful GPU servers. You feel excited because you don’t have to wait for your training jobs to complete on your laptop anymore. You want to use the GPUs immediately, but it might not be that simple. Here are some questions your IT team may ask before you are able to access the GPUs:
How many CPU and GPU resources do you need?
What’s the estimated start and finish time?
Do you need shared storage and how big should it be?
How big are your datasets and how are you going to upload them?
And this may happen every time anyone in the team wants to use the GPUs.
Making AI accessible has never been easy. AI spans a broad range of technologies; it goes far beyond writing some Python code to train machine learning models. Building and operating an end-to-end AI infrastructure and system is not easy, even for big enterprises. As described in this paper from Google, only a small fraction of a real-world ML system is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
For now, let’s just focus on the small box — the ML code part. It is still very different from traditional software development, because we use GPUs or TPUs for model training. GPUs/TPUs are expensive. They require additional setup of drivers and libraries. How do you share and schedule GPUs for your team? I have heard people using third-party tools, in-house built scripts, or even a spreadsheet, which is obviously not that "intelligent".
Recently, I had a conversation with folks at Lablup Inc. We talked about the challenges, opportunities and technologies in AI. The team also showed me their latest work on making AI accessible: Backend.AI, an open-source computing resource orchestrator for AI/ML. I was impressed by what the team has achieved, so I thought I should write a blog post about it.
Making AI accessible
A lot of companies start adopting AI by hiring data scientists and equipping them with laptops with one GPU card. This is okay, but once the team starts working on larger datasets and models, we need more GPUs. We need GPU servers in the datacenter or in the cloud. This is where the infrastructure challenges show up, and not every company is good at managing GPU resources. Backend.AI is trying to solve this very first problem of the AI system: making GPUs, or computing resources in general, easy to access and share among data scientists. It handles the complexity of managing and sharing GPUs, so that data scientists can focus on building better models. It does this by abstracting GPU resources across many servers into a computing pool, which can be allocated to different computing sessions on demand from a simple UI.
A new session can run on a single GPU, multiple GPUs, or even a portion of a GPU, which is useful for small model training and inference. With a couple of clicks, I was able to launch a GPU-enabled session with my favorite ML frameworks from my browser. I don’t need to install anything: not the GPU driver, not the libraries, not even the frameworks. Backend.AI also supports launching commonly used ML tools from the browser. I use Jupyter a lot, so I click the button to start a JupyterLab instance. From there, I can start writing my ML code immediately.
I can also start a terminal within the session with one click. The CLI, running in the browser, gives me flexibility and more control over the session. Since my session environment is isolated from my teammates’, I don’t need to worry about conflicts or resource competition. Once my training job is done, I can save my model and shut down the session. The GPU resources are then returned to the pool.
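For readers who prefer code over clicks, here is a rough sketch of the same session lifecycle driven from the Backend.AI Python client SDK. The module path, class and method names, image tag, and resource-slot names are my assumptions from memory, not verified against the current SDK, so treat this as illustrative only and check the official documentation.

```python
# Illustrative sketch only: the module path, ComputeSession methods, image tag,
# and resource-slot names are assumptions, not verified against the current SDK.
from ai.backend.client.session import Session  # assumed module path

with Session() as api:
    # Ask the Manager for a new compute session backed by one full GPU.
    sess = api.ComputeSession.get_or_create(
        "python-tensorflow:latest",                      # hypothetical image tag
        resources={"cpu": 4, "mem": "16g", "cuda.device": 1},
    )

    # ... run notebooks, scripts, or a terminal against this session ...

    # Shut the session down; its GPU goes back to the shared pool.
    sess.destroy()
```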
You see how Backend.AI makes AI accessible. I don’t need to install anything, or ask someone to provide me with a GPU environment. I just click, click and go, whether it is a single GPU or multiple GPUs. This is consuming GPU resources as a service. Every data scientist deserves something like this.
How does it work?
Okay, this seems cool. Now you may wonder: how does it work? At a high level, Backend.AI works like many client-server systems. The clients are the SDKs and libraries that send requests to the server-side components. Most of the magic happens on the server, but on the client side I do want to mention that I particularly like the integration with Visual Studio Code. I use VSCode for almost everything, from daily coding to note-taking. Being able to write and debug my models, and scale to many GPUs, within a single VSCode environment is very convenient and powerful.
There are multiple components on the server side, including the Manager with an API Gateway that routes API requests, Agents that execute requests and run containers, and Kernels that run commands and code in various programming languages and configurations.
Does this sound familiar? When I first heard about this architecture, my first thought was: what’s the difference between Backend.AI and a Docker + NGC + Kubernetes stack? So I asked. Here are the key differentiators the Lablup folks shared with me:
No Kubernetes required. Compared to Kubernetes-based solutions, which are still difficult for many companies to operate, Backend.AI simplifies access to the cluster.
Optimized for running HPC and AI workloads, with better container-level isolation and virtualization of multi-tenant workloads on high-end, high-density nodes, which makes it easier to maximize GPU utilization. Kubernetes, in contrast, is more focused on running microservices on a herd of small nodes and maintaining desired deployment states.
Better performance, thanks to an HPC/AI-optimized design with features such as topology-aware resource assignment.
Comes with fractional GPU scaling to extract the last drop of GPU performance by allocating dedicated fractions of GPUs to individual containers, without modifying existing CUDA programs and libraries (see the sketch after this list).
Native integration with storage filesystems, which enables higher I/O and data management performance using filesystem / storage-specific acceleration features.
It even supports using Kubernetes clusters as backend computing nodes (currently in beta).
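To make the fractional GPU point concrete, here is a minimal sketch of how whole-GPU and half-GPU session requests might differ. The resource-slot names cuda.device and cuda.shares are my assumptions from memory, not verified against current Backend.AI documentation.

```python
# Illustrative only: the "cuda.device" and "cuda.shares" slot names are
# assumptions based on my recollection of Backend.AI's resource-slot scheme.
whole_gpu_session = {"cpu": 4, "mem": "16g", "cuda.device": 1}    # one full GPU
half_gpu_session  = {"cpu": 2, "mem": "8g",  "cuda.shares": 0.5}  # half a GPU

# Either mapping would be passed as the `resources` argument when creating a
# session (see the earlier sketch). Existing CUDA programs run unmodified and
# simply see a device with a capped share of compute and memory.
```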
Storage Integration
An AI system is data plus code (models). Backend.AI supports seamless integration with various network attached storage (NAS) products. Backend.AI includes a storage proxy component that provisions a volume from the NAS and presents it as a virtual folder (vfolder) to a session. While NAS is typically slower than the direct-attached SSDs in the GPU server, a vfolder is persistent across sessions, so we don’t lose data when shutting down a session.
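Here is a rough sketch of how a vfolder might be created, populated, and mounted into a session from the Python client SDK. The VFolder class, its method names, the `mounts` argument, and the paths are my assumptions for illustration, not verified API details.

```python
# Illustrative sketch only: VFolder class/method names, the `mounts` argument,
# the image tag, and the paths are assumptions, not verified against the SDK.
from ai.backend.client.session import Session  # assumed module path

with Session() as api:
    # Create a persistent virtual folder backed by the NAS (e.g. FlashBlade NFS).
    api.VFolder.create("imagenet-data")

    # Upload a dataset archive so every future session can see it.
    api.VFolder("imagenet-data").upload(["train.tar"])

    # Mount the vfolder into a new compute session; the data survives after the
    # session is destroyed because it lives on shared storage, not on the node.
    sess = api.ComputeSession.get_or_create(
        "python-tensorflow:latest",                      # hypothetical image tag
        resources={"cpu": 4, "mem": "16g", "cuda.device": 1},
        mounts=["imagenet-data"],
    )
```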
Pure Storage FlashBlade NFS is the recommended NAS for on-premises Backend.AI customers. With the FlashBlade NFS integration, customers can leverage the all-flash storage’s fast performance and its Rapidfile Toolkit to supercharge their AI workloads. In addition, FlashBlade I/O stats, along with CPU/GPU utilization and other metrics, are available in the Backend.AI UI. This makes it easy to understand system status at a single glance.
There are many NAS options, so why FlashBlade? To data scientists, storage might be the least exciting thing in the AI/ML stack; however, it is a critical part. It is critical to have a simple, reliable and fast NAS in the stack.
FlashBlade is easy to use and manage.
FlashBlade is designed with high resiliency and protection.
FlashBlade delivers high throughput with consistent low latency and linear scalability.
I don’t like to compare the GB/s different NAS products deliver, because oftentimes it is just a marketing message. For AI/ML workloads, the NAS needs to be at least as fast as the GPUs can process the data; otherwise the expensive GPUs will be doing nothing but waiting for data to come in. 80 GB/s does not make sense if your GPUs can only process 4 GB per second in the training job, but you do need that 4 GB/s from the NAS to saturate the GPUs.
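A quick back-of-envelope calculation illustrates the point. The numbers below (samples per second per GPU, sample size, GPU count) are made-up illustrative assumptions, not benchmark results.

```python
# Back-of-envelope estimate of the NAS throughput a training job needs.
# All numbers are illustrative assumptions, not measurements.
samples_per_sec_per_gpu = 1_000   # how fast one GPU consumes training samples
avg_sample_size_mb      = 0.5     # average size of one (compressed) sample
num_gpus                = 8

required_gb_per_sec = samples_per_sec_per_gpu * avg_sample_size_mb * num_gpus / 1_000
print(f"required NAS read throughput: ~{required_gb_per_sec:.1f} GB/s")
# ~4.0 GB/s here: an 80 GB/s array adds nothing for this particular job,
# but anything below ~4 GB/s leaves the GPUs waiting for data.
```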
Once the NAS can deliver the minimum required throughput, it is the simplicity, resilience and functionality that matter. One FlashBlade feature the Lablup folks talked about in particular is the Rapidfile Toolkit. These days it is not rare for an ML dataset to contain millions of files. The first time you decompress such a dataset, what are the first things you do with it? Change permissions or collect file statistics, most likely. So you run some Linux commands like chmod or find. But wait! It may take tens of minutes or even hours to finish. That is not because the NAS is slow, but because of the way these Linux commands work: they operate on files one by one, in sequence. It simply takes that much time to process millions of files this way.
Rapidfile Toolkit comes to the rescue. Rapidfile Toolkit is a software package provided by Pure Storage to accelerate common Linux file operations by up to 30x. Once installed, it provides a set of “p-commands” that mirror their Linux counterparts; for example, pfind and pls for find and ls, respectively. The p-commands improve on the traditional Linux commands by operating on files in parallel: instead of changing permissions one file at a time, they send many requests to the NAS from multiple threads, changing the permissions of many files at once.
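As a rough illustration, here is a small Python timing harness comparing find with its pfind counterpart over a dataset directory. It assumes the Rapidfile Toolkit is installed and that pfind accepts the same basic arguments as find, consistent with how the p-commands are described above but not verified here; the dataset path is a placeholder.

```python
# Rough timing comparison of `find` vs. its parallel `pfind` counterpart.
# Assumes the Rapidfile Toolkit is installed and that pfind takes the same
# basic arguments as find; the dataset path below is a placeholder.
import subprocess
import time

DATASET_DIR = "/mnt/vfolder/imagenet"   # placeholder NFS-mounted dataset path

def time_command(cmd):
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    n_files = len(result.stdout.splitlines())
    print(f"{cmd[0]:>5}: listed {n_files} files in {elapsed:.1f}s")

# Sequential: walks the file tree one entry at a time.
time_command(["find", DATASET_DIR, "-type", "f"])

# Parallel: issues many NFS requests concurrently against the same tree.
time_command(["pfind", DATASET_DIR, "-type", "f"])
```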
In a test against 1.2 million files stored on FlashBlade NFS, we found that the Rapidfile Toolkit is around 30x faster than the corresponding Linux commands for the most common operations.
I heard from the Lablup team that the Rapidfile Toolkit has been one of the favorite features of some joint Backend.AI and Pure Storage customers.
Summary
AI has been drawing the attention of mass audiences. Expectations are high. Yet companies and governments are still struggling to build their AI infrastructure. Managing and sharing GPU resources and data is one of the first challenges we need to address. Lablup Inc.’s Backend.AI enables consuming GPUs as a service. Pure Storage’s FlashBlade delivers data as a service with simplicity and speed. Together, they make AI accessible.