· About a 10-minute read
Yifeng Jiang

PureStorage is a global storage company that provides all-flash storage solutions for AI and machine learning. As a Technology Alliance Partner (TAP) of PureStorage, Lablup provides dedicated PureStorage FlashBlade storage plug-ins in Backend.AI, helping AI developers and data scientists focus on AI development in an optimal environment. Mr. Yifeng Jiang is a Principal Solutions Architect in Data Science at PureStorage. An expert with years of experience in big data and machine learning, he has written the following article on how to optimally operate GPU servers and storage within an AI infrastructure.

Address the challenges of managing and sharing GPUs and data in AI infrastructure with a couple of clicks.

Let’s say that the company bought you and your fellow data scientists several powerful GPU servers. You feel excited because you don’t have to wait for your training jobs to complete on your laptop anymore. You want to use the GPUs immediately, but it might not be that simple. Here are some questions your IT team may ask before you are able to access the GPUs:

  • How many CPU and GPU resources do you need?

  • What’s the estimated start and finish time?

  • Do you need shared storage and how big should it be?

  • How big are your datasets and how are you going to upload them?

And this may happen every time anyone on the team wants to use the GPUs.

Making AI accessible has never been easy. AI covers a broad range of technologies and goes far beyond writing some Python code to train machine learning models. Building and operating an end-to-end AI infrastructure and system is not easy, even for big enterprises. As described in this paper from Google, only a small fraction of a real-world ML system is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.