· 14 min read
권강민

Backend.AI uses GraphQL. This post covers three main topics: first, a brief introduction to GraphQL and how we use it; second, the technical problems we ran into while using GraphQL and how we solved them; and finally, an explanation of pagination.

· 10 min read
Yifeng Jiang

PureStorage is a global storage company that provides all-flash storage solutions for AI and machine learning. As a Technology Alliance Partner (TAP) of PureStorage, Lablup helps AI developers and data scientists focus on AI development in an optimal environment by providing a dedicated PureStorage FlashBlade storage plug-in in Backend.AI. Mr. Yifeng Jiang is a Principal Solutions Architect in Data Science at PureStorage. As an expert with years of experience in big data and machine learning, he has written the following article on how to optimally operate GPU servers and storage within an AI infrastructure.

Address challenges of managing and sharing GPU and data in AI infrastructure with a couple of clicks.

Let’s say that the company bought you and your fellow data scientists several powerful GPU servers. You feel excited because you don’t have to wait for your training jobs to complete on your laptop anymore. You want to use the GPUs immediately, but it might not be that simple. Here are some questions your IT team may ask before you are able to access the GPUs:

  • How many CPU and GPU resources do you need?

  • What’s the estimated start and finish time?

  • Do you need shared storage and how big should it be?

  • How big are your datasets and how are you going to upload them?

And this may happen every time anyone in the team wants to use the GPUs.

Making AI accessible has never been easy. AI spans a broad range of technologies and goes far beyond writing some Python code to train machine learning models. Building and operating an end-to-end AI infrastructure and system is hard, even for big enterprises. As described in this paper from Google, only a small fraction of a real-world ML system is composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.