Distributed Systems for Machine Learning in Cloud Computing: A Review of Scalable and Efficient Training and Inference

Authors

  • Shereen Sadiq, Duhok Polytechnic University
  • Subhi R. M. Zeebaree, Duhok Polytechnic University

DOI:

https://doi.org/10.33022/ijcs.v13i2.3814

Keywords:

Distributed Systems, Machine Learning in Cloud Computing, Scalable Training, Efficient Training, Inference

Abstract

The exponential growth of data and the rising complexity of machine learning (ML) models have pushed traditional computer systems to their limits. With its on-demand scalability and resource agility, cloud computing has emerged as the platform of choice for training and deploying large-scale ML models. Making good use of cloud resources for ML, however, requires distributed systems, which coordinate computation across several nodes to handle demanding workloads. This paper investigates distributed systems for ML in cloud computing, with a particular emphasis on scalable and efficient training and inference. It first discusses the need for distributed systems in ML, explaining why conventional single-machine techniques fall short of the requirements of current ML and how distributed systems can help overcome these difficulties. It then reviews scalability and efficiency considerations, covering the primary elements that contribute to the effectiveness of a distributed system for ML: task partitioning, communication overhead, fault tolerance, and resource optimization. The review aims to provide a comprehensive overview of distributed systems for ML in the context of cloud computing and to offer insights for navigating the intricate and ever-changing landscape of cloud-based ML.

Published

01-04-2024