Proactive Fault Tolerance in Distributed Cloud Systems: A Review of Predictive and Preventive Techniques

Authors

  • Dathar Hasan, Duhok Polytechnic University
  • Subhi R. M. Zeebaree, Energy Engineering Department, Technical College of Engineering, Duhok Polytechnic University, Duhok, Iraq. https://orcid.org/0000-0002-3895-2619

DOI:

https://doi.org/10.33022/ijcs.v13i2.3808

Keywords:

Cloud Computing, Fault Tolerance, Proactive Techniques, Predictive Techniques, Cloud Availability.

Abstract

In a cloud computing environment, hardware and software services are provided to users across multiple servers and data centers. These servers communicate with each other to allow greater scalability, flexibility, and reliability. Reliability is a vital factor in cloud computing: it ensures that services are delivered to users whenever they request them. However, hardware or software faults may occur in cloud servers or data centers and prevent users from receiving a service. Fault tolerance is the ability of a system to keep providing services to users in the presence of faults or failures. In this review, we focus on some of the emerging fault tolerance techniques that researchers have proposed to tackle faults in cloud computing. We divide these techniques into two main categories: proactive and reactive. Proactive techniques protect the system from defects by applying procedures that prevent it from reaching a defective condition. Reactive techniques refer to the ability of the cloud system to recover a defective server or framework so that it continues working and providing the service.

Published

01-04-2024