Game Theoretical Data Replication Techniques For Large-scale Autonomous Distributed Computing Systems

Game Theoretical Data Replication Techniques For Large-scale Autonomous Distributed Computing Systems

Date

2007-10-08T23:55:05Z

Publisher

Computer Science & Engineering

Abstract

Data replication in geographically dispersed servers is an essential technique for reducing the user perceived access time in large-scale distributed computing systems. A majority of the conventional replica placement techniques lack scalability and solution quality. To counteract such issues, this thesis proposes a game theoretical replica placement framework, in which autonomous agents compete for the allocation or reallocation of replicas onto their representative servers in a self-managed fashion. Naturally, each agent's goal is to maximize its own benefit. However, the framework is designed to suppress individualism and to ensure system-wide optimization. Using this framework as an environment, several cooperative and non-cooperative low-complexity, flexible, and scalable game theoretical replica placement techniques are proposed, analytically investigated, and experimentally evaluated. Each of these techniques supports different game theoretical (pareto-optimality, catering to agents' interests, deliberate discrimination of allocation, budget balanced, pure Nash equilibrium, and Nash equilibrium) and system (link distance, congestion control, minimization of communication cost, and memory optimization) related properties. Using a detailed test-bed involving eighty various network topologies and two real-world access logs, each game theoretical technique is also extensively compared with conventional replica placement techniques, such as, greedy heuristics, branch-and-bound techniques and genetic algorithms. The experimental study confirms that in each case the proposed techniques outperform other conventional methods. The results can be summarized in four ways: 1) The number of replicas in a system self-adjusts to reflect the ratio of the number of reads versus writes access; 2) Performance is improved by replicating objects to the servers based on the locality of reference; 3) Replica allocations are made in a fast algorithmic turn-around time; 4) The complexity of the data replication problem is decreased by multifold.