SANTOS NETO, E. L.; http://lattes.cnpq.br/2149883544244800; SANTOS NETO, Elizeu Lourenço dos.
Abstract:
Data-intensive applications executing over a computational grid demand large data transfers. These are costly operations. Therefore, taking them into account is mandatory to achieve efficient scheduling of data-intensive applications on grids. Further, within a heterogeneous
environment such as a grid, good schedules are typically attained by heuristics that use dynamic information about the grid and the applications (network and CPU loads, completion time of tasks, etc.). However, this information is often difficult to obtain accurately.
Although there are schedulers that attain good performance without requiring that kind of information, they were not designed to take data transfer delays into account. This work presents Storage Affinity, a novel scheduling heuristic for Bag-of-Tasks and data-intensive
applications running on grid environments. Storage Affinity exploits a data reuse pattern, common in many data-intensive applications, allowing it to take data transfer delays into account and reduce the makespan of the application. Further, it uses a replication strategy
that yields efficient schedules without relying upon dynamic information that is difficult to obtain. Our results show that Storage Affinity may attain performance that is on average better than that of state-of-the-art knowledge-dependent schedulers, even in the unlikely case
when the latter are fed with perfect information. This is achieved at the expense of consuming more CPU cycles (on average, more than not using replication).