ASSIS, Leonardo de.
Abstract:
Grid technology was created to facilitate resource sharing among individuals belonging to different administrative domains. In recent years, grid computing has become increasingly common because of the large computational power it can provide at low cost. As a result, parallel applications that process large amounts of data (data-intensive applications) are increasingly executed on this type of platform. A parallel application can be viewed as a collection of tasks that can be executed in parallel. For some of these applications, the tasks are independent and can be scheduled to run in any order. Applications of this type are referred to in the literature as Bag-of-Tasks (BoT) applications. To schedule tasks onto resources efficiently, grid application schedulers use scheduling heuristics. These heuristics can be classified into two approaches: i) bin-packing heuristics and ii) replication-based heuristics. The first approach requires complete and accurate information about the execution environment and the application. The second approach uses no such information; instead, it applies task replication to achieve good performance. Both approaches have disadvantages: complete and accurate information about the execution environment and the application is not always available in a grid computing environment, while the redundancy introduced by replication heuristics wastes resources. A recent work observed that, although accurate information is difficult to obtain in a grid environment, obtaining it is not impossible. In practice, the information can be obtained by using services that collect information
about the environment and the application and publish it on grid information services.
That same study showed that the execution cost of CPU-intensive applications can be
reduced, while maintaining the same efficiency, by using whatever information is available.
Building on the premise of that work, this dissertation presents a scheduling heuristic for
data-intensive BoT applications that adapts to the availability of information, called Adaptive Data-Intensive. The results obtained with the Adaptive Data-Intensive heuristic indicate that the rational use of available information reduces both application execution time and resource waste.