Parallel processing of large amounts of data over a POSIX distributed file system.

Campus Campina Grande | Centro de Engenharia Elétrica e Informática - CEEI | Graduate Program in Computer Science | Master's in Computer Science

Author: SILVA, Jonhnny Weslley Sousa (Lattes CV: http://lattes.cnpq.br/9160350154400626)

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/4741

Date: 2010-05-21

Abstract:

Data-intensive applications are increasingly common in many sectors, from academia to shopping websites and social networks. However, most existing solutions assume that these applications run on clusters, and clusters are an expensive resource. Meanwhile, workstations leave much of their local storage space unused. To exploit the free space of these workstations, we built the Beehive File System (BeeFS), a distributed file system designed to meet scalability and maintainability requirements not met by distributed file systems widely used in practice, such as NFS and Coda. Because BeeFS naturally distributes data across workstations, it can also be used to process vast amounts of data in a distributed way. However, since BeeFS is built on shared workstations, running data-intensive applications that the workstations' users did not launch may degrade the performance experienced by users logged into those workstations. To mitigate this problem, this work presents data placement heuristics for file allocation in BeeFS. These heuristics aim to increase the probability that files will be available for processing on idle workstations. To this end, they use historical data on system usage to decide where to store the file replicas that will be used for processing. Coupled with a simple application scheduler that prevents applications from running on non-idle machines, these heuristics drastically reduce the inconvenience that such applications can cause to other users. The results show that heuristics that consider the historical availability of workstations while also balancing the amount of storage used across machines perform better than heuristics that do not consider machine availability.
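The abstract does not give the heuristics' exact formulas, so the following is only a minimal Python sketch, under stated assumptions, of the general idea it describes: rank candidate workstations by historical availability (here, the fraction of past samples in which a machine was idle), optionally blended with normalized free storage, place replicas on the top-ranked machines, and dispatch a task only to a replica holder that is idle right now. The `Workstation` class, the `idle_history` samples, the weighting parameter `alpha`, and the functions `place_replicas` and `pick_idle_replica` are hypothetical illustrations, not the BeeFS API or the specific heuristics evaluated in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Workstation:
    """Hypothetical model of a shared workstation that stores file replicas."""
    name: str
    idle_history: List[bool]  # assumption: one sample per period, True if no user was active
    free_bytes: int

    def idle_fraction(self) -> float:
        """Historical availability: fraction of past samples in which the machine was idle."""
        if not self.idle_history:
            return 0.0
        return sum(self.idle_history) / len(self.idle_history)


def place_replicas(machines: List[Workstation], size: int, k: int,
                   alpha: float = 0.5) -> List[Workstation]:
    """Choose k distinct machines for the replicas of a file of `size` bytes.

    Hypothetical score = alpha * idle fraction + (1 - alpha) * normalized free space.
    alpha = 1.0 ranks by availability only; alpha = 0.0 balances storage only.
    """
    candidates = [m for m in machines if m.free_bytes >= size]
    if len(candidates) < k:
        raise ValueError("not enough machines with free space for k replicas")
    max_free = max(m.free_bytes for m in candidates) or 1  # avoid dividing by zero

    def score(m: Workstation) -> float:
        return alpha * m.idle_fraction() + (1 - alpha) * m.free_bytes / max_free

    chosen = sorted(candidates, key=score, reverse=True)[:k]
    for m in chosen:
        m.free_bytes -= size  # reserve space so later placements spread out
    return chosen


def pick_idle_replica(replicas: List[Workstation],
                      is_idle_now: Callable[[Workstation], bool]) -> Optional[Workstation]:
    """Idle-only scheduling: return a replica holder that is idle right now.

    Returns None when every holder is in use, meaning the task should be deferred.
    """
    for m in replicas:
        if is_idle_now(m):
            return m
    return None


def example_fleet() -> List[Workstation]:
    """Made-up fleet used only to demonstrate the functions above."""
    return [
        Workstation("A", [True, True, True, False], 100),
        Workstation("B", [True, False, False, False], 400),
        Workstation("C", [True, True, True, True], 50),
        Workstation("D", [False, False, False, False], 500),
    ]


if __name__ == "__main__":
    blended = place_replicas(example_fleet(), size=40, k=2, alpha=0.5)
    print([m.name for m in blended])            # ['C', 'B']
    availability_only = place_replicas(example_fleet(), size=40, k=2, alpha=1.0)
    print([m.name for m in availability_only])  # ['C', 'A']
    target = pick_idle_replica(blended, is_idle_now=lambda m: m.name == "B")
    print(target.name if target else "deferred")  # B
```

With `alpha=1.0` the sketch ranks machines purely by past availability, while intermediate values also favor machines with more free space; decrementing `free_bytes` after each placement is what keeps later replicas from piling onto the same few machines.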
