BRITO, Andrey; http://lattes.cnpq.br/2634324830901340; BRITO, Andrey Elísio Monteiro.
Abstract:
Most infrastructures for deploying distributed applications have no known upper bounds on communication and process scheduling delays; that is, they are asynchronous systems. Asynchronous systems are considered the best environment for developing applications because the lack of strong timing assumptions about the system makes applications portable and scalable. Unfortunately, most of the basic problems of distributed systems cannot be solved in such systems when they are subject to failures. Synchronous systems, on the other hand, allow trivial solutions to these basic problems, a consequence of the existence of known upper bounds on communication and scheduling delays. However, most practical systems are not synchronous, which motivates the design of models that are intermediate between the two, called partially synchronous models. One of the most popular partially synchronous models is the asynchronous model equipped with an unreliable failure detector: an asynchronous system in which each process has access to a module that provides information about which processes have failed.
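As an illustration of such a module, the following is a minimal sketch of a timeout-based failure detector, assuming that processes exchange periodic heartbeat messages. The class name, method names, and timeout value are hypothetical and are not taken from this work. It shows why the detector is unreliable in an asynchronous system: a heartbeat that is merely delayed beyond the timeout causes a correct process to be suspected. Only when delays have a known upper bound can the timeout be chosen so that this never happens.

```python
class HeartbeatFailureDetector:
    """Hypothetical timeout-based failure detector local to one process."""

    def __init__(self, monitored, timeout):
        # Assumption: a process is suspected if no heartbeat arrives
        # from it for more than `timeout` time units.
        self.timeout = timeout
        # Time of the last heartbeat received from each monitored process.
        self.last_heartbeat = {p: 0.0 for p in monitored}

    def on_heartbeat(self, process, now):
        """Record a heartbeat received from `process` at local time `now`."""
        self.last_heartbeat[process] = now

    def suspects(self, now):
        """Return the set of processes currently suspected of having failed."""
        return {
            p for p, t in self.last_heartbeat.items()
            if now - t > self.timeout
        }


fd = HeartbeatFailureDetector(["p1", "p2"], timeout=3.0)
fd.on_heartbeat("p1", now=1.0)
fd.on_heartbeat("p2", now=1.0)
fd.on_heartbeat("p1", now=3.0)

# p2 has not failed; its next heartbeat is only delayed.
# The detector nevertheless suspects it at time 5.0 (a mistake).
wrongly_suspected = fd.suspects(now=5.0)

# The delayed heartbeat arrives and the suspicion is withdrawn.
fd.on_heartbeat("p2", now=5.5)
after_recovery = fd.suspects(now=5.9)
```

In this sketch time is passed in explicitly rather than read from a clock, so the behavior is deterministic. In a synchronous system, a timeout larger than the heartbeat period plus the known delay bounds would rule out the false suspicion shown above.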
Among the proposed failure detectors, the perfect failure detector is the strongest and the only one that makes no mistakes (for example, suspecting processes that have not failed). Unfortunately, perfect failure detectors can only be implemented in synchronous systems. However, because some applications require perfect failure detectors, interest in them has recently risen. A recent approach taken by the designers of such systems is to implement synchronous subsystems of very limited capacity underneath the partially synchronous system where the applications execute. This work details the implementation of a synchronous subsystem in an environment composed of standard PCs connected through a local area network and running an off-the-shelf operating system. Finally, although the implementation of perfect failure detectors alone justifies the development of such a synchronous subsystem, we believe that abstractions stronger than perfect failure detectors can be developed. We introduce one of these abstractions, the Global State Digest Provider, which allows faster solutions to the consensus problem, in which processes try to agree on a proposed value.