SOUZA, J. A. C.; http://lattes.cnpq.br/0373998516467963; SOUZA, Joeberth Augusto Cordeiro de.
Abstract:
DadosJusBr is a non-profit project that aims to present, in a detailed and consolidated form, the
remuneration data of the agencies that make up the Brazilian justice system: the Public Prosecutor's
Offices, the Public Defender's Offices, the Attorneys' Offices and the Judiciary with its courts and
councils, which together add up to 156 agencies. This process is called ‘Libertação dos dados’ (data
liberation) and has four stages: Collection, Validation, Packaging and Storage. The growth of the
project is tied to the collection stage, since a separate collector has to be coded for each agency.
DadosJusBr is an open-source project, so the community can take part by writing collectors in several
programming languages, such as Go and Python. Using more than one language, including dynamically
typed ones in which it is harder to enforce a typed schema, raises several problems in keeping the
data schema consistent. The main one is consistency in the serialization of the collected data, which
is essential for storage and for transmission between stages, because each language serializes data
differently by default. In this work we proposed and implemented the use of Protocol Buffers (PB) to
make the data consolidated by DadosJusBr easier to maintain, transmit and store.
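To make the schema problem concrete, here is a minimal sketch, assuming Python 3.9+, of a runtime check that a dynamically typed collector could run on each record before serializing it. It is only a rough stand-in for what a compiled Protocol Buffers message enforces automatically; the field names in `SCHEMA` and the helpers `check_record` and `serialize` are hypothetical and are not DadosJusBr code, and JSON is used here only to show that validation happens before any serialization.

```python
import json

# Hypothetical record schema: field name -> expected Python type.
# These names are illustrative; they are not the DadosJusBr message definitions.
SCHEMA = {
    "agency_id": str,
    "year": int,
    "month": int,
    "name": str,
    "gross_income": float,
}


def _matches(value: object, expected: type) -> bool:
    """Type test for one field, handling two Python-specific pitfalls."""
    if isinstance(value, bool):
        # bool is a subclass of int, so True would otherwise pass as an int.
        return expected is bool
    if expected is float:
        # Choice of this sketch: accept ints for float fields, since a crawler
        # may parse "1000" as an int.
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def check_record(record: dict) -> dict:
    """Raise if a record has missing/extra fields or wrongly typed values."""
    missing = SCHEMA.keys() - record.keys()
    extra = record.keys() - SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in SCHEMA.items():
        value = record[field]
        if not _matches(value, expected):
            raise TypeError(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return record


def serialize(record: dict) -> str:
    """Validate first, then emit a deterministic (key-sorted) JSON string."""
    return json.dumps(check_record(record), sort_keys=True)
```

A value such as `"1.234,56"` scraped into `gross_income` would be rejected with a `TypeError` instead of silently reaching storage in a different shape from what a Go collector would produce.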
We currently collect data from 52 agencies; the collectors used in this work were those of the MPPB,
written in Go, and of the website of the National Council of Justice (CNJ), written in Python. Adapting
the crawlers and parsers, and changing every field of these collectors to the new data transmission
format, led to unexpected difficulties, such as handling timestamps between the two languages and
transmitting the PB data in text format, which in the end allowed the data to be serialized at every
stage. This consolidates the serialization and transmission of data between collectors written in
different languages, making DadosJusBr more democratic and comprehensive and making it easier to
contribute.
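The abstract does not detail the timestamp difficulty. One plausible source, offered here only as an assumption, is that Go's `time.Time` carries nanoseconds and a location, while Python's `datetime` carries microseconds and may be naive (no timezone). A minimal sketch using only the Python standard library maps an aware `datetime` to the `(seconds, nanos)` pair that `google.protobuf.Timestamp` uses (seconds since the Unix epoch in UTC, with `nanos` in [0, 999999999]) and back; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC, the reference point of google.protobuf.Timestamp.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_seconds_nanos(dt: datetime) -> tuple[int, int]:
    """Split an aware datetime into (seconds, nanos) since the Unix epoch.

    Naive datetimes are rejected (an assumption of this sketch): Python reads
    a naive value as local time, which another collector may interpret differently.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before serializing")
    delta = dt - EPOCH
    # timedelta keeps seconds in [0, 86399] and microseconds in [0, 999999],
    # so nanos stays non-negative even for instants before 1970.
    seconds = delta.days * 86400 + delta.seconds
    nanos = delta.microseconds * 1000
    return seconds, nanos


def from_seconds_nanos(seconds: int, nanos: int) -> datetime:
    """Rebuild a UTC datetime; precision below one microsecond is truncated."""
    if not 0 <= nanos <= 999_999_999:
        raise ValueError("nanos must be in [0, 999999999]")
    return EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)
```

For instance, `to_seconds_nanos(datetime(2021, 1, 1, tzinfo=timezone(timedelta(hours=-3))))` returns `(1609470000, 0)`, the same instant as 03:00 UTC. Rejecting naive values, rather than silently assuming UTC or local time, is a design choice of this sketch; in practice the official protobuf runtimes ship conversion helpers (`Timestamp.FromDatetime` in Python, `timestamppb.New` in Go) that would normally be preferred over hand-written code.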