Courrier des statistiques No. 9 - 2023
Which formats for which data?
To be intelligible to its users and to fulfil its function of conveying information, data must be structured. This structure is formalised in a data model, which in turn determines the choice of format. Data formats are varied, and each addresses specific problems that depend on the context of use. Ideally, the standards behind the data models and their associated formats are shared by as many users as possible. For statistical data, these problems concern the objects the data are meant to describe, but also the documentation of the data (i.e. metadata) and the wish to pair specific software solutions with the data they are particularly well suited to processing.
On this last point, the emergence of open-source solutions over the last few decades has made it possible to address several issues at once: securing data import, efficiency of data processing, replicability, etc. In particular, formats such as Parquet are integrated into software solutions accessible to all and have been adopted by a growing community convinced of their advantages. Nevertheless, there is no clear-cut answer to the choice of a format: choices are made after a precise analysis of the needs at each step of the data’s life cycle. The choice of a format is thus the concrete expression of a requirement driven by standards specific to each of these phases.
Published on: 29/10/2024