Courrier des statistiques N3 - 2019
Eno, a Collection Instrument Generator
Eno is the story of an innovation that has become an important component of INSEE’s data collection process. Starting from an innovative concept, the activation of metadata to industrialize a statistical process, Eno has provided INSEE with a concrete application: the generation of collection instruments based on metadata. The realization of a prototype has made it possible to validate the concept, as well as the choice of the standards and the technical field. The objective of genericity has led to the implementation of a flexible and scalable architecture, which is now used to produce almost all business survey questionnaires, in both web and paper formats. By increasing the need for formalisation and standardisation, the tool has also contributed to the harmonisation of questionnaires and practices. Its transversal nature also required the adoption of a new form of governance for INSEE, in order to facilitate its extension to the social domain in particular. This extension will require the development of new functionalities and technological evolution, to enable mixed-mode data collection for households. The development methods used open up Eno to use by other statistical agencies.
- Eno, an Idea
- A Prototype to Validate the Principle of Generation
- Formally Describing the Semantic Complexity of the Questionnaires
- Box 1. DDI vs SDMX: Two Standards, Two Approaches
- Induced Harmonisation of Collection Media
- An Architecture Designed for a Modular, Evolving and Scalable Tool
- Box 2. The Architecture of the Eno Engine Facilitates the Generation of Collection Instruments in Multiple Formats
- A Tool Used for Business Surveys...
- ...Within a Metadata-Based Ecosystem
- Functional Developments Initiated for Households and Beyond
- Cross-Functional Governance to Facilitate Reuse
- Everything That Happens Will Happen Today (and Further...)
The metadata related to a survey are not limited to information that facilitates readability, but also include “active” metadata that allow the standardisation and industrialisation of statistical processes (Bonnans, 2019). Data collection constitutes an area conducive to the implementation of this approach: it is one of the most important expenditure items for a statistical institute and the processes, which are often repeated, are generally well identified and formalised. Since 2012, INSEE has completely overhauled its system for collecting data from business surveys, with the implementation of the Coltrane project (see the article on the birth of an innovation in statistical production by Jean-Marc Béguin). This context provided an opportunity for a first use of active metadata for the generation of collection instruments, using the Eno tool.
Eno, an Idea
To describe the functions of Eno, a shortcut is taken by talking about “automatically generating questionnaires”. This idea is attractive in itself: it makes it possible to reduce development costs, leads de facto to a standardisation of practices and related quality gains, and has the power and capacity to support a higher rate and frequency of surveys.
However, behind Eno, there is a more general idea. Knowing how to capture metadata, whether it concerns questionnaires, the management or quality of a process, or dissemination products, potentially offers much more than gains in quality and control of the statistical process: one can also imagine generating, one day, everything needed for that process, including the tools.
To begin to make this concept a reality, it was necessary to demonstrate its viability on a “simple” case: questionnaire specification. The founding principle of Eno is to automate the production of “collection instruments” (paper or web forms, collection modules for surveyors, etc.) based on a formal and abstract description of the questionnaire as standardised metadata. This idea can be found in Eno’s first specification, which took the form of the diagram in Figure 1, drawn on a white board.
Therefore, the overall vision was to automatically produce the media associated with the different collection modes based on a formal and unique description of the questionnaire model. Eno, indicated by the green arrow in Figure 1, would be at the heart of this vision. From the outset, it was also clear that effective implementation required strong and innovative technical choices, in particular a document transformation architecture that did not really exist at INSEE at that time.
Figure 1. Eno, an Idea: a Single Model, Multiple Collection Media
Diagram taken from (Cotton et alii, 2013).
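The principle shown in Figure 1 can be illustrated with a minimal sketch. It is not Eno's implementation (Eno works on DDI documents with XSLT); the `Question` and `Questionnaire` classes, their field names and the two rendering functions are hypothetical and serve only to show a single questionnaire description feeding several collection media.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Question:
    identifier: str        # hypothetical field names, for illustration only
    label: str
    instruction: str = ""


@dataclass(frozen=True)
class Questionnaire:
    title: str
    questions: tuple


def render_web(model: Questionnaire) -> str:
    """Render the model as a minimal HTML form."""
    lines = [f"<form><h1>{escape(model.title)}</h1>"]
    for q in model.questions:
        ident = escape(q.identifier)
        lines.append(f'<label for="{ident}">{escape(q.label)}</label>')
        if q.instruction:
            lines.append(f"<p>{escape(q.instruction)}</p>")
        lines.append(f'<input id="{ident}" name="{ident}"/>')
    lines.append("</form>")
    return "\n".join(lines)


def render_paper(model: Questionnaire) -> str:
    """Render the same model as plain text for a printed form."""
    lines = [model.title.upper(), ""]
    for number, q in enumerate(model.questions, start=1):
        lines.append(f"{number}. {q.label}")
        if q.instruction:
            lines.append(f"   ({q.instruction})")
        lines.append("   " + "_" * 30)   # blank line for the handwritten answer
    return "\n".join(lines)


# One description of the questionnaire, written once (invented example).
MODEL = Questionnaire(
    title="Annual survey",
    questions=(
        Question("TURNOVER", "What was your turnover?", "In thousands of euros"),
        Question("STAFF", "How many employees do you have?"),
    ),
)

if __name__ == "__main__":
    print(render_web(MODEL))
    print()
    print(render_paper(MODEL))
```

In this sketch, a change to a question label is made once in `MODEL` and is picked up by both media, which is the property the whiteboard diagram was aiming for.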
A Prototype to Validate the Principle of Generation
The Coltrane project, a platform for online collection from businesses (see the article on Coltrane by Olivier Haag and Anne Husseini-Skalitz), provided favourable circumstances, the human and technical resources and a relatively urgent specific case for the development of a prototype generator: the digitisation of the annual structural business survey (ESA or Enquête sectorielle annuelle). At that time, it was based on 200 questionnaire models which, until then, had been managed by an external service provider.
The decision to choose this survey was doubly interesting: on the one hand, its complexity was able to prove the validity of the principle, beyond its particular case; and on the other hand, the regained control of the questionnaire production process was likely to offer financial and quality gains.
More broadly, in the professional context of the time, the production of business survey questionnaires was done “on demand”, without capitalising on developments from one survey to the next and with little documentation. It was a long (around six months), costly (two to three developers involved over the period) and risky (the “artisanal” nature could generate bugs that were difficult to identify) process.
There was therefore a certain interest in using a generator that enabled the industrialisation of questionnaire production: cost and time savings (automation), gains in quality (standardisation, traceability), regained specification control (metadata as “input” of the generator).
After the prototype for the annual structural business survey, an initial version of Eno was developed to generate the web questionnaire for a French Ministry of Labour survey, then gradually enhanced to meet all the needs of the surveys joining the Coltrane platform. These surveys came from various project managers belonging to the Official Statistical Service. The cross-functional nature of the tool was present from its design.
The generator was developed in stages, and the initial simplistic diagram, while remaining the guiding framework, has developed over the years into a rich information system, covering in particular the production of different formats for various collection modes.
Formally Describing the Semantic Complexity of the Questionnaires
Eno was designed from the outset in accordance with the principles that also underpinned the concurrent statistical metadata repository RMéS (Bonnans, 2019): the use of international standards and activation of statistical metadata.
The survey questionnaire, the central business object of the Eno generator, is a complex object (see the article on the Pogues questionnaire design tool, by Franck Cotton and Thomas Dubois). The use of an international standard was investigated in order to model it. In addition to purely technical standards, such as XForms, only two options were genuinely capable of covering the complexity of the questionnaires encountered in public statistics: DDI (Data Documentation Initiative) and SDMX (Statistical Data and Metadata eXchange). Curiously, these two standards responded to the problem completely differently (Box 1).
Box 1. DDI vs SDMX: Two Standards, Two Approaches
DDI, which comes mainly from the world of social science research and libraries, has been developing since 1995 under the main impetus of the University of Michigan. Initially focused on documenting scientific studies and their results, DDI expanded from 2007 to model the entire data lifecycle, from collection to publication and archiving. This version, known as “DDI Lifecycle”, very precisely covers, in particular, the representation of statistical questionnaires, but at the cost of a verbosity that can sometimes make their use difficult.
Indeed, this standard, based on the archivist or researcher profile of its designers and which is intended as a documentation standard, favours exhaustiveness, precision and the possibilities of reuse and sharing of the objects described, over practicality of use and simplicity of representation.
SDMX, in turn, is a standard with a more statistical and international origin, which can be traced back to a meeting organised by the International Monetary Fund in Washington in September 2001. SDMX took over from previous work on EDIFACT-GESMES*, of which it was essentially a translation into XML (see Eurostat, Unece, 2002, for further information on the history of SDMX). SDMX benefits from the support of influential sponsors (International Monetary Fund, European Central Bank, World Bank, Bank for International Settlements, Eurostat, etc.); in particular, Eurostat is actively working to enforce its use within the European statistical system. SDMX is best known for the exchange and dissemination of tabular data and the associated structural metadata. It also includes a less well-known model dedicated to reference metadata. This model essentially specifies a hierarchical annotation structure and remains very generic. However, it can, at the cost of dedicated extensions (plug-ins relating to the types, codes, etc.), represent a survey questionnaire, or any other object, if it is adapted differently. In comparison with DDI, this standard, the designers of which had a more operational and technical profile, and which is intended as a standard for the exchange of information, favours the practical and generic nature of use over the precision and complexity of the objects represented (Gregory and Heus, 2007; Heus, Thomas and Vardigan, 2008).
*EDIFACT is a framework developed by Unece and standardised by the ISO for the exchange of digital information. GESMES (Generic statistical message) is an EDIFACT message for the exchange of statistical data and metadata.
Thus, with DDI and SDMX, we had a schematic choice between, on the one hand, a complex and semantically precise standard and, on the other, a simple but semantically weak standard. They are both specified by XML schemas and, therefore, can be used directly with the programming languages of the XML world, in particular the document transformation language XSLT (see the XSLT website). Any questionnaire model expressed in DDI or SDMX is therefore “activable” in the sense that any questionnaire conforming to this model can be entered into a processing chain without modification. In both cases, therefore, we have an example of metadata activation, applied here to the questionnaire model.
Consequently, by extending that logic further, the choice between DDI and SDMX can be informed very naturally. The semantically richer the metadata, the more fruitful the active use thereof, which has clearly led to a preference for DDI.
It should be noted that, at the time, similar exploratory work was being conducted by the Australian Bureau of Statistics (ABS) with the aim of developing a prototype web questionnaire generator (XForms) based on a DDI description (See the Ramona generator, the latest developments of which date back to 2013). In addition, the opportunities for collaboration with the Australian colleagues have encouraged the project team to choose the DDI standard and XForms technology.
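The argument that semantically richer metadata allows more active use can be made concrete with a small sketch. The XML below is a hypothetical, heavily simplified description of one question, not actual DDI (which is far richer and more verbose), and the `activate` function is an illustrative name. The point is only that when the description states the response type and bounds, both the input widget and the validation rule can be derived from it rather than written by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML: element and attribute names are assumptions.
QUESTION_XML = """
<question id="STAFF">
  <label>Number of employees</label>
  <response type="integer" min="0" max="99999"/>
</question>
"""


def activate(xml_text):
    """Derive a widget description and a validator from one question description."""
    question = ET.fromstring(xml_text)
    response = question.find("response")
    low, high = int(response.get("min")), int(response.get("max"))

    widget = {
        "id": question.get("id"),
        "label": question.findtext("label"),
        "input": "number" if response.get("type") == "integer" else "text",
    }

    def validate(raw):
        # The bounds come from the metadata, not from hand-written code.
        try:
            value = int(raw)
        except ValueError:
            return False
        return low <= value <= high

    return widget, validate


if __name__ == "__main__":
    widget, validate = activate(QUESTION_XML)
    print(widget)
    print(validate("12"), validate("abc"))
```

A description that carried only a free-text annotation for the response would support the label but not the derived control, which is one way to read the preference for the more precise standard.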
Induced Harmonisation of Collection Media
The selection of a standard for questionnaire modelling, in this case DDI, was a prerequisite for automation. Standardisation eventually enables cost and time savings. However, it also has an impact on the content and form of the questionnaires designed within this framework.
The use of a formal and semantically precise standard, such as DDI, has led to deeper consideration of the representation of our questionnaires: the formulation of questions, the distinction between questions and instructions, etc.
Previously, the design phase was carried out using office automation tools and it allowed a linear structure for the questions. Sometimes, certain sections included multiple questions, instruction elements and response fields, with no formal distinction. The conceptualisation at play in the standard reveals these confusions.
Though the formal framework is not free of constraints, the resulting design exercise is often an opportunity for clarification. For example, standardisation has led to the formal identification of each question and to the homogeneous numbering of all questions, sometimes creating a break with the “historic” numbering of the questionnaire. Indeed, the latter is often reused in survey management applications and the amendment thereof has required application developments, or even changes in the habits of survey managers.
Standardisation also makes it possible, during the design phase, to standardise the questionnaires generated: the formatting choices (font, colours, spacing and layout) are generic. This results in uniform presentation of the questionnaires for different surveys and a more recognisable visual identity for INSEE and for each Ministerial Statistical Office (MSO), which also promotes the establishment of a unique collection portal (Figure 2).
Figure 2. A Strong Similarity in the Visual Identity of the Questionnaires
The deeper conceptualisation and standardisation of form both contribute to better questionnaire quality, as they make it possible to integrate compliance with best practices in terms of questionnaire design (On this topic, see (Haraldsen, Jones, Snijkers and Willimack, 2013; Christian, Dillman and Morrison, 2010; Christian, Dillman and Smyth, 2014)). Standardisation requires a change in practices, and a break with the past, especially when previous design choices sought to optimise layout to cut the cost of paper collection (by using tables, in particular). It can also sometimes cause acceptance difficulties; however, ultimately, it releases designers from the need to make layout choices and, thus, frees up potential time for them to perform design tasks with higher added value.
An Architecture Designed for a Modular, Evolving and Scalable Tool
Although the original and principal objective of Eno was the automatic production of web questionnaires, it should be recalled that the use of active metadata was the guideline from the outset. In particular, development has always focussed on the objectives of maximum genericity, re-usability and scalability. Thus:
- Eno makes it possible to produce different questionnaire models (for web or paper collection) based on the DDI standard for modelling the data life cycle and, therefore, independent of the “household” or “business” sphere (genericity).
- Eno makes it possible to add new output and input formats, not initially identified, such as an Open Document questionnaire specification format, a format for collection by the surveyor or for post-collection data editing purposes (re-usability and scalability).
Box 2. The Architecture of the Eno Engine Facilitates the Generation of Collection Instruments in Multiple Formats
More than a simple web questionnaire generator, Eno is a tool that makes it possible to produce one or more output formats, generically referred to as an “Out”, based on an input format, referred to as an “In”. This is achieved by using a declarative configuration, together with a technique known as the separation of concerns between the input and output formats, which consists in linking elements from the input format with those of the output format in a spreadsheet file, rather than in the form of code or technical configuration files reserved for specialists only. It is a design choice, integrated from the beginning into the very heart of the Eno engine, which guarantees better readability and maintainability of the tool’s most complex functions.
Taking this logic to its natural conclusion, it makes it possible to have expert “In” format developers and expert “Out” format developers (Development of the generator is planned, so as to enable complete separation of concerns between the input and output formats).
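The declarative approach described in Box 2 can be sketched as follows. This is an illustration under stated assumptions, not Eno's engine: the column names `in_element` and `out_template`, the CSV layout standing in for the spreadsheet, and the `load_mapping` and `generate` functions are all hypothetical. The sketch shows the correspondence between “In” elements and “Out” constructs living in a table, with a generic driver that knows nothing about either format.

```python
import csv
import io

# Hypothetical mapping table, as it might be exported from a spreadsheet.
# Column names and templates are illustrative, not Eno's actual configuration.
MAPPING_CSV = """in_element,out_template
title,<h1>{text}</h1>
question,<label>{text}</label><input/>
instruction,<p><em>{text}</em></p>
"""


def load_mapping(csv_text):
    """Read the In -> Out correspondence table into a dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["in_element"]: row["out_template"] for row in reader}


def generate(in_elements, mapping):
    """Apply the table to a sequence of (element kind, text) pairs."""
    out = []
    for kind, text in in_elements:
        if kind not in mapping:
            raise KeyError(f"no Out template declared for In element {kind!r}")
        out.append(mapping[kind].format(text=text))
    return out


# A toy "In" document: a flat list of typed elements (invented example).
IN_DOCUMENT = [
    ("title", "Annual survey"),
    ("question", "Number of employees"),
    ("instruction", "Count full-time equivalents"),
]

if __name__ == "__main__":
    for line in generate(IN_DOCUMENT, load_mapping(MAPPING_CSV)):
        print(line)
```

Under this design, supporting another output format amounts to supplying another table, and the person maintaining the table does not need to read the driver code.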
A Tool Used for Business Surveys...
It is no coincidence that Eno was created at the Business Statistics Directorate. Business statistics indeed provided a favourable environment for the creation and subsequent development of a collection media generator. The long experience in self-administered surveys and the central nature of the business register in the information system have led, over the years, to the creation of more structured statistical production processes and less complex questionnaires than in the social sphere.
It is against this backdrop that the generator was developed. It is officially in version 1 currently. It has already undergone a number of functional developments that have enabled it, in particular, to generate the vast majority of web questionnaires for Coltrane business surveys, together with their PDF equivalent.
Eno also makes it possible to produce specifications in an office format for questionnaires described in DDI (Box 2). While these specifications do not yet concern the entirety of the elements comprising the questionnaires, the current coverage already provides initial specifications for the design (review of the specification) and development (for third-party technical pathways, such as the Blaise pathway) stages.
...Within a Metadata-Based Ecosystem
The Coltrane platform provides the services necessary for web and paper collection, some of which are closely integrated into Eno (Figure 3): the collection platform allows the designer to view the web collection forms, which have been produced through Eno in the form of technical descriptions; Coltrane then makes it possible to customise the different questionnaire models for the respondents.
The same applies in respect of paper questionnaires: they are produced by a mail generation module (operated by a group dedicated to publishing activities). Just like the web collection platform, this module creates the complete paper collection form from the questionnaire model created by Eno and the customisation file.
Functional Developments Initiated for Households and Beyond
The development of the multi-mode format for household surveys requires that the developments necessary for the design of collection media be shared as much as possible, insofar as the number of modes used is greater than for businesses. This requires changes to the collection protocols and questionnaire, as well as to the tools used for collection.
The use of Eno for the web questionnaires of household surveys will result in the expansion of its functions, in particular to generate the loops of questions frequently used in this type of questionnaire (for example, listing the individuals of a household).
However, these changes are minor compared to those required by the integration of surveys conducted by a surveyor (face-to-face or telephone), which requires a change in architecture and, therefore, output format (Figure 3). Indeed, it is a question, in particular, of producing questionnaires embedded on surveyor workstations, which must be able to operate offline, whereas Eno has thus far been generating questionnaires for a classic web platform. This change in technical paradigm necessarily entails a change in technologies and, therefore, constitutes an opportunity to propose an ambitious development: the questionnaire is seen as a set of modular graphical components that can be reused and shared. These components, hosted in a dedicated application, will form the questionnaire shown on the surveyor’s workstation. This approach provides the possibility of sharing components with other tools or even using them to display something other than a collection questionnaire, strictly speaking (for example, questionnaires in a form adapted to transfer activity by a manager).
Figure 3. Eno in the Creation of Collection Media
The corresponding developments are based on a library (the Lunatic library, published as open source on GitHub) of web components (for example, the phrasing of a question, a drop-down list, an entry field for dates, etc.), which are state of the art in the field of web development, and which can be used in other applications. Such reuse is already underway in two projects:
- the redesign of the price index, which will implement the sharing of components with another application;
- the development of a generic survey processing station: the components will be used to display questionnaire control screens.
Thus, as the latter case demonstrates, the description of the questionnaire models in DDI, which is already used to produce the collection media, will also be used to generate the questionnaire screens in the editing application (Figure 4).
Figure 4. Eno Tomorrow, From Design to Processing?
Reading note: today, Eno contributes to the Design and Build phases (as defined by the GSBPM) of the media. Its basic principle makes it eligible to contribute further, to the Process phase.
These developments demonstrate Eno’s ability to adapt to radically different architectures, whether centralised (questionnaire server) or completely distributed (embedded surveyor workstation), to various media (consumer price surveyor tablets) and to different application contexts (collection, downstream processing, etc.).
Finally, technical developments are also planned to make the Eno engine more powerful and easier to configure.
Cross-Functional Governance to Facilitate Reuse
An innovation does not always emerge in the framework best suited to its proper functioning at capacity: the initial “business” context is particular; it has its own constraints, objectives and know-how, within which the innovation is similar to the launch of a project with dedicated resources, without this formally being the case.
Eno has been developed at INSEE under the governance of the Business Statistics Directorate. The conditions were then in place to allow the development of a questionnaire generator designed primarily to meet a need for business surveys.
However, from the outset, Eno was designed to be generic, reusable and scalable and based on a standard. The question of the governance of such a cross-functional tool therefore quickly arose.
The initial conditions were no longer suitable for the development of a cross-functional tool, the scope of which could go beyond the remit of the “parent” business directorate. Therefore, the governance of Eno naturally moved from one business directorate to another directorate more focused on “advice/expertise” and therefore more cross-functional: the Methodology, Statistical Coordination and International Relations Directorate, which was already the project manager of the statistical metadata repository RMéS.
This change in governance, which was quite unprecedented in the context of the institute, can be seen as an innovation in itself. It was also a success, as it now offers new outlooks for the development of the Eno tool. Therefore, while continuing to expand the range of services of the business survey pathway (for example, the addition of the output format for paper collection), Eno gradually became the generation engine for questionnaire models for the collection of household surveys and even price collection. It is also used downstream of collection, for the representations of post-collection data transfer questionnaires (see above). However, even when entrusted to a cross-functional unit, the governance of cross-functional tools raises formidable questions whose answers, which remain empirical today, need to be consolidated: how should the development requests of different users be assessed? How should updates of these cross-functional components be managed, when they have an impact on client applications? How should maintenance be financed? As can be seen, technical innovation leads to questions being asked about our organisations.
Eno was designed to take on an international dimension from the outset, as demonstrated by the discussions with experts from the Australian Bureau of Statistics (ABS), which provided the first elements of its design, then through its successive developments.
The selection of the DDI standard provided the opportunity to present multiple versions of the generator in DDI seminars.
Eno is one of the tools chosen to be shared as part of a European project (ESSnet Sharing Common Functionalities). Thus, it has been reused by SURS, the National Statistical Institute of the Republic of Slovenia: this remains a textbook case that has not been put into production, but there are contacts with other potential users.
At the same time, work to internationalise the code has continued (translation into English of the comments and documentation elements of this code). It has also been published and shared in open source on the INSEE GitHub repository (GitHub is a platform that hosts the code of a piece of software, an application, etc. This code may be shared if it is in Open Source, and enhanced through voluntary contributions, with the authors of the source code retaining validation rights. See the InseeFr domain on GitHub and the article on Pogues by Franck Cotton and Thomas Dubois). The Eno user community will have to grow further if the institute is to really benefit from this openness. There is still a certain amount of work to be completed to facilitate the reuse of Eno, independently of a technical environment: improving the packaging, developing services to simplify its use, etc.
Everything That Happens Will Happen Today (and Further...)
(This is the authors’ final nod to Brian Eno)
Eno is now the benchmark INSEE solution for the creation of all collection instruments. As of late 2019, Eno produces almost all the business questionnaires, as well as an initial household questionnaire. The use of the generator on the Coltrane platform has contributed to further industrialisation of the collection of business surveys. This industrialisation has been accompanied by a broad unification of practices and a certain “standardisation” of tools. A particular example is the ergonomics of the collection media (and, more generally, of the documents exchanged with respondents, emails or letters for example): this has been harmonised, which has sometimes led to restrictions on the creativity of survey designers (see the article on Coltrane by Olivier Haag and Anne Husseini-Skalitz), but which has largely contributed to the professionalisation of INSEE’s approach among businesses.
With its use in multimode household surveys, Eno is now entering a new period that will make it possible to further enhance its functionality.
If there is still some dissatisfaction at this stage, it relates to the reuse of Eno outside INSEE. Despite numerous expressions of interest (various Ministerial Statistical Departments, the Banque de France and foreign national statistical institutes), and in spite of the efforts made to document the tool, to package it in forms that can be easily reused (Eno was the first INSEE development (and to date remains the only one) to be published on the global directory of Java libraries) and to present it on multiple occasions, no case of reuse has reached production. We are reaching the limits of poorly developed communication that remains highly focused on the internal re-use of our innovations and applications: INSEE does not have a professional approach to the “marketing” of IT developments, unlike what can be seen at other statistical organisations.
Published on: 22/06/2021
In reference to Brian Eno, the English musician and producer who popularised the concept of generative music, in particular. The tools associated with the Coltrane platform are generally given musicians’ names.
Relying on DDI (see below).
Later we will see that the extension of the social sphere to multimodal surveys subsequently led to questions about the uniqueness of the model, without however totally calling it into question.
The high number of questionnaires was inherited in particular from the stacking of sectoral specificities, when INSEE merged almost all the structural business surveys into a single process. Since then, the ESAs have only relied on... 48 models.
This was the labour force activity and employment conditions survey (enquête Activité et conditions d’emploi de la main-d’œuvre – ACEMO), the project manager of which is the Statistical Department of the Ministry of Labour (DARES).
In addition to the aforementioned DARES, there were three INSEE directorates involved: the Business Statistics Directorate, the Economic Studies and National Accounts Directorate and the Demographic and Social Statistics Directorate.
Recommendation from W3C, the web standards organisation, for the representation and use of web forms. XForms is, in fact, used by Coltrane to make survey questionnaires available via the internet.
XML: Extensible Markup Language.
XSLT is a styling language (itself expressed in XML) for transforming XML documents into other XML documents. Eno is written in XSLT.
INSEE has thus launched a project called Généric to provide a tool for post-collection control operations, which uses Eno outputs.
INSEE has launched the Métallica programme (MÉTadonnées Actives, Logiciels Libres et Infrastructure pour une Collecte Assistée – Active Metadata, Free Software and Infrastructure for Assisted Collection) in order to optimise the household survey design process, especially in relation to the questionnaires.
Further reading
GREGORY, Arofan and HEUS, Pascal, 2007. DDI and SDMX: Complementary, Not Competing, Standards. In: Open Data Foundation website. [online]. July 2007. [Accessed 16 October 2019]
BONNANS, Dominique, 2019. RMéS: INSEE’s Statistical Metadata Repository. In: Courrier des statistiques. [online]. 27 June 2019. No. N2, pp. 46-55. [Accessed 16 October 2019]
CHRISTIAN, Leah M., DILLMAN, Don A. and MORRISON, Rebecca, 2010. Questionnaire Design Guidelines for Establishment Surveys. In: Journal of Official Statistics. [online]. Vol. 26, No. 1, pp. 43–85. [Accessed 16 October 2019]
CHRISTIAN, Leah M., DILLMAN, Don A. and SMYTH, Jolene D., 2014. Internet, Phone, Mail, and Mixed-Mode surveys – The tailored design method – 4th edition. August 2014. Wiley Online Edition. ISBN 978-1-11-845614-9
COTTON, Franck, SIGAUD, Éric, TAILHURAT, Romain and VAN DER VLIST, Éric, 2013. XForms generation: a real world example. 5 August 2013. International Symposium on Native XML User Interfaces. Balisage Series on Markup Technologies, Vol. 11. [Accessed 16 October 2019]
EUROSTAT, UNECE, 2002. Common Open standards for the Exchange and Sharing of Socio-economic Data and Metadata: the SDMX Initiative. [online]. 6-8 March 2002. Work Session on Statistical Metadata, Working Paper No. 11. [Accessed 16 October 2019]
HARALDSEN, Gustav, JONES, Jacqui, SNIJKERS, Ger and WILLIMACK, Diane K., 2013. Designing and conducting business surveys. 26 July 2013. Wiley Online Edition. ISBN 978-0-470-90304-9
HEUS, Pascal, THOMAS, Wendy and VARDIGAN, Mary, 2008. Data Documentation Initiative: Toward a Standard for the Social Sciences. In: The International Journal of Digital Curation. [online]. No. 1, Vol. 3, pp. 108-113. [Accessed 16 October 2019]