Courrier des statistiques N3 - 2019

Issue N3 dedicates no fewer than six articles to innovation in official statistics. The arrival of scanner data will change the Consumer Price Index (CPI) methodology from 2020 onwards. The Secure Data Access Centre (CASD) is also innovating in the certification of research based on confidential data. There is further innovation in the platform for collecting data from businesses via the internet, with an automatic generator and a questionnaire design tool that enhance the range of services available for business surveys. Lastly, based on a shared foundation, two new European regulations on business (FRIBS) and social (IESS) statistics will have specific consequences for producers, users and cohesion between domains; this issue presents the progress this represents for INSEE, as well as for the German statistical system.

Courrier des statistiques
Published on: 22/06/2021
Franck Cotton, Scientific Advisor, Information Systems Directorate, and Thomas Dubois, Manager of the Statistical Metadata Team, INSEE
Courrier des statistiques - June 2021

Pogues, a Questionnaire Design Tool

Pogues, the new questionnaire design tool, lets survey designers build their data collection instrument without needing any technical knowledge. A questionnaire is a complex object composed of questions, filters, checks, etc., which together form a set of statistical metadata describing the questioning. Helping designers both document their questionnaire and produce the collection instrument was a real challenge for INSEE. The design tool was built using an innovative approach, particularly in terms of development methods and technologies. It is the first Open Source development at INSEE and offers opportunities for collaboration with other statistical offices. Finally, it demonstrates by example the notion of active metadata, which was still vague and theoretical for many. With this “designer workshop”, a new page is being written in the industrialisation of the collection process.

In 2015, INSEE embarked on the construction of a new industrialisation chain for surveys: a new online collection platform (See the article on Coltrane by Olivier Haag and Anne Husseini-Skalitz in this issue), equipped with a questionnaire generator (See the article on Eno by Heïdi Koumarianos and Éric Sigaud in this issue), which itself is based on a formal description in the DDI international standard (Data Documentation Initiative), a widely recognized metadata description language.

New surveys have gradually been integrated into this system. However, the writing of questionnaires in DDI in advance has quickly turned into a bottleneck, as the advantages of this metadata description language are paid for by a high level of verbosity, which increasingly overwhelmed the institute’s scarce expert resources: the back-and-forths between survey designers and programmers had been shifted to the two or three DDI experts, but not eliminated.

To make designers autonomous and end-to-end masters of the process of creating collection media, they had to be able to write DDI without knowing it. In addition, they needed to be able to see the results of their amendments quickly, without having to resort to an expert. The idea was born: a questionnaire design tool (Iverson and Smith, 2013), within a “designer’s workshop”, making it possible to create the questionnaire templates and view them in one click, while directly feeding into the RMéS statistical metadata repository (Bonnans, 2019).

A Collective and Innovative Initiative for INSEE...

The idea of creating the questionnaire design tool, named “Pogues”, initially sought to reduce the workload of modelling questionnaires in DDI. It was soon to be enhanced by other issues then present in the area of methodology and IT: the need to demonstrate by example this notion of active metadata (Iverson, 2010), which remained vague and theoretical for many, the desire to test new technical tools for the creation of web applications, or the desire to experiment with collaborative, agile and lightweight development methods. This combination of interests made it possible to quickly mobilise the resources and the skills needed to organise the launch of the operation. It goes without saying that the procedures for planning, allocation and management of resources were a little disturbed in the process.

The initial development of Pogues was mainly carried out in April 2015, during a “hackathon” lasting one week, at the EuraTechnologies centre of expertise and innovation in Lille. The hackathon brought together eight computer scientists and two professionals (metadata and collection). It was preceded by a few specification sessions using agile methods and followed by another shorter hackathon with a smaller group.

A guardian angel was apparently watching over Pogues, as the first hackathon proved to be perfect in every respect: opportunity in terms of timing, the blend of skills, how it progressed, the location and even the weather... The results exceeded expectations in respect of most of the objectives:

  • a web application that is sufficiently operational to be capable of being shown to future users in order to learn their opinions and their desires for its development;
  • a well-specified overall application architecture, particularly in terms of the links between the various components (Pogues interface, back-office services, Eno generator, RMéS repository, etc.);
  • consequently, a well-defined process and allocation of roles for IT developments (greater formalisation and more systematic support on standards);
  • first and foremost, the success of the main objective of generating a web form in one click, thanks to the use by Pogues of the Eno generator, adapted for the occasion as a web service.

In terms of innovation, Pogues followed in the footsteps of Eno with three innovations for INSEE:

  • Pogues initiated what would become the development channel for INSEE’s client web applications (JavaScript pathway);
  • the project was developed as Open Source software on the online software development and version management platform GitHub: the organisation InseeFr was created for this purpose and now hosts the code of around twenty applications or models;
  • its development was internationalised from the beginning, with code, comments and documentation in English.

...to be Reintegrated into an Organisation

The application continued to be enhanced in the months following the hackathon, first on the basis of voluntary contributions, then through the provision of light services on various aspects: additional developments, documentation and ergonomics. In parallel, the “organisational” positioning of Pogues was taking place: it was necessary to specify to whom this collective object that had suddenly sprung up really belonged. The option chosen was to consider Pogues as essentially a metadata management application (Bakkmoen, Orten and Prestage, 2014; Butt, Norland and Orten, 2018), in this case for descriptions of questionnaires, and therefore as a “satellite” of the RMéS metadata repository. This position was to prove very useful in giving a less theoretical meaning to the statistical metadata repository: Pogues, a tool intended for designers, de facto became the means of demonstrating the contribution of the RMéS repository to their profession.

The Quality Unit, within the INSEE Methodology, Statistical Coordination and International Relations Directorate (DMCSI), owner of the RMéS repository, has therefore taken charge of the future of the Pogues tool. In particular, in October 2016, it organised a working group in which survey designers were able to test Pogues in depth and express their wishes for its development, based on specific uses. This meeting made it possible to formalise a roadmap for the future development of Pogues (and Eno). It provided these tools with good visibility among managers of business and household surveys.

The Quality Unit also handled contacts with the DDI Alliance, the organisation responsible for the development of the DDI standard. As increasingly complex questionnaires were taken into account, the requirements of Pogues and Eno called for clarifications of, or additions to, DDI. It would have been easy to integrate these modifications into a version of the standard specific to INSEE. However, in accordance with the principles of the RMéS, and even though it would require more time and energy, the decision was made to contribute to the enhancement of the DDI standard itself, so that the entire community could benefit from the new functions.

A service provision started up in spring 2017, to redesign and develop the functional coverage of Pogues: integration of questions in the form of tables, increase in the number of filter question cases, etc. Internally, major architecture consolidation work led to the “official” release of an initial version of the application in April 2018. Lastly, documentation and communication resources (user and developer guides, demonstration video, logo creation, presentations at various national and international meetings, etc.) have made it possible to give the Pogues tool a good level of visibility both within INSEE and beyond.

The Ambition of the Functional Coverage: To Support All Cases

Pogues enables the designer to structure his/her questionnaire in sequences (modules and sub-modules) containing questions and text elements, as well as to define the respondent’s questionnaire path logic and the overall constraints for completing the questionnaire (box 1).

Different kinds of questions are supported: simple single entry field (text, numeric, boolean or date), single or multiple choice, and table. Values can be restricted using predefined boundaries or code lists (for more details on the features of the Pogues tool, see the user guide available on the GitHub platform).
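As an illustration of how response formats can restrict values, here is a minimal Python sketch. It is not taken from the Pogues code base: the class names (NumericFormat, CodeListFormat), the accepts method and the example values are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NumericFormat:
    """Numeric entry field with optional predefined boundaries (hypothetical model)."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def accepts(self, value: float) -> bool:
        # A missing boundary means "no restriction on that side".
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


@dataclass(frozen=True)
class CodeListFormat:
    """Single or multiple choice restricted to a code list (hypothetical model)."""
    codes: Sequence[str]
    multiple: bool = False

    def accepts(self, selected: Sequence[str]) -> bool:
        # A single-choice question expects exactly one selected code.
        if not self.multiple and len(selected) != 1:
            return False
        return all(code in self.codes for code in selected)


# Illustrative formats, not taken from a real questionnaire.
headcount = NumericFormat(minimum=0, maximum=100000)
has_website = CodeListFormat(codes=("1", "2"))  # assumed coding: 1 = yes, 2 = no
```

With these definitions, headcount.accepts(250) holds while headcount.accepts(-1) does not, and has_website.accepts(["3"]) is rejected because "3" is not in the code list.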

Corresponding to each question, there are one or more so-called “collected” variables (meaning that their value will be provided by the respondent), but calculated or external variables can also be defined. The former are obtained by generally simple formulas based on the variables collected and are used, in particular, to specify consistency checks. External variables are not collected in the questionnaire but are useful for its customisation: they may be values obtained previously and recalled to facilitate completion, check evolutions or set certain elements of the questionnaire (collection wave, geographical zoning, last known population, etc.).
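The relationship between the three kinds of variables can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the way Pogues stores or evaluates variables: the CalculatedVariable class, the resolve function and all variable names are hypothetical, and formulas are written as plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Sequence

Values = Mapping[str, float]


@dataclass(frozen=True)
class CalculatedVariable:
    """A variable obtained by a simple formula over other variables (hypothetical)."""
    name: str
    formula: Callable[[Values], float]


def resolve(
    collected: Values,
    external: Values,
    calculated: Sequence[CalculatedVariable],
) -> Dict[str, float]:
    """Merge external and collected values, then evaluate formulas in order."""
    values: Dict[str, float] = {**external, **collected}
    for variable in calculated:
        # Each formula may use any value resolved before it.
        values[variable.name] = variable.formula(values)
    return values


# External variable: known before collection and recalled in the questionnaire.
external = {"LAST_KNOWN_TURNOVER": 1000.0}
# Collected variables: provided by the respondent.
collected = {"TURNOVER_FRANCE": 700.0, "TURNOVER_EXPORT": 500.0}
# Calculated variables: derived, for example to support consistency checks.
calculated = [
    CalculatedVariable(
        "TURNOVER_TOTAL",
        lambda v: v["TURNOVER_FRANCE"] + v["TURNOVER_EXPORT"],
    ),
    CalculatedVariable(
        "TURNOVER_EVOLUTION",
        lambda v: v["TURNOVER_TOTAL"] / v["LAST_KNOWN_TURNOVER"] - 1,
    ),
]

values = resolve(collected, external, calculated)
```

Here TURNOVER_TOTAL is derived from the two collected values, and TURNOVER_EVOLUTION compares it with the external value recalled from a previous collection.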

In addition to sequences and questions, Pogues allows users to specify different text elements: comments, instructions or help for completion, warning messages, etc. Depending on the type of collection medium that will be produced, these elements will appear in various forms, such as tooltips on web forms. Lastly, it is possible to describe some logical components for interactive modes (self-administered web questionnaire or face-to-face survey):

  • conditional expressions: typically, the way a question is expressed may depend on the values of the responses already obtained;
  • checks allowing the data collected to be validated: these checks may cover one or more issues (for example: checking the consistency between the total turnover reported and its breakdown by activity);
  • filters: depending on the data already collected, the respondent may be redirected to different questions or certain sections of the questionnaire may be disabled.

These logical expressions are written in a simple ad hoc language. Research is ongoing to study the use of a more standard language, such as VTL (Validation and Transformation Language, published by the SDMX initiative).
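To make the roles of checks and filters concrete, here is a minimal Python sketch. It uses neither the Pogues ad hoc language nor VTL: expressions are plain Python functions, and the Check and Filter classes, the helper functions and the variable names are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence

Responses = Mapping[str, float]


@dataclass(frozen=True)
class Check:
    """A validation rule over one or more responses (hypothetical model)."""
    message: str
    is_valid: Callable[[Responses], bool]


@dataclass(frozen=True)
class Filter:
    """Disables a target question or section when its condition holds (hypothetical)."""
    target: str
    applies: Callable[[Responses], bool]


def failed_checks(checks: Sequence[Check], responses: Responses) -> List[str]:
    """Return the messages of all checks that the responses do not satisfy."""
    return [check.message for check in checks if not check.is_valid(responses)]


def disabled_targets(filters: Sequence[Filter], responses: Responses) -> List[str]:
    """Return the questions or sections disabled by the responses given so far."""
    return [flt.target for flt in filters if flt.applies(responses)]


# Consistency between the total turnover and its breakdown by activity.
turnover_check = Check(
    message="Total turnover differs from the sum of its breakdown by activity.",
    is_valid=lambda r: r["TOTAL"] == r["ACTIVITY_A"] + r["ACTIVITY_B"],
)
# Skip the export section when the respondent reports no exports.
export_filter = Filter(target="EXPORT_SECTION", applies=lambda r: r["EXPORTS"] == 0)

responses = {"TOTAL": 100, "ACTIVITY_A": 60, "ACTIVITY_B": 30, "EXPORTS": 0}
```

With these responses, the turnover check fails (60 + 30 is not 100) and the export section is disabled.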

Pogues proposes a practical and intuitive graphical interface to describe the questionnaire and its components (Box 1). The questions and sequences can be moved using the mouse or copied, and a simplified representation of the questionnaire allows users to move around quickly. At any time, users may view their questionnaire in the form of their choosing (web or pdf), export it in DDI or produce the corresponding detailed specification document.

Box 1. The Ergonomics of Pogues, a Questionnaire Design Tool

Example from the survey on information and communication technologies in businesses (ICT 2019).

The Design of a Multi-Mode Questionnaire

The questioning is not identical across each mode of collection, whether in terms of substance or form. For example, in terms of substance, an instruction may differ depending on whether it is addressed directly to a respondent or to a surveyor (Box 2). In terms of form, the dynamic functions offered by the web (online checks, dynamic filters, drop-down menus, etc.) cannot be copied identically when using the paper format. These two aspects must therefore be taken into account in the establishment of the new pathway, in which the designer must be able to be autonomous in terms of the design and construction of the questionnaires for a multi-mode survey.

The Pogues tool must give designers the option of distinguishing the elements that are specific to a single mode while, to the extent possible, sharing what can be shared (questions common to all modes are described only once).

In terms of form, it is a matter of applying deterministic principles, namely ensuring that the collection method and format of the questionnaire are sufficient to determine the presentation of a component of the questionnaire. Thus, some differences are interpreted directly by the questionnaire generator: dynamic behaviours disappear, filters are replaced by a simple text explaining the redirection, and a drop-down list on the web becomes a simple input field.
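The deterministic principle can be pictured as a function of the component and the collection mode alone. The following Python sketch is only an illustration of that idea and does not reflect how the Eno generator is implemented: the Mode enumeration, the render function, the dictionary keys and the output strings are all assumptions.

```python
from enum import Enum
from typing import Mapping


class Mode(Enum):
    WEB = "web"
    PAPER = "paper"


def render(component: Mapping[str, str], mode: Mode) -> str:
    """Return a presentation determined only by the component and the mode."""
    kind = component["kind"]
    if kind == "filter":
        if mode is Mode.WEB:
            # On the web, the filter is applied dynamically.
            return f"[dynamic filter on: {component['condition']}]"
        # On paper, the filter becomes a text explaining the redirection.
        return f"If {component['condition']}, go to question {component['goto']}."
    if kind == "dropdown":
        if mode is Mode.WEB:
            return f"[drop-down list for: {component['label']}]"
        # On paper, the drop-down list becomes a simple input field.
        return f"{component['label']}: ________"
    raise ValueError(f"Unsupported component kind: {kind}")


# Illustrative components, described once and rendered for each mode.
export_filter = {"kind": "filter", "condition": "you have no exports", "goto": "Q12"}
country = {"kind": "dropdown", "label": "Country"}
```

The same description of export_filter then yields a dynamic behaviour with Mode.WEB and the sentence "If you have no exports, go to question Q12." with Mode.PAPER, without the designer having to describe the component twice.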

Box 2. Adaptation of the Instructions to the Different Modes

In a face-to-face survey, certain instructions are intended to be read by the interviewer to the respondent, while others are intended for the interviewer only. This may concern clarifying complicated concepts, for example. To switch to a self-administered questionnaire, the question must be reformulated or even restructured. These formal differences between the questionnaire models also arise in the management of non-response or imprecise responses. The example below shows instructions given to the interviewer for the “date of birth” variable. In self-administered surveys, the plan would not be to reproduce them identically; it would be preferable to add checks that prompt the respondent again in case of partial non-response, concerning the year, for example.

Pogues, Statistical Metadata Editor

A questionnaire and its components (instructions, question formulations, response formats, etc.) represent a set of information that characterises the statistical data collected. Thus, the questionnaire is a set of statistical metadata intended, among other things, to be included in the dedicated repository. Pogues is both a questionnaire design tool and a management tool for the metadata describing the questionnaires.

Pogues only makes sense as a metadata editor if the questionnaire components (sequences, questions and code lists) can be easily reused. It is important to avoid returning to the situation that prevailed before the RMéS, in which the metadata was rewritten from one operation to the next and from one year to the next, or reused but adapted to the context of each survey. To that end, communication between Pogues and the statistical metadata repository must allow the search and retrieval of unit components (Greenough, Mechanda and Rizzolo, 2014), as well as their filing in the repository when the design of a stable version is completed.
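The three operations mentioned above (search, retrieval and filing of unit components) can be sketched with a small in-memory stand-in for a repository. This Python example does not describe the RMéS interface, which is not detailed in this article: the InMemoryRepository class, its methods, the CodeList structure and the identifier are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CodeList:
    """A reusable unit component: a labelled list of (code, label) pairs."""
    identifier: str
    label: str
    codes: Tuple[Tuple[str, str], ...]


@dataclass
class InMemoryRepository:
    """Hypothetical stand-in for a metadata repository, for illustration only."""
    _items: Dict[str, CodeList] = field(default_factory=dict)

    def file(self, item: CodeList) -> None:
        """File a stable version of a component in the repository."""
        self._items[item.identifier] = item

    def search(self, text: str) -> List[CodeList]:
        """Find components whose label contains the given text."""
        return [i for i in self._items.values() if text.lower() in i.label.lower()]

    def retrieve(self, identifier: str) -> CodeList:
        """Retrieve a component by its identifier."""
        return self._items[identifier]


repository = InMemoryRepository()
yes_no = CodeList("cl-yes-no", "Yes/No", (("1", "Yes"), ("2", "No")))
repository.file(yes_no)

# Two questionnaires reuse the same component instead of rewriting it.
first_use = repository.retrieve("cl-yes-no")
second_use = repository.search("yes")[0]
```

Both questionnaires end up referencing the same code list, which is the reuse that the metadata repository is meant to enable.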

To date, the development of the interface between Pogues and the RMéS has encountered technical difficulties and choices favouring other priorities; therefore, it remains functionally limited. It is now becoming urgent to complete the connection of Pogues to the RMéS repository, which should be carried out in the coming months.

A New Process for Integrating the Questionnaire into the Collection Tools Ecosystem

The arrival of an initial version of Pogues in production in April 2018 has changed the process for specifying, producing and validating questionnaires (for business surveys so far).

From now on, it is necessary to organise a meeting between the survey designer and those responsible for the new “collection ecosystem” (Figure 1): the administrator of the Eno tool, who is also the questioning methodologist, the administrator of the Pogues tool and the administrator of the Coltrane collection platform.

 

Figure 1. Under the Bonnet of Pogues

During this preparatory meeting, four important points are discussed:

  • best practices in the design of questionnaires: integrating a survey into the new pathway provides an opportunity for a discussion between designers and methodologists in order to re-examine the choices made in terms of questionnaire content. The aim is to produce a questionnaire that complies with the rules of design best practices (for further details, see the references of the article by Heïdi Koumarianos and Éric Sigaud in this issue), most of which are already included in Pogues and Eno. This may lead to significant changes to the pre-existing questionnaire;
  • the standardisation and its consequences on the questionnaire: issues of form are discussed here, including those relating to the structure of the questionnaire, to the distinction between labels for questions and instructions, numbering, etc. The standardisation of the form also helps with best practices;
  • needs that are not covered by all of the tools and the time required for the roll-out of any changes;
  • the establishment of a schedule for the completion of the questionnaire.

This meeting makes it possible to formalise the improvements in terms of questionnaire design, the commitments in terms of functions offered by the collection ecosystem and also the functional limits.

With Pogues, it is about putting the designer back at the centre of the process. Indeed, the designer now enters his/her questionnaire into Pogues directly and, in a single click, views and validates the web rendering, the paper rendering or even the “specification” rendering (in office automation format). The time required for updating a questionnaire is greatly reduced due to the elimination of the back-and-forths between the designer and the questionnaire developer or between the designer and the DDI expert (when the Eno questionnaire generator existed, but Pogues did not yet exist). It should be noted that, at this stage, validation is carried out without customisation of the questionnaire. This stage will be carried out within the collection platform.

At the time of publication of this article, the functional scope of the various tools is still incomplete. This means some manual adjustments will still be required. However, the more Pogues and Eno increase their functional scope, the more autonomous the designer will become and the shorter the time required to update a questionnaire will become.

Nonetheless, the issue of third-party intervention for the validation of complex questionnaires remains: household survey questionnaires, which have not yet switched to the new ecosystem, are currently being updated with the help of an intermediary between the designer and the computer scientist; the question of what role that intermediary could play in the future is not yet decided.

Beyond Collection?

As indicated, Pogues enables the survey manager to design his/her questionnaire, including the checks that will be performed during completion. This corresponds to the “Design variable descriptions” and “Design collection” sub-processes of the Generic Statistical Business Process Model (GSBPM), the reference model in the global official statistics community for the representation and high-level analysis of statistical processes.

The formal questionnaire structure that is thus specified, with the associated questions and variables, can be reused further downstream in the statistical production process and, in particular, during the checking stage for the collected data. This was noted during the preliminary studies for the project to develop a generic survey processing station. It was therefore agreed that the structure of the questionnaires would be sent by the RMéS during the development stage for the checks, to be reused and thus automatically generate the screens for data editing.

It is possible to go further and envisage post-collection data editing controls being specified in Pogues, where all variable definitions are already available for this purpose. The scope of the Pogues tool would then be extended to the “Design processing and analysis” sub-process. And why stop there? Pogues could also make it possible to specify the collection process (“Design production systems and workflow”): modes, collection schedule, etc., all of which are metadata that can activate future automated services (triggering of reminders, etc.). Pogues would then become a sort of survey designer’s workbench, used not only to design the questionnaire but, more broadly, to design the survey process.

Opening Up Beyond INSEE

Pogues is now included at INSEE in the service offer of the RMéS repository. Due to the scope of the latter, Pogues will eventually be opened up to Ministerial Statistical Departments (MSDs). At the same time, the objective of the Coltrane collection platform is to collect as many Official Statistical System (OSS) surveys as possible so that businesses that respond to multiple surveys have only one infrastructure to use. In late 2019, Coltrane opened up its applications (such as contact management) to OSS producers who have adopted its services. Therefore, it is natural that this offering should be supplemented by opening up the Pogues and Eno tools to the MSDs that need them. The same will apply to the new online collection platform for household surveys.

In a few years, the small original Eno graphic, resulting from the Coltrane project, has, together with Pogues, launched a comprehensive and coherent information system which has become the INSEE standard for business surveys. This transformation, to which many have contributed, has been achieved through a combination of consistency in adherence to the strategic vision and flexibility in tactical adaptation to changing circumstances. It has been fuelled by a permanent search for innovation, at technical, methodological and organisational levels. It has sometimes been necessary to take some risks, to push the organisation a little or to overcome some resistance, sometimes without success, and to deal with resources that were often very restricted; however, the result can be considered to be in line with the initial ambitions. This success has largely given credibility to the approach of managing information systems through metadata, which it now seems more natural to extend to other areas such as dissemination.

Eno and Pogues have also raised important issues, which have not been fully resolved to date: how to organise the project management of cross-functional projects and the resulting tools, how to better manage technical production innovation, etc.

But the story is just beginning. Eno and Pogues are now entering a new phase of development, in the context of the platform for the online collection of household surveys. There are functional enhancements already underway and architectural improvements are planned. It is also hoped that the promotional efforts made towards national statistical institutes of other countries will eventually pay off and that the expressions of interest will be converted into reuses or effective contributions. This will probably involve professionalising the internal organisation to better document the tool and improve user support.

Above all, we must hope that the future of Eno and Pogues will continue to be built on this desire to innovate that Brian Eno, in a way, described when asked about his creative process: “The [...] thing is to set up a situation that presents you with something slightly beyond your reach”.

Almost 15,000 lines for 11 paper pages of the questionnaire on ICT in companies.

Particularly for modelling questions in the form of complex tables.

The GSBPM differentiates between the “Design” and “Build” phases, but in the case of a statistical process driven by metadata, what is built (in this case the collection medium) is, in a way, an inevitable product of the design. See below.

For example: no underlining in a web questionnaire except where it corresponds to a hyperlink.

The CPOS (Chef de projet en organisation statistique), standing for Project Manager in Statistical Organisation.

Planned for thematic business surveys, the Generic project aims to develop a data collection processing station, the inputs of which are the questionnaires and the checks of which constitute new metadata.

Further reading

BAKKMOEN, Håvard Venge, ORTEN, Hilde and PRESTAGE, Yvette, 2014. The DASISH Questionnaire Design Documentation Tool: Keeping track of the Questionnaire Design Process. In: EDDI14 – 6th Annual European DDI User Conference. [online]. 2-3 December 2014. Institute of Education, University of London. [Accessed 14 October 2019]

BONNANS, Dominique, 2019. RMéS: INSEE’s Statistical Metadata Repository. In: Courrier des statistiques. [online]. 27 June 2019. No. N2, pp. 46-55. [Accessed 14 October 2019]

BUTT, Sarah, NORLAND, Stig and ORTEN, Hilde, 2018. The Questionnaire Design and Documentation Tool (QDDT) a DDI based tool for assisting questionnaire design teams in their work. In: Zenodo platform website. [online]. 4-5 December 2018. EDDI18 – 10th Annual European DDI User Conference, Deutsche Institut für Wirtschaftsforschung, Berlin. [Accessed 14 October 2019]

DANNEVANG, Flemming and NIELSEN, Mogens Grosen, 2017. Towards Common Metadata Using GSIM and DDI 3.2. In: IASSIST Quarterly. [online]. 24 February 2017. Vol. 40, No. 2 (2017), Summer 2016, pp. 6-17. [Accessed 14 October 2019]

GREENOUGH, Carmen, MECHANDA, Kaveri and RIZZOLO, Flavio, 2014. Metadata in the modernization of statistical production at Statistics Canada. In: European Conference on Quality in Official Statistics. [online]. 2-5 June 2014, Vienna. [Accessed 14 October 2019]

IVERSON, Jeremy, 2010. Metadata-Driven Survey Design. In: IASSIST Quarterly. [online]. 11 November 2010. Vol. 33, No. 1-2 (2010), Spring/Summer 2009, pp. 7-9. [Accessed 14 October 2019]

IVERSON, Jeremy and SMITH, Dan, 2013. Generating Blaise Surveys from the Data Documentation Initiative’s Metadata Standard using Colectica. In: EDDI13 – 5th Annual European DDI User Conference. [online]. 3-4 December 2013. Réseau Quetelet – French Data Archives for Social Sciences. [Accessed 14 October 2019]