Courrier des statistiques N3 - 2019

Issue N3 dedicates no fewer than six articles to innovation in official statistics. The arrival of scanner data will make the Consumer Price Index (CPI) methodology evolve from 2020 onwards. The Secure Data Access Centre (CASD) is also innovating in the certification of research based on confidential data. There is further innovation to develop the platform for collecting data from businesses via the internet, with an automatic generator and a questionnaire design tool, enhancing the range of services available for business surveys. Lastly, based on a shared foundation, two new European regulations on business (FRIBS) and social (IESS) statistics will have specific consequences for producers, users and cohesion between domains; this issue presents the progress this represents for INSEE, as well as for the German statistical system.

Published on: 22/06/2021

Marie Leclair, Head of Consumer Prices Division, INSEE

Courrier des statistiques- June 2021

Using Scanner Data to Calculate the Consumer Price Index

Scanner data are data gathered by large retailers when consumers go to pay for their goods in store. From January 2020 onwards, these enormous volumes of data will be used to calculate the consumer price index (CPI). The comprehensive coverage of the field by these data and the detailed knowledge of household consumption that they provide are major steps forward when it comes to producing price statistics: they improve, in particular, their accuracy and ultimately mean that new statistics can be produced (more detailed indexes, regional indexes, average prices and spatial price comparisons). However, they also require that a number of new solutions be found, in particular in terms of automated processing. Choosing a big data IT solution and using a barcode dictionary help enable these data to be exploited whilst preserving the concepts of the CPI, in particular the reference to a fixed basket of goods.

Contents

A New Source of Data, Private Data
An Opportunity to Calculate Price Statistics
More Accurate and More Detailed Statistics
Knowing the Quantities Consumed and Providing the Basis for a Survey
Better Processing of Consumer Substitutions Between Goods
Box. How Is the CPI Produced?
Responding to Recurrent Controversies Surrounding the CPI
More Effective Tracking of the Prices Actually Paid by Consumers
The Long Road to Accessing Private Data
Statistics Based on Data Produced for Other Purposes?
1.7 Billion Records: New IT Architecture…
… Which Is Necessary for the Automation of Statistical Processes
A Slightly Different Choice to That of Our European Partners
Finally, What Is the Impact of Scanner Data?

The French consumer price index (CPI) is calculated on the basis of 200,000 prices collected each month by price collectors at physical outlets. This field collection has gradually been supplemented by other, dematerialised sources: online price collection and administrative data, amounting to 190,000 additional prices each month. From January 2020 onwards, use will be made of a new source that is of a different scale altogether: scanner data.

A New Source of Data, Private Data

Scanner data are information on prices paid and goods purchased, gathered by retailers when consumers pay for their goods in store. Involving much larger volumes than the data used hitherto to calculate the CPI (1.7 billion records received each month), these private data are a real opportunity when it comes to calculating price statistics. But they also raise new questions: first of access, then of reliability, and finally of INSEE's ability to use them, both for statistical purposes and from an IT point of view.

Scanner data, which are referred to more generically as transaction data, are available for many shops once transactions are recorded. Although centralisation helps with provision, the statistical processing of these files is not made easier for all consumer goods: at this stage, INSEE "only" uses data from large retailers (supermarkets and hypermarkets) in metropolitan France in respect of processed food products, cleaning products and hygiene and beauty products.

An Opportunity to Calculate Price Statistics

Scanner data have been used by panelists for market study purposes for many years. Some national statistical institutions recognised the benefit of using such sources to calculate their CPI quite early: the Netherlands has been using them since 2002, followed by Norway in 2005, Switzerland (2008), Sweden (2012), Belgium (2015), Denmark (2016), Iceland (2016), Luxembourg and Italy (2018). Eurostat has contributed to their use being extended via grants, workshops and a manual (Eurostat, 2017). Use of scanner data is also a subject often discussed by the UNECE and ILO group of experts on consumer price indices, which brings together academics and statisticians from all over the world (UNECE, 2018).

There are various reasons for the interest shown: the wish to have statistical institutions enter the big data era, the use of private data produced "free of charge" (even though using them may be expensive), the comprehensiveness of the data, but also the provision of new information on goods consumed that has not been available hitherto and that opens up numerous opportunities for price statistics, as will be seen below.

In the case of France, scanner data are comprehensive data on their field; they are continuously collected and centralised on a daily basis. As a result, for each product barcode (Figure 1), the quantities of the product sold and the associated price or turnover per point of sale and day of sale are recorded (Figure 2).
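
The record structure described above can be sketched as follows. This is only an illustration: the field names, the made-up barcode and the aggregation into a monthly average price per barcode and outlet are assumptions, not INSEE's actual file layout or processing.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScannerRecord:
    """One line of scanner data (hypothetical field names)."""
    barcode: str      # product identifier printed on the packaging
    outlet_id: str    # point of sale
    day: date         # day of sale
    quantity: int     # units sold at that outlet on that day
    turnover: float   # euros received for those units


def monthly_unit_values(records):
    """Average price paid per (barcode, outlet, year, month): turnover / quantity."""
    turnover = defaultdict(float)
    quantity = defaultdict(int)
    for r in records:
        key = (r.barcode, r.outlet_id, r.day.year, r.day.month)
        turnover[key] += r.turnover
        quantity[key] += r.quantity
    return {k: turnover[k] / quantity[k] for k in turnover if quantity[k] > 0}


# Two made-up daily records for the same product and outlet.
records = [
    ScannerRecord("3000000000017", "outlet_A", date(2019, 3, 1), 10, 25.0),
    ScannerRecord("3000000000017", "outlet_A", date(2019, 3, 2), 30, 69.0),
]
unit_values = monthly_unit_values(records)
print(unit_values)
```

Dividing turnover by quantity gives the price actually paid on average, which is why a record can carry either a price or a turnover figure without loss of information.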

Figure 1. The Structure of a Barcode

More Accurate and More Detailed Statistics

According to Tassi (2018), big data are characterised by the quantity of information (which can be anything up to comprehensive information on a given field) and the frequency of acquisition of this information. In the case of price statistics, the continuous, daily availability of data is important when it comes to producing a monthly index like the CPI, all the more so given the considerable time constraints. However, there are no plans to produce an infra-monthly CPI.

Figure 2. A Sample of Scanner Data

The comprehensiveness of the source, however, means that more accurate statistics can be produced and that it may be possible to produce more detailed indexes, for example on specific consumption segments. The CPI is already produced monthly in respect of over 250 subclasses and annually in respect of over 360 sub-subclasses. There is public demand for ever more detailed information, but this is not necessarily the main contribution made by scanner data.

It is the geographical dimension of the comprehensiveness of scanner data which is particularly useful. Until now, prices have been surveyed by the CPI in a sample of urban units of more than 2,000 inhabitants that are representative of the nation as a whole. However, INSEE has encountered practical difficulties in the smallest urban units because, owing to the less dense commercial fabric, price collectors have to cover many more miles to survey the prices.

Besides the reliability of collecting data in more rural areas, comprehensiveness allows representativeness at regional level and ultimately the production of regional price indexes (within the scope of scanner data): until now, only an index for metropolitan France and an index for each overseas territory have been published. In addition, representativeness per individual territory and detailed information on the products tracked ultimately make it possible to contemplate spatial comparisons in terms of price: this exercise is currently only carried out by INSEE every 5-6 years and, in metropolitan France, only between the Paris region, Corsica and the rest of France (Clé et alii, 2016). Experiments (Léonard et alii, 2019) have shown that these scanner data can be used advantageously for spatial price comparisons. Moreover, they are already being used by certain countries for European price level comparisons (purchasing power parities).

Knowing the Quantities Consumed and Providing the Basis for a Survey

If price statisticians are very interested in scanner data, it is also because they give them access to information of which they hitherto had only very rough, outdated and aggregated knowledge: the types of goods consumed by households and the quantities consumed by them.

For example, the national accounts currently provide information on the weight of breakfast cereals consumed within French territory. However, they give no details on the variety of cereals or, even less, on brands or outlets.

It is thus not possible to rely on this as the basis for a survey in order to draw a random sample of goods for which the price can be tracked. Owing to the lack of information, the CPI basket is currently de facto defined by a quota method: it is true that the urban units where the price collector goes are drawn randomly as a function of the population living there and the consumption habits attributed to them (Jaluzot and Sillard, 2016). However, the precise choice of outlets and the products tracked is made by the price collector as a function of a few constraints or quotas (form of sales, varieties of products tracked). In the case of breakfast cereals, a price collector will be asked to go to a given urban unit, drawn at random, and find there, for example, 4 packets of muesli type cereal, one in a hypermarket, two in a supermarket and one in a convenience store. It is the price collector who chooses the points of sale and the box of cereal they pick therein.

Scanner data finally provide the basis for a random survey: all of the articles sold by points of sale, with the weighting of each of these articles in the turnover of the points of sale. The existence of the basis for a survey enables, on the one hand, the use of a random drawing of products and, on the other, the control of any sampling bias. It also makes it possible to spot quickly any new goods that should be added to the CPI basket or ailing goods which need to be removed from it so that the basket is always up to date and representative of household consumption.

Ultimately, at INSEE, given the information and automation possibilities (see below), it is the comprehensiveness of scanner data which has been adopted, avoiding the use of sampling.

Better Processing of Consumer Substitutions Between Goods

Knowledge of the precise quantities of each article sold also helps with indexes in practical terms: to calculate a synthetic price index, price observations are aggregated using a number of formulas (see box). The theory of indexes defines the properties of these formulas and the indexes which should be adopted from a theoretical point of view (Sillard, 2017; IMF, 2004). In practice, a lack of knowledge of the current quantities consumed by consumers at a detailed level restricts choice. For the CPI, use is made:

at the most aggregated level (for example to aggregate boxes of cereals with pastas), of a Laspeyres type index with a weighting using past consumer expenditure;
at the most refined level (for example to aggregate the different boxes of cereal between themselves), in the absence of any information on the quantities consumed, even in the past, of a Dutot or Jevons index, implying equal weighting of the prices collected.

Box. How Is the CPI Produced?

The CPI measures movements in the price of goods and services consumed by households. Prices of a fixed basket of products are tracked on a monthly basis in order to measure pure price movements at constant quality. The index is a Laspeyres* type index, with the different consumption segments being weighted according to their previous weighting in household consumption. Weightings are no longer known at a level more detailed than consumption segments, and assumptions are made in individual price aggregation. The CPI uses the Dutot** and Jevons*** formulae.

To ensure that the index remains representative of household consumption, the weightings and the basket of tracked products are updated every year; the CPI is an annually chain-linked index. Where a product is discontinued during the year, it is replaced by a similar product and a quality adjustment is made to address the difference in quality between the replaced and replacement products.

The CPI is a monthly index; the provisional index is published on the final business day of the month, with the final index released fifteen days after the end of the month. This final index is not subsequently revised. These short time frames and the lack of revision place tight constraints on the CPI compilation process.

Besides transaction data, the CPI uses two types of source: price surveys carried out by INSEE price collectors in the field (prior to the use of scanner data, some 200,000 prices were collected each month in urban units which were representative of France as a whole) in various forms of sales (including internet sales); data collected in a centralised way, either because the price of these products was the same throughout the territory (telecommunication services, electricity, tobacco, etc.), or because databases can be used to calculate price changes (data from the National Health Insurance Fund for health services, for example).

______________________________________________________________________________________________

* The Laspeyres index is a fixed basket index comparing the average price for the current period with the average price for the reference period, weighting current prices and those of the reference period by quantities consumed during the course of the reference period.

** A Dutot index is a fixed basket index comparing the arithmetic average price for the current period with the arithmetic average price for the reference period. All of the prices in the basket are weighted equally.

*** A Jevons index is a fixed basket index comparing the geometric average price for the current period with the geometric average price for the reference period. All of the prices in the basket are weighted equally.
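
As an illustration of the three formulas defined in these notes, here is a minimal sketch on made-up prices and quantities. The function names and figures are hypothetical; the code follows the textbook definitions only and does not reproduce INSEE's production system.

```python
import math


def dutot(p0, p1):
    """Ratio of arithmetic mean prices (equal weighting)."""
    return (sum(p1) / len(p1)) / (sum(p0) / len(p0))


def jevons(p0, p1):
    """Ratio of geometric mean prices (equal weighting)."""
    log_ratios = [math.log(b / a) for a, b in zip(p0, p1)]
    return math.exp(sum(log_ratios) / len(log_ratios))


def laspeyres(p0, p1, q0):
    """Cost of the reference-period basket at current versus reference prices."""
    cost_now = sum(p * q for p, q in zip(p1, q0))
    cost_ref = sum(p * q for p, q in zip(p0, q0))
    return cost_now / cost_ref


# Two products: the first rises by 10%, the second is unchanged.
p0 = [2.0, 4.0]   # reference-period prices
p1 = [2.2, 4.0]   # current-period prices
q0 = [3, 1]       # reference-period quantities

print(round(dutot(p0, p1), 4))          # 1.0333
print(round(jevons(p0, p1), 4))         # 1.0488
print(round(laspeyres(p0, p1, q0), 4))  # 1.06
```

The three results differ on the same prices because each formula weights the two products differently: the Dutot index implicitly gives more weight to the more expensive product, the Jevons index weights the two price ratios equally, and the Laspeyres index uses the quantities bought in the reference period.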

Through the detailed information they provide, scanner data make it possible to choose index formulas which, at the most refined level, take better account of substitutions made by consumers between two products (Leclair et alii, 2019): when the price of a product goes up, the impact on the consumer's utility may be more or less strong depending on whether or not they can switch their consumption to another product which is more or less interchangeable with the first. How well indexes handle these substitutions was debated in the nineties, the concern being that price indexes which took insufficient account of them might be biased upwards (Boskin, 1996). At the time, the use of scanner data and detailed information on quantities sold, and hence of adjustments made by consumers to their baskets, was already being presented as a promising solution (Lequiller, 1997).

Responding to Recurrent Controversies Surrounding the CPI

Scanner data also provide some answers in the debate over a possible underestimation of inflation by the CPI: after switching to the euro, the difference between the inflation felt by households and the inflation measured by INSEE grew (Leclair and Passeron, 2017). One explanation for this difference is that households tend not to take account of certain improvements in the quality of the products they consume, whereas the CPI offsets them: an improvement in the quality of a product at a price that is unchanged is translated in the CPI as a drop in price. However, although the social norm has changed towards products that are of better quality, and consequently more expensive, it is possible that consumers will feel tied to this higher quality expenditure and will regard this shift in social norm as an increase in the cost of living.

One possible answer was to structure statistics to be closer to experience, whilst continuing to produce the CPI, which is relevant when it comes to measuring, in particular, the increase in GDP in terms of volume and household purchasing power. The idea is to calculate overall average prices for products, which take account of changes in consumption habits and do not offset these changes in quality (Moati and Rochefort, 2008). For example, the increase in consumption of basmati or Thai rice, which is generally more expensive than ordinary rice, may be considered to be a change in the quality of the product and will not be translated into an increase in the price index for rice in the CPI. In contrast, the average price of rice (all categories combined) will increase owing to the growing proportion of these fragrant rices in purchases.

Scanner data mean that such average prices can be calculated, giving precise information on quantities consumed as well as prices paid.
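
The rice example can be made concrete with a short sketch. All figures are invented for illustration: the price of each category is held constant while purchases shift towards the more expensive one.

```python
# Hypothetical prices in euros per kg, identical in both periods.
prices = {"ordinary": 1.50, "fragrant": 3.00}

# Hypothetical quantities in kg: purchases shift towards fragrant rice.
q_before = {"ordinary": 80, "fragrant": 20}
q_after = {"ordinary": 60, "fragrant": 40}


def average_price(prices, quantities):
    """Total expenditure divided by total quantity, all categories combined."""
    spent = sum(prices[k] * quantities[k] for k in quantities)
    return spent / sum(quantities.values())


def fixed_basket_index(p0, p1, q0):
    """Cost of the initial basket at new prices relative to old prices."""
    return sum(p1[k] * q0[k] for k in q0) / sum(p0[k] * q0[k] for k in q0)


avg_before = average_price(prices, q_before)
avg_after = average_price(prices, q_after)
index = fixed_basket_index(prices, prices, q_before)

print(avg_before, avg_after)  # average price per kg before and after the shift
print(index)                  # fixed-basket index: no price has changed
```

With these figures the average price of rice rises from about 1.80 to about 2.10 euros per kg, while the fixed-basket index stays at 1 because no individual price has moved.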

More Effective Tracking of the Prices Actually Paid by Consumers

Perhaps rather surprisingly, another advantage of scanner data is that they make it possible to track more effectively the price concept which one wishes to measure with the CPI, compared to a conventional specific survey.

This is because price collectors sent by INSEE into the field can only collect the prices displayed at points of sale. However, these may differ from the prices actually paid by consumers: these differences are partly explained by display errors, but particularly by the use of certain special offers. Currently, the CPI only measures special offers when they apply to all consumers. This method, which is in line with European regulations on consumer price indexes, is the result of a lack of information on the number of purchasers actually benefiting from "discriminatory" offers, for example those linked to being the holder of a store card.

In scanner data, it is the prices actually paid which are generally recorded. They thus include numerous special offers even though certain commercial practices still escape them, such as "reward point schemes" which involve awarding points for future consumption in return for the purchase of a specific product.

The Long Road to Accessing Private Data

Although the contribution made by scanner data in the production of price statistics is undeniable, their use by INSEE has still presented a number of difficulties.

The first, in chronological order, is quite simply being able to access them: scanner data are intangible assets of the companies producing them and the latter are therefore under no obligation to grant INSEE access to them, even for public interest purposes to produce public statistics. First of all, contact was made with certain retailers in an attempt to persuade them to pass their data on to INSEE. Since 2012, four retailers (about 40% of the mass distribution market) have thus been supplying data on an experimental basis and within the framework of agreements (Figure 3). In order, on the one hand, to obtain data on the entire supermarket and hypermarket field and, on the other, to maintain provision, the 1951 law on legal obligation, coordination and confidentiality in the field of statistics was amended. The law now provides for the possibility of making the transmission of certain private data mandatory after consultation with the parties and only to replace mandatory statistical surveys. This amendment, in addition to providing access to scanner data, may ultimately make it easier to access other private data.

Figure 3. Chronology of the «Scanner Data» Project

After consultation with the mass distribution retailers in June 2016, a prior study of the feasibility and advisability of using scanner data for the CPI was presented to the French National Council for Statistical Information (CNIS) at the end of 2016. Following a positive endorsement, a decree was signed by the minister on 13 April 2017, making it mandatory for scanner data to be transmitted by retail businesses in non-specialist stores which are over 400 m² and mainly sell food. Since January 2019, all scanner data from major food retailers, apart from hard discounters, have thus been received daily by INSEE.

Statistics Based on Data Produced for Other Purposes?

A second difficulty in using scanner data is that they have not been produced for the purposes of producing statistics. This raises two questions: are the statistics produced on the basis of these data produced entirely independently and impartially? And is the information collected suitable for the statistical aim sought?

As far as the first point is concerned, the importance of the CPI in the public debate and the existence of controversies over manipulation of the index up until the seventies (Jany-Catrice, 2018) raise the question of the faith that one can have in data produced by private parties. Although it seems hard to imagine how retailers might manipulate such large volumes of data, INSEE wanted to guarantee the quality of the data used to calculate the CPI by organising control surveys. In future, every month, a certain number of prices recorded in the scanner data will be checked by price collectors at the outlets. From 2019 onwards, a double collection of prices, on the one hand by price collectors to calculate the CPI and, on the other, in the scanner data, means that any discrepancies can be ruled out.

The second question relates to the fact that these data were not initially produced for the purposes of drawing up statistics. As for administrative data (Rivière, 2018), it is possible that the data gathered, the definitions adopted and the scope do not exactly correspond to what the statistician wishes to measure. These weaknesses are all the more pronounced in relation to big data (Blanchet and Givord, 2017), which are characterised not only by their "Volume" and by the "Velocity" at which they are made available, but also by a third "V", their "Variety", underlining the often unstructured aspect of these data.

Against this background, scanner data seem to be a special case because, to begin with, the information gathered may prove to be more pertinent than in the case of a survey dedicated to the collection of prices: the price concept is, for example, tracked more effectively in scanner data than by price collectors who can only collect the displayed price; information on quantities consumed is hard to measure in a survey, at least with the level of detail necessary; the scope of the scanner data can easily be supplemented by survey data (in other forms of sales, for fresh produce sold in supermarkets and hypermarkets, for example) to cover all household consumption; in certain cases it even covers household consumption more effectively, integrating drive-through data, for example, not hitherto covered by the CPI.

Moreover, scanner data are a distinct element of big data because, in reality, they are highly structured data which only have big data's first two "Vs", volume and velocity.

Overall, whilst other forms of use of big data tend to aim to produce new statistics which complement rather than replace existing public statistics, a specific aspect of scanner data is that they can actually replace survey data without the need to modify the concepts or the methodological framework of what one wants to measure.

1.7 Billion Records: New IT Architecture…

Although INSEE has therefore chosen to regard scanner data as being capable of replacing price collection carried out by price collectors without the need to adapt the concepts used for the CPI, the volume of data to be processed calls for a certain number of solutions, firstly IT solutions but then also statistical ones, in order to automate processing which was previously carried out manually.

INSEE receives 1.7 billion records each month, corresponding to the lines in the scanner data, in other words associated sales at a point of sale for a given barcode and on a given day (Figure 1 and Figure 2). This number of records for processing each month could not have been handled using conventional databases, referred to as relational databases. Technologies tailored to big data (in this case the Hadoop system) have been adopted, enabling the data and processing to be spread across several servers in order to improve processing performance and make the system more robust if one or more servers break down.

… Which Is Necessary for the Automation of Statistical Processes

From a statistical point of view, the volume of data now precludes manual processing which could be carried out in the past, generally by price collectors, and which therefore has to be automated. Three examples can be cited: being capable of classifying a product in a detailed classification system, identifying commercial relaunches and replacing a product when it is discontinued.

In scanner data, products are identified by their barcode (Figure 1) and this provides no information on the type of product tracked. Given the number of barcodes present in scanner data (nearly 9 million), it is inconceivable to search manually for which product corresponds to each. To solve this problem, INSEE is buying a dictionary of barcodes from a panelist, describing in detail the characteristics of the product associated with each barcode. Classifying products then amounts to constructing a simple correspondence table between this dictionary and the COICOP classification system.
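
A correspondence table of this kind could be sketched as below. The dictionary entries, the "family" attribute and the barcodes are hypothetical, and the COICOP codes are given at class level for illustration only; the structure of the purchased dictionary is not described in this article.

```python
# Hypothetical extract of a barcode dictionary: one entry per barcode.
barcode_dictionary = {
    "3000000000017": {"label": "Muesli 500 g", "family": "breakfast cereals"},
    "3000000000024": {"label": "Washing-up liquid 750 ml", "family": "cleaning products"},
}

# Hypothetical correspondence table from dictionary families to COICOP classes.
family_to_coicop = {
    "breakfast cereals": "01.1.1",   # bread and cereals
    "cleaning products": "05.6.1",   # non-durable household goods
}


def classify(barcode):
    """Return the COICOP class of a barcode, or None if it cannot be classified."""
    entry = barcode_dictionary.get(barcode)
    if entry is None:
        return None  # barcode missing from the dictionary
    return family_to_coicop.get(entry["family"])


print(classify("3000000000017"))  # 01.1.1
print(classify("9999999999999"))  # None
```

Returning None rather than raising an error keeps unclassified barcodes visible, so that they can be counted and reviewed separately.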

This dictionary of barcodes also enables the issue of "commercial relaunches" to be processed correctly. These relaunches involve a slight modification of the packaging of the product, often accompanied by an increase in the price, or a stable price but with the product sold being smaller in volume. These commercial relaunches generally conceal a price rise, so it is essential to be able to spot them. When data are collected in the field, it is the price collector who identifies the relaunch; with scanner data, barcodes generally change with the packaging and one has to be able, automatically, to connect the initial product to its relaunch. For Sweden, Tongur (2019) estimates the bias which could result if a relaunch were not identified in scanner data at 0.1 point. In the case of France, the existence of a dictionary of barcodes enables the necessary link to be made between a product and its commercial relaunch and the associated price rise to be recorded.
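
The effect of a relaunch on the measured price can be illustrated with a minimal sketch. It assumes that the link between the old and new barcodes is already known (here a hypothetical `relaunch_links` table) and that the dictionary gives the net content of each product; how the link is actually derived is not described in this article.

```python
# Hypothetical product descriptions: shelf price in euros and net content in grams.
products = {
    "3000000000017": {"price": 3.00, "grams": 500},  # original packaging
    "3000000000031": {"price": 3.00, "grams": 450},  # relaunch: same price, smaller pack
}

# Hypothetical link table: new barcode -> barcode it replaces.
relaunch_links = {"3000000000031": "3000000000017"}


def price_per_kg(product):
    """Price expressed per kilogram so that different pack sizes are comparable."""
    return product["price"] / (product["grams"] / 1000)


def relaunch_price_ratio(new_barcode):
    """Price change across a relaunch, measured per unit of content."""
    old_barcode = relaunch_links[new_barcode]
    return price_per_kg(products[new_barcode]) / price_per_kg(products[old_barcode])


ratio = relaunch_price_ratio("3000000000031")
print(f"{(ratio - 1) * 100:.1f}%")  # hidden price rise of about 11.1%
```

Without the link, the two barcodes would look like an unrelated discontinued product and a new product, and the rise of roughly 11% per kilogram would go unrecorded.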

Replacing goods that are in the CPI basket and that are discontinued during the course of the year is also an operation which heavily involves price collectors. They choose the replacement product to be as close as possible to the product that has been discontinued and decide whether or not it is necessary to make a quality adjustment. This operation, which is strategic for producing the CPI, can also be automated (Léonard et alii, 2017). The replacement product is drawn at random from the same consumption segment and a quality adjustment is always made by comparing the price of the replaced product with that of the replacement product over the same period: the difference in quality is estimated to be equal to the difference in price. This method of adjusting the quality is commonly used in the CPI, but the prices compared are most of the time imputed because, when use is made of price surveys, it is not generally possible to compare the price of the discontinued product and that of the replacement product during the same period. By definition, it will not have been anticipated that the product was going to be discontinued and one will not have collected the price of the replacement product before even knowing that the replaced product was going to be discontinued. Since scanner data are comprehensive, it is possible to search for the previous price of a product a posteriori.
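
The automated replacement described above can be sketched as follows. The prices are invented, the draw is assumed to be uniform over the candidates (the article does not say whether it is weighted), and the function name is hypothetical.

```python
import random


def overlap_quality_adjustment(old_price_prev, new_price_prev, new_price_now):
    """Overlap method: the price gap observed in the same period is treated as quality.

    Returns the quality ratio and the current price of the replacement product
    expressed in terms of the replaced product's quality.
    """
    quality_ratio = new_price_prev / old_price_prev
    adjusted_price_now = new_price_now / quality_ratio
    return quality_ratio, adjusted_price_now


# Hypothetical candidates in the same consumption segment, with last month's prices.
segment_prices_prev = {"B1": 2.50, "B2": 2.40, "B3": 2.65}

# Random draw of the replacement (uniform draw assumed for this sketch).
rng = random.Random(2020)
replacement = rng.choice(sorted(segment_prices_prev))

# Worked case: replaced product at 2.00 last month; replacement at 2.50 last
# month and 2.60 this month.
quality_ratio, adjusted_price = overlap_quality_adjustment(2.00, 2.50, 2.60)
price_change = adjusted_price / 2.00

print(replacement)
print(quality_ratio, adjusted_price, price_change)
```

In the worked case the 25% price gap in the overlap month is attributed entirely to quality, so the index only records the replacement product's own change of about 4%. The previous-month price of the replacement is exactly the figure that comprehensive scanner data allow to be looked up a posteriori.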

A Slightly Different Choice to That of Our European Partners

The purchase of a dictionary of barcodes and the use of big data technologies thus make it possible to maintain the concepts of the current CPI (in particular the idea of an annually fixed basket) whilst exploiting the comprehensiveness of scanner data. This situation is quite unusual in the world of statistical institutions using scanner data.

Historically, and it is still the solution used by a number of countries such as Sweden and Italy, countries which have been using scanner data to calculate the CPI have drawn a sample of products so as to be able to carry out the three elements of processing described above manually: classification, identification of relaunches and the replacement of products. The concepts of the CPI are then strictly maintained and scanner data are used as the basis for a survey in order to spot the appearance or discontinuation of products as quickly as possible or to know precisely the weighting associated with each product.

Subsequent developments sought to benefit from all of the precision provided by the comprehensive nature of scanner data and to limit manual processing associated with replacements, which very quickly becomes significant as soon as the size of the sample is increased. The previous method has thus gradually been replaced by an exploitation of scanner data which has dispensed with the annual fixedness of the basket of goods tracked: the monthly change in prices has been measured on a matched set of goods present over the course of two consecutive months; these monthly changes were then linked together. This overly frequent linking of indexes is known by price statisticians to create chain drift in the index. To avoid this, it has been necessary, at the most detailed level, to dispense with weightings, even though weightings are one of the major contributions of scanner data.
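
Chain drift can be illustrated on a toy example. The data are invented: product A goes on sale for one month, consumers stock up, then buy little the following month, and the last month is identical to the first. The weighted monthly link used here is a Laspeyres formula with the previous month's quantities, which is an assumption made for the illustration and not necessarily the formula used by the countries concerned.

```python
import math

# Each month maps a product to (price, quantity); month 3 repeats month 0.
months = [
    {"A": (1.0, 10), "B": (1.0, 10)},
    {"A": (0.5, 50), "B": (1.0, 10)},   # A on sale, consumers stock up
    {"A": (1.0, 2), "B": (1.0, 10)},    # sale over, little A is bought
    {"A": (1.0, 10), "B": (1.0, 10)},   # back to the initial situation
]


def laspeyres_link(prev, cur):
    """Monthly link weighted by the previous month's quantities, on matched items."""
    items = prev.keys() & cur.keys()
    cost_cur = sum(cur[i][0] * prev[i][1] for i in items)
    cost_prev = sum(prev[i][0] * prev[i][1] for i in items)
    return cost_cur / cost_prev


def jevons_link(prev, cur):
    """Unweighted monthly link: geometric mean of price ratios on matched items."""
    items = prev.keys() & cur.keys()
    logs = [math.log(cur[i][0] / prev[i][0]) for i in items]
    return math.exp(sum(logs) / len(logs))


def chain(link, months):
    """Multiply the successive month-on-month links."""
    index = 1.0
    for prev, cur in zip(months, months[1:]):
        index *= link(prev, cur)
    return index


chained_laspeyres = chain(laspeyres_link, months)
chained_jevons = chain(jevons_link, months)
print(round(chained_laspeyres, 4))  # 1.2857
print(round(chained_jevons, 4))     # 1.0
```

Although every price ends where it started, the weighted chained index shows a rise of almost 29%, whereas the unweighted chained index returns to 1. This is the trade-off described above: dropping the weights removes the drift but discards the quantity information.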

A new generation of indexes has finally emerged: multilateral indexes, using the GEKS (Diewert, Fox and Ivancic, 2009) or Geary-Khamis (Chessa, 2015) method, are inspired by methods used to make spatial comparisons of prices and make it possible to process the fact that baskets may differ each month. These are nevertheless less intuitive and more difficult to explain to the general public.
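
To give an idea of how a multilateral index works, here is a minimal sketch of the GEKS approach on invented data. It assumes a Fisher index computed on the products common to each pair of months as the bilateral building block, which is one common choice rather than a rule; it ignores practical aspects such as rolling windows, and it is not the method used for the French CPI.

```python
import math

# Each month maps a product to (price, quantity). Product C exists only in
# months 1 and 2, so the baskets differ from month to month.
months = [
    {"A": (1.0, 10), "B": (1.0, 10)},
    {"A": (0.5, 50), "B": (1.0, 10), "C": (2.0, 5)},
    {"A": (1.0, 2), "B": (1.0, 10), "C": (2.0, 5)},
    {"A": (1.0, 10), "B": (1.0, 10)},
]


def fisher(a, b):
    """Bilateral Fisher index from month a to month b on their common products."""
    items = a.keys() & b.keys()
    laspeyres = (sum(b[i][0] * a[i][1] for i in items)
                 / sum(a[i][0] * a[i][1] for i in items))
    paasche = (sum(b[i][0] * b[i][1] for i in items)
               / sum(a[i][0] * b[i][1] for i in items))
    return math.sqrt(laspeyres * paasche)


def geks(months, base, current):
    """Geometric mean, over every link month, of the indirect comparison
    base -> link -> current."""
    logs = [math.log(fisher(months[base], link) * fisher(link, months[current]))
            for link in months]
    return math.exp(sum(logs) / len(logs))


print(round(geks(months, 0, 3), 6))  # months 0 and 3 are identical, so 1.0
print(round(geks(months, 0, 1), 4), round(geks(months, 1, 2), 4))
```

Because every month is compared with every other month, the result does not depend on the path taken: the index from month 0 to month 2 equals the product of the indexes from 0 to 1 and from 1 to 2, and it returns to 1 when prices and quantities return to their starting values.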

Finally, What Is the Impact of Scanner Data?

Scanner data will be used for the first time to calculate the CPI published in January 2020. 30,000 prices surveyed each month by price collectors will be replaced by about 77 million products contained in the scanner data. The existing collection of data will be maintained for all other item headings.

Before using this new source of data in the production of statistics which are as important to the public debate as the CPI, it has been necessary to demonstrate its reliability, its sustainability and its contribution to the measurement of inflation. Experimentation and methodological work have made it possible to define the processing of scanner data and the integration thereof into the CPI (Leclair et alii, 2019). A specific IT application has been developed, using big data technology, to ensure the receipt of data, statistical controls and the processing to be carried out: this is because the CPI is produced under very considerable time constraints which require reliance on robust computer processing. The reliable provision of scanner data has been brought about by a decree (see above) coupled, for certain retailers, with agreements.

Finally, before scanner data are actually used in the CPI, INSEE wanted to carry out a dual process for a year: whilst the CPI published each month in 2019 relied on price surveys carried out by price collectors (and on other traditional sources used by the CPI), in parallel, a CPI was produced using scanner data.

This dress rehearsal makes it possible to get the production process up to speed; in particular, it allows the results to be compared. So what is the impact of this new source of data? The impact on overall inflation is small because, ultimately, scanner data represent only a small proportion of overall consumption (about 11%). Many products cannot be tracked using scanner data (services, or fresh produce that has no barcode) or are not tracked for methodological reasons (clothing and durable goods, for which product turnover is high and quality adjustment calls for a specific methodology). Finally, households also make purchases through forms of sale other than supermarkets and hypermarkets.
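The order of magnitude can be illustrated with a simple calculation. Since the CPI aggregates item-level indexes using consumption weights, a gap between the two sources on the items covered by scanner data feeds into the overall index roughly in proportion to their weight. The 0.5 percentage point gap used below is purely hypothetical; only the weight of about 11% comes from the text.

```latex
\Delta\pi_{\text{overall}} \approx w_{\text{scanner}} \times \delta
  = 0.11 \times 0.5 = 0.055 \ \text{percentage points}
```

Here $w_{\text{scanner}}$ denotes the share of consumption covered by scanner data and $\delta$ the assumed difference in measured inflation on those items.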

But, at a more detailed level, for items where scanner data are used more heavily, some differences can be noted. A detailed analysis of discrepancies shows that they are essentially explained by three factors:

better representativeness of the products tracked: the detailed knowledge of quantities in scanner data means that consumption segments which were previously left out, because their significance could not be established, are now tracked; these segments have their own price dynamics, which were not taken into account beforehand;
a more precise index: for the same consumption segments, price changes may differ owing to sampling error;
better tracking of prices: the better integration of special offers in scanner data (see above) reveals price changes that the traditional method of collection cannot capture.

Scanner data are therefore a promising source for calculating price statistics. In January 2020, only the price statistics currently published (the CPI and the index of prices of frequently purchased products in supermarkets and hypermarkets) will be produced. However, in the longer term, new statistics may be published: average prices for a number of products, spatial price comparisons or regional indexes. Methodological studies will continue in order to exploit scanner data in fields where they are not yet used, such as clothing or durable goods. Work will also be carried out to obtain new scanner data, for example from hard discounters or large specialist retailers.


United Nations Economic Commission for Europe (UNECE).

International Labour Office.

Also called the GTIN for Global Trade Item Number or EAN for European Article Numbering.

The data are sent to INSEE with a two-day lag.

A provisional estimate of the CPI is published on the last business day of the month.

The consumer price index is published using the COICOP (Classification of Individual Consumption by Purpose) classification system; subclasses and sub-subclasses are the last two levels in this classification system.

Article 19 of the law of 7 October 2016 for a Digital Republic.

Decree of 13 April 2017 making the electronic transmission of data for public statistical purposes mandatory.

More specifically, only 1.3 billion are actually used for the CPI (owing to the exclusion of certain products). These data are then consolidated by month and by article of equivalent class.

To offset the fact that the replacement product may be of a slightly different quality from the product that has been discontinued.

The NSIs in other European countries have no such toolkit; they generally rely on a (rather short) description of the product which can be found on receipts, and use methods of machine learning which are able to classify barcodes in a functional classification system used to calculate the CPI.

This is because it is the weightings which produce the chain drift in the index: as the weighting in consumption generally depends on the price level, a special offer period links a high weighting with a drop in price, whereas the return to the normal price is accompanied by a low weighting in consumption. The index which is chain-linked each month and uses weightings thus does not return to its original level after a special offer period, because it does not weight price rises and price falls in the same way.
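The mechanism can be reproduced on a toy example. The Python sketch below chains a weighted index month by month; the Törnqvist formula (weights equal to the average expenditure share of the two months) is an assumption chosen for illustration, and the prices and quantities are hypothetical.

```python
from math import exp, log

# Hypothetical data: barcode -> (price, quantity) for four consecutive months.
months = [
    {"A": (1.0, 100), "B": (1.0, 100)},  # normal price
    {"A": (0.5, 500), "B": (1.0, 100)},  # special offer on A, high purchases
    {"A": (1.0, 20), "B": (1.0, 100)},   # normal price again, low purchases
    {"A": (1.0, 100), "B": (1.0, 100)},  # back to the initial situation
]

def tornqvist(a, b):
    """Weighted bilateral index from month a to month b: geometric mean of
    price relatives, weighted by the average expenditure share of each product."""
    matched = a.keys() & b.keys()
    spend_a = sum(a[k][0] * a[k][1] for k in matched)
    spend_b = sum(b[k][0] * b[k][1] for k in matched)
    log_index = 0.0
    for k in matched:
        share = 0.5 * (a[k][0] * a[k][1] / spend_a + b[k][0] * b[k][1] / spend_b)
        log_index += share * log(b[k][0] / a[k][0])
    return exp(log_index)

# Index chained month by month, versus a direct comparison of the first
# and last months.
chained = 1.0
for prev, curr in zip(months, months[1:]):
    chained *= tornqvist(prev, curr)
direct = tornqvist(months[0], months[-1])
```

In this example the price fall of product A is weighted by a high expenditure share and the subsequent rise by a low one, so `chained` ends up below 1 even though the last month is identical to the first, for which `direct` equals 1.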

Further reading

BLANCHET, Didier and GIVORD, Pauline, 2017. Données massives, statistique publique et mesure de l’économie. In: L’Économie française, édition 2017. [online]. Insee Références, pp. 59-77. [Accessed 7 October 2019].

BOSKIN, Michael J., 1996. Toward a More Accurate Measure of the Cost of Living: Final Report to the Senate Finance Committee from the Advisory Commission to Study the Consumer Price Index. In: website of the United States Social Security Administration. [online]. [Accessed 7 October 2019].

CHESSA, Antonio, 2015. Towards a generic price index method for scanner data in the Dutch CPI. In: Fourteenth Meeting of the Ottawa Group (International Working Group On Price Indices). [online]. 20-22 May 2015. Tokyo, Japan. [Accessed 7 October 2019].

CLÉ, Émeline, JALUZOT, Laurence, MALAVAL, Fabien, RATEAU, Guillaume and SAUVADET, Luc, 2016. En 2015, les prix en région parisienne dépassent de 9 % ceux de la province. [online]. 14 April 2016. Insee Première, N°1590. [Accessed 7 October 2019].

DIEWERT, Erwin, FOX, Kevin J. and IVANCIC, Lorraine, 2009. Scanner Data, Time Aggregation and the Construction of Price Indexes. In: Eleventh Meeting of the Ottawa Group (International Working Group On Price Indices). [online]. 27-29 May 2009. Neuchâtel, Switzerland. [Accessed 8 October 2019].

EUROSTAT, 2017. Practical Guide for Processing Supermarket Scanner Data. [online]. September 2017. [Accessed 8 October 2019].

FMI, 2004. Manuel des prix à la consommation. Théorie et pratique. OIT/FMI/OCDE/CEE – ONU/Eurostat/Banque mondiale. Genève, Organisation internationale du travail. ISBN 1-58906-330-9.

JALUZOT, Laurence and SILLARD, Patrick, 2016. Échantillonnage des agglomérations de l’IPC pour la base 2015. [online]. January 2016. Insee, Document de travail, N°F1601. [Accessed 8 October 2019].

JANY-CATRICE, Florence, 2019. L’indice des prix à la consommation. Édition La Découverte. Collection Repères, N°717. January 2019. ISBN 978-2-7071-9931-7.

LECLAIR, Marie, LÉONARD, Isabelle, RATEAU, Guillaume, SILLARD, Patrick, VARLET, Gaëtan and VERNÉDAL, Pierre, 2019. Les données de caisses : avancées méthodologiques et nouveaux enjeux pour le calcul d’un indice des prix à la consommation. In: Économie et statistique. [online]. 17 September 2019. N°509, pp. 13-31. [Accessed 8 October 2019].

LECLAIR, Marie and PASSERON, Vladimir, 2017. Une inflation modérée depuis le passage à l’euro. [online]. 24 May 2017. Insee Focus, N°87. [Accessed 8 October 2019].

LÉONARD, Isabelle, SILLARD, Patrick, VARLET, Gaëtan and ZOYEM, Jean-Paul, 2017. Scanner data and quality adjustment. [online]. June 2017. Insee, Document de travail, N°F1704. [Accessed 8 October 2019].

LÉONARD, Isabelle, SILLARD, Patrick, VARLET, Gaëtan and ZOYEM, Jean-Paul, 2019. Écarts spatiaux de niveaux de prix entre régions et villes françaises avec des données de caisses. In: Économie et statistique. [online]. 17 September 2019. N°509, pp. 73-87. [Accessed 8 October 2019].

LEQUILLER, François, 1997. L’indice des prix à la consommation surestime-t-il l’inflation ? In: Économie et statistique. [online]. March 1997. N°303, pp. 3-32. [Accessed 8 October 2019].

MOATI, Philippe and ROCHEFORT, Robert, 2008. Mesurer le pouvoir d’achat. [online]. January 2008. Édition La Documentation française, Collection Les Rapports du Conseil d’analyse économique. [Accessed 8 October 2019].

RIVIÈRE, Pascal, 2018. Utiliser les déclarations administratives à des fins statistiques. In: Courrier des statistiques. [online]. 6 December 2018. N°N1, pp. 14-24. [Accessed 8 October 2019].

SILLARD, Patrick, 2017. Indices de prix à la consommation. [online]. 7 August 2017. Insee, Document de travail, N°F1706. [Accessed 8 October 2019].

TASSI, Philippe, 2018. Les apports des Big Data. In: Économie et statistique. [online]. N°505-506, pp. 5-15. [Accessed 8 October 2019].

UNECE, 2018. Report of the Group of Experts on Consumer Price Indices. [online]. Fourteenth session, Geneva, Switzerland. 7-9 May 2018. ECE/CES/GE.22/2018/2. [Accessed 8 October 2019].

TONGUR, Can, 2019. Inflation Measurement with Scanner Data and an Ever-Changing Fixed Basket. In: Économie et statistique. [online]. 17 September 2019. N°509, pp. 33-50. [Accessed 8 October 2019].