Courrier des statistiques N5 - 2020

Issue N5 could not ignore the specific nature of 2020: it therefore begins with an article by the Director-General of INSEE on the adaptation of the institute and its methods to the exceptional context of the health crisis. The Courrier then looks at the structuring issues of governance, through the French Official Statistics Authority, which takes stock of its ten years of existence, and the recent experience of the Official Statistics Quality Label Committee.

How to produce data useful for public decision-making? With a highly flexible cartographic representation, gridding makes it possible to better grasp the reality of territories. With an adapted communication, the indicators of added value of high schools meet the need for evaluation and internal steering, as well as the expectations of citizens and the media. With a dynamic microsimulation model on pensions, Prisme supports the legislator who wants to change the regulations.

Finally, the last article raises a simple question: what is data? Exploiting this material is the core business of the statistician, but does he really measure all its dimensions?

Courrier des statistiques
Published on: 15/09/2022
Valérie Darriau, Head of the Statistics and Urban Analysis Division, INSEE (at the time of writing)
Courrier des statistiques- September 2022

Grid Data, Innovative Tools and Methods Used to See the Reality in the Territories


Grid data are data disseminated on an original lattice: one that does not correspond to any known administrative or historical division but consists of squares whose sides can range from 200 metres to several kilometres. In urban areas, when municipal boundaries are too imprecise to analyse demographic or socio-economic phenomena, assembling grid cells can provide valuable information.
In order to produce this type of data, INSEE has to meet several challenges: geolocate the information to attach it to grids, develop a method that guarantees the protection of privacy and respect for statistical secrecy, and make this data available in a form that can be used by experts, but also by amateurs, who are curious to get to know their territory better and to have a quick and enlightening overview of it.

With a few examples of use for the deployment of public policies, the article illustrates the techniques that have been implemented to enable the dissemination of data from socio-fiscal sources in 2019. As in a mosaic, a single “tile” does not make sense on its own: it is the proximity with its neighbours that allows reality to take shape and the territory to reveal the richness and complexity of the phenomena that run through it.

In 2013, INSEE released an initial set of indicators for a new geographical grid: the cell. This release saw great success among urban planning authorities and urban analysis specialists. Expectations were high for the updating and enrichment of these data. In 2019, INSEE responded to this request and expressed its desire to facilitate access to and use of these data. The issues behind the production of such data are more complex than they appear. First and foremost, grid data require a methodology that guarantees confidentiality and data quality. INSEE’s Department of Statistical Methods has undertaken some innovative work to meet this requirement (Branchu, Costemalle and Fontaine, 2018; Costemalle, 2018). Listening to users has also played an important role in making the data available to urban analysis experts and the general public in the form best suited to their needs.

Statistical Blocks and IRISes, the First Statistical Breakdowns for the Municipal Territory

Knowledge of a territory requires the use of statistical data on a fine geographical scale. More often than not, it is the municipal level that is used as the basic block for establishing survey areas and responding to specific issues: living zones, employment areas, urban units and, more recently, city catchment areas (de Bellefon, Eusebio, Forest, Pégaz-Blanc and Warnod, 2020). In fact, municipalities serve to partition the national territory and have a rich supply of data, thanks in particular to the population census.

However, in cities and conurbations, the scope of public intervention and territorial projects rarely corresponds to the municipal boundaries alone and requires information at even finer levels.

Up until 1999, INSEE disseminated the statistical data resulting from each survey at the level of statistical blocks, a scope that corresponded to a block of houses. At the same time, during the nineties, INSEE worked with the largest municipalities to establish a dissemination mesh known as IRIS, used in particular for the more “sensitive” variables of the census: each IRIS grouped together adjacent statistical blocks within the same municipality, establishing something resembling “neighbourhoods”.

Although it is of use for fine urban analysis, the division of the municipal territory into statistical blocks varies from census to census, making it difficult, if not impossible, to analyse changes in urban phenomena. Since the introduction of the updated census in 2004, census collection is no longer exhaustive in the large municipalities, but instead takes place on the basis of samples. In order to guarantee the robustness of the data disseminated, the mesh of statistical blocks was dropped in favour of the single IRIS scale for the dissemination of sub-municipal data. As the urban territory has gradually changed, so too have the contours of the IRISes, with the result that they are no longer systematically nested within the former statistical block mesh, i.e. that used for the 1999 census.

 

The Grid Cell, a Neutral, “Simple and Practical” Mesh for Urban Analysis

For each broad category, whether it be housing, transport, local facilities or health, the urban environment is analysed according to specific divisions. In order to identify, for example, the population living near a railway station, along an infrastructure or exposed to noise, the analytical perimeters must be accurate. Their approximation via statistical block meshes or IRISes does not fully satisfy local stakeholders. Their contours do not remain stable over time and their geometry is variable: small in the city centre and much larger on the outskirts. Furthermore, these irregular contours give rise to a geographical phenomenon known as the Modifiable Areal Unit Problem (MAUP): irregular shapes and administrative boundaries that do not necessarily reflect the actual spatial distributions studied make it difficult to compare spatial units that are unevenly subdivided (Loonis and de Bellefon, 2018).

As a result, a new technique was developed during the eighties: the grid system (Delahaye, 1987). The principle behind it is to divide the territory into small squares, identical in size, for which information is created that then simply needs to be aggregated for the territory of interest. It appears “simple and practical” to use (Certu and CETE Normandie Centre, 2011), and it allows for easier spatial and temporal comparisons.

However, the difficulty lies in the availability of data: how can we gather data for these cells that are useful for analysis when they are generally collected at the level of the municipality or of statistical blocks of different sizes and with variable geometries? Methodologies were then developed in order to disaggregate or distribute the available information at the neighbourhood level via cells (Lajoie, 1992). However, finer division results in smaller and more numerous cells and larger databases, which the computer processing capabilities of the time could not handle. These difficulties have since been overcome, and this means of presenting the data has become more widespread, thanks to the development of computer technologies and the introduction of sub-municipal geolocation and geographical information systems.

 

The Grid Cell, a Stable Foundation on Which to Build Zones of Interest

Resulting from the division of a territory in accordance with an even grid, each cell, when looked at individually, has no geographical “meaning”; it does not reflect any known territorial reality. However, when combined with neighbouring cells, it allows a zone of interest to be reconstituted.

This method was duly used by the Observatory of Grand Paris station neighbourhoods established by the Paris Urbanism Agency (APUR) with a view to characterising the neighbourhoods of future stations in the Paris conurbation. In order to analyse the Ardoines district, which comprises a circular area with a radius of 800 metres around the Vitry-sur-Seine train station, the APUR implemented two different approaches (Figure 1). The first consisted of using grid data by selecting the cells intersecting with the perimeter (even if only by a small amount). This resulted in the creation of a group of 63 cells: its total surface area (2.5 km²) is close to that of the circular area (2.0 km²). The second approach taken by the APUR was to use the IRISes intersecting the circular area. They have highly variable perimeters and cover a total area of 3.7 km², almost double the area of the neighbourhood initially analysed. In this case, the cells allow for the reconstitution of statistics that are closer to the actual situation in the neighbourhood when compared with the IRISes.

The other advantage of building the observation area from cells is that analyses over time can be carried out on a constant perimeter, which is useful for analyses covering a long period.
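The cell-based selection described above can be sketched in a few lines. This is only an illustration, not the APUR's actual procedure: the function name, the station coordinates and the alignment of the grid on multiples of 200 m are assumptions made for the example.

```python
import math

def cells_intersecting_circle(cx, cy, radius, size=200):
    """Return the lower-left corners of the grid cells that touch a disc.

    Assumes a grid whose cells are aligned on multiples of `size` metres.
    """
    x_min = math.floor((cx - radius) / size) * size
    x_max = math.floor((cx + radius) / size) * size
    y_min = math.floor((cy - radius) / size) * size
    y_max = math.floor((cy + radius) / size) * size
    cells = []
    for x in range(x_min, x_max + size, size):
        for y in range(y_min, y_max + size, size):
            # Point of the square [x, x+size] x [y, y+size] closest to the centre
            nx = min(max(cx, x), x + size)
            ny = min(max(cy, y), y + size)
            if (nx - cx) ** 2 + (ny - cy) ** 2 <= radius ** 2:
                cells.append((x, y))
    return cells

# Hypothetical station position, in metres, in a projected coordinate system
cells = cells_intersecting_circle(cx=1090, cy=1130, radius=800)
cell_area_km2 = len(cells) * (200 * 200) / 1e6
disc_area_km2 = math.pi * 800 ** 2 / 1e6
print(len(cells), round(cell_area_km2, 2), round(disc_area_km2, 2))
```

Because every cell that touches the disc is kept, the area covered by the cells is always at least that of the disc. The exact number of cells depends on where the centre falls relative to the grid.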

 

Figure 1. Cells Allow an Area of Interest to be Examined in More Detail Than IRISes

 

 

Cells, but What Kind?

In the early 1980s in Great Britain, an atlas was used to present the results of the 1971 census on a scale using cells measuring 1 km² and 10 km² (OPCS, 1980). In other countries, grid data are being developed in other disciplines (medicine, geology, botany, biology, etc.). However, the size of the cells and the position of the grid used are variable and specific to the territories analysed.

By means of the European INSPIRE Directive (2007/2/EC), the European Union wished to establish “an infrastructure of geographical data to ensure interoperability between databases and to facilitate the dissemination, availability, use and re-use of geographical information in Europe”. With regard to cells, it aimed in particular to create a “harmonised, multi-resolution grid with a common point of origin and harmonised cell positioning and size” (CNIG, 2020). As a result, it became possible for the “French grid” to be juxtaposed with the German or Italian grid in accordance with a standard and compatible scheme.

In order to find a unique grid cell, you must first know the size of the grid in which it is located. Will the cell have sides of 200 metres or of 2 kilometres? Its identifier must specify this information (European Commission, 2010). Once this resolution is known, INSPIRE uses a convention by which a cell is identified by its lower left corner. Indeed, armed with this single point and the resolution of the cell (200 m, for example), you can immediately draw the corresponding cell, starting from the corner and going east for 200 m and then north for 200 m. In theory, these two elements would therefore be all that is needed to find the cell were it not for one small issue: the geographical coordinates actually depend on the coordinate reference system used. The INSPIRE identifier must therefore also specify this (Figure 2).
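As an illustration, the three ingredients of the identifier (reference system, resolution, lower left corner) can be combined and decoded as follows. The exact string layout used here, of the form `CRS3035RES200mN...E...`, and the default EPSG code 3035 are assumptions made for the sketch; the function names are hypothetical.

```python
import re

def inspire_id(east, north, size_m=200, epsg=3035):
    """Build an identifier from the lower-left corner of a cell (assumed layout)."""
    return f"CRS{epsg}RES{size_m}mN{north}E{east}"

def parse_inspire_id(cell_id):
    """Recover the reference system, the resolution and the four corners."""
    match = re.fullmatch(r"CRS(\d+)RES(\d+)mN(\d+)E(\d+)", cell_id)
    if match is None:
        raise ValueError(f"unrecognised identifier: {cell_id}")
    epsg, size, north, east = map(int, match.groups())
    # Start from the lower-left corner, go east by `size`, then north by `size`
    corners = [
        (east, north),
        (east + size, north),
        (east + size, north + size),
        (east, north + size),
    ]
    return {"epsg": epsg, "size_m": size, "corners": corners}

cell_id = inspire_id(east=3760000, north=2890000)  # arbitrary example corner
info = parse_inspire_id(cell_id)
print(cell_id, info["corners"])
```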

Once the cells have been identified, the next step is to link them to the statistical information.

 

Figure 2. How INSPIRE Identifies Cells

 

 

Cells + data = grid data?

The equation looks easy, but it is not so simple to solve. Where do the most vulnerable populations live? Elderly people? Where are second-hand dwellings located? The answers to these questions are present in fiscal, administrative and management files, etc. The individual and detailed information must be converted into aggregated data for the entire cell. For this to be achieved, the files must include a geographical indicator that allows the information to be precisely linked to a cell. Some already come with precise geographical coordinates: this is the case for files that are fiscal in origin (housing tax, for example), which, in addition to statistical information (population, income, etc.), also include an identifier for the cadastral parcel on which the dwelling or tax household in question is located.

As can be seen from the example in Figure 3, each parcel also has a label, represented by a point for which the coordinates are known. This point is generally located within the parcel: it is therefore simply a case of linking this point to the cell in which it appears to position the information for the parcel within the cell.
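Linking a parcel label to its cell, then aggregating, amounts to a floor division of the coordinates. The sketch below assumes a grid aligned on multiples of 200 m and uses invented records; the field layout (x, y, households, individuals) is hypothetical.

```python
from collections import defaultdict

def cell_of(x, y, size=200):
    """Lower-left corner of the cell containing the point (x, y)."""
    return (int(x // size) * size, int(y // size) * size)

def aggregate(records, size=200):
    """Sum household and individual counts per cell.

    `records` is an iterable of (x, y, households, individuals) tuples,
    one per parcel label.
    """
    totals = defaultdict(lambda: {"households": 0, "individuals": 0})
    for x, y, households, individuals in records:
        cell = cell_of(x, y, size)
        totals[cell]["households"] += households
        totals[cell]["individuals"] += individuals
    return dict(totals)

# Invented parcel labels: the first two fall in the same 200 m cell
records = [
    (1010.0, 2150.0, 1, 3),
    (1190.0, 2010.0, 2, 5),
    (1210.0, 2010.0, 1, 1),
]
grid = aggregate(records)
print(grid)
```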

Although it appears simple on the surface, this procedure can sometimes prove to be rather complex. In a few very rare cases, the parcel label may be positioned outside of the parcel or even the municipality. It must then be located and its position corrected if possible. Depending on the way in which division has been carried out, there may also be cases in which the information is located within a cell that only covers a very small portion of the parcel. Consequently, in the example shown in Figure 3, cell no. 1 covers parcel nos. 0048 and 0383, together with a part of parcel no. 0049, on which the (presumably inhabited) building has been constructed; however, the latter parcel is linked to the neighbouring cell no. 4. Ultimately, the statistical information in cell no. 1 will only relate to a single residence (located in parcel no. 0048), even though the cell actually contains two residences (the one in parcel no. 0048 and the one in parcel no. 0049).

 

Figure 3. Grid Representation of Fiscal Data Linked to a Cadastral Parcel

 

 

Positioning the Information in the Correct Cell: the Postal Address Challenge

More often than not, the statistical files simply contain a postal address by way of geographical information: the addresses of persons receiving family benefits or living in social housing, etc. It is therefore necessary to pinpoint those addresses precisely on a map to enable the corresponding data to be linked to a cell.

For this to be achieved, the address must be geolocated, i.e. the string of characters making up the postal address must be recognised in a “repository”, a directory of sorts that contains the numbers and street names of all municipalities in France, together with their precise location. Once the character string has been found, the geographical coordinates present in the repository can be linked to the statistical information.

In urban areas, addresses are largely standardised: they are made up of a number, a street type (avenue, rue, etc.), a street name and a postal code or municipality code. In such cases, the address is unique. The difficulties therefore lie in correctly identifying the character strings: for example, it must be possible to link an abbreviation of the street name, such as “245 rue du Dr Fiolle (Marseille)” to the name in the repository “245 rue du Docteur Fiolle”.
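The matching step can be illustrated with a very small normalisation routine. This is a sketch under stated assumptions, not INSEE's geolocation process: the abbreviation table, the one-entry repository and its placeholder coordinates are all invented for the example.

```python
import unicodedata

# Hypothetical abbreviation table; a real one would be much longer
ABBREVIATIONS = {"dr": "docteur", "av": "avenue", "bd": "boulevard"}

def normalise(text):
    """Lower-case, strip accents and punctuation, expand abbreviations."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.lower()
    for ch in ",.-'’":
        text = text.replace(ch, " ")
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())

# Toy repository: (normalised address, normalised municipality) -> coordinates
REPOSITORY = {
    (normalise("245 rue du Docteur Fiolle"), normalise("Marseille")): (0.0, 0.0),
    # (0.0, 0.0) is a placeholder, not the real location
}

def geolocate(address, municipality):
    """Return the repository coordinates for an address, or None if not found."""
    return REPOSITORY.get((normalise(address), normalise(municipality)))

print(geolocate("245 rue du Dr Fiolle", "Marseille"))
print(geolocate("1 rue Inconnue", "Marseille"))
```

Exact lookup after normalisation only handles the simplest variations; misspellings or missing numbers would need approximate matching, which is outside the scope of this sketch.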

In rural areas, the difficulties encountered are often very different, as not all of the addresses are standardised: this is very often the case for place names. It is practically impossible to pinpoint the precise location of a household (Figure 4) that gives its address as “Bussac Bas, hameau de Siaugues-Sainte-Marie”, since there are several homes that use this address. In such cases, the address to which the corresponding statistical information is to be linked must be chosen at random.

Regardless of the method used, the cadastral parcel label or the geolocation of postal addresses, the statistical information is ultimately positioned precisely on a map and linked to the corresponding cell. We then obtain databases known as grid data: these are the databases that will subsequently be used by public and private stakeholders to shed light on specific issues.

 

Figure 4. The Geolocation of Postal Addresses is More Complex in Rural Areas

 

 

Grid Data, Useful for Guiding Public Decision-Making

From the European Commission to the French local authorities, everyone is calling for precise information on urban issues.

At the European level, according to , grid data “present numerous advantages. As they provide data at a high spatial resolution and with a standardised shape and size, these data can be combined in a transparent manner with data from neighbouring countries. The European Commission relies heavily on grid data in order to analyse access to services, such as transport, education and health care, including services located on the other side of an international border. Moreover, grid data play an essential role in assessing exposure to pollution and natural hazards and can help to guide emergency services”.

At the local level, the primary users of grid data are urban planning authorities: “Thanks to the fine mesh of the grid, the Bordeaux Aquitaine Urban Planning Agency has been able to produce rich and useful territorial analyses”, explains . “In the Operational Master Plan for Metropolitan Travel, key information was provided with regard to populations who have a public transport service nearby. We have also been able to qualify the population density for the metropolitan RER study and have even managed to analyse and compare neighbourhoods that had previously been poorly identified by the IRIS division”.

Other stakeholders are also fans of this information: consultancy firms, researchers, students and local authorities that are less equipped to manipulate data; all of these are also requesting easily accessible data, cartographic media and tools that allow them to easily manipulate these.

 

Striking a Balance Between the Richness of the Information...

For the moment, the grid data disseminated by INSEE are limited to those provided by the FILOSOFI source. By making use of fiscal and social data, this source, which was developed for statistical purposes, makes it possible to not only provide indicators on standard of living, inequality and poverty, but also socio-demographic data, at a fine local level, thereby meeting some of the needs expressed by users. These data also make it possible to shed more light on demographic issues (early childhood, elderly persons, etc.), social issues (poverty, single parent families), educational issues (school attendance, college attendance, etc.), environmental issues (age of dwellings), and urban issues (social housing, home ownership), etc.

Thanks to this source, it is possible to link a great deal of information to a cell:

  • information on individuals (number, age brackets, etc.);
  • information on households (number, size, standard of living, ownership status, single parent families, etc.);
  • housing characteristics (shared housing, social housing, homes, construction dates).

As a result, there is a strong temptation to start cross-referencing this information, for example to establish the number of poor households in social housing. However, the fineness of the distribution mesh (cells measuring 200 m) demands special precautions in order to protect privacy and to comply with statistical and fiscal confidentiality.

 

... and the Management of Statistical Confidentiality

Statistical confidentiality concerns the protection of individuals from any dissemination of individual data and from the possibility of statistical data being traced back to them.

Fiscal confidentiality governs the use of data from fiscal sources, which includes the FILOSOFI tool used by INSEE. It demands that statistical information only be disseminated for aggregations of at least 11 tax households.

In order to comply with these provisions, INSEE’s Department of Statistical Methods has drawn up a specific and original grid methodology (Branchu, Costemalle and Fontaine, 2018), which results in the dissemination of grid data on the basis of two different types of grid (INSEE, 2019).

In fact, the grids of cells that have been discussed so far have been implicitly “regular”, i.e. the cell size remains identical throughout. However, from the point of view of confidentiality, the method used leads us to reconsider this assumption. This is because, although the threshold of 11 households is usually met within a 200 m cell in densely populated areas, it is more difficult to achieve in sparsely populated ones: 79% of the cells in Metropolitan France, Martinique and Réunion Island include fewer than 11 households. In some cases, cells up to 32 km in size are needed in order to adequately cover a populated area, as well as to ensure that the information disseminated at this scale guarantees confidentiality.

An initial method known as the “natural level” will therefore be used to adapt the size of the cell to the number of inhabitants it contains. The second kind of grid is the more intuitive regular grid: in order to manage confidentiality within this type of grid, it is necessary to accept that the data within some of the cells will be modified.

 

The Grid with Cells of Different Sizes: the “Natural” Level

The natural level grid represents the division of the territory into cells of different sizes (from 200 m up to 32 km), which allows all of the information to be disseminated in compliance with fiscal confidentiality.

More specifically, we start by covering the territory with 32-km cells, the size required in order to guarantee that there are at least 11 households in each of those cells. They are then split in 4 to create cells measuring 16 km, within which the number of households present is counted. If any one of those cells contains fewer than 11 households, the grid will not be divided at that level. The divisions continue until:

  • either the cells obtained measure 200 m;
  • or the next division would result in one or more cells not meeting the confidentiality threshold of 11 households.

In sparsely populated areas, the division stops at an early stage, i.e. with large cells, as can be seen in the example shown in Figure 5 to the west of the Bordeaux conurbation. In very densely populated areas, such as city centres, data will be available down to 200 m.
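The successive divisions can be written as a short recursive function. This is a minimal sketch rather than INSEE's implementation. Two points are assumptions: the sequence of cell sizes (halving from 32 km down to 1 km, then a direct step to 200 m), and the treatment of empty sub-cells, which are counted here as falling below the threshold, following the text literally.

```python
LEVELS = [32000, 16000, 8000, 4000, 2000, 1000, 200]  # assumed cell sizes, in metres
THRESHOLD = 11

def natural_cells(points, x0, y0, levels=LEVELS, threshold=THRESHOLD):
    """Divide the cell with lower-left corner (x0, y0) as far as confidentiality allows.

    `points` holds one (x, y) pair per household, all located inside the cell.
    Returns a list of (x, y, size, households) tuples.
    """
    size = levels[0]
    if len(levels) == 1:
        return [(x0, y0, size, len(points))]  # finest level reached
    child = levels[1]
    n = size // child
    groups = {(x0 + i * child, y0 + j * child): []
              for i in range(n) for j in range(n)}
    for x, y in points:
        key = (x0 + int((x - x0) // child) * child,
               y0 + int((y - y0) // child) * child)
        groups[key].append((x, y))
    # Stop if the next division would leave any sub-cell below the threshold
    if any(len(group) < threshold for group in groups.values()):
        return [(x0, y0, size, len(points))]
    result = []
    for (cx, cy), group in groups.items():
        result.extend(natural_cells(group, cx, cy, levels[1:], threshold))
    return result

# Toy example on two levels only: a 400 m cell split into four 200 m cells
dense = [(i * 200 + 50, j * 200 + 50)
         for i in range(2) for j in range(2) for _ in range(11)]
print(natural_cells(dense, 0, 0, levels=[400, 200]))
```

Removing a single household from one quadrant of the toy example is enough to stop the division, and the whole 400 m cell is then returned as a single unit.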

This first level of dissemination makes it possible to guarantee the accuracy of all of the data disseminated for each cell. Nevertheless, it does not really lend itself to the cartographic presentation of the data: the sparsely populated cells cover a very large surface area, which draws the eye of the viewer, while the dense cells within the city centre are barely visible, which reinforces the MAUP effect mentioned earlier (Floch, 2012).

In addition, it is dependent on the source and will therefore change if other statistical sources – or another set of data from that same source – are disseminated. It is therefore not possible to superimpose natural level grids from two different sources.

 

Figure 5. Example of Division into Cells at the “Natural” Level in the Bordeaux Conurbation

 

 

The Grid of Cells Measuring 200 m... Or the “80/20 rule”

The more familiar second grid type consists of dividing the territory evenly into fixed cell sizes.

This grid offers several advantages. First of all, it provides a division that can be used for any source. It also allows for the retrieval of information available at a much finer geographical level than is permitted by the natural level. Indeed, although the natural level division guarantees the dissemination of precise data, it does not optimise the information disseminated.

Take, for example, the fictitious situation shown in Figure 6. The 1 km cell contains 555 households. However, during the division at 200 m, 14 cells were identified that contained fewer than 11 households (shown in orange): in this case, the natural level is therefore the 1 km cell.

Yet, it can be seen that, at the 200 m level of division, 11 cells are above the threshold and contain 450 households, so 81% of the total. The information could be disseminated for these cells without breaching confidentiality; however, the natural level does not permit this.

On the other hand, the information present within the cells containing fewer than 11 households must be processed. The first option considered could be to “blank them out”, i.e. to not disseminate any values for these cells. However, this would result in the values of the cell measuring 1 km being different from the sum of the values of its constituent 200 m cells. The second option is therefore to retrieve the information for the cells that cannot be disseminated and to distribute it “randomly” within the 1 km cell. This process then guarantees an increase in the amount of information and the consistency of the totals between the different dissemination levels, but also results in the presence of modified data for the cells that cannot be disseminated. It is therefore crucial that the user is made aware of the method used and that they are able to distinguish between actual and imputed values. In the grid data file for Metropolitan France, 80% of 200 m cells are imputed, but they only represent 20% of the population.
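The second option can be sketched as follows. The text does not spell out how the random redistribution is carried out, so the mechanism below is an assumption made purely for illustration: the household values of all cells under the threshold are pooled, shuffled and handed back to those same cells, which are then flagged as imputed.

```python
import random

THRESHOLD = 11

def protect(cells, seed=0):
    """Redistribute at random the values of cells below the threshold.

    `cells` maps each 200 m cell of one 1 km cell to a list of household
    values (an income, for example). Returns, for each cell, the household
    count, the total and an `imputed` flag.
    """
    rng = random.Random(seed)
    small = [cid for cid, values in cells.items() if len(values) < THRESHOLD]
    pooled = [value for cid in small for value in cells[cid]]
    rng.shuffle(pooled)
    out = {}
    for cid, values in cells.items():
        if cid in small:
            # Same number of households, but values drawn from the shuffled pool
            values = [pooled.pop() for _ in range(len(values))]
        out[cid] = {
            "households": len(values),
            "total": sum(values),
            "imputed": cid in small,
        }
    return out

# Invented values for three 200 m cells of the same 1 km cell
cells = {"a": [10] * 12, "b": [1, 2, 3], "c": [100, 200]}
protected = protect(cells)
print(protected)
```

Cells above the threshold keep their actual values, the sum over the 1 km cell is unchanged, and the `imputed` flag lets the user tell actual values from modified ones.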

 

Figure 6. Regular Cells and Treatment to Ensure Confidentiality

 

 

Sensitive Variables: Poverty and Standards of Living

The methodology developed ensures that no information relating to fewer than 11 households is disseminated. Nevertheless, when it comes to information on poverty and income, INSEE wanted to apply some additional precautions:

  • for cells containing at least 11 households in which more than 80% of the households are poor, the number of poor households disseminated has been capped at 80% of the households in the cell;
  • for the distribution of standards of living, extreme values have also been given special treatment, known as winsorisation, which avoids sensitivity to extreme values in the distribution. In practice, having calculated the standard of living for each individual, we look at the distribution of those standards of living across a given department:
    • if an individual’s standard of living is above the 95th percentile of the departmental distribution, the standard of living is lowered to that threshold [for example, in Ain, where this threshold is €54,680, a standard of living of €60,000 per year is reduced to €54,680];
    • conversely, if an individual’s standard of living falls below the 5th percentile of the departmental distribution, the standard of living is increased to that threshold [staying with the example of Ain, where this threshold is €9,010, a standard of living of €8,000 per year is increased to €9,010];
    • if an individual’s standard of living falls between these two thresholds, no treatment is applied.
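Both precautions are easy to express in code. The sketch below is illustrative only: the function names are hypothetical, and the percentile calculation relies on Python's `statistics.quantiles`, whose interpolation rule may differ from the one actually used.

```python
import statistics

def departmental_thresholds(values):
    """Return the 5th and 95th percentiles of a departmental distribution."""
    cuts = statistics.quantiles(values, n=20)  # 19 cut points, every 5%
    return cuts[0], cuts[-1]

def winsorise(values, low, high):
    """Bring values below `low` up to `low` and values above `high` down to `high`."""
    return [min(max(value, low), high) for value in values]

def cap_poor_households(n_households, n_poor, max_share=0.8):
    """Cap the number of poor households at `max_share` of the cell's households."""
    return min(n_poor, max_share * n_households)

# Thresholds quoted in the text for Ain
print(winsorise([8000, 30000, 60000], low=9010, high=54680))
print(cap_poor_households(n_households=20, n_poor=19))
```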

This treatment serves to protect individual information, while also preserving information that is of use for territorial analysis.

All of these methodological treatments have been declared to INSEE’s data protection officer, and the conditions for the protection of personal data can be accessed on INSEE’s website (INSEE, 2020a).

 

Databases for the Informed User that Allow them to Express their Creativity

Once the data are ready to use, positioned within each cell and “anonymised”, the only thing left to do is... to use them. Some users are experts in data processing and mapping software: they want access to raw data that they can then manipulate as they see fit in order to present them in the manner that best meets their needs. They appreciate the flexibility offered by the data, which form as many blocks as one could wish to build with in order to create original spatial representations.

For these specialists, adapted dissemination formats have been used, such as the shapefile format, which is widely used in cartographic analysis, but “proprietary”, or the geopackage format, which is larger, but free.

Boris Mericskay, who created the cartographic representations in Figure 7, stresses that “these highly original data make it possible to show and understand territorial population dynamics, socio-economic dynamics such as median income, age brackets or the age or dates of construction of housing on different scales (from the entire country down to individual neighbourhoods, via municipalities). The proposed modelling, namely the application of a regular mesh to the territory, also makes it possible to explore new ways of presenting the data and geovisualising geographical data from official statistics”. However, these databases are large to download and difficult to use. “The only drawback I can point to is the fact that INSEE has made these data available as a single file, which is too large for use by non-specialists”, he adds.

 

Figure 7. Some Examples of the Use of Grid Data by Internet Users

 

 

Maps on Geoportal or on Insee Local Statistics Website

In order to democratise access to data for less informed users, a map has been made available on the Geoportal for all cell sizes (IGN, 2020).

To allow for navigation around the territory and zooming in on very small cells, the IT infrastructure must be dimensioned accordingly to ensure that the display is fluid. The Geoportal also offers the possibility of mobilising grid maps with many other layers in the background, which serve to enrich the information provided by the data. This could be relief, communication networks or, for example, the flood plains of the Seine (Figure 8), which can be cross-referenced with the density of the population at risk.

The data are also available (with a mesh of 1 km²) on the INSEE website dedicated to local statistics (INSEE, 2020b).

However, as proof that intermediate solutions between database provision and cartography are possible, some expert users have developed tools that make full use of the flexibility of these data, allowing them to be explored, selected, aggregated or even downloaded solely for an area of interest, such as OpenDataSoft Explore (ODS, 2020) or France in pixels (Francepixel, 2020).

 

Figure 8. Map Showing the Highest Known River Levels in the Seine Basin and Population Density in Cells

 

 

Grid Data, a Magnifying Glass With Which You Can Explore Your Own Territory...

Where statistical data is disseminated in the form of tables, graphics or databases, it is difficult for a user to compare reality with the data that they are using.

Democratising access to the information provided by the data included within the cells means offering any user the possibility of going to look at a place that they know, which comes with two major risks.

The first risk is that the information displayed may lead the user to believe that it reveals their sensitive data. Indeed, it is natural to believe that the information within the more sparsely populated cells is the actual data. In order to avoid this erroneous perception, INSEE and the IGN (which runs the Geoportal) have worked to make people aware of the treatments applied to these data. Therefore, if the number of households within a 200-metre cell is fewer than 11, the cell is cross-hatched to indicate that the data are imputed. Furthermore, the information bubble for the cell in question includes the following notice alongside the statistical data: “These data have been modified for reasons of confidentiality.” Finally, communication campaigns, including in particular an educational video (INSEE, 2019b), were run in order to explain the methodology used to guarantee confidentiality.

However, at the time of dissemination of the data, and although the emphasis was placed on this point in the documentation, some users were concerned to find information on the map that they felt was too detailed or revealed personal data. Responses were provided detailing the measures taken to ensure confidentiality.

 

... But Not Microscopically

The second risk relates to the fact that grid data must be used to describe a sufficiently dense area, made up of multiple cells. In this respect, a fine mesh is well suited to urban analysis. The value displayed for a single cell is of little statistical interest, yet the fineness of the information often draws the user's attention to it when it relates to a familiar place. Moreover, grid data still suffer from a lack of precision, largely linked to the way in which the information is located.

In rural areas, the classic example is that of large cadastral parcels that comprise a dwelling in the middle of a field or close to a forest. Due to its large size, the parcel will be covered by multiple cells, but the statistical information will only be located within one of them. Sometimes, this “inhabited” cell is located several hundred metres from the dwelling in question, in a lake or forest.

This phenomenon also occurs in urban areas. In a large cadastral parcel comprising several high-rise buildings, the cadastral label may be positioned in a cell located in one part of the parcel one year and then in the neighbouring cell the following year, but still within the same parcel. An analysis of change over time will then show a significant decrease in population in the first cell and an increase of the same size in the neighbouring cell, even though the population of the parcel has not moved.

Information on such a fine scale must therefore be interpreted with caution: the primary interest of these data is to allow for the analysis of dense urban areas made up of several cells.
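The effect of a cadastral label moving from one cell to its neighbour can be illustrated with a small Python sketch. The cell names and population figures below are invented for the example, and the two functions are hypothetical helpers rather than an INSEE method: they simply compare the change measured cell by cell with the change measured over a group of cells.

```python
def change_by_cell(pop_year1, pop_year2):
    """Population change for each cell between two years."""
    cells = set(pop_year1) | set(pop_year2)
    return {c: pop_year2.get(c, 0) - pop_year1.get(c, 0) for c in sorted(cells)}

def change_for_group(pop_year1, pop_year2, cells):
    """Population change over a group of cells taken together."""
    return sum(pop_year2.get(c, 0) - pop_year1.get(c, 0) for c in cells)

# Invented figures: the same 800 inhabitants are attached to cell_A in the
# first year and to the neighbouring cell_B in the second year.
year1 = {"cell_A": 800, "cell_B": 0}
year2 = {"cell_A": 0, "cell_B": 800}

per_cell = change_by_cell(year1, year2)
grouped = change_for_group(year1, year2, ["cell_A", "cell_B"])
```

Cell by cell, the comparison suggests a loss of 800 inhabitants in one cell and a gain of 800 in the other. Over the two cells together, the change is zero, which is why the analysis is more robust on areas made up of several cells.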

 

What are the Future Prospects for Grid Data?

The dissemination of grid data in 2019 paves the way for the integration of other statistical sources within the grids of cells. The production process is now described and documented. It must be possible to meet the needs of users who want the themes initially covered by the grid (housing, age distribution of the population) to be extended to other fields of use for development policies (employment, transport, environment, etc.). “Although updates to the population data and their characteristics are always eagerly awaited, employment data also attracts the same degree of anticipation or even demand”, explains Caroline De Vellis, “it is also difficult for us, even when aggregating multiple cells, to obtain valuable information, such as the cross-referencing of variables, which are reserved for larger meshes”. This requires the mobilisation of new statistical sources, geolocation of those sources and handling of the confidentiality issues. These sources include those that relate to the productive system, salaried employment, or even the population census (Box 1).

As regards the latter source, the deadlines will be tight, as Eurostat wants the results of the 2021 census to be displayed on a 1 km² grid at the level of the European Union (Eurostat, 2019). The European body emphasises that this dissemination mesh will allow it to better meet the “ever-changing expectations of users, who are placing increasing importance on the availability of detailed data at the local level. This will allow for much more flexible analyses, even across borders, that can be adapted based on political and research needs”.

Between European orders and local needs, the demand for grid data for territorial studies and analyses is increasing. INSEE has already taken a significant step in this direction with the dissemination in the summer of 2019 of the FILOSOFI 2015 grid data. This experience will allow it to continue along the path towards the more systematic dissemination of data on this new geographical grid. Between databases for advanced users and visualisation in the form of open data, progress still needs to be made to offer intermediate functions and to allow the flexibility offered by this mosaic of localised information to be used to its full potential.

 

Box 1. Grid Representation of the Results of the 2021 Population Census

The results will be presented on a grid to respond to the European demand for the provision of population data in 1 km² cells; this will be done for the first time within the scope of the 2021 Census (Eurostat, 2019). For the French census, this presents two major challenges:

  • the first involves the geolocation of places of residence in municipalities with fewer than 10,000 inhabitants. There are several methods that can be used for this. The first consists in geolocating the addresses appearing on the collection documents; however, this poses difficulties in municipalities where standard addresses are not used (in rural areas for example). The second method, known as the “probabilistic” method, links together the census and the fiscal files based on the personal characteristics of individuals;
  • in municipalities with more than 10,000 inhabitants, housing is already geolocated thanks to the Identified Buildings Index (RIL). However, the census is conducted there every year on a sample basis. The second challenge is therefore to make reliable estimates for the cells within these municipalities, in spite of the non-exhaustive nature of the census in these areas. Methods to obtain quality results are currently being tested.

These two challenges will be addressed as part of a project funded by Eurostat. The work carried out for the 2021 Census will eventually have an impact on the census production system. The aim is to go beyond the response to the European regulation by “industrialising” the geolocation of the census, so that national population and housing data can be produced on the basis of the census and grid data from the census can be disseminated on www.insee.fr over the long term.

Box 2. A Grid, Data... To Measure the Impact on Air Quality and Health in a Lyon Neighbourhood

An interesting example of the use of both the grid of cells and INSEE’s statistical data is the approach taken during the study performed within the scope of the ZAC Part Dieu project, carried out by Greater Lyon in 2016 (NumTECH, 2016). This project was accompanied by the creation of numerous homes, offices and shops, as well as the modification of the road network, which would have an impact on car traffic in the study area.

The aim of the study was to examine the impact on air quality and the health of the residents. It made use of grid data produced by INSEE (population, see left), as well as the grid itself to provide a grid representation of a pollution/population index (IPP, see right), which was calculated as part of this study in order to evaluate the situation before and after this project. The calculation of this indicator “is based on the cross-referencing of an element of pollution data (pollutant concentration) with an element of population data in the area of study. [...] The pollutant concentration and the corresponding population is added to each INSEE grid. The IPP is then calculated by cross-referencing the population and concentration values. The result provides a population “exposure” indicator. [...] The index was therefore evaluated for each of the 200 m INSEE cells (subsequently referred to as “INSEE cells”)”. 

Here, the grid and the data were both used to support decision-making. It would not have been possible to model the calculations at the IRIS scale, whose surface area is too large for studying pollutant emissions on the roads. The cell therefore provides an indispensable analysis grid.

Source: Grand Lyon - Projet ZAC Part Dieu - SETEC environnement Étude Air et Santé (NumTECH, 2016).
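The calculation described in Box 2 can be sketched as follows in Python. The study report only states that the IPP is obtained by “cross-referencing” population and concentration in each cell; the sketch assumes that this means multiplying the two values cell by cell, which is an assumption made here for illustration. The function names and all figures are invented and do not come from the study.

```python
def ipp_by_cell(population, concentration):
    """Exposure index per cell, ASSUMED here to be population x concentration."""
    return {
        cell: population[cell] * concentration[cell]
        for cell in population
        if cell in concentration
    }

def total_ipp(population, concentration):
    """Sum of the per-cell index over the study area."""
    return sum(ipp_by_cell(population, concentration).values())

# Invented values: population per 200 m cell and pollutant concentration
# (in micrograms per cubic metre) before and after a hypothetical project.
population = {"cell_A": 120, "cell_B": 300}
concentration_before = {"cell_A": 30.0, "cell_B": 20.0}
concentration_after = {"cell_A": 25.0, "cell_B": 22.0}

ipp_before = total_ipp(population, concentration_before)
ipp_after = total_ipp(population, concentration_after)
```

Comparing `ipp_before` with `ipp_after` gives a before-and-after view of exposure, in which a rise in concentration in a densely populated cell weighs more heavily than the same rise in a sparsely populated one.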

 

 



IRISes (aggregated units for statistical information), for which the population is generally around 2,000 inhabitants, were then defined for all municipalities with more than 5,000 inhabitants.

Modifiable Areal Unit Problem.

In cartography, a projection (or coordinate) system is a reference within which elements can be represented in space. This system makes it possible to pinpoint any place on planet Earth with just a couple of geographical coordinates.

Lewis Dijkstra is the Deputy Head of the Policy Development and Economic Analysis Unit of the Directorate-General for Regional and Urban Policy in the European Commission.

Caroline de Vellis is a statistician at the Bordeaux Metropolitan Area Urban Planning Agency and facilitator of the Observation Society of the French Network of Urban Planning Agencies (FNAU).

FILOSOFI is the name given to INSEE’s system of localised social and fiscal income data.

Lecturer at Rennes 2 University with joint responsibility for the SIGAT (Geographic Information Systems and Spatial Planning) Master’s degree programme.

The Geoportal is the national portal for territorial knowledge, provided by the National Geographic Institute (IGN).

Further reading

APUR, 2014. Observatoire des quartiers de gare du Grand Paris – Monographie du quartier de gare Les Ardoines – Ligne 15 Sud. [online]. July 2014. Atelier parisien d’urbanisme. P. 6. [Accessed 3 December 2020].

BRANCHU, Marc, COSTEMALLE, Vianney and FONTAINE, Maëlle, 2018. Données carroyées et confidentialité. In: 13èmes Journées de Méthodologies Statistiques. [online]. 12-14 June 2018. Insee. [Accessed 3 December 2020].

CERTU and CETE NORMANDIE CENTRE, 2011. Traitements géomatiques par carreaux pour l’observation des territoires. [online]. October 2011. Éditions du Certu, Collection Dossiers. [Accessed 3 December 2020].

CNIG, 2020. INSPIRE – Présentation. In: site du Conseil national de l’information géographique. [online]. [Accessed 3 December 2020].

COSTEMALLE, Vianney, 2018. Identification des problèmes de différenciation géographique à l’aide de la théorie des graphes. In: 13èmes Journées de Méthodologies Statistiques. [online]. 12-14 June 2018. Insee. [Accessed 3 December 2020].

DE BELLEFON, Marie-Pierre, EUSEBIO, Pascal, FOREST, Jocelyn, PÉGAZ-BLANC, Olivier and WARNOD, Raymond, 2020. En France, neuf personnes sur dix vivent dans l’aire d’attraction d’une ville. [online]. 21 October 2020. Insee Focus, n°211. [Accessed 3 December 2020].

DELAHAYE, Christine, 1987. Le carroyage : création d’une entité stable. In: L’Espace géographique. [online]. Tome 16, n°4, pp. 265-268. [Accessed 3 December 2020].

EUROPEAN COMMISSION, 2010. INSPIRE – Infrastructure for Spatial Information in Europe – D2.8.III.1_v3.0 Data Specification on Statistical Units – Technical Guidelines. [online]. 10 October 2013. European Commission Joint Research Centre. [Accessed 3 December 2020].

EUROSTAT, 2019. EU legislation on the 2021 population and housing censuses, explanatory notes. [online]. February 2019. Theme Population and social conditions, Collection Manuals and guidelines. [Accessed 3 December 2020].

FLOCH, Jean-Michel, 2012. Détection des disparités socio-économiques, l’apport de la statistique spatiale. [online]. 6 December 2012. Insee, Direction de la Diffusion et de l’Action régionale. Document de travail N°H2012/04. [Accessed 3 December 2020].

FRANCEPIXEL, 2020. Site de la France en pixel. [online]. [Accessed 3 December 2020].

IGN, 2020. Site du géoportail. [online]. [Accessed 3 December 2020].

INSEE, 2019a. Documentation – données carroyées FILOSOFI 2015. [online]. June 2019. [Accessed 3 December 2020].

INSEE, 2019b. Les données carroyées de l’Insee. [online]. 27 June 2019. [Accessed 3 December 2020].

INSEE, 2020a. Production et diffusion des données carroyées. [online]. 24 February 2020. [Accessed 3 December 2020].

INSEE, 2020b. Statistiques locales. [online]. [Accessed 3 December 2020].

LAJOIE, Gilles, 1992. Le Carroyage des informations urbaines – Une nouvelle forme de banque de données sur l’environnement du Grand Rouen. [online]. August 2018. Presses universitaires de Rouen et du Havre, nouvelle édition sur OpenEdition Books. [Accessed 3 December 2020].

LOONIS, Vincent and DE BELLEFON, Marie-Pierre, 2018. Manuel d’analyse spatiale – Théorie et mise en œuvre pratique avec R. [online]. 29 October 2018. Insee, Eurostat, Collection Insee Méthodes, N°131. [Accessed 3 December 2020].

NUMTECH, 2016. Projet PEM / Two Lyon et ZAC Part-Dieu Ouest – Étude air et santé. [online]. August 2016. Rapport d’étude pour SETEC Environnement, Réf. 284.1015/ETR – v2.1. [Accessed 3 December 2020].

ODS, 2020. Population française : Données Carroyées à 200 mètres – 2015. [online]. [Accessed 3 December 2020].

OPCS, 1980. People in Britain: a census atlas. 1 November 1980. Office of Population Censuses and Surveys. Stationery Office Books. ISBN 978-0116906182.