Article: FOPPA: an open database of French public procurement award notices from 2010-2020
Authors: Lucas Potin, Vincent Labatut, Rosa Figueiredo, Christine Largeron and Pierre-Henri Morand
Our Data Intelligence team recently published a paper in Scientific Data, a Nature Portfolio journal, presenting partial results of the research they conducted within the DeCoMaP ANR project (ANR-19-CE38-0004), in collaboration with the LBNC, LIA and CRA labs and the company Datactivist. The project aims to automate fraud detection in the public procurement process.
One of DeCoMaP's initial goals was to set up a comprehensive database of corruption and fraud cases by collecting empirical evidence from various heterogeneous legal sources, from primary and secondary source documents, and from a survey of procurement experts. The paper describes how the FOPPA database was built from a subset of the TED database containing the French public procurement notices published from 2010 to 2020. These data contain a number of issues, the most serious being that the main agents involved lack unique identifiers. The team proposes a method to resolve these issues, making it possible to build a usable database.
Abstract
Public Procurement refers to governments’ purchasing activities of goods, services, and construction of public works. In the European Union (EU), it is an essential sector, corresponding to 15% of the GDP. EU public procurement generates large amounts of data, because award notices related to contracts exceeding a predefined threshold must be published on the TED (EU’s official journal). Under the framework of the DeCoMaP project, which aims at leveraging such data in order to predict fraud in public procurement, we constitute the FOPPA (French Open Public Procurement Award notices) database. It contains the description of 1,380,965 lots obtained from the TED, covering the 2010–2020 period for France. We detect a number of substantial issues in these data, and propose a set of automated and semi-automated methods to solve them and produce a usable database. It can be leveraged to study public procurement in an academic setting, but also to facilitate the monitoring of public policies, and to improve the quality of the data offered to buyers and suppliers.
Figure 1: Overview of the method proposed to correct and complete the raw TED data and constitute the FOPPA database.
Figure 2: Successive filtering phases applied to match TED agents with SIRENE entries, as part of the Agent Identification step from Figure 1.
Figure 3: Structure of the FOPPA database, shown as an Entity-Relation diagram.
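Figure 2 describes matching TED agents to SIRENE entries (the French business register) through successive filtering phases. This post does not list the paper's actual filters, so the following is only a minimal sketch of the general cascading-filter idea. It assumes two hypothetical phases, an exact postcode match followed by an exact match on a normalized name. The names `RegistryEntry`, `normalize_name` and `match_agent` are illustrative, not the authors' code. An agent gets an identifier only when exactly one candidate survives all filters; otherwise it is left unresolved.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical, simplified registry record (not the real SIRENE schema)."""
    siret: str     # 14-digit establishment identifier
    name: str
    zipcode: str


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def match_agent(name: str, zipcode: str,
                registry: Iterable[RegistryEntry]) -> Optional[str]:
    """Apply successive filters; return a SIRET only if exactly one entry survives."""
    candidates = list(registry)
    # Phase 1 (assumed): keep entries located in the agent's postcode.
    candidates = [e for e in candidates if e.zipcode == zipcode]
    # Phase 2 (assumed): keep entries whose normalized name equals the agent's.
    target = normalize_name(name)
    candidates = [e for e in candidates if normalize_name(e.name) == target]
    # Zero or several candidates: leave unresolved (e.g. for manual review).
    return candidates[0].siret if len(candidates) == 1 else None
```

Rejecting ambiguous matches rather than picking one favors precision over recall. This is a reasonable default when a wrong identifier would merge two distinct agents, and it is also how a semi-automated pipeline could route hard cases to a human reviewer.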