The project
The Socface project develops automatic handwriting recognition technologies to analyze all census lists from 1836 to 1936 (i.e. 20 censuses) and produce a database of individuals who lived in France during this period.
About us
Funded by the French National Research Agency (ANR), the Socface project brings together archivists, demographers, economists, historians and computer scientists to develop technologies for the large-scale processing of very large series of historical documents.
- Partners
- Team
Collection of the images
Contact details and project information for archivists who would like to take part.
Schedule
From September 2021 to March 2025, we will collect, process, standardize and analyze individual data from the French censuses.

Methods

The sheer number of census lists (about 15 million images from 1836 to 1936, corresponding to 700 million individual records) and their geographic dispersion (they are kept in nearly one hundred archive repositories) have limited their use until now. The Socface project intends to overcome this limitation by using the most recent advances in machine learning.

Taking advantage of the regularity of the source over time, we will develop automatic models that process all the images to detect rows and columns, perform text recognition, and identify entities within the text (name, age, hamlet, etc.). Throughout this processing chain, we will carry out various tests to evaluate the consistency and quality of the results. To do so, we will draw on the knowledge of the lists held by the archivists, historians and demographers involved in the project. Conversely, the computer scientists are not mere data purveyors for the social scientists: they will explain to them how text recognition and document processing work, so that the social scientists can code the information extracted from the documents and use it in their research with full knowledge of its characteristics and limitations.
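
As a rough illustration of the processing chain described above, the three stages and one consistency test could be wired together as in the sketch below. This is not the project's actual code: every name (`Cell`, `Record`, `detect_cells`, `recognize_text`, `tag_entities`, `check_consistency`) is hypothetical, and each stage is a trivial placeholder standing in for a trained model. A "page" here is simply a list of rows given as dictionaries, where the real system would work on a scanned image.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    row: int
    column: str     # column header, e.g. "name" or "age"
    raw: str        # stand-in for the image crop of the cell
    text: str = ""  # filled in by the recognition stage


@dataclass
class Record:
    row: int
    entities: dict = field(default_factory=dict)


def detect_cells(page):
    """Placeholder layout stage: turn a page into row/column cells."""
    return [Cell(row=i, column=col, raw=raw)
            for i, row in enumerate(page)
            for col, raw in row.items()]


def recognize_text(cell):
    """Placeholder recognition stage: normalise whitespace
    instead of actually reading handwriting."""
    cell.text = " ".join(cell.raw.split())
    return cell


def tag_entities(cells):
    """Placeholder entity stage: use the column header as the entity label
    and group cells of the same row into one individual record."""
    records = {}
    for cell in cells:
        record = records.setdefault(cell.row, Record(row=cell.row))
        record.entities[cell.column] = cell.text
    return [records[r] for r in sorted(records)]


def process_page(page):
    """Run the three stages in order on one page."""
    cells = [recognize_text(c) for c in detect_cells(page)]
    return tag_entities(cells)


def check_consistency(record):
    """Example quality test (an assumption, not the project's rule):
    an age, if present, must be a whole number between 0 and 120."""
    age = record.entities.get("age")
    return age is None or (age.isdecimal() and 0 <= int(age) <= 120)
```

Calling `process_page` on a list of row dictionaries returns one `Record` per row, and `check_consistency` can then flag records whose extracted values look implausible so that they can be reviewed by a domain expert.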

Socface is thus a genuine co-production: a database on a unique scale that will support frontier research in both computer science and the social sciences.