Big data did not begin in Silicon Valley

Close-up of an antique book and a USB stick on a white table
(Image: Adobe Stock/Sabphoto)

FAU research project investigates Chinese data practices stretching over more than two millennia, opening new perspectives on AI, big data and digital power.

Nowadays, data are often assumed to be a neutral basis for political decisions and for artificial intelligence. However, data do not arise on their own – they are gathered, structured, interpreted and prioritized. A new research project at Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) is investigating how China ruled with data long before the digital age, and the implications this has for today’s debates.

When people talk of artificial intelligence or big data today, they generally think of algorithms, data centers and the like. The new research project “Towards a Chinese History of Data” starts much earlier: It explores how China has gathered, structured and used information as an instrument of governance over more than two millennia. The project at the Chair of Chinese Studies, which focuses on the philosophical and cultural history of China, has received approximately 325,000 euros in funding from the Volkswagen Foundation. It will run for 18 months, from April 2026 until September 2027.

The principal investigator is Dr. Chun Xu, with Sijia Cheng as co-principal investigator. Their research centers on a question that could hardly be more relevant today: Are data truly neutral representations of reality? No, they are not, according to the FAU researchers. The very decisions about which data are selected, which categories apply and what the information is used for influence politics and society.

Project Leader Dr. Chun Xu (left) and Sijia Cheng, Co-Project Leader. (Photo: courtesy of the authors)

Early forms of data collection by the state

“Many debates about big data suggest that data-driven societies are an entirely new phenomenon,” says Dr. Chun Xu. “However, states were already working with highly complex information systems hundreds of years ago. In Chinese sources we can find a vast number of household registers, land surveys, tax lists and censuses. The exciting question is: What happens if we consider these materials as data?”

The impetus for the project came from present-day work. While working with data sets derived from pre-modern sources, Dr. Chun Xu realized that historical public servants often dealt with problems similar to those of today’s data engineers: cleaning up old registers, reconciling contradictory information, adjusting formats, or making patchy datasets usable.

How data shape reality

One particularly illustrative example is the Chinese household registration system, known as the hukou system. As early as the Qin dynasty in the 3rd century BC, the state recorded who lived where, how many people belonged to a household, what social status they had and what obligations arose as a result. These registers determined who paid taxes, who could be conscripted and who was permitted to change their place of residence. “Such systems do not only describe society, they actively form it,” says Xu. “By assigning people to certain categories, the state also creates political reality.”

The project is investigating several phases during which data practices in China became particularly important: early administrative data in the 4th and 3rd centuries BC, major land surveys and statistical projects of the Song dynasty, and modern forms of quantitative data recording in the 19th and 20th centuries.

Historical questions for today’s digital world

Sijia Cheng’s research focuses on the modern period and covers topics such as the history of intelligence tests and the statistical recording of social problems in the Republic of China. “Data are made, they are not simply a given,” she emphasizes. The FAU researcher continues: “Data are created through technology, administrative routines, political interests and cultural ideas about what is actually considered measurable or relevant.”

The project touches on key questions of relevance today: Who and what is counted – and according to what rules? What falls through the cracks? Which assumptions are behind categories that later appear objective? And who controls the infrastructure used to create and evaluate data? The researchers see clear parallels to today’s digital systems.

What the past can tell us about our future

Algorithmic assessments, automated classifications and digital administrations also shine a light on certain information while keeping other aspects in the dark. At the same time, the historical perspective shows that large data collections always run up against limits: Even the imperial administrations struggled with too much information, unclear data streams and figures whose correspondence to the actual situation on the ground was hard to check.

“Today’s data system is not the natural consequence of one single development,” says Cheng. “It is one option among many. Early data systems were created, changed and, in some cases, collapsed. That is why the historical perspective is so valuable.” The project therefore aims not only to close a gap in research on China, but also to extend debates on AI, surveillance, digital administration and big data to cover other historical periods. History shows that today’s systems are neither self-evident nor the only option.

Further information:

Dr. Chun Xu
Chair of Sinology
xu.chun@outlook.com

Sijia Cheng
Chair of Sinology
sijia.cheng@fau.de
