Mathematics and Machines: A New Formula for Social Sciences

Nearly all exit polls conducted by the media for the 2024 Lok Sabha elections predicted that the ruling party would get more than 350 seats, but the actual results fell well short of that. Such failures of prediction are nothing new for the social sciences. After World War II, scholars from around the world met, and political science was heavily blamed for failing to predict the Second World War, which came just 20 years after the First. Following this, political science, and the social sciences in general, began to reform themselves. In the natural sciences, once a law is established, natural phenomena follow it every time. An apple falls towards the ground today exactly as it did in Newton’s time. The rules governing behavior between two individuals or two nations, however, are not fixed in this way, which makes predicting the future difficult in the social sciences.

Now, imagine that instead of relying on the small samples used in exit polls, analysts started using your digital footprints (social media algorithms, your speech, likes, shares, and subscriptions) to determine your political choice. An analysis based on a real-time survey of data this large, both in sheer volume and in the number of people sampled, is far less likely to be wrong. This means that if we combine mathematics, computers, and technology to study society, it could revolutionize the social sciences! This idea gave rise to the field of Computational Social Science (CSS).

Computational Social Science

CSS is a new field born at the intersection of the social sciences and modern science and technology. It is not just an academic discipline; it’s a fundamental rethinking of how we view our world. By tracking the vast and continuous stream of data generated by our daily digital lives (e.g., social media posts, online transactions, and mobile sensor data), CSS provides the tools to model, simulate, and analyze complex social phenomena.

Traditional sociology relied on purposefully collected data, but CSS is based on a new, unwritten social contract, where our daily actions become raw material for a new kind of scientific inquiry. According to some studies, the current accuracy of predictions in the social sciences is around 20%, while the use of CSS shows signs of raising it to as much as 80%. This field will play a crucial role in the journey of the social sciences from an art form to a science.

The Evolution of Computation

American mathematician Warren Weaver (1950) divided the history of mathematical problems into phases based on computational capacity. According to him, the computations done until 1900 were simple, involving only 2 or 3 variables. In the next phase, the number of variables increased tremendously. This led to the development of statistics through methods such as averages, arithmetic progressions, and regression. At the same time, progress in calculus made it possible to compute continuous change over a specific period. Weaver believed that methods like averages and statistics were insufficient for a comprehensive study of the social sciences, and that new computational methods would have to be discovered for economics and political science. In his view, the technology developed during World War II could play an important role in this journey. It was around the same time, as is well known, that Alan Turing laid the foundations of the modern computer.

The Information Revolution

What changed in the 21st century is the tremendous speed of computation and the ability to create and handle massive amounts of information! In 2007, a total of 5 zettabytes (5 × 10^{21} bytes) of digital data was created worldwide. If all this information were printed on paper and piled up, the pile would be 4500 times the distance between the Sun and the Earth. Since then, the information created has been doubling every two years. By leaving digital footprints through our mobiles and social media, each of us is becoming a sociologist. Technologies like Big Data Analysis, Machine Learning, and Artificial Intelligence are being developed to handle all this information.
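To get a feel for what "doubling every two years" implies, here is a minimal Python sketch. It simply takes the article's figure of 5 zettabytes as a starting point and assumes the doubling continues unchanged; the function name and parameters are illustrative, not taken from any study.

```python
def projected_volume_zb(start_zb, years_elapsed, doubling_period_years=2.0):
    """Data volume (in zettabytes) after `years_elapsed` years,
    assuming it doubles once every `doubling_period_years` years."""
    return start_zb * 2 ** (years_elapsed / doubling_period_years)


if __name__ == "__main__":
    # Starting figure from the article; the steady doubling is an assumption.
    for years in (2, 4, 10):
        print(years, "years later:", projected_volume_zb(5.0, years), "ZB")
```

Under these assumptions, ten years of doubling multiplies the starting volume by 2^5 = 32, which is why such data quickly outgrows traditional tools.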

Today, human behavior is constantly, rapidly, and massively generating information. From searches on our phones to sensor data in cars and public posts on social media, we are continuously living in a world of digital information. This new era provides the raw material to show a high-resolution, live video-like picture of society, where each individual action is a pixel in a constantly updated image. This new data is so large and complex that it requires entirely new tools for analysis, such as machine learning, network analysis, and AI. Unlike traditional surveys, which are just a static picture, big data offers the opportunity for continuous, detailed, and comprehensive analysis of the entire population. This has opened up a new path of research, where researchers can find patterns they could never see before, even without a specific question in mind.

What’s surprising here is that the question being asked and the data being measured can be two entirely different things, which are then linked to each other. For example, a few years ago, NITI Aayog wanted information about migration in India. It is a very difficult task to find out where a person went and settled. However, researchers used the big data of the railways. By analyzing tickets, they found patterns of people who went from one place to another and did not buy a return ticket, and from that, they got information about migrants in less time. Even if such data is not perfectly accurate, it is sufficient for making an estimate with minimal effort.
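The idea behind the railway example can be sketched in a few lines of Python. This is only an illustration, not the method the researchers actually used: it assumes each ticket record carries a passenger identifier, an origin, and a destination, and that records are sorted by travel date. The function name and the sample cities are hypothetical.

```python
from collections import Counter, defaultdict


def one_way_flows(tickets):
    """Count journeys with no matching return ticket.

    `tickets` is a list of (passenger_id, origin, destination) tuples,
    assumed to be sorted by travel date (a hypothetical record format).
    """
    open_trips = defaultdict(list)  # passenger -> outbound legs not yet returned
    for passenger, origin, destination in tickets:
        reverse_leg = (destination, origin)
        if reverse_leg in open_trips[passenger]:
            # This ticket is the return leg of an earlier journey.
            open_trips[passenger].remove(reverse_leg)
        else:
            open_trips[passenger].append((origin, destination))

    flows = Counter()
    for legs in open_trips.values():
        flows.update(legs)
    return flows


if __name__ == "__main__":
    sample = [
        ("p1", "Patna", "Mumbai"),
        ("p2", "Patna", "Mumbai"),
        ("p2", "Mumbai", "Patna"),   # p2 came back, so not counted
        ("p3", "Lucknow", "Delhi"),
    ]
    print(one_way_flows(sample))
```

A one-way journey is of course only a rough signal of migration, which is exactly the trade-off described above: a quick estimate rather than an exact count.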

How can we predict and respond to a disease outbreak before it gets out of control? Algorithms are used to analyze the total Google search data in a region to find out if there has been a sudden increase in searches for symptoms like “fever” or “cough,” and these trends are linked to known disease outbreaks. This allows for preventive steps to be taken before the situation gets out of control.
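A simple way to picture such an algorithm is a spike detector over daily search counts. The Python sketch below is an assumption-laden illustration, not the method of any real system: it flags a day when searches for a symptom exceed the average of the previous week by more than a chosen number of standard deviations. The function name, window length, threshold, and sample numbers are all made up for the example.

```python
from statistics import mean, stdev


def find_search_spikes(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days whose count is unusually high compared
    with the preceding `baseline_days` days (a simple z-score rule)."""
    spikes = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            # Perfectly flat baseline: treat any increase as a spike.
            if daily_counts[day] > mu:
                spikes.append(day)
        elif (daily_counts[day] - mu) / sigma > threshold:
            spikes.append(day)
    return spikes


if __name__ == "__main__":
    # Hypothetical daily searches for "fever" in one region.
    fever_searches = [100, 104, 98, 101, 99, 103, 100, 102, 180, 250]
    print(find_search_spikes(fever_searches))
```

In practice a flagged spike would still need to be checked against known outbreaks, since searches can also jump for unrelated reasons such as news coverage.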

Application for India

In a country like India, with a huge population, complex challenges, and advanced digital infrastructure, using CSS for public welfare can change the equation. With a population of over 1.4 billion and widespread use of digital media, India has a large treasure trove of social data. This technology can be a boon for a country where social sciences are often neglected, as it can help them be taken more seriously.

A key debate in Indian administration centers on the concept of ‘evidence-based policymaking’. Due to India’s vastness and diversity, the conclusions of small-scale experimental programs often run into difficulties when applied on a large scale, which forces policymakers to rely on their intuition. CSS offers a direct solution to this problem. Instead of expensive and limited trials, the government can use data from existing, widely used systems like Aadhaar and UPI to evaluate the outcomes of policies in real time. The success of the CoWIN vaccination platform and the Aadhaar-enabled DBT system are not just temporary technical success stories; they are examples of a new form of governance called ‘digitally-native governance’. This model transforms administration from an opaque process into a transparent, data-driven system that can adapt and respond to dynamic socio-economic challenges.

Of course, CSS is not a panacea for all problems. Ethical and legal issues related to data, such as data collected without permission and the right to privacy, remain unresolved. Moreover, there is no definite answer as to what would happen if this data were stolen or misused. However, CSS shows that the changes brought about by technology are reflected not just in the conclusions, but also in the methods of studying society.