The buzz of busy commuters, as well as the lack thereof, leaves information-rich fingerprints on all aspects of people’s lives. In EPJ Data Science, Eszter Bokányi and colleagues analyze 63 million tweets originating from across the United States over a 10-month period and find links between unemployment rates and user activity on Twitter.
Pixabay, public domain CC0
Guest post by Eszter Bokányi
Until recently, collecting and analyzing data on individual humans on a large scale was a long, expensive and arduous task. With the advent of the digital age, there is an increasing amount of data accessible online that allows for the analysis and modeling of human behavior. However, our understanding of these digital data sources and the methods that link data to real-world outcomes is still limited.
Among the most interesting of these digital fingerprints are data with geographic information attached. By linking digital data to geographic areas, researchers are able to predict phenomena ranging from land use patterns to estimates of poverty, population density or crime rates.
In our article just published in EPJ Data Science, we present a framework for estimating the employment and unemployment rates of US counties. Previous research has linked individuals’ daily activity patterns to the regularity of their working hours, unemployment to psychological effects measurable in mobile communication patterns, and the aggregate daily activity of geographic regions in certain time intervals to unemployment. The goal of the current work is to provide an alternative narrative and a broader mathematical framework for these estimates.
We collected the aggregate daily activity timelines of US counties from the normalized number of messages sent each hour on the online social network Twitter from January 2014 to October 2014. These aggregate timelines are superpositions of many individuals’ timelines, which we cannot measure directly due to the scarcity of data. But if we could sort individuals into groups that behave consistently in their daily activity patterns, the data would allow us to measure the extent to which each group’s time series features in the county-wide time series.
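To make the idea of a normalized hourly timeline concrete, here is a minimal sketch rather than the code used in the study. It assumes each message has already been reduced to its local hour of day (0 to 23) and that "normalized" means dividing by the total message count; both the function name `hourly_profile` and the sample hours are made up for illustration.

```python
from collections import Counter

def hourly_profile(hours):
    """Return a 24-element list with the fraction of messages sent in each hour.

    `hours` is an iterable of integers in 0..23 (assumed local time).
    """
    counts = Counter(hours)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no messages to build a profile from")
    return [counts.get(h, 0) / total for h in range(24)]

# Hypothetical message hours for one county (illustrative, not real data).
example_hours = [8, 8, 9, 12, 18, 18, 18, 22]
profile = hourly_profile(example_hours)
```

Dividing by the total makes counties of very different sizes comparable, since only the shape of the daily curve remains.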
We assume that, judging by their daily time series, there are two types of people: those who have regular working hours and those who do not. We hypothesize that each county’s timeline is a linear combination of these two patterns, and then look for the underlying patterns and mixing factors that minimize the error with respect to the measured timelines.
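One way to picture this fitting problem is the sketch below. It is an illustration under stated assumptions, not the method from the article: it assumes each county timeline y is approximated by w * p + (1 - w) * q with a single weight w between 0 and 1, and it uses plain alternating least squares. The names `fit_two_patterns` and `solve_weights`, and the 4-hour toy patterns, are hypothetical.

```python
def dist2(u, v):
    """Squared Euclidean distance between two timelines."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def solve_weights(timelines, p, q):
    """With p and q fixed, the best weight per county has a closed form."""
    d = [pt - qt for pt, qt in zip(p, q)]
    norm = sum(x * x for x in d)
    if norm == 0:
        raise ValueError("patterns coincide; weights are undefined")
    weights = []
    for y in timelines:
        w = sum((yt - qt) * dt for yt, qt, dt in zip(y, q, d)) / norm
        weights.append(min(1.0, max(0.0, w)))  # keep the weight in [0, 1]
    return weights

def fit_two_patterns(timelines, n_iter=50):
    """Alternating least squares for y_c ~ w_c * p + (1 - w_c) * q."""
    # Initialise p and q with two timelines that lie far apart.
    a = max(timelines, key=lambda y: dist2(y, timelines[0]))
    b = max(timelines, key=lambda y: dist2(y, a))
    p, q = list(a), list(b)
    weights = solve_weights(timelines, p, q)
    for _ in range(n_iter):
        # With weights fixed, each hour is a 2x2 least-squares problem.
        A = sum(w * w for w in weights)
        B = sum(w * (1 - w) for w in weights)
        C = sum((1 - w) ** 2 for w in weights)
        det = A * C - B * B
        if det < 1e-12:  # all weights nearly equal: patterns not identifiable
            break
        for t in range(len(p)):
            r1 = sum(w * y[t] for w, y in zip(weights, timelines))
            r2 = sum((1 - w) * y[t] for w, y in zip(weights, timelines))
            p[t] = (C * r1 - B * r2) / det
            q[t] = (A * r2 - B * r1) / det
        weights = solve_weights(timelines, p, q)
    return p, q, weights

# Synthetic check with made-up 4-hour patterns (not real data).
true_p = [0.4, 0.3, 0.2, 0.1]
true_q = [0.1, 0.2, 0.3, 0.4]
true_w = [0.0, 0.25, 0.5, 1.0]
counties = [[w * a + (1 - w) * b for a, b in zip(true_p, true_q)] for w in true_w]
p, q, w = fit_two_patterns(counties)
```

A decomposition like this is only determined up to some freedom in how the two patterns are chosen, so the fitted weights are best read as relative positions between two extremes rather than as literal population shares.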
It turns out that the underlying “hidden” patterns do correspond to one with earlier morning and nighttime activity and one with a later morning rise and increased nighttime activity. Furthermore, the mixing factor that measures the weight of the earlier-rising pattern in each county’s model correlates significantly with employment rates (0.46 ± 0.02) and, with the opposite sign, with unemployment rates (−0.34 ± 0.02).
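For readers who want to see what such a correlation measures, the following is a small sketch of a Pearson correlation coefficient between per-county mixing factors and employment rates. The post does not say which coefficient was used or how the ± uncertainty was obtained, so the choice of Pearson is an assumption, and the function name `pearson` and both input lists are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

# Made-up values for five counties, for illustration only.
mixing_factor = [0.2, 0.4, 0.5, 0.7, 0.9]
employment_rate = [0.55, 0.61, 0.58, 0.66, 0.64]
r = pearson(mixing_factor, employment_rate)
```

A positive value means counties with a larger mixing factor tend to have higher employment; swapping in unemployment rates would be expected to flip the sign.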
Our results therefore show that, by analyzing a relatively sparse, publicly available geolocated dataset, a very simple model can explain employment and unemployment to some extent. This type of analysis would allow policy makers to better understand the processes related to employment phenomena and could form the basis for future datasets in which problems are identified not only from the officially registered unemployed, but also from the data fingerprints that people leave on different platforms.
Read the full article here.
Eszter studied physics at the Eötvös Loránd University in Budapest and at the Humboldt-Universität zu Berlin, majoring in statistical physics. She is currently a graduate student at Eötvös Loránd University, where her main research area is the study of how social phenomena can be captured with methods from statistical physics, using the various fingerprints that individuals leave behind.