Researchers have developed a method to infer information about a social media account's owner from the details disclosed in their Twitter profile.
A new machine learning system, unveiled at the Web Conference in San Francisco, learned the patterns associated with different ages and genders, and those that distinguish organizations from individuals, from a data set of over four million Twitter accounts in 32 languages.
This information was then combined with estimated locations and re-weighted against census data to produce more accurate estimates of population in 1,101 statistical regions across the EU.
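The re-weighting step can be illustrated with simple post-stratification, in which each demographic group's weight is its share of the census population divided by its share of the sample. The sketch below is an illustrative assumption rather than the researchers' actual procedure, which may differ; the function names, demographic cells, and counts are all hypothetical toy values.

```python
from collections import Counter


def poststratification_weights(sample_cells, census_counts):
    """Return one weight per demographic cell: census share / sample share."""
    sample_counts = Counter(sample_cells)
    n_sample = sum(sample_counts.values())
    n_census = sum(census_counts.values())
    weights = {}
    for cell, census_n in census_counts.items():
        sample_n = sample_counts.get(cell, 0)
        if sample_n == 0:
            continue  # a cell absent from the sample cannot be re-weighted
        weights[cell] = (census_n / n_census) / (sample_n / n_sample)
    return weights


def weighted_mean(values, cells, weights):
    """Average of values, with each account weighted by its cell's weight."""
    numerator = sum(weights[c] * v for v, c in zip(values, cells))
    denominator = sum(weights[c] for c in cells)
    return numerator / denominator


# Hypothetical toy sample of 10 accounts, skewed towards young men.
cells = (
    [("m", "18-29")] * 6
    + [("f", "18-29")] * 2
    + [("m", "30+")] * 1
    + [("f", "30+")] * 1
)
# Hypothetical census counts for the same region.
census = {
    ("m", "18-29"): 200,
    ("f", "18-29"): 200,
    ("m", "30+"): 300,
    ("f", "30+"): 300,
}

weights = poststratification_weights(cells, census)

# Indicator: 1 if the account belongs to the 18-29 group, else 0.
is_young = [1 if age == "18-29" else 0 for _, age in cells]
raw_share = sum(is_young) / len(is_young)
adjusted_share = weighted_mean(is_young, cells, weights)
```

In this toy sample, accounts from over-represented groups receive weights below one and accounts from under-represented groups receive weights above one, so the weighted share of 18 to 29 year olds moves from the sample's share to the census share.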
This could pave the way for a more representative understanding of people’s views on key societal issues and topics, based on what they post on social media, with those views attributed to specific geographical locations and demographic groups.
Dr Scott Hale, Senior Research Fellow, Oxford Internet Institute, University of Oxford, said: “Despite providing lots of data points, social media has long been an unreliable tool for understanding what issues are most important to a wider population given how people self-select into using any one platform.
“This first study of its kind performs demographic predictions about a social media account’s owner based purely on the account’s profile information in 32 languages and then re-weights the online sample to be more similar to an offline population.
“We see this as a significant step towards using social media to get a more accurate picture on the issues and topics that most interest the public and understanding which groups’ views are over- or under-represented.”
The data underpinning this research have been made available in an open source library, and you can test the inference tool here.
The researchers in this study are from the University of Oxford, the University of Michigan, the University of Massachusetts, GESIS – Leibniz Institute for the Social Sciences, the Max Planck Institute, and Stanford University.