Impact & Society, Data Analysis Heather Estop

Emotion and reason in political language

In the day-to-day of political communication, politicians constantly decide how to amplify or constrain emotional expression, in service of signalling policy priorities or persuading colleagues and voters. We propose a new method for quantifying emotionality in politics using the transcribed text of politicians’ speeches. This new approach, described in more detail below, uses computational linguistics tools and can be validated against human judgments of emotionality.

Tools & Technology, Data Analysis Chris Burnage

Understanding institutions in text

Institutions, the rules that govern behavior, are among the most important social artifacts of society. So it should come as a great shock that we still understand them so poorly. How are institutions designed? What makes institutions work? Is there a way to systematically compare the language of different institutions? One recent advance is bringing us closer to making these questions quantitatively approachable. The Institutional Grammar (IG) 2.0 is an analytical approach, drawn directly from classic work by Nobel Laureate Elinor Ostrom, that is providing the foundation for computational representations of institutions. IG 2.0 is a formalism for translating between human-language outputs (policies, rules, laws, decisions, and the like) and abstract structures defined precisely enough to be manipulable by computer. Recent work, supported by the National Science Foundation (RCN: Coordinating and Advancing Analytical Approaches for Policy Design, and GCR: Collaborative Research: Jumpstarting Successful Open-Source Software Projects With Evidence-Based Rules and Structures) and leveraging recent advances in natural language processing highlighted on this blog, is vastly accelerating the rate and quality of computational translations of written rules.
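To make the idea of a rule as a machine-manipulable structure concrete, here is a minimal Python sketch. It assumes the component names commonly associated with Ostrom's grammar (attribute, deontic, aim, object, context, "or else"). The class name, field names, and example rules are hypothetical illustrations, not the official IG 2.0 schema or any published tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InstitutionalStatement:
    """One written rule broken into grammar-style components (illustrative names)."""
    attribute: str                      # the actor the statement applies to
    aim: str                            # the action being regulated
    deontic: Optional[str] = None       # e.g. "must", "may", "must not"
    obj: Optional[str] = None           # what the action is directed at
    context: Tuple[str, ...] = ()       # conditions such as when or where
    or_else: Optional[str] = None       # consequence of non-compliance

    def has_sanction(self) -> bool:
        # A statement carries a sanction only if an "or else" clause was coded.
        return self.or_else is not None


def deontic_counts(statements):
    """Count how often each deontic appears across a set of coded statements."""
    return Counter(s.deontic for s in statements if s.deontic)


# Two made-up rules, coded by hand for illustration.
statements = [
    InstitutionalStatement(
        attribute="certified farmers",
        deontic="must",
        aim="submit",
        obj="an organic system plan",
        context=("annually",),
        or_else="certification may be suspended",
    ),
    InstitutionalStatement(
        attribute="inspectors",
        deontic="may",
        aim="visit",
        obj="the farm",
        context=("without notice",),
    ),
]

print(deontic_counts(statements))
print([s.has_sanction() for s in statements])
```

Once rules are coded this way, comparing institutions becomes a matter of ordinary data analysis, for example contrasting how often two policy documents use obligations versus permissions.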

Tools & Technology, Data Analysis Chris Burnage

text: An R-package for Analyzing Human Language

In the field of artificial intelligence (AI), transformers have revolutionized language analysis. Never before has a new technology universally improved the benchmarks of nearly all language processing tasks, e.g., general language understanding, question answering, and Web search. The transformer method itself, which probabilistically models words in their context (i.e., “language modeling”), was introduced in 2017, and the first large-scale pre-trained general-purpose transformer, BERT, was released as open source by Google in 2018. Since then, BERT has been followed by a wave of new transformer models, including GPT, RoBERTa, DistilBERT, XLNet, Transformer-XL, CamemBERT, and XLM-RoBERTa. The text package makes all of these language models, and many more, easily accessible to R users, and includes functions optimized for human-level analyses tailored to social scientists.

Skills, Data Analysis, Data Collection Chris Burnage

My journey into text mining

My journey into text mining started when the institute of Digital Humanities (DH) at the University of Leipzig invited students from other disciplines to take part in their introductory course. I was enrolled in a sociology degree at the time, and this component of data science was not part of the classic curriculum; however, I could explore other departments through course electives and the DH course sounded like the perfect fit.

Tools & Technology, Data Analysis Chris Burnage

From preprocessing to text analysis: 80 tools for mining unstructured data

Text mining techniques have become critical for social scientists working with large-scale social data, be it Twitter collections to track polarization, party documents to understand opinions and ideology, or news corpora to study the spread of misinformation. In the infographic shown in this blog, we identify more than 80 different apps, software packages, and libraries for R, Python and MATLAB that are used by social science researchers at different stages of their text analysis projects. We focused almost entirely on statistical, quantitative and computational analysis of text, although some of these tools could be used to explore texts for qualitative purposes.

Tools & Technology, Data Analysis Chris Burnage

No more tradeoffs: The era of big data content analysis has come

For centuries, being a scientist has meant learning to live with limited data. People only share so much on a survey form. Experiments don’t account for all the conditions of real world situations. Field research and interviews can only be generalized so far. Network analyses don’t tell us everything we want to know about the ties among people. And text/content/document analysis methods allow us to dive deep into a small set of documents, or they give us a shallow understanding of a larger archive. Never both. So far, the truly great scientists have had to apply many of these approaches to help us better see the world through their kaleidoscope of imperfect lenses.

Tools & Technology, Data Analysis Chris Burnage

2018 Concept Grant winners: An interview with MiniVan

Following the launch of the SAGE Ocean initiative in February 2018, the inaugural winners of the SAGE Concept Grant program were announced in March of the same year. As we build up to this year’s winner announcement, we’ve caught up with the three winners from 2018 to see what they’ve been up to and how the seed funding has helped in the development of their tools.

In this post we chat with MiniVan, a project of the Public Data Lab.

Tools & Technology Daniela Duca

Social media data in research: a review of the current landscape

Social media has brought about rapid change in society, from our social interactions and complaint systems to our elections and media outlets. It is increasingly used by individuals and organizations in both the public and private sectors. Over 30% of the world’s population is on social media. We spend most of our waking hours attached to our devices: every minute in the US, 2.1M snaps are created and 1M people log in to Facebook. With all this use comes a great amount of data.

Tools & Technology, Data Analysis Chris Burnage

2018 Concept Grant winners: An interview with Ken Benoit from Quanteda

We catch up with Ken Benoit, who developed Quanteda, a large R package originally designed for the quantitative analysis of textual data, from which the name is derived. In 2018, Quanteda received $35,000 of seed funding as an inaugural winner of the SAGE Concept Grants program. We find out what challenges Ken faced and how the funding helped in the development of the package.

Tools & Technology, Data Analysis Heather Estop

Roundup: #text2data - new ways of reading

‘From text to data - new ways of reading’ was a 2-day event organised by the National Library of Sweden, the National Archives and Swe-Clarin. The conference brought together librarians, digital collection curators, and scholars in digital humanities and computational social science to talk about the tools and challenges involved in large-scale text collection and analysis.
