Sage Books about Quantitative Data Analysis with R
Learn about R and find books about using this language and environment for statistical computing and graphics.
Social Statistics for a Culturally Diverse Society: Interview with the Authors
An interview with the authors of Social Statistics for a Diverse Society, who discuss how to use statistical techniques to understand pressing social issues.
Analyze Big Data
Want to learn about Big Data analysis? Here are some open-access examples.
Practical Tips for Getting Started with Harvesting and Analyzing Online Text
How can you collect and analyze text you find online?
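As a minimal sketch of the harvesting step, assuming the rvest package and a placeholder URL and CSS selector:

```r
# Harvest paragraph text from a web page with rvest.
# The URL and the "p" selector are placeholders - point them at the
# page and elements you actually want, and respect the site's terms.
library(rvest)

page <- read_html("https://example.com/articles")

paragraphs <- page |>
  html_elements("p") |>   # select the nodes holding the text
  html_text2()            # extract readable text, normalizing whitespace

head(paragraphs)
```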
Emotion and reason in political language
In day-to-day political communication, politicians constantly decide how far to amplify or constrain emotional expression in order to signal policy priorities or persuade colleagues and voters. We propose a new method for quantifying emotionality in politics using the transcribed text of politicians' speeches. This approach, described in more detail below, uses computational linguistics tools and can be validated against human judgments of emotionality.
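The authors' measure is described in the full post; purely as a rough illustration of the general idea, here is a dictionary-based emotionality score in R with quanteda, where the speeches and the tiny emotion dictionary are invented:

```r
# Illustrative only - a crude dictionary-based emotionality score,
# not the authors' validated measure.
library(quanteda)

speeches <- c(s1 = "We are proud and hopeful about this historic reform.",
              s2 = "The committee reviewed the budget line items in detail.")

emo_dict <- dictionary(list(emotion = c("proud", "hopeful", "angry", "fear*")))

toks <- tokens(speeches, remove_punct = TRUE)

# Emotionality = share of a speech's tokens matching the dictionary
emo_counts <- dfm_lookup(dfm(toks), emo_dict)
as.matrix(emo_counts)[, "emotion"] / ntoken(toks)
```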
Understanding institutions in text
Institutions, the rules that govern behavior, are among the most important social artifacts of society. So it should come as a great shock that we still understand them so poorly. How are institutions designed? What makes institutions work? Is there a way to systematically compare the language of different institutions? One recent advance is bringing us closer to making these questions quantitatively approachable. The Institutional Grammar (IG) 2.0 is an analytical approach, drawn directly from classic work by Nobel Laureate Elinor Ostrom, that provides the foundation for computational representations of institutions. IG 2.0 is a formalism for translating between human-language outputs (policies, rules, laws, decisions, and the like) and abstract structures defined precisely enough to be manipulated by computer. Recent work, supported by the National Science Foundation (RCN: Coordinating and Advancing Analytical Approaches for Policy Design, and GCR: Collaborative Research: Jumpstarting Successful Open-Source Software Projects With Evidence-Based Rules and Structures), leverages the advances in natural language processing highlighted on this blog to vastly accelerate the rate and quality of computational translations of written rules.
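The IG 2.0 toolchain itself goes far beyond this, but as a toy sketch of what a computational representation involves, here is base R code pulling out one grammar component, the deontic operator, from written rules (the rules and the operator list are invented):

```r
# Toy illustration, not IG 2.0 software: tag the deontic operator
# ("must", "shall", "may", "should") in rule statements.
rules <- c("Members must submit an annual report.",
           "The board may waive the application fee.")

pattern <- "\\b(must|shall|may|should)\\b"
regmatches(rules, regexpr(pattern, rules, ignore.case = TRUE))
```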
text: An R-package for Analyzing Human Language
In the field of artificial intelligence (AI), Transformers have revolutionized language analysis. Never before has a new technology so broadly improved the benchmarks of nearly all language processing tasks: general language understanding, question answering, and web search, among others. The transformer method itself, which probabilistically models words in their context (i.e., "language modeling"), was introduced in 2017, and the first large-scale, pre-trained, general-purpose transformer, BERT, was released as open source by Google in 2018. Since then, BERT has been followed by a wave of new transformer models, including GPT, RoBERTa, DistilBERT, XLNet, Transformer-XL, CamemBERT, and XLM-RoBERTa. The text package makes these language models, and many more, easily accessible to R users, and includes functions optimized for human-level analyses tailored to social scientists.
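A minimal usage sketch, assuming the package's textEmbed() interface and a BERT model; per the package documentation, a one-time Python backend setup (e.g., textrpp_install()) is required first:

```r
# Map sentences to transformer embeddings with the text package.
# Assumes the Python backend has already been installed once, as
# described in the package documentation.
library(text)

responses <- c("I feel calm and content today.",
               "Everything feels overwhelming right now.")

embeddings <- textEmbed(responses, model = "bert-base-uncased")
```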
The validity problem with automated content analysis
There’s a validity problem with automated content analysis. In this post, Dr. Chung-hong Chan introduces a new tool that provides a set of simple and standardized tests for frequently used text analytic tools and gives examples of validity tests you can apply to your research right away.
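The post covers standardized tests for specific tools; as a generic, hypothetical illustration of the underlying idea (criterion validity), you can correlate an automated measure with human codings of the same texts:

```r
# Hypothetical scores from an automated tool and from human coders
# on the same five documents; strong agreement supports validity.
automated <- c(0.8, 0.1, 0.5, 0.9, 0.2)
human     <- c(0.7, 0.2, 0.6, 0.8, 0.1)

cor.test(automated, human)  # estimate with uncertainty, not just a point value
```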
My journey into text mining
My journey into text mining started when the Institute of Digital Humanities (DH) at the University of Leipzig invited students from other disciplines to take part in its introductory course. I was enrolled in a sociology degree at the time, and this kind of data science was not part of the classic curriculum; however, I could explore other departments through course electives, and the DH course sounded like the perfect fit.
From preprocessing to text analysis: 80 tools for mining unstructured data
Text mining techniques have become critical for social scientists working with large-scale social data, be it Twitter collections to track polarization, party documents to understand opinions and ideology, or news corpora to study the spread of misinformation. In the infographic shown in this blog, we identify more than 80 different apps, software packages, and libraries for R, Python, and MATLAB that social science researchers use at different stages of a text analysis project. We focus almost entirely on statistical, quantitative, and computational analysis of text, although some of these tools could also be used to explore texts for qualitative purposes.
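As a sketch of what the early stages often look like in practice, here is a minimal preprocessing-to-matrix pipeline using quanteda, one of the many tools in the catalogue (the documents are invented):

```r
# From raw text to an analysis-ready document-feature matrix.
library(quanteda)

docs <- c(d1 = "Social scientists mine large text collections.",
          d2 = "Text mining tools support every stage of analysis.")

dfmat <- docs |>
  corpus() |>
  tokens(remove_punct = TRUE, remove_numbers = TRUE) |>  # preprocessing
  tokens_tolower() |>
  tokens_remove(stopwords("en")) |>                      # drop stopwords
  dfm()                                                  # document-feature matrix

topfeatures(dfmat, 5)  # most frequent remaining features
```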
No more tradeoffs: The era of big data content analysis has come
For centuries, being a scientist has meant learning to live with limited data. People only share so much on a survey form. Experiments don’t account for all the conditions of real world situations. Field research and interviews can only be generalized so far. Network analyses don’t tell us everything we want to know about the ties among people. And text/content/document analysis methods allow us to dive deep into a small set of documents, or they give us a shallow understanding of a larger archive. Never both. So far, the truly great scientists have had to apply many of these approaches to help us better see the world through their kaleidoscope of imperfect lenses.
2018 Concept Grant winners: An interview with MiniVan
Following the launch of the SAGE Ocean initiative in February 2018, the inaugural winners of the SAGE Concept Grant program were announced in March of the same year. As we build up to this year's winner announcement, we've caught up with the three winners from 2018 to see what they've been up to and how the seed funding has helped in the development of their tools.
In this post, we chatted with the team behind MiniVan, a project of the Public Data Lab.
Tapping into the hidden power of big search data
Sam Gilbert demonstrates the value of big search data for social scientists, and suggests some practical steps to using internet search data in your own research.
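One practical route in R, sketched here with placeholder keyword, region, and time window, is the gtrendsR package, which retrieves Google Trends series:

```r
# Pull weekly search-interest data from Google Trends with gtrendsR.
# The keyword, region ("GB"), and window are placeholder choices.
library(gtrendsR)

res <- gtrends(keyword = "cost of living", geo = "GB", time = "today 12-m")
head(res$interest_over_time)  # search interest indexed 0-100
```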
Social media data in research: a review of the current landscape
Social media has brought about rapid change in society, from our social interactions and complaint systems to our elections and media outlets. It is increasingly used by individuals and organizations in both the public and private sectors. Over 30% of the world's population is on social media, and we spend much of our waking time attached to our devices: every minute in the US, 2.1 million snaps are created and 1 million people log in to Facebook. With all this use comes a great amount of data.
2018 Concept Grant winners: An interview with Ken Benoit from Quanteda
We catch up with Ken Benoit, who developed Quanteda, a large R package originally designed for the quantitative analysis of textual data, from which the name is derived. In 2018, Quanteda received $35,000 of seed funding as an inaugural winner of the SAGE Concept Grants program. We find out what challenges Ken faced and how the funding helped in the development of the package.
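For a quick flavor of the package, a minimal keywords-in-context query (the example sentence is invented):

```r
# Keywords-in-context with quanteda: inspect how a term is used.
library(quanteda)

toks <- tokens("Quanteda was designed for the quantitative analysis of textual data.")
kwic(toks, pattern = "textual", window = 3)
```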
Roundup: #text2data - new ways of reading
‘From text to data - new ways of reading’ was a two-day event organised by the National Library of Sweden, the National Archives, and Swe-Clarin. The conference brought together librarians, digital collection curators, and scholars in digital humanities and computational social science to talk about the tools and challenges involved in large-scale text collection and analysis.