Q&A
+ Are data able to be archived and reused? Or is the data collected dynamically as part of the analysis?
Collected data can be downloaded as an R data frame in an RDS file, or exported from the data tables, for example as CSV. Networks can be downloaded as data frames of nodes and edges, or in GraphML format (which can be imported later). The current state of a network graph from the analysis section can be downloaded as GraphML and then imported later via the open GraphML control. Data collection is performed prior to analysis; however, the VOSONDash interface makes it easy to iteratively collect and examine data as part of exploratory analysis.
+ Best practice in using Reddit datasets, especially API data and Pushshift.io data
vosonSML retrieves JSON data from subreddit threads using unauthenticated requests. The data retrieved is public; however, this method is very limited and may be removed by Reddit at any time. Until vosonSML supports authenticated access to the API, best practice would be to use another library that does. We hope to support an authenticated API approach, with more comprehensive and standardised data collection and network creation, in the future.
It is possible to generate networks from JSON data retrieved from Pushshift.io: this can be done using igraph in R and we are planning a blogpost on this topic.
+ In what ways is VOSON helpful in literary studies and research?
VOSON is designed to enable research into online networks. The main reason for using VOSON is if you are interested in understanding how actors are interacting with one another via e.g., replies or retweets on Twitter, or comments on Reddit, and where it is useful from a research perspective to use network analysis to study this behaviour. Another reason for using VOSON is if you are interested in collecting and analysing text data from social media, and where you would like to know the actors who are authoring the text, and how these actors connect with one another. If you are not interested in networks, then there are other complementary open-source tools available for text analysis within the R environment, such as Quanteda and tidytext, which are used for the quantitative analysis of text data.
+ Cost of VOSON? Is there a free option?
VOSON R tools are Free and Open-Source Software (FOSS) released under the GPL-3 licence. They are publicly available via The Comprehensive R Archive Network (CRAN) and from VOSON Lab GitHub repositories.
+ What's the difference between the VOSON tools and other social network analysis tools (e.g., NodeXL)?
The VOSON tools are released as open-source R packages and hence they make use of, and can be used in addition to, other packages within the R environment such as rtweet, igraph, statnet, visNetwork, quanteda, tidytext etc. VOSON is complementary to the R packages igraph and statnet and indeed, VOSON makes extensive use of igraph for network analysis functionality. But igraph and statnet do not enable data collection from social media or the web (that is VOSON’s speciality).
While we think NodeXL is great software (and indeed there used to be a VOSON plugin to NodeXL, for hyperlink network collection), some users may find it a limitation that NodeXL only runs on Windows. R works on the major operating systems: Windows, macOS, Linux.
We would like to draw your attention to Gephi for large-scale network visualisation. It is possible to create a network in VOSONDash, export it as GraphML, and import it into Gephi for visualisation. Why would you do this, and not simply make use of the visualisation capabilities of VOSONDash? Well, while we are very proud of network visualisation in VOSONDash (and we build on igraph and visNetwork for this), the fact is that since VOSONDash is a web application, it is not capable of visualising very large networks. Gephi is the specialist tool for network visualisation, and we use it extensively in the VOSON Lab.
Finally, we’d like to mention two other software tools that are very prominent in social network analysis (SNA): UCINET and Pajek. Again, these tools do not provide functionality for collecting network and text data from social media, but it is possible to use UCINET and Pajek to analyse networks created in VOSON.
+ Is there a possibility to filter the data related to a profile of users, for example concerning their age or location?
The fields available for analysis are those provided by the APIs, and they differ for each data source. After you have collected your data, you will be able to see what fields are available in the data table. For Twitter, for example, there are around 80 fields that are available including profile information (such as location, if the user provided it). Note that by default, only a subset of available fields is included in the network as node or edge attributes. It is possible to include additional fields as node/edge attributes, but that will require some simple R/igraph coding.
+ Can this software be used for analysing social media contents other than political discussion?
Yes. In the VOSON Lab we tend to focus on analysis of political discussion, but it is possible to use VOSON tools to study any public social media activity. By “public” we mean that users have not changed their privacy settings such that their behaviour is hidden (such private activity is not available for collection via APIs). So VOSON can be used to study activity on social media related to any topic that you are interested in, as long as there are people on social media who are talking about it.
+ How reliable is data collection online?
VOSON will collect whatever the APIs allow it to collect. APIs have well-known restrictions or limitations. For example, with the Twitter API there might be limitations associated with sampling of data (when you are collecting on a hashtag that has high volume), and collection of historical Twitter data is only available if you have Academic Track access to the API. If a user deletes their tweets or Twitter suspends a user account, then the data will no longer be available via the API. Another issue of “reliability” of social media data is: how representative is the data of the population you are interested in studying? Social media data are typically not representative of the general public. Appropriate research design can help address such restrictions and limitations. For example, in our analysis of the 2020 US presidential debate Twitter data we do not make claims about what the US voting public thought about the candidates; rather, our population of interest is people who were on Twitter talking about the debates.
+ Does text analysis (sentiment analysis) in Dash work in different languages or just in English?
We have spent a lot of time ensuring that the VOSON tools collect and store text data in an appropriate manner. So, for example, if the Twitter data you collect contains non-ASCII characters (e.g., Chinese language) then the text data will be stored correctly for further analysis. However, the VOSONDash text analysis tools (frequency analysis, word clouds, sentiment analysis) may not handle the text correctly. With regard to the frequency analysis and word clouds, the approach we use relies on spaces to tokenise words, and that is not appropriate for all languages. The sentiment analysis in VOSONDash uses an English lexicon. So, our recommendation is that if you want to conduct text analysis for a language other than English, then you are probably best using VOSON just for the data collection and network construction. Then you can export your data (including networks, if useful to you) and analyse your text using R packages that are designed for handling the language you are working with. That is the beauty of working in the R environment: there is almost certainly going to be an R package to help you.
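To illustrate why space-based tokenisation breaks down for some languages, here is a minimal sketch. It is written in Python purely to keep it self-contained; it is not the VOSONDash implementation (the VOSON tools are R packages), and the function name and sample sentences are made up for illustration.

```python
def whitespace_tokens(text):
    """Split text into tokens on runs of whitespace (illustrative helper)."""
    return text.split()


english = "social networks are fun"
chinese = "社交网络很有趣"  # a sentence written without spaces between words

# English words are separated by spaces, so each word becomes a token.
print(whitespace_tokens(english))

# The Chinese sentence contains no spaces, so it comes back as a single token
# and any word frequency count built on it would be meaningless.
print(whitespace_tokens(chinese))
```

A language-aware tokeniser (for example, one provided by a package designed for that language) would be needed to split the second sentence into words.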
+ In which format can you download the data?
The raw data (what VOSON collects from the APIs) can be downloaded as a data frame (RDS format, which stores R objects). Network data can also be downloaded as a data frame, or in CSV or Excel format. Network graphs can be downloaded as GraphML files.
+ Is there a way to override the API restrictions via brute force scraping?
If you want to scrape a social media platform, then VOSON is not the tool for you. VOSON collects via APIs (for Twitter, Reddit, and YouTube), so whatever the API allows you to collect, you can collect via VOSON. By supporting collection via APIs (rather than web scraping) we contend that VOSON is a tool for ethical research into online behaviour. However, there will still be ethical considerations regarding what you do with the data collected using VOSON, and the human research ethics committee at your university will have something to say about this. Finally, while we do provide a web crawler within vosonSML (WWW hyperlink networks are one of the data sources that you can collect on), it is designed to crawl websites, not social media platforms. Further, it obeys the robots.txt protocol, so it only collects the data that webmasters are making visible to crawlers.
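For readers unfamiliar with robots.txt, the following sketch shows what "obeying the protocol" means in practice. It uses Python's standard-library parser rather than vosonSML's own crawler code, and the robots.txt content, the URLs, and the "example-crawler" agent name are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, parsed from a string so no network access is needed.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(url, agent="example-crawler"):
    """Return True if the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


# A polite crawler checks each URL before requesting it and skips disallowed ones.
print(allowed("https://example.org/index.html"))
print(allowed("https://example.org/private/data.html"))
```

Here the first URL is permitted and the second is not, because it falls under the disallowed /private/ path.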
+ Could you please share some sample research articles, which employed the VOSON app?
There are some research examples in the slides we are providing as part of this webinar. Also, please see the VOSON Lab website for nearly twenty years of research using and producing the VOSON tools. If you want to find research by other people who have used VOSON tools, a Google search for VOSON can turn up some papers, although authors sometimes forget to cite us.
+ Can you do search query without the hashtag? Can we search for certain words in the tweet, for example?
Yes, it is possible to search tweets for any term (a word, a hashtag) or a combination of terms (including boolean searches). Twitter allows for sophisticated search queries; see the standard search operators at https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators Whatever the API allows, you can do in VOSON. Additionally, the collection may be filtered by, for example, type of Twitter activity (e.g., to include retweets only), number of collected tweets, or language of tweet. See our vignette for more information: https://cran.r-project.org/web/packages/vosonSML/vignettes/Intro-to-vosonSML.html
+ What is the maximum number of tweets you can collect in a network?
It depends on the Twitter API rate limits. With the standard v1.1 Twitter API, there is a limit of 18,000 tweets collected per 15-minute period. If your collection is going to exceed this rate limit, it is possible to set VOSON so that the collection will pause or sleep when the limit is reached, and then automatically start up again. The VOSON Lab conducted large-scale Twitter collections (over 1 million tweets collected) during the debates of the 2020 U.S. presidential election. For details, see this blog post: https://vosonlab.github.io/posts/2021-06-03-us-presidential-debates-2020-twitter-collection/
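As a rough back-of-the-envelope illustration of what this rate limit implies for a large collection, the sketch below estimates how many 15-minute windows a collection needs. It is written in Python for convenience and the function names are made up. It assumes every window delivers its full quota, and it ignores request time and whether enough matching tweets exist, so treat the result as a lower bound.

```python
import math

TWEETS_PER_WINDOW = 18_000  # figure quoted above for the standard v1.1 API
WINDOW_MINUTES = 15


def windows_needed(n_tweets):
    """Number of rate-limit windows required to collect n_tweets."""
    return math.ceil(n_tweets / TWEETS_PER_WINDOW)


def minimum_wait_minutes(n_tweets):
    """Lower bound on time spent paused: one wait between consecutive windows."""
    return max(windows_needed(n_tweets) - 1, 0) * WINDOW_MINUTES


target = 1_000_000
print(windows_needed(target))             # 56 windows
print(minimum_wait_minutes(target) / 60)  # 13.75 hours of pausing at minimum
```

Under these assumptions, a collection of one million tweets would spend at least 13.75 hours paused between windows, which is why the pause-and-resume option matters for large collections.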
+ Can you look at changes over time? Is it possible to build an author network scraping data in reddit based on date? For example, from day x to day y?
For three of the data sources (Twitter, Reddit, and YouTube) there is timestamp information indicating when a tweet was authored, or when a comment on Reddit or YouTube was written. VOSON includes the timestamp data in the network as a node or edge attribute. Hence, it is possible to conduct dynamic network analysis. Also, it is quite common to undertake Twitter collections over a period of time e.g., collecting on a particular hashtag every week. This leads to a series of dynamic networks which can be analysed separately or merged into a single large dynamic network. We are currently exploring ways to integrate dynamic network analysis and visualisation into VOSONDash, but the data for dynamic network analysis are being collected and are available.
It is currently not possible to use VOSON to collect comments that were authored during a particular time period. What you would need to do is collect the entire thread (or post) and then you can later filter out comments based on date of creation (this would require that you download the data and work with it directly in R).
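The collect-then-filter approach described above can be sketched as follows. This is an illustration in Python rather than R. The records and the field names ("id", "author", "created") are hypothetical and do not reflect the actual columns in vosonSML output, so inspect your own data table for the real timestamp field.

```python
from datetime import datetime, timezone

# Hypothetical comments from one fully collected thread.
comments = [
    {"id": "c1", "author": "user_a", "created": "2021-03-01T10:00:00+00:00"},
    {"id": "c2", "author": "user_b", "created": "2021-03-05T08:30:00+00:00"},
    {"id": "c3", "author": "user_a", "created": "2021-03-09T23:59:00+00:00"},
    {"id": "c4", "author": "user_c", "created": "2021-03-12T12:00:00+00:00"},
]


def in_window(comment, start, end):
    """True if the comment was created in the half-open interval [start, end)."""
    created = datetime.fromisoformat(comment["created"])
    return start <= created < end


day_x = datetime(2021, 3, 2, tzinfo=timezone.utc)
day_y = datetime(2021, 3, 10, tzinfo=timezone.utc)

kept = [c for c in comments if in_window(c, day_x, day_y)]
print([c["id"] for c in kept])  # ['c2', 'c3']
```

The network would then be built from the retained comments only. Repeating the filter over consecutive windows gives a series of time slices for dynamic analysis.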
+ What types of training/ workshops do you offer? For researchers and educators?
- The VOSON Lab contributes to undergraduate and master’s courses at the Australian National University in the following areas: Online Research Methods, Social Science of the Internet, Economic Analysis of the Digital Economy.
- We encourage applications from suitably qualified students to undertake PhD studies in the School of Sociology, where the VOSON Lab is located.
- We run online short courses and masterclasses via the Australian Consortium for Social and Political Research Inc. (ACSPRI).
+ Currently VOSON searches Twitter, YouTube, and Reddit. Is one able to search Facebook, Instagram, LinkedIn, and Tik Tok? Might this be possible in the future?
If a social media platform affords networked behaviour (e.g., conversations, commenting, liking of posts, sharing of posts) and has a publicly available API, then the VOSON Lab might be interested and available to extend the VOSON tools to collect the data. In the past VOSON was able to collect data from both Facebook and Instagram, but the changes to the API that Facebook enacted after the Cambridge Analytica data scandal meant that it was no longer possible to collect network data from these platforms. We are always looking to integrate other data sources that can be used for social network analysis via their APIs, but please remember we are a small team, so we might need to seek resourcing for any major software development.
+ Is VOSON tool able to crawl the hyperlink and content of a website/page?
Yes. Hyperlink collection is available via vosonSML. See the following blogpost: https://vosonlab.github.io/posts/2021-03-15-hyperlink-networks-with-vosonsml/
+ Is this only for user networks? Or can I use this for co-hashtag network visualization?
The VOSON software is designed for the analysis of networks, and it currently produces the following networks:
- Reddit: actor network (nodes are Reddit users who have commented, and the author of the post); activity network (nodes are the comments, and the top-level post).
- YouTube: actor network (nodes are users who have commented on a YouTube video, and the channel that uploaded the video); activity network (nodes are the comments, and video).
- Twitter: actor network (nodes are Twitter users who have e.g. authored tweets containing a hashtag or are mentioned/replied to/retweeted in tweets containing a particular hashtag); activity network (nodes are the tweets); two-mode network (two actor types – user and hashtag – and there is an edge from user i to hashtag j if user i authored a tweet containing hashtag j); semantic network (nodes are entities extracted from the tweet text, i.e. words, hashtags and usernames, and edges reflect co-occurrence, i.e. there is an edge between entities i and j if they both occurred in the same tweet).
- WWW hyperlink: actor network (nodes are website domains e.g., www.anu.edu.au); activity network (nodes are web pages).
More details on the network types can be found in our vignette: https://cran.r-project.org/web/packages/vosonSML/vignettes/Intro-to-vosonSML.html
But remember: if there is a particular network type that you wish to work with, and it is not currently provided by VOSON, then it is always possible to export the VOSON network as GraphML and then use igraph in R to construct whatever network type you would like. We do this in the VOSON Lab, and we are planning future blogposts on this topic.
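As one possible illustration for the co-hashtag question, the sketch below derives a hashtag-to-hashtag network from a two-mode (user to hashtag) edge list by linking hashtags that were used by the same user. It is written in Python with made-up data and function names to keep it self-contained; it is not how the VOSON Lab does this, and in R the equivalent step would be done with igraph. Note that it links hashtags through shared users, which is not the same as co-occurrence within a single tweet.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical two-mode edge list: (user, hashtag the user tweeted).
user_hashtag_edges = [
    ("alice", "#auspol"),
    ("alice", "#climate"),
    ("bob", "#auspol"),
    ("bob", "#climate"),
    ("bob", "#energy"),
    ("carol", "#energy"),
]


def co_hashtag_edges(edges):
    """Project user-hashtag edges onto hashtags; weight = number of shared users."""
    tags_by_user = defaultdict(set)
    for user, tag in edges:
        tags_by_user[user].add(tag)

    weights = Counter()
    for tags in tags_by_user.values():
        # Every pair of hashtags used by the same user gets one unit of weight.
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return weights


for (a, b), w in sorted(co_hashtag_edges(user_hashtag_edges).items()):
    print(a, b, w)
```

In this toy data, #auspol and #climate end up connected with weight 2 because both alice and bob used them, while carol contributes no edges because she used only one hashtag.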