An Example of R Versatility
By Dave Collingridge
In my last blog post I mentioned a few advantages of learning R. One of those advantages is that R opens up a world of new data analyses. R offers novel techniques that are not found in the ANALYZE drop-down menus of SPSS, Stata, and Statistica. These techniques can be a big help when data are not well suited to traditional analyses like t-tests, ANOVA, and regression.
Let’s take a look at an example.
Consider a fictional medical study with the following conditions:
- Outcome variable 1 is a time variable representing hospital length of stay (lower is better).
- Outcome variable 2 represents a biomedical outcome that is tracked cumulatively during treatment (lower is better).
- We want to compare outcomes between treatment 1 patients and treatment 2 patients.
At first glance it seems that a two-group comparison like Hotelling’s multivariate T-squared test might work for this scenario. Ideally this multivariate test would tell us whether there is a meaningful difference between the treatments when both outcomes are taken into consideration. However, this analysis does not capture the complex interplay between time and biomedical outcome. Consider that one patient could accumulate a biomedical score of 50 over five minutes while another patient accumulates the same score (50) over five days. Accumulating a score of 50 over five minutes is very different from accumulating it over five days. Also, the data have a great deal of within-group variability (due to patient differences), so much so that there is little or no chance of finding significant results with traditional multivariate or univariate analyses.
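For reference, the Hotelling’s T-squared comparison mentioned above can be computed in a few lines of base R. This is only a sketch on simulated data: the sample sizes, means, and standard deviations are assumptions made up for illustration and are not from the study described here.

```r
set.seed(7)
n1 <- 30; n2 <- 30; p <- 2   # group sizes and number of outcomes (assumed)

# Simulated outcomes, illustrative only (columns: length of stay, biomedical score)
X1 <- cbind(los = rnorm(n1, 20, 8), score = rnorm(n1, 200, 90))
X2 <- cbind(los = rnorm(n2, 24, 8), score = rnorm(n2, 230, 90))

# Difference in mean vectors and pooled covariance matrix
d  <- colMeans(X1) - colMeans(X2)
Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)

# Two-sample Hotelling's T-squared statistic
T2 <- as.numeric((n1 * n2 / (n1 + n2)) * t(d) %*% solve(Sp) %*% d)

# Convert to an F statistic with p and n1 + n2 - p - 1 degrees of freedom
F_stat  <- T2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
p_value <- pf(F_stat, p, n1 + n2 - p - 1, lower.tail = FALSE)

cat("T2 =", T2, " F =", F_stat, " p =", p_value, "\n")
```

The test compares only the two mean vectors, which is why it cannot describe how the two outcomes are distributed jointly within each group.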
This kind of scenario can leave researchers scratching their heads wondering what kind of statistical test to use.
Thankfully this sort of data can be analyzed in R with bivariate density analysis. Bivariate density analysis gives researchers a 3-dimensional display of the complex interplay between two outcome variables. One outcome is put on the X-axis and the other is put on the Y-axis. The vertical Z-axis represents the estimated probability density, that is, how likely a case from the population is to fall near a particular combination of scores on the two outcome variables.
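One way to produce this kind of display is kernel density estimation with kde2d() from the MASS package, drawn with base R's persp(). The sketch below is an assumption about tooling, not necessarily how the plots in this post were made, and the data are simulated purely for illustration.

```r
library(MASS)  # provides kde2d() for two-dimensional kernel density estimation

set.seed(42)
n <- 200
# Simulated, illustrative data (not from the study described here)
los   <- rgamma(n, shape = 2, rate = 0.1)    # hypothetical length of stay
score <- rgamma(n, shape = 2, rate = 0.01)   # hypothetical cumulative biomedical score

# Estimate the joint density on a 50 x 50 grid
dens <- kde2d(los, score, n = 50)

# Draw the estimated density as a 3-D surface:
# X = length of stay, Y = biomedical score, Z = density
persp(dens$x, dens$y, dens$z,
      theta = 30, phi = 25, expand = 0.6,
      xlab = "Length of stay", ylab = "Biomedical score", zlab = "Density",
      ticktype = "detailed")
```

Running the same steps once per treatment group gives one surface per group, which can then be compared side by side.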
The bivariate density plots shown below are for treatment group 1 and treatment group 2. Note that the volume under each surface totals 100 percent. The plots show that patients receiving treatment 1 are much more likely than patients receiving treatment 2 to score low on the biomedical outcome measure AND low on the time variable. In fact, patients undergoing treatment 1 have little or no chance of scoring above 400 on the biomedical outcome variable, and little chance of being in the hospital for more than 40 days.
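Statements like these can be checked numerically by summing the estimated density over grid cells. The sketch below uses simulated stand-ins for the two groups; density_summary() is a hypothetical helper written for this example, and only the cutoffs of 40 days and 400 points are taken from the text above.

```r
library(MASS)

set.seed(1)
n <- 200
# Simulated stand-ins for the two treatment groups (illustrative only)
g1 <- data.frame(los = rgamma(n, shape = 2, rate = 0.2),
                 score = rgamma(n, shape = 2, rate = 0.02))
g2 <- data.frame(los = rgamma(n, shape = 2, rate = 0.05),
                 score = rgamma(n, shape = 2, rate = 0.005))

# Hypothetical helper: estimate the density on a padded grid, then
# approximate probabilities as (sum of density values) x (cell area)
density_summary <- function(x, y, x_max, y_max, grid = 100) {
  # Pad the grid limits so the tails of the estimate are not cut off
  lims <- c(range(x) + c(-1, 1) * sd(x), range(y) + c(-1, 1) * sd(y))
  d <- kde2d(x, y, n = grid, lims = lims)
  cell <- diff(d$x[1:2]) * diff(d$y[1:2])
  inside <- outer(d$x <= x_max, d$y <= y_max, "&")
  c(total = sum(d$z) * cell, region = sum(d$z[inside]) * cell)
}

s1 <- density_summary(g1$los, g1$score, x_max = 40, y_max = 400)
s2 <- density_summary(g2$los, g2$score, x_max = 40, y_max = 400)
print(rbind(treatment1 = s1, treatment2 = s2))
```

The total column should come out close to 1 for each group, and the region column approximates the chance of staying 40 days or fewer while also scoring 400 or lower.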
Conclusion: Treatment 1 is more favorable. It appears to lead to lower biomedical outcome measures and less time spent in the hospital.
More from Dave
Methodspace blogger Dave Collingridge is a senior research statistician for a large healthcare organization located in Utah, USA. He has published several quantitative and qualitative research articles in healthcare, psychology, and statistics and has been a member of Methodspace for several years. See his debut post here.