
Distributional assessments with Q-Q plots

fkrawiec edited this page Mar 29, 2017 · 12 revisions

Background

Quantile-quantile plots (Q-Q plots) are a powerful way to visually check distributional assumptions about a random variable. Two additions help with this assessment: a line through the first and third quartiles of the empirical and theoretical distributions (commonly known as the qqline), and a confidence band or pointwise intervals around that line. Aldor-Noiman et al. (2013) and Loy et al. (2016) have shown that both the choice of interval around the line and the design of the Q-Q plot, such as a rotation by 90 degrees, affect how well we can read Q-Q plots.

In the ggplot2 framework (Wickham, 2009, 2016), Q-Q plots are supported by stat_qq and geom_qq, which only draw the points of the plot. We propose extensions to ggplot2 that add a Q-Q line and support for bands around this line. Since ggplot2 version 2.0.0, the way geoms are supported has been completely overhauled, which makes extensions much easier to write.
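As a rough illustration of the two ingredients described above, the following base R sketch computes a Q-Q line through the first and third quartiles and a pointwise band around it. The choice of the mtcars mpg variable, the normal reference distribution, and the normal-theory standard error are assumptions made for this example; the band is only one of several possible constructions and is not the method proposed for the package.

```r
# Minimal sketch (assumptions: normal reference distribution, mtcars$mpg as sample)
x <- sort(mtcars$mpg)          # ordered sample values
n <- length(x)
p <- ppoints(n)                # plotting positions
z <- qnorm(p)                  # theoretical quantiles

# Q-Q line through the first and third quartiles
probs <- c(0.25, 0.75)
y_q <- quantile(x, probs, names = FALSE)   # empirical quartiles
z_q <- qnorm(probs)                        # theoretical quartiles
slope <- diff(y_q) / diff(z_q)
intercept <- y_q[1] - slope * z_q[1]
fit <- intercept + slope * z

# Pointwise 95% band based on a normal-theory approximation to the
# standard error of an order statistic (an assumption of this sketch)
se <- (slope / dnorm(z)) * sqrt(p * (1 - p) / n)
lower <- fit - qnorm(0.975) * se
upper <- fit + qnorm(0.975) * se

plot(z, x, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(intercept, slope)
lines(z, lower, lty = 2)
lines(z, upper, lty = 2)
```

The line computed here should coincide with what qqline draws by default. The band is pointwise, so it does not control the joint coverage over all points.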

References

  • Aldor-Noiman S., L. Brown, A. Buja, W. Rolke, R. Stine, The Power to See: A New Graphical Test of Normality, The American Statistician, 67(4), 249-260, 2013.
  • Loy A., L. Follett, H. Hofmann, Variations of Q–Q Plots: The Power of Our Eyes!, The American Statistician, 70(2), 202-214, 2016.
  • Wickham H., ggplot2: Elegant graphics for data analysis. useR, Springer, 2009.
  • Wickham H., ggplot2: Elegant graphics for data analysis. 2nd edition, useR, Springer, 2016.

Related work

Q-Q plots have been implemented in various forms in R, starting with qqplot and qqline in base R (the stats package). Within ggplot2, however, the functionality is restricted to stat_qq and geom_qq, both of which are only concerned with the placement of points in a Q-Q plot. Providing the Q-Q line and a confidence region in the form of geoms makes additional ggplot2 tools, such as faceting and layering, available to the analyst.

Details of your coding project

With version 2.0.0 of the ggplot2 package, the handling of geoms was completely revised, which makes user-defined geoms much more straightforward to write and more consistent with the remainder of the ggplot2 framework. Extensions are now based on ggproto, the object system that ggplot2 adopted in place of the more general proto package. We envision that the student will use this approach to create an interface that integrates Q-Q lines and regions as geoms. There are different ways of creating confidence regions for the line, such as regions based on pointwise intervals or joint confidence bands based, for example, on bootstrapping techniques.
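To give a feel for the ggproto approach, here is a minimal sketch of a stat that computes the two end points of a Q-Q line. The names StatQqLineSketch and stat_qq_line_sketch are hypothetical and chosen only for this illustration, and the code assumes a recent ggplot2 (after_stat requires version 3.3.0 or later). It is a starting point under these assumptions, not the design the project is expected to end up with.

```r
library(ggplot2)

# Hypothetical stat: computes the end points of a line through the
# first and third quartiles of the sample and the reference distribution
StatQqLineSketch <- ggproto("StatQqLineSketch", Stat,
  required_aes = "sample",
  default_aes = aes(x = after_stat(x), y = after_stat(y)),
  dropped_aes = "sample",  # the sample aesthetic is replaced by x and y
  compute_group = function(data, scales, distribution = stats::qnorm,
                           dparams = list()) {
    probs <- c(0.25, 0.75)
    y_q <- stats::quantile(data$sample, probs, names = FALSE)
    x_q <- do.call(distribution, c(list(p = probs), dparams))
    slope <- diff(y_q) / diff(x_q)
    intercept <- y_q[1] - slope * x_q[1]

    # Draw the line across the range of theoretical quantiles
    n <- length(data$sample)
    theoretical <- do.call(distribution, c(list(p = stats::ppoints(n)), dparams))
    x <- range(theoretical)
    data.frame(x = x, y = intercept + slope * x)
  }
)

# Hypothetical layer function wrapping the stat
stat_qq_line_sketch <- function(mapping = NULL, data = NULL, geom = "path",
                                position = "identity", ...,
                                distribution = stats::qnorm, dparams = list(),
                                na.rm = FALSE, show.legend = NA,
                                inherit.aes = TRUE) {
  layer(
    stat = StatQqLineSketch, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(distribution = distribution, dparams = dparams,
                  na.rm = na.rm, ...)
  )
}

# Usage: points from stat_qq plus the sketched line
p <- ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  stat_qq_line_sketch()
```

Because the line is computed per group inside the stat, it should work with faceting and grouping aesthetics in the same way as stat_qq does.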

The outcomes of the project are:

  • An R package for Q-Q plot add-ons, implemented in the form of geoms for the ggplot2 package and based on ggproto. The package has to be fully functional and must be documented.
  • A set of examples documenting the use and flexibility of the geoms.
  • A shiny app that interactively highlights the functionality: users can specify parameters and immediately see their impact, which helps them become familiar with the more abstract concepts.

Expected impact

By making Q-Q lines and confidence regions available in a single package (rather than in loose and scattered comments on Stack Overflow) and as geoms within the ggplot2 framework, we create a single point of contact and make the functionality available to the wider community of ggplot2 users.

Mentors

Once you have a solution to the medium and/or the hard problem, please add a link to your solutions below and get in touch with Heike Hofmann (hofmann@iastate.edu) and/or Adam Loy.

Tests

  • Easy: Draw a Q-Q plot of the variable mpg in the mtcars data and add a Q-Q line (use a package or solution of your choice). Interpret the result in a paragraph or two. Include both the code and the explanation in a knitr/R Markdown document.
  • Medium: Write a shiny app that draws a Q-Q plot using stat_qq of ggplot2. Make the parameter choices for distribution and dparams available to the user. Beyond that, let your creativity show!
  • Hard: Based on Hadley Wickham's introduction to extending ggplot2, implement a rudimentary form of a Q-Q line as a geom. Document the function using Roxygen, and create an R package. Make the example you used above an example in your code.

Solutions of tests

Students, please post a link to your test results here.

https://github.com/sauravkaushik8/GSoc-2017-QQ_Plots

http://almeidaxan.dyndns.org/shiny/gsoc2017/

https://github.com/fkrawiec/GSoC2017
