Distributional assessments with Q-Q plots
Quantile-quantile plots (Q-Q plots) are a powerful way of visually assessing distributional assumptions about a random variable. This assessment is aided by a line through the points at the first and third quartiles of the empirical and theoretical distributions (commonly known as `qqline`) as well as by a confidence band or pointwise intervals around the line. Aldor-Noiman et al. (2013) and Loy et al. (2016) have shown that both the choice of the interval around the line and the design of the Q-Q plot, such as a rotation by 90 degrees, have an impact on our ability to use Q-Q plots.
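As a minimal sketch of how such a line can be obtained, the following base R code computes the slope and intercept of the line through the first and third quartiles by hand, assuming a normal reference distribution and the `mpg` variable of the built-in `mtcars` data. The variable names are illustrative only.

```r
# Sample to assess: mpg from the built-in mtcars data (an illustrative choice).
x <- mtcars$mpg

# First and third quartiles of the empirical and the theoretical
# (standard normal) distribution.
probs <- c(0.25, 0.75)
emp  <- quantile(x, probs, names = FALSE)
theo <- qnorm(probs)

# The Q-Q line is the straight line through these two quartile points.
slope     <- diff(emp) / diff(theo)
intercept <- emp[1] - slope * theo[1]

# Draw the Q-Q plot and add the hand-computed line.
qqnorm(x)
abline(intercept, slope)
```

With these defaults the result should coincide with what `qqline(x)` draws, since `qqline` also uses the quartiles of a normal distribution unless told otherwise.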
In the `ggplot2` framework (Wickham, 2009 and 2016), quantile-quantile plots are supported by the stat `stat_qq` and the geom `geom_qq`, which are concerned with drawing the points of the quantile-quantile plot. We propose to add extensions to the `ggplot2` framework for adding a Q-Q line as well as support for bands around this line.
Since `ggplot2` version 2.0.0, the way that geoms are supported has been completely overhauled, which makes extensions much easier to write.
- Aldor-Noiman S., L. Brown, A. Buja, W. Rolke, R. Stine, The Power to See: A New Graphical Test of Normality, The American Statistician, 67(4), 249-260, 2013.
- Loy A., L. Follett, H. Hofmann, Variations of Q–Q Plots: The Power of Our Eyes!, The American Statistician, 70(2), 202-214, 2016.
- Wickham H., ggplot2: Elegant graphics for data analysis. useR, Springer, 2009.
- Wickham H., ggplot2: Elegant graphics for data analysis. 2nd edition, useR, Springer, 2016.
Q-Q plots have been implemented in various forms in R, starting with `qqplot` and `qqline` in base R (the `stats` package). However, the functionality within the `ggplot2` package is restricted to `stat_qq` and `geom_qq`, both of which are only concerned with the placement of points in a Q-Q plot. By providing functionality for drawing the Q-Q line and a confidence region in the form of a geom, additional `ggplot2` tools such as faceting and layering are made available to the analyst.
With version 2.0.0 of the `ggplot2` package, the handling of geoms was completely revised, which makes user-defined geoms much more straightforward to write and compliant with the remainder of the `ggplot2` framework. This extension mechanism is based on `ggproto`, the object system that `ggplot2` adopted in place of the more general `proto` package.
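To illustrate the `ggproto` mechanism, here is a rough sketch of a stat that computes the two end points of a quartile-based Q-Q line for a normal reference distribution. The names `StatQqLineSketch` and `stat_qq_line_sketch` are hypothetical and chosen for this example only; this is one possible starting point under those assumptions, not the design the project prescribes.

```r
library(ggplot2)

# Hypothetical ggproto stat: computes the end points of a Q-Q line
# through the first and third quartiles (normal reference assumed).
StatQqLineSketch <- ggproto("StatQqLineSketch", Stat,
  required_aes = "sample",
  compute_group = function(data, scales) {
    probs <- c(0.25, 0.75)
    emp   <- stats::quantile(data$sample, probs, names = FALSE)
    theo  <- stats::qnorm(probs)
    slope     <- diff(emp) / diff(theo)
    intercept <- emp[1] - slope * theo[1]
    # Span the same theoretical quantiles that the Q-Q points cover.
    xr <- range(stats::qnorm(stats::ppoints(length(data$sample))))
    data.frame(x = xr, y = intercept + slope * xr)
  }
)

# Hypothetical layer function wrapping the stat; drawn with the path geom.
stat_qq_line_sketch <- function(mapping = NULL, data = NULL, geom = "path",
                                position = "identity", ..., na.rm = FALSE,
                                show.legend = NA, inherit.aes = TRUE) {
  layer(
    stat = StatQqLineSketch, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage: Q-Q points from stat_qq plus the sketched line as a second layer.
p <- ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  stat_qq_line_sketch()
print(p)
```

Because the line is an ordinary layer, it is computed per group and per panel, so adding something like `facet_wrap(~ cyl)` to the plot above would give each panel its own line.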
We envision that the student will use this approach to create an interface that integrates Q-Q lines and regions as geoms. There are different ways of creating confidence regions for the line, such as regions based on pointwise intervals or joint confidence bands based, for example, on bootstrapping techniques.
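As one illustration of the pointwise variant, the base R sketch below builds approximate 95% pointwise intervals around a quartile-based Q-Q line, using the asymptotic normal standard error of an order statistic. This is an assumed choice made for the example (it is not a bootstrap band), and the normal reference distribution and the `mtcars` data are likewise illustrative.

```r
# Sorted sample and the theoretical normal quantiles it is plotted against.
x <- sort(mtcars$mpg)
n <- length(x)
p <- ppoints(n)
z <- qnorm(p)

# Q-Q line through the first and third quartiles.
probs <- c(0.25, 0.75)
emp   <- quantile(x, probs, names = FALSE)
slope     <- diff(emp) / diff(qnorm(probs))
intercept <- emp[1] - slope * qnorm(probs[1])
fit <- intercept + slope * z

# Asymptotic standard error of each order statistic on the data scale,
# then pointwise normal-theory limits around the line.
se    <- (slope / dnorm(z)) * sqrt(p * (1 - p) / n)
crit  <- qnorm(0.975)
lower <- fit - crit * se
upper <- fit + crit * se

# Q-Q points, the line, and the pointwise limits.
plot(z, x, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(intercept, slope)
lines(z, lower, lty = 2)
lines(z, upper, lty = 2)
```

These limits hold point by point only; a joint band that covers the whole line simultaneously would need a different construction, such as the simulation-based approaches mentioned above.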
The outcomes of the project are:
- R package for Q-Q plot add-ons implemented in form of `geoms` for the `ggplot2` package based on `ggproto`. The package has to be fully functional and must be documented.
- A set of examples documenting the use and flexibility of the geoms.
- A shiny app interactively highlighting the functionality, enabling users to specify parameters and immediately see the impact to allow them to familiarize themselves with the more abstract concepts.
By making the functionality of Q-Q lines and confidence regions available in a single package (rather than in loose and distributed comments on Stack Overflow) and within geoms of the `ggplot2` framework, we create a single point of contact and make the functionality available to the wider community of `ggplot2` users.
Once you have a solution to the medium and/or the hard problem, please post a link to your solutions below and get in touch with Heike Hofmann (hofmann@iastate.edu) and/or Adam Loy.
- Easy: Draw a Quantile-Quantile plot of the variable `mpg` in the `mtcars` data and add a Q-Q line (use a package/solution of your choice). Interpret the result in a paragraph or two. Include both the code and the explanation in a knitr/Rmarkdown document.
- Medium: write a shiny app that draws a Q-Q plot using `stat_qq` of `ggplot2`. Make parameter choices for `distribution` and `dparams` available to the user. Beyond that, let your creativity show!
- Hard: based on Hadley Wickham's introduction to extending `ggplot2`, implement a rudimentary form of Q-Q line as a geom. Document the function using Roxygen, and create an R package. Make the example you used above an example in your code.
Students, please post a link to your test results here.
- https://github.com/sauravkaushik8/GSoc-2017-QQ_Plots
- http://almeidaxan.dyndns.org/shiny/gsoc2017/