
Use cases for various tasks

agarie edited this page Mar 24, 2013 · 1 revision

Correlation of sets of random numbers

  1. Generate two sets of 1000 random numbers each, uniformly distributed between 0.0 and 1.0.
  2. Compute Pearson's correlation coefficient (r), r^2, and the slope and y-intercept of the least-squares regression line.
  3. Plot an x-y scatterplot and include the regression line. Include r^2 and the line equation (slope & y-intercept) somewhere on the plot.
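For reference, the quantities in step 2 are the usual least-squares ones. Take n paired samples (x_i, y_i) with means \bar{x}, \bar{y} and sample standard deviations s_x, s_y, and write the fitted line as f(x) = b x + a. In simple linear regression, r^2 also equals the fit's coefficient of determination.

```latex
\begin{aligned}
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \\
\text{slope } b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r\,\frac{s_y}{s_x} \\
\text{intercept } a &= \bar{y} - b\,\bar{x}
\end{aligned}
```

Because x and y are drawn independently, r should come out close to 0. Its standard error is roughly 1/\sqrt{n}, which is about 0.03 for n = 1000.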

R

# modified from: http://sudoit.blogspot.com/2010/08/regression-line-r2-pearsons-correlation.html
x = runif(1000, 0.0, 1.0)  # 1000 draws from a uniform distribution on [0, 1]
y = runif(1000, 0.0, 1.0)
pearsons_r = cor(x, y)
r_squared = pearsons_r^2
fit = lm(y ~ x)    # formula is response ~ predictor, so y comes first
y_intercept = as.numeric(coef(fit)[1])
slope = as.numeric(coef(fit)[2])
plot(x, y)
abline(fit)  # reuse the fitted model instead of refitting lm(y ~ x)
function_string = sprintf("f(x) = %.4fx + %.4f", slope, y_intercept)
r_sq_string = sprintf("r^2 = %.4f", r_squared)
display_string = paste(function_string, r_sq_string, sep="\n")
mtext(display_string, side=3, adj=1)  # in the top margin, right-aligned

Matlab

% Generate two sets of 1000 uniform random numbers on (0, 1)
n = 1000;
x = rand(n,1);
y = rand(n,1);

R = corrcoef(x,y);   % base MATLAB; corr(x,y) also works but needs the Statistics Toolbox
pearsons_r = R(1,2);
r_squared = pearsons_r^2;

vals = polyfit(x,y,1);
slope = vals(1);
intercept = vals(2);

x2=0:.01:1;
y2=slope*x2+intercept;

clf %clear figure
hold on
scatter(x,y);
plot(x2,y2);
text(0,.8,sprintf('r^2=%.10f\n%.4f x + %.4f',r_squared,slope,intercept),'FontSize',14,'FontWeight','bold');
hold off

sciruby (take 1)

Other thoughts on how this should or could look are most welcome. This is just imagine-ware right now.

require 'sciruby'  # which requires at least 'sciruby/narray'?? and 'sciruby/stats'??
(x,y) = 2.times.map { NArray.float(1000).random!(1.0) }  # how would this look in NArray v0.7? What about with GSL backend?
# how does Statsample do this?
pearsons_r = Stats.pearsons_r(x,y)  # should this be combined with following line into one call?
slope, intercept = Stats.slope_intercept(x,y)
# fill in with rubyvis-like plotting???
# ... lots of nifty plotting and labeling here ...
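One way the two imagined calls could be backed is sketched below in plain Ruby, with no NArray or GSL dependency. The `Stats` module and its `pearsons_r` / `slope_intercept` methods are the hypothetical API from the snippet above, not an existing SciRuby library. The inputs are assumed to be plain Arrays of Floats, and the fixed seed is only there for repeatability.

```ruby
# Hypothetical module matching the imagine-ware API above (not a real SciRuby API).
module Stats
  module_function

  # Arithmetic mean of an Array of numbers.
  def mean(v)
    v.sum.to_f / v.size
  end

  # Returns [sxy, sxx, syy]: sums of centered cross-products and squares.
  def centered_sums(x, y)
    raise ArgumentError, "x and y must have the same length" unless x.size == y.size
    mx = mean(x)
    my = mean(y)
    sxy = sxx = syy = 0.0
    x.zip(y) do |xi, yi|
      dx = xi - mx
      dy = yi - my
      sxy += dx * dy
      sxx += dx * dx
      syy += dy * dy
    end
    [sxy, sxx, syy]
  end

  # Pearson's correlation coefficient r.
  def pearsons_r(x, y)
    sxy, sxx, syy = centered_sums(x, y)
    sxy / Math.sqrt(sxx * syy)
  end

  # Least-squares fit y = slope * x + intercept; returns [slope, intercept].
  def slope_intercept(x, y)
    sxy, sxx, _syy = centered_sums(x, y)
    slope = sxy / sxx
    [slope, mean(y) - slope * mean(x)]
  end
end

# Two sets of 1000 uniform random numbers in [0.0, 1.0), seeded for repeatability.
rng = Random.new(42)
x, y = Array.new(2) { Array.new(1000) { rng.rand } }

r = Stats.pearsons_r(x, y)
slope, intercept = Stats.slope_intercept(x, y)
puts format("r = %.4f, r^2 = %.4f", r, r**2)
puts format("f(x) = %.4fx + %.4f", slope, intercept)
```

Both methods reuse the same centered sums. That is one argument for the question in the snippet about combining them into a single call.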

statsample

require 'statsample'
include Statsample::Shorthand
a=Statsample.new_scale(1000) {rand}
b=Statsample.new_scale(1000) {rand}
r=Statsample::Bivariate::Pearson.new(a,b)
puts r.summary # Retrieves r, t and p
sr=Statsample::Regression::Simple.new_from_vectors(a,b)
puts "r:#{sr.r}, r^2:#{sr.r2}, a (intercept):#{sr.a}, b (slope):#{sr.b}" # I still have to implement summary for simple regression
scatterplot(a,b,:show_regression=>true, :label=>"r^2:#{sr.r2}\nf(x)=#{sr.b}x+#{sr.a}") # :show_regression and :label are not implemented yet

sciruby (take 2)

# other thoughts?

Multiline plot

This use case is included because it should be simple, yet it is surprisingly awkward to accomplish in R.

R

d1x <- c(1,2,3,4,5)
d1y <- c(4,5,2,3,3)

d2x <- c(1.5,3,7,9,12.5)
d2y <- c(8,11,8,9,15)

d3x <- c(3,3,5,5,9)
d3y <- c(-2,3,10,9,-1)

xrange <- range(d1x,d2x,d3x)
yrange <- range(d1y,d2y,d3y)

plotcolors = c("pink", "orangered", "blue")

plot(d1x, d1y, ylim=yrange, xlim=xrange, type="o", col=plotcolors[1])
lines(d2x, d2y, type="o", col=plotcolors[2])
lines(d3x, d3y, type="o", col=plotcolors[3])

# you must choose the legend position yourself,
# or use labcurve (Hmisc package) or emptyspace (plotrix package)
# for automatic placement
# legend() only draws colored symbols when lty, pch, or fill is given
legend("topleft", c("dataset 1", "dataset 2", "dataset 3"), col=plotcolors, lty=1, pch=1)
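For comparison, here is a minimal plain-Ruby sketch of the same plot, emitted as an SVG string so it needs no plotting gem. The `multiline_svg` helper, its keyword arguments, and the fixed top-left legend layout are hypothetical. The idea is that a plotting call can take all series at once and derive the shared axis ranges and legend entries itself. In the R version above, that work is done by hand with `range()`, `xlim`/`ylim`, and `legend()`.

```ruby
# Hypothetical helper (not a SciRuby API): draws every series as an SVG polyline
# with point markers on one shared set of axes, plus a simple top-left legend.
# Assumes the combined data has a nonzero range on both axes.
def multiline_svg(series, width: 400, height: 300, margin: 30)
  # Shared axis ranges come from all series at once (R needs range() + xlim/ylim).
  xmin, xmax = series.values.flat_map { |s| s[:x] }.minmax
  ymin, ymax = series.values.flat_map { |s| s[:y] }.minmax
  # Map data coordinates to pixels; SVG's y axis points down, so y is flipped.
  sx = ->(v) { margin + (v - xmin) * (width - 2 * margin) / (xmax - xmin).to_f }
  sy = ->(v) { height - margin - (v - ymin) * (height - 2 * margin) / (ymax - ymin).to_f }

  parts = [%(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">)]
  series.each_with_index do |(label, s), i|
    coords = s[:x].zip(s[:y]).map { |x, y| [sx.(x), sy.(y)] }
    points = coords.map { |px, py| format("%.1f,%.1f", px, py) }.join(" ")
    # Line plus markers, roughly what type="o" does in R.
    parts << %(<polyline points="#{points}" fill="none" stroke="#{s[:color]}"/>)
    coords.each do |px, py|
      parts << format(%(<circle cx="%.1f" cy="%.1f" r="2" fill="%s"/>), px, py, s[:color])
    end
    # Legend entry: a short colored line and the label, stacked top-left.
    ly = margin + 15 * i
    parts << %(<line x1="#{margin}" y1="#{ly}" x2="#{margin + 20}" y2="#{ly}" stroke="#{s[:color]}"/>)
    parts << %(<text x="#{margin + 25}" y="#{ly + 4}" font-size="10">#{label}</text>)
  end
  parts << "</svg>"
  parts.join("\n")
end

# Same three datasets and colors as the R example.
series = {
  "dataset 1" => { x: [1, 2, 3, 4, 5],      y: [4, 5, 2, 3, 3],    color: "pink" },
  "dataset 2" => { x: [1.5, 3, 7, 9, 12.5], y: [8, 11, 8, 9, 15],  color: "orangered" },
  "dataset 3" => { x: [3, 3, 5, 5, 9],      y: [-2, 3, 10, 9, -1], color: "blue" }
}
puts multiline_svg(series)  # e.g. ruby multiline.rb > multiline.svg
```

Passing the series as one hash keeps the legend labels, colors, and data together. This avoids the separate `plotcolors` vector and repeated `lines()` calls.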