migueldiazpdj/Statistical-techniques-and-illustration-in-R

📊✨ @USC R Statistics & Visualization 📈🎓

This repo contains materials, assignments, and projects from the course. Dive in to explore the world of R programming, statistical analysis, and data visualization!

📝 Important Notes on Key Statistical Concepts:

  • 📊 Mean vs. Median:
    When the mean and median differ substantially, the distribution is likely skewed or contains outliers: extreme values pull the mean toward them, while the median is barely affected.
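    A minimal sketch of this effect in base R. The values in x are made up for illustration and are not course data:

    ```r
    # Illustrative (made-up) sample that is roughly symmetric
    x <- c(2, 3, 3, 4, 5, 5, 6)
    # The same sample with one extreme value added
    x_out <- c(x, 60)

    mean(x)        # 4
    median(x)      # 4
    mean(x_out)    # 11  (pulled upward by the outlier)
    median(x_out)  # 4.5 (barely moves)
    ```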
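  • 📏 Quartiles vs. Percentiles:
    Percentiles divide the ordered data into 100 equal parts; the quartiles Q1, Q2 (the median), and Q3 are the 25th, 50th, and 75th percentiles.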
  • ⚖️ Sample Variance vs. Population Variance:
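    When working with a sample, it is preferable to use the sample variance (the unbiased estimator, which divides by n - 1) over the population variance formula (which divides by n and is biased when applied to a sample). For large sample sizes, the difference becomes minimal.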
    In R, the var function calculates the sample variance, not the population variance.
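    A short sketch comparing the two divisors. The vector x is made up for illustration, and pop_var is a hypothetical helper name, since base R has no built-in population variance function:

    ```r
    # Illustrative (made-up) data
    x <- c(4, 8, 6, 5, 7)
    n <- length(x)

    sample_var <- var(x)                   # divides by n - 1: 2.5
    pop_var <- sum((x - mean(x))^2) / n    # divides by n:     2

    # The two differ only by the factor (n - 1) / n
    all.equal(pop_var, sample_var * (n - 1) / n)
    ```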
  • 📐 Interquartile Range (IQR):
    The IQR is the difference between the third quartile (Q3) and the first quartile (Q1), IQR = Q3 - Q1; it covers the middle 50% of the data.
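    As a sketch, the quartiles can be obtained as the 25th and 75th percentiles with quantile(), and the result compared with the built-in IQR(). The data are made up, and R's default quantile definition (type 7) is assumed; other types can give slightly different quartiles:

    ```r
    # Illustrative (made-up) data
    x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

    # Quartiles are the 25th and 75th percentiles (default type 7)
    q <- quantile(x, probs = c(0.25, 0.75))
    iqr_manual <- unname(q[2] - q[1])   # Q3 - Q1 = 7 - 3 = 4

    IQR(x)                              # same value from the built-in function
    ```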
  • 📉 Standard Deviation (SD):
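    The population SD is the square root of the average of the squared deviations from the mean. The sample SD divides the sum of squared deviations by n - 1 instead of n; this is what R's sd function computes.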
  • 🔄 Symmetric vs. Asymmetric Variables:
    Symmetric Variable: Data is evenly distributed around the mean; mean ≈ median.
    Positive Skewness: The right tail is longer; typically mean > median.
    Negative Skewness: The left tail is longer; typically mean < median.
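    A rough sketch of checking the direction of skew. The data are made up, and skew is a hand-written moment-based coefficient, since base R does not ship a skewness function:

    ```r
    # Illustrative (made-up) data with a long right tail
    x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)

    m <- mean(x)      # 4.7
    med <- median(x)  # 3

    # Moment-based skewness: third central moment over the
    # second central moment raised to the power 3/2
    skew <- mean((x - m)^3) / mean((x - m)^2)^(3 / 2)

    m > med    # TRUE: mean exceeds median
    skew > 0   # TRUE: positive skewness
    ```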
  • 🔗 Correlation vs. Causation:
    Correlation between two variables does not imply that one causes the other. It only indicates a relationship or association between them.
  • 📊 Boxplots:
    Boxplots are useful for identifying outliers but do not show whether there are distinct groups within the data: a bimodal sample can produce a boxplot that looks much like that of a unimodal one.
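    A small sketch of how a boxplot flags outliers. The data are made up; boxplot.stats() returns the same numbers that boxplot() draws, using the usual rule of 1.5 times the box height beyond the hinges:

    ```r
    # Illustrative (made-up) data with one extreme value
    x <- c(10, 12, 11, 13, 12, 14, 11, 13, 12, 40)

    stats <- boxplot.stats(x)
    stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
    stats$out     # points drawn individually as outliers: 40

    boxplot(x, horizontal = TRUE)
    ```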
  • 🔍 Biases:
    Biases can distort statistical analysis and can occur in data collection, sampling, or interpretation. Common types include selection bias, measurement bias, and confirmation bias.
  • 📈 Normal Distribution vs. t-Distribution:
    The normal distribution is symmetrical and bell-shaped; it is used for inference when the sample size is large or the population standard deviation is known.
    The t-distribution is similar to the normal distribution but has heavier tails. This occurs because it accounts for the additional variability introduced by estimating the population standard deviation from a small sample. As the degrees of freedom increase, it approaches the normal distribution.
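    The heavier tails can be seen by comparing the critical values for a two-sided 95% interval. This is only a sketch; the degrees of freedom are chosen arbitrarily for illustration:

    ```r
    z <- qnorm(0.975)                 # about 1.96
    t5 <- qt(0.975, df = 5)           # about 2.57
    t30 <- qt(0.975, df = 30)         # about 2.04
    t1000 <- qt(0.975, df = 1000)     # very close to the normal value

    # Critical values shrink toward the normal one as df grows
    c(t5 = t5, t30 = t30, t1000 = t1000, normal = z)
    ```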
  • 🔍 Dimensionality Reduction in Multivariate Analysis:
    In multivariate analysis, the number of observed variables can be high. The goal is to reduce the number of variables to a smaller set that still accurately describes the data.
  • 📉 Coefficient of Determination (R²):
    The coefficient of determination of a regression model, commonly denoted as R², is the proportion of the variability in the response variable Y explained by the regression model. For a least-squares fit with an intercept it takes values between 0 and 1; the closer to 1, the closer the observations are to the fitted line.
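    As a sketch, R² can be read from summary() or computed by hand as 1 minus the ratio of residual to total sum of squares. The built-in cars dataset is used here only as a convenient example, not because the course uses it:

    ```r
    # Simple linear regression on a built-in dataset
    fit <- lm(dist ~ speed, data = cars)

    r2_summary <- summary(fit)$r.squared

    # Manual computation: 1 - SS_residual / SS_total
    ss_res <- sum(residuals(fit)^2)
    ss_tot <- sum((cars$dist - mean(cars$dist))^2)
    r2_manual <- 1 - ss_res / ss_tot

    all.equal(r2_summary, r2_manual)
    ```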
  • 🔧 Variable Selection Method:
    Variable selection methods seek a model that fits the data well while being as simple as possible, keeping only the explanatory variables that contribute meaningfully.
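    One common approach is backward stepwise selection by AIC, sketched below with step(). Both the choice of method and the mtcars dataset are assumptions made for illustration; the notes above do not say which selection method is meant:

    ```r
    # Start from the model with all available predictors
    full <- lm(mpg ~ ., data = mtcars)

    # Drop predictors one at a time while AIC keeps improving
    reduced <- step(full, direction = "backward", trace = 0)

    formula(reduced)                             # the selected model
    length(coef(reduced)) < length(coef(full))   # fewer terms than the full model
    ```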
  • 🔄 Collinearity in Regression Models:
    A regression model suffers from collinearity when the explanatory variables are highly correlated with each other. Under these conditions, the model struggles to distinguish the effect of each variable on the response. In practice, collinearity is studied through the Variance Inflation Factor (VIF).
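    A minimal sketch of the VIF computed by hand: for each explanatory variable, regress it on the others and take 1 / (1 - R²). The mtcars columns are chosen arbitrarily for illustration, and vif_manual is a hypothetical name; packages such as car provide a ready-made vif function:

    ```r
    # Explanatory variables to check (illustrative choice)
    predictors <- mtcars[, c("disp", "hp", "wt")]

    vif_manual <- sapply(names(predictors), function(v) {
      others <- setdiff(names(predictors), v)
      # Regress predictor v on the remaining predictors
      aux <- lm(reformulate(others, response = v), data = predictors)
      1 / (1 - summary(aux)$r.squared)
    })

    vif_manual   # values well above 1 point to collinearity
    ```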
  • 🔄 Multivariate Analysis and Dimensionality Reduction:
    In multivariate analysis, when we have many variables and want to reduce them, we use techniques like Principal Component Analysis (PCA). PCA transforms the original variables into new, uncorrelated variables, called principal components, which are linear combinations of the original variables. Taken together, the components preserve the total variability of the data; because they are ordered by the variance they explain, keeping only the first few retains most of that variability in fewer dimensions.
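    A short sketch with prcomp() on the built-in USArrests dataset (an illustrative choice). Variables are standardized first, which is an assumption that suits variables measured on different scales:

    ```r
    # PCA on standardized variables
    pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

    # Proportion of total variance explained by each component
    var_explained <- pca$sdev^2 / sum(pca$sdev^2)
    cumsum(var_explained)   # cumulative proportion, reaches 1 with all components

    # Reduced representation: scores on the first two components
    scores_2d <- pca$x[, 1:2]
    dim(scores_2d)          # 50 observations, 2 dimensions
    ```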
  • 🔗 Agglomerative vs. Divisive Methods (Hierarchical Clustering):
    Agglomerative Methods: Start with each individual as a separate cluster and merge them step by step until all individuals belong to a single group.
    Divisive Methods: Start with a single group and divide it step by step until each individual forms its own group.
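    Both strategies can be sketched in R. hclust() from base R is agglomerative; for the divisive case this sketch assumes the cluster package is installed and uses its diana() function. The dataset, the average linkage, and the choice of 4 groups are arbitrary illustrations:

    ```r
    # Distance matrix on standardized variables
    d <- dist(scale(USArrests))

    # Agglomerative: merge clusters step by step
    agg <- hclust(d, method = "average")
    groups <- cutree(agg, k = 4)   # cut the tree into 4 groups
    table(groups)

    # Divisive: split one big cluster step by step
    # (only runs if the cluster package is available)
    if (requireNamespace("cluster", quietly = TRUE)) {
      div <- cluster::diana(d)
      div_groups <- cutree(as.hclust(div), k = 4)
      table(div_groups)
    }
    ```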

🔗 Links of Interest:
