Power analysis - a procedure often used before data collection to determine the smallest sample size needed to detect an effect of a given magnitude (typically based on theory, previous research, and/or practical importance) at the desired level of significance and power. It can also be used after data collection to evaluate the strength of the sample.
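As a rough illustration of the sample-size calculation described under Power analysis, the sketch below uses the normal approximation for a two-tailed comparison of two independent groups. The function name `n_per_group` and its defaults are hypothetical, and dedicated power software (which uses the t distribution) may return slightly larger values.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group (normal approximation, two-tailed).

    effect_size_d is the expected standardized mean difference (Cohen's d).
    This is an illustrative helper, not a replacement for power software.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the alpha level
    z_power = z.inv_cdf(power)            # quantile matching the desired power
    n = 2 * ((z_alpha + z_power) / effect_size_d) ** 2
    return math.ceil(n)                   # round up to whole participants

print(n_per_group(0.5))   # medium effect
print(n_per_group(0.8))   # large effect needs fewer participants
```

Smaller expected effects, stricter α levels, or higher desired power all push the required sample size up.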
Population - a collection of units to which we want to generalize a set of findings or a statistical model
Sample - a smaller but hopefully representative collection of units drawn from a population, used to determine truths about that population
Sampling distribution - the probability distribution of a statistic, the distribution of possible values of a given statistic that we could expect from a given population
Generalization - the ability of a statistical model to be applied to the population at large
Qualitative methods - research involving unstructured or semi-structured techniques, such as gathering observations and interviews, with an emphasis on the qualities of individual experience as opposed to measurable, quantifiable data points. Particularly useful in preliminary research or in the form of targeted open-ended questions, these methods can capture nuance and supplemental information that may not be covered in, for instance, a multiple-choice survey. This insight may help guide subsequent research, detect misunderstandings or unanticipated problems in a survey or experiment, or help explain low completion rates.
Quantitative methods - inferring evidence for a theory through measurement of variables that produce numeric outcomes
Categorical data - any variable consisting of categories of objects or entities (e.g. freshmen, sophomores, juniors, seniors)
- Levels - the different values of a factor in an experiment. For instance, suppose 0 mg, 5 mg, 10 mg, and 15 mg of a medicine are administered to patients and the efficacy is compared in an experiment. The four conditions represent four levels of the independent variable “Dosage”.
Binary variable - (AKA dichotomous variable) a categorical variable with two mutually exclusive categories (e.g. dead or alive, yes or no, pregnant or not pregnant)
Continuous variable - a variable that can be measured to any level of precision. Time is a continuous variable, because there is no limit to how finely it can be measured.
Interval data - data measured on a scale along the whole of which intervals are equal
- Ratio variable - an interval variable with the additional property that ratios are meaningful
Ordinal variable - ranked data, without a measure of the differences between values (e.g. in a race, we have first place, second place, and third place, although it is not clear how much faster the higher places were)
Construct - an underlying concept, characteristic, ability or skill that a given measure is intended to test. For example, an IQ test is one form of measurement of the construct "intelligence".
Hypothesis - a prediction about the state of the world. In other words, what do you expect to find?
- Experimental hypothesis - predicted relationship between variables
- Null hypothesis - the reverse of the experimental hypothesis: the statement that your prediction is wrong and the predicted effect does not exist
Variable - anything that can be measured and can differ across entities or across time
- Dependent variable (AKA DV, outcome variable, y) - the variable being tested in a research experiment
- Independent variable (AKA IV, predictor, factor, explanatory variables, regressor variables) - the variable manipulated or identified by the experimenter as predicting or having an association with the outcome variable
- Latent variable - a variable that cannot be directly measured, but is assumed to be related to several variables that can be measured.
Validity - evidence that a study allows correct inferences about the question it was aimed to answer or that a test measures what it set out to measure conceptually
- Content validity - evidence that the content of a test corresponds to the construct it was designed to measure
- Ecological validity - evidence that the results of a study, experiment, or test can be applied to, and allow inferences about, real-world conditions
- Type I error - occurs when we believe that there is a genuine effect in the population, when there is not
- Type II error - occurs when we believe there is no effect, when in fact, there is
Confounding variables (AKA confounds) - factors unrelated to the experiment that may impact the outcome, perhaps masking an effect or causing an observed effect to appear more influential than it actually is. Examples include test fatigue or boredom during a test. Confounding factors should be carefully considered during the planning phase of a research project and either avoided, if possible, or addressed in the limitations section of a writeup.
Practice effect - refers to the possibility that participants' performance may be influenced by repetition of, or increasing familiarity with, a task
Suppressor effects - when a predictor has a significant effect, but only when another variable is held constant
Randomization - random assignment of participants to varying treatment conditions to prevent test order effects, sampling effects, etc.
Reliability - the ability of a measure to produce consistent results when the same entities are measured under different conditions
- Test-retest reliability - the ability of a measure to produce consistent results when the same entities are tested at two different points in time
- Split-half reliability - a measure of internal consistency in which one half of a test is compared to the other half
Power - the ability of a test to detect an effect of a particular size, i.e. the probability of correctly rejecting a false null hypothesis
Normal distribution - a symmetrical, bell-shaped probability distribution of a random variable, defined by its mean and standard deviation, for which the probability of any range of values is known
- Central limit theorem - theorem stating that when samples are large (a common rule of thumb is above 30), the sampling distribution of the mean will approximate a normal distribution regardless of the shape of the population from which the samples were drawn.
- Parametric test - a test based on the normal distribution, generally requiring four basic assumptions: normality, homogeneity of variance, interval or ratio data, and independence of observations
- Two-tailed test - a test of a non-directional hypothesis, i.e. one that predicts a relationship or difference without specifying its direction
- Probability distribution - a curve describing an idealized frequency distribution of a particular variable, from which it is possible to ascertain the probability with which specific values of that variable will occur
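The central limit theorem entry above can be checked with a small simulation. This is a minimal sketch under assumed settings: the skewed Exp(1) population, the sample size of 50, the 5000 repetitions, and the helper name `sample_means` are all illustrative choices, not part of the definition.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_means(n, reps):
    """Means of `reps` samples of size `n` drawn from a skewed Exp(1) population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

means = sample_means(n=50, reps=5000)

# The population mean and SD of Exp(1) are both 1, so the sampling
# distribution of the mean should be centered near 1 with a spread
# near 1 / sqrt(50), even though the population itself is skewed.
print("mean of sample means:", statistics.fmean(means))
print("SD of sample means:  ", statistics.stdev(means))
print("1 / sqrt(n):         ", 1 / 50 ** 0.5)
```

Plotting `means` as a histogram would show the familiar bell shape; the spread of that histogram is what the standard error estimates.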
Monotonic relationship - two variables that demonstrate either an increasing or decreasing relationship, though not necessarily at the same rate. This may result in a curved pattern as opposed to a linear straight line.
Polynomial - a growth curve, or trend over time
- Linear model - a monotonic model based on a straight line, may include correlation or regression
- Non-linear model
Meta-analysis - a statistical procedure used to assimilate research findings
Likert scale - a common 5- to 7-point scale used in survey research in which participants rate their agreement or disagreement with a statement, resulting in ordinal data
Inferential statistics - used for generalization in making predictions (“inferences”) about a population based on the test results from a sample
Analysis of Variance (ANOVA)
Main effect - the unique effect of the predictor variable (or independent variable) on the outcome variable, usually used in context of ANOVA
Interaction effect - the combined effect of two or more predictor variables on the outcome variable
Planned contrasts - a set of comparisons between group means that are constructed before any data is collected
Post hoc tests - a set of comparisons between group means that were not thought of before data was collected, typically comparing the means of all combinations of groups using a strict significance criterion (e.g. Bonferroni correction). They tend to have less power than planned contrasts and are usually used for exploratory work.
Robust test - a term applied to a family of procedures for estimating statistics that remain reliable even when the usual assumptions are not met. ANOVA is considered robust.
Mixed design - an experimental design incorporating two or more independent variables with both repeated measures and between-subject measurements
Unique variance - variance specific to a particular variable
Normality - refers to the tendency of data to fall along a bell-shaped distribution, a feature on which estimation in parametric testing is mathematically based
- Skew - a measure of the symmetry of the frequency distribution. Symmetrical distributions have a skew of 0
- Positive skew - frequent scores are clustered at the lower end of the distribution, and the tail points towards higher or more positive scores
- Kurtosis - measurement of the degree to which scores cluster in the tails of a frequency distribution
- Leptokurtic - kurtosis > 0. Too many scores in the tails (heavy-tailed) and too peaked
- Platykurtic - kurtosis < 0. Too few scores in the tails (thin-tailed) and quite flat
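To make the skew and kurtosis entries concrete, here is a minimal sketch that computes both from the central moments of a data set. It assumes the simple population (uncorrected) formulas; statistical packages often report bias-corrected versions, so their values can differ slightly. The function name `skew_and_kurtosis` is hypothetical.

```python
def skew_and_kurtosis(data):
    """Return (skew, excess kurtosis) using uncorrected moment formulas."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3            # 0 for a normal distribution
    return skew, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]   # scores cluster low, tail points high

print(skew_and_kurtosis(symmetric))      # skew of 0, negative (flat) kurtosis
print(skew_and_kurtosis(right_tailed))   # positive skew
```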
Sphericity - assumes that variances of the differences between data taken from the same participant or entity are equal in repeated-measure ANOVA
- Mauchly's test of sphericity - a formal test of the assumption of sphericity in repeated measures ANOVA, used to assess whether the variances of the differences between all combinations of conditions are equal
- Epsilon ε - an estimate of the departure from sphericity, with a maximum value of 1. Values closer to 1 indicate the assumption is met, while values much less than 1 indicate the assumption is violated
- Greenhouse-Geisser correction - common conservative correction for a violation of sphericity, particularly when epsilon is less than .75.
- Huynh-Feldt correction - more liberal and less common correction for a violation of sphericity in comparison to Greenhouse-Geisser, as it can tend to overestimate epsilon. Researchers may consider using this if epsilon is greater than .75.
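The quantity that sphericity is concerned with, the variance of the difference scores for each pair of conditions, can be computed directly. The sketch below uses made-up scores for three hypothetical conditions (A, B, C) measured on the same four participants; it only inspects the variances and does not implement Mauchly's test or either correction.

```python
from itertools import combinations
from statistics import variance

# Hypothetical repeated-measures data: one list per condition,
# with positions matching the same participant across conditions.
scores = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 5, 7],
    "C": [3, 3, 6, 6],
}

def difference_variances(conditions):
    """Sample variance of the difference scores for every pair of conditions."""
    result = {}
    for (name1, x), (name2, y) in combinations(conditions.items(), 2):
        diffs = [a - b for a, b in zip(x, y)]
        result[(name1, name2)] = variance(diffs)
    return result

for pair, var in difference_variances(scores).items():
    print(pair, round(var, 3))
```

Sphericity holds when these variances are roughly equal; here the B versus C differences vary about twice as much as the other two pairs.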
Homogeneity - the assumption that the variance of one variable is stable (i.e. relatively similar) at all levels of another variable
- Heterogeneity - case in which the variance of one variable differs across levels of another variable
- Levene’s test - tests the assumption of equality of variance (AKA homogeneity of variance, which concerns the spread of the values, i.e. how far values diverge from the mean) by testing whether the variances of two or more groups, e.g. the levels of the independent variable, are equal.
- Heteroscedasticity - residuals of each level of the predictor variables have unequal variances
- Homoscedasticity - an assumption in regression analysis that the residuals at each level of the predictor variable(s) have similar variances. In other words, at each point along a given predictor variable, the spread of residuals should be fairly constant
- Q-Q plot - a graph plotting the quantiles of a variable against the quantiles of a particular distribution. Values falling along the diagonal of the plot demonstrate similar distributions. Values deviating from the diagonal show deviations from the distribution of interest.
Multicollinearity - two or more variables are very closely linearly related
Singularity - perfect correlation between variables, (correlation coefficient of either 1 or -1)
Leverage - gauges the influence of the observed value of the outcome variable over the predicted values
Cook's distance - in least-squares regression, Cook's distance is used to determine if a single data point has an undue influence on the statistical results
Independence - the assumption that one data point does not influence another
Collinearity - used to describe independent variables that are highly correlated, which may negatively impact the validity of the analysis
|Statistical Test||Test Statistic||Effect Size||Number of Variables and Data Type||Assumptions|
|Pearson's correlation||r||r||Two continuous paired observations||Linearity, no significant outliers, bivariate normality|
|Chi-square test of independence||χ2||Phi Φ (2 x 2 tables)||At least two categorical variables||Independence of observations, all cells have expected counts greater than or equal to five|
|Category||Statistical Test||Test Statistic||Effect Size||Number of Variables & Data Type||Assumptions|
|t-tests||Independent t-test||t||Cohen's d||One continuous DV, one categorical IV consisting of two independent groups||Independence of observations, no significant outliers, normality, homogeneity|
|t-tests||Dependent t-test||t||Cohen's d||One continuous DV, one categorical IV consisting of two related groups or matched pairs||No significant outliers, normality|
|Analysis of Variance (ANOVA): inferential statistical procedure utilizing the F-ratio to test the overall fit of a linear model, usually defined in terms of group means||One-way between-subjects ANOVA||F||Eta squared η2, η2p||Categorical, more than two independent groups|| |
|Analysis of Variance (ANOVA)||Two-way between-subjects ANOVA||F|| ||Categorical, more than two independent groups|| |
|Analysis of Variance (ANOVA)||Repeated measures ANOVA||F|| ||Categorical, more than two repeated observations|| |
|Analysis of Variance (ANOVA)||Mixed ANOVA||F|| ||Categorical, more than two independent groups with repeated observations|| |
|Category||Statistical Test||Definition||Test Statistic||Effect Size||Number of Variables & Data Type||Assumptions|
|Regression||Linear regression||Expands upon correlation, used to predict a variable based on another variable.||F||Cohen's f2, R2, β (beta), b||One continuous IV, one continuous DV||Independence of errors (residuals), linearity, homoscedasticity of residuals, no multicollinearity, no significant outliers or cases with high leverage or influence, normal distribution of errors (residuals)|
|Regression||Standard multiple regression||An extension of linear regression used to predict a variable based on two or more variables||F||Cohen's f2, R2, β (beta), b||One continuous DV, two or more continuous or nominal IVs|| |
|Regression||Hierarchical multiple regression|| ||F||Cohen's f2, R2, β (beta), b||One continuous DV, two or more continuous or nominal IVs|| |
|Regression||Binomial logistic regression|| ||F||Cohen's f2, R2, β (beta), b|| || |
|Category||Statistical Test||Test Statistic||Effect Size||Number of Variables & Data Type||Assumptions|
|Association||Spearman's correlation||rs or ρ (rho)|| ||Two continuous and/or ordinal paired observations||Monotonic relationship|
|Differences Between Groups||Mann-Whitney U test||U, z|| ||One continuous or ordinal DV, one categorical IV consisting of two independent groups|| |
|Differences Between Groups||Wilcoxon's signed-rank test||z|| ||One continuous or ordinal DV, one categorical IV consisting of two related groups or matched pairs|| |
Descriptive statistics - used to summarize data rather than make generalizations or inferences about a population - may include percentages, ratios, measures of central tendency such as mean, and measures of variability (spread of the data).
- Central tendency (AKA Measures of Central Tendency) - refers to the center of a frequency distribution of observations as measured by the mean, median, and mode
- Mean (μ) - a simple statistical model of the center of the distribution of scores. A hypothetical estimate of the “typical” score
- Median - the middle score of a set of observations
- Mode - the most frequently occurring score in a set of data
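The three measures of central tendency listed above can be computed with Python's standard `statistics` module. The scores below are made up for illustration; note how a single high score pulls the mean above the median and mode.

```python
import statistics

scores = [1, 2, 2, 3, 7]   # hypothetical data with one high score

mean = statistics.mean(scores)      # sum of scores divided by their count
median = statistics.median(scores)  # middle score once the data are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```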
Degrees of freedom - the number of entities that are allowed to vary when estimating a statistical parameter. Determines the probability distribution for test statistics
Chi-square test of association (AKA Chi-square test of independence) - used to assess the relationship between two or more categorical variables
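As a sketch of how the chi-square statistic is built, the example below compares observed counts in a contingency table with the counts expected if the two variables were independent. The 2 x 2 table of counts and the function name `chi_square` are hypothetical; the phi effect size shown at the end applies to 2 x 2 tables only.

```python
import math

def chi_square(table):
    """Return (chi-square statistic, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were independent
            e = row_totals[i] * col_totals[j] / n
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(expected_row)
    return chi2, expected

observed = [[10, 20],
            [30, 40]]
chi2, expected = chi_square(observed)
n = sum(sum(row) for row in observed)
phi = math.sqrt(chi2 / n)   # effect size for a 2 x 2 table

print(expected)             # every expected count here is at least 5
print(round(chi2, 4), round(phi, 4))
```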
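F-ratio - a test statistic used in analysis of variance (ANOVA) procedures: the ratio of the variability explained by the model to the variability left unexplained (the residual), used to determine whether the differences between group means are statistically significant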
Model sum of squares - a measure of the total amount of variability for which a model can account, derived from the difference between the total sum of squares and the residual sum of squares
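The relationship between the total, residual, and model sums of squares, and how they feed into the F-ratio and eta squared, can be sketched for a one-way between-subjects design. The three groups of scores and the function name `one_way_anova` are made up for illustration; the sketch does not compute a p-value.

```python
def one_way_anova(groups):
    """Sums of squares, F-ratio, and eta squared for independent groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)

    # Total variability: every score against the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Residual variability: every score against its own group mean
    ss_residual = sum(
        (x - sum(group) / len(group)) ** 2 for group in groups for x in group
    )
    # Model variability: what the group means account for
    ss_model = ss_total - ss_residual

    df_model = len(groups) - 1
    df_residual = len(all_scores) - len(groups)
    f_ratio = (ss_model / df_model) / (ss_residual / df_residual)
    eta_squared = ss_model / ss_total
    return {
        "ss_total": ss_total,
        "ss_model": ss_model,
        "ss_residual": ss_residual,
        "f_ratio": f_ratio,
        "eta_squared": eta_squared,
    }

result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in result.items():
    print(name, round(value, 4))
```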
Wilcoxon’s rank-sum test - a non-parametric test to detect differences between two independent samples; the non-parametric equivalent of the independent t-test, serving the same function as the Mann-Whitney U test
Wald statistic - a test statistic with a known probability distribution (a chi-square distribution) used to test whether the b coefficient for a predictor in a logistic regression model is significantly different from zero
Wilcoxon’s signed-rank test - a non-parametric test to detect differences between two dependent samples; the non-parametric equivalent of the dependent t-test
Spearman’s correlation coefficient - a standardized measure of the strength of relationship between two variables that does not rely on the assumptions of a parametric test
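One common way to obtain Spearman's coefficient is to convert each variable to ranks and then compute Pearson's r on the ranks. The sketch below assumes that approach; the helper names and the made-up data (a curved but perfectly monotonic relationship) are illustrative only.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's coefficient computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotonic but not linear

print(round(pearson_r(x, y), 4))     # slightly below 1: the pattern is curved
print(round(spearman_rho(x, y), 4))  # 1.0: the ordering matches exactly
```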
Simple regression - a linear model in which one variable or outcome is predicted from a single predictor variable
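A simple regression with one predictor can be fitted by ordinary least squares in a few lines. The sketch below uses made-up data and a hypothetical function name; it returns the unstandardized coefficient b, the intercept, and the standardized coefficient β (b rescaled by the ratio of the predictor's and the outcome's standard deviations).

```python
from statistics import mean, stdev

def simple_regression(x, y):
    """Least-squares fit of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx                     # change in y per one-unit change in x
    intercept = my - b * mx           # predicted y when x is 0
    beta = b * stdev(x) / stdev(y)    # change in y (in SDs) per one SD of x
    return b, intercept, beta

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, intercept, beta = simple_regression(x, y)

print(round(b, 4), round(intercept, 4), round(beta, 4))

# Residuals: observed values minus the values the model predicts
residuals = [obs - (intercept + b * xi) for xi, obs in zip(x, y)]
print([round(r, 2) for r in residuals])
```

With a single predictor, β equals Pearson's r between the predictor and the outcome.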
Hierarchical multiple regression (AKA sequential multiple regression) -
α (alpha) level - the probability of making a Type I error that the researcher is willing to accept, usually .05
- Bonferroni correction - a correction applied to the α level to reduce the probability of a Type I error when multiple significance tests are carried out. The α level is divided by the number of tests conducted.
- p-value - the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. p < .05 is generally considered "statistically significant"
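The Bonferroni correction described above is a single division. In this minimal sketch the three p-values are made up and the function name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after a Bonferroni correction."""
    return alpha / n_tests

p_values = [0.003, 0.020, 0.040]   # hypothetical results of three tests
corrected = bonferroni_alpha(0.05, len(p_values))

# Each p-value is compared with the corrected, stricter criterion
significant = [p < corrected for p in p_values]

print(round(corrected, 4))
print(significant)
```

All three p-values fall below the uncorrected .05 level, but only the first survives the corrected criterion, which is how the correction trades power for Type I error control.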
Effect Size - an objective measure of the magnitude of the observed effect
- Correlation coefficient (AKA r, R, or Pearson's r) - a standardized measure of the linear relationship between two variables, ranging from -1 to 1. The closer this measure is to zero, the weaker the relationship. Values closer to -1 or 1 represent a stronger negative or positive relationship, respectively. For example, if the correlation between exercise and strength is .78, the variables are positively correlated: when exercise increases, strength tends to go up.
- Cohen’s d - a standardized measure of the difference between means
- Eta squared η2 (AKA coefficient of determination) - an effect size that is the ratio of the model sum of squares to the total sum of squares
- z-score - the value of an observation expressed in standard deviation units
- Cronbach’s α - a measure of the reliability of a scale. The number of items is squared, multiplied by the average covariance between items, then divided by the sum of all elements in the variance-covariance matrix.
- β (beta) - standardized regression coefficient. Indicates the strength of relationship between a given predictor and an outcome in standardized form. It is the change in outcome associated with a one standard deviation change in the predictor
- b - unstandardized regression coefficient, indicates the strength of the relationship between a given predictor and an outcome, in the units of original measurement
- Nagelkerke’s R2N - a version of the coefficient of determination for logistic regression
- Partial eta squared η2p - the proportion of variance the variable explains, when excluding other variables in the analysis
- Pearson’s r (AKA correlation coefficient) - a standardized measure of the strength of relationship between two variables, ranging from -1 to 1.
- Phi Φ - a measure of association between two categorical variables, used with 2 x 2 contingency tables, a variant of the chi-square test
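Two of the effect sizes above lend themselves to short worked examples. The sketch below computes Cohen's d using a pooled standard deviation (one common convention, assumed here) and Cronbach's α following the verbal formula given in its entry. All data and function names are hypothetical.

```python
from statistics import mean, variance

def cohens_d(group1, group2):
    """Difference between two group means in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

def covariance(x, y):
    """Sample covariance of two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def cronbach_alpha(items):
    """items: one list of scores per scale item, same participants in each."""
    k = len(items)
    # Variance-covariance matrix: variances on the diagonal
    cov = [[covariance(a, b) for b in items] for a in items]
    off_diagonal = [cov[i][j] for i in range(k) for j in range(k) if i != j]
    mean_cov = sum(off_diagonal) / len(off_diagonal)
    total = sum(sum(row) for row in cov)
    return k ** 2 * mean_cov / total

print(cohens_d([2, 4, 6], [1, 3, 5]))

scale = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]  # three items, four people
print(round(cronbach_alpha(scale), 4))
```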
Confidence interval - for a given statistic calculated for a sample of observations (e.g. the mean), the confidence interval is a range of values around that statistic that is believed to contain, with a certain probability (e.g. 95%), the true value of that statistic
Variance - an estimate of the average variability (spread) of a set of data
- Standard deviation (σ) - an estimate of the average variability (spread) of a set of data measured in the same units of measurement of the original data, derived from the square root of the variance
- Standard error (SE, AKA standard error of the mean) - the standard deviation of the sampling distribution of a statistic. For a given statistic (e.g. the mean), it tells how much variability there is in the statistic across samples from the same population. Large values indicate that a statistic from a given sample may not be an accurate reflection of the population
- Standard error of differences - a measure of the variability of differences between sample means
- Standardized residuals (AKA studentized residuals) - the unstandardized residuals divided by an estimate of their standard deviation that varies point by point
- Residual - difference between the value the model predicts and the value observed in the data on which the model is based
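The measures of spread above build on one another, which a short calculation makes visible. This sketch uses made-up scores and a hypothetical function name, and it assumes a normal approximation for the 95% confidence interval of the mean; with a sample this small, a critical value from the t distribution would usually be preferred and would give a wider interval.

```python
from statistics import NormalDist, mean, variance

def describe_spread(data, confidence=0.95):
    """Variance, SD, standard error, and a normal-approximation CI for the mean."""
    n = len(data)
    m = mean(data)
    var = variance(data)               # sample variance (divides by n - 1)
    sd = var ** 0.5                    # same units as the original data
    se = sd / n ** 0.5                 # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return {
        "mean": m,
        "variance": var,
        "sd": sd,
        "se": se,
        "ci": (m - z * se, m + z * se),
    }

summary = describe_spread([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(name, value)
```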
Bar chart - a graph in which a summary statistic is plotted on the y-axis against a categorical variable on the x-axis
- Error bar - a graphical representation of the mean of a set of observations including the 95% confidence interval of the mean
Histogram - a graph of a frequency distribution. Differs from a bar chart in that the bars are touching
Boxplots - (AKA box-whisker diagram) - a graphical representation of some important characteristics of a set of observations. The center of the plot contains the median, surrounded by a box.
- Interquartile range - the top and bottom of the box represent the limits between which the middle 50% of observations fall (the interquartile range)
- Whiskers - Two lines extending from the top and the bottom of the plot, displaying the most and least extreme scores
Scatterplot - a graph that plots values of one variable against the corresponding value of another
- Regression line - a line on a scatterplot representing the regression model of the relationship between variables plotted