Stats for Alzheimer’s Disease Analysis

What is R? The R-value, otherwise known as Pearson’s correlation coefficient, measures the strength and direction of the linear relationship between two sets of data. It is calculated by dividing the covariance of the two datasets by the product of their standard deviations. The resulting value is always between -1 and 1, where positive values indicate a positive correlation between the datasets and negative values indicate a negative correlation. The closer the value is to 1 or -1, the stronger the correlation [Downey 2014].
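
As a rough illustration of the definition above, the following sketch computes the R-value directly from the covariance and standard deviations. The function name pearson_r and the sample numbers are made up for this example and are not study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Hypothetical, nearly linear data chosen only to illustrate the calculation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

Because the made-up points lie almost on a straight line with positive slope, r comes out close to 1.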

What is R2? R2 is the coefficient of determination, which shows the proportion of variation in a dependent variable that is predictable from the independent variable. Its formula is 1 - (variance of the differences between true and predicted values) / (variance of the true values around their mean). R2 is often used instead of the R-value to evaluate the strength of a relationship because it directly gives the percentage of variance in the data that is explained by the relationship [Downey 2014].
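
The formula can be sketched in a few lines. This version uses sums of squares rather than variances; the two forms agree whenever the prediction errors average to zero, as they do for a least-squares line with an intercept. The function name r_squared and the numbers are invented for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_true = sum(y_true) / len(y_true)
    # Squared differences between true and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared differences between true values and their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical true and predicted values, not study data.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
r2 = r_squared(y_true, y_pred)
print(f"R2 = {r2:.2f}")  # residual SS = 1, total SS = 20, so R2 = 0.95
```

Here the predictions explain 95% of the variance in the made-up true values.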

What is a p-value? A p-value is the probability of seeing an apparent effect at least as large as the one observed, assuming the effect is not real. In hypothesis testing, the assumption of no real effect is the null hypothesis. If the p-value is below a chosen cutoff, say 5%, then data this extreme would be unlikely if the null hypothesis were true, which is taken as evidence against the null hypothesis. Note that the p-value is not the probability that the null hypothesis itself is true. The exact cutoff depends on the field and application, but a common choice is 0.05, or 5%.
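
One way to make this definition concrete is a permutation test, sketched below under stated assumptions: shuffling one variable breaks any real relationship, so the fraction of shuffles that produce a correlation at least as strong as the observed one estimates the p-value. This is an illustrative technique, not necessarily the test used in any particular study; the function names, the add-one correction, and the data are assumptions made for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations (same result as covariance / (std_x * std_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_iter=2000, seed=0):
    """Two-sided p-value for r under the null hypothesis of no association."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # simulate the null: pairings are random
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)

# Hypothetical, strongly correlated data; not study data.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.9, 16.1, 18.3, 19.6]
p = permutation_p_value(x, y)
print(f"p = {p:.4f}")
```

For data this strongly correlated, almost no shuffle matches the observed correlation, so the estimated p-value falls well below the 0.05 cutoff.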

What R-value is a strong correlation of CSF nicotinamide vs change in ADAS-Cog? Values close to 1 or -1 indicate a strong correlation, but a better measure may be R2, which shows how much of the variation in ADAS-Cog scores can be predicted from CSF nicotinamide concentration [Downey 2014]. For medical machine learning applications, an R2 of 99% or higher would be preferred, but a value as low as 95% may still be sufficient to show a strong association between the two (Dr. Kruggel told me this once; sources online vary, with some treating values as low as 70% as good).

How do you compare 2 correlations of drug and placebo vs ADAS-Cog? The sampling distribution of a correlation coefficient is not normal, so to compare two coefficients they must first be transformed with the Fisher Z-transformation and then checked for a statistically significant difference with a z-test. This process indicates whether the correlations with ADAS-Cog results differ significantly between the drug and placebo groups and, if so, which group shows the stronger correlation. The Z-transformed values can also be used to obtain a 95% confidence interval [Medcalc / Hinkle 1988].
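
The procedure above can be sketched as follows, assuming the drug and placebo groups are independent samples (different patients) with more than three patients each. The function names, the correlations, and the group sizes are hypothetical values chosen for illustration, and the confidence interval shown is for a single correlation.

```python
import math

def fisher_z(r):
    """Fisher Z-transformation of a correlation coefficient."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """z statistic and two-sided p-value for the difference of two independent correlations."""
    # Standard error of the difference between the two Z-transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def correlation_ci95(r, n):
    """Approximate 95% confidence interval for one correlation, via the Z scale."""
    half_width = 1.96 / math.sqrt(n - 3)
    center = fisher_z(r)
    # Transform the interval back to the correlation scale.
    return math.tanh(center - half_width), math.tanh(center + half_width)

# Hypothetical values: correlation with change in ADAS-Cog in each group.
r_drug, n_drug = 0.60, 50
r_placebo, n_placebo = 0.20, 50

z, p = compare_correlations(r_drug, n_drug, r_placebo, n_placebo)
low, high = correlation_ci95(r_drug, n_drug)
print(f"z = {z:.3f}, p = {p:.4f}")
print(f"95% CI for drug-group r: ({low:.3f}, {high:.3f})")
```

With these made-up inputs the difference is significant at the 0.05 level, and the sign of z shows which group has the larger correlation.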

[1] A. B. Downey, Think Stats. Sebastopol: O’Reilly Media, 2015.
[2] F. Schoonjans, “Comparison of correlation coefficients,” MedCalc, https://www.medcalc.org/manual/comparison-of-correlation-coefficients.php (accessed Mar. 21, 2024).