Which Correlation Coefficient Should Be Used for Investigating Relations between Quantitative Variables?

Ebru Temizhan; Hamit Mirtagioglu; Mehmet Mendes

Authors

Ebru Temizhan Canakkale Onsekiz Mart University, Agriculture Faculty, Biometry and Genetics Unit, 17100, Canakkale, Turkey
Hamit Mirtagioglu Bitlis Eren University, Faculty of Arts and Sciences, Department of Statistics, Bitlis, Turkey
Mehmet Mendes Canakkale Onsekiz Mart University, Agriculture Faculty, Biometry and Genetics Unit, 17100, Canakkale, Turkey

Keywords:

correlation coefficient, type I error, test power, simulation, robust methods

Abstract

Because many studies aim to describe and summarize the relations between two or more variables, correlation analysis has become one of the most fundamental statistical tools for many researchers. Different correlation coefficients have been developed and proposed for different cases. It is therefore extremely important to know which correlation coefficient(s) is most appropriate given the measurement level and type of the variables, their distributions, the type of relation between them, and whether the dataset contains outliers. In this study, nine correlation coefficients were compared in terms of Type I error rate and test power under different experimental conditions, which made it possible to indicate which coefficient is more appropriate in which situations. The results of this simulation study showed that the performance of these coefficients is affected by sample size and effect size rather than by distribution shape. When the Type I error and test power estimates are evaluated together, the Pearson, Winsorized, Spearman rank, and Kendall tau correlation coefficients seem to be the most appropriate for many experimental conditions.
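To make the two evaluation criteria concrete, the following is a minimal sketch of how a Type I error rate and a test power can be estimated by simulation for two of the coefficients named above (Pearson and Spearman rank). It is not the authors' simulation code. The bivariate normal data, the sample size of 30, the effect size of 0.5, the 2000 replications, and the Fisher z significance test (with the commonly used 1.06 variance factor for Spearman) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def fisher_z_pvalue(r, n, scale=1.0):
    """Two-sided p-value from the Fisher z approximation (an assumption here).

    scale=1.0 is the usual choice for Pearson; scale=1.06 is a common
    approximation for the variance of the transformed Spearman coefficient.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt((n - 3) / scale)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(rho, n, reps=2000, alpha=0.05, seed=1):
    """Share of simulated samples in which H0: no correlation is rejected.

    With rho == 0 this estimates the Type I error rate;
    with rho != 0 it estimates the test power.
    """
    rng = random.Random(seed)
    rejections = {"pearson": 0, "spearman": 0}
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has population correlation rho with x (bivariate normal case)
        y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
        if fisher_z_pvalue(pearson_r(x, y), n) < alpha:
            rejections["pearson"] += 1
        if fisher_z_pvalue(spearman_r(x, y), n, scale=1.06) < alpha:
            rejections["spearman"] += 1
    return {name: count / reps for name, count in rejections.items()}


type1 = rejection_rates(rho=0.0, n=30)  # should be close to alpha
power = rejection_rates(rho=0.5, n=30)  # should be well above alpha
print("Type I error:", type1)
print("Power:", power)
```

Repeating the same loop over other sample sizes, effect sizes, and non-normal distributions would give the kind of grid of experimental conditions the abstract describes; a coefficient is attractive when its estimated Type I error stays near the nominal level while its power is high.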

