Spearman Rank Correlation Analysis using Agri Analyze

Summary

This blog explains the concept of correlation as a measure of the strength and direction of the relationship between two variables. It differentiates between parametric correlation (Pearson’s correlation) and non-parametric correlation methods such as Spearman’s rank correlation and Kendall’s tau. The article highlights the importance of data distribution assumptions, emphasizing when rank correlation should be used instead of linear correlation. A detailed explanation of Spearman’s rank correlation, including its ranking procedure and formula, is provided. The blog also covers testing the significance of the correlation coefficient using a t-test. A solved numerical example is included for clarity. Finally, it demonstrates a step-by-step guide to performing Spearman rank correlation using the Agri Analyze tool, along with interpretation of automated outputs such as heatmaps, correlation matrices, and smart insights.

1. Introduction

Correlation is a statistical measure that quantifies the strength and direction of the relationship between two variables. For example, it can be used to assess whether there is a connection between the heights of fathers and their sons.

There are two primary types of correlation analysis:

  • Parametric Correlation: This method, often using Pearson's correlation coefficient (r), measures the linear relationship between numerical variables. It assumes a specific distribution of the data.
  • Non-Parametric Correlation: Employing techniques like Kendall's tau or Spearman's rho, these methods analyze the relationship between variables based on their ranks rather than their actual values. They are suitable for ordinal (rank) data, or for numerical data converted to ranks, and do not require assumptions about data distribution.

One of the key assumptions in correlation analysis is that both variables being studied are normally distributed. If at least one of the variables follows a normal distribution, linear correlation can still be used. However, if neither variable is normally distributed, the linear correlation method is not appropriate. In such situations, rank correlation should be utilized instead.

There are two distinct methods for computing rank correlation: Spearman's rank correlation and Kendall's tau. Both methods can be applied to the same dataset. Numerically, Spearman's rank correlation typically yields higher values than Kendall's tau. However, both methods generally produce nearly identical inferences, so there is no compelling reason to favor one over the other. Spearman's rank correlation is more widely used due to its computational simplicity.
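The numerical difference between the two coefficients can be illustrated with a minimal sketch. It uses the five data pairs from the solved example later in this post, computes Spearman's coefficient with the d² formula given below, and computes Kendall's tau in its simplest form (tau-a, which assumes no ties). The function names are illustrative choices, not part of any tool described here.

```python
from itertools import combinations

def ranks_no_ties(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_no_ties(x), ranks_no_ties(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [10, 20, 30, 40, 50]
y = [20, 25, 15, 35, 30]
rs = spearman_rs(x, y)
tau = kendall_tau(x, y)
print(f"Spearman r_s = {rs:.3f}, Kendall tau = {tau:.3f}")
```

For these five pairs, 7 pairs of observations are concordant and 3 are discordant, so tau comes out at 0.4 against a Spearman value of 0.6.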

Spearman's rank correlation is sometimes called Spearman's rho. To avoid confusion with the population correlation coefficient \( \rho \), the notation \( r_s \) is used for Spearman's correlation coefficient. The procedure starts by ranking the measurements of the variables X and Y separately. The difference between the two ranks in each of the n pairs, denoted by d, is then computed. Spearman's rank correlation is then obtained from the formula:

\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]

In case of tied ranks, it is given by

\[ r_s = \frac{SS(X) + SS(Y) - \sum d^2}{2\sqrt{SS(X)\,SS(Y)}} \]

Here, SS(X) is the corrected sum of squares of the ranks of X, adjusted for ties; SS(Y) is defined in the same way for Y.

\[ SS(X) = \frac{n^3 - n}{12} - \sum \frac{(t^3 - t)}{12} \]

Here, t is the number of observations tied at a given rank for variable X, and the sum runs over all groups of tied ranks.
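The tie-corrected formula can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: tied values are given the mean of the ranks they occupy (the usual mid-rank convention), and the function names and the small data set are hypothetical.

```python
from collections import Counter
from math import sqrt

def average_ranks(values):
    """Rank from 1..n; tied values share the mean of the ranks they span."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def corrected_ss(values):
    """SS = (n^3 - n)/12 - sum((t^3 - t)/12), summed over groups of tied values."""
    n = len(values)
    tie_term = sum((t ** 3 - t) / 12 for t in Counter(values).values())
    return (n ** 3 - n) / 12 - tie_term

def spearman_with_ties(x, y):
    """r_s = (SS(X) + SS(Y) - sum(d^2)) / (2 * sqrt(SS(X) * SS(Y)))."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    ss_x, ss_y = corrected_ss(x), corrected_ss(y)
    return (ss_x + ss_y - sum_d2) / (2 * sqrt(ss_x * ss_y))

# Hypothetical data with ties in both variables.
x = [12, 15, 15, 18, 20]
y = [3, 5, 4, 6, 6]
print("ranks of X:", average_ranks(x))
print("ranks of Y:", average_ranks(y))
print(f"tie-corrected r_s = {spearman_with_ties(x, y):.4f}")
```

When no values are tied, every t equals 1, the tie term vanishes, and the result agrees with the simpler formula based on \( \sum d^2 \) alone.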

Testing the Significance of the Correlation Coefficient: A Step-by-Step Guide

To test the significance of the correlation coefficient, we perform a hypothesis test to determine whether the observed correlation differs significantly from zero. The steps for testing the significance of the correlation coefficient r are as follows:

  1. Formulate the Hypotheses:
    • Null Hypothesis (H0): \( \rho = 0 \) (There is no linear relationship between the variables)
    • Alternative Hypothesis (H1): \( \rho \neq 0 \) (There is a linear relationship between the variables)
  2. Calculate the Test Statistic:

    The test statistic for the correlation coefficient is given by:

    \[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \]

    Where,

    • r is the sample correlation coefficient
    • n is the number of data points (sample size)
  3. Determine the Degrees of Freedom: The degrees of freedom (df) for this test is \( n - 2 \).
  4. Find the Critical Value:

    Use a t-distribution table or statistical software to find the critical value of t for a given significance level (α) and n − 2 degrees of freedom. Common significance levels are 0.05, 0.01 and 0.10.

  5. Compare the Test Statistic to the Critical Value:
    • If \( |t| \) is greater than the critical value, reject the null hypothesis (H0). This means the correlation is statistically significant.
    • If \( |t| \) is less than or equal to the critical value, do not reject the null hypothesis. This means there is not enough evidence to conclude that the correlation is significant.
  6. Interpret the Results:

    Based on the comparison, draw conclusions about the significance of the correlation coefficient.
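The six steps above can be sketched as a small Python function. The values r = 0.80 and n = 12 are hypothetical, and the critical value is supplied by the reader from a t-table (2.228 is the two-tailed 5% value for 10 degrees of freedom), because the Python standard library has no t-distribution. The function name is an illustrative choice.

```python
from math import sqrt

def correlation_t_test(r, n, t_critical):
    """Test H0: rho = 0 with t = r*sqrt(n - 2)/sqrt(1 - r^2).

    Returns (t, df, reject_h0), where reject_h0 is True when |t| > t_critical.
    """
    if n < 3:
        raise ValueError("at least 3 pairs are needed (df = n - 2 must be positive)")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df, abs(t) > t_critical

# Hypothetical correlation of 0.80 observed on 12 pairs.
t, df, reject = correlation_t_test(0.80, 12, t_critical=2.228)
print(f"t = {t:.3f}, df = {df}, reject H0: {reject}")
```

Passing the critical value as an argument keeps the sketch dependency-free; with a statistics package available, the critical value or a p-value could be computed directly instead.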

Solved example of Spearman Rank Correlation

Problem statement: There are two variables, X and Y, each with 5 observations. Compute the Spearman rank correlation and test its significance using a t-test. The data are: X: 10, 20, 30, 40, 50 and Y: 20, 25, 15, 35, 30

Spearman Rank Correlation Coefficient

The Spearman rank correlation coefficient measures the strength and direction of the monotonic relationship between two variables. It is calculated using the formula:

\[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]

1. Rank the X and Y values separately, then compute the rank differences \( d = \text{rank}(X) - \text{rank}(Y) \) and their squares:

X     Rank of X    Y     Rank of Y    d     d²
10    1            20    2            -1    1
20    2            25    3            -1    1
30    3            15    1             2    4
40    4            35    5            -1    1
50    5            30    4             1    1

2. Sum the squared differences:

\[ \sum d^2 = 1 + 1 + 4 + 1 + 1 = 8 \]

3. Calculate Spearman's rank correlation coefficient:

\[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 0.6 \]

In this example, both Pearson and Spearman rank correlation coefficients are 0.6, indicating a moderate positive relationship between the variables.
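The figures in this example can be checked with a short sketch that computes the ranks, the rank differences, and both coefficients. The helper names are illustrative, and the ranking helper assumes distinct values, which holds for this data set.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n for distinct values (no ties in this example)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

x = [10, 20, 30, 40, 50]
y = [20, 25, 15, 35, 30]

rank_x, rank_y = ranks(x), ranks(y)
d = [a - b for a, b in zip(rank_x, rank_y)]
sum_d2 = sum(v ** 2 for v in d)
n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("rank differences d:", d, "| sum of d^2:", sum_d2)
print(f"Spearman r_s       = {r_s:.3f}")
print(f"Pearson r (raw)    = {pearson_r(x, y):.3f}")
print(f"Pearson r on ranks = {pearson_r(rank_x, rank_y):.3f}")
```

The two coefficients coincide here because the Y values are equally spaced, so replacing them by their ranks is a linear transformation. In general, Spearman's coefficient equals Pearson's coefficient computed on the ranks, not on the raw values.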

Steps for Testing Significance

1. State the Hypotheses:

  • Null Hypothesis (H0): ρ = 0 (There is no monotonic relationship between the variables)
  • Alternative Hypothesis (H1): ρ ≠ 0 (There is a monotonic relationship between the variables)

2. Calculate the Test Statistic:

The test statistic for the correlation coefficient is calculated using the formula:

\[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]

where r is the correlation coefficient being tested (here, the Spearman rank correlation \( r_s = 0.6 \)) and n is the number of pairs.

For our data:

\[ t = \frac{0.6\sqrt{5 - 2}}{\sqrt{1 - 0.6^2}} = 1.299 \]

3. Compute Degrees of Freedom and get critical value:

Degrees of freedom \( df = n - 2 = 5 - 2 = 3 \)

Using a t-distribution table, find the critical t-value for a two-tailed test at a chosen significance level (typically \( \alpha = 0.05 \)) with 3 degrees of freedom.

The critical value is approximately \( \pm 3.182 \).

4. Conclusion: Since \( |t| = 1.299 \) is less than the critical value of 3.182, we do not reject the null hypothesis. There is insufficient evidence to conclude that there is a significant monotonic relationship between X and Y at the 0.05 significance level.

5. Summary: Based on the test of significance, the Spearman rank correlation coefficient of 0.6 is not statistically significant at the 0.05 significance level. Therefore, we cannot conclude that there is a significant monotonic relationship between the variables X and Y for this dataset.

Steps to perform Spearman Rank Correlation in Agri Analyze:

The dataset consists of 5 variables, each with 18 observations. A snippet of the dataset is shared below.

Step 1: Create a CSV file with one column for each of the 5 variables.

Variable1 Variable2 Variable3 Variable4 Variable5
12 14 6 11 20
11 13 6 10 18
14 17 8 13 23
13 16 7 12 22
15 18 8 14 25
16 19 9 15 27
12 14 6 11 20
17 20 9 16 28
18 22 10 16 30
10 12 5 9 17
12 14 6 11 20
12 14 6 11 20
14 17 8 13 23
34 41 18 31 56
23 28 12 21 38
26 31 14 24 43
28 34 15 26 46

Step 2: Click on ANALYTICAL TOOL -> CORRELATION AND REGRESSION ANALYSIS -> SPEARMAN'S RANK CORRELATION

Step 3: Open the link https://www.agrianalyze.com/SpearmanRankCorrelation.aspx (free registration is mandatory for first-time users)

Step 4: Download the sample file using the link: Sample File Download


Step 5: Click submit, pay a nominal fee, and download the output report with detailed interpretation.

Output Report: Link of the output report


Video Tutorial: Link of the Youtube Tutorial
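For readers who want to cross-check the correlation matrix outside the tool, the sketch below computes pairwise Spearman correlations for the rows shown in Step 1. It is not Agri Analyze's implementation. It assumes tied values receive mid-ranks and computes Spearman's coefficient as Pearson's coefficient on those ranks, which is equivalent to the tie-corrected formula given earlier. Only the 17 rows visible in this post are included, so the values may differ slightly from a report generated on the full dataset.

```python
import csv
import io
from math import sqrt

# The rows shown in Step 1, typed in as CSV text.
CSV_TEXT = """Variable1,Variable2,Variable3,Variable4,Variable5
12,14,6,11,20
11,13,6,10,18
14,17,8,13,23
13,16,7,12,22
15,18,8,14,25
16,19,9,15,27
12,14,6,11,20
17,20,9,16,28
18,22,10,16,30
10,12,5,9,17
12,14,6,11,20
12,14,6,11,20
14,17,8,13,23
34,41,18,31,56
23,28,12,21,38
26,31,14,24,43
28,34,15,26,46
"""

def average_ranks(values):
    """Rank from 1..n; tied values share the mean of the ranks they span."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
names = list(rows[0].keys())
columns = {name: [float(row[name]) for row in rows] for name in names}
ranked = {name: average_ranks(col) for name, col in columns.items()}

# Spearman matrix: Pearson's r applied to the ranked columns.
matrix = {a: {b: pearson_r(ranked[a], ranked[b]) for b in names} for a in names}

print(" " * 10 + "".join(f"{name:>11}" for name in names))
for a in names:
    print(f"{a:<10}" + "".join(f"{matrix[a][b]:>11.3f}" for b in names))
```

To run the same check on a file saved to disk, the `io.StringIO(CSV_TEXT)` argument can be replaced with an opened file handle.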

The blog is written by:
Darshan Kothiya, PhD Scholar, Department of Agricultural Statistics, BACA, AAU, Anand