Summary
Path analysis is the statistical procedure of inferring direct and indirect relationships among variables.
In extension and survey research, the method helps a researcher establish how various factors influence a response of interest, such as farmers’ adoption of technologies, productivity, income or attitudes.
By breaking correlations down into direct and indirect effects, path analysis offers deeper insight than simple correlation or regression analysis.
This blog gives a basic explanation of the concept, the methodology, the interpretation of results through a solved example, and the steps for running the analysis in Agri Analyze, all in simple language.
1. Introduction
Path coefficient analysis was developed by Sewall Wright (1921). It is a multivariate technique that deals with a closed system of linearly related variables. Path analysis gives the direct effect of each independent variable on the dependent variable as well as its indirect effects through the other variables in the system.
Path analysis is analogous to the analysis of variance and may be called the analysis of correlation coefficients (Li, 1956).
Path coefficients are standardized partial regression coefficients and therefore have no units.
Most of the events in agricultural extension and social science research are influenced by many factors. For instance, farmers' adoption of improved technology may depend on education, income, landholding, extension contact, training and information sources.
Traditional methods such as correlation and regression show that these factors are related to the outcome, but they do not show how the variables exert their influence through one another.
Path analysis enables researchers to understand:
- Which variables have a direct effect on the dependent variable?
- Which variables affect the dependent variable indirectly through intervening variables?
- How strong are these effects?
Therefore, path analysis is used extensively in extension research, sociology, economics and biometrical genetics.
How Is Path Analysis Different from Regression?
In multiple regression:
\[
Y = b_1X_1 + b_2X_2 + b_3X_3 + e
\]
Where:
\(b_i\) is the partial regression coefficient of \(X_i\), i.e. its direct effect on \(Y\).
The effect is measured by holding the other variables constant.
It tells how much \(Y\) changes per unit change in \(X_i\).
Regression does not tell:
• Through which variable the effect is coming
• How much of the correlation is indirect
What Path Analysis Adds
• Uses standardized regression coefficients.
• Decomposes correlation into:
– Direct effects
– Indirect effects through other variables
\[
r_{y x_1} = \text{Direct effect} + \text{Indirect via } X_2 + \text{Indirect via } X_3
\]
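To see why standardized regression coefficients are the building block here, the sketch below fits a regression on z-scored variables and compares it with the solution obtained from correlations alone. The data are synthetic and every name in it (`rng`, `X`, `y`, `zscore`, `beta_std`, `path_coef`) is an assumption made for illustration; it also assumes NumPy is installed.

```python
import numpy as np

# Synthetic, purely illustrative data: three correlated predictors and a response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
X[:, 1] += 0.6 * X[:, 0]          # make X2 correlated with X1
X[:, 2] += 0.4 * X[:, 0]          # make X3 correlated with X1
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(size=n)

def zscore(v):
    """Centre each column and scale it to unit standard deviation."""
    return (v - v.mean(axis=0)) / v.std(axis=0)

# Route 1: least-squares regression on standardized variables (no intercept needed).
beta_std, *_ = np.linalg.lstsq(zscore(X), zscore(y), rcond=None)

# Route 2: solve R_xx @ p = r_xy using only the correlations.
R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
R_xx, r_xy = R[:3, :3], R[:3, 3]
path_coef = np.linalg.solve(R_xx, r_xy)

print(np.round(beta_std, 4))
print(np.round(path_coef, 4))
```

Both routes should print the same three numbers, which is why a path analysis can be carried out from a correlation table without access to the raw data.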
Popular Use Cases for Extension and Survey Research
Some of the areas where path analysis is used include:
(a) Agricultural Extension
Factors that affect the adoption of agricultural technology
Determinants of crop productivity
Impact of extension services on farmer income
Role of education and training in technology adoption
(b) Survey Research
Determinants of attitudes, perceptions and behavior
Influence of education, income and media exposure on awareness
Factors affecting participation in development programs
(c) Biometrical and Genetic Studies
Contribution of traits to crop yield
Direct and indirect effects of yield components
Selection criteria in breeding programs
(d) Social Sciences
Factors affecting poverty, employment and development
Relationships between psychological and behavioral factors
Methodology of Path Analysis
Step 1: Identification of Variables
Dependent Variable (\(Y\)): Outcome or Study variable
Example: Adoption level of farmers, Grain yield (q/ha) etc.
Independent Variables (\(X_1, X_2, X_3\)): Factors affecting Y
Example: Education, income, landholding, extension contact, training etc.
Step 2: Conceptual Framework (Path Diagram)
A path diagram is drawn to show relationships among variables.
Example:
- Education (X₁)
- Income (X₂)
- Extension contact (X₃)
- Adoption (Y)
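Single-headed arrows indicate the assumed causal paths from each independent variable to Y, and double-headed arrows indicate correlations among the independent variables.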
Step 3: Calculation of Correlation Coefficients
Compute correlation coefficients among all variables:
\[
r_{x_1 y},\; r_{x_2 y},\; r_{x_3 y}
\]
\[
r_{x_1 x_2},\; r_{x_1 x_3},\; r_{x_2 x_3}
\]
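As a rough sketch of this step, the correlations can be computed in a few lines of Python. The numbers are the first eight rows of three columns (adoption score, age, experience) from the sample file shown later in this post, used only to show the mechanics; the variable names and the choice of NumPy are assumptions.

```python
import numpy as np

# First eight rows of three columns from the sample file shown later in this post
# (adoption score, age, experience); used here only to show the mechanics.
score = [75, 71, 55, 79, 76, 80, 85, 81]
age = [26, 28, 21, 35, 33, 38, 40, 44]
experience = [4, 6, 2, 10, 8, 11, 15, 16]

data = np.column_stack([age, experience, score])   # predictors first, Y last
R = np.corrcoef(data, rowvar=False)                # full correlation matrix

r_xx = R[:-1, :-1]   # correlations among the independent variables
r_xy = R[:-1, -1]    # correlation of each independent variable with Y

print(np.round(r_xx, 3))
print(np.round(r_xy, 3))
```

Keeping the dependent variable in the last column makes it easy to slice the matrix into the two pieces the next step needs.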
Step 4: Estimation of Path Coefficients (Direct and Indirect Effects)
Path coefficients are standardized regression coefficients obtained by solving the following system of equations:
\[
\begin{bmatrix}
r_{x_1y} \\
r_{x_2y} \\
r_{x_3y}
\end{bmatrix}
=
\begin{bmatrix}
r_{x_1x_1} & r_{x_1x_2} & r_{x_1x_3} \\
r_{x_2x_1} & r_{x_2x_2} & r_{x_2x_3} \\
r_{x_3x_1} & r_{x_3x_2} & r_{x_3x_3}
\end{bmatrix}
\begin{bmatrix}
a \\
b \\
c
\end{bmatrix}
\]
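This can be written as
\[
\mathbf{A} = \mathbf{B}\mathbf{C}, \quad \mathbf{C} = \mathbf{B}^{-1}\mathbf{A}
\]
where \(\mathbf{A}\) is the vector of correlations of the independent variables with \(Y\), \(\mathbf{B}\) is the matrix of correlations among the independent variables, and \(\mathbf{C}\) is the vector containing the direct effects \(a\), \(b\) and \(c\).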
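Written out row by row, with \(r_{x_i x_i} = 1\) on the diagonal, the matrix equation is the following set of simultaneous equations. This is only a restatement of the system above, not an extra assumption:
\[
\begin{aligned}
r_{x_1 y} &= a + r_{x_1 x_2}\, b + r_{x_1 x_3}\, c \\
r_{x_2 y} &= r_{x_2 x_1}\, a + b + r_{x_2 x_3}\, c \\
r_{x_3 y} &= r_{x_3 x_1}\, a + r_{x_3 x_2}\, b + c
\end{aligned}
\]
Each right-hand side is one direct effect plus two indirect effects, so solving the system and decomposing the correlations are the same task.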
\[
\text{Indirect Effect of } X_1 \text{ via } X_2
=
b \;(\text{direct effect of } X_2 \text{ on } Y)
\times r_{x_1 x_2}
\]
The remaining indirect effects are computed with the same formula.
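The direct and indirect effects can be collected in one matrix. The sketch below is one possible way to do that, assuming NumPy is available; the function name `effects_matrix` and the two-predictor correlations are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def effects_matrix(r_xx, r_xy):
    """Return a matrix whose diagonal holds the direct effects and whose
    off-diagonal cell (i, j) holds the indirect effect of X_i via X_j."""
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    direct = np.linalg.solve(r_xx, r_xy)   # path coefficients
    # Cell (i, j) = r(x_i, x_j) * direct effect of X_j.
    # Because r(x_i, x_i) = 1, the diagonal is the direct effect itself.
    return r_xx * direct[np.newaxis, :]

# Hypothetical two-predictor system (values chosen only for illustration).
r_xx = [[1.0, 0.5],
        [0.5, 1.0]]
r_xy = [0.7, 0.6]

E = effects_matrix(r_xx, r_xy)
print(np.round(E, 4))              # direct effects on the diagonal
print(np.round(E.sum(axis=1), 4))  # each row sums back to r(x_i, y)
```

Laying the effects out this way gives a built-in check: if a row does not sum to the correlation of that variable with Y, something went wrong in the solve.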
Step 5: Residual Effect
Residual effect shows the influence of variables not included in the study:
\[
\text{Residual Effect}
=
\sqrt{
1
- a^2
- b^2
- c^2
- 2 r_{x_1 x_2} ab
- 2 r_{x_1 x_3} ac
- 2 r_{x_2 x_3} bc
}
\]
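Because \(a\), \(b\) and \(c\) satisfy the simultaneous equations of Step 4, the explained part of the variation can also be written in a shorter form, which is handy for checking hand calculations. The symbol \(R^2\) for the explained part follows the solved example later in this post, where the residual effect is taken as a square root:
\[
R^2
= a^2 + b^2 + c^2 + 2 r_{x_1 x_2} ab + 2 r_{x_1 x_3} ac + 2 r_{x_2 x_3} bc
= a\, r_{x_1 y} + b\, r_{x_2 y} + c\, r_{x_3 y}
\]
\[
\text{Residual Effect} = \sqrt{1 - R^2}
\]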
Step 6: Interpretation of Results
Identify:
- Variables with highest direct effects
- Variables with strong indirect effects
- Key determinants of the dependent variable
Results and How to Interpret Them
Example of Interpretation
| Variable | Direct Effect | Indirect Effect | Total Effect |
| --- | --- | --- | --- |
| Education (X1) | 0.45 | 0.20 | 0.65 |
| Income (X2) | 0.30 | 0.15 | 0.45 |
| Extension Contact (X3) | 0.55 | 0.10 | 0.65 |
Interpretation:
- Extension contact has the highest direct effect (0.55) on adoption, indicating it is the most influential factor.
- Education has a strong indirect effect, meaning it influences adoption through income and extension contact.
- Income has moderate direct and indirect effects.
- Variables with high direct effects should be prioritized in extension strategies.
- The residual effect indicates how much of the variation is due to other factors not included in the model.
Solved Example (Manual Calculation)
In an extension survey, adoption (\(Y\)), the dependent variable, is assumed to be influenced by:
- Education (\(X_1\))
- Income (\(X_2\))
- Extension Contact (\(X_3\))
Correlation Matrix
| | X1 | X2 | X3 | Pair | Correlation with Y |
| --- | --- | --- | --- | --- | --- |
| X1 | 1 | 0.6 | 0.4 | (Y, X1) | 0.80 |
| X2 | 0.6 | 1 | 0.3 | (Y, X2) | 0.65 |
| X3 | 0.4 | 0.3 | 1 | (Y, X3) | 0.50 |
Step 1: Identification of Variables
Dependent variable (\(Y\)): Adoption
Independent variables:
- \(X_1\) = Education
- \(X_2\) = Income
- \(X_3\) = Extension contact
Step 2: Estimation of Path Coefficients
\[
\begin{bmatrix}
r_{x_1y} \\
r_{x_2y} \\
r_{x_3y}
\end{bmatrix}
=
\begin{bmatrix}
r_{x_1x_1} & r_{x_1x_2} & r_{x_1x_3} \\
r_{x_2x_1} & r_{x_2x_2} & r_{x_2x_3} \\
r_{x_3x_1} & r_{x_3x_2} & r_{x_3x_3}
\end{bmatrix}
\begin{bmatrix}
a \\
b \\
c
\end{bmatrix}
\]
Substituting values:
\[
\begin{bmatrix}
0.80 \\
0.65 \\
0.50
\end{bmatrix}
=
\begin{bmatrix}
1 & 0.6 & 0.4 \\
0.6 & 1 & 0.3 \\
0.4 & 0.3 & 1
\end{bmatrix}
\begin{bmatrix}
a \\
b \\
c
\end{bmatrix}
\]
Solving the above equations, we obtain:
\( a = 0.573034 \)
\( b = 0.247191 \)
\( c = 0.196629 \)
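The hand solution can be cross-checked numerically. The sketch below assumes NumPy is installed and uses `np.linalg.solve`; the names `A`, `B` and `C` mirror the matrix notation of the methodology section.

```python
import numpy as np

# Correlations among X1, X2, X3 (matrix B) and with Y (vector A), from the table above.
B = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
A = np.array([0.80, 0.65, 0.50])

# Solving B @ C = A directly is numerically preferable to forming the inverse of B.
C = np.linalg.solve(B, A)
a, b, c = C
print(np.round(C, 6))   # direct effects a, b, c
```

The printed values should agree with the direct effects reported above to six decimal places.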
Step 3: Indirect Effects Computation
(a) Indirect effects of \(X_1\)
- Via \(X_2\): \( 0.60 \times 0.2472 = 0.1483 \)
- Via \(X_3\): \( 0.40 \times 0.1966 = 0.0786 \)
- Total indirect effect of \(X_1\): \( 0.1483 + 0.0786 = 0.2269 \)
(b) Indirect effects of \(X_2\)
- Via \(X_1\): \( 0.60 \times 0.5730 = 0.3438 \)
- Via \(X_3\): \( 0.30 \times 0.1966 = 0.0590 \)
- Total indirect effect of \(X_2\): \( 0.3438 + 0.0590 = 0.4028 \)
(c) Indirect effects of \(X_3\)
- Via \(X_1\): \( 0.40 \times 0.5730 = 0.2292 \)
- Via \(X_2\): \( 0.30 \times 0.2472 = 0.0742 \)
- Total indirect effect of \(X_3\): \( 0.2292 + 0.0742 = 0.3034 \)
Step 4: Total Effect Check
| Variable | Direct | Indirect | Total |
| --- | --- | --- | --- |
| X1 | 0.5730 | 0.2269 | 0.8000 |
| X2 | 0.2472 | 0.4028 | 0.6500 |
| X3 | 0.1966 | 0.3034 | 0.5000 |
The sum of the direct and indirect effects of each independent variable should equal its correlation with the dependent variable.
This is because path analysis breaks each correlation down into direct and indirect effects, viz.,
\[
r(x_1, y) = 0.8 = \text{Direct Effect } (0.5730) + \text{Total Indirect Effect of } X_1 \; (0.2269)
\]
Step 5: Residual Effect
Residual effect (R):
\[
\text{Residual Effect}
=
\sqrt{1 - a^2 - b^2 - c^2
- 2r_{x_1 x_2}ab
- 2r_{x_1 x_3}ac
- 2r_{x_2 x_3}bc}
\]
\[
a^2 + b^2 + c^2
=
0.328368 + 0.061103 + 0.038663
=
0.428134
\]
\[
2r_{x_1 x_2}ab = 0.169979
\]
\[
2r_{x_1 x_3}ac = 0.090140
\]
\[
2r_{x_2 x_3}bc = 0.029163
\]
\[
\text{Sum of all explained variance } (R^2) = 0.717416
\]
\[
\text{Residual Effect } (R)
=
\sqrt{1 - 0.717416}
\]
\[
R = 0.5316
\]
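The residual calculation is easy to verify with plain Python. The sketch below uses the direct effects and correlations of this example. As an extra check it also computes the explained variance as the sum of each direct effect times its correlation with Y, an identity that follows from the simultaneous equations; all variable names are assumptions.

```python
import math

# Direct effects and correlations taken from the worked example above.
a, b, c = 0.573034, 0.247191, 0.196629
r12, r13, r23 = 0.6, 0.4, 0.3
r1y, r2y, r3y = 0.80, 0.65, 0.50

# Explained variance, expanded form (squares plus cross-products).
r2_expanded = (a**2 + b**2 + c**2
               + 2 * r12 * a * b + 2 * r13 * a * c + 2 * r23 * b * c)

# Same quantity via the shortcut: sum of direct effect x correlation with Y.
r2_shortcut = a * r1y + b * r2y + c * r3y

# Residual effect is the square root of the unexplained share.
residual = math.sqrt(1 - r2_expanded)
print(round(r2_expanded, 4), round(r2_shortcut, 4), round(residual, 4))
```

If the two versions of the explained variance disagree beyond rounding, one of the cross-product terms has usually been miscalculated.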
Step 6: Final Result Table
| Variable | Direct | Indirect | Total |
| --- | --- | --- | --- |
| X1 | 0.573034 | 0.226966 | 0.80 |
| X2 | 0.247191 | 0.402809 | 0.65 |
| X3 | 0.196629 | 0.303371 | 0.50 |
| Residual | 0.531587 | | |
Step 7: Interpretation
1. Direct Effects
- Education (\(X_1\)) has the highest direct effect (0.5730) on adoption.
- Income (\(X_2\)) has a moderate direct effect (0.2472), a little under half that of education.
- Extension contact (\(X_3\)) has the lowest direct effect (0.1966).
Thus, the order of direct influence is: Education (\(X_1\)) > Income (\(X_2\)) > Extension contact (\(X_3\)).
2. Indirect Effects
- Education (\(X_1\)) has a total indirect effect of 0.2269, most of it via income (0.1483).
- Income (\(X_2\)) affects adoption mainly through indirect pathways (0.4028), chiefly via education (0.3438).
- Extension contact (\(X_3\)) has an indirect effect (0.3034) greater than its direct effect (0.1966).
3. Comparison of Direct and Indirect Effects
- Education (\(X_1\)) affects adoption mainly through direct pathways (Direct > Indirect).
- Income (\(X_2\)) affects adoption mainly through indirect pathways (Indirect > Direct).
- Extension contact (\(X_3\)) affects adoption predominantly through indirect pathways (Indirect > Direct).
4. Total Effects
- Education (\(X_1\)) has the highest total effect (≈ 0.80).
- Income (\(X_2\)) has a moderate total effect (≈ 0.65).
- Extension contact (\(X_3\)) has the lowest total effect (≈ 0.50).
Thus, the overall importance of variables is: Education (\(X_1\)) > Income (\(X_2\)) > Extension contact (\(X_3\)).
5. Residual Effect
- The three variables (\(X_1\), \(X_2\), \(X_3\)) together explain about 72% of the variation in adoption (\(R^2 \approx 0.72\)).
- The residual effect (R = 0.53) is the square root of the unexplained share, so about 28% of the variation in adoption (\(0.53^2 \approx 0.28\)), not 53%, is due to factors not included in the model.
- Other factors mentioned in the introduction, such as landholding, training and information sources, may also play a role in determining adoption.
Steps to Perform Path Analysis Using Agri Analyze
Agri Analyze is an online tool that helps researchers perform path analysis.
Step 1: Create a CSV file. Direct link to the sample file: Click here
| Score for Adoption of Risk Management Practices | Age | Social participation | Annual income | Scientific orientation | Land holding | Experience | Risk orientation | Attitude | Market orientation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 75 | 26 | 8 | 150000 | 8 | 5 | 4 | 8 | 10 | 10 |
| 71 | 28 | 9 | 250000 | 9 | 3 | 6 | 9 | 8 | 8 |
| 55 | 21 | 10 | 350000 | 10 | 3 | 2 | 10 | 9 | 9 |
| 79 | 35 | 4 | 550000 | 10 | 4 | 10 | 5 | 8 | 10 |
| 76 | 33 | 5 | 600000 | 10 | 5 | 8 | 6 | 9 | 10 |
| 80 | 38 | 8 | 800000 | 9 | 3 | 11 | 8 | 7 | 8 |
| 85 | 40 | 9 | 540000 | 10 | 5 | 15 | 9 | 5 | 5 |
| 81 | 44 | 10 | 1100000 | 8 | 8 | 16 | 10 | 6 | 9 |
| 55 | 25 | 9 | 1200000 | 9 | 8 | 2 | 2 | 8 | 10 |
| 56 | 33 | 8 | 1500000 | 6 | 9 | 3 | 5 | 10 | 5 |
| 42 | 21 | 7 | 350000 | 8 | 2 | 2 | 6 | 10 | 6 |
| 89 | 55 | 8 | 850000 | 9 | 5 | 30 | 8 | 8 | 10 |
| 90 | 56 | 5 | 950000 | 10 | 4 | 32 | 10 | 9 | 4 |
| 95 | 60 | 6 | 1600000 | 10 | 10 | 40 | 8 | 5 | 2 |
| 99 | 61 | 8 | 1800000 | 8 | 11 | 2 | 5 | 6 | 3 |
| 36 | 22 | 9 | 1950000 | 9 | 19 | 10 | 6 | 7 | 5 |
| 55 | 35 | 7 | 250000 | 5 | 20 | 6 | 4 | 8 | 8 |
| 66 | 34 | 10 | 365000 | 9 | 19 | 8 | 6 | 6 | 9 |
| 68 | 38 | 5 | 4520000 | 6 | 21 | 8 | 8 | 5 | 10 |
| 90 | 39 | 4 | 360000 | 8 | 5 | 14 | 9 | 8 | 2 |
| 85 | 44 | 8 | 480000 | 9 | 4 | 20 | 10 | 10 | 5 |
| 88 | 42 | 9 | 520000 | 7 | 4 | 15 | 2 | 5 | 4 |
| 86 | 48 | 10 | 1650000 | 8 | 12 | 15 | 3 | 6 | 7 |
| 79 | 49 | 10 | 352000 | 9 | 5 | 11 | 4 | 4 | 8 |
| 55 | 28 | 8 | 450000 | 8 | 5 | 8 | 5 | 5 | 2 |
| 59 | 36 | 9 | 650000 | 9 | 3 | 7 | 6 | 6 | 3 |
| 65 | 38 | 7 | 870000 | 10 | 4 | 10 | 8 | 4 | 4 |
| 79 | 48 | 6 | 1500000 | 9 | 10 | 22 | 9 | 8 | 8 |
| 78 | 49 | 5 | 2500000 | 8 | 12 | 25 | 10 | 9 | 9 |
| 71 | 42 | 8 | 3500000 | 9 | 11 | 18 | 2 | 1 | 10 |
| 79 | 42 | 9 | 560000 | 8 | 5 | 15 | 5 | 10 | 10 |
| 86 | 33 | 10 | 110000 | 9 | 8 | 8 | 8 | 8 | 5 |
| 88 | 31 | 10 | 320000 | 7 | 9 | 9 | 9 | 9 | 6 |
| 85 | 38 | 8 | 1580000 | 9 | 20 | 5 | 10 | 10 | 7 |
| 95 | 39 | 9 | 352000 | 8 | 21 | 9 | 10 | 6 | 8 |
| 95 | 40 | 7 | 1500000 | 9 | 22 | 20 | 8 | 7 | 9 |
| 99 | 49 | 8 | 1150000 | 8 | 23 | 25 | 9 | 5 | 10 |
| 98 | 47 | 9 | 3250000 | 9 | 24 | 21 | 3 | 5 | 10 |
| 75 | 48 | 5 | 3520000 | 8 | 25 | 24 | 5 | 6 | 10 |
| 85 | 46 | 6 | 1520000 | 9 | 26 | 20 | 6 | 7 | 8 |
| 89 | 48 | 8 | 3252200 | 9 | 22 | 25 | 7 | 8 | 9 |
| 88 | 42 | 9 | 3254000 | 7 | 21 | 20 | 8 | 10 | 8 |
| 80 | 41 | 10 | 2250000 | 7 | 23 | 20 | 9 | 10 | 6 |
| 75 | 40 | 10 | 3210000 | 7 | 21 | 18 | 10 | 10 | 7 |
| 44 | 22 | 9 | 1582000 | 8 | 22 | 2 | 5 | 6 | 8 |
Step 2: Click on ANALYTICAL TOOL -> MULTIVARIATE ANALYSIS -> PATH ANALYSIS
Step 3: Open the link https://www.agrianalyze.com/PathAnalysis.aspx (free registration is mandatory for first-time users)
Step 4: Use this link to download the sample file: Sample File Download
Step 5: Click Submit, pay a nominal fee, and download the output report with detailed interpretation.
Output Report:
Link of the output report
1. Path Coefficients Matrix (Direct and Indirect Effects) in Excel
| Trait | Age | Social participation | Annual income | Scientific orientation | Land holding | Experience | Risk orientation | Attitude | Market orientation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age | 0.85897 | -0.01809 | -0.07611 | 0.0049 | 0.0227 | -0.03956 | 0.01544 | -0.00671 | -0.00937 |
| Social participation | -0.20383 | 0.07622 | 0.03642 | -0.0066 | 0.00495 | 0.01478 | -0.01409 | -0.00102 | -0.00013 |
| Annual income | 0.2708 | -0.0115 | -0.24143 | -0.01261 | 0.09229 | -0.01848 | -0.01932 | -0.00488 | 0.03435 |
| Scientific orientation | 0.07951 | -0.0095 | 0.05748 | 0.05294 | -0.05036 | -0.01079 | 0.02424 | -0.00529 | -0.00173 |
| Land holding | 0.13332 | 0.00258 | -0.15235 | -0.01823 | 0.14625 | -0.01121 | -0.00275 | -0.00224 | 0.04025 |
| Experience | 0.64614 | -0.02143 | -0.08484 | 0.01086 | 0.03119 | -0.05259 | 0.0451 | -0.00298 | 0.00122 |
| Risk orientation | 0.06897 | -0.00559 | 0.02425 | 0.00667 | -0.00209 | -0.01233 | 0.19228 | 0.01169 | -0.01291 |
| Attitude | -0.20298 | -0.00273 | 0.04151 | -0.00986 | -0.01152 | 0.00551 | 0.07916 | 0.0284 | 0.00664 |
| Market orientation | -0.06427 | -0.00008 | -0.06623 | -0.00073 | 0.04701 | -0.00051 | -0.01983 | 0.00151 | 0.12523 |
2. A publication-ready path diagram
3. Mapping of variables for the path diagram in Excel
| Symbol | Full Name |
| --- | --- |
| X1 | Age |
| X2 | Social participation |
| X3 | Annual income |
| X4 | Scientific orientation |
| X5 | Land holding |
| X6 | Experience |
| X7 | Risk orientation |
| X8 | Attitude |
| X9 | Market orientation |
| Y | Score for Adoption of Risk Management Practices |
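The path coefficients matrix in item 1 of the output report can be summarised with a short script. The sketch below assumes the usual layout, in which the diagonal holds the direct effect of each trait and the off-diagonal cells of a row hold its indirect effects via the other traits, so that each row sums to the correlation of that trait with Y. The published numbers are consistent with this reading, but it remains an assumption about the report format; the variable names are illustrative.

```python
# Path coefficient matrix copied from the output report above.
# Row i, column j: effect of trait i on Y acting through trait j.
traits = ["Age", "Social participation", "Annual income", "Scientific orientation",
          "Land holding", "Experience", "Risk orientation", "Attitude",
          "Market orientation"]
matrix = [
    [0.85897, -0.01809, -0.07611, 0.0049, 0.0227, -0.03956, 0.01544, -0.00671, -0.00937],
    [-0.20383, 0.07622, 0.03642, -0.0066, 0.00495, 0.01478, -0.01409, -0.00102, -0.00013],
    [0.2708, -0.0115, -0.24143, -0.01261, 0.09229, -0.01848, -0.01932, -0.00488, 0.03435],
    [0.07951, -0.0095, 0.05748, 0.05294, -0.05036, -0.01079, 0.02424, -0.00529, -0.00173],
    [0.13332, 0.00258, -0.15235, -0.01823, 0.14625, -0.01121, -0.00275, -0.00224, 0.04025],
    [0.64614, -0.02143, -0.08484, 0.01086, 0.03119, -0.05259, 0.0451, -0.00298, 0.00122],
    [0.06897, -0.00559, 0.02425, 0.00667, -0.00209, -0.01233, 0.19228, 0.01169, -0.01291],
    [-0.20298, -0.00273, 0.04151, -0.00986, -0.01152, 0.00551, 0.07916, 0.0284, 0.00664],
    [-0.06427, -0.00008, -0.06623, -0.00073, 0.04701, -0.00051, -0.01983, 0.00151, 0.12523],
]

# Assumption: diagonal = direct effect, row sum = correlation of the trait with Y.
summary = []
for i, (name, row) in enumerate(zip(traits, matrix)):
    direct = row[i]
    total = sum(row)
    summary.append((name, direct, total - direct, total))

# Rank traits by the absolute size of their direct effect, largest first.
ranked = sorted(summary, key=lambda s: abs(s[1]), reverse=True)
for name, direct, indirect, total in ranked:
    print(f"{name:<24} direct={direct:+.5f} indirect={indirect:+.5f} total={total:+.5f}")
```

Read this way, Age has by far the largest direct effect (0.85897), while Experience has a small negative direct effect but a large indirect effect through Age (0.64614).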
Video Tutorial:
Link of the Youtube Tutorial
References
Li, C. C. (1956). The concept of path coefficient and its impact on population genetics. Biometrics, 12(2), 190–210.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.
The blog is written by:
Alok Sahu, PhD scholar, Department of Agricultural Statistics, JAU, Junagadh
https://www.linkedin.com/in/alok-sahu-7116a62a3/