Path Analysis in Survey and Extension Research with Agri Analyze

Summary

Path analysis is a statistical procedure for estimating direct and indirect relationships among variables. In extension and survey research, it helps a researcher establish how various factors influence a response of interest, such as farmers' adoption of technologies, productivity, income or attitudes. By breaking correlations down into direct and indirect effects, path analysis offers deeper insight than simple correlation or regression analysis. This blog explains the concept, the methodology, the interpretation of results through a solved example, and the steps for running the analysis in Agri Analyze, all in simple language.

1. Introduction

Path coefficient analysis was developed by Sewall Wright (1921). It is a multivariate technique that deals with a closed system of linearly related variables. Path analysis gives the direct effect of each independent variable on the dependent variable as well as its indirect effects through the other variables in the system. It is analogous to the analysis of variance and may be called an analysis of the correlation coefficient (Li, 1956). Path coefficients are standardized partial regression coefficients and therefore have no units. Most outcomes in agricultural extension and social science research are influenced by many factors. For instance, farmers' adoption of improved technology may depend on education, income, landholding, extension contact, training and information sources. Traditional correlation and regression show that such factors are related to the outcome, but they do not show how the variables exert influence through one another.

Path analysis enables researchers to understand:

  • Which variables have a direct effect on the dependent variable?
  • Which variables affect the dependent variable indirectly through intervening variables?
  • How strong are these effects?

Therefore, path analysis is used extensively in extension research, sociology, economics and biometrical genetics.

How Is Path Analysis Different from Regression?

In multiple regression:

\[ Y = b_1X_1 + b_2X_2 + b_3X_3 + e \]

Where:

  • \(b_i\) gives the direct effect of \(X_i\) on \(Y\).
  • The effect is measured holding the other variables constant.
  • It tells how much \(Y\) changes per unit change in \(X_i\).
  • It does not tell:

    • Through which variable the effect operates.

    • How much of the correlation is indirect.

    What Path Analysis Adds

    • Uses standardized regression coefficients.

    • Decomposes correlation into:

    – Direct effects

    – Indirect effects through other variables

    \[ r_{y x_1} = \text{Direct effect} + \text{Indirect via } X_2 + \text{Indirect via } X_3 \]
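To connect the two views: a path coefficient is a regression coefficient rescaled to standard deviation units, which is why it has no units and can be compared across variables. Writing \(p_i\) for the path coefficient of \(X_i\) and \(s_{X_i}\), \(s_Y\) for the sample standard deviations (symbols introduced here only for illustration), the standard relation is:

\[ p_i = b_i \, \frac{s_{X_i}}{s_Y} \]

So if \(X_i\) and \(Y\) are both standardized before fitting the regression, the fitted coefficients are already the path coefficients.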

    Popular Use Cases for Extension and Survey Research

    Some of the areas where path analysis is used include:

    (a) Agricultural Extension

  • Factors that affect the adoption of agricultural technology
  • Determinants of crop productivity
  • Impact of extension services on farmer income
  • Role of education and training in technology adoption

    (b) Survey Research

  • Determinants of attitudes, perceptions and behavior
  • Influence of education, income and media exposure on awareness
  • Factors affecting participation in development programs

    (c) Biometrical and Genetic Studies

  • Contribution of traits to crop yield
  • Direct and indirect effects of yield components
  • Selection criteria in breeding programs

    (d) Social Sciences

  • Factors affecting poverty, employment and development
  • Relationships between psychological and behavioral factors

    Methodology of Path Analysis

    Step 1: Identification of Variables

    Dependent variable (\(Y\)): the outcome or study variable. Example: adoption level of farmers, grain yield (q/ha), etc.

    Independent variables (\(X_1, X_2, X_3\)): the factors affecting \(Y\). Example: education, income, landholding, extension contact, training, etc.

    Step 2: Conceptual Framework (Path Diagram)

    A path diagram is drawn to show relationships among variables.

      Example:
    • Education (X₁)
    • Income (X₂)
    • Extension contact (X₃)
    • Adoption (Y)
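    Single-headed arrows indicate the hypothesized causal paths from each independent variable to \(Y\); double-headed arrows indicate correlations among the independent variables.

    Step 3: Calculation of Correlation Coefficients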

    Compute correlation coefficients among all variables:

    \[ r(YX_1),\; r(YX_2),\; r(YX_3) \] \[ r(X_1X_2),\; r(X_1X_3),\; r(X_2X_3) \]
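The correlations in this step can be computed with a few lines of code. A minimal sketch in plain Python, assuming each variable is held as a list of equal length; the function names `pearson_r` and `correlation_matrix` and the five sample scores are made up for illustration and are not survey data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return (names, matrix) for a dict mapping variable name -> values."""
    names = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical scores for five respondents (illustration only).
data = {
    "X1": [2, 4, 5, 7, 9],
    "X2": [1, 3, 2, 6, 8],
    "Y": [10, 14, 15, 20, 26],
}

names, corr = correlation_matrix(data)
for name, row in zip(names, corr):
    print(name, [round(r, 3) for r in row])
```

The last row and column of the printed matrix hold the correlations with \(Y\); the remaining block holds the correlations among the independent variables.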

    Step 4: Estimation of Path Coefficients (Direct and Indirect Effects)

    Path coefficients are standardized regression coefficients obtained by solving the following system of equations:

    \[ \begin{bmatrix} r_{x_1y} \\ r_{x_2y} \\ r_{x_3y} \end{bmatrix} = \begin{bmatrix} r_{x_1x_1} & r_{x_1x_2} & r_{x_1x_3} \\ r_{x_2x_1} & r_{x_2x_2} & r_{x_2x_3} \\ r_{x_3x_1} & r_{x_3x_2} & r_{x_3x_3} \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]

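    In matrix notation this is \[ \mathbf{A} = \mathbf{B}\,\mathbf{C}, \quad \mathbf{C} = \mathbf{B}^{-1}\mathbf{A} \]

    where \(\mathbf{A}\) is the vector of correlations of each independent variable with \(Y\), \(\mathbf{B}\) is the correlation matrix of the independent variables, and \(\mathbf{C}\) is the vector of direct effects \(a\), \(b\) and \(c\).

    \[ \text{Indirect Effect of } X_1 \text{ via } X_2 = b \;(\text{direct effect of } X_2 \text{ on } Y) \times r_{x_1 x_2} \]

    The remaining indirect effects are computed in the same way: multiply the direct effect of the intervening variable by its correlation with the variable in question.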
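Written out row by row, the matrix equation in this step shows the decomposition directly. The expansion uses only \(r_{x_i x_i} = 1\) and \(r_{x_i x_j} = r_{x_j x_i}\); in each row, the term with coefficient 1 is the direct effect and the other two terms are the indirect effects:

\[ \begin{aligned} r_{x_1 y} &= a + r_{x_1 x_2}\,b + r_{x_1 x_3}\,c \\ r_{x_2 y} &= r_{x_1 x_2}\,a + b + r_{x_2 x_3}\,c \\ r_{x_3 y} &= r_{x_1 x_3}\,a + r_{x_2 x_3}\,b + c \end{aligned} \]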

    Step 5: Residual Effect

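    The residual effect measures the influence of variables not included in the study. It is the square root of the variation in \(Y\) left unexplained by the model:

    \[ \text{Residual Effect} = \sqrt{ 1 - \left( a^2 + b^2 + c^2 + 2 r_{x_1 x_2} ab + 2 r_{x_1 x_3} ac + 2 r_{x_2 x_3} bc \right) } \]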

    Step 6: Interpretation of Results

    Identify:

  • Variables with the highest direct effects
  • Variables with strong indirect effects
  • Key determinants of the dependent variable

    Results and How to Interpret Them

    Example of Interpretation

    | Variable | Direct Effect | Indirect Effect | Total Effect |
    |---|---|---|---|
    | Education (X1) | 0.45 | 0.20 | 0.65 |
    | Income (X2) | 0.30 | 0.15 | 0.45 |
    | Extension Contact (X3) | 0.55 | 0.10 | 0.65 |

    Interpretation:
    • Extension contact has the highest direct effect (0.55) on adoption, so it is the strongest direct influence.
    • Education has the largest indirect effect (0.20), meaning part of its influence on adoption operates through income and extension contact.
    • Income has moderate direct and indirect effects.
    • Variables with high direct effects should be prioritized in extension strategies.
    • The residual effect indicates how much variation is due to factors not included in the model.

    Solved Example (Manual Calculation)

    In an extension survey, adoption (\(Y\)), the dependent variable, is influenced by:

    • Education (\(X_1\))
    • Income (\(X_2\))
    • Extension Contact (\(X_3\))

    Correlation Matrix

    | Variable | X1 | X2 | X3 | Correlation with Y |
    |---|---|---|---|---|
    | X1 | 1 | 0.6 | 0.4 | 0.80 |
    | X2 | 0.6 | 1 | 0.3 | 0.65 |
    | X3 | 0.4 | 0.3 | 1 | 0.50 |

    Step 1: Identification of Variables

    Dependent variable (\(Y\)): Adoption

    Independent variables:

    • \(X_1\) = Education
    • \(X_2\) = Income
    • \(X_3\) = Extension contact

    Step 2: Estimation of Path Coefficients

    \[ \begin{bmatrix} r_{x_1y} \\ r_{x_2y} \\ r_{x_3y} \end{bmatrix} = \begin{bmatrix} r_{x_1x_1} & r_{x_1x_2} & r_{x_1x_3} \\ r_{x_2x_1} & r_{x_2x_2} & r_{x_2x_3} \\ r_{x_3x_1} & r_{x_3x_2} & r_{x_3x_3} \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]

    Substituting values:

    \[ \begin{bmatrix} 0.80 \\ 0.65 \\ 0.50 \end{bmatrix} = \begin{bmatrix} 1 & 0.6 & 0.4 \\ 0.6 & 1 & 0.3 \\ 0.4 & 0.3 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]

    Solving the above equations, we obtain:

    \( a = 0.573034 \)

    \( b = 0.247191 \)

    \( c = 0.196629 \)
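The three path coefficients can be reproduced in code. A minimal sketch in plain Python that solves the system by Gaussian elimination; the helper name `solve_linear_system` is made up for illustration, and a library routine such as NumPy's `numpy.linalg.solve` would do the same job:

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

# Correlations among X1, X2, X3 (matrix B) and with Y (vector A).
B = [[1.0, 0.6, 0.4],
     [0.6, 1.0, 0.3],
     [0.4, 0.3, 1.0]]
A = [0.80, 0.65, 0.50]

a, b, c = solve_linear_system(B, A)
print(f"a = {a:.6f}, b = {b:.6f}, c = {c:.6f}")
# a = 0.573034, b = 0.247191, c = 0.196629
```

Solving the system directly is generally preferred to forming \(\mathbf{B}^{-1}\) explicitly, although for a 3 × 3 correlation matrix either approach gives the same answer.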

    Step 3: Indirect Effects Computation

    (a) Indirect effects of \(X_1\)

    • Via \(X_2\): \( 0.60 \times 0.2472 = 0.1483 \)
    • Via \(X_3\): \( 0.40 \times 0.1966 = 0.0786 \)
    • Total indirect effect of \(X_1\): \( 0.1483 + 0.0786 = 0.2269 \)

    (b) Indirect effects of \(X_2\)

    • Via \(X_1\): \( 0.60 \times 0.5730 = 0.3438 \)
    • Via \(X_3\): \( 0.30 \times 0.1966 = 0.0590 \)
    • Total indirect effect of \(X_2\): \( 0.3438 + 0.0590 = 0.4028 \)

    (c) Indirect effects of \(X_3\)

    • Via \(X_1\): \( 0.40 \times 0.5730 = 0.2292 \)
    • Via \(X_2\): \( 0.30 \times 0.2472 = 0.0742 \)
    • Total indirect effect of \(X_3\): \( 0.2292 + 0.0742 = 0.3034 \)

    Step 4: Total Effect Check

    | Variable | Direct | Indirect | Total |
    |---|---|---|---|
    | X1 | 0.5730 | 0.2269 | 0.8000 |
    | X2 | 0.2472 | 0.4028 | 0.6500 |
    | X3 | 0.1966 | 0.3034 | 0.5000 |

    For each independent variable, the direct and indirect effects must sum to its correlation with the dependent variable (up to rounding), because path analysis partitions that correlation into direct and indirect components. For example:

    \[ r(x_1, y) = 0.8 = \text{Direct Effect } (0.5730) + \text{Total Indirect Effect of } X_1 \; (0.2269) \]
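The indirect effects and the total check can be reproduced together. A minimal sketch in plain Python, taking the direct effects from the solved example as given; the dictionary layout and the helper `corr` are choices made here for illustration:

```python
# Direct effects from the solved example (rounded to six decimals).
direct = {"X1": 0.573034, "X2": 0.247191, "X3": 0.196629}

# Correlations among the independent variables and with Y.
r_xx = {("X1", "X2"): 0.6, ("X1", "X3"): 0.4, ("X2", "X3"): 0.3}
r_xy = {"X1": 0.80, "X2": 0.65, "X3": 0.50}

def corr(u, v):
    """Look up a correlation regardless of the order of the pair."""
    return r_xx.get((u, v), r_xx.get((v, u)))

totals = {}
for var in direct:
    # Indirect effect of `var` via each other variable:
    # correlation(var, other) * direct effect of `other` on Y.
    indirect = {other: corr(var, other) * direct[other]
                for other in direct if other != var}
    totals[var] = direct[var] + sum(indirect.values())
    print(var,
          {via: round(value, 4) for via, value in indirect.items()},
          "total =", round(totals[var], 4),
          "r with Y =", r_xy[var])
```

Each printed total should agree with the corresponding correlation with \(Y\) to within rounding of the direct effects.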

    Step 5: Residual Effect

    The residual effect (R) is the square root of the variation in \(Y\) left unexplained by \(X_1\), \(X_2\) and \(X_3\):

    \[ \text{Residual Effect } (R) = \sqrt{1 - \left( a^2 + b^2 + c^2 + 2r_{x_1 x_2}ab + 2r_{x_1 x_3}ac + 2r_{x_2 x_3}bc \right)} \]

    \[ a^2 + b^2 + c^2 = 0.328368 + 0.061103 + 0.038663 = 0.428134 \]

    \[ 2r_{x_1 x_2}ab = 0.169979 \]

    \[ 2r_{x_1 x_3}ac = 0.090140 \]

    \[ 2r_{x_2 x_3}bc = 0.029163 \]

    \[ \text{Explained variation} = 0.428134 + 0.169979 + 0.090140 + 0.029163 = 0.717416 \]

    \[ R = \sqrt{1 - 0.717416} = \sqrt{0.282584} = 0.5316 \]
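The residual calculation is easy to check in code. A minimal sketch in plain Python, using the direct effects and correlations of the solved example; it also computes the explained variation through the equivalent shortcut \(a\,r_{x_1 y} + b\,r_{x_2 y} + c\,r_{x_3 y}\), which follows from \(\mathbf{A} = \mathbf{B}\mathbf{C}\) and is added here as a cross-check:

```python
from math import sqrt

a, b, c = 0.573034, 0.247191, 0.196629   # direct effects
r12, r13, r23 = 0.6, 0.4, 0.3            # correlations among X1, X2, X3
r1y, r2y, r3y = 0.80, 0.65, 0.50         # correlations with Y

# Explained variation, term by term as in the residual formula.
explained = (a**2 + b**2 + c**2
             + 2 * r12 * a * b + 2 * r13 * a * c + 2 * r23 * b * c)

# Equivalent shortcut: sum of (direct effect x correlation with Y).
explained_short = a * r1y + b * r2y + c * r3y

# The residual effect is the square root of the unexplained share.
residual = sqrt(1 - explained)

print(round(explained, 4), round(explained_short, 4), round(residual, 4))
# 0.7174 0.7174 0.5316
```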

    Step 6: Final Result Table

    | Variable | Direct | Indirect | Total |
    |---|---|---|---|
    | X1 | 0.573034 | 0.226966 | 0.80 |
    | X2 | 0.247191 | 0.402809 | 0.65 |
    | X3 | 0.196629 | 0.303371 | 0.50 |
    | Residual | 0.531587 | | |

    Step 7: Interpretation

    1. Direct Effects
    • Education (\(X_1\)) has the highest direct effect (0.5730): holding income and extension contact constant, a one standard deviation increase in education is associated with a 0.57 standard deviation increase in adoption.
    • Income (\(X_2\)) has a moderate direct effect (0.2472), less than half that of education.
    • Extension contact (\(X_3\)) has the lowest direct effect (0.1966).

    Thus, the order of direct influence is:

    • \(X_1 > X_2 > X_3\)

    2. Indirect Effects
    • Education (\(X_1\)) has the smallest total indirect effect (0.2269), most of it via income (0.1483).
    • Income (\(X_2\)) affects adoption mainly through indirect pathways (0.4028), chiefly via education (0.3438).
    • Extension contact (\(X_3\)) has an indirect effect (0.3034) greater than its direct effect (0.1966).

    3. Comparison of Direct and Indirect Effects
    • Education (\(X_1\)) affects adoption mainly through the direct pathway (Direct > Indirect).
    • Income (\(X_2\)) affects adoption mainly through indirect pathways (Indirect > Direct).
    • Extension contact (\(X_3\)) affects adoption mainly through indirect pathways (Indirect > Direct).

    4. Total Effects
    • Education (\(X_1\)) has the highest total effect (0.80).
    • Income (\(X_2\)) has a moderate total effect (0.65).
    • Extension contact (\(X_3\)) has the lowest total effect (0.50).

    Thus, the overall importance of variables is:

    • \(X_1 > X_2 > X_3\)

    5. Residual Effect
    • The residual effect (R = 0.53) is the path coefficient from all unmeasured factors to adoption. It is not itself a percentage of variation.
    • Its square, \(0.5316^2 \approx 0.2826\), is the unexplained share: about 28% of the variation in adoption is due to factors not included in the model, while the three variables (\(X_1\), \(X_2\), \(X_3\)) together explain about 72%.
    • Other factors not included in this example, such as landholding, training and information sources, may account for the remainder.

    Steps to Perform Path Analysis Using Agri Analyze

    Agri Analyze is an online tool that helps researchers perform path analysis.

    Step 1: Create a CSV file with one column per variable, as in the sample data below. Direct link to the sample file: Click here

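    | Score for Adoption of Risk Management Practices | Age | Social participation | Annual income | Scientific orientation | Land holding | Experience | Risk orientation | Attitude | Market orientation |
    |---|---|---|---|---|---|---|---|---|---|
    | 75 | 26 | 8 | 150000 | 8 | 5 | 4 | 8 | 10 | 10 |
    | 71 | 28 | 9 | 250000 | 9 | 3 | 6 | 9 | 8 | 8 |
    | 55 | 21 | 10 | 350000 | 10 | 3 | 2 | 10 | 9 | 9 |
    | 79 | 35 | 4 | 550000 | 10 | 4 | 10 | 5 | 8 | 10 |
    | 76 | 33 | 5 | 600000 | 10 | 5 | 8 | 6 | 9 | 10 |
    | 80 | 38 | 8 | 800000 | 9 | 3 | 11 | 8 | 7 | 8 |
    | 85 | 40 | 9 | 540000 | 10 | 5 | 15 | 9 | 5 | 5 |
    | 81 | 44 | 10 | 1100000 | 8 | 8 | 16 | 10 | 6 | 9 |
    | 55 | 25 | 9 | 1200000 | 9 | 8 | 2 | 2 | 8 | 10 |
    | 56 | 33 | 8 | 1500000 | 6 | 9 | 3 | 5 | 10 | 5 |
    | 42 | 21 | 7 | 350000 | 8 | 2 | 2 | 6 | 10 | 6 |
    | 89 | 55 | 8 | 850000 | 9 | 5 | 30 | 8 | 8 | 10 |
    | 90 | 56 | 5 | 950000 | 10 | 4 | 32 | 10 | 9 | 4 |
    | 95 | 60 | 6 | 1600000 | 10 | 10 | 40 | 8 | 5 | 2 |
    | 99 | 61 | 8 | 1800000 | 8 | 11 | 2 | 5 | 6 | 3 |
    | 36 | 22 | 9 | 1950000 | 9 | 19 | 10 | 6 | 7 | 5 |
    | 55 | 35 | 7 | 250000 | 5 | 20 | 6 | 4 | 8 | 8 |
    | 66 | 34 | 10 | 365000 | 9 | 19 | 8 | 6 | 6 | 9 |
    | 68 | 38 | 5 | 4520000 | 6 | 21 | 8 | 8 | 5 | 10 |
    | 90 | 39 | 4 | 360000 | 8 | 5 | 14 | 9 | 8 | 2 |
    | 85 | 44 | 8 | 480000 | 9 | 4 | 20 | 10 | 10 | 5 |
    | 88 | 42 | 9 | 520000 | 7 | 4 | 15 | 2 | 5 | 4 |
    | 86 | 48 | 10 | 1650000 | 8 | 12 | 15 | 3 | 6 | 7 |
    | 79 | 49 | 10 | 352000 | 9 | 5 | 11 | 4 | 4 | 8 |
    | 55 | 28 | 8 | 450000 | 8 | 5 | 8 | 5 | 5 | 2 |
    | 59 | 36 | 9 | 650000 | 9 | 3 | 7 | 6 | 6 | 3 |
    | 65 | 38 | 7 | 870000 | 10 | 4 | 10 | 8 | 4 | 4 |
    | 79 | 48 | 6 | 1500000 | 9 | 10 | 22 | 9 | 8 | 8 |
    | 78 | 49 | 5 | 2500000 | 8 | 12 | 25 | 10 | 9 | 9 |
    | 71 | 42 | 8 | 3500000 | 9 | 11 | 18 | 2 | 1 | 10 |
    | 79 | 42 | 9 | 560000 | 8 | 5 | 15 | 5 | 10 | 10 |
    | 86 | 33 | 10 | 110000 | 9 | 8 | 8 | 8 | 8 | 5 |
    | 88 | 31 | 10 | 320000 | 7 | 9 | 9 | 9 | 9 | 6 |
    | 85 | 38 | 8 | 1580000 | 9 | 20 | 5 | 10 | 10 | 7 |
    | 95 | 39 | 9 | 352000 | 8 | 21 | 9 | 10 | 6 | 8 |
    | 95 | 40 | 7 | 1500000 | 9 | 22 | 20 | 8 | 7 | 9 |
    | 99 | 49 | 8 | 1150000 | 8 | 23 | 25 | 9 | 5 | 10 |
    | 98 | 47 | 9 | 3250000 | 9 | 24 | 21 | 3 | 5 | 10 |
    | 75 | 48 | 5 | 3520000 | 8 | 25 | 24 | 5 | 6 | 10 |
    | 85 | 46 | 6 | 1520000 | 9 | 26 | 20 | 6 | 7 | 8 |
    | 89 | 48 | 8 | 3252200 | 9 | 22 | 25 | 7 | 8 | 9 |
    | 88 | 42 | 9 | 3254000 | 7 | 21 | 20 | 8 | 10 | 8 |
    | 80 | 41 | 10 | 2250000 | 7 | 23 | 20 | 9 | 10 | 6 |
    | 75 | 40 | 10 | 3210000 | 7 | 21 | 18 | 10 | 10 | 7 |
    | 44 | 22 | 9 | 1582000 | 8 | 22 | 2 | 5 | 6 | 8 |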
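Before uploading, it can help to confirm that the file parses cleanly. A minimal sketch in plain Python, assuming the file has a header row followed by purely numeric rows; that layout is inferred from the sample data, not from Agri Analyze documentation, and the function name `check_csv` is made up for illustration. Only the first three sample rows are used here:

```python
import csv
import io

# First three rows of the sample data, written as CSV text.
sample_csv = """\
Score for Adoption of Risk Management Practices,Age,Social participation,Annual income,Scientific orientation,Land holding,Experience,Risk orientation,Attitude,Market orientation
75,26,8,150000,8,5,4,8,10,10
71,28,9,250000,9,3,6,9,8,8
55,21,10,350000,10,3,2,10,9,9
"""

def check_csv(text):
    """Return (header, rows as floats); raise ValueError on a malformed row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} values, got {len(row)}")
        # float() raises ValueError if a cell is not numeric.
        rows.append([float(value) for value in row])
    return header, rows

header, rows = check_csv(sample_csv)
print(len(header), "columns,", len(rows), "data rows")
# 10 columns, 3 data rows
```

For a file on disk, the same check can be run on the text returned by `open(path, newline="").read()`.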

    Step 2: Click on ANALYTICAL TOOL -> MULTIVARIATE ANALYSIS -> PATH ANALYSIS

    Step 3: Open the link https://www.agrianalyze.com/PathAnalysis.aspx (free registration is mandatory for first-time users)

    Step 4: Use the Sample File Download link to download the sample file.

    Step 5: Click submit, pay a nominal fee, and download the output report with detailed interpretation.

    Output Report: Link of the output report

    1. Path Coefficients Matrix (Direct and Indirect Effects) in Excel

    | Trait | Age | Social participation | Annual income | Scientific orientation | Land holding | Experience | Risk orientation | Attitude | Market orientation |
    |---|---|---|---|---|---|---|---|---|---|
    | Age | 0.85897 | -0.01809 | -0.07611 | 0.0049 | 0.0227 | -0.03956 | 0.01544 | -0.00671 | -0.00937 |
    | Social participation | -0.20383 | 0.07622 | 0.03642 | -0.0066 | 0.00495 | 0.01478 | -0.01409 | -0.00102 | -0.00013 |
    | Annual income | 0.2708 | -0.0115 | -0.24143 | -0.01261 | 0.09229 | -0.01848 | -0.01932 | -0.00488 | 0.03435 |
    | Scientific orientation | 0.07951 | -0.0095 | 0.05748 | 0.05294 | -0.05036 | -0.01079 | 0.02424 | -0.00529 | -0.00173 |
    | Land holding | 0.13332 | 0.00258 | -0.15235 | -0.01823 | 0.14625 | -0.01121 | -0.00275 | -0.00224 | 0.04025 |
    | Experience | 0.64614 | -0.02143 | -0.08484 | 0.01086 | 0.03119 | -0.05259 | 0.0451 | -0.00298 | 0.00122 |
    | Risk orientation | 0.06897 | -0.00559 | 0.02425 | 0.00667 | -0.00209 | -0.01233 | 0.19228 | 0.01169 | -0.01291 |
    | Attitude | -0.20298 | -0.00273 | 0.04151 | -0.00986 | -0.01152 | 0.00551 | 0.07916 | 0.0284 | 0.00664 |
    | Market orientation | -0.06427 | -0.00008 | -0.06623 | -0.00073 | 0.04701 | -0.00051 | -0.01983 | 0.00151 | 0.12523 |
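A matrix like this can be summarized row by row. The sketch below, in plain Python, assumes the usual layout for such output: each row is one variable, the diagonal entry is its direct effect, and the other entries are its indirect effects via the column variables. That reading is an assumption rather than something stated in the report, although it is consistent with the numbers (for example, the Age and Experience rows imply the same Age-Experience correlation). Only three rows are copied here to keep the example short:

```python
# Column order of the output matrix.
columns = ["Age", "Social participation", "Annual income",
           "Scientific orientation", "Land holding", "Experience",
           "Risk orientation", "Attitude", "Market orientation"]

# Three rows copied from the matrix (trait -> values in column order).
matrix_rows = {
    "Age": [0.85897, -0.01809, -0.07611, 0.0049, 0.0227,
            -0.03956, 0.01544, -0.00671, -0.00937],
    "Experience": [0.64614, -0.02143, -0.08484, 0.01086, 0.03119,
                   -0.05259, 0.0451, -0.00298, 0.00122],
    "Market orientation": [-0.06427, -0.00008, -0.06623, -0.00073, 0.04701,
                           -0.00051, -0.01983, 0.00151, 0.12523],
}

summary = {}
for trait, values in matrix_rows.items():
    diagonal = columns.index(trait)
    direct = values[diagonal]            # assumed: diagonal = direct effect
    total = sum(values)                  # row sum = direct + all indirect
    summary[trait] = (direct, total - direct, total)
    print(f"{trait}: direct = {direct:.5f}, "
          f"indirect = {total - direct:.5f}, total = {total:.5f}")
```

Under this reading, each row total would equal the correlation of that variable with the adoption score. Experience, for instance, would have a small negative direct effect but a large positive total, almost all of it arriving through Age.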

    2. A publication-ready path diagram

    3. Mapping of variables for the path diagram in Excel

    | Symbol | Full Name |
    |---|---|
    | X1 | Age |
    | X2 | Social participation |
    | X3 | Annual income |
    | X4 | Scientific orientation |
    | X5 | Land holding |
    | X6 | Experience |
    | X7 | Risk orientation |
    | X8 | Attitude |
    | X9 | Market orientation |
    | Y | Score for Adoption of Risk Management Practices |

    Video Tutorial: Link of the Youtube Tutorial

    References

    Li, C. C. (1956). The concept of path coefficient and its impact on population genetics. Biometrics, 12(2), 190–210.

    Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.

    The blog is written by:
    Alok Sahu, PhD scholar, Department of Agricultural Statistics, JAU, Junagadh

    https://www.linkedin.com/in/alok-sahu-7116a62a3/