Evaluation and Comparison of Different Data Mining Models for Identifying Areas at Risk of Gully Erosion (Study Area: Mian Ab Watershed in Khuzestan Province)

Document Type : Full length article

Authors

1 Master's student in Remote Sensing and Geographical Information System, University of Isfahan

2 Assistant Professor, Faculty of Geography Sciences and Planning, University of Isfahan, Isfahan, Iran.

10.22059/jphgr.2025.387445.1007862

Abstract

Introduction

Soil erosion and sediment production are major constraints on the use of water and soil resources. Gully erosion is among the most important types of erosion worldwide and has attracted considerable research attention in recent decades, with many studies examining how gullies form and develop under different climates. In many regions, gully erosion accounts for a significant portion of the sediment produced in watersheds. In Iran, approximately 125 million of the country's 165 million hectares of land are exposed to water erosion.

Soil erosion leads to soil degradation and the abandonment of farmland, resulting in irreparable damage. Developing appropriate strategies for preventing and mitigating gully erosion requires a thorough understanding of its dynamics and controlling factors. Hazard mapping and identification of the effective factors can therefore help prevent the rapid growth of gullies and minimize the damage they cause.

Given the development of machine learning models and their successful performance in various scientific fields, many researchers have used them for hazard mapping and for predicting erosion risk, and the reported results indicate that these models perform accurately. This study evaluates the effectiveness of two machine learning algorithms, SVM and CART, in mapping the risk of gully erosion.

Methodology

This study investigated gully erosion susceptibility in the Miānāb-Shushtar watershed and used machine learning methods to predict it. In the first step, a map of gully locations was prepared using satellite images, aerial photographs, and field visits. Subsequently, the following environmental parameters influencing gully erosion occurrence were examined: elevation, slope, slope aspect, soil texture, Stream Power Index (SPI), Topographic Wetness Index (TWI), vegetation cover (NDVI), lithology, distance from rivers, Terrain Ruggedness Index (TRI), distance from roads, soil erodibility index (K), rainfall erosivity index (R), and drainage density. Next, 70% of the mapped gullies were randomly selected as training data, and the remaining 30% were used as validation data. The gully location map was then entered into the SVM and CART models as the dependent variable, with the environmental layers serving as independent variables. To validate the models, the gully locations in the validation dataset were compared with the gully erosion susceptibility maps produced by the models.
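The random 70/30 partition described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_points`, the fixed seed, and the use of integer IDs in place of real gully coordinates are assumptions made for illustration. The count of 3,000 points is the total reported in the results.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split gully points into training and validation subsets.

    The seed is an arbitrary, hypothetical choice that only makes the
    example reproducible; the abstract does not report one.
    """
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical point IDs standing in for the mapped gully locations.
point_ids = list(range(3000))
train_ids, validation_ids = split_points(point_ids)
print(len(train_ids), len(validation_ids))
```

With 3,000 points, a 70% training fraction yields 2,100 training and 900 validation points, which matches the sample sizes reported for the SVM model.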

Results and Discussion

The variables of landforms, elevation, slope, slope direction and length, vegetation cover, soil texture, distance from roads, land use, lithology, soil erodibility, topographic wetness, stream power, drainage density, erosive rainfall, and distance from rivers were selected and examined as factors influencing gully erosion, and erosion points were used as the dependent variable. These points were collected through field surveys: the exact locations of the gullies were recorded using handheld GPS and then reviewed and corrected in Google Earth. In total, 3,000 gully erosion points were collected, representing the spatial distribution of this phenomenon in the area. Most of these points are located in the southern and eastern parts of the watershed.

Next, to obtain a potential gully erosion map for the watershed, layers of the studied indicators were prepared. After preparing the independent and dependent variables, a risk zoning map for gully erosion was created using the CART model in R software. The correlation coefficient between the predicted values of the CART model and the observed values was 0.889, indicating a strong positive relationship. The R² coefficient for this model was calculated to be 0.791, which is considered an appropriate level of determination for models related to gully erosion.
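For readers who want to see how the two reported statistics relate, the following sketch computes a Pearson correlation coefficient and a coefficient of determination from paired observed and predicted values. The abstract does not state how R² was calculated, so the residual-based definition used here is an assumption, and the function names and sample values are purely illustrative.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up values: 1 = gully present, 0 = absent, with model scores in [0, 1].
observed = [0, 0, 1, 1, 1, 0]
predicted = [0.1, 0.3, 0.8, 0.7, 0.9, 0.2]
print(pearson_r(observed, predicted), r_squared(observed, predicted))
```

The two measures coincide only in special cases, such as when R² is taken as the square of the correlation coefficient. The reported CART values are consistent with that reading, since 0.889 squared is about 0.790, close to the stated 0.791.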

According to the results of this model, distance from roads was identified as the most significant factor affecting gully erosion in the Miānāb watershed, followed by slope direction and the LS topographic factor in second and third places, highlighting the importance of topographic features. The Stream Power Index (SPI) and soil texture were identified as the fourth and fifth important variables, respectively. According to the zoning map produced by the CART model, areas with very high risk are primarily concentrated in the eastern and southeastern parts of the basin, which coincide with sloped and foothill lands. Areas with high risk are distributed in a band adjacent to these regions. In contrast, areas with moderate risk are mainly located in the center of the basin and near stream networks.
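A zoning map of this kind is typically produced by binning a continuous model output into ordered risk classes. The sketch below shows one way to do that. The abstract does not report the classification scheme, so the five-class layout, the "low" class name, the equal-interval break values, and the function name `risk_class` are all assumptions.

```python
from bisect import bisect_right

# Class names "very low", "moderate", "high" and "very high" appear in the
# abstract; "low" and the five-class layout are assumed.
RISK_CLASSES = ["very low", "low", "moderate", "high", "very high"]

# Hypothetical equal-interval breaks on a score scaled to [0, 1].
BREAKS = (0.2, 0.4, 0.6, 0.8)


def risk_class(score, breaks=BREAKS):
    """Map a susceptibility score in [0, 1] to an ordered risk class."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # bisect_right counts how many break values are <= score.
    return RISK_CLASSES[bisect_right(breaks, score)]


for score in (0.05, 0.5, 0.95):
    print(score, risk_class(score))
```

Other schemes, such as natural breaks or quantiles, would change only the break values and not the lookup logic.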

The SVM model performed strongly in predicting gully erosion. The model was trained using 2,100 samples and tested with 900 samples. The evaluation yielded a correlation coefficient of 0.92 between observed and predicted values, indicating a very strong correlation, and an R² of 0.846. These statistical indicators suggest that the SVM model successfully identified and modeled the complex patterns of gully erosion in the study area. Sensitivity analysis indicated that the most important factors in the SVM model were soil texture, with an importance coefficient of 0.156, and the NDVI index, with a coefficient of 0.150. According to the zoning map from the SVM model, the "very low risk" class is widely observed in the central and western sections of the basin, indicating relative stability and favorable environmental conditions in these areas. In contrast, the "high risk" and "very high risk" classes are mainly concentrated in the eastern sections, close to tributaries and on relatively steeper slopes. This geographical distribution may be due to the high density of waterways combined with the other factors influencing this phenomenon.

Based on the results obtained from SVM and CART models,

Keywords

Main Subjects



Articles in Press, Accepted Manuscript
Available Online from 25 October 2025
  • Receive Date: 22 December 2024
  • Revise Date: 16 July 2025
  • Accept Date: 20 July 2025