Agricultural Science and Food Processing
ISSN: 3066-1579 (Online) | ISSN: 3066-1560 (Print)
Email: [email protected]
Food security is a cornerstone of human survival and sustainable economic development. However, frequent meteorological disasters—such as droughts, floods, and extreme temperatures—severely threaten agricultural productivity, leading to significant crop losses and destabilizing food supply chains [1, 2]. Accurate prediction of agricultural disaster losses is critical for proactive disaster mitigation, resource allocation, and policy formulation to safeguard food production [1, 3, 4, 5, 6, 7, 8].
It is therefore important to study how to predict disaster loss rates quickly and accurately. However, because meteorological disasters are irregular in both frequency and severity, predicting disaster losses is a difficult scientific problem, which raises an urgent need for a fast and feasible method [9, 10].
Traditional approaches to loss assessment often rely on post-disaster surveys, which are time-consuming and reactive. Time series prediction methods, which analyze historical loss data to forecast future trends, offer a promising alternative by enabling preemptive risk management [6]. Using the historical loss sequence as input data, a forecasting model that captures the pattern of loss variation is constructed to predict future losses.
However, disaster losses arise from a complex process influenced by multiple factors. Interactions between climate variability, soil conditions, and crop resilience give agricultural disaster loss series nonlinear and non-stationary patterns, which makes it difficult to build an effective forecasting model [11].
In agricultural disaster loss prediction, the seasonality, periodicity, and high randomness of agricultural production data mean that traditional statistical methods often fail to predict crop yield reduction and economic losses precisely. Machine learning methods can capture these complex nonlinear characteristics effectively and show clear advantages in agricultural yield prediction, pest outbreak forecasting, and disaster loss estimation. Time series prediction methods fall mainly into two groups: statistical methods and machine learning methods. Statistical methods use statistical equations to construct prediction models. Traditional statistical models include the autoregressive (AR) [12], autoregressive moving average (ARMA) [13], and autoregressive integrated moving average (ARIMA) [14] models. The AR model regresses a variable on its own past values, so the relationship between adjacent observations is represented by a linear combination of historical data. The moving average (MA) model represents the current value as a linear combination of current and past random error terms. Combining AR and MA yields the ARMA model, which can simulate a time sequence more accurately. The ARIMA method was later developed by adding a differencing step for non-stationary series and has been widely applied [15, 16].
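For reference, the model families named above are commonly written as follows. The notation is an assumption introduced here and does not come from the cited works: x_t is the loss value at time t, \varepsilon_t is a white-noise error term, \phi_i and \theta_j are coefficients, B is the backshift operator (B x_t = x_{t-1}), and p, d, q are the model orders.

```latex
\begin{align}
\text{AR}(p):\quad & x_t = c + \sum_{i=1}^{p} \phi_i\, x_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & x_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARMA}(p,q):\quad & x_t = c + \sum_{i=1}^{p} \phi_i\, x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):\quad & \Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)(1 - B)^d\, x_t = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr)\varepsilon_t
\end{align}
```

Every model here is linear in past values and past errors, which is consistent with the weak performance of this family on strongly nonlinear loss series.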
Other statistical methods are also popular, such as the exponential smoothing (ES) model [17, 18], the cubic polynomial curve fitting (CPCF) model [19], and the grey model (GM) [20]. These statistical methods perform well on low-dimensional and linear problems [21] but poorly in nonlinear prediction, so they cannot adequately predict meteorological disaster losses, which are complex, nonlinear, multi-dimensional, and uncertain.
Machine learning (ML) algorithms have achieved satisfactory results in regression [22]. By constructing complex learning networks, they can effectively fit nonlinear relationships between multi-dimensional variables and significantly enhance forecast accuracy [2, 16, 23], which makes them a possible solution for loss prediction [15, 24, 25].
In the past few decades, various machine learning models for time series prediction have been proposed, such as the artificial neural network (ANN) [26, 27, 28, 29], backpropagation neural network (BPNN) [30, 31], generalized regression neural network (GRNN) [32], recurrent neural network (RNN) [33], long short-term memory (LSTM) network [34, 35], gated recurrent unit (GRU) network [36, 37], radial basis function (RBF) network [38], support vector machine (SVM) [39, 40, 41, 42], and extreme learning machine (ELM).
Machine learning methods perform well in disaster loss rate estimation, with robust analysis and processing capabilities for high-dimensional and nonlinear data. Research in this area has therefore gradually increased [6, 28, 43, 44, 45, 46, 47]. For example, some researchers have used BPNN [48, 49], random forest (RF) [2, 50, 51], or SVM [4, 52, 53] for disaster loss rate estimation and achieved excellent results, which provides a mature research foundation [7]. ANN and LSTM models have also been widely applied in crop yield forecasting, crop growth simulation, and pest occurrence risk assessment, demonstrating excellent prediction performance. Owing to its robustness on small-sample problems, SVM has been used successfully in agricultural disaster loss prediction (e.g., damage to coastal farmland due to storm surges) and in food supply chain disruption forecasting.
The following is a brief overview of the main machine learning algorithms and their applications.
The artificial neural network (ANN) has many advantages, such as nonlinearity, adaptability, parallelism, robustness, and strong computing power [54], and has been applied to prediction in many fields, such as economic growth estimation [55, 56], stock price prediction [57, 58], and exchange rate prediction [59, 60]. Numerical experiments indicate that neural networks with different structures perform differently and suit different situations [38].
For instance, BPNN is a classic network in time series analysis. Many scholars use heuristic optimization algorithms to tune its initial weights and thresholds in order to improve model accuracy. Even so, it still has drawbacks such as slow convergence, low running efficiency, and poor generalization ability [11, 61, 62].
RNN is a typical algorithm for sequential data. Its hidden layer gains a memory function through connections between nodes at different time steps, but it suffers from a divergence problem over long sequences. LSTM was therefore proposed, which introduces gate components. It was further improved by redesigning the gate structure [63], giving rise to variants such as bidirectional LSTM and GRU.
Additionally, GRNN performs well in small sample fitting [64].
Support vector machine (SVM) is a machine learning method with many strengths: it handles nonlinear data, reaches a global optimum, and offers strong adaptability and generalization ability, especially for complex problems involving small samples, nonlinearity, and high dimensionality. Compared with neural network models, SVM models require less training data [6]. SVM is therefore an effective tool that is well suited to simulating and predicting disaster loss sequences [65].
The least squares support vector machine (LSSVM) was developed from the traditional SVM by incorporating regularization theory. It transforms the inequality constraints into equality constraints, so the function estimation problem is solved through a system of linear equations rather than a quadratic program, which greatly improves convergence speed [66]. Overall, LSSVM not only handles small-sample and nonlinear problems but also offers simple operation, fast convergence, and high prediction accuracy, which makes it suitable for disaster loss prediction [6, 11]. However, it should be noted that non-stationary time series significantly affect its prediction accuracy [11].
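The standard LSSVM regression formulation makes the role of the equality constraints concrete. The symbols below are assumptions introduced for illustration: training pairs (x_i, y_i) for i = 1, ..., N, a feature map \varphi with kernel k(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j), a regularization parameter \gamma, error variables e_i, and Lagrange multipliers \alpha_i.

```latex
\begin{align}
& \min_{w,\,b,\,e}\; J(w, e) = \frac{1}{2}\, w^\top w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2
  \quad \text{s.t.}\quad y_i = w^\top \varphi(x_i) + b + e_i,\quad i = 1, \dots, N \\[4pt]
& \begin{bmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & K + \gamma^{-1} I \end{bmatrix}
  \begin{bmatrix} b \\ \alpha \end{bmatrix}
  =
  \begin{bmatrix} 0 \\ y \end{bmatrix},
  \qquad K_{ij} = k(x_i, x_j) \\[4pt]
& \hat{y}(x) = \sum_{i=1}^{N} \alpha_i\, k(x, x_i) + b
\end{align}
```

Eliminating w and e from the optimality conditions of the Lagrangian gives the (N+1)-dimensional linear system in the second line, so training reduces to a single linear solve.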
At present, research on disaster loss prediction focuses mainly on earthquakes [67, 68, 69], tropical cyclones [70, 71, 72, 73], floods [74, 75, 76, 77], storm surges [78, 79], and forest fires [16, 80].
Loss prediction models take two forms: single models and combined models.
The following are examples of single models. Wang et al. [79] estimated the direct losses of storm surges using GIS and open data. Yin et al. [78] established a grey correlation model of storm surge disaster losses in coastal areas of China. Jin et al. [4] used SVM to predict storm surge disaster losses from small-sample data. Feng et al. [81] forecasted the direct economic losses and the number of people affected by storm surges separately, based on SVM and BPNN. Zhang et al. [82] evaluated the accuracy of five models, namely BPNN, one-dimensional convolutional neural network, decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost), in constructing mudslide prediction models. Lou et al. [83] constructed a loss assessment model for tropical cyclones based on SVM. Cao et al. [84] used an improved grey model to assess the direct economic losses caused by marine disasters in coastal cities of China. He et al. [85] used dynamic recurrent neural networks to predict flood disaster losses. Ye et al. [86] discussed the feasibility of artificial neural networks, nonlinear regression, and El Niño for predicting storm surge disasters. Yang et al. [87] used the extended Kalman filter to forecast the economic losses and casualties caused by storm surges. Wu et al. [11] applied LSSVM to predict the economic losses of waterlogging in a subway station project. As research deepens, more and more models become available, but choosing the appropriate model remains a challenge [88].
Besides single models, many researchers have proposed combined models to improve prediction accuracy [89, 90]. For example, Chen et al. [91] combined a GA-Elman neural network (ENN), support vector regression (SVR), and GRNN into a comprehensive evaluation model for predicting tropical cyclone losses. Feng and Liu [81] found that a joint model of BP and SVM can better predict the economic losses caused by storm surges. Zhao et al. [15] combined the results of ENN and GRNN to achieve interval prediction of economic losses caused by storm surge disasters and concluded that the combined model outperforms the single models. Meng et al. [92] used four machine learning models to predict the direct economic losses caused by tropical cyclones in Guangdong Province. Yang et al. [93] predicted the population affected by tropical cyclones based on a mixed model of the generalized additive model (GAM) and XGBoost.
Time series prediction has been a research hotspot in the past decade. Traditional statistical methods and machine learning methods have been widely studied and applied.
Machine learning methods have strong learning abilities, which can automatically learn hidden feature information in data and capture nonlinear relationships. However, when the data is complex and unstable, this learning ability requires a large amount of data for training and optimization to achieve excellent performance with high accuracy and strong robustness [37].
However, owing to the complexity, irregularity, noise, and instability of meteorological disasters, as well as difficulties in data collection, loss time sequences are characterized by small sample size, strong randomness, high volatility, weak regularity, non-stationarity, and nonlinearity [6, 15, 37]. As a result, predictions made directly from the original dataset are often unsatisfactory.
Small Sample Problem: Agricultural production data often suffer from short recording periods and limited effective data, which not only reduces the model's generalization ability but also directly limits the applicability of prediction results in real agricultural decision-making. Using data augmentation techniques (e.g., interpolation, information diffusion methods) to generate additional data samples therefore helps improve the predictive capability of agricultural production models, making them more suitable for practical agricultural scenarios.
Non-stationary Problem: Agricultural production data is often influenced by crop growth periods, seasonal changes, and disaster occurrence frequencies, resulting in noticeable non-stationarity. Applying Empirical Mode Decomposition (EMD) to decompose agricultural disaster loss sequences allows for the extraction of different frequency loss patterns, helping to build more stable and reliable agricultural loss prediction models.
In summary, the small sample size and the non-stationarity of the data are the two main issues in loss prediction.
The small sample size problem often leads to large errors and poor performance in loss estimation models. It inevitably limits machine-learning-based loss rate estimation models, which become prone to overfitting and generalize poorly [7, 37]. In this case, choosing a suitable learning algorithm, or finding a method to enrich the information in the original data, is urgently needed [2].
Some researchers have proposed data augmentation techniques to address this issue [37], such as virtual sample generation (VSG) [94, 95]. The strategy of VSG is to extract prior information from a given small sample and then generate new virtual samples that fill the information gaps between the original samples, yielding more samples that carry the features of the original data [37]. These virtual samples resemble the originals: their data distribution and statistical characteristics are consistent with those of the original sample [37].
Many VSG techniques have been applied to small samples. For example, Li et al. [96] used an information diffusion model (IDM) to generate virtual loss samples. Huang [97] also proposed an information diffusion model, a fuzzy statistical technique that converts single-point samples into set-valued samples and thereby uses the fuzzy information in small samples to fill information gaps. Additionally, Sun et al. [2] proposed a data augmentation technique called the k-nearest neighbor Gaussian noise method (KNN-GN), which generates virtual samples by adding Gaussian noise to the original samples. Rogoza [98] proposed local extrapolation to simulate samples with fewer than ten observations. Yuan et al. [1] and Sun et al. [2] generated virtual samples through interpolation.
Some scholars have also conducted research on small sample prediction. For example, Dan et al. [99] established a small sample preprocessing model to enhance data stationarity and then used a simulated annealing algorithm and SVM for prediction. Rajesh et al. [100] and Deng et al. [101] applied grey theory to traditional small sample prediction. Sun et al. [2] combined KNN-GN with XGBoost to construct a prediction model that can quickly and accurately estimate direct economic losses shortly after a storm surge occurs.
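As a minimal sketch of the general idea of noise-based virtual sample generation, the snippet below jitters randomly chosen original observations with Gaussian noise. It is a plain jitter written for illustration, not the KNN-GN procedure of [2]; the function name, the `noise_scale` parameter, and the loss values are all hypothetical.

```python
import random
import statistics


def gaussian_noise_samples(samples, n_virtual, noise_scale=0.1, seed=0):
    """Generate virtual samples by jittering randomly chosen originals.

    noise_scale is a hypothetical tuning knob: the noise standard deviation
    is noise_scale times the sample standard deviation of the originals.
    """
    if len(samples) < 2:
        raise ValueError("need at least two original samples")
    rng = random.Random(seed)              # seeded for reproducibility
    sigma = noise_scale * statistics.stdev(samples)
    virtual = []
    for _ in range(n_virtual):
        base = rng.choice(samples)         # pick one original observation
        value = base + rng.gauss(0.0, sigma)
        virtual.append(max(0.0, value))    # a loss rate cannot be negative
    return virtual


# Hypothetical loss rates (fractions), for illustration only.
original = [0.12, 0.30, 0.07, 0.22, 0.15]
augmented = original + gaussian_noise_samples(original, n_virtual=10)
```

Scaling the noise by the sample standard deviation keeps the virtual points close to the observed spread; whether that preserves the statistical characteristics well enough has to be checked on the data at hand.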
For the non-stationarity problem, some studies have shown that Empirical Mode Decomposition (EMD) can decompose the original sequence into a set of Intrinsic Mode Function (IMF) components with different frequencies, plus a residual. The resulting subsequences are relatively stationary and therefore easier to simulate; they make fuller use of the information in the data, better reflect the physical characteristics of the original data, and allow linear features to be extracted from nonlinear time series. This method also shows strong universality in processing non-stationary data [97].
This technique has also been applied to loss prediction. For instance, Chai et al. [6] decomposed the time series of ship collision conflicts into subsequences of different frequencies, each of which displayed a more regular frequency range than the original conflict sequence. A separate LSSVM was then established for each IMF component, and the final prediction of the number of ship collision conflicts was obtained by summing the predictions of all subsequences. This procedure greatly improved prediction accuracy [6].
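The decompose, predict, and sum pattern described above can be sketched in a few lines. To keep the example self-contained, two stand-ins are used and should not be mistaken for the cited method: a moving-average split into a smooth part and a detail part replaces EMD, and a least-squares AR(1) predictor replaces the per-component LSSVM. All names and data are hypothetical.

```python
def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smooth = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smooth.append(sum(series[lo:hi]) / (hi - lo))
    return smooth


def decompose(series, window=3):
    """Additive split (stand-in for EMD): series = detail + smooth."""
    smooth = moving_average(series, window)
    detail = [x - s for x, s in zip(series, smooth)]
    return [detail, smooth]


def forecast_next(component):
    """One-step forecast from a least-squares AR(1) fit x_t = a*x_{t-1} + c."""
    prev, curr = component[:-1], component[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:                       # constant component: predict its mean
        return mean_curr
    a = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr)) / sxx
    c = mean_curr - a * mean_prev
    return a * component[-1] + c


def decompose_and_forecast(series, window=3):
    """Forecast each component separately, then sum the forecasts."""
    return sum(forecast_next(comp) for comp in decompose(series, window))


# Hypothetical annual loss values, for illustration only.
losses = [4.1, 5.3, 3.8, 6.0, 5.1, 6.7, 5.9, 7.2]
next_loss = decompose_and_forecast(losses)
```

Because the components add back exactly to the original series, summing the per-component forecasts yields a forecast on the original scale; with a real EMD the same summation runs over all IMFs and the residual.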
Besides the above single-model solutions, ensemble learning strategies that use multiple machine learners have also begun to emerge [9, 20, 37, 102, 103, 104, 105]. Ensemble learning combines multiple learners to reduce bias, achieve superior generalization ability, and improve the accuracy and reliability of prediction, and it has already achieved excellent prediction performance [2, 106, 107]. Compared with a single machine learning model, it can combine the strengths of the individual methods and extract the features of the data more effectively. In addition to single and combined models, ensemble learning and optimization algorithms are gradually gaining attention in agricultural disaster prediction. For example, integrating optimization techniques such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) with machine learning models for crop yield prediction, livestock loss estimation, and food supply chain forecasting further improves model accuracy. Additionally, the prediction results can directly support agricultural insurance pricing, disaster risk zoning, and food emergency reserve management.
Ensemble learning is especially useful for small-sample data, where it can effectively fit nonlinear functions and comprehensively extract the high-dimensional and temporal features of the data. In loss prediction, whether of affected populations or of economic losses, the combined prediction has a smaller error than either single prediction. For instance, Du et al. [7] constructed a new combination model, the Elman neural network-Generalized regression neural network-Definite integral model (ENN-GRNN-DI), for interval prediction of disaster losses [2, 7, 15].
In addition, some studies have combined optimization algorithms with machine learning models, using the optimization methods to tune the hyperparameters of the learning algorithms and thereby improve prediction accuracy. For example, Wang et al. [108] and Yuan et al. [1] used the Beetle Antennae Search (BAS) algorithm and the Levenberg-Marquardt (LM) algorithm, respectively, to optimize the BPNN, applied the optimized BPNN model to predict the economic losses of storm surges, and found that prediction accuracy improved significantly. Lin et al. [42] used a Vector Space Model (VSM) to correct the results of BPNN. Chen et al. [91] combined a genetic algorithm (GA) with Elman neural network, SVR, and GRNN models to predict tropical cyclone losses. Liu et al. [109] proposed a hybrid model that combines wavelet transform (WT), genetic algorithm, and support vector machine. Wu et al. [11] established a new intelligent prediction model for the economic losses of subway stations caused by rainstorms and floods using the sparrow search algorithm (SSA), mean impact value (MIV), and LSSVM. The results showed that the SSA algorithm has good stability, strong global search ability, and few parameters. This fusion method not only improves prediction accuracy but also offers strong interpretability.
Furthermore, some researchers have adopted a strategy of combining machine learning algorithms with strong predictors. For example, Zhao [31] combined an adaptive boosting algorithm with BPNN (Adaboost-BPNN) to predict direct economic losses from marine disasters [2, 11]. Other researchers have combined machine learning algorithms with attention mechanisms to improve prediction accuracy.
Therefore, based on the above analysis of existing disaster loss prediction methods, the following two schemes can be proposed to improve the accuracy of disaster loss prediction:
1) The EMD-LSSVM method (i.e., combining empirical mode decomposition with least squares support vector machine models): The original disaster loss time series is decomposed into a set of IMFs and a residual. A corresponding LSSVM model is then established for each component. Finally, the predicted loss is obtained by summing the predictions of all subsequences [6].
2) The Interpolation-LSSVM method (i.e., combining interpolation with least squares support vector machine models): This method aims to estimate disaster losses accurately under small-sample conditions. First, interpolation is used to generate a new, enhanced dataset. Then, the LSSVM algorithm is applied to obtain the optimal loss estimate. Finally, the robustness of the model is verified [7].
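The second scheme can be illustrated with a small, dependency-free sketch: midpoint interpolation enlarges the sample, and an LSSVM with an RBF kernel is then trained by solving its linear system. This is only a sketch under stated assumptions, not the implementation used in [7]; the function names, the hyperparameters `gamma` and `sigma`, and the data are hypothetical, and no tuning or robustness check is included.

```python
import math


def interpolate_midpoints(xs, ys):
    """Insert the midpoint between each pair of consecutive samples."""
    new_x, new_y = [xs[0]], [ys[0]]
    for i in range(1, len(xs)):
        new_x += [(xs[i - 1] + xs[i]) / 2, xs[i]]
        new_y += [(ys[i - 1] + ys[i]) / 2, ys[i]]
    return new_x, new_y


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_lssvm(xs, ys, gamma=100.0, sigma=1.0):
    """Train an LSSVM regressor; returns a prediction function."""
    n = len(xs)
    # Block system [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    solution = solve(A, [0.0] + list(ys))
    b, alpha = solution[0], solution[1:]

    def predict(x):
        return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))

    return predict


# Hypothetical (time index, loss rate) samples, for illustration only.
years = [0.0, 1.0, 2.0, 3.0, 4.0]
losses = [0.10, 0.18, 0.15, 0.27, 0.24]
aug_years, aug_losses = interpolate_midpoints(years, losses)
model = fit_lssvm(aug_years, aug_losses)
next_loss = model(5.0)
```

In practice `gamma` and `sigma` would be selected by cross-validation or by one of the optimization algorithms discussed earlier, and the interpolated points add no new information beyond the assumption of smooth variation between observations.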
Based on the analysis above, future agricultural disaster loss prediction methods should focus on the EMD-LSSVM method (suitable for non-stationary agricultural data) and interpolation-enhanced LSSVM method (suitable for small sample agricultural problems). Both methods help improve the accuracy and stability of disaster predictions, which in turn provide more precise disaster risk forecasting services for agricultural production, food processing, and agricultural supply chains. This will effectively reduce the negative impact of meteorological disasters on agriculture and food production, ensuring food security.