Statistics of water consumption of Kelowna City in different temporal resolutions (*m^{3}).

## Abstract

The forecasting of future value of water consumption in an urban area is highly complex and nonlinear. It often exhibits a high degree of spatial and temporal variability. It is a crucial factor for long-term sustainable management and improvement of the operation of urban water allocation system. This chapter will study the application of two pre-processing phase space reconstruction (PSR) and wavelet decomposition transform (WDT) methods to investigate the behavior of time series to forecast short-term water demand value of Kelowna City (BC, Canada). The research proposes two pre-process technique to improve the accuracy of the models. Artificial neural networks (ANNs), gene expression programming (GEP) and multilinear regression (MLR) methods are the tools that considered for forecasting the demand values. Evaluation of the tools is based on two steps with and without applying the pre-processing methods. Moreover, autocorrelation function (ACF) is used to calculate the lag time. Correlation dimension is used to study the chaotic behavior of the dataset. The models’ relative performance is compared using three different fitness indexes; coefficient of determination (CD), root mean square error (RMSE) and mean absolute error (MAE). The results showed how pre-processing combination of WDT and PSR improved the performance of the models in forecasting short-term demand values.

### Keywords

- artificial neural network
- correlation dimension
- chaos
- gene expression programming
- Kelowna
- water demand
- wavelet

## 1. Introduction

Climate change significantly affects the water availability all around the world. This effect plays a crucial role in arid and semiarid regions. On the other hand, urban development, population growth, industrial development and economic expansion also increases water scarcity concerns critically worldwide. Therefore, the governments have to be prepared beforehand for any consequences related to water problems, especially drinking water. The efficient operation and a management plan of urban water supply requires information about the value of consumption in the future. For using different standards to simulate hydraulic constitutions in pipeline systems (to improve the reliability of the system), it is necessary to have an accurate simulation of consumption value in a specific period. In other words, “The purpose of water demand forecast is to demonstrate futuristic information available for public water suppliers as they conduct their business” [1, 2]. Short-term (e.g., less than a week), mid-term (e.g., weekly to monthly) and long-term (e.g., greater than monthly) period forecast demand values are critical for daily operations and future management of the system. Long-term urban demand forecasting (up to 25 years), mid-term (up to 2 years) and short-term values (up to 2 days) depends upon vital factors such as water supply planning, pipeline maintenance, and water distribution system optimization (e.g. optimized pumping, pipeline maintenance, minimize energy cost and water supply cost, improving system reliability and water quality), respectively [3, 4, 5]. While studies have advanced the understanding of nonlinear characteristics and high complexity of water consumption factors, further research is still required. The present accepted knowledge for these factors is still limited and depends upon (1) accurate estimation and forecast water consumption and (2) determination of type and degree of nonlinearity among the effective variables [6]. Over the past decades, two groups of deterministic and probabilistic methods have been proposed to forecast urban water demand. The deterministic approach is solely based on the input variables and their initial conditions, whereas a probabilistic model relies on modeling uncertainties and randomness of the input variables.

Given the significant challenges and complexity of probabilistic methods and the fact that pre-processing methods can provide a useful approximation to their probabilistic counterparts, this research focused on the application of pre-processing to forecast short-term consumption.

## 2. Literature review

Midterm water demand forecast helps the water management authorities to develop an integrated plan which balances supply and demand in a given period. Water stress of an area can be reduced by accurate estimation of drinking water supply demand [3, 7, 8, 9]. Moreover, management can provide water sustainability based on their experience as well as the accurate and reliable value of future demand [10].

Compared to other hydrological forecast studies (e.g., river discharge, sedimentation, rainfall, etc.) water consumption is not as influenced by the input factors as other studies do. The most significant input variables are temperature, precipitation, and past demand values that were popular in most of the studies [11, 12, 13]. Two different types of variables affecting water demand: climatic (e.g., temperature, relative humidity, rainfall, etc.) and socioeconomic (e.g., population and income) [14]. Climatic variables can affect short-term and mid-term values while socioeconomic variables are useful for long-term forecasting [11, 15, 16]. However, a few studies investigated the impact of climatic variables on demand forecasting [17, 18, 19]. Literature enlists various deterministic and probabilistic techniques for forecasting urban drinking water demand. In general, conventional methods were prevalent for a better understanding of determinants of water demand [20, 21, 22], which consider linear relationships between effective variables and water demand, which is nonlinear. The mentioned studies are broadly categorized into two-fold: physical based and black box models. Without analyzing the physical processes, the second one applies artificial intelligence techniques (artificial neural networks, genetic programming, etc.), fuzzy-based (fuzzy logic, neuro-fuzzy, etc.), soft computing (support vector machine, etc.), and nonlinear deterministic (nonlinear local approximation, etc.) to identify the relationship between the input and output variables. Conventional regression models [3], autoregressive integrated moving average (ARIMA) [23], autoregressive integrated moving average with explanatory variable (ARIMAX) [24, 25], artificial neural networks (ANN) [9, 26, 27, 28, 29], a combination of conventional and ANN [11, 12, 30], feedforward neural networks [12, 31], general regression neural networks [32, 33], support vector machines [14, 9, 34, 35, 36, 37], gene expression programming [14, 38], fuzzy regression [39], neuro-fuzzy systems [40, 41], Fourier analysis [4], hybrid models (e.g. combined wavelet-ANN and wavelet-GEP) [13, 38], fuzzy cognitive map learning method [42, 43]. This research applies probabilistic ANN, GEP approach and a conventional method (MLR) to determine the performance of the methods with/without phase space reconstruction and wavelet decomposition in the case.

The chaotic nature has been addressed for various systems [44, 45, 46, 47, 48, 49]. Any chaotic system is deterministic in which minor changes in the initial conditions could lead to entire different behaviors in the next periods [44]. Chaos theory was successfully used to understand the nonlinear dynamic of the system. The models that are based on chaos theory and nonlinear dynamics are a better representative of the behavior of dynamic of observed data [50]. In general, chaos theory improves the understanding of nonlinear dynamics [51]. Ng et al. applied chaos theory on noisy time series of discharge in Saugeen River (Canada) [52]. They argued that noisy time series not only increase the complications of the data but also gave high embedding dimension. Sivakumar et al. utilized the concept of nonlinear dynamic behavior to classify rivers from phase-space data reconstruction perspective [53].

Genetic programming (GP) and gene expression programming (GEP) are among the heuristic algorithms based on Darwin’s evolution theory [53]. GP was employed to complete missing data in wave records and forecasting [55, 56, 57]. Aytek and Kishi used GP model to suspended sediment in the Tongue River (United States) and found GP more accurate than sediment rating curves and multiple linear regressions (MLR) [58]. Ghorbani et al. investigated the chaos theory, artificial neural network (ANN) and GEP in estimating suspended sediment in the Mississippi River (United States) [59]. GEP is superior to GP as it is more convenient to interpret the results by a GEP tree that comes along with output results. GEP also performs better at extracting a mathematical equation which shows the relation between input and output variables [59, 60, 61]. Nasseri et al. developed a hybrid model combining the extended Kalman filter with genetic programming for monthly water demand forecasting in Tehran [62]. Shabani et al. proposed a new rationale and a novel technique in forecasting water demand using lag time to feed the determinants of water demand by the development of GEP and SVM models [14]. Yousefi et al. implemented sophisticated mathematical models to forecast water demand of City of Kelowna in monthly temporal scale. Their study assessed the performance of GEP using wavelet decomposition [38].

Among the variety of examined methods Artificial Neural Networks (ANNs), have been applied to the various period in the wide variety of hydrological issues. The main reason of ANNs frequent usage is its ability to overcome the relationship in determining the complexity of time series, even with the shortage of amount of data available to train the models. Therefore, most of the studies applicable in area of water resources demand applies ANNs to forecast short, mid and long-term demand values [13, 30, 31].

Regarding the literature review reported by Nourani et al. concluded about the dominant application of wavelet-based models [63]. Moreover, Labat notified about the improving ability of wavelet in models’ performance [64]. Therefore, the application of wavelet brought researchers attention into the area such as denoising [65]; stream flow and water resources [66]; evaporation and climatic models [67]; groundwater level modeling [68]; water demand forecasting [13, 38], where in most of the mentioned studies combination of Wavelet-ANNs performed accurately over conventional models without hybrid wavelet models (e.g. ARIMA, MLR, ANN and etc.).

The objectives of this study are four-fold: (1) to investigate chaotic behavior of case data and finding the proper lag time; (2) to find the accuracy of the forecasting for one-day ahead lead time with various input combination, and (3) to study if phase space reconstruction (PSR) based on optimum embedding dimension would improve the accuracy of the models, and 4) application of wavelet decomposition by five different transform functions combined with all the mentioned models with and without PSR.

## 3. Methodology

### 3.1. Case study and data information

#### 3.1.1. Understandings

Unlike natural water resources like rainfall, the lower percentage of drinking water which is change to waste water after use, back to the cycle. Water pressure in a pipeline, water quality, supply peak consumption time, pipeline maintenance, maintenance cost, specialist and educated human resources, pipeline failure management, etc. are the variables that all of them should be under control at the same time. Also, to develop an integrated long-term plan, availability of resources is crucial. Therefore, knowing about the value of consumption in a specific period is the first step for any management plan beyond urban drinking water supply and allocation. This chapter investigates the first step of every long-term plan development in urban drinking water as discussed below. Water utility management needs drinking water long-term forecasted values in several terms. (1) water distribution network design; (2) supply and consumption management; (3) efficient application of distribution network; (4) pipeline pressure management; (5) network development; (6) optimizing the cost of water supply and network maintenance.

#### 3.1.2. Study area

The present research selected Water consumption of the City of Kelowna (BC, Canada) as the test case. The city of Kelowna water utility provides services for approximately 65,000 residents. Poplar Point, Eldorado, Cedar Creek and Swick Road pump stations cover services for 99% of the population of the area [69]. However, few areas in the boundary are named as “Future City” where does not contain any population yet, land development plan shows water servicing is considered in the area. Monitoring of water quality, the operation of the pumps, water level in reservoirs, and pipeline pressure are conducted by the use of Supervisory Control and Data Acquisition Software (SCADA).

#### 3.1.3. Review of data records

Hourly water demand for the above-mentioned stations has been made available by the city utility of Kelowna. The data used 6 years (approximately 52,464 hourly consumption) starting from January 1st, 2011 to 30th December 2016. Figure 1 shows the variation of daily and monthly water demand and the consumption pattern. Concerning the 6 years water demand samples of daily scale (2186 points), the first 5 years (1882 points) are used for calibrating the models and the last year (365 points – 2016) is considered as the test period. Table 1 shows the characteristics of the dataset in the test case.

Property | Number of Data | Max. Value^{*} | Min. Value^{*} | Average^{*} | Standard deviation^{*} | Coefficient of variation | Skew | Kurtosis |
---|---|---|---|---|---|---|---|---|

Data | 2186 | 114597.2 | 14,124 | 43046.4 | 20074.5 | 0.46 | 0.73 | −0.38 |

### 3.2. Phase space reconstruction (PSR)

Given a set of physical variables and their interactions, the dynamics of a system (e.g., water consumption) can be defined by a single point moving on a trajectory, where each of its points represents a state of the system. The lag-embedding method reconstructs phase-space from a univariate or multivariate time series generated by a deterministic dynamic system [70]. The underlying dynamics can be studied by building an m-dimensional space * X*defined by [49]:

_{t}

where * X*is a vector of the observed data of {

_{t}

*}*x

_{t}

*= 1,…,*t

*,*N

*is the total number of observed data,*N

*is the lag time, and*τ

*is embedding dimension. The embedding dimension (*m

*) is typically in the range of 1–10 [53, 54]. The lag-embedding method is sensitive to both embedding parameters of*m

*and*τ

*. Average mutual information (AMI) and autocorrelation function (ACF) are the two well-known methods for estimating the lag time [71, 72]. More details about ACF and different functions are available at [73].*m

### 3.3. Correlation dimension (Chaos investigation)

Correlation dimension is a nonlinear measure of the correlation between pairs lying on the attractor. The dimension of a system reveals the number of effective variables in the system. Kermani (2016) classified different dimensions in a system as topological, Hausdorf, box counting, point-wise, and correlation dimension. These dimensions are nearly equal in chaotic systems [52, 74]. This research employed correlation dimension, as it is a lower bound measure of the fractal dimension [59, 74]. For time series whose underlying dynamics is chaotic, the correlation dimension gets a finite fractional value, whereas it is infinite for stochastic systems. The later does not saturate to a specific amount of correlation exponent [75]. For an m-dimensional phase-space, the correlation function, * Cm*(

*), is defined as the fraction of states closer than*r

*[76].*r

where * H*is the Heaviside step function,

*is the*X

_{i}

*state vector,*i

_{th}

*is the number of points on the reconstructed attractor,*N

_{p}

*is the radius of a sphere with the content of*r

*or*X

_{i}

*. The Theiler window (*X

_{j}

*) is the correction needed to avoid spurious results due to temporal correlations instead of dynamical ones.*w

*(*C

_{m}

*) is proportional to r for stochastic time series, whereas for chaotic time series it scales with r as:*r

where c_{e} is correlation exponent defined by:

The parameters m and * C*can be determined as the slopes of the lines when plotted

_{e}

*(*C

_{m}

*) against*r

*in logarithmic scale. In a deterministic system,*r

*increases by increasing*C

_{e}

*until eventually remaining unchanged. The correlation dimension of time series is defined as the specific value of*m

*after which*m

*remains unchanged [54, 59].*C

_{e}

### 3.4. Artificial neural networks

When ANN is based roughly on the neural layout of the human brain and is capable of non-linear modeling processes that can classify the patterns and recognize the capabilities [77]. Regarding the ability of multilayer perceptron (MLP)-ANN outperformance as a conventional ANN approaches [77, 78, 79], this research employed three-layer MLP-ANN (input, hidden and output layers) and the different number of neurons. In hidden layer, neurons are calculated by the summation of demand values (* d*) with the given weight for each value (

_{i}

*) to determine the output signal as (*w

_{ij}

*).*u

_{j}

where

### 3.5. Gene expression programming

Evolutionary computation has received significant attention among researchers for studying complex engineering systems. Genetic algorithm (GA), genetic programming (GP), and gene expression programming (GEP) were inspired by Darwin’s theory of evolution [60, 61]. GEP defines an algorithm and equation which shows the relation between input and output variables. GA and GP rely on a string of numbers with defined length called “chromosomes”, while GEP employs a set of nonlinear entities with different shapes and sizes, “expression/parse trees”. The expression tree accommodates the ease of a GA solution as well as the capability of accepting the nonlinear/complex behavior in a typical GP solution. The chromosome can have one or more genes of equal length. A gene represents a set of symbols containing two parts; a head which has functions and terminals and a tail which only has terminals. Initiating with the random generation of chromosomes, GEP is followed by different applications of genetic operators like replication, recombination, mutation, etc. The terminating condition for developing GEP depends upon the selection of maximum fitness. This research applied 30 chromosomes, eight head sizes, three genes, and arithmetic operators of {+,-, ×, * x, x*, √

^{2}

*}.*x

### 3.6. Multilinear regression

When MLR corresponds to a linear combination of the components of multiple signals * x*(e.g. recorded discharge, lag time discharge, or combination of both) to a single output signal y (Demand) by:

where * x*is the defined input (demand) and

_{i}

*is regression coefficient determined by the least square method with the residual*a

_{i}

*defined by:*r

### 3.7. Wavelet decomposition

Commonly wavelet transforms are used for decomposition, de-noising, and compression of the time series [83]. Time series have a combination of low and high frequency which represent improved features (e.g., cyclical trends) and chaotic element, respectively [84]. Considering these frequencies, separation of low and high frequency is helpful in studying the original pattern and behavior of the time series. One of the mentioned methods is discrete wavelet transform (DWT) to separate per level of frequencies in time series. One of the common discretion ways proposed by Mallat that this study used the mentioned DTW method to separate the frequencies of the applied data [85]. The level of the decomposition shows the subseries. For example, for level 1 decomposition, the number of subseries is two. Therefore, the number of levels indicates the number of subseries plus one. Level 3 is considered as suitable decomposition level in the present study regarding the number of data (2186 day) and following Nourani et al. (2009) that offered [83]:

where * L*is the number, the level of decomposition and

_{n}

*is the number of used data. Thus, the proper level in this study is considered as 3. However, increasing the level number does not necessarily improve the accuracy of the models. Therefore, the original data are discretized in a high-frequency subset (*N

a

_{3}) and three high frequencies as (

d

_{1}), (

d

_{2}) and (

d

_{3}), where the summation of all is equal with the value of original data. This research employed Haar, the second and fourth order Daubechies (db2, db4), and the second and fourth order Symlets (Sym2, sym4) wavelets to decompose daily water demand time series into sub-series. The software MATLAB 2015 (https://www.mathworks.com) was employed for the analysis.

### 3.8. Evaluation of models’ performance

This research measured the models’ accuracy by coefficient of determination (CD), root mean squared error (RMSE) and mean absolute error (MAE) defined as:

where * N*is the number of values,

_{t}

*and*O

*are the observed and forecasted values of demand, respectively.*F

## 4. Preliminary results

### 4.1. Phase space reconstruction and investigation of chaotic behavior

Existence of chaotic behavior in the time series is shown in Figure 2. However, the results are not entirely based on the proof of having chaotic behavior, as the figure only shows possible low-dimensional chaotic behavior. Theoretically, several methods are well known for investigating the chaotic behavior such as lag time calculation method (e.g., average mutual information (AMI), Autocorrelation function (ACF)), correlation dimension, largest Lyapunov exponent, etc.). This study investigates the chaotic behavior by applying ACF and correlation dimension. Having chaotic behavior allows using ACF to calculate the lag time of the time series. The value of lag time is considered as the first approach of ACF to 0 (Figure 2).

The results show 83-days as the lag time of the time series. Therefore, 83-day is used to design combination of inputs as phase space for the time series. In this study, the difference between 1st day and 83rd day is used as delay period for phase space reconstruction varying embedding dimensions from 1 to 10 (_{1}: * D*;

_{t}

m

_{2}:

*,*D

_{t}

*;*D

_{t-τ}

m

_{10}:

*,…,*D

_{t}

D

_{t-10τ}). It should be noticed that several methods were introduced in literature to calculate the value of optimum embedding dimension which may be more than 10 for the used time series in this study. This study aims at showing the performance of embedding dimension and reconstructed phase space, where m is only considered 1 to 10. Figure 2 shows the value of ACF for the demand series and reconstructed phase space (

*= 83). Figure 3a shows the relation between*τ

*(*C

*) and*r

*and (3b) correlation exponent by varying*r

*. Figure 3b shows that the value of correlation exponent increases by*m

*and as m = 17, the correlation exponent reaches a specific value (*m

*= 3.41). This constant value of*Ce

*at*Ce

*= 17 indicates the existence of the deterministic behavior of the time series.*m

### 4.2. Multilinear regression

Excel 2010 was used to implement MLR model. The train period was used to derive regression coefficient from getting the value of variables in the linear equation. The availability of trained equation, helped in testifying the last year data as the test period. In the first fold, the 1-day delay was considered for * m*1 to 10, and second fold applied 83-day delay. Table 2 shows the results of both MLR and PSR-MLR in the test period.

MLR, = 1 | PSR-MLR, = 83 | ||||||
---|---|---|---|---|---|---|---|

CD | RMSE(^{3}/) | MAE | CD | RMSE(^{3}/) | MAE | ||

0.9565 | 3642.89 | 50.42 | |||||

0.9565 | 3804.14 | 52.14 | 0.9565 | 3804.14 | 52.14 | ||

0.9468 | 14106.70 | 112.82 | 0.9570 | 5319.51 | 66.90 | ||

0.9473 | 13174.97 | 108.82 | |||||

0.9505 | 3724.99 | 49.81 | 0.9568 | 4167.55 | 56.45 | ||

0.9503 | 3746.33 | 50.09 | 0.9569 | 5907.90 | 71.65 | ||

0.9503 | 3747.49 | 50.10 | 0.9565 | 4370.03 | 58.86 | ||

0.9493 | 6058.34 | 70.88 | 0.9566 | 4581.10 | 60.89 | ||

0.9505 | 3736.33 | 50.02 | 0.9566 | 5023.16 | 64.71 | ||

0.9506 | 3738.35 | 50.07 | 0.9566 | 4327.34 | 58.48 |

Statistical indices for the fitness values showed * m* = 1 for 1-day delay and

*= 4 for the reconstructed phase space with the value of (CD = 0.9565, RMSE = 3642.89 and MAE = 50.42) and (CD = 0.9572, RMSE = 3636.34 and MAE = 51.04), respectively. However, the difference between the two models is not considerable, in the large value of demand in long-term this difference can come into account. Figure 4 shows the comparison of observed and demand values. Moreover, the suggested equation for the best result by MLR is given by:*m

### 4.3. Performance of artificial neural network

ANN is another approach to model the demand values which represented in Section 3.4. ANN’s structures have different hidden layer neurons (HLN) from 1 to 20 with 200 epochs for each model. Table 3 represents the result of ANN for both 1-day delay and PSR values. The results in the table for each * m*, are extracted from the result of various HLN and epochs. Figure 5 shows the example for selecting

*= 3 among (20 × 200 = 4000). This calculation has been done for all*m

*from 1 to 10 for both 1-day delay and PSR. (4000 × 10 × 2 = 80,000) number of calculations where the best 10 values have been selected (Table 3).*m

ANN, = 1 | PSR-ANN, = 83 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

Structure | Epoch | CD | RMSE^{*} | MAE | Structure | Epoch | CD | RMSE^{*} | MAE | ||

1-5-1 | 110 | 0.9505 | 3611.56 | 48.34 | 1-6-1 | 150 | 0.9573 | 3369.55 | 47.63 | ||

1-3-1 | 140 | 0.9509 | 3602.25 | 48.25 | 1-4-1 | 20 | 0.9568 | 3369.50 | 47.83 | ||

1-16-1 | 20 | 0.9514 | 3554.13 | 48.06 | |||||||

1-16-1 | 170 | 0.9516 | 3550.99 | 47.90 | 1-2-1 | 70 | 0.9575 | 3333.67 | 47.25 | ||

1-3-1 | 160 | 0.9513 | 3561.33 | 47.98 | 1-3-1 | 110 | 0.9578 | 3340.36 | 47.15 | ||

1-3-1 | 40 | 0.9572 | 3340.16 | 47.46 | |||||||

1-3-1 | 100 | 0.9511 | 3563.14 | 48.08 | 1-3-1 | 100 | 0.9570 | 3348.68 | 47.80 | ||

1-8-1 | 20 | 0.9510 | 3570.70 | 47.84 | 1-2-1 | 150 | 0.9573 | 3333.88 | 47.24 | ||

1-4-1 | 200 | 0.9511 | 3566.00 | 47.69 | 1-2-1 | 140 | 0.9571 | 3338.53 | 47.89 | ||

1-3-1 | 100 | 0.9515 | 3546.72 | 47.94 | 1-4-1 | 10 | 0.9539 | 3518.86 | 49.28 |

Selection of ANN structures are represented in Table 3 for the test period. Statistical indices for the fitness values showed * m* = 6 for 1-day delay and

*= 3 for PSR, with the values of (CD = 0.9520, RMSE = 3535.66 and MAE = 47.58) and (CD = 0.9578, RMSE = 3330.53 and MAE = 47.13), respectively. Regarding the results, PSR-ANN mostly dominates in all embedding dimensions for the fitness accuracy indices. Figure 6 shows the comparison of observed and demand values in the test period for both ANN and PSR-ANN in*m

*= 6 and 3, respectively. The results showed (*m

*,*D

_{t}

D

_{t + τ},

D

_{t + 2τ}) as the best input combination for the models.

### 4.4. Performance of gene expression programming

GEP preliminarily investigates the relationship between input and output as discussed in Section 3.5. Unlike the other models in this study, 1-day ahead is output, and various combinations of input in terms of * m*are considered as input variables. The arithmetic operations used in this study are {+, −, ×,

*,*x

x

_{2}, √

*}, and GEP applies them to fit the best accuracy between input and output variables. Further details of GEP initial term values are in following of [14, 38, 59] to extract the GEP model for both 1-day delay and PSR. The results are shown in the Table 4 for the test period.*x

GEP, = 1 | PSR-GEP, = 83 | ||||||
---|---|---|---|---|---|---|---|

CD | RMSE(^{3}/) | MAE | CD | RMSE(^{3}/) | MAE | ||

0.9494 | 3621.87 | 48.59 | 0.9565 | 3363.46 | 48.03 | ||

0.9565 | 3357.00 | 47.82 | |||||

0.9494 | 3633.87 | 48.42 | |||||

0.9494 | 3637.74 | 48.42 | 0.9566 | 3359.53 | 47.95 | ||

0.9494 | 3639.05 | 48.43 | 0.9562 | 3372.70 | 48.04 | ||

0.9494 | 3619.77 | 48.60 | 0.9566 | 3359.64 | 48.08 | ||

0.9495 | 3630.44 | 48.38 | 0.9564 | 3365.04 | 47.95 | ||

0.9494 | 3634.41 | 48.42 | 0.9567 | 3353.24 | 47.62 | ||

0.9494 | 3628.46 | 48.42 | 0.9562 | 3370.08 | 48.05 | ||

0.9494 | 3631.12 | 48.40 | 0.9565 | 3356.68 | 47.84 |

According to the Table 4, there is not much difference among the different * m*. But the difference in PSR-GEP results can be considered as a proof of sensitivity to the initial values of specific time lags where the variations of the results for different

*are more than 1-day delay. There is not a significant difference in the results in this study comparing to other alternative models, especially PSR-ANN is not an advantage of GEP. However, extracting the mathematical equation through GEP is one of advantage of GEP comparing to other artificial models. As a result of given model, equation for*m

*= 3 (PSR-GEP) can calculate the demand value for 1-day ahead by:*m

Although, variety of other arithmetic operations may have been applied here but focusing on the aim of study, only simple known operations were applied to extract the GEP equation. The results of PSR-GEP and alternative ones prove the advantage of PSR to improve the accuracy of the models. Statistical indices for the fitness values showed * m* = 2 for 1-day delay and

*= 3 for the reconstructed phase space with the value of (CD = 0.9497, RMSE = 3609.82, and MAE = 48.37) and (CD = 0.9569, RMSE = 3343.36, and MAE = 47.50), respectively. Figure 7 shows the comparison of observed and demand values in the test period for both GEP and PSR-GEP in*m

*= 2 and 3, respectively.*m

## 5. Wavelet decomposition and models’ performance

The combination of models with wavelet decomposition is derived by adding the output of each wavelet to the input of the models. Figure 8 shows the example of the decomposed values for water demand time series by db2 transform function. To discrete the demand values, five wavelet transforms were applied (Section 3.7.). As suggested by Nourani et al. [83], 3rd level decomposition is recommended for 2186 point data.

Table 5 indicates the results of wavelet decomposition for the selected models in the previous section. As the table highlights, db4 and db2 are the transforms which resulted in the highest accuracy in W-MLR and W-PSR-MLR, with the value of (CD = 0.9697, RMSE = 2804.44 and MAE = 42.11) and (CD = 0.9745, RMSE = 2699.83 and MAE = 43.61), respectively. After implying the decomposed inputs for MLR and PSR-MLR for result comparison improved the results in both models. Also, sym4 and db2 are the transforms which resulted in the highest accuracy in W-ANN and W-PSR-ANN, with the value of (CD = 0.9915, RMSE = 1486.21 and MAE = 30.06) and (CD = 0.9756, RMSE = 2517.24, and MAE = 41.68), respectively. Also, calculations for W-ANN and W-PSR-ANN are done with HLN 1 to 20 and epochs 1 to 200, and the mentioned results in the table are selective of the highest among them. Unlike the results of MLR, W-ANN forecasted accurately than W-PSR-ANN which is the inversion of the results of ANN and PSR-ANN. However, wavelet decomposition improved the results of W-ANN and W-PSR-ANN comparing to the alternative without decomposition (Table 3). Moreover, db4 and db2 are the transforms which resulted in the highest accuracy in W-GEP and W-PSR- GEP, with the value of (CD = 0.9845, RMSE = 2027.28 and MAE = 36.62) and (CD = 0.9753, RMSE = 2532.21, and MAE = 41.69), respectively. Following the results of ANN method, W-GEP forecasted accurately than W-PSR-GEP. However, wavelet decomposition improved the results of W-GEP and W-PSR-GEP comparing to the alternative without decomposition (Table 4).

Models | Fitness | Transform functions | ||||
---|---|---|---|---|---|---|

haar | db2 | db4 | sym2 | sym4 | ||

0.9612 | 0.9477 | 0.9677 | 0.9694 | |||

^{3} | 3168.62 | 3681.17 | 2893.60 | 2816.06 | ||

44.48 | 49.04 | 43.54 | 42.24 | |||

0.9670 | 0.9719 | 0.9745 | 0.9712 | |||

^{3} | 3008.34 | 2811.58 | 2699.83 | 2845.69 | ||

45.39 | 43.95 | 43.61 | 44.24 | |||

0.9868 | 0.9816 | 0.9861 | 0.9856 | |||

^{3} | 1853.11 | 2189.15 | 2136.25 | 1948.28 | ||

33.78 | 36.91 | 39.50 | 33.86 | |||

0.9685 | 0.9723 | 0.9752 | 0.9715 | |||

^{3} | 2867.87 | 2677.44 | 2547.89 | 2724.56 | ||

43.16 | 42.09 | 42.19 | 42.61 | |||

0.9721 | 0.9766 | 0.9297 | 0.9255 | |||

^{3} | 2698.16 | 2492.46 | 4311.60 | 4429.89 | ||

41.21 | 39.05 | 54.23 | 55.25 | |||

0.9667 | 0.9721 | 0.9748 | 0.9704 | |||

^{3} | 2937.76 | 2689.20 | 2555.82 | 2770.80 | ||

43.66 | 42.13 | 41.90 | 42.51 |

All PSR models resulted in the highest values which used the decomposed inputs by db2 transform. It is noticeable that PSR affects the inherent of the time series which the results of performance of all models are in common about improving the accuracy. Considering this fact, PSR can be introduced as a pre-processing method like wavelet decomposition; however, complexity and accuracy of PSR cannot be compared with the higher result of wavelet decomposition. Figure 9 shows the comparison of all selected models with highest accuracy (W-PSR-MLR, W-ANN, and W-GEP) in forecast of short-term water demand values.

The figure shows that the performance of W-ANN and W-GEP is better than W-PSR-MLR, while W-ANN’s calculated values are more accurate than W-GEP in simulating peak points. This study eventually would suggest that these peak points are indication of critical issues related to water distribution system (pressure management, peak time demand, etc.) taking in account the performance of the models and simulations of highest and lowest values of demands. Therefore, it is recommended to evaluate models’ performance in two separate parts as maximum values and minimum values along with evaluating criteria such as CD, RMSE, and MAE for the test period. The difference is not visible in Figure 9. Therefore focusing on Figure 10, it shows the performance of models by residual values in the test period.

In Figure 10 the residual values show the remarkable difference of performance of models. W-ANN values distributed in the area of (−15%, +15%), unlike other two models. W-GEP dominates over W-PSR-MLR; however, the fitness criteria values for both are very close to each other (Tables 2 and 4).

This chapter presents the performance of two pre-processes methods in improving the accuracy of three models to forecast short-term urban water demand value in Kelowna City, BC, Canada. The first pre-process approach of PSR which is calculated by ACF method has improved the results of all models in this study. However, PSR does not improve the accuracy of models for entire dataset. Based on the behavior of time series, ACF or AMI (two lag time calculation methods) may have improved a non-deterministic dataset, but it seems in a chaotic dataset, PSR improves the performance of models in increasing accuracy with a proper number of embedding dimensions. Wavelet decomposition, the second pre-process method in the present study has also improved the accuracy of the models but, decomposition did not work on PSR based methods except MLR. It can be concluded that PSR and wavelet are in common with their outfits as two applicable pre-process methods. Also, PSR pre-processing is simpler than wavelet. Therefore, it is recommended to use PSR for the models. As per the results of this study it seems PSR works on a chaotic dataset which seems to be considered as disadvantage of PSR. Comparing the mentioned two pre-process methods, wavelet decomposition is significant to use, though, it is time-consuming and complex than PSR. Also, each transform functions have specific application where each of them can be used independently (e.g., seasonal, de-noising, peak points, etc.).

## 6. Conclusion

Over the past decades, hydrologists have paid attention to data-driven modeling techniques. City governments and WDS operators are always looking for an accurate estimation of water demand values not only for future but also focusing on probable failures like peak consumption and pressure values to manage the WDS pipelines. Therefore, the wide variety of modeling techniques such as artificial and evolutionary simulation methods are proposed by researchers. This chapter investigated the performance of three techniques (ANN, GEP, and MLR) in forecasting short-term water demand of Kelowna City (BC, Canada). About 6 years daily dataset was employed for training and testing the models. First 5 years were considered to train the model and the last year as the test period. All three techniques performed considerably accurate, while the focus of this chapter was on improving the accuracy of the models for the same dataset. Firstly, the model was calibrated by different input combination with 1-day lag time. Then, models were calibrated by the lag time of the data set (83-day) which was calculated by ACF method. WDT was combined with the models to capture multi-scale features of the signals by decomposing observed demand values into sub-series. Five WDT functions (haar, Db2, db4, Sym2, and sym4) were employed to decompose the dataset. The results were then compared with the MLR, ANN, and GEP when no pre-processing (PSR, WDT) was applied. The research results were accurate than PSR. WDT have also improved the accuracy of models with PSR and without PSR. However, the impact of wavelet on the models with PSR was not as considerable as without PSR. The lowest error was reported by W-ANN among all alternative models in this chapter. Regarding the improvement of all models combining WDT and PSR, it is recommended to use the method in modeling and forecasting issues, especially about the dataset that the peak points are very critical in the case. The inherent behavior of dataset (deterministic or stochastic) can affect the performance of the pre-processing methods. Therefore, behavior of datasets should be investigated before deciding to combine any pre-process methods.

## Acknowledgments

The authors would like to thank the financial support from the Natural Sciences and Engineering Research Council (NSERC) of Canada. The Okanagan Basin Water Board and the City of Kelowna are thanked for providing water consumption data.