ANEMIA MODELLING USING THE MULTIPLE REGRESSION ANALYSIS

The aim of this article is to forecast the type of anemia in a population from individuals' biomedical variables using a multiple linear regression model. The study uses a dataset of 539 subjects obtained from blood laboratories. A mathematical method based on multiple regression analysis is applied to build a reliable model that investigates whether a relation exists between anemia and the biomedical variables, and to obtain a more realistic model. For comparison, linear deep learning methods have also been considered, and the regression results are found to be slightly better. The model is expected to serve as a useful aid for health providers in diagnosing the disease and planning treatment schedules for their patients, in particular for predicting the type of anemia.


Introduction
A mathematical model is an essential tool for analyzing pathological characteristics, and such models have been used for various purposes in the literature [1-10]. In hospital settings, a single disease condition is typically influenced by several factors, so most outcomes in real-life problems are affected by multiple input variables. As noted in the literature [11-14], anemia of chronic inflammation was initially thought to be associated primarily with infectious and inflammatory diseases. Anemia is a condition in which the blood hemoglobin level falls below the normal limits defined by the World Health Organization (WHO) [11]. This decrease in hemoglobin means that the body's organs do not receive enough oxygen, which produces symptoms such as headache, fatigue, and difficulty concentrating.
As pointed out by Hébert et al. [12], anemia is one of the most common blood disorders worldwide.
This article aims to predict the type of anemia in subjects from a population using biomedical input variables (eight blood variables, sex, and age), with the anemia type as the output. Predicting the type of anemia is important because the incidence of anemia has been increasing across different segments of society.
Medical predictions play a very important role in diagnosis and treatment planning, helping health providers make the best biomedical decisions. Our goal, therefore, is to develop a new mathematical model to study the effect of the blood variables, sex, and age on the type of anemia. Unlike the mathematical models in the literature [15-19], our model has been used successfully to predict several types of anemia from a large group of blood variables, sex, and age.
In the literature, many studies [16, 18, 21-23] have predicted the type of anemia using relatively few input variables, and the methods used in those studies produced relatively less accurate results. Sirachainan et al. [16] analyzed hemoglobin (HB), red blood cell count (RBC), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and red cell distribution width (RCDW). The study in [18] used MCV and MCH. Jiménez [21] used RBC, HB, and hematocrit (HCT). Another study [22] considered MCV, MCH, HCT, and HB, and Piplani et al. [23] used HB, RBC, MCV, and MCH. Despite these pioneering advances, the studies above used a limited number of blood variables. To the best of the authors' knowledge, this is the first time a more general model, representing behaviour closer to nature, has been produced. Using more input variables makes the derived model more realistic for biomedical use, so the present study builds a model with a large number of input variables and is believed to be an important contribution to predicting the types of anemia.
Despite very effective and pioneering studies in the literature, researchers have used models with a limited number of variables. The present study therefore determines the type of anemia from a larger number of observed variables, giving a more realistic model. Since multiple regression analysis is commonly used among modelling techniques for various problems, including anemia [19, 23-32], it is adopted here to model the current biomedical problem.
The remainder of the paper is organized as follows. Section 2 describes the study samples, explains the linear regression analysis procedure, and presents the tests used for the model. Section 3 builds the linear model of the data by regression analysis. Section 4 analyzes and discusses the regression model.
Finally, conclusions and future research directions are given.

2.1. Study samples. The data were collected from observations of blood variables used to identify healthy and anemic persons, and involved 539 subjects from blood laboratories in Iraq. The subjects were aged 6-56 years and included 248 males and 291 females; 211 were healthy and 328 were anemic. Eleven variables were selected to build the model: ten independent variables and one dependent variable. The dependent variable takes six values: healthy (0) and five blood diseases, namely iron deficiency anemia (1), vitamin B12 deficiency (2), thalassemia (3), sickle cell (4), and spherocytosis (5).
For each subject, the following variables were recorded [11,12]: hemoglobin (HB), red blood cells (RBC), mean corpuscular hemoglobin (MCH), white blood cells (WBC), hematocrit (HCT), mean corpuscular hemoglobin concentration (MCHC), platelets (PLT), mean corpuscular volume (MCV), sex, and age. The blood variables can be briefly introduced as follows. HB is an iron-containing protein inside the RBCs that carries oxygen from the lungs to the body's tissues and returns carbon dioxide from the tissues to the lungs. RBCs are biconcave cells without a nucleus that contain the HB. MCH is a calculated value derived from the HB measurement and the red cell count, giving the average amount of HB per red cell. WBCs are the cells of the immune system that protect the body against infectious disease. HCT is the percentage of total blood volume occupied by RBCs.
MCHC is the calculated concentration of HB in a given volume of RBCs. A PLT is an irregular, disc-shaped element in the blood that assists in blood clotting; PLTs are usually classed as blood cells as well. MCV measures the average size of the red cells in a sample. Sex and age are included as additional biophysical variables: normal HB levels differ between males and females, so sex is coded as male = 1 and female = 2, and normal HB levels also vary with age. The anemia types and blood variables for our data are displayed in Table 1.
The observations recorded at each of the n levels can be expressed as

$$y_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \cdots + B_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \tag{2.2}$$

where the dependent observations $y_1, y_2, \ldots, y_n$ and the independent observations $x_1, x_2, \ldots, x_k$ have n levels, and $x_{ij}$ represents the ith level of the jth predictor variable $x_j$.
System (2.2) can be represented in matrix form as

$$y = XB + \varepsilon, \tag{2.3}$$

where $y$ is the $n \times 1$ vector of observations, $X$ is the $n \times (k+1)$ matrix of independent observations (with a first column of ones for $B_0$), $B$ is the $(k+1) \times 1$ vector of regression coefficients, and $\varepsilon$ is an unobserved random error that adds noise to the linear relationship between the dependent variable and the regressors. To obtain the regression model, $B$ must be known, so it is estimated by least squares:

$$\hat{B} = (X^{T}X)^{-1}X^{T}y, \tag{2.4}$$

where $X^{T}$ is the transpose of $X$ and $(X^{T}X)^{-1}$ is the inverse of $X^{T}X$.
Knowing the estimate $\hat{B}$, the fitted MLR model can be expressed as [33,34]

$$\hat{y} = X\hat{B}, \tag{2.5}$$

where $\hat{y}$ is the estimated value of $y$ from the regression.
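The least-squares estimate and the fitted values above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the data are randomly generated rather than taken from the study, and the normal equations are solved with `numpy.linalg.solve` instead of inverting $X^T X$ explicitly, which yields the same estimate with better numerical behaviour.

```python
import numpy as np

def design_matrix(X_raw: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so that B[0] plays the role of the intercept B_0."""
    return np.column_stack([np.ones(X_raw.shape[0]), X_raw])

def fit_ols(X_raw: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares estimate B_hat = (X^T X)^{-1} X^T y."""
    X = design_matrix(X_raw)
    # Solve the normal equations (X^T X) B = X^T y rather than forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict(X_raw: np.ndarray, B_hat: np.ndarray) -> np.ndarray:
    """Fitted values y_hat = X B_hat."""
    return design_matrix(X_raw) @ B_hat

# Synthetic data only (hypothetical, not the study's dataset): n = 539 rows, k = 10 predictors.
rng = np.random.default_rng(0)
n, k = 539, 10
X_raw = rng.normal(size=(n, k))
B_true = np.linspace(-1.0, 1.0, k + 1)
y = design_matrix(X_raw) @ B_true + rng.normal(scale=0.5, size=n)

B_hat = fit_ols(X_raw, y)
y_hat = predict(X_raw, B_hat)
```

A useful sanity check is that the residuals $y - \hat{y}$ are orthogonal to every column of $X$, which is exactly what the normal equations enforce.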

2.3. Test for the Model. After the linear regression model is estimated, it is tested using sums of squares.
The total variation in the dependent variable is partitioned as

$$SST = SSR + SSE, \tag{2.6}$$

with

$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2. \tag{2.7}$$

The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the independent variables [35,36]. It is given by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}. \tag{2.8}$$

The term mean squared error (MSE) is used with slightly different meanings in the literature; here it denotes the residual mean square of the regression, which measures the average squared error and hence how closely the data are concentrated around the regression model. The smaller the MSE, the more accurate the results [35,36]. It is given by

$$MSE = \frac{SSE}{n - k - 1}. \tag{2.9}$$
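The sums of squares, the coefficient of determination, and the residual mean square described above can be computed directly from the observed and fitted values. The sketch below is a hedged illustration: `regression_summary` is a hypothetical helper, the data are synthetic, and the MSE uses the $n - k - 1$ denominator of a regression ANOVA table.

```python
import numpy as np

def regression_summary(y: np.ndarray, y_hat: np.ndarray, k: int) -> dict:
    """Sums of squares, R^2 and residual mean square for a model with an intercept and k predictors."""
    n = y.size
    sst = float(np.sum((y - y.mean()) ** 2))      # total sum of squares
    ssr = float(np.sum((y_hat - y.mean()) ** 2))  # regression (explained) sum of squares
    sse = float(np.sum((y - y_hat) ** 2))         # residual sum of squares
    return {
        "SST": sst, "SSR": ssr, "SSE": sse,
        "R2": 1.0 - sse / sst,
        "MSE": sse / (n - k - 1),                 # residual mean square
    }

# Synthetic example (not the study's data) with the same n and k as the paper.
rng = np.random.default_rng(1)
n, k = 539, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
summary = regression_summary(y, X @ B_hat, k)
```

The decomposition SST = SSR + SSE, and hence the equality of the two forms of $R^2$, holds only for least-squares fits that include an intercept.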

Building Linear Regression Analysis Model
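The MLR model produced here is a linear equation of the form described in Section 2.2:

$$y = B_0 + B_1\,\mathrm{HB} + B_2\,\mathrm{RBC} + B_3\,\mathrm{MCH} + B_4\,\mathrm{WBC} + B_5\,\mathrm{MCV} + B_6\,\mathrm{HCT} + B_7\,\mathrm{MCHC} + B_8\,\mathrm{PLT} + B_9\,\mathrm{Sex} + B_{10}\,\mathrm{Age} + \varepsilon, \tag{3.1}$$

where $y$ is the type of anemia and $B_i$, $0 \le i \le 10$, are the parameters to be determined.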
Estimating these parameters as explained in Section 2.2 gives the linear regression model

$$\hat{y} = 6.377 - 0.224\,\mathrm{HB} - 0.224\,\mathrm{RBC} - 0.029\,\mathrm{MCH} + 0.001\,\mathrm{WBC} + 0.0005\,\mathrm{MCV} - 0.016\,\mathrm{HCT} + 0.007\,\mathrm{MCHC} + 0.001\,\mathrm{PLT} - 0.311\,\mathrm{Sex} - 0.009\,\mathrm{Age}. \tag{3.2}$$

The coefficient values of this linear model were obtained by the multiple regression approach to give a more realistic model (see Table 4). As previously mentioned, the model can be represented in matrix form as

$$\hat{y} = X\hat{B}, \tag{3.3}$$

where $\hat{y}$ and $X$ are the estimated outputs (types of anemia) and the matrix of independent observations, respectively, and $\hat{B}$ is the vector of coefficients in (3.2).
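To make equation (3.2) concrete, the sketch below evaluates it as $\hat{y} = X\hat{B}$ for records keyed by variable name. It is a hypothetical helper, not part of the study: the units of the blood variables are not stated in the text, so inputs must be in whatever units were used to fit (3.2), and the output is a continuous score; how that score is mapped to the class codes 0-5 is not described here and is left open.

```python
import numpy as np

# Coefficients transcribed from equation (3.2); the insertion order fixes the column order of X.
COEFFS = {
    "Intercept": 6.377, "HB": -0.224, "RBC": -0.224, "MCH": -0.029,
    "WBC": 0.001, "MCV": 0.0005, "HCT": -0.016, "MCHC": 0.007,
    "PLT": 0.001, "Sex": -0.311, "Age": -0.009,
}
PREDICTORS = list(COEFFS)[1:]            # every name except the intercept
B_HAT = np.array(list(COEFFS.values()))  # coefficient vector B_hat

def predict_anemia_score(rows):
    """Evaluate y_hat = X B_hat for records keyed by predictor name.

    Sex follows the paper's coding (male = 1, female = 2). The result is a
    continuous score, not a class label.
    """
    X = np.array([[1.0] + [float(r[name]) for name in PREDICTORS] for r in rows])
    return X @ B_HAT
```

Because the model is linear, changing one input by one unit while holding the others fixed shifts the score by exactly that input's coefficient; for example, a one-unit increase in HB lowers it by 0.224.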

Results and Discussion
Various mathematical methods have been used to analyze blood variables in the literature [16, 18, 22, 37]. Multiple regression analysis has been used by many researchers [19, 23-32] to address various anemia problems at different levels. However, those studies used a limited number of blood variables and did not model the relationship between the blood variables and the type of anemia for prediction purposes.
The current study therefore investigates the relationship between a larger number of blood variables and the type of anemia. Several versions of the model, based on different subsets of the variables, were derived (see Table 2). Models built on more blood variables show a better correlation than those built on fewer variables, and the full model in equation (3.2) is used to predict the type of anemia.
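When comparing model versions built on different subsets of variables, such as those in Table 2, one property of least squares is worth keeping in mind: for nested models fitted to the same data, in-sample $R^2$ cannot decrease when predictors are added, even if the new predictors are pure noise. Adjusted $R^2$, $1 - (1 - R^2)(n - 1)/(n - k - 1)$, penalizes extra predictors. The sketch below demonstrates this on synthetic data; the helper name is hypothetical, and the text does not state which measure Table 2 reports.

```python
import numpy as np

def r2_and_adjusted(X_raw: np.ndarray, y: np.ndarray):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    n, k = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ B_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

# Synthetic data: y depends on three predictors; seven extra columns are unrelated noise.
rng = np.random.default_rng(2)
n = 539
signal = rng.normal(size=(n, 3))
noise_cols = rng.normal(size=(n, 7))
y = signal @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

r2_small, adj_small = r2_and_adjusted(signal, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([signal, noise_cols]), y)
```

Here `r2_big` is never below `r2_small`, even though the extra columns carry no information about `y`.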
Naturally, however, some variables have a larger effect than others.
After the essential assumptions of the multivariate analysis were verified for equation (3.2), all the variables were included in the MLR analysis: the blood variables (HB, RBC, MCH, WBC, MCV, HCT, MCHC, PLT), sex, and age, with the regression coefficients B estimated from the data. The MLR model thus captures the combined effect of these variables and predicts the type of anemia better than the models that used fewer blood variables.
The Enter method of MLR was used in the current analysis, so all the variables were entered into the regression model simultaneously.
The analysis shows a significant relation for the MLR model, with R² = 0.699. This means that 69.9% of the variation in the type of anemia is explained by the blood variables, sex, and age.
The regression model with the blood variables, sex, and age is therefore significant (p < 0.001), meaning that the blood variables, sex, and age considered together have a significant effect on the type of anemia (see Table 3). The coefficient of each variable gives the change in the predicted value of the dependent variable per unit change in that variable, with the other variables held fixed; for example, a one-unit increase in HB changes the predicted value by -0.224 (see Table 4).
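The overall significance test can be related to $R^2$ through the F statistic, F = (R²/k) / ((1 - R²)/(n - k - 1)), with (k, n - k - 1) degrees of freedom. The sketch below plugs in the figures reported in the text (n = 539, k = 10, R² = 0.699). Because R² is rounded, the result only approximates the F value in Table 3, which is not reproduced here. The p-value would come from the upper tail of the F distribution (for example `scipy.stats.f.sf`); that step is omitted to keep the sketch dependency-free.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic of an OLS model with an intercept and k predictors, computed from R^2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R^2 penalized for the number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values reported in the text: 539 subjects, 10 predictors, R^2 = 0.699 (rounded).
n, k, r2 = 539, 10, 0.699
F = f_statistic(r2, n, k)
adj = adjusted_r2(r2, n, k)
print(f"F({k}, {n - k - 1}) = {F:.1f}, adjusted R^2 = {adj:.3f}")
# prints: F(10, 528) = 122.6, adjusted R^2 = 0.693
```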
A t-test was used to measure the partial effect of each of the variables HB, RBC, MCH, WBC, MCV, HCT, MCHC, PLT, sex, and age on the type of anemia. These variables affect the type of anemia at varying rates (see Table 4). The histogram of the residuals indicates that they are approximately normally distributed with a mean of zero and a standard deviation of 0.991 (see Figure 1).
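The partial t-tests can be sketched as follows. For each coefficient, $t_j = \hat{B}_j / SE(\hat{B}_j)$ with $SE(\hat{B}_j) = \sqrt{MSE \cdot [(X^T X)^{-1}]_{jj}}$ and $n - k - 1$ degrees of freedom. This is the standard OLS t statistic, shown here on synthetic data with a hypothetical helper name; two-sided p-values would come from the t distribution (for example `scipy.stats.t.sf`), which is omitted.

```python
import numpy as np

def coefficient_t_stats(X_raw: np.ndarray, y: np.ndarray):
    """Return (B_hat, standard errors, t statistics) for OLS with an intercept."""
    n, k = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    XtX_inv = np.linalg.inv(X.T @ X)          # diagonal is needed for the standard errors
    B_hat = XtX_inv @ X.T @ y
    residuals = y - X @ B_hat
    mse = residuals @ residuals / (n - k - 1) # residual mean square
    se = np.sqrt(mse * np.diag(XtX_inv))      # SE(B_j) = sqrt(MSE * [(X^T X)^{-1}]_jj)
    return B_hat, se, B_hat / se              # each t_j has n - k - 1 degrees of freedom

# Synthetic example: y depends on the first predictor only; the second is pure noise.
rng = np.random.default_rng(5)
X_raw = rng.normal(size=(539, 2))
y = 1.0 + 2.0 * X_raw[:, 0] + rng.normal(size=539)
B_hat, se, t = coefficient_t_stats(X_raw, y)
```

The informative predictor receives a large |t|, while the noise predictor's t statistic stays near zero.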
To measure the spread of the random error around the linear regression model, the MLR analysis uses the residual mean square, MSE = 0.659 (see Table 3). A small MSE indicates that the data are concentrated around the linear regression model; the normal P-P plot of the standardized residuals is shown in Figure 2.
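The standard deviation of 0.991 reported for Figure 1, rather than exactly 1, can be explained by n and k alone under one assumption: that the residuals were standardized by dividing them by $\sqrt{MSE}$, a common convention in regression software (the text does not say which convention or software was used). With an intercept, the residuals sum to zero and their squares sum to SSE, so the sample standard deviation of the scaled residuals is $\sqrt{(n-k-1)/(n-1)} = \sqrt{528/538} \approx 0.991$ for any dataset. The synthetic check below illustrates this identity.

```python
import numpy as np

# Synthetic data (not the study's) with the paper's n = 539 and k = 10.
rng = np.random.default_rng(3)
n, k = 539, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ B_hat
mse = residuals @ residuals / (n - k - 1)
z = residuals / np.sqrt(mse)  # residuals scaled by the root residual mean square (assumed convention)

# Sum of z**2 is SSE / MSE = n - k - 1, and z has mean zero, so the sample SD (ddof=1) is fixed.
expected_sd = np.sqrt((n - k - 1) / (n - 1))
observed_sd = z.std(ddof=1)
```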
In this study, the comparison criterion is whether a technique provides a suitable prediction. This is assessed by comparing the regression model with a deep learning method, the long short-term memory (LSTM) network.
The results show that linear regression fits the dataset better than the LSTM (see Table 5). The present study therefore provides a more accurate model than the LSTM for predicting the type of anemia on this dataset.

Conclusions and Recommendation
This study has forecast the type of anemia from biomedical information, considering eight blood variables together with the sex and age of individuals. To the authors' knowledge, this is the first time a multiple linear regression model has been derived for forecasting the type of anemia. The results show that the regression model is promising and capable of making the prediction. For the current anemia problem, the multiple regression method was found to be more accurate than the linear deep learning methods. The model is expected to help health providers diagnose the type of anemia and design appropriate treatment programs for their patients. In future research, this mathematical model may be improved using other computational methods.

Figure 1. Histogram of the residuals

Figure 2. Normal P-P plot of the regression standardized residuals

Table 1. Some samples from the data

2.2. Multiple Linear Regression Model. Consider a multiple linear regression (MLR) model with k predictor variables and n independent observations.

Table 2. Various forms of the multiple linear model: blood variables, sex, and age

Table 3. Analysis of variance for the regression in equation (3.2)

The standardized coefficient (Beta) compares the strength of the effect of each blood variable, sex, and age on the type of anemia. It is given by

$$\mathrm{Beta}_j = B_j \,\frac{SD(X_j)}{SD(Y)}.$$
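The standardized Beta formula above can be sketched as follows: each unstandardized slope is rescaled by the ratio of the predictor's standard deviation to that of the response. This is an illustrative sketch on synthetic data with a hypothetical helper name. An equivalent route, which the check below relies on, is to z-score every variable and refit; the slopes of that regression equal the standardized Betas.

```python
import numpy as np

def standardized_betas(X_raw: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Beta_j = B_j * SD(X_j) / SD(Y) for an OLS fit with an intercept."""
    n = X_raw.shape[0]
    X = np.column_stack([np.ones(n), X_raw])
    B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    # The intercept has no standardized counterpart, so it is dropped.
    return B_hat[1:] * X_raw.std(axis=0, ddof=1) / y.std(ddof=1)

# Synthetic predictors on very different scales (hypothetical, not the study's variables).
rng = np.random.default_rng(4)
X_raw = rng.normal(loc=[10.0, 100.0, 1.0], scale=[2.0, 30.0, 0.1], size=(539, 3))
y = X_raw @ np.array([0.5, 0.01, 3.0]) + rng.normal(size=539)
betas = standardized_betas(X_raw, y)
```

Unlike the raw coefficients, the Betas do not depend on the units of measurement, which is why they are used to compare effect sizes across variables.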

Table 4. Analysis of the multiple regression coefficients in equation (3.2)

Table 5. Comparison of the MLR results with those of the linear deep learning method