Sensitivity Analysis for Survival Regression Models
1Hussain Jassim N., 2Low Heng Chin, and 3F.M. Abbas Alkarkhi
1,2School of Mathematical Sciences,
Universiti Sains Malaysia, 11800 Pulau Penang
3School of Industrial Technology,
Universiti Sains Malaysia, 11800 Pulau Penang
Sensitivity analysis (SA) plays a central role in a variety of statistical methodologies, including classification and discrimination, calibration, comparison and model selection. SA yields a simpler model by identifying the importance of covariates, so that only the few covariates that contribute most to explaining the variation in the data are included in the model. SA is the study of how the uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model inputs. SA is hence considered by some to be a prerequisite for model building in any setting and in any field where models are used. It allows the impact of different factors on the response variable to be analyzed, and it helps to explain the impact of different model structures. Furthermore, SA can be used to find out which subset of input factors accounts for most of the output variance. SA has been used extensively in linear regression models, but not in survival regression models, although it is an easy and useful method for screening variables in such models. This study presents SA in survival regression models; an application in the medical field is used to illustrate it.
Applying Robust M-Regression in Modeling Oil Palm Yield
Zuhaimy Ismail1 & Azme Khamis2
1Department of Mathematics, Universiti Teknologi Malaysia, Malaysia.
2Center of Science Studies, Universiti Tun Hussein Onn, Malaysia.
This paper discusses the use of multiple linear regression in modeling oil palm yield. The foliar nutrient compositions were used as independent variables and fresh fruit bunch (FFB) yield as the dependent variable. Outliers in a data set influence the modeling accuracy as well as the estimated parameters. A statistical procedure is regarded as robust if it performs reasonably well even when the assumptions of the statistical model are not true. If the data follow the standard linear regression model, least squares estimates and tests perform quite well, but they are not robust in the presence of outliers. Since the quantile-quantile plot shows the existence of outliers in the yield data, we propose robust M-regression to overcome their negative impact. The data used for this study were provided by the Malaysian Palm Oil Board (MPOB) and were taken from two estates in Peninsular Malaysia. The factors included in the data set were foliar composition and FFB yield. The foliar composition variables were the percentage concentrations of nitrogen (N), phosphorus (P), potassium (K), calcium (Ca) and magnesium (Mg). The N, P, K, Ca and Mg concentrations were treated as independent variables and the FFB yield as the dependent variable. The analysis shows that robust regression gives better results than conventional regression in modeling oil palm yield.
Keywords: Multiple Linear Regression; Robust M-Regression; Oil Palm Yield.
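The contrast between least squares and M-regression described above can be illustrated with a minimal sketch. It assumes Huber weights, a single covariate and iteratively reweighted least squares; the abstract does not say which M-estimator or fitting algorithm the authors used, and the data below are invented for illustration, not MPOB data.

```python
from statistics import median

def huber_m_regression(x, y, k=1.345, iterations=50):
    """Fit y = a + b*x by iteratively reweighted least squares (Huber weights)."""
    weights = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iterations):
        # Weighted least-squares fit with the current weights.
        total = sum(weights)
        mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
        mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
        sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mean_x) * (yi - mean_y)
                  for w, xi, yi in zip(weights, x, y))
        b = sxy / sxx
        a = mean_y - b * mean_x
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate: normalised median absolute residual.
        scale = median([abs(r) for r in residuals]) / 0.6745 or 1e-12
        # Huber weights: 1 inside the threshold, shrinking outside it.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
                   for r in residuals]
    return a, b

# Hypothetical data: an exact line y = 2 + 0.5x with one gross outlier.
x = list(range(10))
y = [2 + 0.5 * xi for xi in x]
y[9] = 30.0
ols_a, ols_b = huber_m_regression(x, y, iterations=1)  # first pass is plain OLS
rob_a, rob_b = huber_m_regression(x, y)
```

With a single pass all weights equal one, so the first call returns the ordinary least squares fit, whose slope is pulled towards the outlier. The reweighted fit shrinks the outlier's weight on each pass and recovers a slope close to 0.5.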
Using Logistic Regression to Determine the Sex of Spiderhunters (Family: Nectariniidae)
Charlie J.M. Laman, Siti Nurlydia binti Sazali and Mustafa Abdul Rahman
Department of Zoology,
Faculty of Resource Science and Technology,
Universiti Malaysia Sarawak,
94300 Kota Samarahan,
Spiderhunters (Family: Nectariniidae) are monomorphic birds. Sexual dimorphism in spiderhunters was investigated using measurements of seven external morphological characters of specimens kept in the Sarawak Museum, analyzed by logistic regression. The dependent variable in logistic regression is binary or dichotomous, and can be represented by a binary indicator variable taking the values 0 and 1. A logistic regression model is either simple (one independent variable) or multivariate (two or more independent variables). A total of 8 species of spiderhunters, with 181 individuals (98 males, 83 females), were examined. Four prediction models were found with their respective parameters: bill length (BL) for little spiderhunter (Arachnothera longirostra), and wing length (A. modesta), respectively. However, the other species, including thick-billed spiderhunter (A. crassirostris), spectacled spiderhunter (A. flavigaster), streaky-breasted spiderhunter (A. affinis) and Whitehead's spiderhunter (A. juliae), showed no significant differences between the sexes in their external morphological characters. The deviance method was used to examine the goodness-of-fit of the four models; all models showed favourable p-values, indicating no evidence of lack of fit, so the models obtained were appropriate. Overall, the percentages of correct predictions (correctly predicted specimens over total specimens) were 81.36%, 91.89%, 85.71% and 80.0%, respectively, for the four prediction models. In general, male spiderhunters are relatively larger than females in the selected external morphological characters, which may have resulted from natural selection and/or sexual selection.
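As a rough illustration of a simple (one-variable) logistic model and of the "percentage of correct predictions" reported above, the sketch below fits the model by gradient ascent on the log-likelihood. The bill lengths are invented, the function names are hypothetical, and the fitting method is an assumption; the abstract does not describe the software or algorithm used.

```python
import math

def fit_simple_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*(x - mean)))) by gradient ascent."""
    mean_x = sum(x) / len(x)
    centred = [xi - mean_x for xi in x]
    b0 = b1 = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * c))) for c in centred]
        # Gradient of the average log-likelihood.
        g0 = sum(yi - p for yi, p in zip(y, probs)) / len(x)
        g1 = sum((yi - p) * c for yi, p, c in zip(y, probs, centred)) / len(x)
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1, mean_x

def predict_sex(measurement, b0, b1, mean_x):
    """Return 1 (male) when the fitted probability is at least 0.5, else 0."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * (measurement - mean_x))))
    return 1 if p >= 0.5 else 0

# Hypothetical bill lengths in mm (0 = female, 1 = male); not the museum data.
bill_length = [30, 31, 32, 33, 34, 36, 33, 35, 36, 37, 38, 39]
sex = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
b0, b1, mean_x = fit_simple_logistic(bill_length, sex)
correct = sum(predict_sex(m, b0, b1, mean_x) == s
              for m, s in zip(bill_length, sex))
percent_correct = 100.0 * correct / len(sex)
```

The positive slope `b1` corresponds to males tending to have the larger measurement, and `percent_correct` is the share of specimens whose predicted sex matches the recorded one.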
Reliability Assessment of Corroding Pipeline – A Statistical and Probabilistic Approach
Norhazilan Md Noor
Fakulti Kejuruteraan Awam,
Universiti Teknologi Malaysia,
81310 Skudai, Johor.
The intelligent pig has become an important tool for in-line internal inspection of pipelines. Nonetheless, a lack of knowledge in interpreting pigging data on metal loss due to corrosion may contribute to inaccurate structural evaluation. The authors have used corrosion data gathered through repeated in-line inspections of an offshore pipeline at different times to examine the relationship between corrosion defect size and corrosion rate. The aim of the statistical and probabilistic analysis of pigging data is to determine the most likely actual behavior of the metal loss pattern in terms of the type of distribution and the error severity. Appropriate analysis of the pigging data is necessary to provide accurate statistical data for the pipeline assessment process. The analysis starts with a feature-to-feature data matching procedure based on repeated inspections over several years, followed by statistical analysis of the matched data to examine the statistical distributions of corrosion dimensions and corrosion growth rates. To reduce the error embedded in the data, a correction method has been introduced in the process. An approach for predicting the future size of corrosion defects from previous pigging data is also highlighted. The results of the data analysis procedure are then applied to evaluate the current and future integrity of the corroded pipeline using Monte Carlo simulation. The paper demonstrates the application of statistical and probabilistic methods in data analysis, such as the Weibull plot, the Chi-square test, the Box-Muller method and the inverse transformation method, so that engineers can fully appreciate the importance of statistics and probability in engineering.
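The two sampling methods named above, and the way they feed a Monte Carlo integrity check, can be sketched as follows. The distribution parameters, the 80% wall-thickness limit and the function name `failure_probability` are hypothetical placeholders chosen for illustration; they are not taken from the paper.

```python
import math
import random

def box_muller(u1, u2):
    """Turn two independent Uniform(0,1) draws into two standard normal draws."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def weibull_inverse(u, shape, scale):
    """Inverse transformation: map a Uniform(0,1) draw to a Weibull variate."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

def failure_probability(trials, years, wall_mm, seed=1):
    """Monte Carlo estimate of P(future defect depth exceeds the allowable depth).

    All distribution parameters below are hypothetical placeholders.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        z, _unused = box_muller(1.0 - rng.random(), rng.random())
        depth_now = max(0.0, 2.0 + 0.5 * z)             # mm: normal, mean 2.0, SD 0.5
        rate = weibull_inverse(rng.random(), 1.5, 0.2)  # mm/year: Weibull growth rate
        if depth_now + rate * years > 0.8 * wall_mm:    # assumed limit: 80% of wall
            failures += 1
    return failures / trials
```

For example, `failure_probability(10000, 20, 10.0)` estimates the chance that a defect in a 10 mm wall exceeds the assumed limit after 20 years under these made-up distributions.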
Estimating the Intensity of Point Processes Models for Earthquake Occurrences
1,3Nurtiti Sunusi, 1Sutawanir Darwis, 2Wahyu Triyoso
1Statistics Research Group, Faculty of Mathematics & Natural Sciences, Institut Teknologi Bandung, Indonesia.
2Geophysics Research Division, Faculty of Earth Sciences & Mineral Technology, Institut Teknologi Bandung, Indonesia.
3Mathematics Study Program, Faculty of Mathematics & Natural Sciences, Universitas Haluoleo, Kendari, South East Sulawesi, Indonesia.
The main task in earthquake prediction is to develop a statistical model for analyzing the observations, so that we can evaluate the probability of an earthquake occurring in a certain space-time-magnitude window. An earthquake is a physical phenomenon that occurs irregularly in space and time. One of the stochastic models most suitable for describing such phenomena is the point process. These processes are uniquely characterized by their conditional intensity, that is, the probability that an event will occur in the infinitesimal interval (t, t + dt), given the history of the process up to time t. Once the conditional intensity function is given, the joint density for the realization of occurrence data in (0,T) can be written down and used to obtain the maximum likelihood estimates. Consequently, it is important to obtain good parametric models of the conditional intensity function. The aim of this paper is to estimate the conditional intensity of point process models. We describe the earthquake sequence as a renewal process with exponentially distributed sojourn times. Our results show a promising direction of research towards developing a heterogeneous point process.
Keywords: Point Processes; Renewal Process; Conditional Intensity.
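For a point process observed on (0,T), the log-likelihood is the sum of the log conditional intensity at each event time minus the integral of the intensity over (0,T). When the sojourn times are exponential the intensity is a constant rate, so the log-likelihood reduces to n log(rate) - rate * T. The sketch below evaluates this special case; the event times are invented and the function names are hypothetical.

```python
import math

def log_likelihood(rate, event_times, horizon):
    """Log-likelihood of a constant conditional intensity observed on (0, horizon)."""
    return len(event_times) * math.log(rate) - rate * horizon

def estimate_rate(event_times, horizon):
    """Maximum likelihood estimate: number of events divided by observation time."""
    return len(event_times) / horizon

# Hypothetical occurrence times (in years) on the window (0, 10).
times = [1.0, 3.0, 4.0, 8.0]
rate_hat = estimate_rate(times, 10.0)
```

Setting the derivative n / rate - T to zero gives the estimate n / T, so `rate_hat` maximises `log_likelihood` for these data.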
Statistical Profiling of Low Employability Graduates in Malaysia: Feasible?
Lecturer, Faculty of Economics
Universiti Utara Malaysia
Using panel data on 179 graduates of Universiti Utara Malaysia (UUM) and Universiti Tunku Abdul Rahman (UTAR), this paper estimates statistical profiling models of low employability graduates with piecewise exponential and Weibull proportional hazard models. The estimated models suggest that the significant determinants of graduate unemployment duration in Malaysia are income support while unemployed, age, use of English as the main language of communication among friends, ethnicity, type of degree, father's employment status and education level, and time dependency. These determinants can be used to identify the groups at risk of low employability: graduates who receive no income support while unemployed, are young, do not use English as the main language of communication among friends, are Malay, studied for a degree other than UTAR Accounting, and whose fathers are not self-employed and have a low education level. The predicted hazard or survival function, in turn, can be used to identify the individual risk of low employability. The estimated piecewise exponential and Weibull models are found to be correctly predicted and respectively, of the validation samples graduates. Thus, the piecewise exponential model, with its flexibility in baseline hazard specification, outperforms the Weibull model. It is concluded that the implementation of statistical profiling of low employability graduates in Malaysia is feasible given the wide availability of information technology.
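The difference between the two baseline specifications can be sketched with their survival functions. The piecewise exponential baseline lets the hazard take a separate constant value in each interval, whereas the Weibull baseline is controlled by a single shape parameter. The cut points, hazards and parameter values in the comments are hypothetical and do not come from the paper.

```python
import math

def piecewise_exponential_survival(t, cut_points, hazards):
    """S(t) for a hazard that is constant within each interval.

    cut_points: increasing interval boundaries, e.g. [3, 6] for
                (0, 3], (3, 6] and (6, infinity).
    hazards:    one hazard per interval, so len(hazards) == len(cut_points) + 1.
    """
    cumulative, start = 0.0, 0.0
    for end, hazard in zip(cut_points + [math.inf], hazards):
        if t <= start:
            break
        # Add the hazard accumulated over the part of this interval before t.
        cumulative += hazard * (min(t, end) - start)
        start = end
    return math.exp(-cumulative)

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape) for a Weibull baseline."""
    return math.exp(-((t / scale) ** shape))
```

Under proportional hazards, a covariate profile with linear predictor `eta` raises either baseline survival function to the power `exp(eta)`, which is how a predicted survival curve for an individual graduate would be obtained.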
The Effect of Imputing Missing SDs
Nik Ruzni Nik Idris
Kulliyah of Science
International Islamic University Malaysia
P.O. Box 10, 50728 Kuala Lumpur
Background and Objective
This paper examines the implications of (1) excluding studies with missing standard deviations (SDs) and (2) imputing the missing SDs, on the standard error (SE) of the overall meta-analysis estimate.
The SEs of the estimates from the above scenarios were compared with those based on all studies. The SDs were assumed to be missing according to the following mechanisms: (1) missing completely at random (MCAR); (2) the SDs are more likely to be missing in studies with small sample sizes (small-size); (3) the SDs are more likely to be missing in studies with large SDs (large-SD).
If the SDs are missing under MCAR or under the small-size mechanism, imputation is a good approach. However, if the SDs are missing under the large-SD mechanism, imputation leads to bias in the SE of the estimate. The estimates of the between-study variance from the imputed data were biased, resulting in overestimation of the SE of the estimate based on the random-effects model.
If the SDs are missing under MCAR or under the small-size mechanism, multiple imputation is recommended, as it takes into account the uncertainty due to imputation. If the non-reporting is due to large SDs, mean imputation is recommended, as it produces the least biased SE of the estimate.
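The comparison between excluding studies and mean imputation can be sketched for the simplest case. The sketch assumes a fixed-effect model in which each study estimates a single mean with variance SD²/n; the paper's actual effect measure, models and simulation design are not given in the abstract, and the study values are invented.

```python
import math

def pooled_standard_error(studies):
    """Fixed-effect SE of the pooled mean: 1 / sqrt(sum of inverse variances).

    Each study is (sample_size, sd); the variance of a study mean is sd**2 / n.
    """
    total_weight = sum(n / sd ** 2 for n, sd in studies)
    return 1.0 / math.sqrt(total_weight)

def impute_mean_sd(studies):
    """Replace each missing SD (None) with the mean of the reported SDs."""
    reported = [sd for _, sd in studies if sd is not None]
    fill = sum(reported) / len(reported)
    return [(n, sd if sd is not None else fill) for n, sd in studies]

# Hypothetical studies: (sample size, SD); None marks an unreported SD.
studies = [(40, 2.0), (25, 3.0), (60, None), (30, 2.5), (15, None)]
complete_cases = [(n, sd) for n, sd in studies if sd is not None]
se_excluding = pooled_standard_error(complete_cases)
se_imputed = pooled_standard_error(impute_mean_sd(studies))
```

Excluding studies always discards weight, so `se_excluding` is larger than `se_imputed`. Whether the imputed SE is close to the SE based on all studies depends on how the true missing SDs compare with the imputed value, which is the question the missingness mechanisms above address.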
The Modified Spatial Interpolation Methods for Missing Rainfall Data in Malaysia
Shariffah Suhaila Syed Jamaludin1, M.D. Sayang2 & Abdul Aziz Jemain3
1Department of Mathematics, Faculty of Science, Universiti Teknologi Malaysia,
81310 Skudai, Johor, Malaysia.
2Center of Statistical Studies, Faculty of Information Technology and Quantitative Science,
Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia.
3School of Mathematical Sciences, Faculty of Science and Technology,
Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
A complete daily rainfall dataset with no missing values is in high demand for a variety of meteorological and hydrological purposes. In most situations, spatial interpolation techniques such as the inverse distance and normal ratio methods are used to estimate missing rainfall values at a particular target station based on the available rainfall values recorded at neighboring stations. However, these two methods are found to be useful only when the neighboring stations are very close to, and highly correlated with, the target station. In this study, several modifications and improvements to these methods are proposed in order to estimate the missing rainfall values at the target station using the information at nearby stations. Four rain gauge stations at different locations are selected as target stations to test the modified methods. The results indicate that the modified methods improve the estimation of missing rainfall values at those target stations, as measured by the Similarity Index, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
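For reference, the two classical estimators can be sketched as below. These are the standard, unmodified forms; the modifications proposed in the paper are not described in the abstract and are not reproduced here. The function and parameter names are illustrative.

```python
def inverse_distance_estimate(values, distances, power=2):
    """Weight each neighbour's rainfall by distance ** -power."""
    weights = [d ** -power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def normal_ratio_estimate(values, neighbour_normals, target_normal):
    """Scale each neighbour's rainfall by the ratio of long-term normals, then average."""
    ratios = [target_normal / n for n in neighbour_normals]
    return sum(r * v for r, v in zip(ratios, values)) / len(values)
```

For example, with neighbours recording 10 mm and 20 mm at distances of 1 km and 2 km, `inverse_distance_estimate([10, 20], [1, 2])` gives 12 mm, because the nearer station receives four times the weight of the farther one.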
Cross-Sectional and Longitudinal Approaches in a Survival Mixture Model
Zarina Mohd Khalid
Department of Mathematics, Faculty of Science,
Universiti Teknologi Malaysia,
Survival data modeling is one of the main branches of medical statistics and deals specifically with time-to-event data. In particular, if the target population includes long-term survivors, a survival mixture modeling approach is more suitable for modeling the time to a certain event, because it accounts for the fact that a group of patients will never experience the event of interest. A standard procedure for estimating the unknown parameters in such a model uses cross-sectional information recorded at a particular time point, usually during the first hospital visit. This study extends the standard procedure by considering information obtained longitudinally. The longitudinal approach has resulted in estimators with better efficiency and precision.
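A survival mixture model of this kind writes the population survival function as the long-term survivor fraction plus the remaining fraction times the survival function of susceptible patients. The sketch below assumes an exponential distribution for the susceptible group purely for illustration; the abstract does not state the distribution used.

```python
import math

def mixture_survival(t, cured_fraction, rate):
    """Population survival: long-term survivors never fail, the rest fail at a constant rate."""
    return cured_fraction + (1.0 - cured_fraction) * math.exp(-rate * t)
```

The curve starts at 1 and levels off at `cured_fraction` instead of falling to zero, which is the plateau that distinguishes a mixture model from a standard survival model.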
Pipe Failure Probabilities of Water Distribution Systems
Syarifah Hidayah Syed Harun and Ismail bin Mohd
Department of Mathematics,
Faculty of Science and Technology, Universiti Malaysia Terengganu
Mangabang Telipot, 21030, Kuala Terengganu, Terengganu.
In this paper, we describe two methods, which we call the Poisson method and the Generic Expectation Function (GEF) method, for finding the pipe failure probabilities of water distribution systems that are implicitly designed by engineers. In reliability, one is concerned with system failure. The GEF method is developed from the means and coefficients of variation of the input random variables, for which normal and lognormal probability distributions are adopted. Ten water distribution systems located in Terengganu, Malaysia, are used to illustrate and compare the two methods. In addition, the hydraulic simulation software EPANET is applied to obtain the input variables for each project. The failure probability of each pipe focuses on the probability that the pipe fails to fulfill the demand and on the pipe replacement probability.
Correction and Preparation of Continuously Measured Rain Gauge Data in Malaysia
Marlinda Abd. Malek1, Ismail Mohamad2 & Sobri Harun3
1Department of Civil Engineering, Universiti Tenaga Nasional, Km 7, Jalan Kajang-Puchong,
43009 Kajang, Selangor, Malaysia.
2Department of Mathematics, Faculty of Science, Universiti Teknologi Malaysia,
81310 Skudai, Johor, Malaysia.
3Faculty of Civil Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia.
This paper is another effort in developing a statistical model to patch missing rainfall data. The model was developed and validated using the past fifty years of observed hydrological data. Assuming that the missingness mechanism is Missing Completely At Random (MCAR), the model uses the basic idea of the Expectation Maximization (EM) Algorithm, which is to apply complete-data methods repeatedly to solve incomplete-data problems. Nearest Neighbour (NNeigh) Imputation is combined with it to overcome problems that are difficult or impossible for the EM Algorithm. Supported by robust statistical evidence, the study has managed to preserve the overall size of the data and proposes these methods as the basis for preparing a clean and complete data set for the public domain.
Keywords: Missing Rainfall Data; Missingness Mechanism; Missing Completely At Random; Expectation Maximization Algorithm; Nearest Neighbour Imputation.
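The nearest neighbour step can be sketched in its simplest form: a day with a missing value at the target station borrows the value from the complete day whose readings at the other stations are most similar. This is a generic illustration that assumes the other stations are fully observed; how the paper combines this step with the EM Algorithm is not described in the abstract, and the function name is hypothetical.

```python
def nearest_neighbour_impute(records, target_index):
    """Fill missing target values (None) from the most similar complete record.

    records: list of daily rows, one rainfall value per station.
    Similarity is squared Euclidean distance over the other stations' values.
    """
    donors = [row for row in records if row[target_index] is not None]
    completed = []
    for row in records:
        if row[target_index] is not None:
            completed.append(list(row))
            continue

        def distance(donor):
            return sum((a - b) ** 2
                       for i, (a, b) in enumerate(zip(row, donor))
                       if i != target_index)

        nearest = min(donors, key=distance)
        filled = list(row)
        filled[target_index] = nearest[target_index]
        completed.append(filled)
    return completed

# Hypothetical daily rainfall (mm) at three stations; station 0 is the target.
days = [[5.0, 4.0, 6.0], [0.0, 0.5, 0.0], [None, 4.2, 5.5]]
patched = nearest_neighbour_impute(days, 0)
```

Here the third day resembles the first day at the two neighbouring stations, so it receives the first day's target value of 5.0 mm.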
Numerical Modelling of the 2004 Indonesian Tsunami along Peninsular Malaysia and North Sumatra due to a Time Dependent Source
Ahmad Izani Md Ismail
Universiti Sains Malaysia,
In a previous study (Roy et al. 2007), a nonlinear polar coordinate shallow water model was developed to compute different aspects of the 2004 Indonesian tsunami along North Sumatra in Indonesia and Penang Island in Peninsular Malaysia. In that study the initial tsunami wave was generated instantaneously in the source zone along the fault line and used as the initial condition of the model. In reality, however, the rupture along the fault line started from the epicenter and progressed gradually northward with a rupture front speed of 2-3 km/s, and the whole process was completed in 500-600 s (Ni et al. 2005). The initial disturbance of the sea surface along the source zone is therefore also time dependent. In this study the model of Roy et al. (2007) is used to simulate different aspects of the 2004 Indonesian tsunami using a time dependent source. The results computed with the time dependent source agree well with observations. The responses due to the time dependent source are also compared with those due to the corresponding instantaneous source in order to test the adequacy of the instantaneous source. The comparison shows that the responses due to the time dependent source differ significantly from those due to the instantaneous source.
Half-Sweep Geometric Mean Method for Solution of Linear Fredholm Equation
M.S. Muthuvalu and Jumat Sulaiman
School of Science and Technology,
Universiti Malaysia Sabah,
Locked Bag 2073, 88999 Kota Kinabalu,
The objective of this paper is to examine the application of the Half-Sweep Geometric Mean (HSGM) method by using the half-sweep approximation equation based on quadrature formulas to solve linear integral equations of Fredholm type. The formulation and implementation of the Full-Sweep Geometric Mean (FSGM) and Half-Sweep Geometric Mean (HSGM) methods are also presented. Some numerical tests were carried out to show that the HSGM method is superior to the FSGM method.
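The quadrature step that turns a Fredholm equation of the second kind into a linear system can be sketched as follows. The sketch uses the trapezoidal rule and plain successive approximation as a stand-in solver; it is not the HSGM or FSGM iteration, whose details are not given in the abstract. The test equation and function name are chosen for illustration.

```python
def solve_fredholm(kernel, forcing, nodes=101, iterations=60):
    """Approximate y(x) = f(x) + integral_0^1 K(x, t) y(t) dt on a uniform grid."""
    h = 1.0 / (nodes - 1)
    grid = [i * h for i in range(nodes)]
    # Trapezoidal quadrature weights: h/2 at the end points, h elsewhere.
    weights = [h / 2 if i in (0, nodes - 1) else h for i in range(nodes)]
    y = [forcing(x) for x in grid]
    for _ in range(iterations):
        # Successive approximation: substitute the previous iterate into the integral.
        y = [forcing(x) + sum(w * kernel(x, t) * yt
                              for w, t, yt in zip(weights, grid, y))
             for x in grid]
    return grid, y

# Test problem y(x) = x + integral_0^1 x*t*y(t) dt, with exact solution y(x) = 1.5*x.
grid, y = solve_fredholm(lambda x, t: x * t, lambda x: x)
```

Successive approximation converges here because the kernel is small enough for the iteration to be a contraction. Iterative schemes such as those studied in the paper aim to solve the same discretised system more efficiently.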
Numerical Solution to Simulation of Time-Multiplexing Cellular Neural Network
R. Ponalagusamy and S. Senthilkumar,
Department of Mathematics,
National Institute of Technology,
620 015, Tamil Nadu,
This paper deals with a versatile algorithm for simulating CNN arrays, in which time-multiplexing is implemented using numerical integration algorithms. The time-multiplexing simulation approach plays a pivotal role in simulating hardware models and testing hardware implementations of CNN. Owing to practical hardware limitations, it is not possible to have a one-to-one mapping between the CNN hardware processors and all the pixels of the image. The simulator provides a solution by processing the input image block by block, with the number of pixels in a block being the same as the number of CNN processors in the hardware. This article proposes an efficient pseudo code for exploiting the latency properties of the Cellular Neural Network along with well-known RK-Fourth Order Embedded numerical integration algorithms. Simulation results and comparisons are presented to show the efficiency of the numerical integration algorithms. It is found that RK-Embedded Centroidal Mean outperforms RK-Embedded Harmonic Mean and RK-Embedded Contra-Harmonic Mean.
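The block-by-block idea can be sketched as a tiling loop in which each tile is no larger than the number of available processors. The function names and the per-block callback `process_block` are hypothetical; in the simulator described above, the callback would integrate the CNN cell equations for the tile with one of the RK-Embedded schemes, which is not reproduced here.

```python
def iterate_blocks(height, width, block_rows, block_cols):
    """Yield (row_start, row_end, col_start, col_end) tiles covering the image."""
    for r in range(0, height, block_rows):
        for c in range(0, width, block_cols):
            yield r, min(r + block_rows, height), c, min(c + block_cols, width)

def process_image(image, block_rows, block_cols, process_block):
    """Apply process_block to one tile at a time and assemble the output image."""
    height, width = len(image), len(image[0])
    output = [[None] * width for _ in range(height)]
    for r0, r1, c0, c1 in iterate_blocks(height, width, block_rows, block_cols):
        tile = [row[c0:c1] for row in image[r0:r1]]
        result = process_block(tile)
        for i, result_row in enumerate(result):
            output[r0 + i][c0:c1] = result_row
    return output

def saturate(tile):
    """Stand-in per-block step: the piecewise-linear CNN output function."""
    return [[0.5 * (abs(v + 1) - abs(v - 1)) for v in row] for row in tile]
```

Calling `process_image(image, 2, 2, saturate)` processes a 2 x 2 tile at a time. Because each CNN cell interacts with its neighbours, a full simulator would also have to pass neighbouring pixel values across tile borders, which this sketch omits.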