Time Dependent Data Exploration And Preprocessing:
Doing It All by SAS.
This paper presents a methodology for exploring and preprocessing
transactional data, transforming the data into a multivariate time
series, and selecting an adequate model for analysis.
Unlike time series data, where observations are equally spaced by
a specific time interval, in transactional data, observations are not
spaced with respect to any particular time period. Our approach is
illustrated using observations of length of stay (LOS) of a patient
at a hospital Emergency Department (ED). The challenges of analyzing
these data include autocorrelations of the observations, non-linearity,
and the fact that observations were not recorded at regular time
intervals.
First, using the SAS procedure PROC HPF, we transformed the
transactional data set into multivariate time series data. Next, a
series of specialized plots such as histograms, kernel density plots,
boxplots, time series plots, and correlograms were produced using
the SAS procedure PROC GPLOT to capture the essentials of the
data, to discover relationships among the variables, and to select an
optimal model of analysis. As a result of this step-by-step
preprocessing methodology, adequate models of analysis of
LOS were identified and the dimension of the data set was
reduced from 3345 observations to only 256 observations.
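
The accumulation step described above, in which irregularly timed transactions become equally spaced observations, can be sketched in a few lines. This is an illustrative Python sketch of the idea only, not the PROC HPF call used in the paper. The hourly interval, the use of the mean as the accumulation rule, the function name accumulate_hourly, and the sample visits are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def accumulate_hourly(records):
    """Average irregularly timed (timestamp, value) pairs into hourly buckets."""
    buckets = defaultdict(list)
    for stamp, value in records:
        # Truncate each timestamp to the start of its hour (assumed interval).
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # One averaged observation per hour that has at least one record.
    return [(hour, mean(values)) for hour, values in sorted(buckets.items())]


# Hypothetical ED visits: (arrival time, length of stay in minutes).
visits = [
    (datetime(2004, 1, 1, 8, 5), 120),
    (datetime(2004, 1, 1, 8, 40), 180),
    (datetime(2004, 1, 1, 10, 15), 90),
]
hourly = accumulate_hourly(visits)
```

Here the two visits that arrive during the 08:00 hour collapse into a single averaged observation, which is how accumulation reduces the number of rows. Hours with no visits are simply absent in this sketch; a full treatment would also need a rule for filling those gaps.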
Joseph Twagilimana, University of Louisville, Louisville, KY
See also Hospital Length of Stay: Mean or Median Regression.