Time Dependent Data Exploration And Preprocessing:

Doing It All by SAS.

We present a methodology for exploring and preprocessing transactional data,

transforming the data into a multivariate time series, and selecting an

adequate model for analysis.

Unlike time series data, where observations are equally spaced by

a specific time interval, in transactional data, observations are not

spaced with respect to any particular time period. Our approach is

illustrated using observations of length of stay (LOS) of a patient

at a hospital Emergency Department (ED). The challenges of analyzing

these data include autocorrelations of the observations, non-linearity,

and the fact that observations were not recorded at regular time

intervals.
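
To make the distinction concrete, the following minimal Python sketch bins irregularly timed records into equally spaced hourly intervals, giving two series (visit count and mean LOS) on a common time axis. It only illustrates the general idea of accumulating transactional data; it is not the SAS code used in the paper. The `visits` records, the hourly interval, and the function name `accumulate_hourly` are assumptions invented for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical transactional records: (arrival timestamp, LOS in minutes).
# These values are invented for illustration and are not from the paper.
visits = [
    (datetime(2004, 1, 1, 8, 5), 120),
    (datetime(2004, 1, 1, 8, 40), 90),
    (datetime(2004, 1, 1, 10, 15), 200),
    (datetime(2004, 1, 1, 10, 20), 100),
    (datetime(2004, 1, 1, 10, 55), 60),
]


def accumulate_hourly(records):
    """Bin irregular records by hour (assumes at least one record).

    Returns rows of (hour_start, visit_count, mean_los), one row per hour
    between the first and last observed hour. Hours with no visits are
    kept, with a count of 0 and a mean of None, so the result is equally
    spaced in time.
    """
    bins = defaultdict(list)
    for timestamp, los in records:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        bins[hour_start].append(los)

    series = []
    hour, last_hour = min(bins), max(bins)
    while hour <= last_hour:
        values = bins.get(hour, [])
        series.append((hour, len(values), mean(values) if values else None))
        hour += timedelta(hours=1)
    return series


for hour_start, count, mean_los in accumulate_hourly(visits):
    print(hour_start, count, mean_los)
```

Keeping the empty 09:00 hour in the output is a deliberate choice: methods that assume equal spacing need every interval to be present, even when nothing was observed in it.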

First, using the SAS procedure PROC HPF, we transformed the

transactional data set into multivariate time series data. Next, a

series of specialized plots such as histograms, kernel density plots,

boxplots, time series plots, and correlograms were produced using

the SAS procedure PROC GPLOT to capture the essentials of the

data, to discover relationships among the variables, and to select an

optimal model of analysis. As a result of this step-by-step

preprocessing methodology, adequate models of analysis of

LOS were identified and the dimension of the data set was

reduced from 3345 observations to only 256 observations.
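
A correlogram, one of the plots mentioned above, displays the sample autocorrelation of a series at successive lags. As a rough sketch of the quantity being plotted (not the paper's SAS code), the function below computes the standard sample autocorrelation for an equally spaced series. The function name `sample_acf` and the toy input are assumptions made for this example.

```python
from math import sqrt


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0, ..., r_max_lag.

    r_k = sum_{t=0}^{n-k-1} (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2,
    where m is the sample mean. Assumes equally spaced observations.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Toy series, invented for illustration only.
toy = [1, 2, 3, 4, 5]
acf = sample_acf(toy, max_lag=2)

# Approximate 95% band often drawn on correlograms under a white-noise
# assumption: +/- 1.96 / sqrt(n).
band = 1.96 / sqrt(len(toy))
print(acf, band)
```

Lags whose autocorrelation falls well outside the band suggest serial dependence, which is why correlograms are useful when choosing a model for autocorrelated data such as LOS.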

Joseph Twagilimana, University of Louisville, Louisville, KY

See also Hospital Length of Stay: Mean or Median Regression.