
June 9, 2012

SAS financial services modeling


The SAS financial services modeling group in San Diego is exploring ways to take advantage of high-performance analytics and big data techniques to deliver more models, more quickly, and to more customers. This wasn't completely an academic exercise--the team in San Diego has added several new customers recently and has been looking for ways to boost productivity, so this is the perfect setup for our high-performance story. Perhaps you've seen Jim Davis' blog where he ponders what you can do with all the extra time savings that high-performance analytics offers ... providing service to more customers is one good idea!

August 3, 2009

SPSS, bought by IBM for $1.2 billion

I.B.M. took a big step to expand its fast-growing stable of data analysis offerings by agreeing on Tuesday to pay $1.2 billion to buy SPSS Inc., a maker of software used in statistical analysis and predictive modeling.

Other independent analytics software makers may well become takeover targets, said Mr. Evelson of Forrester. Among the candidates, he said, are Accelrys, Applied Predictive Technologies, Genalytics, InforSense, KXEN and ThinkAnalytics.

The broad consolidation wave in business intelligence software, analysts say, will bring increasing price pressure on some segments of the industry as major companies seek to increase their share of the market. And the open-source programming language for data analysis, R, is another source of price pressure on software suppliers.

"None of the consolidation purchases we've seen in the business intelligence industry have been fire sales," said Jim Davis, senior vice president of the SAS Institute, a private company based in Cary, N.C., that is the largest supplier of business intelligence and predictive analytics software.


May 5, 2007

proc freq

PROC FREQ

SAS proc FREQ output to a SAS data set.

Simple example
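
A minimal sketch of sending a frequency table to a data set. It assumes the `auto` data set and `marque` variable from the related ORDER=FREQ post; the output data set name `marque_counts` is hypothetical.

```sas
/* Write the one-way frequency table for marque to a data set */
/* instead of (or in addition to) the printed listing.         */
proc freq data=auto noprint;
  tables marque / out=marque_counts;  /* OUT= holds marque, COUNT and PERCENT */
run;

/* Inspect the resulting data set. */
proc print data=marque_counts;
run;
```

Dropping NOPRINT keeps the usual printed table as well as the output data set.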

See also: Use SAS proc freq /sort freq to show the most commonly held values.

January 14, 2007

SAS proc freq /sort freq

Show the most common values first with PROC FREQ.


PROC FREQ DATA=auto ORDER=FREQ;
TABLES marque;
RUN;

Toyota 320
Honda 200
BMW 180
Audi 115

ats.ucla.edu/STAT/SAS/faq/format.htm

August 30, 2006

SAS Business Intelligence

SAS BI: administering statistics about money and inventory.
See also Business Intelligence SAS user group.

June 28, 2006

SAS v9 docs

SAS v9 docs.
The manual: both a solution and a product; it's support and documentation!

May 29, 2006

Time Dependent Data Exploration And Preprocessing

Time Dependent Data Exploration And Preprocessing:
Doing It All by SAS.

An exploration and preprocessing methodology for transactional data:
transform the data into a multivariate time series and select an
adequate model for analysis.

Unlike time series data, where observations are equally spaced by
a specific time interval, in transactional data, observations are not
spaced with respect to any particular time period. Our approach is
illustrated using observations of length of stay (LOS) of a patient
at a hospital Emergency Department (ED). The challenges of analyzing
these data include autocorrelations of the observations, non-linearity,
and the fact that observations were not recorded at regular time
intervals.

First, using the SAS procedure, PROC HPF, we transformed the
transactional data set into multivariate time series data. Next, a
series of specialized plots such as histograms, kernel density plots,
boxplots, time series plots, and correlograms were produced using
the SAS procedure PROC GPLOT to capture the essentials of the
data to discover relationships in the variables, and to select an
optimal model of analysis. As a result of this step by step
preprocessing methodology, adequate models of analysis of
LOS were identified and the dimension of the data set was
reduced from 3345 observations to only 256 observations.
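
A hedged sketch of the accumulation step described above, not the author's code. It assumes a hypothetical transactional data set `ed_visits` with a datetime variable `arrival_dt` and a numeric `los`; the hourly interval and averaging rule are illustrative choices only.

```sas
/* Accumulate irregularly spaced transactions into an equally   */
/* spaced series. LEAD=0 asks for no forecasts, so OUT= should  */
/* hold only the accumulated series.                            */
proc hpf data=ed_visits out=ed_hourly lead=0;
  id arrival_dt interval=hour accumulate=average;  /* one value per hour */
  forecast los;                                    /* variable to accumulate */
run;

/* Quick look at the accumulated series. */
proc print data=ed_hourly(obs=10);
run;
```

Listing several variables on the FORECAST statement would give one accumulated series per variable on the same time axis.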

Joseph Twagilimana, University of Louisville, Louisville, KY [PDF]

See also Hospital Length of Stay: Mean or Median Regression.

May 26, 2006

House Price CAPM

Cointegration and Error Correction Mechanism Approaches:
Estimating a Capital Asset Pricing Model (CAPM) for House
Price Index Returns with SAS

Many researchers erroneously use the framework of linear
regression models to analyze time series data when predicting
changes over time or when extrapolating from present conditions
to future conditions. Caution is needed when interpreting the results
of these regression models. Granger and Newbold (1974) discovered
the existence of ‘spurious regressions’ that can occur when the
variables in a regression are nonstationary. While these regressions
appear to look good in terms of having a high R² and significant
t-statistics, the results are meaningless. Both analysis and modeling
of time series data require knowledge about the mathematical model
of the process.

This paper introduces a methodology that utilizes the power
of the SAS DATA STEP, and PROC X12
and REG procedures. The DATA STEP uses the SAS LAG and
DIF functions to manipulate the data and create an additional
set of variables including Home Price Index Returns (HPI_R1), first
differenced, and lagged first differenced. PROC X12 seasonally
adjusts the time series. Resulting variables are manipulated
further (1) to create additional variables that are tested for
stationarity, (2) to develop a cointegration model, and (3) to
develop an error correction mechanism modeled to determine
the short-run deviations from long-run equilibrium. The relevancy
of each variable created in the data step to time series analysis is
discussed. Of particular interest is the coefficient of the error
correction term that can be modeled in an error correction mechanism
to determine the speed at which the series returns to equilibrium. The
main finding is that Metropolitan Statistical Areas (MSAs) with very
slow short-run acceleration paths to the equilibrium have higher
returns and risk associated with house price returns than
MSAs with very rapid speed-of-adjustment coefficients.
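
A minimal sketch of the DATA step manipulation the abstract describes, using the LAG and DIF functions. The input data set `hpi`, its variable `hpi`, and the definition of the return as a log difference are assumptions for illustration, not taken from the paper.

```sas
/* Build return, differenced and lagged variables for stationarity tests. */
data hpi_work;
  set hpi;                       /* hypothetical: one row per period, variable hpi */
  hpi_r1     = dif(log(hpi));    /* assumed definition: log return                 */
  l_hpi_r1   = lag(hpi_r1);      /* lagged return                                  */
  d_hpi_r1   = dif(hpi_r1);      /* first difference of the return                 */
  l_d_hpi_r1 = lag(d_hpi_r1);    /* lagged first difference                        */
run;

/* Dickey-Fuller style regression: change on lagged level and lagged change. */
proc reg data=hpi_work;
  adf: model d_hpi_r1 = l_hpi_r1 l_d_hpi_r1;
run;
quit;
```

The t statistic on `l_hpi_r1` has to be compared with Dickey-Fuller critical values, not the usual t tables. LAG and DIF should be called unconditionally, as here, because each call advances its own queue of past values.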

-- Ismail Mohamed and Theresa R. DiVenti, PDF.

May 25, 2006

Unobserved Components Model, Proc UCM

Underlying model and several of the features of Proc UCM, new in the
Econometrics and Time Series (ETS) module of SAS.

Time series data is generated by marketers as they monitor “sales by month”
and by medical researchers who collect vital sign information over time. This
technique is well suited to modeling the effect of interventions (drug administration
or a change in a marketing plan). This new procedure combines the flexibility of
Proc ARIMA with the ease of use and interpretability of Smoothing models.

UCM does not have the capability to easily model transfer functions, a useful
ARIMA function that is planned for Proc UCM.
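
A sketch of a basic structural model with an intervention regressor in PROC UCM. The data set `sales`, the monthly `date` variable, and the 0/1 dummy `promo` are hypothetical names.

```sas
/* Trend + seasonal + irregular model with an intervention dummy. */
proc ucm data=sales;
  id date interval=month;
  model sales = promo;          /* promo = 1 after the marketing change, else 0 */
  irregular;                    /* random noise component                       */
  level;                        /* stochastic level                             */
  slope;                        /* stochastic slope (local linear trend)        */
  season length=12 type=trig;   /* monthly seasonality                          */
  estimate;
  forecast lead=12;             /* needs future rows that supply promo values   */
run;
```

Each component statement adds one interpretable piece to the model, which is where the smoothing-model style of interpretation comes from.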

An Animated Guide©: Proc UCM (Unobserved Components Model)
Russ Lavery, Contractor for ASG, Inc., PDF

May 22, 2006

Statespace in SAS

Statespace in SAS/ETS.

The STATESPACE procedure analyzes and forecasts multivariate
time series using the state space model. The STATESPACE procedure
is appropriate for jointly forecasting several related time series that
have dynamic interactions. By taking into account the autocorrelations
among the whole set of variables, the STATESPACE procedure may
give better forecasts than methods that model each series separately.
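
A minimal sketch of a joint forecast with PROC STATESPACE, assuming a hypothetical data set `series` with a date variable `date` and two related series `x` and `y`.

```sas
/* Jointly forecast two related series 12 periods ahead. */
proc statespace data=series out=fcst lead=12;
  var x(1) y(1);   /* (1) requests first differencing of each series */
  id date;
run;

/* Forecasts are written to the OUT= data set. */
proc print data=fcst;
run;
```

Whether differencing is appropriate depends on the data; the `(1)` orders here are only an illustration.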

December 19, 2005

Npar1way, out

NPAR1WAY
npar1way out

proc npar1way d data=mydata;
class y;
var x;
run;

Produces the all-powerful two-sample KS statistic,
where x is used to predict y, y=0,1.
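
A sketch of capturing the statistics in a data set with the OUTPUT statement, assuming the same `mydata`, `x` and `y` as above; the output name `ks_stats` is hypothetical.

```sas
/* Save the empirical distribution function (EDF) statistics, */
/* which include the Kolmogorov-Smirnov results.              */
proc npar1way data=mydata edf noprint;
  class y;                     /* two groups: y = 0 and y = 1 */
  var x;
  output out=ks_stats edf;     /* EDF statistics to a data set */
run;

proc print data=ks_stats;
run;
```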

October 15, 2005

Joint regression analysis

Joint regression analysis is used to study genotype-environment interaction:
genotype effects and/or interaction effects within individual
environments are related to environmental effects.

The interaction sum of squares is divided into two parts:
* one part represents the heterogeneity of linear regression
coefficients while
* the second represents the pooled deviations from individual
regression lines.

R. J. (Bob) Baker

September 8, 2005

Hospital Length of Stay: Mean or Median Regression

Length of stay (LOS) is an important measure of hospital activity and
health care utilization, but its empirical distribution is often
positively skewed.

Median regression appears to be a suitable alternative to analyze
the clustered and positively skewed LOS, without transforming and
trimming the data arbitrarily.


July 19, 2005

sas proc quantreg for quantile regression

Some PROC QUANTREG features are:

* Implements the simplex, interior point, and smoothing algorithms for
estimation

* Provides three methods to compute confidence intervals for the
regression quantile parameter: sparsity, rank, and resampling.

* Provides two methods to compute the covariance and correlation
matrices of the estimated parameters: an asymptotic method and a
bootstrap method

* Provides two tests for the regression parameter estimates: the Wald
test and a likelihood ratio test

* Uses robust multivariate location and scale estimates for leverage
point detection

* Multithreaded for parallel computing when multiple processors are
available
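
A hedged sketch of a median regression that exercises a few of the features listed above. The data set `stays` and the variables `los`, `age` and `severity` are hypothetical.

```sas
/* Median (0.5 quantile) regression of length of stay. */
proc quantreg data=stays algorithm=interior ci=resampling;
  model los = age severity / quantile=0.5;
  test age / wald lr;          /* Wald and likelihood ratio tests for age */
run;
```

ALGORITHM= can also be set to SIMPLEX or SMOOTH, and CI= to SPARSITY or RANK, matching the options in the list.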

[PDF, *]

July 17, 2005

SAS examples with explanation at ucla.edu/stat/SAS/

SAS examples with explanation abound at UCLA: 1, 2.

July 14, 2005

sas proc sql joins data merging

DATA step MERGE statements corresponding to the various PROC SQL joins.

INNER JOIN

merge data1 (in=a) data2 (in=b); by id;
if a and b;

LEFT JOIN

merge data1 (in=a) data2 (in=b); by id;
if a;

RIGHT JOIN

merge data1 (in=a) data2 (in=b); by id;
if b;

FULL JOIN

merge data1 (in=a) data2 (in=b); by id;
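
The fragments above need a surrounding DATA step and sorted inputs to run. A minimal sketch, assuming `data1` and `data2` both carry the key `id`; the four output data set names are hypothetical.

```sas
/* MERGE with BY requires both inputs sorted by the key. */
proc sort data=data1; by id; run;
proc sort data=data2; by id; run;

/* One pass that writes all four join flavours. */
data inner_j left_j right_j full_j;
  merge data1 (in=a) data2 (in=b);
  by id;
  if a and b then output inner_j;   /* id present in both inputs */
  if a       then output left_j;    /* every id from data1       */
  if b       then output right_j;   /* every id from data2       */
  output full_j;                    /* every id from either      */
run;
```

The correspondence with SQL joins holds when `id` is unique in at least one input. With repeated keys on both sides, MERGE pairs rows in sequence instead of forming the SQL cross product, and same-named non-key variables from `data2` overwrite those from `data1`.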


July 12, 2005

sas proc sql assign value to macro variable

Calculate a value and assign it to a macro variable with
sas proc sql


proc sql noprint;
select ssn format=9. into :ssnok
separated by ' ' from ssn_list;
quit;

%put &ssnok;
123456789 234567890 345678901 456789012
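
The example above collects a list of values. A sketch of the single calculated value case, assuming the same `ssn_list` table; the macro variable name `n_ssn` is hypothetical, and the TRIMMED keyword assumes SAS 9.3 or later.

```sas
/* Store a computed scalar (row count) in a macro variable. */
proc sql noprint;
  select count(*) into :n_ssn trimmed   /* TRIMMED strips padding blanks */
  from ssn_list;
quit;

%put Number of rows: &n_ssn;
```

On older releases, `%let n_ssn = &n_ssn;` after the QUIT removes the padding instead.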


June 22, 2005

Good software documentation requires code formatting

Higher Order JavaScript has a decent sample of code formatting
for documentation.

CSS:

pre {
background-color: #e4f0e4;
margin-left: 1em;
border-top: 1px #d0d0d0 solid;
border-left: 1px #d0d0d0 solid;
padding: .6em 0;
}

What's the best way to represent SAS code?

June 4, 2005

Bruce Gilsen, PROGRAM EFFICIENCY

Bruce Gilsen, Federal Reserve Board, offers advice on how
to program SAS efficiently.


June 1, 2005

two models in one regression

In SAS, you can estimate two distinct models with one call to proc reg.

proc reg data=USPopulation outest=est tableout alpha=0.1;
m1: model Population=Year/noprint;
m2: model Population=Year YearSq/noprint;
run;
quit;
proc print data=est;
run;


April 2, 2005

date simulation

Want a simulated list of week-ending dates, starting
at 2005 March 28 and ending at 2005 Nov 21.

Find the magic number 16515 by trial and error,
or with a SAS date function.

Find the magic number 250 by trial and error,
or with a SAS date function.

data plan;
datebase = 16515;
do i = 1 to 250 by 7;
datex = datebase + i;
week = round (1+i/7);
output;
end;
format datex mmddyy10.;
run;
proc print data = plan; var datex week; run;
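
A sketch of the same loop written with SAS date literals, so neither magic number is needed. Note that 16515 is the SAS date value for March 20, 2005, so the loop above actually begins at March 21; the version below assumes the intended start is March 28, 2005, as stated in the text.

```sas
/* Weekly dates from 28MAR2005 through 21NOV2005, no magic numbers. */
data plan2;
  week = 0;
  do datex = '28MAR2005'd to '21NOV2005'd by 7;  /* step one week at a time */
    week = week + 1;                             /* running week counter    */
    output;
  end;
  format datex mmddyy10.;
run;

proc print data=plan2;
  var datex week;
run;
```

To reproduce the original output exactly, change the start literal to `'21MAR2005'd`.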


January 15, 2005

BLAST and SAS: String matching algorithms and their application

Was the author really Shakespeare?- String matching algorithms and
their application
Raymond Wan
Gilead Sciences, Inc.

String matching, especially approximate (fuzzy) string matching, is
important to a lot of different fields in computer science. It is
used in CRM, database cleaning, bibliometrics, and especially
bioinformatics. In fact, a large portion of the supercomputing
resources in the world is now devoted to an algorithm called BLAST
(Basic Local Alignment Search Tool), which is a fuzzy string matching
algorithm. At the heart of all these applications is the need to
measure how different two text strings are from each other. We will look
at two different ways to build a measure and how it can be
implemented in SAS.
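
The abstract does not say which two measures the paper builds. As an illustration only, here is a sketch using two edit-distance functions that ship with SAS 9, COMPLEV and COMPGED; the sample strings are made up.

```sas
/* Compare two spellings with built-in edit-distance functions. */
data _null_;
  a = 'Shakespeare';
  b = 'Shakspere';
  lev = complev(a, b);   /* Levenshtein edit distance                     */
  ged = compged(a, b);   /* generalized edit distance (weighted operations) */
  put lev= ged=;         /* write both measures to the log                */
run;
```

Smaller values mean more similar strings; a distance of zero means the strings are identical.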


December 28, 2004

Rick Aster / SAS programming info

Rick Aster's SAS info aka programming secrets:
Professional SAS Programming Shortcuts and Professional SAS Programming Logic.

December 25, 2004

SAS Proc Tabulate FAQ

SAS Proc Tabulate FAQ [ucla]
with a few examples.

December 24, 2004

SAS ODS introduction

Use SAS to generate nice looking statistical report documents.

Excel (XLS) file

ods html file = "c:\temp\data.xls";

proc print data=new; run;
ods html close;

Web (HTML) file

ods html file = "body.html";

proc print data=new; run;
ods html close;


December 23, 2004

comp.soft-sys.sas SAS newsgroup

comp.soft-sys.sas SAS newsgroup at googlegroups.
Hands on statistical computing howto.

December 22, 2004

SAS blog

Weblogsinc's SAS blog is more about business than statistics.

Update 2006 Dec 01: This SAS weblog is no longer active, but archives are online.
Demoted to blogroll4.

Update 2005 Jan 15: Welcome Weblogsinc SAS blog readers.
There are more Coruscation SAS blog items.

December 11, 2004

SAS percentiles not automatically calculated?

SAS tip: How do I obtain percentiles not automatically calculated?

proc univariate data=hsb noprint;
var write;
output out=percentiles1 pctlpts=33 45 80 to 90 by 2 pctlpre=P;
run;
proc print data=percentiles1;run;

December 4, 2004

Histograms, superimposed with fitted probability density curves

PROC CAPABILITY is a component of SAS/QC (Quality Control). The
features described below are now available in PROC UNIVARIATE (part of base SAS).

# Histograms and comparative histograms. Optionally, these can be
superimposed with fitted probability density curves for various
distributions and kernel density estimates.

# Cumulative distribution function plots (cdf plots). Optionally,
these can be superimposed with specification limits and probability
distribution curves for various distributions.

# Quantile-quantile plots (Q-Q plots), probability plots, and
probability-probability plots (P-P plots). These plots facilitate the
comparison of a data distribution with various theoretical
distributions.

# Goodness-of-fit tests for a variety of distributions including the
normal.

# Statistical intervals (prediction, tolerance, and confidence
intervals) for a normal population.

# The ability to inset summary statistics and capability indices in
plots produced on a graphics device.
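
A minimal sketch of a few of these features in PROC UNIVARIATE, assuming the `hsb` data set and `write` variable used in the percentiles post.

```sas
/* Histogram with fitted normal curve and kernel density estimate, */
/* an inset of summary statistics, and a normal Q-Q plot.          */
proc univariate data=hsb;
  var write;
  histogram write / normal kernel;           /* also prints goodness-of-fit tests  */
  inset n mean std / position=ne;            /* summary stats inside the histogram */
  qqplot write / normal(mu=est sigma=est);   /* reference line from estimates      */
run;
```

The INSET statement applies to the plot statement that precedes it, so placing it after HISTOGRAM puts the statistics on the histogram.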

December 3, 2004

Paul Dickman

Paul Dickman has some good SAS tips for statistical programming
and handling datasets and simple graphics.

November 30, 2004

sconsig's SAS tips

SAS Consultants' Special Interest Group (sconsig)'s SAS tips,
a sample of SAS-L, SUGI, and support.sas.com.

November 29, 2004

SAS doc

SAS docs 9.1.2 (Official version)

Mirrors:
OK State SAS documentation.
Queens' SAS documentation.
Topics in SAS Programming [UNC]