I worked up a version of linear regression in STAN. It is fairly similar to my recent logistic regression code. The model is

modelstring = "

data {

int<lower=0> nPredictors;

int<lower=0> nData;

real y[nData];

matrix[nData,nPredictors] x;

vector[nData] b0vec;

}

parameters {

real b0;

vector[nPredictors] b;

real<lower=0> tau;

}

transformed parameters {

vector[nData] mu;

vector[nData] offset;

offset <- b0vec * b0;

mu <- x * b + offset;

}

model {

b0 ~ normal(0,3.2);

tau ~ gamma(0.1,0.1);

for (d in 1:nPredictors)

b[d] ~ normal(0,2);

y ~ normal(mu, 1/sqrt(tau));

}

"

My aim was to use a vectorized calculation of y. Notice that there are no indices in y~normal(mu,1/sqrt(tau)). This drove me to use transformed parameters as shown. I timed two runs of this model at 4 min 30s and 3min 45s on a data set with 25,000 rows. Unlike the logistic model, the Jags version, straight out of DBDA, ran nearly as quickly at 5 min. 40s. The burn-in and total saved steps were the same and the STAN time included compiling the code.

I'll include the complete code as an attachment. I left in some of the code that needed modification in the form of comments starting with two hash marks. The main changes were caused by:

the need, or choice, to not read the y data as a matrix

the different structure of the extracted samples.

I tried various versions of the model and they all ran in about the same time. I don't claim this one is the best or even good.

Francis Campos

MultLinearcode_v6.txt