STAN version of multiple linear regression

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
Report Content as Inappropriate

STAN version of multiple linear regression

Francis Campos
I worked up a version of linear regression in STAN. It is fairly similar to my recent logistic regression code. The model is
modelstring = "
data {
  int<lower=0> nPredictors;
  int<lower=0> nData;
  real y[nData];
  matrix[nData,nPredictors] x;
  vector[nData] b0vec;
parameters {
  real b0;
  vector[nPredictors] b;
  real<lower=0> tau;
transformed parameters {
  vector[nData] mu;
  vector[nData] offset;
  offset <- b0vec * b0;
  mu <- x * b + offset;
model {
  b0 ~ normal(0,3.2);
  tau ~ gamma(0.1,0.1);
  for (d in 1:nPredictors)
    b[d] ~ normal(0,2);
  y ~ normal(mu, 1/sqrt(tau));

My aim was to use a vectorized calculation of y. Notice that there are no indices in y~normal(mu,1/sqrt(tau)). This drove me to use transformed parameters as shown. I timed two runs of this model at 4 min 30s and 3min 45s on a data set with 25,000 rows. Unlike the logistic model, the Jags version, straight out of DBDA, ran nearly as quickly at 5 min. 40s. The burn-in and total saved steps were the same and the STAN time included compiling the code.

I'll include the complete code as an attachment. I left in some of the code that needed modification in the form of comments starting with two hash marks. The main changes were caused by:
the need, or choice, to not read the y data as a matrix
the different structure of the extracted samples.

I tried various versions of the model and they all ran in about the same time. I don't claim this one is the best or even good.

Francis Campos