

It took some time, but I was able to switch to JAGS and figure out how to use the Poisson zeros trick to model a distribution that isn't included in JAGS  the skewed normal distribution. The code runs and returns a result. In order to make sure that it is working correctly, I've used rsn (random skewed normal) to create a data set with known parameters and then I see how closely the model comes to yielding those known parameters. I've used sample sizes of 50 and 500 and modified the priors and modeling parameters multiple times. (I've even "cheated" and set the initial values to the correct values, etc.) But the outputs do not seem close to me.
My parameters are:
Location = 0
Scale = 1
Shape = 0.5
Across multiple modeling runs, here are the parameter values I'm getting:
Location: Average = 0.63 (Std = 0.29)
Scale: Average = 0.48 (Std = 0.08)
Shape: Average = 0.98 (Std = 0.12)
Any thoughts?
For example, are these values actually the best I should expect?
Of course, it's possible that my equation is wrong, although I've checked it many times (and had someone else check it). So, I doubt this is the problem. But I'm willing to consider anything...
Any/all advice/suggestions would be most welcome! Thanks!
Gwen

Administrator

Without seeing the complete programming code, it is difficult to know what's really going on. But two general questions come to mind:
First, are you sure that your formula for the skewed normal (used to specify the loglikelihood for the Poisson zero) is parameterized the same way as the rsn random value generator you used? If they are parameterized differently, the results won't match.
Second, if the posterior distributions are skewed, then the means of the posterior won't necessarily match the generating values very well (I think). Try looking at the modes of the posteriors instead of the means.
On Fri, Aug 9, 2013 at 8:52 AM, Gwendolyn Campbell [via Doing Bayesian Data Analysis] <[hidden email]> wrote:
It took some time, but I was able to switch to JAGS and figure out how to use the Poisson zeros trick to model a distribution that isn't included in JAGS  the skewed normal distribution. The code runs and returns a result. In order to make sure that it is working correctly, I've used rsn (random skewed normal) to create a data set with known parameters and then I see how closely the model comes to yielding those known parameters. I've used sample sizes of 50 and 500 and modified the priors and modeling parameters multiple times. (I've even "cheated" and set the initial values to the correct values, etc.) But the outputs do not seem close to me.
My parameters are:
Location = 0
Scale = 1
Shape = 0.5
Across multiple modeling runs, here are the parameter values I'm getting:
Location: Average = 0.63 (Std = 0.29)
Scale: Average = 0.48 (Std = 0.08)
Shape: Average = 0.98 (Std = 0.12)
Any thoughts?
For example, are these values actually the best I should expect?
Of course, it's possible that my equation is wrong, although I've checked it many times (and had someone else check it). So, I doubt this is the problem. But I'm willing to consider anything...
Any/all advice/suggestions would be most welcome! Thanks!
Gwen
To start a new topic under Doing Bayesian Data Analysis, email [hidden email]
To unsubscribe from Doing Bayesian Data Analysis, click here.
NAML


Thanks, John, for getting back to me so quickly. I had thought of the issue of mismatched parameters, but it looks like the pdf I used for the skewed normal distribution (from wikipedia) and the rsn function use the same parameters... They certainly use the same parameter names and I aligned them in my code. I don't think this is it.
I'm not 100% sure I understand your second point. It may be that I didn't explain clearly how I got the numbers that I originally posted. Using one data set, I ran the model code 15 times. Each time I changed something about the modeling parameters  maybe the number of chains, maybe the initial values of the priors, etc. Each time I ran it, it output a value for each parameter (location, scale and shape) of the skewed normal distribution that it converged on. The averages that I reported were simply the averages across those 15 runs. Does that make sense? Is it relevant to your comment? If not, then I'm sorry  let me know and I'll try to make sense out of your comment again.
Thanks,
Gwen


PS  Here's the code:
(with alex=location, bob=scale and chip=shape)

require(rjags)
rm(list=ls(all=TRUE))
#
# THE MODEL.
modelstring = "
model {
for (i in 1:50) {
zeros[i] ~ dpois(phi[i])
phi[i] < log(2.5)+log(abs(pow(bob,2)/pow(chip,2)*exp(pow(chip,2))*(exp(pow(chip,2))1)))+log(abs(bobchip*(x[i]alex)))+pow((1/chip)*log(abs(1((1/bob)*(chip)*(x[i]alex))))(alexbob/chip*(exp((pow(chip,2))/2))1),2)/(2*pow((pow(bob,2)/pow(chip,2)*exp(pow(chip,2))*(exp((pow(chip,2)))1)),2))
}
alex ~ dunif(2,2)
bob ~ dunif(0.01,5)
chip ~ dunif(0.01,5)
}
" # close quote for modelstring
# Write model to a file, and send to BUGS:
writeLines(modelstring,con="model.txt")
#
# THE DATA.
# These data were created with parameter values [n=50, location=0, scale=1, shape=0.5]
dataList = list (
zeros = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
x = c(1.65534362, 1.33096199, 1.15361727, 1.14141452, 1.01578541,
0.88231001, 0.86824978, 0.66212271, 0.59999216, 0.47238434,
0.46603680, 0.38355513, 0.28087996, 0.19715436, 0.10505644,
0.01120428, 0.21384728, 0.22846082, 0.23366761, 0.26649210,
0.28282272, 0.32582633, 0.34812280, 0.42625461, 0.44375500,
0.54342404, 0.55071416, 0.57397426, 0.57490333, 0.66954006,
0.74401185, 0.76160107, 0.77146795, 0.78891622, 0.79762613,
0.92826248, 1.01350578, 1.07251810, 1.10103133, 1.24612441,
1.27405717, 1.28771869, 1.29966906, 1.35210939, 1.42059526,
1.50968752, 1.55416321, 1.55942491, 1.69682208, 1.92283079)
)
initsList = list(
alex = c(0),
bob = c(1),
chip = c(0.5)
)
#
# RUN THE CHAINS
parameters = c ("alex", "bob", "chip")
adaptSteps = 1000
burnInSteps = 1000
nChains = 7
numSavedSteps = 50000
thinSteps = 1
nIter = ceiling ( (numSavedSteps * thinSteps) / nChains)
jagsModel = jags.model( "model.txt" , data=dataList , inits=initsList ,
n.chains=nChains , n.adapt=adaptSteps )
cat("Burning in MCMC chain...\n")
update(jagsModel , n.iter=burnInSteps)
cat("Sampling final MCMC chain...\n")
codaSamples = coda.samples ( jagsModel , variable.names = parameters, n.iter=nIter, thin = thinSteps)
#
# EXAMINE THE RESULTS
mcmcChain = as.matrix( codaSamples )
chipSample = mcmcChain[,"chip"]
bobSample = mcmcChain[,"bob"]
alexSample = mcmcChain[,"alex"]
summary (codaSamples)

Administrator

To check that you can recover the generating parameters, you want to do a single Bayesian estimate for a very large data set. In other words, generate random data from the skewnormal using N=1,000 (say), and then run your Bayesian analysis once on that large data set. The estimated parameters should be reasonably close to the generating parameters. The larger the N, the closer the estimated parameters should be.
(In my previous message I was referring to how to summarize the posterior of that single Bayesian estimate. Perhaps look at the modes, not the means, of the marginal posteriors.) Also, you might be able to specify the negativelog likelihood of the skew normal more easily. JAGS has the standardized cumulative normal built in as the phi() function. It also claims to have the normal density built in as a function, not only as a distribution. Thus, you could specify the negloglike as
phi < log( (2/b) * dnorm( (xa)/b ,0,1) * phi( c*((xa)/b) ) ) I have used phi() in JAGS before, so I know it works. But I have not used dnorm() as a function, so I am not sure about that. If dnorm() does not work as a function, but only as a distribution, then the dnorm() part above would have to be typed out as the explicit Gaussian formula.
AND, it might work better in practice to distribute the log and take the sum (as you did): phi < log(2/b) + log(dnorm((xa)/b,0,1)) + log(phi(c*(xa)/b)) On Fri, Aug 9, 2013 at 1:00 PM, Gwendolyn Campbell [via Doing Bayesian Data Analysis] <[hidden email]> wrote:
PS  Here's the code:
(with alex=location, bob=scale and chip=shape)

require(rjags)
rm(list=ls(all=TRUE))
#
# THE MODEL.
modelstring = "
model {
for (i in 1:50) {
zeros[i] ~ dpois(phi[i])
phi[i] < log(2.5)+log(abs(pow(bob,2)/pow(chip,2)*exp(pow(chip,2))*(exp(pow(chip,2))1)))+log(abs(bobchip*(x[i]alex)))+pow((1/chip)*log(abs(1((1/bob)*(chip)*(x[i]alex))))(alexbob/chip*(exp((pow(chip,2))/2))1),2)/(2*pow((pow(bob,2)/pow(chip,2)*exp(pow(chip,2))*(exp((pow(chip,2)))1)),2))
}
alex ~ dunif(2,2)
bob ~ dunif(0.01,5)
chip ~ dunif(0.01,5)
}
" # close quote for modelstring
# Write model to a file, and send to BUGS:
writeLines(modelstring,con="model.txt")
#
# THE DATA.
# These data were created with parameter values [n=50, location=0, scale=1, shape=0.5]
dataList = list (
zeros = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
x = c(1.65534362, 1.33096199, 1.15361727, 1.14141452, 1.01578541,
0.88231001, 0.86824978, 0.66212271, 0.59999216, 0.47238434,
0.46603680, 0.38355513, 0.28087996, 0.19715436, 0.10505644,
0.01120428, 0.21384728, 0.22846082, 0.23366761, 0.26649210,
0.28282272, 0.32582633, 0.34812280, 0.42625461, 0.44375500,
0.54342404, 0.55071416, 0.57397426, 0.57490333, 0.66954006,
0.74401185, 0.76160107, 0.77146795, 0.78891622, 0.79762613,
0.92826248, 1.01350578, 1.07251810, 1.10103133, 1.24612441,
1.27405717, 1.28771869, 1.29966906, 1.35210939, 1.42059526,
1.50968752, 1.55416321, 1.55942491, 1.69682208, 1.92283079)
)
initsList = list(
alex = c(0),
bob = c(1),
chip = c(0.5)
)
#
# RUN THE CHAINS
parameters = c ("alex", "bob", "chip")
adaptSteps = 1000
burnInSteps = 1000
nChains = 7
numSavedSteps = 50000
thinSteps = 1
nIter = ceiling ( (numSavedSteps * thinSteps) / nChains)
jagsModel = jags.model( "model.txt" , data=dataList , inits=initsList ,
n.chains=nChains , n.adapt=adaptSteps )
cat("Burning in MCMC chain...\n")
update(jagsModel , n.iter=burnInSteps)
cat("Sampling final MCMC chain...\n")
codaSamples = coda.samples ( jagsModel , variable.names = parameters, n.iter=nIter, thin = thinSteps)
#
# EXAMINE THE RESULTS
mcmcChain = as.matrix( codaSamples )
chipSample = mcmcChain[,"chip"]
bobSample = mcmcChain[,"bob"]
alexSample = mcmcChain[,"alex"]
summary (codaSamples)
To start a new topic under Doing Bayesian Data Analysis, email [hidden email]
To unsubscribe from Doing Bayesian Data Analysis, click here.
NAML


Hey John,
I have already tried it with a sample size of 500 (without success), but I'm happy to do it with 1,000. I'll give it a try and see what happens. :)
And thanks for the simpler versions of the formula.
AND I'll revisit your previous email and figure out what's going on. ;)
I guess the thing that is troubling me is that our actual data sets contain fewer than 20 points each  so if it requires 1,000 data points to be able to get to the parameters of the underlying populations, then how much confidence can we have in any results we gain from our data? :(
Thanks again,
Gwen

Administrator

In testing your model and program, you use a single very large simulated data set, generated from known parameter values, because that large data set will yield a precise estimate of the parameters. If the estimated parameter values match the generating parameter values, you have evidence that your model and program are set up correctly. Try it for several different generating values of the parameters.
When you apply the Bayesian process to small data sets, the method reveals the uncertainty  but it specifies the uncertainty with great accuracy (i.e., the posterior distribution is well represented by a large MCMC chain) despite the smallness of the data set. This is a beauty of the Bayesian approach.
There are at least a couple ways to deal with small data sets. One is to incorporate prior knowledge into the model structure and/or parameter distribution priors. A second, which seems it might be useful for your data, is to use hierarchical structure that estimates individual and grouplevel parameters. You say your data contain 20 points "each", suggesting that you have many individual data sets. Use hierarchical structure across those data sets if you can.
On Fri, Aug 9, 2013 at 7:22 PM, Gwendolyn Campbell [via Doing Bayesian Data Analysis] <[hidden email]> wrote:
Hey John,
I have already tried it with a sample size of 500 (without success), but I'm happy to do it with 1,000. I'll give it a try and see what happens. :)
And thanks for the simpler versions of the formula.
AND I'll revisit your previous email and figure out what's going on. ;)
I guess the thing that is troubling me is that our actual data sets contain fewer than 20 points each  so if it requires 1,000 data points to be able to get to the parameters of the underlying populations, then how much confidence can we have in any results we gain from our data? :(
Thanks again,
Gwen
To start a new topic under Doing Bayesian Data Analysis, email [hidden email]
To unsubscribe from Doing Bayesian Data Analysis, click here.
NAML


Hey John,
Success! Your version of the equation  using both functions (dnorm & phi) worked and matched the 3 parameters for a data set with 1,000 points AND a data set for 500 points. It didn't do as well on a data set with only 50 points  as could be expected...
We have 5 sets of data, each with just shy of 20 data points, but at the moment we are particularly interested in comparing 3 of them.
We've run the oneway anova with nonhomogeneous variance JAGS code (assuming normal distribution)  but we don't think our data are distributed normally and want to run it again using this poisson zeros trick to allow a skewed normal distribution.
Now that I am confident that I've got a working formula  thank you!  I'll get to work incorporating it into the code. And I'll see about using a hierarchical approach, as you suggested.
Thanks again for all your detailed help  you may not remember, but you gave me quite a lot of help a couple of years ago when I was first trying to get R and BUGS installed and working together on my laptop. I don't know very many people who are so willing to give so much of their time to help strangers with newbie problems like mine... (One of our colleagues took one of your 2 week summer courses and that's how we learned about the Bayesian approach to analyzing statistical data in the first place.) So, thanks again! And I'll let you know what happens. :)
Take care,
Gwen

Administrator

Glad to hear it's working!
With only 5 data sets (of 20 points each), the hierarchical approach might not buy you much, unless the 5 data sets are remarkably similar to each other. But, you might give it a whirl just for the conceptual exercise!
And thanks for the nice remarks  appreciated!
John
On Mon, Aug 12, 2013 at 10:01 AM, Gwendolyn Campbell [via Doing Bayesian Data Analysis] <[hidden email]> wrote:
Hey John,
Success! Your version of the equation  using both functions (dnorm & phi) worked and matched the 3 parameters for a data set with 1,000 points AND a data set for 500 points. It didn't do as well on a data set with only 50 points  as could be expected...
We have 5 sets of data, each with just shy of 20 data points, but at the moment we are particularly interested in comparing 3 of them.
We've run the oneway anova with nonhomogeneous variance JAGS code (assuming normal distribution)  but we don't think our data are distributed normally and want to run it again using this poisson zeros trick to allow a skewed normal distribution.
Now that I am confident that I've got a working formula  thank you!  I'll get to work incorporating it into the code. And I'll see about using a hierarchical approach, as you suggested.
Thanks again for all your detailed help  you may not remember, but you gave me quite a lot of help a couple of years ago when I was first trying to get R and BUGS installed and working together on my laptop. I don't know very many people who are so willing to give so much of their time to help strangers with newbie problems like mine... (One of our colleagues took one of your 2 week summer courses and that's how we learned about the Bayesian approach to analyzing statistical data in the first place.) So, thanks again! And I'll let you know what happens. :)
Take care,
Gwen
To start a new topic under Doing Bayesian Data Analysis, email [hidden email]
To unsubscribe from Doing Bayesian Data Analysis, click here.
NAML


As a learning exercise I am trying to recreate your Skew Normal fitting method, but have not been able to get reasonable fits on the randomly generated SN data. Could you look at my code below to see if my model specification is correct?
I have not used the zeros trick before, but if I can get this to work, I may try expanding the model to a skewT distribution to allow for higher kurtosis modeling.
After specifying the data and zeros:
 rsnData<rsn(n=1000,xi=0,omega=1,alpha=0.5)
 rsnZeros<c(rep(0,length(rsnData)))
I ran the program for 500000 samples (i tried 10 times this too, but no change...)
The posterior plots of the sampled parameters are not even close to the true parameters.
Any thoughts? I would greatly appreciate some skilled advice!
code:
####################################################
# (with alex=location, bob=scale and chip=shape)

require(rjags)
library(sn)
# rm(list=ls(all=TRUE))
#
#old# phi[i] < log(2.5)+log(abs(pow(bob,2)/pow(chip,2)*exp(pow(chip,2))*(exp(pow(chip,2))1)))+log(abs(bobchip*(x[i]alex)))+pow((1/chip)*log(abs(1((1/bob)*(chip)*(x[i]alex))))(alexbob/chip*(exp((pow(chip,2))/2))1),2)/(2*pow((pow(bob,2)/pow(chip,2)*exp(pow(chip,2))*(exp((pow(chip,2)))1)),2))
# THE MODEL.
modelstring = "
model {
for (i in 1:50) {
zeros[i] ~ dpois(phi[i])
phi[i] < log(2/chip)+log(dnorm((x[i]alex)/bob,0,1))+log(phi(chip*(x[i]alex)/bob))
}
alex ~ dunif(2,2)
bob ~ dunif(0.01,5)
chip ~ dunif(0.01,5)
}
" # close quote for modelstring
# Write model to a file, and send to BUGS:
writeLines(modelstring,con="model.txt")
#
# THE DATA.
# These data were created with parameter values [n=50, location=0, scale=1, shape=0.5]
# define data in workspace# rsnData<rsn(n=1000,xi=0,omega=1,alpha=0.5)
# define zeros in workspace# rsnZeros<c(rep(0,length(rsnData)))
dataList = list (
zeros = rsnZeros,
x = rsnData
)
#cheated here with inits = to actual params
initsList = list(
alex = c(0),
bob = c(1),
chip = c(0.5)
)
#
# RUN THE CHAINS
parameters = c ("alex", "bob", "chip")
adaptSteps = 1000
burnInSteps = 1000
nChains = 7
numSavedSteps = 500000
thinSteps = 1
nIter = ceiling ( (numSavedSteps * thinSteps) / nChains)
jagsModel = jags.model( "model.txt" , data=dataList , inits=initsList ,
n.chains=nChains , n.adapt=adaptSteps )
cat("Burning in MCMC chain...\n")
update(jagsModel , n.iter=burnInSteps)
cat("Sampling final MCMC chain...\n")
codaSamples = coda.samples ( jagsModel , variable.names = parameters, n.iter=nIter, thin = thinSteps)
#
# EXAMINE THE RESULTS
mcmcChain = as.matrix( codaSamples )
chipSample = mcmcChain[,"chip"]
bobSample = mcmcChain[,"bob"]
alexSample = mcmcChain[,"alex"]
a.avg<mean(alexSample) ; plot(density(alexSample))
b.avg<mean(bobSample) ; plot(density(bobSample))
c.avg<mean(chipSample) ; plot(density(chipSample))
summary (codaSamples)
# plot the random data with known parameters
plot(density(rsnData), xlim=c(10,20))
xPlot< seq(10,20, legnth=5000)
ySn< dsn(xPlot, a.avg, b.avg,c.avg)
lines(ySn)
#plot actual
ySn.act<dsn(xPlot,xi=0,omega=1,alpha=0.5)
lines(ySn.act)
######################################

Administrator

I have not tried running your code, but looking it over suggests this thought: The argument to dpois(phi) must be positive. Therefore phi, which is log(pdf(dataparameters)), must be positive. To do that, add a constant that is big enough to make it positive for any data & parameter combination.
Sorry if you already did that and I didn't spot it in the code, or if log(pdf(dataparam)) is already always positive...


John,
I added the Cterm as you suggested to ensure phi>0, but am still getting a poor fit of the SN parameters. The image below represents the JAGs results (n=5000000 samples), with the blue line representing the fit SN, the red as the 'true' distribution, and the black as the normalized density of the generated rsn data.
The location parameter (alex) is close, but the other two parameter estimates do not even include the true values!
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
alex 1.03514 0.23274 0.16914 0.56997 1.35645
bob 2.94024 3.93393 4.40042 4.73211 4.97501
chip 0.01001 0.01006 0.01014 0.01029 0.01078
I am at a loss on what the issue could be. Any additional thoughts?
Thanks,
Robert

