How to model a situation where what we observe is the sum of two (unknown) variables?

Adam Ryczkowski
(I also posted this question on CrossValidated, http://stats.stackexchange.com/questions/60107/how-to-model-sum-of-two-values-in-jags , and on the official JAGS forum, https://sourceforge.net/p/mcmc-jags/discussion/610036/thread/fd973859 . I hope that by posting it in several places I can reach a broader audience; it is relatively hard to get help with Bayesian statistics at this time of year... Of course, when I learn anything new about this matter I'll update the post.)

Imagine that we measured two values and we know that, in reality, one measurement directly corresponds to the latent variable "s1", while the other measurement is in fact the sum of two latent values: "s1" and an unknown "s2". Given the measurements input[1] for "s1" and input[2] for "s1+s2", what is our belief about the distributions of "s1" and "s2" separately? We assume that s1 and s2 come from normal distributions with the same variance. We measure each input[i] together with its SE = 0.2, which acts as a weight in this "fixed effects model".

The straightforward model in JAGS that came to my mind is this:

    var
       input[2],  # observed data
       tau,
       s1,
       s2
    model{
    input[1] ~ dnorm(s1, 25)       # precision 25 = 1/SE^2 = 1/0.2^2
    input[2] ~ dnorm(s1+s2, 25)
    s1 ~ dnorm(0, tau)
    s2 ~ dnorm(0, tau)
    tau ~ dgamma(0.001, 0.001)
    }

As I understand it, the model states something like this:

A) Model s1 as the mean of a normal distribution, so that input[1] is a probable outcome from it.
B) Favor values of s1 and s2 such that input[2] is a probable outcome from a normal distribution with mean s1+s2 and SD = 0.2.

Because the SD of both normal distributions is the same, the model should choose s1 and s2 such that input[1] and input[2] are (on average) equally plausible random samples from N(s1, 0.2) and N(s1+s2, 0.2).

Condition B) should make the sampler favor adjusting s2 over s1, since it is the only place that puts any stochastic constraint on s2 (apart from a vague prior), in contrast to s1, which is already constrained to make input[1] a plausible result from N(s1, 0.2).

So I would expect the model to yield mean(s1) ≈ input[1] and mean(s2) ≈ input[2] − mean(s1)...
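This expectation can be sanity-checked in closed form outside JAGS. The sketch below (Python/NumPy, my own illustration; it assumes the priors on s1 and s2 are made essentially flat, i.e. tau ≈ 0, and plugs in the example values input[1] = 0.3, input[2] = 0.9) computes the exact Gaussian posterior mean of (s1, s2):

```python
import numpy as np

# Measurements and their precision (SE = 0.2  =>  precision = 1/0.2**2 = 25).
y1, y2 = 0.3, 0.9
prec = 25.0
tau = 1e-8  # essentially flat priors on s1 and s2

# With Gaussian likelihoods and priors, the posterior of (s1, s2) is Gaussian
# with precision matrix P and mean solving P @ mean = b, from:
#   y1 ~ N(s1, 1/prec),  y2 ~ N(s1 + s2, 1/prec),  s1, s2 ~ N(0, 1/tau)
P = np.array([[2 * prec + tau, prec],
              [prec,           prec + tau]])
b = np.array([prec * (y1 + y2), prec * y2])

mean = np.linalg.solve(P, b)
print(mean)  # -> approximately [0.3, 0.6]
```

So under flat priors the intuition holds exactly: s1 ≈ input[1] and s2 ≈ input[2] − input[1].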

Now what I get is something completely different. Where is the error in my understanding?

For example, setting input[1] <- 0.3 and input[2] <- 0.9 produces s1 = 0.485 ± 0.254 and s2 = 0.321 ± 0.174 (mean ± SD), whereas I would expect s1 ≈ 0.3 and s2 ≈ 0.6!

I used 120,000 adaptation and burn-in steps, and I drew 4,000,000 (4 million) samples thinned by a factor of 400.

Maybe I should use the `dsum` distribution (a feature unique to JAGS)?

But then, this model does not run:

    var
       input[2],  # observed data
       tau,
       i1,
       i2
    model{
    input[1] ~ dnorm(i1, 25)
    input[2] ~ dsum(i1, i2)    # dsum: input[2] is the exact sum, with no noise term
    i1 ~ dnorm(0, tau)
    i2 ~ dnorm(0, tau)
    tau ~ dgamma(0.001, 0.001)
    }

I've got the following error:

>     RUNTIME ERROR:
>     Inconsistent arguments for logDensity in distribution dsum

And I am really not sure how to fix it, besides the fact that this model doesn't let me specify the SE of input[2].

Re: How to model a situation where what we observe is the sum of two (unknown) variables?

John K. Kruschke
Administrator

You have a lot of details flying around in there, so I'll just throw out a quick reply about a few things that stand out.

First, in this model

    model{
    input[1] ~ dnorm(s1,25)
    input[2] ~ dnorm(s1+s2,25)
    s1 ~ dnorm(0,tau)
    s2 ~ dnorm(0,tau)
    tau~dgamma(0.001,0.001)
    }

why do you want the priors of s1 and s2 to have an estimated precision? With only two values (s1 and s2) influencing tau, there's not much point in estimating it. Just set tau to a small constant.
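To illustrate this point numerically (a NumPy sketch using the example values input[1] = 0.3, input[2] = 0.9 from the original post, not code from the discussion): for any fixed tau the posterior of (s1, s2) is Gaussian, so its mean has a closed form, and sweeping tau shows how a freely estimated precision moves the answer.

```python
import numpy as np

y1, y2, prec = 0.3, 0.9, 25.0  # data and measurement precision (1/0.2**2)

def posterior_mean(tau):
    """Posterior mean of (s1, s2) when the prior precision tau is held fixed."""
    P = np.array([[2 * prec + tau, prec],
                  [prec,           prec + tau]])
    b = np.array([prec * (y1 + y2), prec * y2])
    return np.linalg.solve(P, b)

small = posterior_mean(1e-6)   # tau fixed to a small constant
large = posterior_mean(10.0)   # the kind of tau a vague gamma prior may visit

print(small)  # -> approximately [0.3, 0.6], the intuitive answer
print(large)  # s2 shrinks markedly and the fitted sum s1+s2 falls below 0.9
```

With tau fixed and small, the posterior mean is essentially (input[1], input[2] − input[1]); letting tau be estimated mixes over larger values and drags the estimates away from that answer.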

Second, in this model

  model{
    input[1] ~ dnorm(i1,25)
    input[2] ~ dsum(i1,i2)
    i1 ~ dnorm(0,tau)
    i2 ~ dnorm(0,tau)
    tau~dgamma(0.001,0.001)
    }

there is an error because i2 is a precision, which must be positive, but its prior is dnorm, which allows negative values.




Re: How to model a situation where what we observe is the sum of two (unknown) variables?

Adam Ryczkowski
Thank you very much for your time and answer

>Second, in this model
>
>  model{
>    input[1] ~ dnorm(i1,25)
>    input[2] ~ dsum(i1,i2)
>    i1 ~ dnorm(0,tau)
>    i2 ~ dnorm(0,tau)
>    tau~dgamma(0.001,0.001)
>   }
>
>there is an error because i2 is a precision, which must be positive, but its prior is dnorm, which allows negative values.


Please tell me, why do you think I should treat i2 as a precision? The JAGS 3.3.0 manual says nothing about any special meaning of the positions of the parameters to the dsum "distribution". Although this part of the manual is not very detailed, I assumed it goes without saying that the arguments of dsum are exchangeable. Or did you confuse dsum with the dnorm distribution?

It's true that I'd very much like to have some sort of precision on the sum, but I have no idea how to model it.

Anyway, I filed a bug report against JAGS about the cryptic error message from dsum.

Re: How to model a situation where what we observe is the sum of two (unknown) variables?

John K. Kruschke
Administrator

Sorry, I misread "dsum" as "dnorm", as in the first model. The perils of reading too fast... I've never used dsum in my own models.





Re: How to model a situation where what we observe is the sum of two (unknown) variables?

John K. Kruschke
Administrator
In reply to this post by Adam Ryczkowski

P.S. It also seemed strange to me that you would be modeling only two fixed data values, input[1] and input[2]. But I suppose you're doing that just for testing an example...?




Re: How to model a situation where what we observe is the sum of two (unknown) variables?

Adam Ryczkowski
In reply to this post by Adam Ryczkowski
>You have a lot of details flying around in there, so I'll just throw out a quick reply about a few things that stand out.
>
>First, in this model
>
>    model{
>    input[1] ~ dnorm(s1,25)
>    input[2] ~ dnorm(s1+s2,25)
>    s1 ~ dnorm(0,tau)
>    s2 ~ dnorm(0,tau)
>    tau~dgamma(0.001,0.001)
>    }
>
>why do you want the priors of s1 and s2 to have an estimated precision? With only two values (s1 and s2) influencing tau, there's not much point to estimating it. Just set tau to a small constant.

I am trying to model a Bayesian network meta-analysis (NMA) for hazard rate (HR) data (survival analysis, proportional hazards model). This type of NMA differs from the one in, e.g., Gelman, Carlin, Stern & Rubin, Bayesian Data Analysis (2003), in that all the data I have are estimates of the _difference_ in log hazard rate between groups (as opposed to estimates for each treatment group separately).

So what the data tell me is the measured sum (actually the difference) of the latent effects of treatment A vs. B (and B vs. G, G vs. X, A vs. X, and so on), together with the reported SE of each estimate.

The model above is a minimal model that illustrates the core problem of fixed-effects NMA for HR data: knowing a bunch of values (each with its standard error) for differences between pairs of latent (unknown) treatment parameters, what is the best estimate of the difference between treatments X and Y? Is that difference greater than zero?
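The point estimate that such difference data support can be sketched with ordinary precision-weighted least squares (Python/NumPy; all numbers below are hypothetical, the treatment labels A, B, G, X follow the post, and this is only the fixed-effects point estimate, not the full Bayesian NMA):

```python
import numpy as np

# Hypothetical observed differences in log hazard rate, one row per
# comparison, with the reported SE of each estimate.
# Treatment indices: 0 = A, 1 = B, 2 = G, 3 = X.
pairs = [(0, 1, 0.20, 0.10),   # A vs B: observed mu_B - mu_A
         (1, 2, 0.35, 0.15),   # B vs G
         (2, 3, 0.10, 0.20),   # G vs X
         (0, 3, 0.60, 0.25)]   # A vs X (closes an indirect loop)

n_treat = 4
# Contrast design matrix relative to treatment A (mu_A := 0), so the
# unknowns are (mu_B, mu_G, mu_X).
X = np.zeros((len(pairs), n_treat - 1))
d = np.zeros(len(pairs))
w = np.zeros(len(pairs))
for row, (a, b, diff, se) in enumerate(pairs):
    if a > 0:
        X[row, a - 1] -= 1.0
    if b > 0:
        X[row, b - 1] += 1.0
    d[row] = diff
    w[row] = 1.0 / se**2   # precision weight from the reported SE

# Precision-weighted least squares: the fixed-effects point estimate.
W = np.diag(w)
theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ d)
print(theta)  # estimated (mu_B, mu_G, mu_X) relative to A
```

Here the direct A-vs-X estimate (0.60) and the indirect chain A→B→G→X (0.20 + 0.35 + 0.10 = 0.65) disagree slightly, and the weighted solve reconciles them according to the reported precisions.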