Contingency tables


Contingency tables

paulscutti

Dear Dr Kruschke,


I read your book Doing Bayesian Data Analysis and I found it very well-written and terribly interesting, something which I felt compelled to share on Goodreads (see my short review here: http://www.goodreads.com/book/show/9003187-doing-bayesian-data-analysis).


I was wondering if you could help me with some results I am getting when applying Bayesian analysis to contingency tables. Tables 1 and 2 show the results of a simple Chi-Square test for a sample of 129 people. The aim of the test is to find out whether male and female consumers differ significantly in the paths they took through different types of websites while researching a particular product on the Internet.


Usually I look at the adjusted residuals to see where the differences are, and in this case there are three paths where the adjusted residuals fall outside the ±1.96 threshold.
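For readers who want to reproduce the figures in Tables 1 and 2 without SPSS, here is a minimal sketch in base R. The object names (counts, test, flagged) are illustrative and not part of the original post, and suppressWarnings() is used only because R warns when any expected count is below 5.

```r
# Observed counts from Table 1 (rows = paths, columns = sex).
counts <- matrix(
  c(25, 41,
    11,  3,
     4, 10,
    10,  4,
     5,  6,
     5,  5),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Path = c("Search > Brands", "Search > Brands > Aggregator",
             "Search > Aggregator > Brands", "Brands",
             "Brands > Search", "Search > Brands > Research"),
    Sex  = c("Male", "Female")))

# Pearson chi-square test of independence (no continuity correction
# is applied for tables larger than 2 x 2).
test <- suppressWarnings(chisq.test(counts))
print(test)

# Expected counts and adjusted (standardized) residuals, as in Table 1.
print(round(test$expected, 1))
print(round(test$stdres, 1))

# Cells whose adjusted residual lies outside +/- 1.96.
flagged <- abs(test$stdres) > 1.96
print(flagged)
```

The stdres component of the chisq.test() result is the residual divided by its estimated standard error, which corresponds to what SPSS labels "Adjusted Residual".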


I used the same data with your code (PoissonExponentialJagsSTZ.R), but I am not getting any parameter with values credibly different from zero (see the code at the end of this email in case you want to look at the results for yourself).


Is this because the sample is quite small (particularly for all the categories other than “Search > Brands”)?

If that is the case, can I trust the Chi-Square test even when it is used properly (i.e. less than 20% of cells with an expected count below 5, sample size not greater than 300, etc.)?

I am looking forward to hearing from you.


 

Kind regards,

Paul Scutti


 


Table 1: Types of Internet paths by sex

Path                           Measure              Male   Female   Total
Search > Brands                Count                  25       41      66
                               Expected Count       30.7     35.3    66.0
                               Adjusted Residual    -2.0      2.0
Search > Brands > Aggregator   Count                  11        3      14
                               Expected Count        6.5      7.5    14.0
                               Adjusted Residual     2.5     -2.5
Search > Aggregator > Brands   Count                   4       10      14
                               Expected Count        6.5      7.5    14.0
                               Adjusted Residual    -1.4      1.4
Brands                         Count                  10        4      14
                               Expected Count        6.5      7.5    14.0
                               Adjusted Residual     2.0     -2.0
Brands > Search                Count                   5        6      11
                               Expected Count        5.1      5.9    11.0
                               Adjusted Residual      .0       .1
Search > Brands > Research     Count                   5        5      10
                               Expected Count        4.7      5.3    10.0
                               Adjusted Residual      .2      -.2
Total                          Count                  60       69     129
                               Expected Count       60.0     69.0   129.0


Table 2: Chi-Square tests

                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             13.120a     5   .022
Likelihood Ratio               13.553      5   .019
Linear-by-Linear Association    1.331      1   .249
N of Valid Cases               129

a. 1 cell (8.3%) has an expected count less than 5. The minimum expected count is 4.65.


 

 

# THE DATA.

# Specify data source:
dataSource = c( "PathsSex" , "CrimeDrink" , "Toy" )[1]

# Load the data:
if ( dataSource == "PathsSex" ) {
  # fileNameRoot is assumed to be defined earlier in the script.
  fileNameRoot = paste( fileNameRoot , dataSource , sep="" )
  pathNames = c( "Search > Brands" , "Search > Brands > Aggregator" ,
                 "Search > Aggregator > Brands" , "Brands" ,
                 "Brands > Search" , "Search > Brands > Research" )
  # Counts from Table 1: the six male cells first, then the six female cells.
  # factor() is needed in R >= 4.0, where data.frame() no longer converts
  # strings to factors, so as.numeric() and levels() below would otherwise fail.
  dataFrame = data.frame(
    Freq = c(25,11,4,10,5,5,41,3,10,4,6,5) ,
    Sex  = factor( rep( c("Male","Female") , each=6 ) ) ,
    Paths = factor( rep( pathNames , times=2 ) ) )
  y = as.numeric(dataFrame$Freq)
  x1 = as.numeric(dataFrame$Sex)      # integer codes for the levels of Sex
  x1names = levels(dataFrame$Sex)
  x2 = as.numeric(dataFrame$Paths)    # integer codes for the levels of Paths
  x2names = levels(dataFrame$Paths)
  Ncells = length(y)
  Nx1Lvl = length(unique(x1))
  Nx2Lvl = length(unique(x2))
  normalize = function( v ){ return( v / sum(v) ) }
}
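A quick sanity check on the data entry, sketched in base R: cross-tabulating the frequencies should give back the counts and margins of Table 1. The explicit factor() calls and the name tab are assumptions made for this illustration. Note that factor levels are sorted alphabetically, so the integer code 1 for Sex refers to "Female", not "Male".

```r
paths <- c("Search > Brands", "Search > Brands > Aggregator",
           "Search > Aggregator > Brands", "Brands",
           "Brands > Search", "Search > Brands > Research")

# Same cell order as in the script: six male cells, then six female cells.
dataFrame <- data.frame(
  Freq  = c(25, 11, 4, 10, 5, 5, 41, 3, 10, 4, 6, 5),
  Sex   = factor(rep(c("Male", "Female"), each = 6)),
  Paths = factor(rep(paths, times = 2)))

# Rebuild the contingency table and add row and column totals.
tab <- xtabs(Freq ~ Paths + Sex, data = dataFrame)
print(addmargins(tab))

# Level order determines the integer codes passed to the model.
print(levels(dataFrame$Sex))   # "Female" "Male"
```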



Re: Contingency tables

John K. Kruschke
Administrator
Hi. Just some quick thoughts:

With the small N and large number of cells, it would take big differences for them to show up as credibly nonzero.

I see that at least one of the expected counts in the chi-square analysis is less than 5, so your NHST software will throw you a warning that the p value might be off (more than usual).
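The expected-count check can be sketched in a few lines of base R. The Monte Carlo p value at the end is one common cross-check when expected counts are small; it is an addition for illustration, not something proposed in this thread, and the names expected, smallCells and mc are hypothetical.

```r
# Observed counts from Table 1 (rows = paths, columns = Male, Female).
counts <- matrix(
  c(25, 41,
    11,  3,
     4, 10,
    10,  4,
     5,  6,
     5,  5),
  ncol = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / N.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Which cells fall below the usual threshold of 5?
smallCells <- expected < 5
cat("Cells with expected count < 5:", sum(smallCells),
    "of", length(expected), "\n")
cat("Minimum expected count:", round(min(expected), 2), "\n")

# Monte Carlo p value: it does not rely on the chi-square approximation
# that the expected-count rule of thumb is meant to protect.
set.seed(1)
mc <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)
print(mc$p.value)
```

Only one of the twelve cells is below 5, which matches footnote a of Table 2, so the usual "fewer than 20% of cells" rule of thumb is met even though R still issues a warning.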

In a Bayesian analysis, you don't have to look at individual cells only. Instead look at contrasts (differences) between cells, and interaction contrasts, in any way that makes sense. (This approach is a bit of a fishing expedition, so you would want to follow up with some confirmatory research, but there is also some shrinkage because of the hierarchical prior.)
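To make the idea of an interaction contrast concrete, here is a rough sketch in base R. It deliberately swaps in a much simpler model than the one in PoissonExponentialJagsSTZ.R: each cell is treated as an independent Poisson count with a Jeffreys Gamma prior, so there is no hierarchical prior and no shrinkage, and the numbers it produces should not be read as the book's analysis. The contrast weights, the choice of cells and all object names are assumptions made for illustration.

```r
# Observed counts from Table 1 (rows = paths, columns = sex).
counts <- matrix(
  c(25, 41,
    11,  3,
     4, 10,
    10,  4,
     5,  6,
     5,  5),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Path = c("Search > Brands", "Search > Brands > Aggregator",
             "Search > Aggregator > Brands", "Brands",
             "Brands > Search", "Search > Brands > Research"),
    Sex  = c("Male", "Female")))

set.seed(1)
nDraws <- 20000

# ASSUMPTION: independent Poisson cells with a Jeffreys Gamma(0.5, 0) prior,
# giving a Gamma(count + 0.5, rate = 1) posterior for each cell rate.
# Columns of logRate follow as.vector(counts): all Male cells, then all Female.
logRate <- sapply(as.vector(counts),
                  function(y) log(rgamma(nDraws, shape = y + 0.5, rate = 1)))

# Interaction contrast on the log scale:
# (Male - Female) for "Search > Brands > Aggregator"
# minus (Male - Female) for "Search > Brands".
w <- matrix(0, nrow(counts), ncol(counts), dimnames = dimnames(counts))
w["Search > Brands > Aggregator", "Male"]   <-  1
w["Search > Brands > Aggregator", "Female"] <- -1
w["Search > Brands", "Male"]                <- -1
w["Search > Brands", "Female"]              <-  1

contrast <- as.vector(logRate %*% as.vector(w))

# Posterior mean and central 95% interval (a quantile interval, not an HDI).
print(c(mean = mean(contrast), quantile(contrast, c(0.025, 0.975))))
```

With MCMC output from the hierarchical model, the same weight matrix could be applied to the sampled interaction deflections instead of to logRate, and the interval would then reflect the shrinkage mentioned above.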




John K. Kruschke, Professor
Doing Bayesian Data Analysis
The book: http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/

The blog: http://doingbayesiandataanalysis.blogspot.com/





On Wed, Aug 14, 2013 at 8:38 PM, paulscutti [via Doing Bayesian Data Analysis] wrote the message quoted in full at the top of this thread.




