yrieithydd

If you'd told me 3 weeks ago that I'd be voluntarily learning statistics and even enjoying it, I wouldn't have believed you. I blame emperor. I think I owe leonato an apology for being horrible about his subject. Unlike when I did (small amounts of) statistics in GCSE Maths, I have data I am interested in analysing and therefore stats has a point! I'm still not convinced it's Maths though and think that maybe it would be better taught in a subject other than Maths, like Geography when you're doing a project involving data (which is why I picked geography as our coursework involved data!). But I suppose the problem with that is that not everyone in a geography class will necessarily have the same amount of maths.*

On the whole I'm following it, but the following has left me confused as it seems absurd. A test result has been worked out to be -1.74 (this is for a Wilcoxon sign-rank test with sufficient pairs (35) for the distribution of W (the test result) to be almost normal). The author (Butler) writes:

We can ignore the sign, which is normally negative. We know that values of 1.96 and 1.64 are required at the 5 per cent level for a non-directional or directional test, respectively. Therefore if our test was non-directional we must conclude that no significant difference has been demonstrated. If, on the other hand, we had made a directional prediction, we could claim significance at the 5 per cent level.

As I understand it, this is saying that if we were just trying to say that there was a difference between the two groups we cannot, but that if we had predicted that one group would be bigger/have done better than the other we can. This seems completely stupid because surely it ought to be easier to demonstrate a difference than which direction it is! Saying we cannot be sure that it's different, but we can be sure it is higher, makes no sense. Where have I gone wrong? I've stared at this too much now and am confused.

There's also an irritating point where the reasoning is left out!
U₁=N₁N₂ +(N₁(N₁+1)/2)-R₁

and

U₂=N₁N₂+(N₂(N₂+1)/2)-R₂

However, it can be shown, by rather tedious but fairly elementary algebra, that
U₂=N₁N₂-U₁.

This is the Mann-Whitney U-test. R₁ and R₂ are the sums of the ranks of the results in each group (i.e. the results of the two groups are pooled and each result is given a rank, with ties being given the mean of the relevant ranks, so if the third, fourth and fifth results are the same, they all get the rank 4 and the next one is 6). I know that R₁+R₂ = ((N₁+N₂)(N₁+N₂))/2 because the sum of the ranks obviously has to add up to the triangular number for the total number of things ranked. But I can't make this be of help in working out why the simpler formula works. It's not really relevant, but I'm intrigued.
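For anyone who would rather see the ranking with numbers than with algebra, here is a minimal Python sketch. The scores in `group1` and `group2` are made up purely for illustration, and the helper names `average_ranks` and `mann_whitney_u` are hypothetical; the U₁ and U₂ formulas are the ones quoted above. It only checks the identity on one small example, it does not prove it.

```python
from fractions import Fraction


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [Fraction(0)] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end hold ranks pos+1..end+1; use their mean.
        mean_rank = Fraction((pos + 1) + (end + 1), 2)
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (R1, R2, U1, U2) using the formulas quoted in the post."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))  # pool, then rank
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + Fraction(n1 * (n1 + 1), 2) - r1
    u2 = n1 * n2 + Fraction(n2 * (n2 + 1), 2) - r2
    return r1, r2, u1, u2


# Made-up scores (an assumption for illustration, not data from the book).
group1 = [3, 7, 7, 12]
group2 = [5, 7, 9, 15, 20]

r1, r2, u1, u2 = mann_whitney_u(group1, group2)
n1, n2 = len(group1), len(group2)
print("R1, R2:", r1, r2)
print("U1, U2:", u1, u2)
print("U2 == N1*N2 - U1:", u2 == n1 * n2 - u1)
```

In the pooled data the three 7s sit in third, fourth and fifth place, so each gets rank 4 and the 9 gets rank 6, as in the tied example described above.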

*With us Maths was setted from year 9, but for years 8 and 9 Geography was banded with History and Science,** so that there were three supposedly equal upper classes (a, b, c) and two supposedly equal lower classes (d, e). Thus it is likely that people from the top maths set were spread out over the groups a-c. At GCSE, I don't know about those who only took geography, I guess they were set, but 9 of us did both history and geography and so were the one geography set in the free option time so couldn't be setted!

**Due to timetabling constraints like the number of teachers available


leonato.livejournal.com

[Using best stats lecturer voice]

...As I understand it, this is saying that if we were just trying to say that there was a difference between the two groups we cannot, but that if we had predicted that one group would be bigger/have done better than the other we can. This seems completely stupid because surely it ought to be easier to demonstrate a difference than which direction it is! Saying we cannot be sure that it's different, but we can be sure it is higher, makes no sense. Where have I gone wrong? I've stared at this too much now and am confused...

This is to do with the messy difference between one and two-sided p-values.
Let's say we are testing some number X against a Normal(0,1) distribution. We want to know if X is extreme compared to N(0,1). We define this by saying X is extreme if more than 95% of all other possible samples are closer to 0 than X is.

If we know X > 0 say (a directional assumption) then we find the number Z such that 95% of N(0,1) is < Z. This is 1.64. We say X is extreme if X > 1.64

If we don't know X > 0, so X might be negative, we find Y such that the probability of a sample from N(0,1) being less than -Y or greater than +Y is 5% (so 95% of samples lie between -Y and +Y), and say X is extreme if X < -Y or X > +Y. Y turns out to be 1.96.

The point (and aren't you glad I got to a point...) is that defining X to be positive is assuming something is known about X. We then need less additional information to show that X is extreme because negative X has been discounted a priori. Hence we can say X is extreme if X > 1.64 rather than X > 1.96.
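To put numbers on that, here is a rough Python sketch using only the standard library's `statistics.NormalDist`. The z value of -1.74 is the one from the post; the variable names are just for illustration, and the one-sided p-value assumes the direction predicted in advance is the direction actually observed.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Cutoffs at the 5 per cent level.
one_sided_cutoff = std_normal.inv_cdf(0.95)   # all 5% in one tail
two_sided_cutoff = std_normal.inv_cdf(0.975)  # 2.5% in each tail

# Test result from the post; the sign is ignored, as Butler suggests.
z = -1.74

# Assumption: the predicted direction matches the observed one.
one_sided_p = 1 - std_normal.cdf(abs(z))
# Without a prediction, both tails count as "extreme".
two_sided_p = 2 * one_sided_p

print(f"one-sided cutoff: {one_sided_cutoff:.2f}")
print(f"two-sided cutoff: {two_sided_cutoff:.2f}")
print(f"one-sided p: {one_sided_p:.3f}, significant at 5%: {one_sided_p < 0.05}")
print(f"two-sided p: {two_sided_p:.3f}, significant at 5%: {two_sided_p < 0.05}")
```

The same 1.74 falls between the two cutoffs, which is why it counts as significant only when the direction was predicted beforehand.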

Of course what this actually shows is that naive Fisherian statistics is a load of bollocks ;-)

yrieithydd.livejournal.com

maove [shifts fingers to right keys] naive Fisherian statistics?

caliston.livejournal.com

Welcome to stats. It's not as scary as it seems, honest. :-)

There's an error in your formula for R1+R2, which should be R1+R2 = (N1+N2)(N1+N2+1)/2.

Then if you do the tedious algebra (evaluate U1+U2 and sub in the expression for R1+R2) you'll find it works. As you've written it you get stray N1/2 + N2/2 terms.
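Spelled out, the algebra goes roughly like this. It is only a sketch, using the definitions of U₁ and U₂ quoted in the post and the corrected rank sum above:

```latex
\begin{align*}
U_1 + U_2
  &= 2N_1N_2 + \frac{N_1(N_1+1)}{2} + \frac{N_2(N_2+1)}{2} - (R_1 + R_2) \\
  &= 2N_1N_2 + \frac{N_1^2 + N_1 + N_2^2 + N_2}{2}
     - \frac{(N_1+N_2)(N_1+N_2+1)}{2} \\
  &= 2N_1N_2 + \frac{N_1^2 + N_1 + N_2^2 + N_2}{2}
     - \frac{N_1^2 + 2N_1N_2 + N_2^2 + N_1 + N_2}{2} \\
  &= 2N_1N_2 - N_1N_2 \\
  &= N_1N_2 ,
\end{align*}
```

so U₂ = N₁N₂ - U₁.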

Not knowing anything about the Mann-Whitney test (it was GCSE when I last did stats with real numbers), I had to read a bit before I understood what you meant, and found http://faculty.vassar.edu/lowry/ch11a.html useful.

Welcome to stats. It's not as scary as it seems, honest. :-)

Yeah, right! Unfortunately, reading the chapter on the χ² test has left me slightly more confused than before doing so (having had emperor and his supervisor explain it). Actually, that's not quite true. The complication is that Excel does the looking up of the probability for you whereas the book expects you to use a table. What the book hasn't given though is help on explaining the test coherently to an audience more scared of stats than I!

There's an error in your formula for R1+R2, which should be R1+R2 = (N1+N2)(N1+N2+1)/2.

Indeed. That's what I meant to type, but got so lost in <sub></sub> that I lost it and having got a { instead of a > my proofreading was all taken up with sorting that out! (and was time limited by the library closing!)

Then if you do the tedious algebra (evaluate U1+U2 and sub in the expression for R1+R2) you'll find it works. As you've written it you get stray N1/2 + N2/2 terms.

Aaah. The trouble was I started trying to do this before I remembered the formula for R1+R2 and stuck the expressions for U1 and U2 into the simple formula too soon and then once I'd remembered the formula I was stuck because I couldn't get R1 and R2 to be positive on the same side!

Because I was stupid and removed brackets wrongly I think.
