Mmmm, reading about statistics has reminded me that saying `average' is not technically precise, as there are (at least?) three measures of the average in statistics. Thus I decided that I ought to avoid using the word and explain why to my non-technical audience. In doing this, I found myself confused about what the thing I was referring to as `the average' actually was.
I think I'm confused because I'm dealing with frequencies. The data that I am analysing are the figures for occurrences of my preposition(s) across the texts of my corpus. My thirteen texts are very different lengths, ranging from 1967 words to 66608 words, though those are extremes: the second longest is 15804 words and the second shortest 3079 words. I have produced the frequencies by dividing the number of occurrences by the number of words and multiplying by 1000, which gives the number of occurrences per 1000 words. I have done this for each of the texts in turn and for the corpus as a whole. It is this last figure, the one for the corpus as a whole, that I have described as the `average for the corpus'. This obviously is not the median or the mode (neither of which makes any sense for my data AFAICS) but is it the mean? It was not found by adding up a set of figures and dividing the total by the number of examples. Am I right to just call it the average?
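To make the question concrete, here is a minimal sketch of the two calculations. The four word counts are the text lengths quoted above; the occurrence counts are invented purely for illustration and are not real figures from the corpus, and the names `per_thousand`, `pooled` and so on are just labels chosen for this example. Under those assumptions, it suggests that the corpus-wide figure is not the plain mean of the per-text frequencies, but it does equal their mean when each text is weighted by its length.

```python
# Each entry is (words_in_text, occurrences_of_preposition).
# Word counts are the four lengths quoted in the post; the occurrence
# counts are HYPOTHETICAL, made up only to illustrate the arithmetic.
texts = [
    (1967, 12),
    (3079, 9),
    (15804, 80),
    (66608, 150),
]


def per_thousand(occurrences, words):
    """Occurrences per 1000 words, as described in the post."""
    return occurrences / words * 1000


# Frequency for each text in turn.
per_text = [per_thousand(occ, words) for words, occ in texts]

# Frequency for the corpus as a whole: total occurrences over total words.
total_words = sum(words for words, _ in texts)
total_occ = sum(occ for _, occ in texts)
pooled = per_thousand(total_occ, total_words)

# Plain (unweighted) mean of the per-text frequencies.
unweighted_mean = sum(per_text) / len(per_text)

# Mean of the per-text frequencies, weighting each text by its length.
weighted_mean = sum(
    freq * words for freq, (words, _) in zip(per_text, texts)
) / total_words

print(f"corpus as a whole:      {pooled:.3f} per 1000 words")
print(f"length-weighted mean:   {weighted_mean:.3f} per 1000 words")
print(f"plain mean of 4 texts:  {unweighted_mean:.3f} per 1000 words")
```

With texts as unequal in length as these, the plain mean gives the shortest text the same say as the longest, whereas the corpus-wide figure lets each text count in proportion to its size.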
Date: 2005-10-05 11:33 am (UTC)
Actually, maybe you were right in the post, and should say 'frequency'.