enter the fray: our reader discussion forum
And still you got the math wrong
by Rrhain

Two huge, glaring errors in the analysis.

1) Your description of how the group of 10 sexually active women is divvied out among the 100 sexually active men is inaccurate.

If there are 100 women, 90 of whom have never had sex and 10 of whom have had sex, that doesn't mean they've all had the same number of partners. There could be unique pairings of one man and one woman for nine of the women and the tenth woman could then have sex with the remaining 91 men.

Note, this would not change the median or the average, but it would change the standard deviation (and only a little bit of a pun there). It seems you assumed each woman would have the same, average number of partners. Instead, the number of sexual partners can be distributed any way you want so long as each of the 10 sexually active women has at least one partner.
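To make that concrete, here is a rough sketch in Python comparing the two ways of pairing people up. It assumes, as the example above does, that each of the 100 men has exactly one partner; the variable names are just for illustration.

```python
from statistics import mean, median, pstdev

# Partner counts for 100 women: 90 with none, 10 sexually active.
# Assumption: each of the 100 men has exactly one partner,
# so the counts in each list add up to 100.
even = [0] * 90 + [10] * 10          # ten women with 10 partners each
skewed = [0] * 90 + [1] * 9 + [91]   # nine women with 1 partner, one with 91

for name, counts in (("even", even), ("skewed", skewed)):
    # pstdev is the population standard deviation
    print(name, mean(counts), median(counts), round(pstdev(counts), 2))
```

The mean and median come out the same for both lists; only the standard deviation moves.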

2) You've conflated the median and the mode. Yes, the median is the value such that half of the data points are below and half are above. Thus, the median might not exist as an actual data point if there is an even number of data points. In that case, you take the midpoint between the two values in the middle. That is, if I have $10 and you have $100, the median is $55, an amount neither of us has.

But you brought up the concept of "typical" and that isn't the median. That's the mode. The mode is the most common value in a data set. If there are multiple values that appear just as often, then the group has multiple modes. If we have six people, one with $10, two with $20, and three with $100, then the average is $58.33, the median is $60, but the mode is $100.
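For anyone who wants to check the six-person example, a quick sketch using Python's standard `statistics` module (the variable names are mine):

```python
from statistics import mean, median, multimode

# One person with $10, two with $20, three with $100
money = [10, 20, 20, 100, 100, 100]

avg = mean(money)         # 350 / 6
mid = median(money)       # midpoint of the two middle values, 20 and 100
modes = multimode(money)  # list of the most common value(s)

print(round(avg, 2), mid, modes)
```

`multimode` returns a list, so it also covers the case where several values tie for most common.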

Each value is important. But since we know that the average is heavily affected by outliers, we don't rely on that single number to tell us everything. That's where things like standard deviation and variance come in. They let us know just how widely the values are spread around the average.

The problem with the median is that it doesn't tell you just how much spread there is. If there are three people, one with $49, one with $50, and one with $51, the median is $50. But if those three people have one with $0, one with $50, and one with $100, the median is still $50 and we don't have any way of seeing how varied the population is.
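A small sketch of the same two groups in Python, showing that the median can't tell them apart while the standard deviation can. I'm using the population standard deviation here, which is one choice among several.

```python
from statistics import median, pstdev

tight = [49, 50, 51]
wide = [0, 50, 100]

for name, group in (("tight", tight), ("wide", wide)):
    # Same median for both groups, very different spread
    print(name, median(group), round(pstdev(group), 2))
```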

The problem with the mode is that it is simply the most common result, which leads to the seeming paradox that most people may not have the most common outcome. Suppose we have 101 people: each of the first 100 has as many dollars as his or her number ($1 for person 1, $2 for person 2, etc.) and person 101 has $50. Then the mode is $50, even though 98% of the people don't have the mode.
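Here is that 101-person group as a short Python sketch, counting how many people actually hold the modal amount:

```python
from collections import Counter

# Persons 1..100 hold $1..$100; person 101 holds $50
money = list(range(1, 101)) + [50]

# most_common(1) gives the single most frequent value and its count
value, count = Counter(money).most_common(1)[0]
share_without_mode = 1 - count / len(money)

print(value, count, round(share_without_mode * 100))
```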

That's why good studies report mean, median, and mode as well as standard deviation.

the "typical" is certainly NOT the mode
by maroci
No way anyone means mode when they say typical.
Re: the "typical" is certainly NOT the mode
by klrwulf

I would have to agree with the first poster's argument that typical is the mode, or most common value. After all, if only one person in a room has $50 (both the mean and median), but 200 people have $10, then the typical person in the room has $10.

Also, if I remember my statistics and research classes correctly, mean or average isn't what is considered normal. Normal (for a bell curve) includes not only the mean, but one standard deviation to either side of the mean.

Re: the "typical" is certainly NOT the mode
by lloyd667

I agree that mode is one plausible meaning of typical. More generally, however, I think what people choose as typical depends on the distribution they have in mind. It's always possible to come up with screwy distributions where the mean, mode, or median are obviously not typical. To make a point, Ellenberg does so in his column.

My favorite example is the Cauchy distribution. It looks rather like the normal distribution: symmetric and bell shaped, but with fatter tails. But, it does not have a mean! Obviously, then, the mean would not be a good measure of "typical". If you saw the distribution graphed out, you would have no problem picking out the typical value. That would be the median and mode, which are equal and at the center of the distribution, just where you would expect to find them.
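If you want to see this for yourself, here is a rough simulation sketch in Python. It draws standard Cauchy values by the usual inverse-CDF trick; the seed and sample sizes are arbitrary choices of mine. The sample median should sit close to the center, while the sample mean has no true value to converge to and can be thrown far off by a single extreme draw.

```python
import math
import random
from statistics import median

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

def cauchy_sample(rng):
    # Inverse-CDF sampling for the standard Cauchy:
    # tan(pi * (U - 1/2)) with U uniform on [0, 1)
    return math.tan(math.pi * (rng.random() - 0.5))

samples = [cauchy_sample(rng) for _ in range(100_000)]
sample_median = median(samples)

for n in (100, 10_000, 100_000):
    head = samples[:n]
    # Compare the sample mean and sample median as the sample grows
    print(n, sum(head) / n, median(head))
```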

Klrwulf's memory fails him. A normal distribution (which is what people usually mean by a "bell curve", but lots of other distributions have a bell shape too, such as the familiar "Student's t" and of course the infamous Cauchy) has a mean, median, and mode which are all the same. Perhaps he is thinking of statistical inference, where, depending on the application, one or two standard error bounds are sometimes used to judge statistical "significance".

Re: And still you got the math wrong
by SlateReader
Thank you, very interesting. One more post like this and I'm going to have to go out and buy a copy of "A Mathematician Reads the Newspaper." (At least I remember the title.) :-) Is that still the best book of its kind?

Re: the "typical" is certainly NOT the mode
by SlateSurfer

>>After all, if only one person in a room has $50 (both the mean and median), but 200 people have $10, then the typical person in the room has $10.

In this example the median would be $10, as would the mode. The mean is about $10.20. All three of these numbers, in this case, give a fairly accurate representation of the "typical" person, because the population you've described is fairly uniform.
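A quick check of those figures in Python, for the room with 200 people holding $10 and one holding $50:

```python
from statistics import mean, median, mode

# 200 people with $10 and one person with $50
room = [10] * 200 + [50]

room_mean = mean(room)      # 2050 / 201
room_median = median(room)  # the 101st value when sorted
room_mode = mode(room)

print(round(room_mean, 2), room_median, room_mode)
```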

Re: the "typical" is certainly NOT the mode
by RonB52
I'm still recovering from having learned that most people have an above-average number of fingers.
Re: And still you got the math wrong
by Rrhain

There are a lot of books out there that help to make you a more critical reader of information with regard to mathematical silliness. I haven't read the book you mention, but you might want to look at Less than Zero and Innumeracy.

From the latter, a difficult question:

Suppose we have a medical test that is 98% accurate. You take the test and it comes back positive. Can you trust the results of that test?

The answer: There isn't enough information to tell. It depends upon the expected distribution of actual positives in the population at large.

For example, suppose that only 0.5% of the population actually has the trait the test is looking for. If we have 10,000 people (to make the math easy), that means we have 50 people who have the trait.

A 98% accurate test will detect, on average, 49 of those since 98% of 50 is 49.

But don't forget about the false positives. Of the 9950 people who don't have the trait, we expect 2% of them will test positive, or 199.

Thus, if we tested everybody, we'd expect to find 248 positive results, of which only 49 are accurate. That means if you get a positive result, there's only about a 20% chance of it being right.

Thus, we can't just look at the single number of "98% accurate." We have to have some idea of the context in which the test is taking place. And it's because of that problem that medical tests so often are run at least twice: If you get a positive result the first time, you want to run it a second time to see if you get it again, since two false positives in a row would be pretty rare, provided the errors of the two runs are independent.
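The arithmetic above, and the effect of a second test, can be sketched in Python. This treats "98% accurate" as meaning both that 98% of people with the trait test positive and that 98% of people without it test negative, which is how the example uses it. The second step also assumes the two test runs err independently, which is my simplification and won't hold for every real test. The function name is made up for illustration.

```python
def chance_positive_is_right(prior, sensitivity, specificity):
    """Probability of having the trait, given one positive result (Bayes' rule)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# 0.5% of the population has the trait; the test is "98% accurate" both ways
after_one = chance_positive_is_right(0.005, 0.98, 0.98)

# Assumption: the second run is independent of the first, so the result
# after one positive becomes the starting point for the second.
after_two = chance_positive_is_right(after_one, 0.98, 0.98)

print(round(after_one, 3), round(after_two, 3))
```

The first number matches the 49 out of 248 figure; the second shows why a repeat positive is so much more convincing under these assumptions.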

All too often, though, popular reports of technical material gloss over all this analysis, primarily because they are incapable of doing the analysis which would let them know how things really work. They see that big, flashy number and latch onto it.
