It’s Thursday and that means it’s time for another round of Bad Chart Thursday. This week, rather than make fun of a bad chart, I was inspired to write a bit about bell curves and more specifically Sam Harris’ […]
Not about the math, but the literature (Oh, the Humanities!) …
Wasn’t the entire point of “These aren’t the droids you’re looking for” that they were the droids the Imperial Stormtroopers were seeking? In other words, it was a lie. Is Harris tacitly admitting to everyone who understands Jedi mind-games that he is in fact a Sexist Pig? Do we need to do math to prove it, or can we just take him at his word?
P.S. I hope this is followed up by several hundred comments properly analyzing the tails of the bell curves (i.e. integrating from -∞ to -1 the men’s and women’s estrogen-vibe as a function of the difference in the means for each, then, by counting trolls, determining the best fit). It’s been decades since I’ve done calculus in earnest, but I think we could determine both the estrogen-vibe concentrations and the troll cut-off by making the sum of the integrals and the ratio of them match real-world troll statistics. This program of research should provide the fundamental theorems of the new sciences of statistical trollology and estrogen-vibrology.
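For what it’s worth, the tail integrals in that program of research are easy to compute numerically. Here is a minimal sketch in the same spirit as the joke: the two unit-variance curves, their means, and the -1 troll cut-off are all invented, not real data.

```python
import math

def left_tail(t, mu, sigma=1.0):
    """Integral of the N(mu, sigma^2) density from -infinity to t."""
    z = (t - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2))

# All numbers hypothetical: two unit-variance estrogen-vibe curves whose
# means differ by 0.5, and the troll cut-off at -1 from the comment above.
mu_men, mu_women = 0.0, 0.5
cutoff = -1.0

men_trolls = left_tail(cutoff, mu_men)      # mass of the men's curve below -1
women_trolls = left_tail(cutoff, mu_women)  # mass of the women's curve below -1
troll_ratio = men_trolls / women_trolls

print(men_trolls + women_trolls, troll_ratio)  # sum and ratio, a la the comment
```

“Fitting” real-world troll statistics would then just mean varying the difference in means until the sum and ratio match, which is exactly the best-fit procedure described above.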
Is it possible that Sam Harris made mistakes, that maybe he’s ignorant of the science that says the apparent difference in the genders is mostly cultural/experiential, not genetic, and that he’s not a sexist pig at heart?
Possible? Yes. Likely? No.
The tell? He doubled down rather than acknowledging, and apologizing for, the original ignorant comments.
The double-down is always the tell.
#1) That doesn’t excuse it.
#2) If it oinks like a pig and rolls in the mud…
#3) Nobody is calling for a BBQ.
He likely is ignorant of the science. But after the response to his initial comments he could have taken the opportunity to rectify this before posting his full response.
However, he seems to prefer to defend his position rather than consider that it could be wrong. To me this indicates that he has internalized these views to some extent.
What is sexism if not internalized — and inaccurate — views about gender?
This seems to be, sadly, almost like a theme song, or a running gag?, or maybe some sort of viral disease, among the self-appointed “leaders” of atheism. Self-appointed, because, apparently, being there first privileges you to a) never be wrong, b) know how to do the job better, and c) magically know when a problem is real, or just imaginary. I am sure, back in the days of the alchemists, there were similar self-appointed “experts” complaining about how the new crowd were totally silly for suggesting that maybe all the mumbo jumbo they were doing didn’t actually work all that well, and maybe they were, like… missing something, or something. This is just more of the same, from people who have been so “certain” that social issues shouldn’t be part of the movement that they can’t even see the very social issues waving their collective… johnsons in their faces.
I don’t get what your point is. Like, “sexist pig” is just a subjective label some people think applies. You might not, but what’s the practical upshot either way? His ignorance of the science discounting genetics as a cause of gender differences is inexcusable, and therefore sexist, and it’s important to talk about. Why is it important to you that no one call him a pig in the course of discussing it?
I’m not trying to defend Harris or anything, but I think at least some of this article contradicts the point it was originally trying to make. It starts off talking about how men and women are not, psychologically, any different, and then says
“Lived experience does make us women different in ways that are so large that we can see them without the need of a scientific study. We can all see that men tend to like Sam Harris much more than do women. It is far more likely that something about the experiences of men versus women in our culture is the explanation for the differences rather than some psychological difference in our brains.”
Does it matter if the differences Harris was talking about were genetic or cultural? In his own words, he was “talking about a fondness for a perceived style of religion bashing,” which is not necessarily something that stems from a genetic source. Harris was just saying that men and women are different, which this article agrees with. Now, calling it the “estrogen-vibe” may have been somewhat of a misnomer, seeing as it is more of a cultural/experience thing rather than a hormonal one, but there is a difference between the genders.
I guess what I’m trying to say is, this article feels unnecessarily nitpick-ey for a topic that really doesn’t need its nits picked. Just say “yes, actually he is sexist” and you’re good.
Now what he could have said is that women have been rather less interested in attending meetings run by the old atheist establishment because it was a rather hostile sexist environment.
Or he could have just asked the journalist what the evidence was for the original assertion that more women buy his books than men.
Every book bought in this house goes on my Amazon account. And this is surely the norm in most houses due to the idiotic way Amazon works. If we bought books separately then we would have to have a separate Kindle for each account. So if you looked at Amazon data it would appear that there is a huge skew in the purchases of books, with one person buying 100% and the rest zero.
PS what is an SJW?
SJW stands for social justice warrior. It is used (usually in a derogatory manner) to describe a certain type of online activist, and it suggests that their efforts are superficial and overly broad.
There is an attempt to reclaim the idea but I’m not sure it will help.
There is a specific image of a SJW, though. Usually the big problem is the tendency for white people to speak for POC, and they get quite hostile if you tell them they’re wrong about your experiences. (I know Indians who have received death threats over this kind of thing.) There’s also a certain degree of US-centrism that I find irritating.
The problem is that this legitimate criticism is invariably taken over by racists, MRAs, and the rest of the scum of the internet.
I wouldn’t suggest that all social justice blogs (especially those that are overly casual) are equally committed or even correct. Believe me, as someone who spends time on Tumblr I understand the over-reach of the SJ blog. As you suggest, the term has been co-opted by racists and MRAs so as to make it even worse.
Yeah, but it’s pretty bad when you have people who take their memes seriously. One of these days, I’ll dedicate a blog posting on my Facebook page to it.
For me, it’s nothing new; Indians have been tangling with animal rights groups since the 80s. But surprisingly enough, the Ted Nugents of the world sided with the animal rights activists in those cases.
Yep, like mrmisconception said, it’s a slur often slung at online activists and feminists like Skepchick. It’s most often used to construct strawman arguments for why activists are just being outragey for the sake of outrage rather than actually accomplishing anything. Case in point: https://twitter.com/RichardDawkins/status/482781472961859584
I put it in as a bit of a joke (but perhaps a bad one). It’s meant to be read with an eyeroll.
I think most of us got that it was a jab at some phrase that members of the New Misogyny use. It got a chuckle from me, even before I saw the explanation.
Next article by Sam Harris: “I’m Not the Not-Bell-Curves-Understanding Pig You’re Looking For.”
Sam Harris, turn in your nerd card. Do you even Star Wars? (Because, you know, those were the droids they were looking for.)
Plus, as of the Clone Wars cartoon, we now have Force gods. Three of them. (There was a fourth, but the recent torpedoing of the Expanded Universe ends that.) But that’s another thing entirely.
So the fourth is no longer with us?
She was only in novels that are no longer canon.
The cartoon is still canon, though. But the three of them are all dead. You can probably find it on YouTube. (The fact that the Holiday Special can be found on YouTube is proof Lucas has little control there.)
I was recently analyzing the results of a GWAS (genome-wide association study) on intelligence, which (like all such studies) treats genes and environment as independent and additive. I started thinking about how that affects things when we know that genes and environments interact.
The example I started thinking of was that of stereotype threat.
The classic example is PoC doing poorly because they are expected to do poorly and if they don’t do poorly they call attention to themselves and get beat down. It is pretty well established, even if people object to it.
The premise of GWAS studies is that genes and environments are independent. What that means is that if one set of genes does poorly in the same environment that another set of genes does well in, it is the fault of the inferior genes. This is an artifact of the analysis, because gene-environment interactions are assumed to be zero; that is, gene effects are independent of environment effects, and there is no interaction (in the model, not IRL).
This of course holds for XX vs XY interactions, and has been demonstrated to hold for mathematical ability. In other words, women exhibit stereotype threat and do poorly at math when they are in the presence of people who expect them to do poorly at math because they are women.
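The additive assumption can be sketched in a toy simulation. Everything here is invented for illustration (the genotype coding, the interaction strength, and the noise level are all hypothetical): the score depends only on the environment, but one “genotype” responds to the same environment differently, a stereotype-threat-like interaction. A purely additive reading of the group means then books the gap as a gene effect.

```python
import random

random.seed(0)

# Toy model (all numbers hypothetical): score depends on environment e,
# but genotype g = 1 responds to the same environment less strongly --
# a gene-environment interaction, as in stereotype threat.
interaction = -0.3
samples = []
for _ in range(20_000):
    g = random.randint(0, 1)
    e = random.gauss(1.0, 0.2)  # everyone drawn from the same environment
    score = e * (1.0 + interaction * g) + random.gauss(0.0, 0.1)
    samples.append((g, score))

mean = lambda xs: sum(xs) / len(xs)
mean0 = mean([s for g, s in samples if g == 0])
mean1 = mean([s for g, s in samples if g == 1])

# An additive-only analysis sees a ~0.3 gap and attributes it to "inferior
# genes", even though in a neutral environment (e = 0) the gap would vanish.
print(mean0 - mean1)
```

The point of the sketch: the gap is real, but it lives entirely in the interaction term that the additive model assumes away.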
Are behaviors in the Atheist and Skeptic community that Sam Harris is trying to rationalize as being due to XX vs XY really due to genetics, or are they due to a gene-environment interaction, as in differential stereotype threat for XX vs XY?
We could do an experiment. Count up the numbers of death threats, rape threats, and other threats of violence that XX people receive and that XY people receive.
My guess is that the writers on Skepchick have each received at least an order of magnitude more threats of various types (even after going to considerable effort to reduce them) than has Sam Harris.
I would be interested in seeing a graphic of “threats” vs number of X chromosomes for major figures in the Atheist and Skeptical community. It would probably need to be a log scale to keep the numbers on the same page.
Good comment, but I wish you hadn’t equated woman=XX and man=XY. Transphobia can be just as unintentional as sexism.
I meant no transphobia, and apologize if any was perceived. The OP was about “estrogen-vibes” and “essential” characteristics of people (which I appreciate are not genetic). The Sam Harris meme is that the “estrogen-vibe” is genetic and due to elevated estrogen due to more active lady bits.
A graph that includes threats against every category of person would illustrate the extent that privilege shields people from threats. I know that people who are trans receive many threats too.
As a cis, white, straight male, I can’t remember having ever received a rape or death threat. (No, that is not an invitation to start sending them to me.)
Maybe someone who knows more statistics than I do could do a multivariate analysis and include a few dozen categories. I am pretty sure that my demographic (cis, white, straight, male) would be in the lowest threat category. If someone could automate that and do it on Twitter, you could get a real-time measure of which kind of privilege is trending.
Ask not for whom the Bell Curve tolls, it tolls for thee, Mr. Harris.
Among ethnic minorities, the phrase ‘bell curve’ is particularly hated because of its association with ‘race and IQ’ bullshit.
Jamie, this is an incredibly well-written explanation. I laughed AND I learned. I laughearned. Learghed.
Rebecca, I really would like to see a comparison of rape and death threats received as a function of number of X chromosomes. My guess is that you are the all time leader.
Among the menz, maybe PZ would be the leader, but he is probably 2 orders of magnitude lower.
This would be objective data demonstrating that the “equal environment” hypothesis is fos.
At the risk of sounding like Richard Dawkins, what happens online isn’t the worst death threat an atheist ever received. But since AHA is still a woman, it’s still fully within our hypothesis that men, lacking any proper arguments with which to debate a feminist, will resort to intimidation tactics.
I agree with Rebecca. This is a really well-written piece of work! Jamie, you did a masterful job of intermixing the humor and the statistics. Probably my favorite BCT ever!
Legumes! Am I doing it right?
Hi – a friend pointed me to this post. I am also a data, stats, policy, and economics nerd so the material here interests me.
So obviously I think you’re right that “estrogen-vibe” is a ridiculously vague, stupid quantity. But for the sake of argument let’s say it’s measurable (as you’ve already done in your graphs). I’m not quite understanding how your criticism of Harris doesn’t rely on some really strong assumptions about that distribution. I can think of three:
1. You assume it’s normally distributed across the population.
2. You assume the dispersion across the population is the same, and
3. You assume the cut-off for Harris-obsession is pretty far in on the distribution.
None of these seem like the obvious assumptions to make, but if any of them are tossed Harris could be exactly right (again, assuming the underlying theory of estrogen-vibe is sensible and measurable!).
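Assumption 2 is perhaps the easiest to break, so here is a small sketch of why it matters. The cut-off and both dispersions are invented for illustration: with identical means, merely widening one curve skews the mass past a far cut-off.

```python
import math

def right_tail(c, mu, sigma):
    """P(X > c) for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

cutoff = 2.0  # hypothetical Harris-obsession threshold (assumption 3)

# Drop assumption 2 only: identical means, but one curve 20% more dispersed.
wider = right_tail(cutoff, 0.0, 1.2)
narrower = right_tail(cutoff, 0.0, 1.0)
print(wider / narrower)  # the wider curve puts roughly 2x the mass past the cut-off
```

So even with zero difference in means, unequal dispersions alone can produce the kind of imbalance Harris describes, which is why the equal-dispersion assumption does real work in the argument.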
It’s not so much that I disagree with Harris. I don’t know enough about behavioral differences across sexes to have a strong feeling on it one way or another. I’m just confused about your argument. There’s nothing here that really demonstrates Harris is misunderstanding anything.
^ The above should read “it’s not so much that I DON’T disagree with Harris”.
Granted, that’s functionally equivalent to “it’s not so much that I disagree with Harris”, but I wanted to signal that my hunch is that there’s a lot wrong with what he’s suggesting 🙂
Hi, welcome to Skepchick.
Since you are new here I thought I would let you know that Bad Chart Thursday is a running gag. Jamie uses it to show how you can manipulate perceptions using bad data represented poorly on a chart.
Most of the time she tackles poorly thought out arguments being passed off as truths but here she chose a topic that was already being discussed and regarding a non-existent thing called estrogen-vibe. It was a tweak at Sam Harris’ nose as much as a dissection of bell curves.
1. Assuming a normal distribution is not a poor first attempt to understand a laughably false quantity such as ‘estrogen vibe’. (Yes, the assumption of normality should be tested as soon as a reliable measurement can be made, but I expect that to take a long time, like… forever.) There are also ways to transform non-normally distributed data toward normality, but without knowing the fictitious distribution it is difficult to compensate.
2. Since the ‘estrogen vibe’ is fictitious, I would imagine the dispersion is reasonably comparable.
3. Ahh, in this point you may have found the Achilles heel of a fabulously written piece. I assume the author was generous with the placement of the Harris-obsession cut-off in deference to his outsized ego. But the author is likely saved by the fact that it is a safe assumption that the separation in means between the male and female distributions of ‘estrogen vibe’ is fairly identical (i.e. non-existent).
I hope my understanding has been of service.
Could you suggest some curves for being sexist pigs? Does it have a strong gender bias, or is it, like cognitive differences, minor? Is it related to chromosome ownership? What are the actual causes of these differences?
If Dawkins, Harris, Hitchens, Shermer are all sexist pigs, of varying degrees, and fall in different areas of their bell curves than, say, PZ Myers, do you have any suggestions why this might be?
I think it has to do with biological determinism. If you start to think “atheism is Darwin” and people tell you over and over “evopsych is Darwin”, then by the transitive property, you get “atheism is evopsych”.
But then you have the simple fact that biological determinism really has little to do with how religious one is; after all, skull-measuring antedates Darwin (the link in this syllogism) by quite a few years.
I LOVE this chart. It summarizes everything about Harris.
I read it more as Sam Harris claiming that he’s only liked by men who are more than a couple of standard deviations off the norm. If instead of looking at the population close to the mean you look at people who are, say, more than 3 standard deviations from the norm a small difference between the genders will produce a disproportionate imbalance.
Whether you interpret that as “only the trolls”, “only the estrogen-haters” or “just the wacko fringe”, I find it hard to see as complimentary to Sam. But maybe to him it is – “I speak to my fellow gynophobes, the rest of you can GTFO”. Why he’d bother saying that to a wider audience I’m not sure, but I expect he has a rigorously logical explanation that my estrogen-addled brain just can’t grasp.
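That reading can be made concrete with a quick sketch. The 0.2σ shift in means and the cut-offs are hypothetical, but they show the mechanism: the farther out on the tail you look, the more a small mean difference inflates the imbalance.

```python
import math

def right_tail(c, mu=0.0, sigma=1.0):
    """P(X > c) for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

delta = 0.2  # hypothetical small shift between the two means, in sigmas
for cutoff in (1.0, 2.0, 3.0, 4.0):
    ratio = right_tail(cutoff, mu=delta) / right_tail(cutoff)
    print(f"{cutoff}-sigma cut-off: {ratio:.2f}x as many past it")
```

The ratio keeps growing with the cut-off, so a fan base drawn from “more than 3 standard deviations off the norm” would indeed look far more lopsided than the underlying populations are.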
I have read a couple of responses to Sam Harris’ recent statements, and I have a few comments.
1. Everyone knows he is tactless. I am an atheist, male, and I am pretty straightforward about condemning nonsense when I see it. I do, however, recognize the point at which your attitude and demeanor are detrimental to your cause. I believe Sam Harris is beyond that point, and this whole situation is just one example.
2. His response was poor, from saying “I have a wife and daughters, how could I say something sexist?” to his general misunderstanding of why people are mad at him. This is similar to what Richard Dawkins recently went through when he was backpedaling through poorly thought out Twitter posts.
3. What I will say however, is that I do have a distaste for the way non-traditional social science models are attacked for being intrinsically sexist.
A biological justification for discrimination (whether it be racial or sexual) has been a tool in the asshole’s arsenal for centuries. It has been used to subjugate people, justify horrendous genocide or slavery, etc. It is encouraging to see that any possible attempt to imply similar value judgments based on sex or race is quickly attacked, as it rightfully should be.
But I don’t think that is what is happening here.
The leap from Sam Harris saying: “I think women find my abrasive style more distasteful than men do” to “ergo, women are gentler, and should be relegated to submissive social roles” is not a leap that Sam made.
It is understandable why that leap WAS made by some people however, as patriarchal societies have (and still do) use that very argument constantly as justification for sexual discrimination and segregation.
To me he was attempting to make a large-scale observation, much like someone observing differences in fan bases. Both men and women are often turned off by his ideas because of his abrasiveness; his observation, based on attendance at his talks and the interactions he has, was that this is the reason he sees more men than women.
So did he make conjecture based on anecdotal observations? Yes. But what if he is right? Simply observing an empirical generalization to be true in one case doesn’t mean he is sexist.
I tend to dislike the majority notion that humans are tabula rasa, and all gender roles are products of socialization and culture. Because if you look at other primate species (and other animals in general), it becomes clear that having the female and male brains be 100% identical and function identically makes no sense in evolution. The value judgments on superiority and inferiority ARE socialization, and ARE a product of culture, that is without a doubt. I think the massive variety of cultures and personalities shows how socialization can alter innate positions to nearly any degree, but the ability for socialization to “move the needle” doesn’t say anything at all about where that needle starts.
If population level behaviors have both biological and social contributing factors, only addressing a social factor is ignoring part of the problem and therefore missing a more effective solution.
4. To comment on her link, she links a pop science book (a good one) as a means of refuting observations made by evolutionary psychologists. She then backtracks and says
“and when differences are found, it’s unclear whether they are themselves innate or a product of our culture and experiences. If differences exist at all they are quite small and can only be seen in the aggregate.”
That is my whole point. Evo-psych is only around 25-30 years old as a formal science discipline, and the nature of evo-psych (when specifically applied to humans) makes it really, really tricky to actually prove anything, because we can’t observe human evolution and look at the brains of previous generations. So we have to rely on population-level empirical generalizations and comparative biology to make predictions about our own psychology. Take the concepts of anisogamy and fitness variance among the sexes: in a given species, whichever sex (usually male, but often female) has a wider fitness variance is more aggressive, as it has to compete for mates, whereas the sex with the more valuable (in reproductive terms) sex cell has a narrower fitness variance and does not benefit as much from competing. So looking at population-level statistics on violent crime and aggressive behavior, you can compare humans (sexually dimorphic, anisogamous, where males have higher fitness variance) and say: “well, if it were another species I would say this completely jibes with evolutionary biology that the males of this species are more aggressive.”
Now the complicating factor is that humans have an immensely intricate social and cultural structure, which can completely override any and all biological impulses, even the most basic ones (sexual reproduction, hunger, self-preservation). So while socialization can completely erase any effect on an individual (suicide cults, eunuchs, celibates, suicide bombings, etc.), for the average person the effect of socialization is probably not that extreme.
So Evo-psych is a really difficult field because it’s extremely difficult to tell where biology ends and culture and socialization begin, but you can be certain in saying that biology has a non-zero effect on behavior. Once everyone agrees on that, the conversation can be more open about the various degrees of influence that each party plays.
Because if you look at other primate species (and other animals in general), it becomes clear that having the female and male brains be 100% identical and function identically makes no sense in evolution.
Bit of a blanket statement, there. I can’t speak for other species, but for human beings it runs contrary to biology.
So while socialization can completely erase any effect on an individual (suicide cults, eunuchs, celibates, suicide bombings, etc.), for the average person the effect of socialization is probably not that extreme.
You might want to look into WEIRD.
in the Muller-Lyer illusion, most people in industrialized societies think line A is shorter than line B, though the lines are equally long. But in small-scale traditional societies, the illusion is much less powerful or even absent. […]
Textbooks also frequently describe people as valuing a wide range of options when making choices, being analytical in their reasoning, being motivated to maintain a highly positive self-image, and having a tendency to rate their capabilities as above average. Again, the review article contends, this picture breaks down for people from non-WEIRD societies: These groups tend to place less importance on choice, be more holistic in their reasoning, and be less concerned with seeing themselves as above average.
While the authors of the actual paper are careful not to deny biological explanations, they do state:
Thus, our thesis is not that humans share few basic psychological properties or processes; rather we question our current ability to distinguish these reliably developing aspects of human psychology from more developmentally, culturally, or environmentally contingent aspects of our psychology given the disproportionate reliance on WEIRD subjects. Our aim here, then, is to inspire efforts to place knowledge of such universal features of psychology on a firmer footing by empirically addressing, rather than a priori dismissing or ignoring, questions of population variability.
Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. “The Weirdest People in the World?” Behavioral and Brain Sciences 33.2-3 (2010): 61-83.
Ah, those charts take me back. Anyway, I can’t help but share my favorite scientific paper on sex differences.
The gender similarities hypothesis stands in stark contrast to the differences model, which holds that men and women, and boys and girls, are vastly different psychologically. The gender similarities hypothesis states, instead, that males and females are alike on most— but not all—psychological variables. Extensive evidence from meta-analyses of research on gender differences supports the gender similarities hypothesis. A few notable exceptions are some motor behaviors (e.g., throwing distance) and some aspects of sexuality, which show large gender differences. Aggression shows a gender difference that is moderate in magnitude.
It is time to consider the costs of overinflated claims of gender differences. Arguably, they cause harm in numerous realms, including women’s opportunities in the workplace, couple conflict and communication, and analyses of self-esteem problems among adolescents. Most important, these claims are not consistent with the scientific data.
Hyde, Janet Shibley. “The Gender Similarities Hypothesis.” American Psychologist 60.6 (2005): 581.
Harris then posted a response to all the criticism where he assures us all that he is totally not a sexist in a piece he calls “I’m Not the Sexist Pig You’re Looking For.” Sure, some of his statements may seem sexist, but according to Sam Harris it’s totally not sexist if it’s based in scientific fact and everyone just knows that science says ladies don’t like Sam Harris because of their estrogen-vibe.
This makes me wonder what the heck Harris thinks sexism even *is*. I mean, his remarks are blatantly sexist, and if he doesn’t understand that, I question how much he even “gets” sexism.
Sure, Harris is wrong and sexist, that’s why the whole concept of targeted audiences is just bogus ::eyeroll::
??? What the hell are you trying to insinuate? It may help to actually flesh out your ideas and opinions, rather than leaving very vague, rather empty comments. YOU SURE SHOWED US!
What I’m “insinuating” is that this is much ado about nothing. Why are there films labelled “chick flicks”? Why do men buy Tom Clancy novels? Why do politicians draft policy to “target women voters”? Why do market analysts segment products by demographics?
Because different things appeal to different groups of people, in aggregate, as Harris pointed out. And BTW, his use of “critical” has been equivocated to mean “critical faculties” whereas he meant it as tenor of debate. Whether you think that’s sexist or not, at least get his meaning right to begin with.
Because people are fucking apes. That doesn’t mean that we shouldn’t try to change things.
Yeah, because as everybody knows, marketers and corporations are managed by emotionless robots who are immune to cultural conditioning, right?
It doesn’t matter what their conditioning is, the fact is that if the targets didn’t exist their performance would suffer, the product wouldn’t be bought, the movie not watched, the candidate not voted into office. The market, as it were, would have spoken. Now, you’re free to say that “the market” operates on motivators that are determined both by nature or nurture. Incidentally, I think that’s just what Harris said. Or, you’re free to say it’s just nature or just nurture. It’s your opinion to give. Did someone solve the nature/nurture debate while I was sleeping?
I just don’t think it’s right to accuse someone of sexism because, in his opinion (which is what it was, after all) various demographics are attracted to different things and that it’s partly based on nature. He may be wrong! That’s kind of how opinions work.
I had an idea on how to help with all the misogyny, hate and death threats on line, but it will take cooperation from ad providers like Google.
The idea is that when a site posts a link or a threat or hate speech against a specific person, that the ad provider to that site diverts ad revenues to that person for all views of that hate speech.
This wouldn’t be a big deal for sites that are not using hate speech to generate clicks, but if a site is using hate speech to generate clicks, it would backfire by diverting the click revenue to the victim of their hate speech (or the victim’s designated charity).
It would also give feedback as to how much hate speech there is going on, and from where.
You know, Sam Harris has gotten lots of flak for this “estrogen-vibe” comment, as though it were some sort of vague, ill-defined concept with no scientific basis. Hogwash, I say! Sam Harris is a scientist, you guys, he wouldn’t do something so disingenuous as to refer to endocrinology to make his argument seem grounded in biology while in fact only using “estrogen” as a metaphorical stand-in for culturally prescribed feminine traits.
“Estrogen-vibe” has an obvious meaning. The bonds in every molecule, including the estrogens, have natural vibration frequencies. These natural vibration frequencies are important for understanding the biochemical properties of the molecule. Sam Harris is obviously referring to a recent paper which characterized the vibration frequencies of two of the estrogens from quantum mechanical calculations. See, you guys, it’s science! Specifically, chemistry.
I’ll admit to some puzzlement as to how the vibration frequencies of the estrogens relate to the rest of what Sam Harris was talking about, though…
This comment … A+
This isn’t a recent paper, it is from 2008.
Any paper published during the time I’ve been working on my Ph.D. is recent as far as I’m concerned. Let me have this, I need to believe it to keep from despair.
Oh, my. All those boob shaped graphs.
Men everywhere are being corrupted.
Boob-shaped? Surely boob-shaped graphs would be much more platykurtic than those.
Although I suppose it all depends how lonely they are.
Sam Harris is going to be on Air Talk with Larry Mantle on KPCC (http://220.127.116.11/programs/airtalk/) in Southern California on Monday. It’s a call-in show that comes on from 11am – 1pm PDT on public radio. Maybe someone in the area who’s more well-spoken than I would like to call in? The number’s 866-893-5722.
I found this article funny but I must admit I’m a bit lost by your main point. What exactly does Sam Harris not seem to understand about the statistics of bell curves? It seems more like you’re saying that his estimates of the distribution of ‘estrogen vibe’ in the population are inaccurate. Isn’t that different from saying he doesn’t understand bell curves?
It strikes me that this article is intended as a polemic. However, on the off-chance that you did want to say something serious about statistics and probability theory, allow me to point out a few misconceptions:
1. You say “Many researchers have searched for cognitive and psychological differences between the genders and taken as a whole, researchers have found little to no difference in cognitive ability or psychological differences between men and women and when differences are found, it’s unclear whether they are themselves innate or a product of our culture and experiences.” The second part of this sentence is indeed true, but the first part is false. (Or, let me say – more cautiously – that it is not obviously true.) As far as I can see, most empirical studies do find that gender is a statistically significant factor in mathematical and language abilities, for example, even after controlling for cross-national variations. Briefly, boys tend to score higher in mathematics, on average, and the variance of their mathematics scores is higher. Moreover, the tails of the distribution of mathematics scores tends to be dominated by boys. The reverse appears to be true, when it comes to language scores. Also, these results are apparently not explained by traditional proxies for gender equality. (Some studies dispute these findings, but they seem to be in the minority.)
2. You say “The only way you would be able to see differences between the psychologies of men and women is if those differences were quite large.” Here you appear to be talking about differences between the means of the two sub-populations. This is not generally true, since it depends on the standard deviations of the two sub-populations. Specifically, the standard error of the estimate for the mean of a normally distributed population is proportional to its standard deviation. Hence, if the means of the two sub-populations are different enough, or if their standard deviations are small enough, you will be able to reject the null hypothesis that their means are equal, using only a modest sample. However, you won’t be able to reject the null hypothesis if the means of the two sub-populations are very similar, and their standard deviations are large. I agree that this is likely to be the case for most data where the sub-populations are selected on the basis of gender.
3. It strikes me that your biggest error is to think that any statement about gender-based cognitive/psychological differences must be a statement about the means of the distributions of some trait. (This is borne out by your illustrations, which depict two distributions with the same variance, but shifted means.) Actually, you are much more likely to see differences between the two distributions when you compare their higher-order moments. You would especially like to compare the tail probabilities for the two distributions. Tests that do so require modest samples, compared with tests that compare sample means. That is to say, if it is the case that males inhabit the tails of the distribution for some cognitive ability/psychological trait, then it is likely that statistical tests with reasonable sample sizes will be able to detect this.
4. One final point where I think you’re mistaken. Let us say, for the sake of argument, that Harris is right, in that men are “more attracted to this style of communication than women are.” Let us also assume that the standard deviations of the extent to which men and women are “attracted to this style of communication” are large. That is to say, men are – as Harris claims – generally more likely to prefer his style of communication, but there is a substantial variation in this preference across the populations of both men and women. Then two things are likely to happen: (i) Estimates of the mean preferences of men and women won’t reveal any gender-related differences (due to the problem with the standard errors of sample means mentioned earlier); and yet (ii) There will be more men than women attending Harris’ talks (due to the fact that the true – but unobservable – means of the two populations are different).
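The sample-size issue in points 2 and 4 can be made concrete. A minimal sketch using the standard two-sample power formula; the 5% significance level and 80% power below are my own illustrative choices, not figures from the discussion:

```python
from math import ceil

def n_per_group(d, alpha_z=1.959964, power_z=0.841621):
    """Approximate sample size per group for a two-sample z-test to
    detect a standardized mean difference d (Cohen's d) at the 5%
    level with 80% power. The constants are the standard normal
    quantiles for alpha/2 = 0.025 and power = 0.80."""
    return ceil(2 * ((alpha_z + power_z) / d) ** 2)

# A large difference (d = 0.5) needs only a modest sample per group...
print(n_per_group(0.5))    # 63
# ...but a tiny one (d = 0.05, the order of magnitude quoted later in
# this thread) needs thousands of subjects per group.
print(n_per_group(0.05))   # 6280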
Actually… The studies show that “math ability” and the like are heavily influenced by the conditions of the testing. Literally, reminding women that they tend to do worse than guys causes them to do worse, and reminding men that they “do better” causes their scores to rise slightly. Do both, and you end up with a huge gap. Language is an even dumber one. Something like 90% of the “best” linguists on the planet are male, yet, supposedly, women are better at language. This doesn’t make logical sense, at all. But, honestly, it might make sense if you recognize that men are more likely to get hired, so may be more promoted as linguists, while men in general, just from my own observations, tend to think they have “better things to do” than waste time doing a lot of reading, i.e., unless they are linguists they don’t actually bloody use their language skills a lot, unlike readers.
Just those factors, by themselves, are enough to skew the results, and it’s not like you can control for them, unless you find tens of thousands of men and women who all happen to read the same amount, and so on. Same problem with math – men barely use those skills, and women…. yeah, no inherent bias, which throws a monkey wrench into the works when studying the problem…
Literally, reminding women that they tend to do worse than guys causes them to do worse, and reminding men that they “do better”, causes their scores to rise slightly.
Not only this, but merely reminding women that they are women (e.g., by first asking questions that encourage them to reflect on their gender) can cause women to do worse on math tests, so strongly entrenched and internalized is the cultural notion that “girls are bad at math”. (Honestly I’m loath to type that phrase even in scare quotes, considering how easy a message it is to reinforce. Girls: you are great at math. Or you can be, if you choose it!)
Girls: you are great at math. Or you can be, if you choose it!
That’s what I keep telling my daughter but I don’t think she’s buying it.
A female friend of mine in college, who was extremely smart, studying chemical engineering, and in an elite honors program, once told me that she consciously chose to act less intelligent than the men around her, particularly ones she was romantically interested in, so she wouldn’t intimidate them and they’d like her better. As a naive male freshman with feminist sensibilities but little real-world experience, it was an eye-opening moment for me to see first-hand how a woman I liked and respected might see her own intelligence as an impediment rather than a strength, at least in some circumstances, in ways that would never be true for me. Of course, in her case she was consciously aware of what she was doing, but clearly this decision process acts subconsciously as well, in people with or without my friend’s level of self-awareness.
Your daughter’s lucky to have a parent who lets her know she doesn’t have to be bad at math in order to be accepted. You’re tackling a lot of cultural inertia there!
kagehi wrote “Something like 90% of the ‘best’ linguists […] are male, yet, supposedly, women are better at language. This doesn’t make logical sense at all.”
No, unfortunately you’re labouring under a misapprehension. If the distribution of linguistic abilities for males has a higher variance than the distribution for females, then it is quite possible for men to have poorer language skills on average AND for the most gifted linguists to be men.
This is in fact a very common statistical phenomenon. For example, active fund managers perform worse than passive fund managers, on average, yet the best performing funds over any period are all active. Why is that? Active funds hold riskier portfolios, which means that the variances of their returns are higher than is the case for passive funds.
Of course, such a statistical explanation need not provide the correct interpretation of the evidence you cited – your explanation in terms of employment bias may well be the correct one. But your evidence does not expose an inherent logical problem with an explanation that invokes inherent gender-based differences in linguistic skills (as loath as we may be to countenance it).
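The higher-variance point is easy to verify numerically. A minimal sketch with made-up parameters (the means and standard deviations below are illustrative assumptions, not estimates of anything real):

```python
from math import erfc, sqrt

def tail(z):
    # P(Z > z) for a standard normal variable
    return 0.5 * erfc(z / sqrt(2))

# Hypothetical illustration: women score higher on average
# (mean 0.1 vs 0.0) but men's scores are more variable (sd 1.2 vs 1.0).
mean_f, sd_f = 0.1, 1.0
mean_m, sd_m = 0.0, 1.2

def frac_male_above(t):
    # Fraction of men among all people scoring above t,
    # assuming equal numbers of men and women overall.
    pm = tail((t - mean_m) / sd_m)
    pf = tail((t - mean_f) / sd_f)
    return pm / (pm + pf)

print(frac_male_above(0.0))  # just under 0.5: women dominate above average
print(frac_male_above(3.0))  # about 0.77: men dominate the far right tail
```

Even though the female mean is higher, the heavier male tail takes over far from the mean – exactly the pattern that makes “who dominates the top” uninformative about “who is better on average”.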
No, you’re “labouring under a misapprehension,” which is that just because 90% of people working in linguistics are men that they are somehow “the most gifted linguists.” The point kagehi is making that you seem loath to acknowledge is that there are structural barriers in place that set up these differences, and in many cases exaggerate these differences, which cannot be attributed to some intrinsic difference between men and women. So it’s not that somehow magically the most gifted people in linguistics happen to be men and they are filling those jobs, it’s that the way academia and education more broadly is structured sets up a situation where men are hired for positions more than women. You are confusing the quantity of people in a particular position with the quality of their talent.
You should probably reread my last message. I don’t claim that there are not “structural barriers” that create and/or exaggerate gender imbalances; I specifically allow that such an explanation for the existence of gender imbalances may be correct. My point is simply that kagehi’s inference is incorrect. In particular, I show that it is possible for average female linguistic ability to exceed average male linguistic ability AND for 90% of the best linguists to be male. So, kagehi’s claim that these two observations don’t “make logical sense” when taken together is false. This is simply a fact of probability theory.
Now, your preferred explanation for the phenomenon described by kagehi may be correct. But the data he provided does not establish that. Of course, the data also fails to establish that this particular gender imbalance has a biological cause.
Seems to be a semantic argument here. Is this better, Hardy – there is no automatic reason to assume that such a bias is real, other than its prevalence in academia, but there are vast amounts of evidence implying a likely alternative explanation, which is sufficiently pervasive across all disciplines and skills to suggest that it’s **unlikely** to be an accurate conclusion.
Since you object to the suggestion that it’s just pure BS, without going into why.
The quote I start my blog post on Theory of Mind with is:
In any great organization it is far, far safer to be wrong with the majority than to be right alone. — John Kenneth Galbraith
Pretty obvious that if women get death threats for being good at math, it is safer to pretend to suck at math. The easiest way to pretend to suck at math is to actually suck at math.
@Hardy Hulley: No, look, I agree with Kagehi – you need to show the data for this. Yes it is theoretically possible to have different shaped distributions as you say, but male & female distributions for the vast majority of attributes (constructs) are similar.
@Jack99: If you agree when Kagehi says “Something like 90% of the ‘best’ linguists […] are male, yet, supposedly, women are better at language. This doesn’t make logical sense at all,” then I’m afraid you’re wrong. It is quite possible for women to perform better than men, on average, and yet for the best performers to be men. This is because the higher-order moments of the distributions of language skills for men and women are more important for determining the right-hand tail of the overall distribution than the means of the individual distributions. This is not my opinion, it is a (simple) probability-theoretic fact.
Note that I have not claimed that gender *is* an important determinant of language ability; I’m simply pointing out that two facts adduced by Kagehi are not mutually inconsistent. In other words, Kagehi has drawn a false inference (which he/she shouldn’t feel bad about, because many non-statisticians make the same error).
Okay, so what does the data say? Well, the latest analysis, based on 10 years of PISA data, finds persistent and statistically significant differences in both the language and mathematics gender gaps. The sample is important, because it is international, and therefore accounts for the effects of gender-based policy variations across different countries. Moreover, since it is based on tests of 15 year-olds, which is the highest age of mandatory schooling across all participating nations, the PISA data also accounts for possible biases associated with variations in school participation rates across different countries.
The main findings are:
1. In mathematics, boys perform better than girls, on average; however, the left-hand tail (i.e. the worst performers) reveals little difference between boys and girls, while the right-hand tail (i.e. the best performers) is dominated by boys.
2. In language, girls perform better than boys, on average; however, the left-hand tail (i.e. the worst performers) is dominated by boys, while the right-hand tail (i.e. the best performers) shows little difference between boys and girls.
3. In the cross-section (i.e. when comparing the results across different countries), the gender gaps for mathematics and language are inversely related (i.e. countries that manage to decrease the mathematics gap increase the language gap, and vice versa). Interestingly, there are no exceptions to this pattern!
Point 3 is extremely important from a policy point-of-view, since it suggests that differences in performance between boys and girls are not corrected by gender equality and empowerment programmes. In fact, the data suggests that countries with such programmes exhibit higher gender gaps in mathematics.
You can find a video discussion of the results at https://www.youtube.com/watch?v=m9WxvT-82Xg. There you will also find a link to the published paper (which is available for free). It appeared in the scientific journal PLOS ONE in 2013.
The problem with statistics, even accurate ones, is that they can lie while telling the truth. For example – how many of those “countries” are female-dominated, or neutral on gender, instead of part of the vast majority, which have been, and continue to be, male-dominated? This is the critical problem with trying to measure such things. There is literally no way, at all, to create a situation in which to collect such statistical data in which there is not **already** a bias in the data set. It’s like trying to study how wet something can get when rained on, while 50 feet under water. It doesn’t matter how you collect your “statistics”, they are never going to be accurate.
To put it simply, to acquire an accurate assessment of the *real* differences, you would have to do the unethical and immoral – remove thousands of people from society, into a lab, from birth, and have them grow up with “only” those skills you are testing for, while removing any possible accidental biases that might arise from literature source (in the case of language, for example), as well as any, and all possible other sources, by which they might be prone to bias their own behavior, or have it biased, by the expectations of outside influences.
There is no way to collect statistics from any kind of meaningful category of nations without running head first into the **preexisting** cultural biases, which, by existing, alter the baseline, right from moment one, into a framework that would, however unintentionally, alter the perceptions and expectations of the very people you want to test.
Show me how you **ever** successfully correct for that, without the inherent margin of error that it causes, presuming you can even reliably define the margin of error, without a neutral baseline to start with, and.. maybe I will agree with your assessment. But… the actual evidence suggests that the bias persists, all across the board, even as the presumption of just what the statistics **should show** has drifted closer and closer together, narrowing the perceived gap.
The point being, there may be one, but you first have to show that, in fact, that gap is real, not artificially induced by the continued perceptions and assumptions which plague every single one of the nations in the studies. Until you can show that the data is not, in effect, poisoned at the well, you have no grounds to claim that the differences detected are “innate” characteristics, instead of statistical anomalies arising from uncontrolled, nah… uncontrollable variables. And none of these studies are capable of showing this to be the case, and prior “studies”, which failed to even acknowledge such bias, have been rendered worthless, because “culture” changed and made the assumptions change with it.
Your hubris is in assuming that they won’t change again, that you have somehow controlled for the uncontrollable, *and* that time will somehow prove out the differences as real, and not a product of a nearly universal set of cultural assumptions, which are prevalent even among people that insist they have left them long behind, but who, nevertheless, cannot escape the influence of the wider culture, which has not given them up, and the influence of which cannot be avoided entirely (and perhaps not even significantly).
@kagehi said “Until you can show that the data is not, in effect, poisoned at the well, you have no grounds to claim that the differences detected are ‘innate’ characteristics, instead of statistically anomalies,…” Please don’t put words in my mouth – I never claimed the results of the study in question were evidence of “innate characteristics.” That is your interpretation.
“The point being, there may be one, but you first have to show that, in fact, that gap is real, not artificially induced, by the continued perceptions and assumptions, which plague every single one of the nations in the studies.” You’re clutching at straws, my friend. When somebody presents empirical evidence of this sort that you wish to refute, you have the following options: (i) replicate the study, and demonstrate that you don’t obtain the same results; (ii) identify a methodological flaw in the analysis; or (iii) explain exactly what sort of bias could bedevil the results, and demonstrate it. Vague nonsense about “…continued perceptions and assumptions…” is just pseudo-science.
In truth, the results obtained by the study in question make it quite robust to claims of bias. In particular, what type of bias could generate a negative relationship between the gender gaps for language and mathematics? Be specific if you want to tackle that question – no “culture” mumbo-jumbo.
Unfortunately, the rest of what you wrote only reveals a poor understanding of statistical inference. For example, the stuff about “baselines” is completely irrelevant, and implies that you really don’t understand the type of analysis conducted in the study. I suggest you actually read the article, and pay attention to the authors’ interpretation of the results, and their resulting policy recommendations. You may find yourself agreeing with them.
Just going to say this, since Hornbeck already explains it way better than I can. If you can’t determine, or factor for, or control the variables that confound your statistics, then your statistics are useless for determining what underlies the system.
Your own challenge to me is, absurdly, “Of course not, the statistics are only showing the current conditions of the system.” – i.e. “I never claimed the results of the study in question were evidence of ‘innate characteristics.’” But we are not talking about whether or not the current state exists, or is real. We are talking about whether or not there are innate reasons, based purely on the gender of the individuals, without any other confounding factors, that result in the gaps seen.
My argument is, in a nutshell, that it’s bloody meaningless to talk about how likely it is that your car will get wet in the rain, if your “statistics” are based on a community where 90% of the population walks every place, and all of their cars remain parked in a garage something like 200 out of 365 days a year. Of course the statistics are going to say, “Cars don’t get wet when it rains,” if those are the conditions under which you are bloody collecting the evidence. Only, what we have here is the opposite situation – **no one** is keeping their car in a garage, so, of course, the statistics are saying, “When it rains, everyone’s car gets wet.” And you actually seem to think this means anything, when the question is, “What if it was parked in a garage?”
Or, in other words, the question isn’t, “Do these biases exist, period?”, which you rightly say they do, but, “Would they still exist if you changed the bloody variables, so that the whole entire system wasn’t stacked against women – from how much they get paid, compared to men, to what they are told, via the mythologies of society, about their math skills, to how they are told to act, to think, to react, etc. If you removed the variables, what would be the result?”
All you do is keep babbling, “But…. with the variables still there, there is this huge gap!”
No one cares, because it’s not about the state of the system as it exists; it’s about the state of the system without the confounding variables. There is clear evidence that, when removed, the “gap” starts disappearing, but we have no way to completely remove the variables, so we have no valid statistical data to say if the gap **is** still there when/if you remove all of the external factors. But there is sufficient evidence to suggest that, with the ones we **can** remove dealt with, the gap all but vanishes anyway. And then, you show up again, and start rambling about, “Yes, but, when we flat out refuse to adjust for, or remove those variables, at all, there is this massive gap!!!” Argh!!!
1. In mathematics, boys perform better than girls, on average
Meta-analytic findings from 1990 (6, 7) indicated that gender differences in math performance in the general population were trivial, d= –0.05, where the effect size, d, is the mean for males minus the mean for females, divided by the pooled within-gender standard deviation. […]
Effect sizes for gender differences, representing the testing of over 7 million students in state assessments, are uniformly <0.10, representing trivial differences … Of these effect sizes, 21 were positive, indicating better performance by males; 36 were negative, indicating better performance by females; and 9 were exactly 0. From this distribution of effect sizes, we calculate that the weighted mean is 0.0065, consistent with no gender difference …. In contrast to earlier findings, these very current data provide no evidence of a gender difference favoring males emerging in the high school years; effect sizes for gender differences are uniformly <0.10 for grades 10 and 11 …. Effect sizes for the magnitude of gender differences are similarly small across all ethnic groups …. The magnitude of the gender difference does not exceed d= 0.04 for any ethnic group in any state.
So much for that part.
the left-hand tail (i.e. the worst performers) reveals little difference between boys and girls, while the right-hand tail (i.e. the best performers) is dominated by boys.
Greater male variance is indicated by VR > 1.0. All VRs, by state and grade, are >1.0 [range 1.11 to 1.21 …]. Thus, our analyses show greater male variability, although the discrepancy in variances is not large […]
For whites [in grade 11 in Minnesota], the ratios of boys:girls scoring above the 95th percentile and 99th percentile are 1.45 and 2.06, respectively, and are similar to predictions from theoretical models. For Asian Americans, ratios are 1.09 and 0.91, respectively. Even at the 99th percentile, the gender ratio favoring males is small for whites and is reversed for Asian Americans. If a particular specialty required mathematical skills at the 99th percentile, and the gender ratio is 2.0, we would expect 67% men in the occupation and 33% women. Yet today, for example, Ph.D. programs in engineering average only about 15% women.
Hyde, Janet S., et al. “Gender similarities characterize math performance.” Science 321.5888 (2008): 494-495.
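For readers following along, the effect size d defined in the quote above is straightforward to compute. A sketch with toy data (the scores below are made up purely for illustration):

```python
from math import sqrt

def cohens_d(xs, ys):
    """Cohen's d: difference in means divided by the pooled
    within-group standard deviation, as defined in the quote above."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled_sd = sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd

# Toy scores, invented for illustration only:
males = [5, 7, 6, 8, 9]
females = [6, 7, 8, 7, 9]
print(round(cohens_d(males, females), 3))  # -0.29: a "small" effect favoring females
```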
Ok, so there is some evidence for variance, but it’s not on the scale that would explain real-world gender disparities and there might be a strong cultural effect driving it, instead of biology.
2. In language, girls perform better than boys, on average
Didn’t I mention this earlier? [scrolls up] Oh, no, that meta-analysis glossed over the detail. Time for a fresh one!
Located 165 studies that reported data on gender differences in verbal ability. The weighted mean effect size was +0.11, indicating a slight female superiority in performance. The difference is so small that we argue that gender differences in verbal ability no longer exist. Analysis of tests requiring different cognitive processes involved in verbal ability yielded no evidence of substantial gender differences in any aspect of processing.
Hyde, Janet S., and Marcia C. Linn. “Gender differences in verbal ability: A meta-analysis.” Psychological Bulletin 104.1 (1988): 53.
I don’t see any mention of variability there, alas, but I will note that while the general idea that men exhibit more variance than women has been around for a hundred years, the magnitude of the variance seems inversely proportional to the sample size, and there are a few external factors that can have the same effect. Boys tend to disproportionately drop out of school, for instance, which will artificially boost variability (as low-scorers will be encouraged to stay, while high-scorers desire to stay).
So while that might be a nice study, when you gather together all studies you get a different picture.
@Hj Hornbeck: You quote extensively from Janet S. Hyde, et al., “Gender similarities characterize math performance,” Science, 321:494-495, 2008. Unfortunately, you probably haven’t chosen the best article upon which to base your case.
1. First, the data obtained by Hyde et al. (2008) appears to be a snapshot of NAEP scores for 10 US states (presumably for one year – they’re not clear about this). By contrast, the PISA data is a panel, where the time-series covers four years (2000, 2003, 2006, and 2009), and the cross-section consists of 75 countries. You will struggle to challenge the findings based on such a broad dataset by appealing to the results of a much narrower study.
2. You quote the following passage from Hyde et al. (2008): “Meta-analytic findings from 1990 (6,7) indicated that gender differences in math performance in the general population were trivial, d=–0.05, where the effect size, d, is the mean for males minus the mean for females, divided by the pooled within-gender standard deviation.” Curiously, you omit the very next sentence: “However, measurable differences existed for complex problem-solving beginning in high school years (d=+0.29 favoring males), which might forecast underrepresentation of women in science, technology, engineering, and mathematics (STEM) careers.”
3. You quote the following passage from Hyde et al. (2008): “Effect sizes for gender differences, representing the testing of over 7 million students in state assessments, are uniformly <0.10, representing trivial differences…” Here I’m concerned about how the authors interpret their results. The statistic they’re referring to is Cohen’s d-statistic (I apologise if I’m telling you something you already know), and they cite Cohen’s book, which offers a heuristic to the effect that d-values less than 0.2 are small. Sure, but that only deals with economic significance; what about statistical significance (which is probably more important in this case)? The authors are completely silent on this issue.
4. You quote the following passage from Hyde et al. (2008): “Of these effect sizes, 21 were positive, indicating better performance by males; 36 were negative, indicating better performance by females; and 9 were exactly 0. From this distribution of effect sizes, we calculate that the weighted mean is 0.0065, consistent with no gender difference….” This looks like poor methodology. First, the authors had data for 10 grades from 10 states, yet they’ve calculated only 36+21+9=66 d-statistics; you would expect 100 such values. Second, they report that the weighted average of the individual d-statistics is 0.0065. However, averaging d-statistics across completely different distributions is pretty meaningless. To see why, note that girls and boys may well develop at different rates from Grade 2-Grade 11, so that overall test score distributions could vary substantially over time. On top of that, the mathematics tests they write at different ages are completely different. I can’t see how to interpret a composite statistic constructed as a weighted average across heterogeneous distributions.
5. You quote the following passage from Hyde et al. (2008): “In contrast to earlier findings, these very current data provide no evidence of a gender difference favoring males emerging in the high school years; effect sizes for gender differences are uniformly <0.10 for grades 10 and 11….”

6. You quote the following passage from Hyde et al. (2008): “Greater male variance is indicated by VR > 1.0. All VRs, by state and grade, are >1.0 [range 1.11 to 1.21 …]. Thus, our analyses show greater male variability, although the discrepancy in variances is not large…” In keeping with their general indifference to statistical significance, the authors provide no p-values to accompany their results (which is very poor form in any empirical study). Coincidentally, the very next issue of Science published the study by Stephen Machin and Tuomas Pekkarinen, “Global sex differences in test score variability,” Science, 322:133-134, 2008. It repeated much of the analysis performed by Hyde et al. (2008), using PISA data for 2003. However, the authors also had the good sense to disclose the statistical significance of their results. To summarise, for the U.S. alone, they found d-statistics for reading and mathematics of -0.32 and 0.07, respectively, with p<0.01 in both cases (i.e. both results are significant at the 1% level). They also found variance ratios of 1.17 for reading and 1.19 for mathematics, with p<0.01 in both cases. Finally, the d-statistics (gender gaps) for mathematics at the 95th and 5th percentiles were found to be 0.22 and -0.11, respectively (p<0.01).
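The complaint about missing p-values can be illustrated with the standard large-sample approximation for the standard error of Cohen’s d. The sample sizes below are my own round numbers, not the study’s:

```python
from math import erfc, sqrt

def d_pvalue(d, n1, n2):
    """Two-sided p-value for Cohen's d, using the standard large-sample
    approximation SE(d) = sqrt((n1+n2)/(n1*n2) + d^2/(2*(n1+n2)))."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = abs(d) / se
    return erfc(z / sqrt(2))  # equals 2 * P(Z > z)

# Hypothetical group sizes (the actual PISA samples are larger):
print(d_pvalue(0.07, 2000, 2000))  # ~0.027: significant at 5% even for a "trivial" d
```

The point: whether a small d is statistically significant is entirely a question of sample size, which is why reporting the d-value alone is uninformative.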
7. Hyde et al. (2008) simply announce that variance ratios between 1.11 and 1.21 are "not large." Really? Suppose boys' and girls' maths scores are normally distributed with the same mean, and a variance ratio of 1.2. Then a quick calculation reveals that boys will outnumber girls by nearly 2:1 in the top percentile of maths performers. Since those children are most likely to enter high-earning professions, how can anyone argue that such values are economically insignificant?
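The tail calculation in point 7 can be sketched as follows. Note that a variance ratio of 1.2 yields a top-percentile ratio of roughly 1.8:1; a ratio of roughly 3:1 results only if 1.2 is instead taken as the ratio of standard deviations (i.e. a variance ratio of 1.44):

```python
from math import erfc, sqrt

def tail(z):
    return 0.5 * erfc(z / sqrt(2))  # P(Z > z) for a standard normal

def top_percentile_ratio(sd_boys, pct=0.01):
    """Boys:girls ratio above the overall top-`pct` cutoff, assuming
    equal means (0), girls' sd = 1, and a 50/50 population split."""
    lo, hi = 0.0, 10.0
    for _ in range(100):  # bisect for the mixture's top-pct cutoff t
        t = (lo + hi) / 2
        if 0.5 * (tail(t) + tail(t / sd_boys)) > pct:
            lo = t
        else:
            hi = t
    return tail(t / sd_boys) / tail(t)

print(top_percentile_ratio(sqrt(1.2)))  # variance ratio 1.2 -> about 1.8:1
print(top_percentile_ratio(1.2))        # sd ratio 1.2 -> about 3.2:1
```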
8. However, variance ratios are not even that interesting – we're really interested in the tails of the distributions. (Recall that my initial summary of the results in Stoet and Geary (2013) said "…the left-hand tail (i.e. the worst performers) reveals little difference between boys and girls, while the right-hand tail (i.e. the best performers) is dominated by boys." I said nothing about variance ratios.) Unfortunately, Hyde et al. (2008) undermine your argument a bit on this score, by reporting that boys outnumber girls by 2:1 in the top percentile of maths performers.
9. You write: "Ok, so there is some evidence for variance, but it’s not on the scale that would explain real-world gender disparities and there might be a strong cultural effect driving it, instead of biology." Here I agree with you. Even if boys outnumber girls by 2:1 in the top percentile of maths performers, we should still expect to see around 33% of scientists, engineers and mathematicians being women. I also agree that the data is insufficient to infer a biological explanation for the gender gap in maths test scores – there is more likely a constellation of environmental factors at play (Stoet and Geary (2013) argue that it boils down to issues with pedagogy). In any case, irrespective of the causes of the gender gap in maths, there is no doubt that it is a socially harmful phenomenon, and we have every incentive to eliminate it. Where I disagree with you is on the question of whether the gap exists in the first place, and on its economic and statistical significance; I think it is a robust empirical feature – especially in the tails of the distribution – and while denying its existence may be comforting, you can't solve the problem that way.
10. You write: "So while that might be a nice study, when you gather together all studies you get a different picture." I think you're wrong about this. As far as I can tell, the claim that the distributions of mathematics scores for boys and girls are identical represents a minority view. Very few serious studies make such a claim – not even Hyde et al. (2008), when you consider their evidence on the tails.
Unfortunately, you probably haven’t chosen the best article upon which to base your case.
Fair enough, I’ll use yours instead.
Curiously, you omit the very next sentence: “However, measurable differences existed for complex problem-solving beginning in high school years (d=+0.29 favoring males), which might forecast underrepresentation of women in science, technology, engineering, and mathematics (STEM) careers.”
I omitted it because it was irrelevant. The corollary of “regression to the mean” is that “a subset or sample can vary significantly from the mean.” Finding one subset of mathematical skill that favors men does not contradict the theory that there are no sex differences overall. What it does do is demonstrate the variability in the dataset, which is associated with cultural explanations (as the realist position assumes the difference is universal and thus should not vary), so at best you’ve just shown cultural factors can cause Cohen’s d to vary by about 0.29. This will come in handy later.
Sure, but that only deals with economic significance; what about statistical significance (which is probably more important in this case)? The authors are completely silent on this issue.
You don’t seem to understand what statistical significance means. Suppose I demonstrate, to a very high level of statistical significance, that flipping a specific coin will result in heads 51% of the time. Should I stop using it for flipping? Probably not, as most coin flips are one-off events, and the consequences of having a slight bias are negligible. Statistical significance only tells us how unlikely it is that the results arose by chance; it says nothing about how those results came about or what they mean in daily life.
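To put numbers on that coin example, here's a quick sketch (a made-up run of flips, using the normal approximation to the binomial test): a 51% bias is invisible in a small sample, yet overwhelmingly "significant" in a large one, while being exactly as unimportant in both.

```python
from math import erfc, sqrt

def two_sided_p(heads, flips):
    """Normal approximation to the two-sided binomial test
    against a fair coin (null: p = 0.5)."""
    se = sqrt(0.25 / flips)            # standard error under the null
    z = abs(heads / flips - 0.5) / se  # standardized deviation from fairness
    return erfc(z / sqrt(2))           # two-sided tail probability

print(two_sided_p(51, 100))          # ~0.84: no evidence of bias at all
print(two_sided_p(51_000, 100_000))  # ~2.5e-10: overwhelming "significance"
```

The effect size (a 1% tilt) never changes; only our certainty about it does.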
Also, what I linked to was a two-page summary published in Science. There wasn’t enough room there to cover the subtleties. Not only has Hyde covered your basic argument about variance, she did it over thirty years ago.
Assuming that engineering requires a high level of spatial ability, can the gender difference in spatial ability account for the relative absence of women in this profession? The above findings of such a small gender difference would appear to argue that the answer is no. However, the question has now shifted from a discussion of overall mean differences in the population to differences at the upper end of the distribution. And relatively small mean differences can generate rather large differences at the tails of distributions, as the following sample calculation will show. Assume, conservatively, that the gender difference in spatial ability is .40 SD. Using z scores, the mean score for males will be .20 and the mean for females will be −.20. Assume also that being a successful engineer requires spatial ability at least at the 95th percentile for the population. A continuation of the z-score computation shows that about 7.35% of males will be above this cutoff, whereas only 3.22% of females will be. This amounts to about a 2:1 ratio of males to females with sufficient ability for the profession. This could, therefore, generate a rather large difference although certainly not as large a one as the existing one.
The disparity would become even larger if one considered some occupational feat, such as winning a Nobel prize or a Pulitzer prize, that would require even higher levels of the ability. For example, suppose that spatial ability at the 99.5th percentile is now required. The same z-score calculations indicate that .85563% of males and .27375% of females would be above that cutoff, for an approximate 3:1 ratio of males to females. Once again, though, this is not nearly a large enough difference to account for the small proportions of women winning Nobel prizes.
Hyde, Janet S. “How large are cognitive gender differences? A meta-analysis using ω² and d.” American Psychologist 36.8 (1981): 892.
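Hyde's tail arithmetic is easy to check. A sketch, assuming unit within-group standard deviations and a cutoff of z = 1.65 for the 95th percentile (the value that reproduces her figures):

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

d = 0.40                       # assumed gender gap, in SD units
male_mean, female_mean = d / 2, -d / 2
cutoff = 1.65                  # ~95th percentile of the pooled distribution

males = upper_tail(cutoff - male_mean)      # ~0.0735, i.e. 7.35%
females = upper_tail(cutoff - female_mean)  # ~0.0322, i.e. 3.22%
print(males, females, males / females)      # ratio ~2.3:1
```

Her numbers are internally consistent; the question, as below, is whether the assumptions feeding them are.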
This is a finding she repeats nearly thirty years later: “Gender differences in math performance, even among high scorers, are insufficient to explain lopsided gender patterns in participation in some STEM fields.”
This brings me to your points 7 through 9. Note that in order to do the above calculations, Hyde assumed those d values were entirely explained by biological factors. As you pointed out above, though, they’re within the range of what can be generated by social factors. So the central assumption behind her calculation is not established, and thus we can wave the entire thing away with Hitch’s Razor: that which can be asserted without evidence can be dismissed without it. The same applies to your calculations.
First, the authors had data for 10 grades from 10 states, yet they’ve calculated only 36+21+9=66 d-statistics; you would expect 100 such values.
If all ten states returned scores for all ten grades, that is. Look at the N values on the main chart; Grade 4 has 763,155 samples, while Grade 5 has 929,155. Is it more likely there was a massive demographic bump that boosted the birth rate by 160,000 across those ten states, or that some of them didn’t return test scores below Grade 5? This guess is confirmed by the chart on page two, which breaks down the results by state and grade. Wyoming has six gold squares, indicating that it returned only six grades’ worth of data.
However, averaging d-statistics across completely different distributions is pretty meaningless. To see why, note that girls and boys may well develop at different rates from Grade 2-Grade 11, so that overall test score distributions could vary substantially over time.
Hold up, I thought you were arguing for an overall sex difference? Now you’re shifting the goalposts, quietly switching your hypothesis to another one that argues for transitory sex differences during development. It doesn’t help that your own dataset is drawn from an even narrower age range (“the exact age for inclusion is 15 years and 3 months to 16 years and 2 months”), so by taking this tack you’ve also defanged your own citation.
In keeping with their general indifference to statistical significance, the authors provide no p-values to accompany their results (which is very poor form in any empirical study).
The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true; in this case, the null hypothesis is that there is no correlation between sex and math scores. Hyde’s hypothesis is that there is no correlation between sex and math scores.
Hyde’s hypothesis is the null hypothesis, thus it carries no p-value. And as I hinted at before, p-values are overrated.
Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude – not just, does a treatment affect people, but how much does it affect them.
-Gene V. Glass
The primary product of a research inquiry is one or more measures of effect size, not P values.
These statements about the importance of effect sizes were made by two of the most influential statistician-researchers of the past half-century. Yet many submissions to Journal of Graduate Medical Education omit mention of the effect size in quantitative studies while prominently displaying the P value.
Sullivan, Gail M., and Richard Feinn. “Using effect size – or why the P value is not enough.” Journal of Graduate Medical Education 4.3 (2012): 279-282.
But back to you:
As far as I can tell, the claim that the distributions of mathematics scores for boys and girls are identical represents a minority view.
And yet the very study you cite argues for gender similarities rather than differences. Don’t believe me? Ask yourself this: what are the effect sizes that paper found?
Stoet G, Geary DC (2013) “Sex Differences in Mathematics and Reading Achievement Are Inversely Related: Within- and Across-Nation Assessment of 10 Years of PISA Data.” PLoS ONE 8(3): e57988. doi: 10.1371/journal.pone.0057988
Don’t see them? That’s because the authors buried those numbers in a misleading chart, and hid the decryption key two paragraphs back:
For all analyses, we express sex differences in PISA score points. These scores are not “raw” scores, but result from a statistical analysis that normalizes student scores … such that the average student score of OECD countries is 500 points with a standard deviation of 100 points. The advantage of this is that scores become easily comparable and differences easily to interpret. For example, a 10 point difference between boys and girls reflects approximately 1/10th of a standard deviation.
There’s no indication of sample sizes or standard deviations for boys and girls separately, so we have to assume they’re roughly equal and approximate Cohen’s d. That makes the calculations easy here; just take the point difference and divide it by 100. So for the gender gap in median math performance, d is constant at roughly 0.1, and for median reading we get values that increase from 0.3 to 0.4, ish.
OK, so what do those numbers mean? Here’s a handy way to visualize them; adjust the slider to the effect size you want, and watch the distributions and numbers change. The percentage of overlap is the area where both distributions visually overlap, relative to all samples. My favorite number on that page is the “probability of superiority;” introduced by KO McGraw and SP Wong in 1992, it asks a simple question: if you picked a random thing from sample A, and a random thing from sample B, what are the odds that A would be “superior” to B? I invented a similar-but-slightly-different metric less than a year ago, “predictability:” if I plucked a random value from the total pool of samples, how accurately could you predict which distribution it came from? My metric tends to be a smidge smaller, but in the same ballpark; calculate it by halving the percentage of overlap, then subtracting the result from 100%.
Sorry, I got distracted. Point is: even a Cohen’s d of 0.3 isn’t much of a difference, and yet the overall gender differences in this paper struggle to hit that mark. For math, there’s a 92% overlap between boys and girls, a 53% probability of superiority, or a “predictability” of 52%. For the 2009 overall reading difference of 0.4, we find an overlap of 84%, a 61% probability of superiority, or a “predictability” of 58%. Not impressive.
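For the record, here's a sketch of the formulas behind those numbers, assuming two normal distributions with equal standard deviations. (The interactive page may use a slightly different overlap convention, so the quoted percentages can shift by a point or two; "predictability" is computed as best-guess accuracy, i.e. 100% minus half the overlap.)

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def overlap(d):
    """Overlapping coefficient: the shared area under two unit-SD
    normal curves whose means differ by d."""
    return 2 * phi(-abs(d) / 2)

def prob_superiority(d):
    """McGraw & Wong's common-language effect size: the chance a
    random draw from the higher group beats one from the lower."""
    return phi(abs(d) / sqrt(2))

def predictability(d):
    """Best achievable accuracy at naming the source distribution
    of a random pooled draw: 100% minus half the overlap."""
    return 1 - overlap(d) / 2

# PISA score points convert to Cohen's d by dividing by 100 (SD = 100).
for points in (10, 30, 40):
    d = points / 100
    print(d, overlap(d), prob_superiority(d), predictability(d))
```

Even at d = 0.4, the best possible guesser only identifies the source distribution about 58% of the time.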
But it gets worse, because those are overall differences. There’s very little genetic difference between human beings, so any variation we see in the numbers should be indicative of culture, not biology. So to get a true sense of who’s in charge, we need to consider the variation between countries as well. A look at the relevant chart shows a huge amount of variation. Math differences, translated back into Cohen’s d, range between 0.32 and −0.175; reading differences range between 0.7 and 0.05. That’s a spread of about 0.5 and 0.65, respectively: several times the overall mean math difference, and nearly twice the reading one. Culture not only has quite a bit more sway over scores than biology, it looks capable of obliterating any biological gap.
Except the picture is even worse than that. Which countries were sampled?
The number of countries contributing to the PISA data sets include both OECD and OECD-partner countries. The number of participating countries/regions (e.g., Hong Kong) has increased to 74 in 2009
“Organization for Economic Cooperation and Development.” Not only were the sampled countries skewed towards the richest on the planet, they were skewed towards countries that engage in substantial economic trade with one another… and this would tend to homogenize their cultures. But sitting behind the assertion “the central value represents biological tendencies” is the assumption “all our samples were taken from heterogeneous cultures, representative of humanity as a whole.” That plainly isn’t true, and worse still, the paper conveniently provides evidence both of this homogeneity, and that a more heterogeneous sample would show a smaller gender gap.
It’s all in this chart. I’ll let the paper speak for itself here:
The OECD countries not only have higher overall scores, their mathematics gap, favoring boys, is more tightly clustered between −5.5 and 17.5 points (M = 10.5, SD = 5.1). The two outliers are Iceland (HDI rank = 2, GGGI rank = 1) and Georgia (HDI rank = 61, GGGI rank = 40). In contrast, there is considerable variability in the non-OECD countries (between −15.0 and 30.0 points, M = 5.4, SD = 10.5), with boys having higher mathematics achievement in some of them (e.g., Costa Rica) and girls having higher mathematics achievement in others (e.g., Albania).
The included “non-OECD” countries still skew towards the rich and trade-happy, but nonetheless form a more culturally heterogeneous sample. And they demonstrate both a smaller gender gap and greater variation when separated out from the official OECD countries.
I could go on (students demonstrate a much greater gender gap than the general population, for instance, and I never discussed sample sizes), but this comment is getting a bit long. So, let’s cut it short with a summary of what your own citation demonstrates:
– It demonstrates the gender differences in math scores are negligible, and small-ish when looking at reading scores.
– It demonstrates cultural factors have a greater influence than any biological ones, and that they’re capable of wiping out any biological difference.
– It is drawn from a pool of rich countries that have some degree of cultural homogeneity, and demonstrates the central scores are likely contaminated by cultural overlap, further minimizing the influence of biology.
The authors were oblivious to all this, too, as they approached their dataset from the assumption of difference, rather than similarity.
@Hj Hornbeck wrote: “… I will note that while the general idea that men exhibit more variance than women has been around for a hundred years, the magnitude of the variance seems inversely proportional to the sample size…”
The idea that variance estimates could be inversely proportional to sample sizes doesn’t make much sense. After all, the standard variance estimator is unbiased. Of course, the standard errors of estimated variances certainly do depend on sample sizes, which in turn means that the statistical significance of the estimates depends on sample sizes as well. But any perceived sample size bias is bound to be spurious.
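The unbiasedness claim is easy to demonstrate by simulation (a sketch with made-up seed and sample sizes; Python's `statistics.variance` uses the n − 1 denominator):

```python
import random
import statistics

random.seed(42)

def mean_sample_variance(n, reps=20_000):
    """Average the unbiased sample variance over many samples of
    size n drawn from a unit-variance normal population."""
    total = 0.0
    for _ in range(reps):
        sample = [random.gauss(0, 1) for _ in range(n)]
        total += statistics.variance(sample)  # n-1 denominator
    return total / reps

# Both hover around the true population variance of 1.0:
# sample size affects the precision of the estimate, not its center.
print(mean_sample_variance(5))
print(mean_sample_variance(500, reps=2_000))
```

Small samples give noisier variance estimates, but not systematically larger or smaller ones.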
What could happen (but you’d have to provide evidence for this) is that more modern studies are progressively using larger datasets, while *population* variances are simultaneously decreasing (for exogenous reasons).
You wrote: “Boys tend to disproportionately drop out of school, for instance, which will artificially boost variability (as low-scorers will be encouraged to stay, while high-scorers desire to stay).” That’s hard to believe. Drop-outs are bound to affect the left tails of performance distributions disproportionately, as the poorest students opt out of a school education. Since truncating left tails should reduce variances, the effect should be the opposite of what you describe.
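The truncation point can also be checked numerically. A sketch with a made-up cutoff: drop everything below the 10th percentile of a unit-variance normal (z ≈ −1.28, roughly the weakest students opting out), and the variance of the survivors shrinks well below 1.

```python
import random
import statistics

random.seed(0)

# Simulate left-truncation: keep only draws above the 10th
# percentile (z ~ -1.2816) of a standard normal population.
population = [random.gauss(0, 1) for _ in range(100_000)]
survivors = [x for x in population if x > -1.2816]

print(statistics.pvariance(population))  # ~1.0
print(statistics.pvariance(survivors))   # ~0.71: truncation cuts variance
```

Removing the left tail pulls the distribution tighter around its (now higher) mean, so drop-outs should deflate male variance estimates, not inflate them.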
In any case, survivorship bias is not a big issue with the PISA data, since schooling is mandatory until the age of 15 in all participating countries.