[Advice] Replicating & Relationships

I have a couple of questions, any advice would be much appreciated. Firstly: how effective are perpetrator programs, if the abuser wants to change? Secondly, I have been repeatedly abused and my parents’ relationship was awful; how do I figure out what on earth a healthy relationship even looks like? I don’t want to accidentally be abusive. Are there courses on that?

[Mod note: due to the implied content of this letter, I’m primarily looking at between-partner abuse]

Zinvarta on DeviantArt

I have so many thoughts! Unfortunately I committed to answering advice with other questions! So I’ll sub-divide this question into answers to the objective questions and questions you might ask yourself.

How effective are perpetrator programs, if the abuser wants to change?

This is a surprisingly complicated question!

A note on instrumental vs. expressive abuse:
We distinguish between two kinds of abuse, and consider instrumental abuse more dangerous.
Expressive abuse is the outlet: it’s the way to burn off feelings. It’s the person who punches the wall, who slaps their partner in the heat of an argument, or who screams in their face. Expressive abuse is temporally close to the trigger. I usually think of it as naturally creating the cycle of violence.

Instrumental abuse is harm used to indicate power. I think about it as being much more manipulative. It’s the partner who shreds all of her boyfriend’s sports gear because he hung out with his friends to watch the game, or the man who hurts his partner’s pet and tells them it’s because they should know better than to make him mad.


Lundy Bancroft, author of the famous book Why Does He Do That? makes the strong claim that most abusive [men*] are not able to change. The Campbell Collaboration (functionally the Cochrane Collaboration, but for social science) agrees, at least when it comes to court-mandated domestic violence programs: there is no visible effect. [pdf]

But of course, this isn’t your question, really. People who end up in court-mandated treatment are (1) likely to have a very clear cut, almost exclusively physical violence related case and are (2) mandated, and thus very unlikely to be motivated to change.

Fortunately, the Campbell Collaboration also assessed Cognitive Behavioral Therapy for men who [physically] abuse female partners. (Apologies for the heteronormativity here, I’m having a lot of trouble finding anything else). They…were not optimistic [pdf]:

This review included six randomized controlled trials from the USA involving a total of 2,343 participants.

Four of the studies compare a group of men who receive cognitive behavioural therapy with a control group who receive no treatment but are released on parole, carrying out community service or under supervision. The other two studies compare cognitive behavioural therapy with other forms of treatment (process-psychodynamic group treatment and facilitation group). Following the course of treatment (a period of up to 26 weeks and a follow-up period of 1-2 years) the level of repeated violence is measured.

The studies fail to provide a clear picture of the effect of cognitive behavioural therapy on physically abusive men, as they point in different directions. The individual circumstances surrounding each study can determine how the therapy is carried out and thereby the effect of the therapy. However, on the basis of the information available, it is not possible to determine which variations are decisive. As the studies point in different directions, the idea that certain variations of the therapy may have both a positive and negative outcome cannot be ruled out.

The review includes studies where enrollment in the CBT program is voluntarily as well as those where enrollment is compulsory. The findings of the review do not, however, show any clear correlation between voluntary participation and a positive outcome of the treatment or compulsory participation and a negative outcome of the treatment.

A different, less careful, meta-analysis by Babcock, et al [pdf] finds a minimal effect from treatment, though again they pool voluntary and compulsory attendance and look at arrested populations of men. As explained by Stith et al, 2012:

While there is some question about what these small effect sizes actually mean for women who have been assaulted by an intimate partner, Babcock et al. (2004) note that, using the most conservative result, the treatment effect based on partner report in experimental studies (d = .09), treatment is responsible for approximately one tenth of standard deviation improvement in recidivism. In other words, a man who is arrested, sanctioned by the court, and treated has a 40% chance of remaining nonviolent versus a 35% chance of remaining nonviolent for a man who is arrested and sanctioned but not treated
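For anyone wondering how a five-point gap in percentages relates to "one tenth of a standard deviation," here is a rough sketch. It uses Cohen's h, a standard effect size for the difference between two proportions. This is my own illustration, and I'm assuming nothing about which conversion Babcock et al. actually used; the function name is made up for the example. The two percentages come straight from the quote.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: the difference between two proportions on the arcsine scale."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


treated = 0.40    # chance of remaining nonviolent with treatment (from the quote)
untreated = 0.35  # chance of remaining nonviolent without treatment (from the quote)

h = cohens_h(treated, untreated)
print(f"Cohen's h = {h:.2f}")  # prints "Cohen's h = 0.10"
```

So, on this particular scale, 40% versus 35% really is about a tenth of a standard deviation: a real difference, but a small one.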

Stith et al (2012), which currently doesn’t seem to be available outside university libraries, also review a large number of other programs (all with very few studies) and suggest that sometimes they work. However, the uniting variables in those seemed to be that the couples were planning to stay together and that abuse did not necessarily disappear, but significantly decreased.

In conclusion…I’m not sure if I’m answering your question, or if the research does. When people are arrested for abuse, they’re easy to measure, and for those people, treatment has a minimal impact, if any. But those are people who physically abused someone else (usually a partner). And some of them (just over a third) seem to stop spontaneously.

But I think the thing, LW, the thing is that even if someone can become a better person, you are not obligated to remain involved while they do. I can’t promise that the perpetrator you’re asking about will or won’t change, but I can promise that you don’t have to find out.


LW, when you ask about forming relationships in the context of past abuse and seeing dysfunctional models of relationships, I think about the Seeking Safety program, which is evidence-supported.

Seeking Safety consists of 25 topics that can be conducted in as many sessions as time allows, and in any order. Examples of topics are Safety, Asking for Help, Setting Boundaries in Relationships, Healthy Relationships, Community Resources, Compassion, Creating Meaning, Discovery, Recovery Thinking, Taking Good Care of Yourself, Commitment, Coping with Triggers, Self-Nurturing, Red and Green Flags, and Life Choices.

There are frequently groups that work through the Seeking Safety model, and you’re likely to find one if you live near a large city. Alternatively, there’s a well-liked Seeking Safety book.

And finally, the questions:

How comfortable are you at saying no? Have you practiced this?
Saying ‘no’ to people is a skill! If you have someone you can trust, you could set up this as a thing you practice with them; where you plan to decline their suggested plans or idea of where to eat, etc.

Dialectical Behavioral Therapy has some things to say about this. [Example here]

Who’s on your team?
Who are the people outside your head that you could ask about your relationship? You don’t necessarily have to do what they say, or agree, but if you weren’t sure if something that occurred in a relationship was safe, who would you ask? One of them might be a therapist, but they could be friends or mentors, etc. Most cities have a hotline for intimate partner violence/domestic violence. You could call them to ask if you weren’t comfortable talking to someone else.

What are your tripwires?
What things would tell you ‘this is abuse’? It’s perfectly fine if you don’t have an answer for this yet! You might want to read over other people’s thoughts about what their red flags are. You might enjoy reading about green flags (note the difference between “good signs” and “requirements”) and working backwards.

What are things that make you uncomfortable but aren’t necessarily ‘abuse’?
You make it sound as though you have been around some unhealthy relationships. Sometimes, this causes people to have triggers that aren’t necessarily signals of abuse, but are things they cannot tolerate. Some people are fine having conversations where voices are raised and shouting ensues; some feel like crawling into a corner with their hands over their ears at the slightest raised voice.

It’s useful to know what these are. You might want to change them or you might want to just avoid relationships that include these things.

Do you have a fuck off fund?
This is a more fun way to say a savings account, but I’m serious about it. People with all sorts of protective factors end up in bad relationships, or feeling trapped in not-great-but-not-awful relationships. It happens. Do you have the savings to get yourself out?


*Bancroft talks almost exclusively about men as abusers, one of the major failings of the book.

[1] Lynette Feder, David B. Wilson: Court-mandated interventions for individuals convicted of domestic violence. Campbell Collaboration, 2008


Lord, Ross & Lepper?

This blog has been redesigned! You might have noticed the new tagline.

In 1979, Charles Lord, Lee Ross, and Mark Lepper assigned 48 undergraduates to two groups, based on their beliefs about the death penalty: did it work as a deterrent?

The students sat down with a researcher (blinded to their initial beliefs about the death penalty) and were asked to select an index card containing information about a research study that investigated whether or not capital punishment results in an overall decrease in violent crime.

It’s kind of a sport to find the trick researchers have played on their unwitting sophomore psychology majors, and this one was it. Each round of index cards contained ten identical cards, such that the student had no real choice. They either were drawing from an identical hand of ten cards that summarized a study that supported capital punishment as a deterrent (pro-deterrence) or an identical hand that summarized a study that did not support death penalties as deterrents (anti-deterrence). That is, within each group, some of the students began by seeing a study that purported to agree with them, and some of them saw a study that disagreed. [These methods get hairy—there’s a chart below]

The student read the index card, then was given even more information:

The descriptions gave details of the researchers’ procedure, reiterated the results, mentioned several prominent criticisms of the study “in the literature,” listed the authors’ rebuttals of some of the criticisms and depicted the data in table form and graphically.

Given all the information, researchers asked the students to analyze the research: how methodologically sound was the study? How convincing did they find it? For each question, participants answered on a scale from -8 (completely unsound methods or completely unconvincing) to 8 (sound methods, or very convincing).

Then they repeated the whole procedure again, starting with drawing an index card with a study showing the opposite, getting more information, and then analyzing the research.

See the whole process below:

Simplification of Lord, Lepper, & Ross (1979) study design.

So, some students who are Pro-Deterrence saw a study that disagreed, then a study that agreed, rating each. Some students who are Pro-Deterrence saw the opposite: first, a study agreeing with them, then disagreeing. Anti-Deterrence students were divided in the same way. Oh, and here’s an important bit: no matter what group, all students saw the same pro-deterrence study and the same anti-deterrence study. They were all asked to assess the methodology and persuasion of the same two studies, one of which agreed with their beliefs, and one of which did not.
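If the design is easier to follow as a list than as a paragraph, here is a small sketch that enumerates the four groups described above: two starting beliefs crossed with two reading orders. The labels and the function name are mine, not the paper's, and this only lays out the structure; it says nothing about how many students ended up in each cell.

```python
from itertools import product

# Labels are my own shorthand, not the paper's.
BELIEFS = ["pro-deterrence", "anti-deterrence"]  # the student's starting belief
ORDERS = [("pro", "anti"), ("anti", "pro")]      # which study is read first, then second


def build_design():
    """Return the four cells: every belief group crossed with every reading order."""
    cells = []
    for belief, (first, second) in product(BELIEFS, ORDERS):
        side = belief.split("-")[0]              # "pro" or "anti"
        cells.append({
            "belief": belief,
            "first_study": first,
            "second_study": second,
            "first_study_agrees": first == side,
        })
    return cells


for cell in build_design():
    print(cell)
```

Every cell contains both studies, so each student rates one study that agrees with them and one that doesn't. The only things that vary are the starting belief and the order.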

Did they converge on the truth? Did they find the same kinds of methodological holes in each study?


The results?

A single, main effect of initial belief on assessment of research, at p<.001.

That is, the participants discussed methodological holes and lack of persuasion only when the study disagreed with them. The research has since been replicated. And replicated. And replicated.

[Image: screenshot of participant comments]

As these comments make clear, the same study can elicit entirely opposite evaluations from people who hold different initial beliefs about a complex social issue.

This blog aims to do better. Sometimes it succeeds.

You can read the entire study here, and about attitude polarization generally here.

Looking at Cute Stuff on the Internet: A Modest Defense

More than a year ago, I made a tumblr for my boyfriend. It updates five to ten times a day with cute animals: mostly dogs (ahem, puppies), some kittens and rats (my weakness), and the occasional misfiled social justice post from my personal tumblr.

I’m spending hours on r/awww for research, I swear!

I will not pretend that I made this because I was trying to increase his productivity, butbutbut if I was trying to claim to be The Most Evidence Based with this gift…well, the science has something to say about that.

Six years ago, Sherman, Haidt (yes, that Haidt), and Coan did a study on infantile physical morphology (which maps onto perceived cuteness). Studies about the relationship between exposure to infantile physical morphology and fine-motor dexterity sound all stuffy and serious until you look at their research materials: namely, some pictures of baby animals and the game Operation. [Related: LOLmythesis]

To their credit, Sherman, Haidt, and Coan’s hypothesis does make sense: that when we see creatures with infant-like features, we can do fine-motor tasks better. Or to put it more simply: when we see things that look like babies, we can be more gentle with fragile stuff. For instance, babies.

To test this hypothesis, they ran two experiments.

In the first, participants (all women) were presented with cute and less cute animals. While previous research had used sketches of animals (here, we scoff: high-def fluff, please), Sherman, Haidt, and Coan used photos of puppies and kittens (the cute condition) and adult dogs and cats (the less-cute condition).

How to determine cuteness, you ask?

You mean besides just looking at their teensy faces and floppy ears? (Yes, that’s exactly what you mean.)

A panel of 17 rated the animals on dimensions of interestingness and cuteness. In photos of puppies/dogs and kittens/cats, high-cuteness correlated with high-interestingness. More on this later. All of the participants in both cute and not-cute conditions were given a similar slideshow, beginning with pictures of house interiors, followed by cute or less-cute animals, followed by more house interiors.

Then, everyone played (solo) games of Operation. Participants in the cuter condition performed better, with a moderate effect size, but I wasn’t quite convinced. Remember the cute-interesting correlation? What if better performance on Operation (if this replicates) is because things that pique your interest cause you to be more attentive and careful? And we have a single gender group in this experiment; even assuming the hypothesis about cute causing more carefulness is correct, what if it’s only true of women?

…And there we have Experiment 2.
This time, a mixed-gender group, with both conditions equalized in terms of interestingness, excitingness, and enjoyableness. This necessitated adding in lions and tigers to the mix of adult animals, but puppies and kittens were the cuteness condition. And…again, the cuteness condition did far better on the dexterity task, though the effect did decrease slightly.

But that’s not even the end of the research! Nittono, Fukushima, Yano, and Moriya are the authors of an available, delightfully-named paper called The Power of Kawaii. Their three-part research set replicates the Sherman, Haidt, and Coan Experiment 2 (finding a near-identical effect size), as well as examining the relationship between looking at baby animals and performance on a visual search task (i.e. Can you find as many X as possible in this sea of Y?) and on a global-local task. (Example here)

The results are mixed. Cute animals have a small positive effect on skill in visual searching. However, cute animals didn’t help with global tasks, potentially because they’re only improving narrow focus and carefulness. More from Nittono, et al:

While Sherman and Haidt proposed that cuteness cues motivate social engagement, the current findings show that the effect of cuteness goes beyond the tasks that suggest social interaction. This study does not deny the view that cuteness is related to embodied cognition and sociality motivation. Rather, this study provides further evidence that perceiving cuteness exerts immediate effects on cognition and behavior in a wider context than that related to caregiving or social interaction.

And that, friends, is why you should waste, er, spend more time looking at cute things. Go on, improve your productivity! Your fine-motor dexterity! Your attentional focus!


Related: Cuteness and Disgust: The Humanizing and Dehumanizing Effects of Emotion

Lesbians and the Parable of the Late Client

In social work classes (and also in my undergraduate clinical psychology/counseling psychology classes) there is the Parable of The Late Client.* It looks something like this:

There’s a therapist whose client is always late. Not five minutes late, but fifteen or twenty minutes late to the majority of sessions. The therapist finds this disrespectful and resistant, and in the grand tradition of Freud, decides to address with the client what this means about how the client views the therapist-client relationship. What is the client’s relationship to authority? What things do they believe about the therapeutic process? Does the client need to push away everyone who tries to help her? What does that mean? So the next time the client walks in late, the therapist brings it up.


…the bus is late. The client cannot afford to travel except by public transportation, and the bus is often late. She could catch the much earlier bus, but that would mean leaving work early each day she had therapy, and she can’t afford that either.

The first point of the parable is to pay attention to environmental factors** and to be extraordinarily careful in attributing fault to the client. Client appears to be ‘resistant’ to medication? Can they afford them? Client is late—do they have control over their ability to arrive?

The second point (at least, in the tellings I’ve heard) is about making sure to look around at what’s leading to the client’s concern…and what risks and strengths could shape the future of the client’s life. So, I spend a lot of time looking at case studies. And it seems like there are two kinds of risk factors: the Intrinsically Risky and the Societally Risky.

Take lead.

Lead exposure is bad. It’s near impossible to bend and twist ‘has lead exposure’ into anything but a risk factor for future health. Exposure to toxic metals is not good for you, and will probably measurably impact your life. Also bad: domestic violence, iodine shortages, not being able to afford food, really, poverty of all kinds, and a bunch of other things. These are not putting you at risk purely because society thinks they’re bad and then treats you poorly; they are things that will harm you no matter how people feel about you experiencing them.

Okay, but take being a lesbian.***

In theory, being a cis lesbian should be….probably a protective factor. Lesbians are less likely to contract HIV/AIDS, significantly less likely to have an unintentional pregnancy, less likely to be killed by a partner.

But right now, lesbians are more likely to be suicidal, more likely to self-harm, and more likely to experience a lack of social support than their heterosexual counterparts, and all of that makes the reaction to ‘in case study, client is a lesbian’ be ‘risk factor.’

Just taking a wild, speculative, swing at things, I’m going to guess this is not a feature of ‘wanting to sleep with women’ and possibly more a feature of discrimination, lack of acceptance, and ambiguous attribution issues. (In fact, this study, linked to previously, suggests that lesbian and bisexual women had slightly lower levels of depressive symptomatology than their heterosexual counterparts.)

And I’m sure there are other qualities that are like that: risks only because we make them out to be. I’d love to arrive at a place where orientation isn’t one of those, and it feels likely that that will happen within my lifetime. And then we’ll move on to the next Societally Risky, but not Intrinsically Risky, feature.

*I don’t know if it’s been named as such, but it’s appeared in a variety of textbooks and classes across more than one university.
**The social work version of this tells it as “psychotherapists don’t pay attention to environmental factors, but we do.”
*** For the purposes of this discussion, we’re talking cis-lesbians. I know this is frustrating, but the research only looks at cis lesbians, and paragraphs got unwieldy when I did anything else.

Gratitude, In Research

“Gratitude is the positive emotion one feels when another person has intentionally given, or attempted to give, one something of value.” (McCullough, Kilpatrick, Emmons, & Larson, 2001)

It drives prosocial behavior—actions that increase the net well-being of a group—even more than just being in a good mood does (Bartlett & DeSteno, 2006). It also occasionally prompts research that offers this analysis:

As predicted, a planned contrast revealed that participants in the gratitude condition felt more grateful (M = 3.08, SD = 1.08) than did those in the amusement (M = 2.72, SD = 1.09) and neutral (M = 2.52, SD = 0.84) conditions, F(1, 102) = 4.54, p_rep = .88, d = 0.52. Similarly, those in the amusement condition felt more amused (M = 3.58, SD = 1.20) than did those in the gratitude (M = 2.52, SD = 0.99) and neutral (M = 2.40, SD = 1.14) conditions.

But it’s driven by intention; we’ll be more grateful to those who intended to help us than those who accidentally improved our lives. Here, intent is fucking magic. ….Or at least, it is for the 126 men who predicted that they’d be more grateful to an intentional benefactor after reading short stories about being helped. Sometimes we settle for approximations of answers, as produced by psychology students. (Tesser, Gatewood, Driver, 1968)

And gratitude is protective; a thankful disposition is negatively correlated with depression: the more grateful you are, the less likely you are to exhibit subclinical or clinical levels of depression. But gratitude exhibits odd relationships: it shows no relation to level of anxiety. Recognizing good in your life might make you anxious for the future, or comfortable that you’ve achieved The Good Life. (Watkins, et al., 2003)

Some of us signal gratitude by reciprocating when someone has been thoughtful:

Although much of the early empirical work focused on gratitude as a mechanism for exchanging costly benefits (one could call this an economic perspective), recent evidence suggests that gratitude often serves a broader social function, namely, promoting relationships with responsive others. We recently demonstrated that the two most robust predictors of gratitude were the perception that the benefactor was being responsive to the needs and wishes of the recipient (i.e., thoughtful), and liking the benefit. (Algoe & Haidt, 2009)

And some of us express our gratitude by writing blog posts about the research of our emotions on Thanksgiving.

So today, I am grateful for Jesse and Miri and Robby and Chana and Mitch. For whimsy and foot massages and open access journals. For libraries (and books I can download for free) and hot water and candles. For late nights lost in good conversation and quiet early mornings. For watching extraordinary people accomplish extraordinary things. For curiosities and joyous occasions, and the people who will share theirs with me.

About Face (Validity)

So say you’re looking to measure depression. It’s very important that the people in your research are depressed, rather than anxious or sad or grieving, but you don’t have time or money to pay a psychiatrist to spend hours interviewing each person who requests participation in your study.

Being an enterprising sort, you decide to create a scale—a questionnaire that you can distribute to everyone who responds to your generic Participate In Interesting Research About Stuff flier. Participants who score in the Depressed Zone will then get an interview with a psychiatrist, thus decreasing the total number of hours of her time that you pay for. You understand Likert scales and careful item selection, and you run a few pilot tests, and in the end, you have this:

[Image: screenshot of the questionnaire]

We now take a brief detour to explain reverse scoring. Some of the items (psychspeak for each question/statement) would be scored backwards. Answering ‘Strongly Disagree’ to items #1 and #2 would be in direct conflict with strong disagreement with item #3. So to score this test, we don’t just count up the number of answers in each category; we reverse the coding system for the negatively worded items. People who agree that they are equal to others, have a number of good qualities, and disagree that they’re a failure will all go in the Probably Not Depressed basket. People who don’t agree with the first two statements and agree with the third go in the Probably Depressed, Seek Help basket.

This is a common technique to force the participant to read each question, and it gives additional information to researchers. In a questionnaire without reverse coding, someone who has Strongly Agreed with every statement could have actually agreed strongly with each component (suggesting they’re severely depressed). But they could also be one of those jerks who just answered every question the same. Reverse-coding controls for jerks.
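For the code-inclined, here is a minimal sketch of what reverse scoring looks like in practice. I'm assuming a 4-point scale (1 = Strongly Disagree through 4 = Strongly Agree) and only the three items discussed above, with item #3 as the reversed one. The constant and function names are made up for illustration, and higher totals here mean "feels better about themselves."

```python
from typing import Dict

SCALE_MAX = 4         # assumed: 1 = Strongly Disagree ... 4 = Strongly Agree
REVERSED_ITEMS = {3}  # item #3 (the "failure" item) is negatively worded


def score(responses: Dict[int, int]) -> int:
    """Sum the responses, flipping the reversed items first (1<->4, 2<->3)."""
    total = 0
    for item, answer in responses.items():
        if item in REVERSED_ITEMS:
            answer = SCALE_MAX + 1 - answer
        total += answer
    return total


def is_straight_liner(responses: Dict[int, int]) -> bool:
    """Flag participants who gave the identical raw answer to every item."""
    return len(set(responses.values())) == 1


consistent_high = {1: 4, 2: 4, 3: 1}  # agrees with #1 and #2, disagrees with #3
all_agree = {1: 4, 2: 4, 3: 4}        # Strongly Agree to everything

print(score(consistent_high), is_straight_liner(consistent_high))  # 12 False
print(score(all_agree), is_straight_liner(all_agree))              # 9 True
```

Notice that the person who Strongly Agreed with everything lands in the middle of the range rather than at an extreme, and their identical raw answers are easy to flag. That's the jerk control at work.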

But enough about this picky detail of psychometric design. You have a measure for depression!

Except, this isn’t a measure of depression. It’s a measure of self esteem. Rosenberg’s Self Esteem Scale, in fact.

Which brings me to the other picky detail of psychometric design I want to talk about: face validity. I’ve usually heard face validity  explained as the answer to this question:

On the face of it, does this scale measure the actual thing we’re trying to measure?

Having low self esteem does correlate with depression, sure. We might even go as far as to say that ‘has lack of self esteem’ is often a component of being depressed. But measuring self-esteem is not the same as measuring depression. Okay, but mental health is fairly fuzzy in terms of definitions. Let’s get more concrete.

Having low socioeconomic status (SES) tracks very closely with poor diet. If you find a group of people with very low income, they’re almost definitely going to have poor nutrition. But, if you spend ten minutes asking adults about their monthly income, you have not collected data on their nutritional intake. You’ve got priors and you can speculate with some amount of surety, but the thing you have measured is still monetary. Additionally, if you write a paper about the relationship between income and method of transportation, and your data for income sounds like “eats more than two fruits per day,” the reviewers will giggle and write sarcastic notes when they return your study.

Returning to Rosenberg’s Self Esteem Scale, what we have is something that seems to be face invalid. If you have some amount of psychopathology (psychspeak for mental illness) training and you’re asked if it seems like the scale above is measuring depression, you’d probably say no, or not quite.

And of course, you could do some amount of empirical testing of the scale too. You could see if it correlated with other measures of depression: with Beck’s Depression Inventory, or the HAM-D. You could see if it was uncorrelated with things that aren’t depression. You don’t want your measure of depression to be correlated with being sad (a brief state, where depression is more like a trait). But in the end, it’s possible that both of these could be true, and you’d still have a measure that answered in terms of self-esteem, rather than depressiveness. That’s where face validity comes in. Is it all those things: uncorrelated with unrelated concepts, correlated with related concepts and measures, and does it sound like it’ll measure the thing we want?

And…I promised myself I would make this post more than just geeking out about psychometrics, so here we are. I think the issue of face validity is what a bunch of non-specialist arguments over psychology boil down to. Some people look at an IQ test and say, no, that’s not measuring intelligence, intelligence clearly includes all these other components! To which pro-IQ test people fold their arms and glare back with no, that stuff is outside the concept-space of intelligence, this is just measuring intelligence. Or, “that’s not measuring for ADHD, it’ll catch any hyper kid!” vs. “This is about ADHD, those kids you think are ‘just hyper’ have pathological attentional issues!” And at some further point, everyone’s stomping around arguing about which is the map and which is the territory and I start to have sympathy for Szasz.

Related: Streetlight Psychology

Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.

The Goldberg Paradigm and Elegant Gender Research

[CW: discussion of gender research and reactions therein. More research analysis than social justice.]

Recently, I’ve been having a number of conversations about methodology in psychology. I’ve been mentioning that psychology of gender research has tended to have excellent methodology, relative to the baseline. In response, I’ve gotten shocked laughter, nervous giggles, and utter confusion. I expect that some of this comes from researcher fears about one’s research being misinterpreted for great big arguments on the internet, defenses of sexism, etc.

For instance, imagine this entirely fictional scenario in which you take a careful look at Norwegian households and chores. (pdf, all in Norwegian until p. 223) You notice that the households with more parity in housework sharing are also households that break up more frequently. You write two hundred and twenty nine pages,* carefully thinking through causes for the results, including statements like

“Untraditional couples, where he does the most of the housework, may hold a less traditional or more modern view about marriage, whereby marital dissatisfaction more easily leads to marital break-up. If so, the division of housework is no “cause” of later divorce”

and after all of that work, you get deep media analysis like:

“In what appears to be a slap in the face for gender equality, the report found the divorce rate among couples who shared housework equally was around 50 per cent higher than among those where the woman did most of the work.” [Telegraph]

Ladies, you may want to think twice before asking your husband to help out around the house. [thanks, HuffPo]

And then there was the pushback: who is this guy? There could be so many other mediating variables! (The pushback was impressively thoughtful, but I still regularly hear that research says women should do housework/chore-sharing will ruin your relationship). In short, people might not have strong and divisive opinions about the truth of research investigating how people pee in public restrooms. Said research is unlikely to be a question they’ve wrestled with, a formative part of their identity, or purporting to answer a question of how one should handle a highly-valued relationship, in the way gender relations is.

But, write a study showing that women are slightly better at a female-associated skill, that men are better at a male-associated skill, that there’s no difference in gender performance on a specific-gender-associated skill, that traditionally structured relationships succeed in specific ways, that non-traditionally structured relationships succeed in specific ways….do any of those things, and people will have Opinions and other people will have Disagreements, all of them asserting that they’re experts. I think this knowledge, that your research will be endlessly misinterpreted and picked apart in the public sphere, keeps psych of gender researchers constructing careful, airtight designs and cautious discussion sections.

So, let’s talk about some really fantastic gender research: the Goldberg Paradigm.

The Goldberg paradigm is elegantly, beautifully simple. It’s the sort of experimental design that throws me into rapturous lectures and hand-waving.

Picture this: you create one written profile of a potential job candidate. However, you assign it two different names: one male, one female. Then, armed with your two different job candidates, you hand the profiles to a number of department chairs, and see what happens. In fact, what happens (and has prompted over a hundred replications since) is that the female-named resumes get fewer responses. Those candidates are less likely to be called, less likely to be viewed as qualified, less likely to be offered mentorship. This effect appears whether the potential employer is a man or a woman.

The research done by Goldberg originally took place more than thirty years ago, but a more recent test of the paradigm showed that it continues to replicate. Variations are also used to test religious bias and prejudice against ‘non-American’ names.

So what makes the Goldberg Paradigm so cleanly beautiful? In short, it handles lots of tricky confounds. In many aspects of workplace treatment, disentangling other factors, like communication style and the halo effect, makes for an impossible game. Is it that employers favor men, or that some other, simpler psychology rewards people who push hard for raises, and men are more likely to push for raises? The Goldberg-based resumes create two identical career histories, remove any appearance or behavioral variables, and see what happens.

After taking a minute to think about variables other than implicit gender bias that could result in this sort of robust effect, here are the only ones I came up with:

Names: there’s some name research suggesting that certain names are associated with certain skills. If the male name on the Goldberg resume is more congruent with the job in question than the female name, eh... maybe? I’m skeptical of this research already, and have a hard time imagining that this effect, if it even exists in the wild, could be contributing to every single replication that found the effect.

Job-Gender congruency: For the most part, these studies used jobs in male-dominated fields: lab manager, researcher. If you were to do this with a nursing job, I would expect the results to reverse or disappear. (However, I want to point out more gender research, somewhat old, suggesting that men doing ‘women’s’ jobs and women doing ‘men’s’ jobs are treated differently, and are far more likely to be rewarded for ‘token’ status.)

That’s... not so bad, as far as confounds in psychology go. And sure, some of that has to do with the subject at hand. “Does gender impact the interpretation of concrete details and qualifications on resumes?” is a bit of an easier game than “How does gender impact subtle social status signaling in conversations in the workplace?” It’s just harder to get at the latter. The former can give us some sense of the landscape, without building so much on tenuous connections and oceans of confounds.

And since I started by talking about poor interpretations of psych of gender research, let’s talk about what happens here, and what to make of the results. They’re sometimes used to argue that “being a man is equivalent to X years of experience”. Which... is more rhetorical than purely correct, because of course it’s not that easy to quantify. I think the cost indicated by the resume test is that of attributional ambiguity.

Imagine this: you’re half the Goldberg paradigm come to life. You, a woman, submit your application materials to a lab where you’d like to work as a research assistant. Several days later, you hear that a candidate has been selected: a man. This is the fifth time you’ve submitted materials, and you feel just as qualified as the other applicants. On one hand, it might be that you’re unlucky. Or there’s something about your resume that’s off-putting: a typo, or just a failure to properly advertise yourself. It might have been that each of those times you applied, you were competing against an extraordinary candidate, who was hired each time.

Or, it might be that you’re a woman, and accidentally on the receiving end of some unconscious bias. You have no idea, and what’s more, you have no way to find out. You might be able to change some aspect of the application, or get more experience before applying to that sort of job again, but you also don’t know if that’s what works. You might be wasting extra work on nothing. And that, perhaps, is the most damning thing the Goldberg paradigm points at.

*This also puts me in mind of the Rind et al. controversy.

Goldberg, P. (1968). Are women prejudiced against women? Transaction, 5, 316-322.
Fidell, L.S. (1975). Empirical verification of sex discrimination in hiring practices in psychology. In R.K. Unger & F.L. Denmark (Eds.), Woman: Dependent or independent variable? (pp. 774-782). New York: Psychological Dimensions.
Rudman, L.A., & Glick, P. (2010). The social psychology of gender: How power and intimacy shape gender relations. New York: The Guilford Press.