The Static-99 and Storytelling

[C.N. Rape, sexual assault, sex offenders, and almost nothing but that.]

The Static-99 is a ten-question test intended to predict whether a sex offender will reoffend. I’m basically on board with the idea of trying to predict this better than we can from guessing or intuition. I did not, however, have high hopes for this Buzzfeed article on the Static-99. I opened the article expecting one of those anti-stats, people-can-do-it-better-than-any-test sorts of articles. Not so!

The Buzzfeed article tells a good story (man expects to get out of prison, is committed to a psychiatric institution based on a predictive test, with the implication that the test is a betrayal of justice), but I wanted more information:

The Static-99 helps decide which offenders are the riskiest, and looms large over civil commitment proceedings. It weighs a variety of facts about a sex offender’s past in order to predict the likelihood of future offenses.

More precisely, what the Static-99 predicts — with modest accuracy, at best — is the risk that men within a group of sex offenders will commit a new sex offense, compared to other members of that group. Experts agree that it’s a useful tool for managing sex offenders in prison — assessing which of them need higher levels of security, for example. But the way the test is used in civil commitment — to help make high-stakes decisions about offenders’ liberty after they have served their criminal sentences — is highly controversial. [emphasis mine]

You can’t just say it’s modest accuracy! What does that mean? Forgive me, Buzzfeed, but I cannot assume you are writing a statistically literate article. Cat pictures? Nobody compares. Statistical analysis? I’m gonna need some convincing. And—

The value of keeping monsters like Shriner locked away is clear. But few sex offenders are as obviously dangerous as he is. In general, rates of reconviction are low: Only about 5% of sex offenders are convicted of a new sex crime within five years of release.

Right, okay. But it’s also true that in general, it’s hard to prosecute rape and sexual assault. My city pursues criminal charges on roughly 1% of the sexual assault evidence collection kits which are filed. (That would be cases wherein the victim of sexual assault appears at a hospital within 1 or 5 days* of the assault, which is only a small subset of instances of rape.) Buzzfeed, you have so many articles on how hard it is to pursue conviction on rape cases. (Those articles are from a fast skim of the last four months.) Low re-conviction rates for sex offenders aren’t strong evidence that they’re safe! They might be, but this is not the evidence that will make me believe it. Low re-conviction is the expected outcome where it’s rare to convict at all.
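To make that last point concrete, here is a minimal sketch under loudly stated assumptions. It treats "reconvicted" as "reoffended and was convicted for it," with at most one new offense per person and one fixed conviction probability. The function name and every number other than the quoted 5% are hypothetical; none of this comes from the article or from real data.

```python
def observed_reconviction_rate(true_reoffense_rate, p_conviction_given_offense):
    """Share of released offenders who would show up as reconvicted,
    assuming (hypothetically) at most one new offense per person and a
    fixed chance that an offense ends in a conviction."""
    return true_reoffense_rate * p_conviction_given_offense


# Hypothetical scenarios with very different true reoffense rates...
scenarios = {
    "10% reoffend, 50% of offenses end in conviction": (0.10, 0.50),
    "25% reoffend, 20% of offenses end in conviction": (0.25, 0.20),
    "50% reoffend, 10% of offenses end in conviction": (0.50, 0.10),
}

# ...that all produce the same observed reconviction rate.
for label, (true_rate, p_convict) in scenarios.items():
    rate = observed_reconviction_rate(true_rate, p_convict)
    print(f"{label}: observed reconviction {rate:.0%}")
```

Under this toy model, a 5% reconviction figure on its own can't tell these scenarios apart, which is why I'd want to know the conviction rate before reading it as reassurance.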

But all of this is not to advocate for the Static-99. When I hear that the current test works like this:

Static-99 scores do not predict the severity of potential future offenses, however. Rapes involving extreme violence and the abuse of young children are lumped together with crimes like voyeurism and indecent exposure.

…I end up a little nervous and a little suspicious.

My partner, Jesse, talks about the doing vs. trying distinction. If there’s an action which, when done perfectly, would improve the situation, it is important that in trying to implement it, you’re not screwing everything up. For instance: given the small number of people who can enforce the laws, it might make sense to use data to determine who is more likely to commit a crime, and to use those determinations. Except that it turns out that people are racist and have a lot of biases that are near-impossible to avoid! In attempting to implement a much better system, you could end up creating a new, harder to spot mess.

I support the idea of the Static-99-like test. My impression is that right now, parole hearings and probation decisions could be improved if we used a coin-toss method, and a test that had predictive power about future reoffenses and the severity of said offenses would be excellent. I’m wary of civil commitment, and especially wary of overconfidence given by feeling as though you’ve ‘proved’ that someone is dangerous.

Buzzfeed is telling a story, or a series of them, and I’ll admit, they’re stories that give me a visceral reaction. But the conclusion I’m drawing, and that I hope readers are drawing, is that this might be a failure of stats and of trying.

A Fun or Maybe Terrifying Fact:

“In 1983, the United States Supreme Court decided the landmark case of Barefoot v. Estelle. In this case, the court decided behavioral scientists could competently predict that a specific criminal would engage in future violent behavior “with an acceptable degree of reliability.” What was that acceptable degree of reliability in 1983? About 33%.”



*In things which are shitty for male rape survivors, cis men will (usually) not be able to give physical evidence beyond 24 hours after a rape has occurred. If you are a cis man reading this, I would still suggest going in to a hospital for a kit because exceptions can occur.

Just Like Us

[Related: Empathy Isn’t Everything]

I have such mixed feelings about the “the homeless are just like us!” or “I am homeless but I have a law degree and speak 4 languages” (i.e., this page) campaigns.

On the one hand, “people I normally ignore and resent are human like me*” is a fairly effective strategy. Expecting that the homeless guy you walk by has characteristics besides a ripe smell and modeling them as being like you (assigning them theory of mind) is probably going to mean you treat them better. You might be more inclined to think about them as a population in your city, consider them when forming opinions about public structures and funding, etc. If you see yourself, or people you value, as being able to become homeless, you might be more concerned about social issues around homelessness.

But…the majority of people who are homeless are not like the picture at the link. They are not necessarily charming, and they don’t necessarily show obvious high-status markers when you first meet them. And they still are homeless. I don’t think the system is suddenly fixed just because people who speak four languages are suddenly in houses. I anticipate that if people expect that behind everyone without a home is a story of brilliance or status with a plot arc that Hollywood would envy, they’ll be disappointed and uninterested when they find people who don’t have that. I worry that these campaigns are setting up a dichotomy between people who “deserve” to be homeless and people who are obviously homeless only because of a weird Princess Diaries style fluke.

A case I once worked on required filling out a lot of forms about smearing feces and urine with very little other known information. Whether or not that person also had a modelling career, they were going to be homeless and I would prefer they weren’t.

*see also: “women are our wives and sisters”

‘Foods Containing DNA’

There’s been a lot of hullaballoo about a GMO study recently. Specifically, this bit:

A recent survey by the Oklahoma State University Department of Agricultural Economics finds that over 80 percent of Americans support “mandatory labels on foods containing DNA,” about the same number as support mandatory labeling of GMO foods “produced with genetic engineering.”

And it seems like everyone on my Facebook feed has paused to giggle about those stupid people who are so invested in how important labeling GMOs is, while being scientifically illiterate. So, in the interest of being focused on the less-fun details of the study, I leave room for that here.

[Image: screenshot of the survey graphic]



I pulled the details of the survey here.

Every month (!!) the FooDS survey opens and closes, asking people about what makes them pick certain foods (between December and January, the value of taste and price decreased). There are also ad hoc questions, which is where we get snarky.

The question was “Do you support or oppose the following government policies?” 

There’s one framing, where you see that 82% of the 1,000 participants said yes, they did support labeling when GMOs were present. And 80% of respondents then went on to the next question and said that yes, they supported labeling products with DNA present.

And sure, it’s fun to giggle at, and wow, does it make the labeling lobby look silly. But in that graphic that everyone’s passing around, let’s look at the other components of the question.

Do you support or oppose the following government policies?

A tax on sugared sodas.
A ban on the sale of marijuana
A ban on the sale of food products made with trans fat
A ban on the sale of raw, unpasteurized milk.
Calorie limits for school lunches
Mandatory calorie labels on restaurant menus
Mandatory labels on foods containing DNA.
Mandatory labels on foods produced with genetic engineering
A requirement that school lunches must contain two servings of fruits and vegetables
Mandatory country of origin labels for meat.

The full graph looks like this, actually:

[Image: screenshot of the full graph]

You’ll notice that every other government policy is actually a debated governmental policy.*

Tricking study participants is a time-honored tradition of psych methodology, but you have to trick them effectively, and I’m not convinced this is anything but a gotcha question. If you ask people to support or oppose a governmental policy and then bury one non-policy question in a bunch of actual policies they might have heard of, you are not actually doing excellent science. You are creating a popular Facebook graphic.

In fact, if you look at the chart, the number of people who support what could be called “general labeling of food” tracks closely, hovering between 69% and 86%. It seems more plausible that people reading quickly through an online questionnaire got as far as “government policy which supports labeling...” and marked their support. They’re the sort of people who support availability of information to the people! (In fact, we do know that identity signaling impacts answering on survey questions, and that it takes strong incentives to get people to check answers that they know are correct but that support the ‘other side.’)

And say that, as the Facebook graphics seem to imply, the high support for labeling food with DNA comes from scientific illiteracy. Well, I’m still not convinced. (We are keeping in mind that if, as the study claims, the sample was representative in age of the population, ~12% of the respondents went to high school before DNA was taught, yes?) In a study where you, the participant, are assuming you’re supporting or opposing a governmental policy, and you see an acronym you don’t recall, do you support labeling foods that contain it, or no? Keeping in mind, of course, that it’s likely you supported labeling in the questions that came before, I’d expect you to indicate your support.

* Citation: tax on sugared soda, ban on raw milk, ban on marijuana, ban on trans fats, max calories in school lunches, school lunches and quantity of fruits and vegetables (link is summary of current requirements, which is currently up for debate, as discussed in previous link), country of origin labels for meat, labeling of GMOs, calories on restaurant menus.



Lesbians and the Parable of the Late Client

In social work classes (and also in my undergraduate clinical psychology/counseling psychology classes) there is the Parable of The Late Client.* It looks something like this:

There’s a therapist whose client is always late. Not five minutes late, but fifteen or twenty minutes late to the majority of sessions. The therapist finds this disrespectful and resistant, and in the grand tradition of Freud, decides to address with the client what this means about how the client views the therapist-client relationship. What is the client’s relationship to authority? What things do they believe about the therapeutic process? Does the client need to push away everyone who tries to help her? What does that mean? So the next time the client walks in late, the therapist brings it up.


…the bus is late. The client cannot afford to travel except by public transportation, and the bus is often late. She could catch the much earlier bus, but that would mean leaving work early each day she had therapy, and she can’t afford that either.

The first point of the parable is to pay attention to environmental factors** and to be extraordinarily careful in attributing fault to the client. Client appears to be ‘resistant’ to medication? Can they afford it? Client is late? Do they have control over their ability to arrive?

The second point, at least in the tellings I’ve heard, is about making sure to look around at what’s leading to the client’s concern…and what risks and strengths could shape the future of the client’s life. So, I spend a lot of time looking at case studies. And it seems like there are two kinds of risk factors: the Intrinsically Risky and the Societally Risky.

Take lead.

Lead exposure is bad. It’s near impossible to bend and twist ‘has lead exposure’ into anything but a risk factor for future health. Exposure to toxic metals is not good for you, and will probably measurably impact your life. Also bad: domestic violence, iodine shortages, not being able to afford food, really, poverty of all kinds, and a bunch of other things. These are not putting you at risk purely because society thinks they’re bad and then treats you poorly; they are things that will harm you no matter how people feel about you experiencing them.

Okay, but take being a lesbian.***

In theory, being a cis lesbian should be….probably a protective factor. Lesbians are less likely to contract HIV/AIDS, significantly less likely to have an unintentional pregnancy, less likely to be killed by a partner.

But right now, lesbians are more likely to be suicidal, more likely to self-harm, and more likely to experience a lack of social support than their heterosexual counterparts, and all of that makes the reaction to ‘in case study, client is a lesbian’ be ‘risk factor.’

Just taking a wild, speculative, swing at things, I’m going to guess this is not a feature of ‘wanting to sleep with women’ and possibly more a feature of discrimination, lack of acceptance, and ambiguous attribution issues. (In fact, this study, linked to previously, suggests that lesbian and bisexual women had slightly lower levels of depressive symptomatology than their heterosexual counterparts.)

And I’m sure there are other qualities that are like that: risks only because we make them out to be. I’d love to arrive at a place where orientation isn’t one of those, and it feels likely that that will happen within my lifetime. And then we’ll move on to the next Societally Risky, but not Intrinsically Risky feature.

*I don’t know if it’s been named as such, but it’s appeared in a variety of textbooks and classes across more than one university.
**the social work version of this told it as “psychotherapists don’t pay attention to environmental factors, but we do.”
*** For the purposes of this discussion, we’re talking cis-lesbians. I know this is frustrating, but the research only looks at cis lesbians, and paragraphs got unwieldy when I did anything else.

Sparing the Rod, Reading the Research

[content warning: this whole article is about corporal punishment.]

Via the ever-charmingly named holygoddamnshitballs on tumblr, I saw this quoted piece from CNN.

The only person you can legally hit in the United States is a child.

Hit your partner, and you’ll be arrested for domestic violence. Hit another adult, and you’ll be arrested for assault. But hit a 4-year-old, and you can call yourself a “loving father”. That’s completely screwed up.

It should be against the law for a fully grown adult to slap, hit, spank, punch, switch, whoop, whip, paddle, kick or belt a defenseless child in the name of discipline. But it is legal, and new research in the Journal of Family Psychology suggests that the average 4-year-old is hit 936 times a year.

If study after study conclusively proves that hitting your kids doesn’t work as a disciplinary method, and worse, it has long-term damaging impact to their psychology and makes your kids more aggressive, why do we as a society allow it?

And while I find corporal punishment appalling, 936 times as an average amount of violence per year seemed astronomical to me. That’s very, very frequent, or a series of very long spankings. I assumed something had gone wrong. (In my defense, people on tumblr get research wrong a lot.)

So, I tracked down the study. I sort of assumed it would be some issue of inaccurate self-report by children being extrapolated out to how many times a child was hit a year. But...actually this was a pilot study using audio recorders. (It’s worth noting that it was a small study, averaging only 12.95 hours of recording per family.) This was going to be much more accurate data than I’d have expected from spanking research at all.

Even more confusingly, each time I poked about at methods and sample, I found things that made me wonder if a larger, longer sample wouldn’t find even more instances of corporal punishment (CP in the research). The only part that might indicate a sample skewed towards higher physical punishment was that it selected for parents of 2-to-5 year olds who admitted they yelled in anger at least twice a week. Perhaps someone with a two-to-five-year-old can chime in if this seems excessively frequent? I imagine that between the Terrible Twos and still having someone dependent on you for everything, this isn’t more than one standard deviation above average.

But at the same time, consider this: the parents had some inkling that this was research about child discipline. They were interviewed over the phone prior to the research study, as well as possibly given a series of questionnaires prior to beginning the data collection. (I’m hoping the researchers waited until after, but the wording did not specify.) This makes me expect lower amounts of CP, with parents assuming that the weird psych people giving them audio recorders weren’t going to be enthusiastic about parents hitting their children.* Additionally, spanking publicly is socially frowned upon (hence, spanking in bathrooms or promising a spanking at home) and I expect that adding in audio recorders made the home seem less private. Further, the mothers were more educated than the population at large, which makes me wonder if they’re a sample with higher impulse control than average.

So, all in all, I’m leaning towards this being a fair sample, though possibly one skewed lower than reality. But, a quibble in terms of reporting: the number which Raw Story and CNN report as the average number of times a four-year-old child is hit per year isn’t an average (mean) at all; it’s an extrapolated median. The median number of times a child in the study was hit per week was 18, which, multiplied out over 52 weeks, is 936 per year. (I’m unclear if in this section of the research, the writers were describing the subset of data for parents using corporal punishment, or the numbers on all parents in the study. The answer to that question would further clarify.)
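Here is a small sketch of both the arithmetic and why the median-versus-mean distinction matters. The only figures taken from the reporting are the median of 18 per week and the 52-week multiplication; the list of weekly counts is invented purely for illustration and is not the study's data.

```python
from statistics import mean, median

WEEKS_PER_YEAR = 52

# The figure as reported: a median of 18 instances per week.
median_per_week = 18
print(median_per_week * WEEKS_PER_YEAR)  # 936

# Hypothetical weekly counts for seven families (NOT the study's data),
# chosen so the median is 18 while a couple of high values pull the mean up.
hypothetical_weekly_counts = [0, 2, 5, 18, 20, 40, 90]

print(median(hypothetical_weekly_counts))  # 18
print(mean(hypothetical_weekly_counts))    # 25
```

With skewed counts like these made-up ones, the median and the mean can sit well apart, so calling an extrapolated median "the average" can mislead in either direction.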

The researchers noted that most parents who did use CP also failed to follow proponents’ guidelines about how to use spanking/hitting. Approximately half the time the punishment was given while audibly angry (advised against) and the vast majority—more than 90%—of the punishments were for non-serious offenses, mainly violating social norms. Most advice for parents who are going to use spanking is to use it very selectively, and only for serious offenses. (Here’s a well-respected psychologist explaining use of ‘effective’ physical punishment, for comparison.)

But most interestingly to me, 73% of children were back to misbehaving after ten minutes. This isn’t the reason I object to hitting your kids, but you can even make a fairly strong argument that it doesn’t work as a behavioral modifier, violence aside. I don’t pretend to be an expert, but on the simple end, time-outs seem like they would do better behavior-modification, if only because they remove the child from their current trajectory in a way hitting a child does not. It’s much harder to go back to bothering your sister or making a mess after you’ve been left alone in a different place for five or ten minutes.

*I mostly assume this as a result of the meme that spanking-approval is a more common conservative standpoint than liberal one, and psychologists seem to generally be assumed to be liberal.

In Defense of Inkblots

Rorschach, Blot 10

[This post is partially written to be contrarian. Please do not abandon your nice therapist in favor of getting inkblotted.]

I was recently linked to this part of the Less Wrong Sequences, Schools Proliferating Without Evidence. It contains a bit of the attitude towards psychology I run up against in this community, and I want to talk about it.

“Remember Rorschach ink-blot tests? It’s such an appealing argument: the patient looks at the ink-blot and says what he sees, the psychotherapist interprets their psychological state based on this. There’ve been hundreds of experiments looking for some evidence that it actually works. Since you’re reading this, you can guess the answer is simply “No.” Yet the Rorschach is still in use. It’s just such a good story that psychotherapists just can’t bring themselves to believe the vast mounds of experimental evidence saying it doesn’t work—

—which tells you what sort of field we’re dealing with here.”

But but but…

If I could speak in defense of inkblots (words I never thought I’d say), well, they’re not helpful as predictive tests. We can all fairly well conclude that this is not how to determine risk for depression or somesuch. It’s probably a non-terrible way to determine if someone has a thought disorder like schizophrenia. Give them a thing to construct a story around, and see how they connect thoughts and ideas.

A card from the TAT

But projective tests like Rorschach inkblots and the Thematic Apperception Test (“here’s an ambiguous picture, what are the people in it thinking?”) CAN be helpful for getting clients who are uncomfortable in therapy to open up. Client John Doe isn’t sure about this whole “talking about his feelings” stuff, but get him started telling a story about an inkblot and he might relax into conversation more easily than if you ask him to talk about his latest self-harm experience or troubled childhood.

Handing a stranger your feelings, or telling them about trauma or mental experiences that you know are abnormal is hard. People can go months and years in therapy before mentioning that one time they were assaulted, or how yeah, they’re fine now but they were abused as a child. It takes a lot of work to force yourself to share something in a vulnerable place…and it’s all too easy to feel like you’ve let it go too long without sharing…so why not just wait until the next session? Or the session after that? How do you squeeze that information in between explaining how your previous week was and this week’s focus?

Inkblots are a performative way to introduce those discussions. You say how Inkblot #5 looks a bit like a cow chasing a butterfly and the therapist says oh, they remember you saying you grew up on a farm, how was that. And lo, five minutes later you’re talking about your childhood.

Look, inkblots are not going to magically inform the therapist about your chances of being sociopathic, or convey a ton about your personality. It’s a bit ridiculous that some therapists use them for this. On this, the Sequences and I agree. But, this usage is rare. The cards are expensive, at $125 for the ten plates, and all of the blots are now available online for anyone to read about, removing much of the mystery. But for as long as clients keep requesting them, and they keep being an avenue into comfortable discussion, I think inkblots will be hanging around.


Seeing the Ceiling

Say I want to settle the question of who’s more moral, atheists or the religious. I’ve got a lab and a grant and some spare time (a girl can dream, can’t she?), so I set up the experiment that will solve the question once and for all.

Say I bring a bunch of religious people and a bunch of atheists into my lab. I’ve got two research assistants, one of whom plays as if they’re a participant too. So each time someone, religious or atheist, comes into the lab, they sit down next to a stranger (my research confederate), and I call them both into the lab together. They sit down, sign lots and lots of consent forms, and do some silly tasks. None of these tasks matter, they’re just there to distract the participants from realizing what I’m actually paying attention to. At the end of doing all the questionnaires, the confederate stands up, and ‘accidentally’ lets an expensive-looking watch fall. Without appearing to notice, they leave the room. I see the watch fall and in a distressed voice, ask the real participant if they don’t mind going to the next room and giving the ‘participant’ their watch back, since I’m so busy entering the data.

Now, the real participant could take the watch, walk into the next room, sprint past the fake participant, and leave with a nice new watch. They are, after all, holding onto a watch, I am looking at my computer and entering data, and the ‘participant’ appeared not to notice they’d dropped a watch.

They could.

But most of them don’t. It doesn’t matter if the participant is religious or an atheist, they tend to pick up the dropped watch, walk into the next room, and give it back to our confederate in this experiment.

Case closed, says I! Religious people and nonreligious people are equally moral! After all, they had equal rates of watch-stealing (that is, none at all).

Not so fast, says you. Practically nobody will steal a watch when you’re just sitting there watching* them! You’re there at your computer, the fake participant is in the next room, and you have their name from participant registration and the consent forms! This is a terrible measure of morality–you have to be fantastically immoral to fail this test! In fact, what you’ve done is determine that religious people and atheists have equally low levels of Horribly Immoral and Brazen Watch Thieves.

In fact, says you (why you’ve come into my laboratory to shout at me, I’m unsure), atheists are more moral! If you made this study more complicated, say by making it easier to steal the watch without suffering consequences, fewer atheists than religious people would steal the watch. You’re wrong, says my religious lab assistant! Fewer religious people would steal the watch!

And all the while, I sit there in puzzlement, because I did this study, right? And I was testing for morality, right? Everybody agrees that stealing a watch is Bad and not stealing a watch is Good.** And my research assistants sit there in outrage, because OBVIOUSLY the [religious/atheists] would be more moral if you made the test harder!

This, dear readers, is the ceiling effect. My bar (or ceiling) for Moral Person is far far too low. Everyone returns the watches, but there’s no way to distinguish between the ones who give the watch back and then glare at puppies on the walk home and the ones who return the watch and wander over to the soup kitchen to volunteer.
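A toy sketch of the ceiling effect, assuming (hypothetically) that "true morality" can be put on a 0-100 scale and that a test simply caps whatever it measures at its maximum. The groups, scores, and function name are all invented for illustration; which group is which is deliberately left unlabeled.

```python
from statistics import mean


def observed_scores(latent_scores, ceiling):
    """A test can't report more than its maximum score, so cap each
    latent score at the test's ceiling."""
    return [min(score, ceiling) for score in latent_scores]


# Hypothetical "true morality" scores on an arbitrary 0-100 scale.
group_a = [40, 55, 70, 85, 95]
group_b = [35, 45, 60, 70, 80]

# The watch test: a ceiling of 30 means everyone maxes it out.
easy_a = observed_scores(group_a, ceiling=30)
easy_b = observed_scores(group_b, ceiling=30)
print(mean(easy_a), mean(easy_b))  # 30 30

# A harder test with a ceiling of 100 lets the difference show up.
hard_a = observed_scores(group_a, ceiling=100)
hard_b = observed_scores(group_b, ceiling=100)
print(mean(hard_a), mean(hard_b))  # 69 58
```

The low-ceiling test isn't wrong that the groups scored the same; it just can't say anything about differences above the level it measures.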

Take another, real life example. Jacob says men are better at math than women. Elizabeth says this is clearly false. (Both of them are grievously oversimplifying ‘math’, but we’ll let them get away with it.)

Elizabeth points out that This Math Test (TMT, an official exam given to every high school student in our fictional universe) shows that men and women don’t differ significantly. Therefore, men and women are basically about the same in math ability.

Jacob disagrees. He grants that men and women score the same on the TMT, but claims that doesn’t mean they have the same abilities: the test is too easy. After all, says he, standardized tests hardly examine the highest possible skill level; they cover basic material. He claims that Elizabeth is just demonstrating the ceiling effect, and that when you give people a really hard test, men outscore women. Jacob is actually right, but this gap is rapidly shrinking, and men are also overrepresented on the other end, with unusually low math performance.*** (Third section after the abstract, here)

And these ceiling debates play out in a number of parts of psychology research. (And in case you didn’t have enough architecture metaphors in your life, we also have the floor effect.) Here’s a more complicated version of the gender-ceiling issue. You can have sparkling methodology, a huge and representative sample base, but if you’re creating a test with a ceiling problem…you might get entirely unhelpful, or worse, misleading, answers.

This is the best and worst of psychology, for me. That there’s always just a little bit more than the research, always a little bit more to debate and argue and question. Maybe the study is too old, maybe you got a weird subset of the population. Maybe the rats are afraid of the gender you always use for research assistants. Maybe there’s a ceiling. Or a floor.

*sorry, this was unintentional.

**That one girl who stole the watch in order to sell it for medication to save her dying father was dismissed as an outlier. 

***Basically, men have higher variance of performance: they’re some of the best and worst performers. Women have a narrower bell curve of math performance.
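To see how a difference in variance alone plays out at the extremes, here is a sketch using two normal distributions with the same mean and different spreads. The means, standard deviations, and cutoffs are hypothetical numbers picked for illustration, not estimates from any study, and real score distributions need not be normal.

```python
from statistics import NormalDist

# Hypothetical score distributions: same mean, different spread.
wide = NormalDist(mu=100, sigma=15)
narrow = NormalDist(mu=100, sigma=12)


def share_above(dist, cutoff):
    """Fraction of the distribution scoring above the cutoff."""
    return 1 - dist.cdf(cutoff)


def share_below(dist, cutoff):
    """Fraction of the distribution scoring below the cutoff."""
    return dist.cdf(cutoff)


# The wider distribution is overrepresented in BOTH tails,
# and the ratio grows as the cutoff moves further out.
for cutoff in (130, 145):
    ratio = share_above(wide, cutoff) / share_above(narrow, cutoff)
    print(f"above {cutoff}: wide/narrow ratio = {ratio:.1f}")

low_ratio = share_below(wide, 70) / share_below(narrow, 70)
print(f"below 70: wide/narrow ratio = {low_ratio:.1f}")
```

In this toy setup the two groups are identical on average, yet a test hard enough to reach the top tail (or easy enough to reach the bottom one) would show a gap.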