“The mind is not designed to grasp the laws of probability, even though the laws rule the universe.”
― Steven Pinker
There is a famous experiment in statistics in which a professor divides his class into two groups. In the first group, each student is told to toss a coin 200 times and record the resulting sequence of heads and tails, while the second group is instructed to simply write down what they believe to be a reasonable simulation of 200 coin tosses. The resulting sequences are then given to the professor, who sorts them at a glance into two piles: the sequences he believes are fabricated, and the ones he believes are recordings of real coin tosses. His sorting is surprisingly accurate.
The statistician Tamas Varga originally conceptualized this experiment. Varga noted that almost all sequences of 200 coin tosses will contain at least one run of six straight heads or six straight tails, whereas the statistically uninitiated tend to avoid such long stretches, intuiting that they seem “too conspicuous.” Probability, statistics, and randomness in general are difficult topics to conceptualize intuitively. Michael Kouritzin, writing on detecting false random sequences, notes that “people usually have strong misconceptions of what random behaviour entails.” Or, as Cal State mathematician Mark Schilling put it, “Human beings make rather poor randomization devices.”
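Varga’s observation is easy to spot-check with a quick Monte Carlo simulation. This is only a sketch (the function and variable names are my own), counting how often a simulated 200-toss sequence contains a run of six or more identical results:

```python
import random

def has_run(seq, length=6):
    """Return True if seq contains a run of `length` identical outcomes."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

random.seed(1)
trials = 5000
hits = sum(
    has_run([random.randint(0, 1) for _ in range(200)])
    for _ in range(trials)
)
print(f"~{hits / trials:.0%} of simulated 200-toss sequences contain a run of six")
```

Running this shows that the overwhelming majority of genuinely random sequences contain such a run, which is exactly the feature the hand-written fakes tend to lack.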
Another classic example problem in probability classes asks the student to imagine a jar filled with, say, 850 yellow and 150 blue marbles. If ten marbles are drawn at random, the student is asked, what is the probability that exactly four of them are blue? This problem is trickier than it appears at first glance ― it resists straightforward, elementary approaches to calculation and drives desperate math majors to make frantic late-night trips to Walmart in search of colored marbles. Fortunately, many giants have come before us and we can stand on their shoulders. The formula for this problem is the binomial probability mass function, which is given by:

P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) = n! / (k!(n − k)!)
Here, n is the total number of “trials,” i.e., marbles pulled out of the jar; k is the number of “successes,” that is, the number of drawn marbles that match our “condition” (in this case, being blue rather than yellow); and p is the probability of success on any single trial. Since there are 150 blue marbles in a jar of 1,000 marbles, the probability of drawing a blue marble at random is 15%. (Strictly speaking, drawing without replacement calls for the hypergeometric distribution, but with only ten draws from a thousand marbles the binomial is a very close approximation.)
Plugging the appropriate values into the probability mass function, we find that the probability of choosing exactly four blue marbles when we draw ten total marbles is 4%, i.e., if we run this same experiment 100 times, we can expect to draw four blue marbles about four times. Expect is the key word here: we might actually get this result well more than four percent of the time, or we might get it less often. This is the catch with random events: repeat them often enough, and outcomes converge to their probabilities. But on any given draw just about anything can happen.
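As a sanity check, the formula can be evaluated directly in a few lines of Python (`binom_pmf` is simply a name chosen here for illustration):

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Ten draws, 15% chance of blue per draw, exactly four blue marbles:
print(f"{binom_pmf(10, 4, 0.15):.4f}")  # about 0.04, i.e. 4%
```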
If you do this calculation for all possible outcomes ― from drawing no blue marbles to drawing only blue marbles ― you’ll get a range of outcomes known as a binomial distribution. Here is the binomial distribution of our hypothetical.
It’s easy to see that drawing one blue marble is the most likely outcome, which makes intuitive sense since blue marbles make up only 15% of the total (15% of ten is 1.5, so we’re most likely to draw either one or two blue marbles). But remember, the exact expected result will occur only some of the time. In this case, we’ll draw a single blue marble about 35% of the time. That also means that 65% of the time we won’t. In fact, we’re much more likely to draw either no blue marbles or exactly two blue marbles than we are to draw exactly one.
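The whole distribution can be tabulated the same way, reproducing the claims above (k = 1 peaks near 35%, while k = 0 and k = 2 together account for roughly 47%):

```python
from math import comb

n, p = 10, 0.15
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"{k:2d} blue: {prob:7.2%}")
```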
There is one last factor to note before diving into the heart of this essay: at what point can we conclude that whoever is drawing the marbles is cheating? When can we conclude that the guy pulling the marbles out of the jar is specifically trying to grab the blue ones?
This is harder to tease out than it may sound. You would need to repeat this experiment roughly 170 million times before you should expect to pull exactly ten blue marbles out of the jar (the probability is 0.15^10, about one in 170 million). But in the biological sciences, we regularly ascribe to chance events with far longer odds than that. William Dembski’s Universal Probability Bound is one in 10^150, roughly the inverse of the number of discrete physical events that could have occurred in the history of the universe. One could also apply the chi-squared test to measure how far the actual distribution deviates from the expected one. Or one could find, or even invent, an entirely different approach to this problem.
For my purposes in the rest of this essay, I’m relying on an intuitive approach: if an outcome falls within two and a half standard deviations† of the mean (or its exact probability is above two percent), I’m willing to accept it as random. If that threshold seems inappropriate to you, scale it however you’d like. Ultimately, the results are so clearly skewed into two unmistakable categories that this distinction will scarcely matter.
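This acceptance rule can be written out explicitly. To be clear, this is a sketch of my own informal threshold, not a standard statistical test:

```python
from math import comb, sqrt

def plausibly_random(n, k, p, sd_limit=2.5, pmf_floor=0.02):
    """Accept k successes in n trials as chance if k lies within
    sd_limit standard deviations of the mean, or if the exact
    outcome probability P(X = k) is at least pmf_floor."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    pmf = comb(n, k) * p**k * (1 - p) ** (n - k)
    return abs(k - mean) <= sd_limit * sd or pmf >= pmf_floor
```

Plugging in state-level numbers later in the essay, an outcome like 3 black victims in 80 shootings with a 4.2% black population passes easily, while 23 of 41 with a 14.9% black population does not.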
Even as Black Lives Matter has exploded onto the scene as a national socio-political movement in the aftermath of high-profile police shootings of black men, precise data on these encounters has been virtually nonexistent. The FBI, the organization we rightly expect to keep track of such information, has long relied on police departments to self-report this data, a task which carries clear incentives against accurate reporting. Lacking reliable data, it has been almost impossible to create an objective understanding of both the scope and nature of this problem: Black Lives Matter conceptualizes this problem as a “war against black people” whereas Blue Lives Matter, the reflexively-named police advocacy group, maintains that this notion is tantamount to slander.
In an effort to address this problem, the Washington Post has been collecting data about people shot and killed by police in the United States. Their raw data includes the age and race of the victim, the city and state where the shooting took place, whether the victim was armed or showed signs of mental illness, and whether it could be determined if the victim was attacking or fleeing. To the best of my knowledge, this is the first attempt to create a comprehensive collection of such information. Unfortunately, it only extends back to January of 2015.
Large data sets about controversial topics are like catnip to me, so I dug in and started to organize and dissect the information. My goal was to determine which states, if any, showed disproportionate outcomes in killing black people. My method was to organize these shootings by state and compare actual outcomes to the range of expected outcomes based on local population demographics. All relevant demographic information, from national to state to county rates, was pulled from census.gov.
One small, unreasonable request before we dive into the analysis. This topic is a political minefield. Not only have most people made up their minds about what the data will show, our feet are so firmly set in ideological cement that it’s difficult to conceive of anything that would enable us to stretch our necks far enough to gain a fresh perspective. As I shared with people my ideas for this project and the preliminary results, I was both dismayed and amazed at how many people told me with blind confidence what my analysis was showing ― often directly contradicting some peculiar result I had just presented to them. Some even suggested I not continue the analysis at all. But the day we are afraid of the truth is the day we can no longer claim to live in an enlightened society. I didn’t go into this analysis trying to prove any particular point or confirm anyone’s point of view. Rather, I wanted to be able to speak objectively about a topic that, to this point, has seemed to be anything but. So please do your best to leave your preconceptions behind, at least for a moment, and judge this research on its own strengths and weaknesses, rather than by how well it conforms to what you already think. There is plenty in here to make everyone uncomfortable.
Visual aids are helpful. This is a map of all fatal police shootings since January 2015:
(Green pins represent white victims, red pins represent black victims, blue pins represent Hispanic victims, tan pins represent Native American victims, and pink pins represent Asian victims.)
Here is a map of only black victims.
(It should be noted that these pins are not dropped at the precise latitude and longitude of each shooting site, but rather are pinned to the city where each occurred.)
One might guess that the states with no black victims have very small black populations to begin with. That guess would be correct: 0.7% of the population of Montana is black, for example, while North and South Dakota each register 1.1%. By contrast, black people make up 37.3% of the population of Mississippi. Of the 1,583 victims with an identified race, 27.3% were black, more than double the 13.3% black share of the national population. As we’ve already seen, however, population demographics shift from state to state, and even city to city. To get a better sense of how the demographics of police shootings relate to the demographics of the local population, we have to zoom in.
We can conceptualize this problem in the same way as the jar of marbles problem. In a mixed population, were black people “selected” at a rate that is not explained by chance? Let’s look at a number of states and see how each fares.
In Arizona (n=80, k=3, p=.042, where n is the total number of shootings, k is the number of black victims, and p is the percent of black people among the state population), there have been 80 police shootings since January 2015. In three of those, the victim was a black person. Here is the binomial distribution of the related probabilities:
Given Arizona’s 4.2% black population, exactly three black victims is the most likely outcome. We see similar results in a number of other states, such as Georgia (n=41, k=16, p=.314):
Mississippi (n=15, k=4, p=.373) yielded fewer black victims than expected, though this is also within predictable variance:
And for each state where one might find fewer black victims than expected, there is a state where you can expect to find slightly more. In Alabama, for instance (n=35, k=12, p=.264), there were 12 black victims, or three more than expected:
Still, this is not small enough to reject a randomness hypothesis: with the selection size and the black population, one out of five “trials” would produce 12 or more black victims. Alabama produces slightly more black victims than we would expect, but it’s within range.
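The “one out of five” figure is an upper-tail probability, which can be checked directly; the exact value depends on rounding of the inputs, so treat this as a sketch:

```python
from math import comb

def binom_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

# Alabama: 35 shootings, 26.4% black population share, 12 black victims.
print(f"P(12 or more black victims) = {binom_tail(35, 12, 0.264):.2f}")
```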
All in all, 26 states fell within the expected statistical range, and 33 states were within ±2 of the expected count (that is, if we sum the probabilities of k−2, k−1, k, k+1, and k+2, we get 16% or better, roughly the odds of calling a single roll of a fair die). If there is indeed a consistent, systemic pattern of anti-black bias that bears out in police shootings, the onus is on the proponents of that idea to explain why, more often than not, these events fall within the same distributions as the roll of a fair die or the flip of a coin.
But if two thirds of the states had results that were within range of expected outcomes, that means one third of the states were not. And when those states missed the mark, they didn’t fall just barely out of the expected range. Rather, they were so far beyond reasonable expectation that my considerable talents of hyperbole and exaggeration fail me entirely. Look at Florida (n=103, k=40, p=.159):
The probability of this outcome happening at random is so low it doesn’t even register on the graph. The absolute odds? 1 in 1.6 million.
Illinois (n=41, k=23, p=.149), even with fewer total police shootings, somehow manages to fare worse. The odds of returning The Prairie State’s outcomes by chance are about one in 1.1 billion.
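The same tail calculation covers the extreme cases. The published odds may have been computed slightly differently (point probability versus cumulative tail), but either way the Florida and Illinois outcomes are vanishingly unlikely under the chance model:

```python
from math import comb

def binom_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

for state, n, k, p in [("Florida", 103, 40, 0.159),
                       ("Illinois", 41, 23, 0.149)]:
    print(f"{state}: P({k} or more) = {binom_tail(n, k, p):.2e}")
```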
The worst state of them all? California (n=290, k=48, p=.067). California has the dubious distinction of accounting for 11% of all black people killed by police in America, as well as 45% of all Asian victims and 41% of all Hispanic victims.
What’s going on in California?
This is a map of all black victims of police shootings in California since January 2015. Notice anything about the geographic distribution?
The black victims of police shootings in California cluster around Los Angeles and San Francisco/Oakland. Breaking the numbers down by the black populations of the respective counties, we find that rather than being a statewide phenomenon, the disproportionate shooting of black people is isolated to three counties: Alameda, Los Angeles, and Santa Clara. In turn, this implies that if we could eradicate the insidious pathogenesis of racism from just three counties, we could bring the most problematic state back into a reasonable range.
| County | Black Victims | Non-Black Victims | Total | Black Population | % Black Victims | Binomial Probability |
|---|---:|---:|---:|---:|---:|---:|
| Contra Costa County | 1 | 1 | 2 | 9.6% | 50.0% | 17.36% |
| El Dorado County | 1 | 0 | 1 | 1.0% | 100.0% | 1.00% |
| Los Angeles County | 18 | 48 | 66 | 9.1% | 27.3% | 0.00% |
| San Bernardino County | 2 | 21 | 23 | 9.5% | 8.7% | 28.07% |
| San Diego County | 1 | 12 | 13 | 5.6% | 7.7% | 36.46% |
| San Francisco County | 2 | 6 | 8 | 5.7% | 25.0% | 6.40% |
| San Joaquin County | 2 | 6 | 8 | 8.2% | 25.0% | 11.27% |
| Santa Clara County | 4 | 10 | 14 | 2.9% | 28.6% | 0.05% |
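The table’s final column (the binomial probability of exactly this many black victims, given the county’s black population share) can be reproduced with the same point-probability formula. A few representative rows:

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

counties = [
    # (county, total shootings, black victims, black population share)
    ("Los Angeles County", 66, 18, 0.091),
    ("San Diego County", 13, 1, 0.056),
    ("Santa Clara County", 14, 4, 0.029),
]
for name, n, k, p in counties:
    print(f"{name}: {binom_pmf(n, k, p):.2%}")
```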
Please note that to this point I haven’t discussed other factors like criminality, gang membership, or socioeconomics. Each of those factors ― and many more ― could play a role in further analysis. Someone so inclined could zoom in further on Los Angeles County and evaluate which of these occurrences happened in a place like Compton (32.9% black) compared to the much more white and Asian cities of the San Gabriel Valley. (A concern here is that smaller geographic regions have fewer cases to examine, which weakens the statistics and makes it more difficult to draw reasonable conclusions.)
As Carl Bialik notes, “The many factors that might contribute to the racial disparity in police killings are hard to disentangle…. identifying how much each factor contributes to the burden of police violence borne by black Americans isn’t possible based on the data available.” I agree with this assessment, but now we have a better idea of where to look for answers.
There is an additional question that I have not, to this point, adequately explored. While in most states randomness adequately explains the outcomes of the sum of all fatal police shootings, it fails to account for the distribution of the most insidious form of police shooting: that is, when the victim is unarmed, not attacking anyone, and not attempting to flee. The data set records 42 such shootings where the victim’s race is identified. Of those 42, 16 were of black people.
This is certainly the most troublesome result to come from my analysis. Without knowing how many such encounters occur ― and what contexts produce them ― it’s impossible to draw strong conclusions, but the data here is very suggestive. I am reminded of the Implicit Association Tests which show that most people quickly associate black people with weapons (white people are more quickly associated with benign objects like spatulas). Though some research suggests that implicit associations don’t translate into real-life results, that claim becomes hard to credit when the category of unarmed, passive, and yielding victims produces such a strong disparity.
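Using the national 13.3% black population share as the baseline (an assumption; a baseline weighted by where these encounters occur would differ), the tail probability for this subset is easy to compute:

```python
from math import comb

def binom_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

# 42 unarmed, non-attacking, non-fleeing victims; 16 were black.
print(f"P(16 or more of 42) = {binom_tail(42, 16, 0.133):.2e}")
```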
As I noted above, thirty-three states fall within reasonable range to infer random distribution based on population demographics. That is to say, in 33 states, there is not enough evidence to reject the null hypothesis that police shootings are random with respect to the race of the victim. This analysis ignores possible amplifiers like criminality and socioeconomic factors that could shift expected outcomes in either direction.
Seventeen states ― Ohio, California, Illinois, Florida, Pennsylvania, Maryland, Virginia, Oklahoma, New York, Louisiana, Indiana, Missouri, Michigan, Texas, Wisconsin, Minnesota, and New Jersey (in descending order) ― fall sufficiently outside of the random binomial distribution that we can infer that there are external factors at work. Additionally, the plurality of unarmed, passive, and yielding victims are black.
There are some additional caveats to note. The shortcomings of this data set are many. First of all, it only extends back 21 months. There may be an enormous amount of natural variance that we cannot detect for lack of better data. Likewise, this data set only records fatal shootings. What are the demographics of non-fatal shootings? Were black victims more or less likely to die? (Even that question could produce two competing answers: if black victims were less likely to die, that might imply police were quicker to fire their weapons, escalating situations that ought not require lethal force; if they were more likely to die, it might imply police fire more bullets than average at black victims.) Furthermore, this data in no way records an average or even typical encounter with police, which likely plays out differently for black people than for white people, and differently again for urban people compared to rural. There are many more factors that have not been explored. All of these could provide additional shading and context in the analysis of this issue.
But to the central question, judged solely on the data directly examined: is there clear evidence that anti-black racism is a major, causal factor in fatal police shootings? For 33 states, the answer is no. For the remaining 17, the data suggests the shootings cannot be explained as random phenomena. The data further suggests that in those 17 states, non-random shootings are concentrated in key urban areas, and that reform efforts would be best applied to those counties rather than to states (or the nation) as a whole. How additional factors might shift these results remains, by this analysis, an open question.
† Per Wikipedia: “In statistics, the standard deviation (SD, also represented by the Greek letter sigma σ or the Latin letter s) is a measure that is used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.” For a normal distribution, approximately 99% of observations fall within two and a half standard deviations of the mean.