How Anonymous Are Paper Ballots?

A new research report calls into question the degree of anonymity in paper ballots: New Research Result: Bubble Forms Not So Anonymous <read overview> <report>

From the overview:

Today, Joe Calandrino, Ed Felten and I are releasing a new result regarding the anonymity of fill-in-the-bubble forms. These forms, popular for their use with standardized tests, require respondents to select answer choices by filling in a corresponding bubble. Contradicting a widespread implicit assumption, we show that individuals create distinctive marks on these forms, allowing use of the marks as a biometric. Using a sample of 92 surveys, we show that an individual’s markings enable unique re-identification within the sample set more than half of the time. The potential impact of this work is as diverse as use of the forms themselves, ranging from cheating detection on standardized tests to identifying the individuals behind “anonymous” surveys or election ballots.

The data is based on a sample of 92 surveys filled out at the same time, on the same form, using the same writing instrument:

To test the limits of our analysis approach, we obtained a set of 92 surveys and extracted 20 bubbles from each of those surveys. We set aside 8 bubbles per survey to test our identification accuracy and trained our model on the remaining 12 bubbles per survey…

Additional testing—particularly using forms completed at different times—is necessary to assess the real-world impact of this work. Nevertheless, the strength of these preliminary results suggests both positive and negative implications depending on the application. For standardized tests, the potential impact is largely positive. Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test.

The possible impact on elections using optical scan ballots is more mixed. One positive use is to detect ballot box stuffing—our methods could help identify whether someone replaced a subset of the legitimate ballots with a set of fraudulent ballots completed by herself. On the other hand, our approach could help an adversary with access to the physical ballots or scans of them to undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble form employment application. That employer could test the markings against ballots from an employee’s jurisdiction to locate the employee’s ballot. This threat is more realistic in jurisdictions that release scans of ballots.

The finding raises potential concerns for states and election jurisdictions considering the merits of either making ballots available for public review or releasing them under freedom of information requests. We find reasons for concern about ballot anonymity and reasons for skepticism that the result will hold up under additional research. Before drawing conclusions about serious implications, it is critical to do longitudinal studies, as recommended in the report, and to study several other challenging dimensions. Considerations and directions include:

  • On a small sample, a 51% chance that the most likely individual is correctly identified may not be all that useful if one cannot tell which 51% of the identifications are correct.
  • How does the probability of correctly identifying the respondent vary with the number of voters: 100, 200, 400, 800, etc.?
  • Are there classes of voters that clump and are hard to distinguish and others that are fairly unique?  Is this similar to blood type classifications, with more types, but much less distinct classes? Or is it similar to DNA with many variations, but again nowhere near as distinct?
  • From looking at a lot of ballots in audits and recanvasses it is clear to me that people do make consistent marks in bubbles on a single ballot, with a single instrument, on a single day, however:
    • Do voters make the same marks over time and in different contexts?
    • To what extent do single voters or collective groups of voters fill in bubbles the same way from election to election? I suspect it varies from person to person as well. For me, I suspect I am very inconsistent from election to election, except that I do tend to fill in complete bubbles, which would place me in a large class of voters difficult to distinguish individually.
    • Filling out an SAT or survey can be quite different from voting. On an SAT we think more and in different ways, under much more stress. On a survey we may hardly think or care at all.
  • In Connecticut we use felt tip pens in polling places. To what extent does such a thicker instrument make the classification more or less accurate? I would suspect the thicker the instrument the more difficult the classification in general.
  • In longitudinal studies (using forms filled out on different occasions, days, weeks, months, or years apart), how much more difficult is identification when the instrument varies? For example, felt tip pens can be drier or wetter and vary in thickness based on use. Pencils can vary by sharpness and by manufacturer. Pen and pencil marks may vary with the way the instrument can be gripped or is gripped on a particular occasion.
  • How useful are past examples from one type of test or ballot for identifying marks on another? I suspect difficulties based on bubble size, bubble shape, rectangles, or connecting lines, and even the shape of the ballot/test form, layout, lighting, sitting vs. standing, etc.
  • For example, let us say an employer, union, government entity, criminal enterprise, or church wanted to use this method to test votes of individual employees/members, without their knowledge. What accuracy/confidence could they expect with samples from presumably a small subset of voters in a precinct when attempting to identify their ballots in a sea of ballots filled out by other voters?
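The question of how identification accuracy changes as the pool of ballots grows can be explored with a toy simulation. The sketch below is not the report's method and uses no real data: it assumes each voter has a hypothetical "style" vector, that each bubble mark is that style plus random noise, and that a snooper matches a reference sample (12 marks) against each ballot (8 marks) by nearest distance. The function name `simulate` and all parameters (`dims`, `noise`, `n_targets`) are invented for illustration, and the numbers it prints say nothing about real ballots.

```python
import random

def simulate(pool_sizes, n_targets=50, dims=4, noise=1.0,
             n_reference=12, n_ballot=8, seed=0):
    """Return {pool_size: fraction of targets whose own ballot is the best match}.

    Purely synthetic: styles and marks are random numbers, not real bubbles.
    """
    rng = random.Random(seed)
    largest = max(pool_sizes)

    def mean_of_marks(style, count):
        # Average `count` noisy marks drawn around a voter's personal style.
        return [sum(rng.gauss(s, noise) for _ in range(count)) / count
                for s in style]

    styles = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(largest)]
    # Every voter casts a ballot; reference samples exist only for the targets
    # (for example, employees who filled out a bubble-form application).
    ballots = [mean_of_marks(s, n_ballot) for s in styles]
    references = [mean_of_marks(s, n_reference) for s in styles[:n_targets]]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    results = {}
    for size in sorted(pool_sizes):
        hits = 0
        for t, ref in enumerate(references):
            # Pick the ballot in the pool that looks most like the reference.
            best = min(range(size), key=lambda j: sq_dist(ref, ballots[j]))
            hits += (best == t)
        results[size] = hits / n_targets
    return results

if __name__ == "__main__":
    for size, acc in simulate([100, 200, 400, 800]).items():
        print(f"{size:4d} ballots: {acc:.0%} of targets correctly matched")
```

Because the smaller pools are subsets of the larger ones, the match rate in this toy model can only stay level or fall as ballots are added. How fast it falls depends entirely on the assumed noise level and number of style dimensions, which is exactly what further research on real ballots would need to measure.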

More research is necessary before we can conclude the degree to which bubble analysis can be used to identify voters. Even so, there would be trade-offs between the positive value and the risks of public availability of ballots for review. There are mechanisms of election transparency short of public disclosure of complete paper ballots, methods which could reduce risks but at some risk to credibility and transparency. Of course, we could eliminate paper ballots altogether and take the greater risks of errors, skulduggery, and lack of confidence of electronic voting, like we have seen recently in New Jersey, last year in Kentucky, and several years ago in Sarasota.


