What in the World is Interpreted Code and What’s Wrong With It Anyway?

Earlier this month, Ion Sancho, Supervisor of Elections for Leon County, Florida, invited computer experts to demonstrate a security flaw in Diebold optical scanners that was described in a report published on July 4, 2005. The test was repeated in December to refute specific denials by Diebold. In statements to two different election officials, Diebold claimed it was not possible to alter the outcome of an election in such a way that the perpetrator would not need passwords and the tampering would not be noticed during normal canvassing procedures. Sancho set up the test environment on December 13, 2005 to prove these claims false. The outside experts had no access to the optical scanner, and the complete canvassing procedure was followed for 8 test ballots. Although the 8 paper ballots had a vote tally of 2 Yes and 6 No, all of the official reports, from the optical scanner through to the publication of county results, showed an outcome of 7 Yes and 1 No.

Because of this design defect, which exists on all Diebold touchscreen machines (DRE) and optical scanners, the Secretary of State of California has demanded that the Diebold software be re-examined by the Independent Testing Authority (ITA), which originally certified that the systems were in compliance with the 2002 Federal Voluntary Voting System Guidelines.

This breach of security exploits an inherently insecure feature of the Diebold optical scanners and touch screens known as interpreted code. Below is a simplified diagram of a voting machine. Diebold equipment has several hardware components (printer, touch screen, smart card reader, buttons, etc.). These are represented by light blue boxes. There is also memory, which is represented by dark blue boxes. Some of the memory is read-only (ROM) and contains firmware. Part of the programming in ROM (firmware) is an interpreter for the Diebold-specific language AccuBasic. Also in the firmware is all the programming needed to interact with the hardware. (For simplicity, interactions with the touch screen elements of the DRE are not shown.)

At the beginning of Election Day, the voting machine (DRE or optical scanner) must print a Zero Total Report, which is signed by poll workers before the first vote is cast. The report is the official record that the “electronic ballot box” has not been stuffed before the election. Unfortunately, the programming in the ROM does not know how to print the Zero Total Report.  This is by design.  The firmware of the voting machine is “burned” into the ROM at the factory and is mass produced.  If the ROM did contain the details of how to print a Zero Total Report, there would need to be at least 51 versions of firmware (one version of ROM for each state and DC).

This is where the memory card and its interpreted code come in.

Among other things, the memory card contains 3 elements: the ballot definition (names of candidates, ballot position, etc.), the vote tallies (e.g. number of votes for John Doe for Senate) and a file of compiled AccuBasic tokens.  This last item is the interpreted code, which is the fundamental problem of the design.  

The firmware does not know the details of how to print the Zero Total Report. But it does know that the code to do this is on the memory card in a file with an extension of ABO.  The firmware also knows the code in the ABO file is stored under the name ELECTION_ZERO_REPORT.

Let’s follow along as the Zero Total Report is printed. The poll workers push buttons on the front panel of the optical scanner or insert a supervisor’s smart card into the DRE. This tells the voting machine to print the Zero Total Report (shown on the diagram as arrow number 1). The firmware in turn yields control to the code contained under the name ELECTION_ZERO_REPORT in the ABO file of the memory card (represented in the diagram as arrow number 2).
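The hand-off described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Diebold's code: the dictionary layout, the function names and the second candidate "Jane Roe" are assumptions made up for the example. Only the name ELECTION_ZERO_REPORT and the idea of an ABO file on the memory card come from the description above.

```python
# Hypothetical model of a memory card; the real file layout is not described here.
memory_card = {
    "ballot_definition": ["John Doe", "Jane Roe"],    # "Jane Roe" is invented
    "vote_tallies": {"John Doe": 0, "Jane Roe": 0},
    "abo": {},  # stand-in for the ABO file: routines stored under a name
}


def honest_zero_report(card):
    """Format one report line per candidate from the stored tallies."""
    return [f"{name}: {card['vote_tallies'][name]}"
            for name in card["ballot_definition"]]


# Whoever writes the card decides which routine is stored under this name.
memory_card["abo"]["ELECTION_ZERO_REPORT"] = honest_zero_report


def firmware_print_zero_report(card):
    """The firmware knows the routine's name, not what the routine does."""
    routine = card["abo"]["ELECTION_ZERO_REPORT"]
    return routine(card)  # control passes to code that lives on the card


report_lines = firmware_print_zero_report(memory_card)
print(report_lines)
```

The point of the sketch is that the firmware never inspects the routine. It looks the routine up by name and runs whatever it finds.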

The AccuBasic tokens are neither human-readable nor machine-executable, but are halfway between. What exactly are these tokens? Tokens are to programs what shorthand is to written prose. The command PRINT is represented as a single token, which uses 1 byte of memory instead of the 5 bytes that the 5 letters of PRINT would occupy. So if the voting machine’s central processing unit (CPU) cannot execute a token, how does the PRINT token get anything to the printer? The answer is through the interpreter.
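A small Python sketch may make the shorthand idea concrete. The token values and the handler table below are invented for illustration, since the real AccuBasic token numbers are not given here. The sketch shows only the two ideas from the paragraph above: a keyword shrinks to one byte, and an interpreter is needed to turn that byte back into an action.

```python
# Hypothetical token table; the actual AccuBasic values are not known here.
TOKENS = {"PRINT": 0x01, "IF": 0x02, "GOTO": 0x03}


def tokenize_keyword(keyword: str) -> bytes:
    """Replace a keyword with its one-byte shorthand."""
    return bytes([TOKENS[keyword]])


source_form = "PRINT".encode("ascii")   # 5 bytes, one per letter
token_form = tokenize_keyword("PRINT")  # 1 byte

printed = []  # stands in for the paper tape

# The interpreter maps each token to the detailed work the CPU must do.
HANDLERS = {0x01: lambda text: printed.append(text)}


def interpret(token: int, argument: str) -> None:
    """Look up the token and carry out the action it stands for."""
    HANDLERS[token](argument)


interpret(token_form[0], "John Doe: 0")
print(len(source_form), len(token_form), printed)
```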

The interpreter translates the shorthand of the token into all the messy details needed by the CPU and the printer in order to print the phrase “John Doe: 0” on the Zero Total Report (represented in the diagram by the 3 arrows all labeled 4). There are 3 such arrows because the stream of AccuBasic tokens contained in ELECTION_ZERO_REPORT interacts with both the ballot definition and the vote tallies. Unfortunately, the interaction with the vote tallies is unrestricted, and the AccuBasic tokens contained in ELECTION_ZERO_REPORT can print anything on the paper tape report to be signed by the poll workers. Finally, the formatted names and numbers are printed for the poll workers to sign (represented in the diagram as arrow number 5).

The security test performed in Leon County demonstrated that the stream of AccuBasic tokens contained in ELECTION_ZERO_REPORT can misreport the vote tallies on the memory card. By using a $300 card reader (or any PC for the PCMCIA cards), the vote tallies can be pre-loaded so that the votes in the Yes column equal +5 and the votes in the No column equal -5. The Zero Total Report then lies by printing that the tallies are zero.
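To see why the poll workers cannot catch this, compare an honest report routine with a dishonest one. The Python sketch below is a stand-in for AccuBasic, and both function names are hypothetical. How the routine used in the Leon County test was actually written is not shown here. The sketch assumes only what is stated above: the tallies are pre-loaded with +5 and -5, and the report routine is free to print whatever it likes.

```python
# Pre-loaded tallies, as described for the Leon County test.
tallies = {"Yes": 5, "No": -5}


def honest_zero_report(stored):
    """Print what is actually stored on the card."""
    return [f"{choice}: {count}" for choice, count in stored.items()]


def rigged_zero_report(stored):
    """Ignore the stored counts and print zero for every choice."""
    return [f"{choice}: 0" for choice in stored]


honest_tape = honest_zero_report(tallies)
rigged_tape = rigged_zero_report(tallies)

print(honest_tape)  # would expose the pre-loaded values
print(rigged_tape)  # what the poll workers are asked to sign
```

The paper tape from the rigged routine looks exactly like the tape from a clean card, so signing it proves nothing about the card's contents.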

The voting process began with a database containing:
Yes      No
+5         -5

As the 8 ballots in Leon County were scanned (or entered on the touch screen), the normal operation of the machine increments the vote tallies in the 2 database entries: Yes and No.  This normal operation added 6 votes to the -5 initially stored as the tally for the No column for a final result of 1 No and added 2 to the +5 initially stored as the tally for the Yes column for a final result of 7 Yes.  

As the voting process continues, the database contains:
Yes      No
6          -4

And ends finally with:
Yes      No
7          1
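The arithmetic can be replayed in a few lines of Python. This is a sketch of the counting only, using the numbers from the Leon County test (2 Yes and 6 No ballots, tallies pre-loaded with +5 and -5). The variable names are made up for the example.

```python
from collections import Counter

# Tallies as pre-loaded on the memory card before the polls open.
tallies = {"Yes": 5, "No": -5}

# The 8 paper ballots actually cast in the test.
ballots = ["Yes"] * 2 + ["No"] * 6

# Normal operation: each scanned ballot adds 1 to its column.
for choice in ballots:
    tallies[choice] += 1

true_count = Counter(ballots)

print(dict(true_count))        # what the paper says
print(tallies)                 # what the machine reports
print(sum(tallies.values()))   # total votes reported
```

Because the pre-loaded values cancel each other out (+5 and -5), the reported total still equals the number of ballots cast, so a check of total votes against the number of voters would not flag anything.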

Since 8 ballots were cast by 8 voters, a result of 7 to 1 in favor of the proposition would not call attention to the alteration: nothing appears amiss, even though the voters actually cast ballots totaling 2 Yes and 6 No. Further, the initial alteration of the memory has been obliterated by the normal operation of the voting machinery, because the database keeps a running total for each choice rather than a separate record for each vote. In short, the -5 starting point simply becomes -4; there are no individual records of -5 and +1. The tally behaves like an odometer wheel rather than like a ledger of separate bookkeeping entries.
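The difference between the odometer and the ledger can be shown side by side. The Python sketch below is illustrative only. The ledger is a hypothetical alternative used for comparison, not something the Diebold equipment is described as having.

```python
# Odometer style: one running number per choice.
running_tally = -5      # pre-loaded value for the No column
running_tally += 1      # one No ballot is scanned
# The running number is now -4 and the -5 is no longer stored anywhere.

# Ledger style (hypothetical): every change is kept as its own entry.
ledger = [-5]           # the same pre-loaded value, recorded as an entry
ledger.append(+1)       # the same No ballot, recorded as an entry

# A legitimate ballot can only ever add exactly 1, so any other entry stands out.
suspicious_entries = [entry for entry in ledger if entry != 1]

print(running_tally, sum(ledger), suspicious_entries)
```

Both forms give the same total, but only the ledger still contains the evidence of the pre-loaded -5.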

The contents of the data file are then uploaded to a central tabulator (not shown).  Reports from the central tabulator (e.g. the county summary or precinct details) will show a reasonable result of 7 for and 1 against, because that process prints the contents of the database, which has already been altered at the DRE or Optical Scan polling station.

It is because of these kinds of issues that interpreted code is expressly prohibited by the 1990 and 2002 Voluntary Voting System Guidelines.  It is simply too difficult to secure the code if it is interpreted at the time of execution.  Since the code is interpreted at execution time and not before, code inspection and customary Logic and Accuracy testing would not detect manipulations such as the one above.

Even if the card that was tested on Monday was legitimate, the ability to swap it out for a corrupted card by Tuesday morning means any prior testing was a wasted effort. In that instance, the code tested before Election Day is not the code that runs on Election Day. In a similar way, interpreted code makes it difficult to determine on Wednesday what code was actually executed on Tuesday, even if the altered memory card is available. A detailed examination of the stream of AccuBasic tokens would be needed, and even then you could not be certain exactly what was executed previously.

Where do we go from here?  First, all voting machinery using such prohibited interpreted code must be recalled. Then it must be determined if Diebold is the only vendor with this design defect.  Since the NASED/EAC system of independent test authority labs failed to note this defect in the Diebold equipment, it is likely a similar defect would go "unreported" if present in machinery from other vendors. And finally, the testing and certification process that allowed this unacceptable violation of security standards to be overlooked must be dramatically improved to protect the integrity of our election process.