Developers are regularly faced with the task of checking user input for accuracy. A considerable number of standardized data formats now exist that make such validation tasks easy to master. The International Standard Book Number, or ISBN for short, is one such data format. ISBN comes in two versions: a ten-digit and a 13-digit version. From 1970 to 2006, the ten-digit version of the ISBN was used (ISBN-10), which was replaced by the 13-digit version (ISBN-13) in January 2007. Nowadays, it is common practice for many publishers to provide both versions of the ISBN for titles. It is common knowledge that books can be uniquely identified using this number. This, of course, also means that these numbers are unique. No two different books have the same ISBN (Figure 1).

Validating credit card numbers for online payments

The theoretical background for determining whether a sequence of numbers is correct comes from coding theory. Therefore, if you would like to delve deeper into the mathematical background of error-detecting and error-correcting codes, we recommend the book “Coding Theory” by Ralph Hardo Schulz [1]. It teaches, for example, how error correction works on compact disks (CDs). But don’t worry, we’ll reduce the necessary mathematics to a minimum in this short workshop.

The ISBN is an error-detecting code. Therefore, we can’t automatically correct a detected error. We only know that something is wrong, but we don’t know the specific error. So let’s get a little closer to the matter.

Why exactly 13 digits were agreed upon for ISBN-13 remains speculation. At least the developers weren’t influenced by any superstition. The big secret behind validation is the determination of the residual classes [2]. The algorithms for ISBN-10 and ISBN-13 are quite similar. So let’s start with the older standard, ISBN-10, which is calculated as follows:

1x1 + 2x2 + 3x3 + 4x4 + 5x5 + 6x6 + 7x7 + 8x8 + 9x9 + 10x10 = k modulo 11

Don’t worry, you don’t have to be a SpaceX rocket engineer to understand the formula above. We’ll lift the veil of confusion with a small example for ISBN 3836278340. This results in the following calculation:

(1*3) + (2*8) + (3*3) + (4*6) + (5*2) + (6*7) + (7*8) + (8*3) + (9*4) + (10*0) = 220
220 modulo 11 = 0

The last digit of the ISBN is the check digit. In the example given, this is 0. To obtain this check digit, we multiply each digit by its value. This means that the fourth position is a 6, so we calculate 4 * 6. We repeat this for all positions and add the individual results together. This gives us the amount 220. The 220 is divided by 11 using the remainder operation modulo 11. Since 11 fits exactly 20 times into 220, there is a remainder of zero. The result of 220 modulo 11 is 0 and matches the check digit, which tells us that we have a valid ISBN-10.

However, there is one special feature to note. Sometimes the last digit of the ISBN ends with X. In this case, the X must be replaced with 10.

As you can see, the algorithm is very simple and can easily be implemented using a simple for loop.

boolean success = false;
int[] isbn;
int sum = 0;

for(i=0; i<10; i++) {
    sum += i*isbn[i];
}

if(sum%11 == 0) {
    success = true;
}

To keep the algorithm as simple as possible, each digit of the ISBN-10 number is stored in an integer array. Based on this preparation, it is only necessary to iterate through the array. If the sum check using the modulo 11 then returns 0, everything is fine.

To properly test the function, two test cases are required. The first test checks whether an ISBN is correctly recognized. The second test checks for so-called false positives. This provokes an expected error with an incorrect ISBN. This can be quickly accomplished by changing any digit of a valid ISBN.

Our ISBN-10 validator still has one minor flaw. Digit sequences that are shorter or longer than 10, i.e., do not conform to the expected format, could be rejected beforehand. The reason for this can be seen in the example: The last digit of the ISBN-10 is a 0 – thus, the character result is also 0. If the last digit is forgotten and a check for the correct format isn’t performed, the error won’t be detected. Something that has no effect on the algorithm, but is very helpful as feedback for user input, is to gray out the input field and disable the submit button until the correct ISBN format has been entered.

The algorithm for ISBN-13 is similarly simple.

x₁ + 3x₂ + x₃ + 3x₄ + x₅ + 3x₆ + x₇ + 3x₈ + x₉ + 3x₁₀ + x₁₁ + 3x₁₂ + x₁₃ = k modulo 10

As with ISBN-10, xn represents the numerical value at the corresponding position in the ISBN-13. Here, too, the partial results are summed and divided by a modulo. The main difference is that only the even-numbered positions—positions 2, 4, 6, 8, 10, and 12—are multiplied by 3, and the result is then divided by modulo 10. As an example, we calculate the ISBN-13: 9783836278348.

9 + (3*7) + 8 + (3*3) + 8 + (3*3) + 6 + (3*2) + 7 + (3*8) + 3 + (3*4) + 8 = 130
130 modulo 10 = 0

The algorithm can also be implemented for the ISBN-13 in a simple for loop.

boolean success = false;
int[] isbn;
int sum = 0;

for(i=0; i<13; i++) {
    if(i%2 == 0) {
        sum += 3*isbn[i];
    } else {
        sum += isbn[i];
    }
}

if(sum%10 == 0) {
    success = true;
}

The two code examples for ISBN-10 and ISBN-13 differ primarily in the if condition. The expression i % 2 calculates the modulo value 2 for the respective iteration. If the result is 0 at this point, it means it is an even number. The corresponding value must then be multiplied by 3.

This shows how useful the modulo operation % can be for programming. To keep the implementation as compact as possible, the so-called triple operator can be used instead of the if-else condition. The expression sum += (i%2) ? isbn[i] : 3 * isbn[3] is much more compact, but also more difficult to understand.

Below you will find a fully implemented class for checking the ISBN in the programming languages: Java, PHP, and C#.

Abonnement / Subscription

[English] This content is only available to subscribers.

[Deutsch] Diese Inhalte sind nur für Abonnenten verfügbar.

While the solutions presented in the examples all share the same core approach, they differ in more than just syntactical details. The Java version, for example, offers a more comprehensive variant that distinguishes more generically between ISBN-10 and ISBN-13. This demonstrates that there are many ways to Rome. It also aims to show less experienced developers different approaches and encourage them to make their own adaptations. To simplify understanding, the source code has been enriched with comments. PHP, as an untyped language, eliminates the need to convert strings to numbers. Instead, a RegEx function is used to ensure that the entered characters are type-safe.

Lessons Learned

As you can see, verifying whether an ISBN is correct isn’t rocket science. The topic of validating user input is, of course, much broader. Other examples include credit card numbers. But regular expressions also provide valuable services in this context.

Ressourcen

[1] Ralph-Hardo Schulz, Codierungstheorie: Eine Einführung, 2003, ISBN 978-3-528-16419-5
[2] Concept of modular aritmetic on Wikipedia, https://en.wikipedia.org/wiki/Modular_arithmetic