BugChaser – The limits of test coverage

The paradigms now established in software engineering, such as Test-Driven Development (TDD) and Behavior-Driven Development (BDD), along with correspondingly easy-to-use tools, have opened up a new, pragmatic perspective on the topic of software testing. Automated tests are a crucial factor in commercial software projects. Therefore, in this context, a successful testing strategy is one in which test execution proceeds without human intervention.

Test automation forms the basis for achieving stability and reducing risk in critical tasks. Such critical tasks include, in particular, refactoring, maintenance, and bug fixes. All these activities share one common goal: preventing new errors from creeping into the code.

In his 1972 article “The Humble Programmer,” Edsger W. Dijkstra stated the following:

"Program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence."

Therefore, simply automating test execution is not sufficient to ensure that changes to the codebase do not have unintended effects on existing functionality. For this reason, the quality of the test cases must be evaluated. Proven tools already exist for this purpose.

Before we delve deeper into the topic, let’s first consider what automated testing actually means. This question is quite easy to answer: almost every programming language has a corresponding unit test framework. Unit tests call a method with various parameters and compare the return value with an expected value. If both values match, the test is considered passed. It can also be checked whether an expected exception was thrown.

If a method has no return value and does not throw an exception, it cannot be tested directly. Methods marked as private, as well as inner classes, are also not easily testable, as they cannot be called directly. These must be tested indirectly through public methods that call the ‘hidden’ methods.

When dealing with methods marked as private, accessing and testing the functionality they represent via techniques such as the Reflection API is not an option. We must be aware that such methods are often used to encapsulate code fragments in order to avoid duplication.

public boolean method() {
    boolean success = false;
    List<Integer> collector = new ArrayList<>();
    collector.add(1);
    collector.add(2);
    collector.add(3);

    sortAsc(collector);
    if (collector.get(0).equals(1)) {
        success = true;
    }
    return success;
}

// private helper: encapsulates the sorting to avoid duplicated code
private void sortAsc(List<Integer> collection) {
    collection.sort((a, b) -> a.compareTo(b));
}

Therefore, to write automated tests effectively, it is necessary to follow a certain coding style. The preceding Listing 1 simply demonstrates what testable code can look like.
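A corresponding unit test only needs the public method. Here is a minimal sketch with JUnit 5, assuming the code from Listing 1 lives in a hypothetical class named Example:

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class ExampleTest {

    @Test
    void sortAscViaPublicMethod() {
        // the private sortAsc() is covered indirectly through method()
        assertTrue(new Example().method());
    }
}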

Since developers also write the corresponding component tests for their own implementations, the problem of difficult-to-test code is largely eliminated in projects that follow a test-driven approach. The motivation to test now lies with the developer, as this paradigm allows them to determine whether their implementation behaves as intended. However, we must ask ourselves: Is that all we need to do to develop good and stable software?

As we might expect with such questions, the answer is no. An essential indicator for evaluating the quality of tests is test coverage, with the aim of achieving the highest possible coverage. A distinction is made between branch coverage and line coverage. To illustrate the difference more clearly, let’s briefly look at the pseudocode in Listing 2.

if (Expression-A OR Expression-B) {
    print('allow');
} else {
    print('decline');
}

Our goal is to execute every line of code if possible. To achieve this, we already need two separate test cases: one that enters the IF branch and one that enters the ELSE branch. However, to achieve 100% branch coverage, we must cover all variations of the IF condition. In this example, that means one test that makes Expression A true and another that makes Expression B true. This results in a total of three different test cases.
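Transferred to Java with JUnit 5, a minimal sketch of these three test cases could look as follows; the method allow() is an illustrative stand-in for the pseudocode in Listing 2:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class BranchCoverageTest {

    String allow(boolean expressionA, boolean expressionB) {
        return (expressionA || expressionB) ? "allow" : "decline";
    }

    @Test
    void enterIfBranchViaExpressionA() {
        assertEquals("allow", allow(true, false));
    }

    @Test
    void enterIfBranchViaExpressionB() {
        assertEquals("allow", allow(false, true));
    }

    @Test
    void enterElseBranch() {
        assertEquals("decline", allow(false, false));
    }
}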

The screenshot from the TP-CORE project shows what such test coverage can look like in ‘real-world’ projects.

Of course, this example is very simple, and in real life, there are often constructs where, despite all efforts, it’s impossible to reach all lines or branches. Exceptions from third-party libraries that need to be caught but cannot be triggered under normal circumstances are a typical example.

For this reason, while we strive for the highest possible test coverage and naturally aim for 100%, there are many cases where this is not feasible. A test coverage of 90%, however, is quite achievable; the industry standard for commercial projects is 85%. Based on these observations, we can say that test coverage correlates with the testability of an application and is therefore a suitable measure of it.

However, it must also be acknowledged that the test coverage metric has its limitations. Regular expressions and annotations for data validation are just a few simple examples of where test coverage alone is not a sufficient indicator of quality.

Without going too much into the implementation details, let’s imagine we had to write a regular expression to validate input against the correct 24-hour time format. If we don’t keep the correct interval in mind, our regular expression might be incorrect. The correct interval for the 24-hour format is 00:00 – 23:59; examples of invalid values are 24:00 or 23:60. If we are unaware of this fact, errors can remain hidden in our application despite test cases, only to surface and cause problems when the application is actually used.
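A minimal sketch of such a validation in Java; the pattern respects the 00:00 – 23:59 interval, which a naive pattern like [0-2][0-9]:[0-5][0-9] would not:

import java.util.regex.Pattern;

public class TimeFormat {

    // hours: 00-19 via [01][0-9], 20-23 via 2[0-3]; minutes: 00-59
    private static final Pattern TIME_24H =
            Pattern.compile("([01][0-9]|2[0-3]):[0-5][0-9]");

    public static boolean isValid(String input) {
        return TIME_24H.matcher(input).matches();
    }
}

With this pattern, isValid("23:59") returns true, while isValid("24:00") and isValid("23:60") both return false.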

"… In a few cases, participants were unable to think of alternative solutions …"

The question behind this quote was whether error correction always represents the optimal solution. Beyond that, it would be necessary to clarify what constitutes an optimal solution in commercial software development projects. The observation that there are cases in which developers only know or understand one way of doing things is very illustrative, and it is also reflected in our example of regular expressions (RegEx). Software development is a thought process that cannot be accelerated: our thinking is determined by our imagination, which in turn is shaped by our experience.

This already points us to another source of errors in test cases. A classic example is incorrect comparisons of collections, such as comparing arrays. The underlying issue is how variables are accessed: by value or by reference. An array variable holds a reference, i.e., it points to the memory location. If you assign an array to a new variable and compare both variables, they will always be equal, because you are comparing the array with itself. This is an example of a test case that is essentially meaningless; as long as the implementation is correct, this faulty test case will never cause any problems.
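A minimal sketch of this pitfall in Java:

import java.util.Arrays;

public class ArrayComparison {

    public static void main(String[] args) {
        int[] expected = {1, 2, 3};
        int[] alias = expected;  // copies the reference, not the data

        // always true: both variables point to the same array
        System.out.println(expected == alias);

        // a meaningful test compares against an independent array, element by element
        int[] result = {1, 2, 3};
        System.out.println(Arrays.equals(expected, result)); // true
        System.out.println(expected == result);              // false: different objects
    }
}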

This realization shows us that blindly striving for complete test coverage is not conducive to quality. Of course, it’s understandable that this metric is highly valued by management. However, we have also been able to demonstrate that one cannot rely on it alone. We therefore see that there is also a need for code inspections and refactorings for test cases. Since it’s impossible to read and understand all the code from beginning to end due to time constraints, it’s important to focus on problematic areas. But how can we find these problem areas? A relatively new technique helps us here. The theoretical work on this is already somewhat older; it just took a while for corresponding implementations to become available.

Cryptography – more than just coincidence

In everyday language, we use the word “coincidence” rather unreflectively. Phrases like, “I happened to be passing by here” or “What a coincidence to meet you here” are familiar to everyone. But what do we mean by that? What we’re actually trying to say is that we didn’t expect the current situation.

Coincidence, or chance, is actually a mathematical term that we’ve adopted into everyday language. Chance means something unpredictable, such as the exact location of an electron in an atom at a given moment. While the path I take to reach a particular destination can be arbitrary, preferences can be derived from probabilities, which then make the choice quite predictable.

Circumstances for such a scenario can be distance, personal well-being (time pressure, discomfort, or boredom), or external circumstances (weather: sunshine, rain). If I’m bored AND the sun is shining, I choose an unknown route for a bit of distraction and curiosity. If I’m short on time AND it’s raining, I choose the shortest route I know, or a route that’s as sheltered as possible. This means that the better you know a person’s habits, the more predictable their decisions are. But predictability contradicts the concept of chance.

It’s nothing new that mathematical terms with very strict definitions are adopted into our everyday language as a fad. I’d like to briefly address a very popular example, one already cited by Joseph Weizenbaum: the term chaos. In mathematical terms, chaos describes the fact that a very small change in the initial conditions distorts the result over long distances so significantly that it cannot even be used as an estimate or approximation. A typical application is astronomy: if I point a laser beam from Earth to the Moon, a deviation of just a few millimeters causes the beam to miss the Moon by kilometers. To explain such facts to a broader audience, popular science used the image that a butterfly flapping its wings in Tokyo can cause a storm in Berlin. Unfortunately, there are quite a few pseudoscientists who seize on this image and sell it to their audience as fact. This is, of course, nonsense: the flap of a butterfly’s wings cannot create a storm on the other side of the globe. Just think of the impact this would have on our world, given all the birds that take to the air every day.

“Why did the mathematician’s marriage fail? His wife was unpredictable.”

But why is randomness so important in mathematics? Specifically, it’s about the broad topic of cryptography. If we choose combinations for encryption that are easy to guess, the protection is quickly lost. Here’s a small example.

Web pages are stateless. This means that after a website is accessed and a link is clicked to go to the next page, all information from the previous page is lost. To still be able to provide things like a shopping cart in an online shop and all the other necessary shopping functions, data can be stored on the server in so-called sessions. This data often includes the user’s login. To distinguish between sessions, each one has an identification (ID), and the programmer specifies how this ID is generated. One property of these IDs is that they must be unique; no ID may occur twice.

Now, one might think of using the timestamp, including the milliseconds, to generate a hash. The hash prevents anyone from immediately recognizing that the ID is created from a timestamp. But a patient hacker, with a little diligence, will uncover this secret relatively quickly. Added to that is the probability that two users create a session at the same moment, which would lead to a collision and thus an error.

Now, one might come up with the idea of assembling the SessionID from various segments, such as timestamp + username and other details. Although increasing complexity offers a certain degree of protection, this is not true security: professionals have methods with manageable effort to guess such ‘avoidable’ secrets. The only real protection is the use of cryptographically secure randomness as a segment that cannot be guessed, no matter how much effort is put into it.

Before I reveal how we can address the problem, I would like to briefly discuss the typical attack vector and the damage it causes. If an attacker has guessed the SessionID while the session is still active, he can take over this session in his own browser. This process is called session hijacking. An attacker who has managed to take over an active session is logged into an online service as a foreign user, with a profile that does not belong to him, and can perform all the actions a legitimate user can. It would, for example, be possible to place an order in an online shop and have the goods shipped to a different address. This is a situation that must be prevented at all costs.

There are various strategies used to prevent the theft of an active session. Each of these strategies offers a certain level of protection, but the full strength is only achieved by combining the various options, as hackers are constantly evolving and looking for new opportunities. In this short article, we will only consider the aspect of how to generate a cryptographically secure session ID.

Almost all common programming languages have a random() function that generates a random number. The implementations vary, but unfortunately the generated numbers are not random enough to withstand an attacker: a simple pseudo-random generator can be reproduced once its seed is known. Therefore, developers should always avoid this simple random function. Instead, there are cryptographically secure implementations of random numbers for backend languages such as PHP and Java.

For Java programs, you can use the java.security.SecureRandom class. An important feature of this class is the ability to choose from various cryptographic algorithms [1]. Additionally, the starting value can be specified via the so-called seed. To demonstrate its use, here is a short code snippet:

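A minimal sketch of such a generator, assuming we derive the session ID from 32 cryptographically secure random bytes; the class name SessionIdGenerator is illustrative:

import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class SessionIdGenerator {

    public static String createSessionId() throws NoSuchAlgorithmException {
        // getInstanceStrong() selects a strong algorithm configured for the platform
        SecureRandom random = SecureRandom.getInstanceStrong();
        byte[] token = new byte[32]; // 256 bits of entropy
        random.nextBytes(token);
        // URL-safe Base64 keeps the ID usable in cookies and URLs
        return Base64.getUrlEncoder().withoutPadding().encodeToString(token);
    }
}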

As we can see, its use is quite simple and can be easily adapted. Generating randomness is even easier in PHP: simply call the function random_int($min, $max) [2], passing the lower and upper bounds of the interval as arguments.

Thus, we see that the assumption of many people that our world is largely computable is not entirely true. In many areas of the natural sciences, there are processes we cannot calculate. These, in turn, form the basis for generating ‘true’ randomness. For applications that require very strong protection, dedicated hardware is often used, for example devices that measure the radioactive decay of a low-radiation isotope.

The fields of cryptography and web application security are, of course, much more extensive. This article is intended to draw attention to the necessity of this topic using a fairly simple example. In doing so, I have avoided confusing and ultimately alienating potential interested parties with complicated mathematics.



Pathfinder

So that we can call console programs system-wide without having to specify their full path, we use the so-called path variable. We store the entire path to the executable program, the so-called executable, in this path variable so that we no longer have to type it on the command line. By the way, the file extension exe, common on Windows, derives from the word executable. Here we also have a significant difference between the operating systems Windows and Linux: while Windows decides via the file extension, such as exe or txt, whether a file is plain ASCII text or an executable, Linux uses the file’s meta information to make this distinction. That’s why the file extensions txt and exe are rather unusual under Linux.

Typical use cases for setting the path variable are programming languages such as Java or tools such as the Maven build tool. For example, if we download Maven from the official homepage, we can unpack the program anywhere on our system. On Linux the location could be /opt/maven, and on Microsoft Windows it could be C:\Program Files\Maven. In this installation directory there is a subdirectory /bin in which the executable programs are located. The executable for Maven is called mvn, and to output the version under Linux without the entry in the path variable, the command would be: /opt/maven/bin/mvn -v. That is a bit long, as we can certainly admit. Entering the Maven installation directory in the path shortens the entire command to mvn -v. By the way, this mechanism applies to all programs that we use as a command in the console.

Before I get to how the path variable can be adjusted under Linux and Windows, I would like to introduce another concept: the system variable. System variables are global variables that are available to us in Bash; the path variable is one of them. Another system variable is HOME, which points to the logged-in user’s home directory. System variables are capitalized, and words are separated with an underscore. For our example of entering the Maven executable in the path, we can also set our own system variable. The convention is M2_HOME for Maven and JAVA_HOME for Java. As a best practice, you bind the installation directory to a system variable and then use this self-defined system variable to extend the path. This approach is quite typical for system administrators, who simplify their server installations using system variables, because these variables are global and can also be read by automation scripts.

The command line, loosely referred to as shell, Bash, console, or terminal, offers an easy way to output the value of a system variable with echo. Using the example of the path variable, we can immediately see the difference between Linux and Windows:

Linux:   echo $PATH
Windows: echo %PATH%

ed@local:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin:/home/ed/Programs/maven/bin:/home/ed/.local/share/gem//bin:/home/ed/.local/bin:/usr/share/openjfx/lib

Let’s start with the simplest way to set the path variable. On Linux, we just need to edit the hidden .bashrc file: at the end of the file we add the following lines and save the content.

export M2_HOME="/opt/maven"
export PATH=$PATH:$M2_HOME/bin

We bind the installation directory to the M2_HOME variable. We then extend the path variable with the M2_HOME system variable plus the subdirectory of executable files. This procedure is also common on Windows systems, as it allows the installation path of an application to be found and adjusted more quickly. After modifying the .bashrc file, a new terminal must be opened (or the file reloaded with source ~/.bashrc) for the changes to take effect. This procedure ensures that the entries are not lost even after the computer is restarted.

Under Windows, the challenge is simply to find the input mask where the system variables can be set. In this article, I limit myself to Windows 11; it may of course be that the way to edit the system variables changes in a future update, and there are slight variations between the individual Windows versions. The setting then applies to both CMD and PowerShell. The screenshot below shows how to access the system settings in Windows 11.

To do this, we right-click on an empty area of the desktop and select the System entry. In the System – About submenu we find the system settings, which open the System Properties popup. There we press the Environment Variables button to get the final input mask. After making the appropriate adjustments, the console must also be restarted for the changes to take effect.
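For those who prefer the command line, the same result can be sketched with the built-in setx command; note as a caveat that setx truncates values longer than 1024 characters, so the graphical dialog is the safer route for a long path:

REM bind the installation directory to its own system variable
setx M2_HOME "C:\Program Files\Maven"
REM append the bin directory; %PATH% is expanded immediately when setx runs
setx PATH "%PATH%;C:\Program Files\Maven\bin"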

In this short guide, we learned about the purpose of system variables and how to store them permanently on Linux and Windows. We can quickly check the success of our efforts in the shell by outputting the contents of the variables with echo. And we are now one step closer to becoming an IT professional.


Successful validation of ISBN numbers

Developers are regularly faced with the task of checking user input for accuracy. A considerable number of standardized data formats now exist that make such validation tasks easy to master. The International Standard Book Number, or ISBN for short, is one such data format. It comes in two versions: the ten-digit ISBN-10, used from 1970 to 2006, and the 13-digit ISBN-13, which replaced it in January 2007. Nowadays, many publishers provide both versions for their titles. Books can be uniquely identified using this number, which of course means that these numbers are unique: no two different books have the same ISBN (Figure 1).

The theoretical background for determining whether a sequence of numbers is correct comes from coding theory. Therefore, if you would like to delve deeper into the mathematical background of error-detecting and error-correcting codes, we recommend the book “Coding Theory” by Ralph Hardo Schulz [1]. It teaches, for example, how error correction works on compact disks (CDs). But don’t worry, we’ll reduce the necessary mathematics to a minimum in this short workshop.

The ISBN is an error-detecting code. Therefore, we can’t automatically correct a detected error. We only know that something is wrong, but we don’t know the specific error. So let’s get a little closer to the matter.

Why exactly 13 digits were agreed upon for ISBN-13 remains speculation; at least the developers weren’t influenced by any superstition. The big secret behind the validation is the determination of residue classes [2]. The algorithms for ISBN-10 and ISBN-13 are quite similar. So let’s start with the older standard, ISBN-10, whose check is calculated as follows:

(1·x1 + 2·x2 + 3·x3 + 4·x4 + 5·x5 + 6·x6 + 7·x7 + 8·x8 + 9·x9 + 10·x10) modulo 11 = 0

Don’t worry, you don’t have to be a SpaceX rocket engineer to understand the formula above. We’ll lift the veil of confusion with a small example for ISBN 3836278340. This results in the following calculation:

(1*3) + (2*8) + (3*3) + (4*6) + (5*2) + (6*7) + (7*8) + (8*3) + (9*4) + (10*0) = 220
220 modulo 11 = 0

The last digit of the ISBN is the check digit; in the example given, this is 0. For the check, we multiply each digit by its position: the fourth position is a 6, so we calculate 4 * 6. We repeat this for all ten positions, check digit included, and add the individual results together, which gives us the sum 220. The 220 is then reduced with the remainder operation modulo 11. Since 11 fits exactly 20 times into 220, the remainder is zero. A remainder of zero tells us that we have a valid ISBN-10.

However, there is one special feature to note. Sometimes the last digit of the ISBN ends with X. In this case, the X must be replaced with 10.

As you can see, the algorithm is very simple and can easily be implemented using a simple for loop.

int[] isbn = {3, 8, 3, 6, 2, 7, 8, 3, 4, 0}; // the digits of ISBN 3836278340
boolean success = false;
int sum = 0;

for (int i = 0; i < 10; i++) {
    sum += (i + 1) * isbn[i]; // positions are 1-based, array indices 0-based
}

if (sum % 11 == 0) {
    success = true;
}

To keep the algorithm as simple as possible, each digit of the ISBN-10 is stored in an integer array. Based on this preparation, it is only necessary to iterate over the array. If the sum check using modulo 11 returns 0, everything is fine. Note the weight (i + 1): since array indices start at 0 but ISBN positions at 1, using i itself as the weight would be an off-by-one error.

To properly test the function, two test cases are required. The first test checks whether an ISBN is correctly recognized. The second test checks for so-called false positives. This provokes an expected error with an incorrect ISBN. This can be quickly accomplished by changing any digit of a valid ISBN.
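A minimal sketch of these two test cases with JUnit 5, assuming the loop shown above is wrapped in a hypothetical helper Isbn10Check.isValidIsbn10(int[] digits):

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class Isbn10CheckTest {

    @Test
    void detectValidIsbn10() {
        // the valid ISBN from the example above
        assertTrue(Isbn10Check.isValidIsbn10(new int[]{3, 8, 3, 6, 2, 7, 8, 3, 4, 0}));
    }

    @Test
    void detectInvalidIsbn10() {
        // a single changed digit must break the checksum (false-positive test)
        assertFalse(Isbn10Check.isValidIsbn10(new int[]{3, 8, 3, 6, 2, 7, 8, 3, 4, 1}));
    }
}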

Our ISBN-10 validator still has one minor flaw: digit sequences that are shorter or longer than 10, i.e., that do not conform to the expected format, should be rejected beforehand. The example shows why: the last digit of this ISBN-10 is a 0, so it contributes 0 to the sum. If the last digit is omitted and no format check is performed, the error goes undetected. Something that has no effect on the algorithm, but is very helpful as feedback for user input, is to gray out the input field and disable the submit button until the correct ISBN format has been entered.

The algorithm for ISBN-13 is similarly simple.

(x1 + 3·x2 + x3 + 3·x4 + x5 + 3·x6 + x7 + 3·x8 + x9 + 3·x10 + x11 + 3·x12 + x13) modulo 10 = 0

As with ISBN-10, xn represents the numerical value at the corresponding position in the ISBN-13. Here, too, the partial results are summed, and the sum is then reduced with a modulo operation. The main difference is that only the even-numbered positions (2, 4, 6, 8, 10, and 12) are multiplied by 3, and the result is reduced modulo 10. As an example, we calculate the ISBN-13 9783836278348.

9 + (3*7) + 8 + (3*3) + 8 + (3*3) + 6 + (3*2) + 7 + (3*8) + 3 + (3*4) + 8 = 130
130 modulo 10 = 0

The algorithm can also be implemented for the ISBN-13 in a simple for loop.

int[] isbn = {9, 7, 8, 3, 8, 3, 6, 2, 7, 8, 3, 4, 8}; // the digits of ISBN 9783836278348
boolean success = false;
int sum = 0;

for (int i = 0; i < 13; i++) {
    if (i % 2 == 1) {
        sum += 3 * isbn[i]; // odd index = even position: weight 3
    } else {
        sum += isbn[i];     // even index = odd position: weight 1
    }
}

if (sum % 10 == 0) {
    success = true;
}

The two code examples for ISBN-10 and ISBN-13 differ primarily in the if condition. The expression i % 2 calculates the remainder of dividing the loop index by 2. Note the off-by-one trap once more: the even positions 2, 4, 6, and so on of the ISBN correspond to the odd array indices 1, 3, 5, and so on, so it is the values at odd indices that must be multiplied by 3.

This shows how useful the modulo operation % can be in programming. To keep the implementation as compact as possible, the so-called ternary operator can be used instead of the if-else condition. The expression sum += (i % 2 == 0) ? isbn[i] : 3 * isbn[i]; is much more compact, but also more difficult to understand.

Below you will find a fully implemented class for checking the ISBN in the programming languages: Java, PHP, and C#.

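Here is a minimal Java sketch of such a class; the class and method names are illustrative:

public class IsbnValidator {

    // dispatches by length; expects a digit string without separators
    public boolean isValid(String isbn) {
        if (isbn == null || !isbn.matches("[0-9]{9}[0-9Xx]|[0-9]{13}")) {
            return false;
        }
        return (isbn.length() == 10) ? checkIsbn10(isbn) : checkIsbn13(isbn);
    }

    private boolean checkIsbn10(String isbn) {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            char c = isbn.charAt(i);
            int digit = (c == 'X' || c == 'x') ? 10 : c - '0'; // X stands for the value 10
            sum += (i + 1) * digit;
        }
        return sum % 11 == 0;
    }

    private boolean checkIsbn13(String isbn) {
        int sum = 0;
        for (int i = 0; i < 13; i++) {
            int digit = isbn.charAt(i) - '0';
            sum += (i % 2 == 1) ? 3 * digit : digit; // weight 3 on even positions
        }
        return sum % 10 == 0;
    }
}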

While the solutions presented in the examples all share the same core approach, they differ in more than just syntactic details. The Java version, for example, offers a more comprehensive variant that distinguishes more generically between ISBN-10 and ISBN-13. This demonstrates that many roads lead to Rome; it also aims to show less experienced developers different approaches and encourage them to make their own adaptations. To simplify understanding, the source code has been enriched with comments. PHP, as a dynamically typed language, eliminates the need to convert strings to numbers; instead, a RegEx function ensures that the entered characters are type-safe.

Lessons Learned

As you can see, verifying whether an ISBN is correct isn’t rocket science. The topic of validating user input is, of course, much broader. Other examples include credit card numbers. But regular expressions also provide valuable services in this context.

Resources

  • [1] Ralph-Hardo Schulz, Codierungstheorie: Eine Einführung, 2003, ISBN 978-3-528-16419-5
  • [2] Concept of modular arithmetic on Wikipedia, https://en.wikipedia.org/wiki/Modular_arithmetic

JPoint Moscow 2023

Test Driven: from zero to hero

In the software industry, there is broad agreement that a code base needs sufficient test automation, because this is necessary for a stable DevOps process and safe refactoring. But the reality is completely different: almost every project I joined during my career didn’t have a single line of test code. If we consider that, after more than 40 years, 80% of all commercial software projects still fail, we should not be surprised. But it doesn’t have to be like this. In this talk, I demonstrate how easy it is to introduce a test-driven approach, even in huge projects. The technical setup is a standard Java project with Apache Maven and JUnit 5.

Working with JSON in Java RESTful Services using Jackson

JavaScript Object Notation (JSON) [1] has long since become a lightweight standard that replaces XML for information exchange between heterogeneous systems. Both XML and JSON closed the gap of returning simple and complex data from a remote method invocation (RMI) when different programming languages are involved. Each of these technologies has its own benefits and disadvantages: a well-designed XML document is human-readable but, compared to JSON, needs more payload when sent over the network. For almost every programming language there are plenty of implementations for dealing with XML and JSON, so we don’t need to reinvent the wheel and implement our own solution for handling JSON objects. But choosing the right library is not as easy as it might seem.

The most popular JSON library in Java projects is the one I already mentioned: Jackson [2], because of its huge range of functionality. Another important reason for choosing Jackson over other libraries is that it is also used by the Jersey REST framework [3]. Before we start our journey with the Java frameworks Jersey and Jackson, I would like to share a thought about something I often observe in large projects: don’t mix different implementation libraries for the same technology. The reason is that this is a serious quality and security concern.

The general purpose of using JSON in RESTful applications is to transmit data between a server and a client via HTTP. To achieve that, we need to solve two challenges. First, on the server side, we need to create a valid JSON representation from a Java object that we can send to the client; this process is called serialization. On the client side, we perform the second step, which is exactly the opposite: creating a valid object from a JSON string, which is called deserialization.

In this article, we use Java on both the server side and the client side to deal with JSON objects. But keep in mind that REST allows different programming languages on the server and the client. Java is always a good choice for implementing your business logic on the server, while the client side is often written in JavaScript. PHP, .NET, and other programming languages are possible as well.

In the next step, we look at the project architecture. All artifacts are organized in one Apache Maven multi-module project; following this structure in your own projects is a good recommendation. The three artifacts we create are api, server, and client:

  • API: contains shared objects needed on both the server and the client side, such as domain objects and interfaces.
  • Server: producer of the RESTful service; depends on API.
  • Client: consumer of the RESTful service; depends on API.

Inside these artifacts, a layer architecture is applied. This means that objects in one layer may only access objects in the layers below; in short: from top to bottom. The layers are organized as packages. Not every artifact contains every layer, only those that are implemented. The following picture gives a better understanding of the whole architecture.

The first piece of code I’d like to show contains the Jackson dependencies we need, in the notation for Maven projects.

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>${version}</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>${version}</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${version}</version>
</dependency>

Listing 1

With respect to the size of this article, I only focus on how JSON objects are used in RESTful applications; this is not a full workshop about RESTful (micro)services. As the code base, we reuse my open source GitHub project TP-ACL [4], an access control list. For our example, I sliced the role functionality out of the whole code base.

First of all, we need a Java object that we can serialize into a JSON string. This domain object is the class RolesDO and is located in the domain layer inside the API module. A roles object contains a name, a description, and a flag that indicates whether a role may be deleted.

import java.io.Serializable;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "ROLES")
public class RolesDO implements Serializable {

    private static final long serialVersionUID = 50L;

    @Id
    @Column(name = "NAME")
    private String name;

    @Column(name = "DESCRIPTION")
    private String description;

    @Column(name = "DELETEABLE")
    private boolean deleteable;

    public RolesDO() {
        this.deleteable = true;
    }

    public RolesDO(final String name) {
        this.name = name;
        this.deleteable = true;
    }

    //Getter & Setter
}

Listing 2

So far, so good. As the next step, we serialize the RolesDO into a JSON string in the server module. This happens in the RolesHbmDAO, which lives in the implementation layer of the server module. The opposite direction, deserialization, is implemented in the same class. But slowly, not everything at once: let’s first have a look at the code.

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Root;

public class RolesDAO {

    public transient EntityManager mainEntityManagerFactory;

    public String serializeAsJson(final RolesDO role)
            throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        return mapper.writeValueAsString(role);
    }

    public RolesDO deserializeJsonAsObject(final String json)
            throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        return mapper.readValue(json, RolesDO.class);
    }

    public List<RolesDO> deserializeJsonAsList(final String json)
            throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        // the TypeReference preserves the element type despite type erasure
        return mapper.readValue(json, new TypeReference<List<RolesDO>>() { });
    }

    public List<RolesDO> listProtectedRoles() {
        CriteriaBuilder builder = mainEntityManagerFactory.getCriteriaBuilder();
        CriteriaQuery<RolesDO> query = builder.createQuery(RolesDO.class);
        Root<RolesDO> root = query.from(RolesDO.class);
        // protected roles are the ones that must not be deleted
        query.where(builder.equal(root.get("deleteable"), false));
        query.orderBy(builder.asc(root.get("name")));
        return mainEntityManagerFactory.createQuery(query).getResultList();
    }
}

Listing 3

The implementation is not difficult to understand, but at this point the first question may arise: why is the deserialization in the server module and not in the client module? When the client sends JSON to the server module, we need to transform it into a real Java object. Simple as that.

Usually, the Data Access Object (DAO) pattern contains all functionality for database operations. We skip over these CRUD (create, read, update, and delete) functions here. If you would like to learn more about how the DAO pattern works, you can also check my project TP-CORE [4] on GitHub. So we move on to the REST service implemented in the class RoleService, from which we just grab the function fetchRole().

@Service
public class RoleService {

    @Autowired
    private RolesDAO rolesDAO;

    @GET
    @Path("/{role}")
    @Produces({MediaType.APPLICATION_JSON})
    public Response fetchRole(final @PathParam("role") String roleName) {
        Response response = null;
        try {
            RolesDO role = rolesDAO.find(roleName);
            if (role != null) {
                String json = rolesDAO.serializeAsJson(role);
                response = Response.status(Response.Status.OK)
                        .type(MediaType.APPLICATION_JSON)
                        .entity(json)
                        .encoding("UTF-8")
                        .build();
            } else {
                response = Response.status(Response.Status.NOT_FOUND).build();
            }

        } catch (Exception ex) {
            LOGGER.log("ERROR CODE 500 " + ex.getMessage(), LogLevel.DEBUG);
            response = Response.status(Response.Status.INTERNAL_SERVER_ERROR).build();
        }
        return response;
    }
}

Listing 4

The big secret lies in the lines where we stick things together: first the RolesDO is fetched, and then the DAO calls the serializeAsJson() method with the RolesDO as a parameter. The result is a JSON representation of the RolesDO. If the role exists and no exception occurs, the service is ready for consumption. In case of any problem, the service sends an HTTP error code instead of the JSON.

Complex services that combine single services into a process belong in the orchestration layer. At this point, we can switch to the client module to learn how the JSON string is transformed back into a Java domain object. In the client we don’t have the RolesHbmDAO with its deserializeJsonAsObject() method, and of course we don’t want to create duplicate code, which forbids us from simply copying the function into the client module.

As the counterpart to fetchRole() on the server side, we use getRole() in the client. The purpose of both implementations is identical; the different naming helps to avoid confusion.

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

public class Role {
    private final String API_PATH
            = "/acl/" + Constraints.REST_API_VERSION + "/role";
    private WebTarget target;

    public RolesDO getRole(String role) throws JsonProcessingException {
        Response response = target
                .path(API_PATH).path(role)
                .request()
                .accept(MediaType.APPLICATION_JSON)
                .get(Response.class);
        LOGGER.log("(get) HTTP STATUS CODE: " + response.getStatus(), LogLevel.INFO);

        ObjectMapper mapper = new ObjectMapper();
        return mapper.readValue(response.readEntity(String.class), RolesDO.class);
    }
}

Listing 5

In conclusion, we have now seen that the serialization and deserialization of JSON objects with the Jackson library is not that difficult. In most cases, we just need three methods:

  • serialize a Java object to a JSON String
  • create a Java object from a JSON String
  • de-serialize a list of objects inside a JSON String to a Java object collection

These three methods were already introduced in Listing 3 for the DAO. To prevent duplicate code, we should separate this functionality into its own Java class. This is known as the Wrapper design pattern [5], also known as Adapter. For the best flexibility, I implemented the JacksonJsonTools from TP-CORE as a generic class.

package org.europa.together.application;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

public class JacksonJsonTools<T> {

    public String serializeAsJsonObject(final T object)
            throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        return mapper.writeValueAsString(object);
    }

    public T deserializeJsonAsObject(final String json, final Class<T> type)
            throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        return mapper.readValue(json, type);
    }

    // the TypeReference preserves the generic element type despite type erasure
    public List<T> deserializeJsonAsList(final String json,
            final TypeReference<List<T>> type)
            throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        return mapper.readValue(json, type);
    }
}

Listing 6

This and many more useful implementations with a very stable API can be found in my project TP-CORE, free to use.


jConf Peru 2022

Rolling Stones on stage: release me

Everyone does it, some even several times a day. But few are aware of the complex interlocking mechanisms that make up a complete software release. This is why a package sometimes gets in the way of the automated processing chain.
With a bit of theory and a typical example from the Java universe, I show how you can take a little pressure out of the software development process in order to achieve lean, easily automated processes.

Dealing with standards in your own projects is not a bad thing. A well-defined release process based on common standards increases your productivity. Learn in this talk how to simplify your daily work.

Preventing SQL Injections in Java with JPA and Hibernate

When we look at OWASP’s Top 10 vulnerabilities [1], SQL injections still hold a prominent position. In this short article, we discuss several options for avoiding SQL injections.

When applications deal with databases, there are always high security concerns: if an invader gets the possibility to hijack the database layer of your application, he can choose between several options. Stealing the data of stored users to flood them with spam is not the worst scenario that could happen. Even more problematic is when stored payment information gets abused. Another possibility of an SQL injection attack is to gain illegal access to restricted paid content and/or services. As we can see, there are many reasons to care about (web) application security.

To find well-working preventions against SQL injections, we first need to understand how such an attack works and which points need attention. In short: every user interaction that passes input unfiltered into an SQL query is a possible target for an attack. The input data can be manipulated in such a manner that the submitted SQL query contains different logic than the original. Listing 1 gives a good idea of what is possible.

SELECT Username, Password, Role FROM Users
   WHERE Username = 'John Doe' AND Password = 'S3cr3t';

SELECT Username, Password, Role FROM Users
   WHERE Username = 'John Doe'; --' AND Password = 'S3cr3t';

Listing 1: Simple SQL Injection

The first statement in Listing 1 shows the original query. If the input for the variables Username and Password is not filtered, we have a lack of security. The second query injects for the variable Username a string containing the username John Doe, extended by the characters '; --. This bypasses the password check and, in this case, gives access to the login. The '; sequence closes the WHERE clause, and with -- all following characters are commented out. Theoretically, it is possible to execute any valid SQL code between these two character sequences.

Of course, my plan is not to spread ideas about how SQL commands could cause the worst consequences for a victim. With this simple example, I assume the message is clear: we need to protect every UI input variable in our application against manipulation, even those not used directly for database queries. To detect those variables, it is always a good idea to validate all existing input forms. But modern applications mostly have more than just a few input forms, so keep an eye on your REST endpoints as well; their parameters are often also connected to SQL queries.

For this reason, input validation in general should be part of the security concept. Annotations from the Bean Validation [2] specification are very powerful for this purpose. For example, @NotNull as an annotation on a data field of the domain object ensures that the object can only be persisted if the variable is not null. To use the Bean Validation annotations in your Java project, you just need to include a small library; a sketch of an annotated domain object follows after Listing 2.

<dependency> 
    <groupId>org.hibernate.validator</groupId>
    <artifactId>hibernate-validator</artifactId>
    <version>${version}</version>
</dependency>

Listing 2: Maven Dependency for Bean Validation
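A minimal sketch of how such annotations can look on a domain object; the class and field names are illustrative:

import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

public class Account {

    @NotNull
    @Size(min = 3, max = 255) // rejects empty and oversized user names
    private String username;

    @NotNull
    private String password;

    // Getter & Setter
}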

Perhaps it could be necessary to validate more complex data structures. With regular expressions, you have another powerful tool in your hands. But be careful: it is not that easy to write correctly working RegEx. Let’s have a look at a short example.

public static final String RGB_COLOR = "#[0-9a-fA-F]{3}([0-9a-fA-F]{3})?";

public boolean validate(String content, String regEx) {
    // matches() checks the complete input against the pattern
    return content.matches(regEx);
}

validate("#000", RGB_COLOR);

Listing 3: Validation by Regular Expression in Java

The RegEx to detect the correct RGB color schema is quite simple. Valid inputs are #ffF or #000000. The range for each character is the digits 0-9 and the letters A to F, case-insensitive. When you develop your own RegEx, you always need to check the boundaries very carefully. A good example is the 24-hour time format, where typical mistakes are invalid entries like 23:60 or 24:00. The validate method compares the input string against the RegEx; if the pattern matches the input, the method returns true. If you want more ideas about validators in Java, you can also check my GitHub repository [3].

To summarize: our first idea for securing user input against abuse is to filter out all problematic character sequences, like -- and so on. This intention of creating a blocklist is not that bad, but it still has some limitations. First, the complexity of the application increases, because blocking single characters like --, ;, and ' can sometimes cause unwanted side effects. An application-wide default restriction of characters can also cause problems: imagine a text area for a blog system or something similar.

This means we need another, more powerful concept to filter the input in such a manner that our SQL query cannot be manipulated. To reach this goal, the SQL standard has a great solution we can use: SQL parameters. These are variables inside an SQL query that are interpreted as content and not as part of the statement. This allows large texts without the need to block dangerous characters. Let’s have a look at how this works on a PostgreSQL [4] database.

PREPARE login_query (text) AS
    SELECT * FROM login WHERE name = $1;
EXECUTE login_query ('John Doe');

Listing 4: Defining Parameters in PostgreSQL

In case you are using the OR mapper Hibernate, there is a more elegant way with the Java Persistence API (JPA).

String myUserInput;

@PersistenceContext
public EntityManager mainEntityManagerFactory;

CriteriaBuilder builder =
    mainEntityManagerFactory.getCriteriaBuilder();

CriteriaQuery<DomainObject> query =
    builder.createQuery(DomainObject.class);

// create criteria
Root<DomainObject> root =
    query.from(DomainObject.class);

// criteria SQL parameter
ParameterExpression<String> paramKey =
    builder.parameter(String.class);

query.where(builder.equal(root.get("name"), paramKey));

// wire the query together with the parameter
TypedQuery<DomainObject> result =
    mainEntityManagerFactory.createQuery(query);

result.setParameter(paramKey, myUserInput);
DomainObject entry = result.getSingleResult();

Listing 5: Hibernate JPA SQL Parameter Usage

Listing 5 shows a full example of Hibernate using JPA with the Criteria API. The variable for the user input is declared in the first line; the comments in the listing explain how it works. As you can see, this is no rocket science. The solution has some other nice benefits besides improving web application security: since no plain SQL is used, every database management system supported by Hibernate can be secured by this code.

The usage may look a bit more complex than a simple query, but the benefit for your application is enormous. On the other hand, of course, there are some extra lines of code, but they are not that difficult to understand.

Resources