Test First?

When I started test-driven programming over 10 years ago, I was aware of many different concepts in theory. But this approach of first writing test cases and then implementing them was somehow not the way I got on well with. To be honest, this is still the case today. So I found an adaptation of Kent Beck’s TDD paradigm that works for me. But first things first. Perhaps my approach is also quite helpful for one or the other.

I originally come from environments for highly scalable web applications to which all the great theories from the university cannot be easily applied in practice. The main reason for this is the high complexity of such applications. On the one hand, various additional systems such as in-memory cache, database and identity and access management (IAM) are part of the overall system. On the other hand, many modern frameworks such as OR Mapper hide complexity behind different access layers. As developers, we need to master all of these things. That is why there are robust, and practice proven solutions that are well known but rarely used. Kent Beck is one of the most important voices for the practical use of automated software testing.

If we want to get involved with the concept of TDD, it is important not to put too much weight on every character. Not everything is set in stone. What is important is the result at the end of the day. For this reason, it is essential to keep the objective of all efforts in mind in order to achieve personal added value. So let’s start by looking at what we want to achieve in the first place.

Success proves us right

When I first started out as a developer, I needed constant feedback on whether what I was putting together was really working. I mostly generated this feedback by spreading during my implementation countless console outputs on the one hand and on the other hand I always tried to integrate everything into a user interface and then ‘click through’ manually. Basically a very cumbersome test setup, which then has to be removed again at the end. If later bug fixes had to be made, the whole procedure started all over again. Everything was somehow unsatisfactory and far removed from a productive way of working. Somehow this had to be improved without having to reinvent yourself every time.

Finally, my original approach has exactly two significant weaknesses. The most obvious one is the commenting in and out of debug information via the console.

But the second point is much more serious. Because all the knowledge acquired about this particular implementation is not preserved. It is therefore in danger of fading over time and ultimately being lost. However, such specialized knowledge is extremely valuable for many subsequent process steps in software development. By this I explicitly mean the topic of quality. Refactoring, code reviews, bug fixes and change requests are just some of the possible examples where in-depth detailed knowledge is required.

For me personally, there is also the fact that monotonously repetitive work quickly tires me out and I would like to avoid it. Clicking through an application again and again with the same test procedure is a far away from what constitutes a fulfilling working day for me. I want to discover new things. But I can only do that if I’m not trapped in the past.

Conference Talk

But they dare to do something

But before I go into how I have spiced up my day-to-day development work with TDD, I have to say a few words about responsibility and courage. In conversations others told me frequently that I am right, but they can’t take action to follow my recommendations because the project manager or some other superior doesn’t give a green light.

Such an attitude is extremely unprofessional in my eyes. I don’t ask an marketing manager which algorithm terminate as best. He simply has no idea what I’m talking about, because it is not his area of responsibility. A project manager who speaks out against test-driven work in the development team has also missed his job. Nowadays, test frameworks are so well integrated into the build environment that even inexperienced people can prepare for TDD in a matter of moments. It is therefore not necessary to make a big deal of the project. I can promise that even the first attempts will not take any longer than with the original approach. On the contrary, there will be a noticeable increase in productivity very quickly.

The first stage of evolution

As already mentioned, logging is a central part of test-driven development for me. Whenever it makes sense, I try to output the status of objects or variables on the console. If we use the means provided by the programming language used for this, this means that we must at least comment out this system output after the work has been done and comment it in again later when searching for errors. A redundant and error-prone procedure.

If, on the other hand, we use a logging framework right from the start, we can confidently leave the debug information in the code and deactivate it later in productive operation via the setting log level.

I also use logging as a tracer. This means that each constructor of a class writes a corresponding log entry by the log level info while it is being called. This allows me to see the order in which objects are instantiated. From time to time I have also become aware of the excessively frequent instantiation of a single object. This is helpful for performance and memory optimization measures.

I log errors that are thrown during exception handling as errors or warnings, depending on the context. This is a very helpful tool for tracking down errors later in operation.

So if I have a database access, I write a log output in the log level debug as the associated SQL was assembled. If this SQL leads to an exception because it contains an error, this exception is written with the log level error. If, on the other hand, a simple search query with correct SQL syntax takes place and the result set is empty, this event is classified as either Debug or Warning, depending on requirements. For example, if it is a login request with an incorrect user name or password, I tend to opt for the Log Level Warning, as this may contain security-related aspects during operation.

In the overall context, I tend to configure the logging for the test case execution very loquaciously and limit myself to a pure console output. During operation, the logging information is written to a log file.

The chicken or egg

Once we have laid the foundations for an additional feedback loop with logging, the next step is to decide what to do next. As already mentioned, I find it very difficult to first write a test case and then find a suitable implementation for it. Many other developers who start with TDD also face this problem.

One thing I can already anticipate is the problem of making sure that an implementation is testable. Once I have the test case, I immediately realize whether what I am creating is really testable. Experienced TDD developers have quickly learned in flesh and blood how testable code should look like. The most important point here is that methods should always have a return value that is preferably not null. This can be achieved, for example, by returning an empty list instead of null.

The requirement to have a return value is due to the way unit test frameworks work. A test case compares the return value of a method with an expected value. The test assertion has different characteristics and can therefore be: equal, unequal, true or false. Of course, there are also different variations here. For example, it may be possible to test methods that have no return value by using exceptions. All these details become clear in a very short time during using TDD. So that everyone can get started immediately without lengthy preparations.

When reading the book Test Driven Development by Example by Kent Beck, we also quickly find an explanation as to why the test cases should be written first. It is a psychological factor. It should help us to cope better with the usual stress that arises in the project. It creates a mental state in us about the status and progress of the current work. It guides us in an iterative process to expand and improve the existing solution step by step via the various test cases.

For those who, like me, have no concrete idea of the final result at the start of an implementation, this approach is difficult to implement. The intended effect of relaxation turns into a negative one. As we humans are all different, we have to find out what makes us tick in order to achieve the best possible result. It’s the same with learning strategies. Some people process information better visually, others more haptically and still others extract everything important from spoken words. So let’s try not to bend ourselves against our nature in order to produce mediocre or poor results.

Drawing the first line

A topic only becomes clear to me while I’m working on it. So I try my hand at an implementation until I need some initial feedback. That’s when I write the first test. This approach automatically gives rise to questions, each of which is worth its own test case. Can I find all available results? What happens if the result set is empty? How can the result set be narrowed down? These are all points that can be noted on a piece of paper and ticked off step by step. I had the idea of writing down a to-do list on a piece of paper a long time before I rode about it in the book by Kent Beck mentioned above. It helps me to preserve quick thoughts without being distracted from what I am currently doing. It also gives me a sense of accomplishment at the end of the day.

Since I don’t wait until I’ve implemented everything to write the first test, this approach also results in an iterative approach. I also notice very quickly if my design is not sufficiently testable, as I receive immediate feedback. This results in my own interpretation of TDD, which is characterized by the permanent change between implementing and writing tests.

As a result of my early TDD attempts, I already noticed a speeding up of my working methods in the first week. I also became more confident. But the way I program also started to change very early on. I have noticed that my code has become more compact and robust. Things that had only become apparent over time emerged during activities such as refactoring and extensions. Failed test cases have saved me from unpleasant surprises.

Start without overzealousness

If we decide to use TDD in an existing project, it is a bad idea to start writing test cases for existing functionality. Apart from the time that has to be planned for this, the result will not fulfill the high expectations.

One of the problems is that you now have to familiarize yourself with each functionality and this is very time-consuming. The quality of the resulting test cases is also inadequate. The problem also arises from missing experience. When the experience is first built up, the quality of the test cases is also not quite optimal and code may also have to be rewritten to make it testable. This creates a lot of risks that are problematic for day-to-day project business.

A proven procedure for introducing TDD is simply to use it for the current implementation you are currently working on. The current state of the current problem is documented by automated tests. Since you are already in familiar territory, you do not have to familiarize yourself with a new topic, so you can concentrate fully on formulating meaningful tests. Apart from the fact that you take responsibility for other people’s work without being asked when you implement test cases for them.

Existing functionality is only supplemented with test cases when errors are corrected. For the correction, you have to deal with the implementation details anyway, so that there is sufficient knowledge here of how a functionality should behave. The resulting tests also document the correction and ensure that the behavior does not change in the future during optimization work.

If you follow this procedure in a disciplined manner, you will not lose yourself in so-called hectic activity, which in turn is the opposite of productivity. In addition, you quickly acquire knowledge of how effective and meaningful tests can be implemented. Only when sufficient experience has been gained and possibly extensive refactoring are planned you can consider how test coverage can be gradually improved for the entire project.

Quality level

Just because test cases are available does not mean that they are meaningful. Nor does a high test coverage prove that a program is error-free. A high test coverage only ensures that a program behaves within the scope of the tests.

So how can you ensure that the existing tests are really an enrichment and have good informative value? The first and, in my opinion, most important point is to keep test cases as short as possible. In concrete terms, this means that a test only answers one explicit question, e.g. What happens if the result set is empty? The test method is then named according to the question. The added value of this approach arises when the test case fails. If the test is very short, it is often possible to get to know from the test method what the problem is without having to spend a lot of time familiarizing yourself with a test case.

Another important point in the TDD procedure is to check the test coverage for lines of code as well as for branches for my implemented functionality. If, for example, I cannot simulate the occurrence of a single condition in an IF statement, this condition can be deleted without hesitation.

Of course, you also have enough dependencies on external libraries in your own project. Now it can happen that a method from this library throws an exception that cannot be simulated by any test case. This is exactly the reason why you should strive for high test coverage but not despair if 100% cannot be achieved. Especially when introducing TDD, a good measure of test coverage greater than 85% is common. As the development team gains experience, this value can be increased up to 95%.

Finally, however, it should be noted that you should not get too carried away. Because it can quickly become excessive and then all the advantages gained are quickly lost. The point is that you don’t write tests that in turn test tests. This is where the cat bites its own tail. This also applies to third-party libraries. No tests are written for these either. Kent Beck is very clear about this: “Even if there are good reasons to distrust other people’s code, don’t test it. External code requires more of your own implementation logic”.

Lessons learned

The lessons that can be learned when trying to achieve the highest possible test coverage are the ones that will have an impact on future programming. The code becomes more compact and robust.

Productivity increases simply due to the fact that error-prone and monotonous work is avoided through automation. There are no additional work steps because old habits are replaced by newer, better ones.

One effect that I have observed time and again is that when individual members of the team have opted for TDD, their successes are quickly recognized. Within a few weeks, the entire team had developed TDD. Each individual according to their own abilities. Some with Test First, others as I have just described. In the end, it’s the result that counts and it was uniformly excellent. When the work is easier and at the end of the day each individual has the feeling that they have also achieved something, this gives the team an enormous motivation boost, which gives the project and the working atmosphere a huge boost. So what are you waiting for? Try it out for yourself right away.

Acceptance Tests in Java With JGiven

In a standard set-up for Java projects like NetBeans, Maven, and JUnit, it is not...

The spectre of artificial intelligence

The hype surrounding artificial intelligence has been going on for several years. Currently, companies like OpenAI are causing quite a stir with freely accessible neural networks like ChatGPT. Users are fascinated by the possibilities and some intellectual figures of our time are warning humanity about artificial intelligence. So what is it about the specter of AI? In this article, I explore this question and you are invited to join me on this journey. Let’s go and follow me into the future.

In the spring of 2023, reports about the performance capabilities of artificial neural networks overflowed. This trend is continuing and, in my opinion, will not abate any time soon. In the midst of the emerging gold rush mood, however, there are also isolated bad news doing the rounds. For example, Microsoft announced a massive investment in artificial intelligence on a grand scale. This announcement was underlined in the spring of 2023 with the dismissal of just under 1000 employees and gave rise to familiar fears of industrialization and automation. Things were less spectacular at Digital Ocean, which laid off its entire content creation and documentation team. Quickly, some people rightly asked whether AI would now make professions like programmers, translators, journalists, editors and so on obsolete? For now, I would like to answer this question with a no. In the medium term, however, changes will occur, as history has already taught us. Something old passes away while new things come into being. So follow me on a little historical excursion.

To do this, we first look at the various stages of industrialization, which originated in England in the second half of the 18th century. Already the meaning of the original Latin term Industria, which can be translated with diligence, is extremely interesting. Which leads us to Norbert Wiener and his 1960 book ern God and Golem Inc [1]. He publicly pondered whether people who create machines that in turn can create machines are gods. Something I do not want to subscribe from my feeling. But let’s come back to industrialization for the time being.

The introduction of the steam engine and the use of location-independent energy sources such as coal enabled precise mass production. With cheaper automation of production by machines, manual home workplaces were displaced. In exchange, cheaper products were now available in stores. But there were also significant changes in transportation. The railroad allowed for faster, more comfortable and cheaper travel. This catapulted mankind into a globalized world. Because goods could now also travel long distances in a short time without any problems. Today, when we look back at the discussions that took place when the railroad began its triumphal march, we can only smile. After all, some intellectuals of the time argued that speeds in a train of more than 30 kilometers per hour would literally crush the human occupants. A fear that fortunately turned out to be unfounded.

While people in the first industrial revolution could no longer earn an income from working at home, they found an alternative to continue earning a living by working in a factory.

The second industrial revolution is characterized by electrification, which further increased the degree of automation. Machines became less cumbersome and more precise. But new inventions also entered daily life. Fax, telephone and radio spread information at a rapid pace. This brought us into the Information Age and accelerated not only our communication, but also our lives. We created a society that is primarily characterized by the saying “time is money”.

The third industrial revolution blessed mankind with a universal machine, which determined its functionality by the programs (software) running on it. Nowadays, computers support us in a wide range of activities. Modern cash register systems do much more than just spit out the total amount of the purchase made. They log money and flow of goods and allow evaluations for optimization with the collected data. This is a new quality of automation that we have achieved in the last 200 years. With the widespread availability of artificial neural networks, we are now on our way out of this phase, which is why we are currently in the transformation to the fourth industrial revolution. How else do we as humans intend to cope with the constantly growing flood of information?

Even though Industry 4.0 focuses on the networking of machines, this is not a real revolution. The Internet is only a consequence of the previous development to enable communication between machines. We can compare this with the replacement of the steam engine by electric motors. The real innovation was in electric machines that changed our communication. This is now happening in our time through the broad field of artificial intelligence.

In the near future, we will no longer use computers the way we have been doing. That’s because today’s computers owe their existence to the previously limited communication between humans and machines. The keyboard and mouse are actually clumsy input devices. They are slow and prone to error. Voice and gesture control via microphone and camera will replace mouse and keyboard. We will talk to our computers the way we talk to other people. This also means that today’s computer programs will become obsolete. We will no longer have to fill out tedious input masks in graphical user interfaces in order to reach our goal. Gone are the days where I type my articles. I will type them in and my computer will visually display them for me to proofread. Presumably, the profession of speech therapist will then experience a significant upswing.

There will certainly also be enough outcries from people who fear the disintegration of human communication. This fear is not at all unfounded. Let’s just look at the development of the German language in the period since the turn of the millennium. This was marked by the emergence of various text messaging services and the optimization of messages by using as many abbreviations as possible. This in turn only created question marks on the foreheads of parents when it came to deciphering the content of their children’s messages. Even though the current trend is away from text messages to audio messages, it does not mean that our language will not continue to change. I myself have observed for years that many people are no longer able to express themselves correctly in writing or to extract content from written texts. In the long run, this could lead to the unlearning of skills such as reading and writing. Thus also classical print articles such as books and magazines become obsolete. Finally, content can also be produced as video or podcast. Our intellectual abilities will degenerate in the long run.

Since the turn of the millennium, it has become easier and easier for many people to use computers. So first the good news. It will become much easier to use computers in the future because human-machine interaction is becoming more intuitive. In the meantime, we will see more and more major Internet portals shutting down their services because their business model is no longer viable. Here’s a quick example.

As a programmer, I often use the website StackOverflow to find help with problems. The information on this website about programming issues is now so extensive that you can quickly find suitable solutions to your own concerns by searching Google and the like, without having to formulate questions yourself. So far so good. But if you now integrate a neural network like ChatGPT into your programming environment to find the answer to all questions, the number of visitors for StackOverflow will continuously decrease. This in turn has an impact on advertising campaigns to be able to offer the service free of charge on the net. Initially, this will be compensated by the fact that operators of AI systems that access the data from StackOverflow will pay a flat fee for the use of the database. However, this will not stop the dwindling number of visitors. Which will lead to either a payment barrier preventing free use or the service being discontinued completely. There are many offers on the Internet that will encounter similar problems, which will ensure in the long term that the Internet as we know it has disappeared in the future.

Let’s imagine what a future search query for the search term ‘industrial revolution’ might look like. I ask my digital assistant: What do you know about industrial revolution? – Instead of searching through a seemingly endless list of thousands of entries for relevant results, I am read a short explanation with a personalized address that matches my age and level of education. Which immediately raises the question of who is judging my level of education and how?

This is a further downgrading of our abilities. Even if it is perceived as very comfortable in the first moment. If we no longer have the need to focus our attention on one specific thing over a long period of time, it will certainly be difficult for us to think up new things in the future. Our creativity will be reduced to an absolute minimum.

It will also change the way data is stored in the future. Complicated structures that are optimized and stored in databases will be the exception rather than the rule. Rather, I expect independent chunks of data that are concatenated like lists. Let’s look at this together to get a good idea of what I mean.

As a starting point, let’s take Aldous Huxley’s book ‘Brave New World’ from 1932. In addition to the title, the author and the year of publication, we can add English as the language to the meta information. This is then followed by the entire contents of the book including preface and epilogue as plain ASCII text. Generic or changeable things like table of contents or copyright are not included at this stage. With such a chunk, we have defined an atomic datum that can be uniquely identified by a hash value. Since Huxley’s Brave New World was originally written in English, this datum is also an immutable source for all data derived and generated from it.

If the work of Huxley is now translated into German or Spanish, it is the first derivation with the reference to the original. It can happen that books have been translated by different translators in different epochs. This results in a different reference hash for the German translation by Herbert E. Herlitschka from 1933 with the title ‘Brave New World’ than for the translation by Eva Walch published in 1978 with the same title ‘Brave New World’.

If audio books are now produced from the various texts, these audio books are the second derivative of the original text, since they represent an abridged version. A text is also created as an independent version before the recording. The audio track created from the abridged original text has the director as its author and refers to the speaker(s). As in theater, a text can be interpreted and staged by different people. Film adaptations can be treated in the same way.

Books, audio books and films in turn have graphics for the cover. These graphics again represent independent works, which are referenced with the corresponding version of the original.

Quotations from books can also be linked in this way. Similarly, critiques, interpretations, reviews and all kinds of other variations of content that refer to an original.

However, such data blocks are not only limited to books, but can also be applied to music scores, lyrics, etc. The decisive factor is that one can start from the original as far as possible. The resulting files are optimized exclusively for software programs, since they do not contain any formatting that is visible to the human eye. Finally, the corresponding hash value about the content of the file is sufficient as file name.

This is where the vision of the future begins. As authors of our work, we can now use artificial intelligence to automatically create translations, illustrations, audio books and animations even from a book. At this point, I would like to briefly refer to the neural network DeepL [2], which already delivers impressive translations and even improves the original text if handled skillfully. Does DeepL now put translators and editors out of work? I mean no! Because also like us humans, artificial intelligences are not infallible. They also make mistakes. That’s why I think that the price for these jobs will drop dramatically in the future, because these people can now do many times more work than before, thanks to their knowledge and excellent tools. This makes the individual service considerably cheaper, but because more individual services are possible through automation in the same period of time, this compensates for the price reduction for the provider.

If we now look at the new possibilities that are open to us, it doesn’t seem to be so problematic for us. So what are people like Elon Musk trying to warn us about?

If we now assume that the entire human knowledge will be digitized by the fourth industrial revolution and that all new knowledge will only be created in digital form, computer algorithms will be free to use suitable computing power to change these chunks of knowledge in such a way that we humans will not notice. A scenario loosely based on Orwell’s Ministry of Truth from the novel 1984. If we unlearn our abilities out of convenience, we also have few possibilities of verification.

If you think this would not be a problem, I would like to refer you to the lecture “Trust no scan” by David Kriesel [3].What happened? In short, it was about the fact that a construction company noticed discrepancies in copies of their construction plans. This resulted in different copies of the same original, in which the numerical values were changed. A very fatal problem in a construction project for the executing trades. If the bricklayer gets different size data than the concrete formers. The error was finally traced back to the fact that Xerox used an AI as software in their scanners for the OCR and the subsequent compression, which could not reliably recognize the characters read in.

But also the quote from Ted Chiang “Think of ChatGPT as a blurry jpeg of all the text on the Web.” should make us think. Certainly, for people who only know AI as an application, the meaning is hard to understand what is meant by saying “ChatGPT is just a blurry jpeg of all the text on the web”. However, it is not as difficult to understand as it seems at the first moment. Due to their structure, neural networks are always only a snapshot. Because with every input the internal state of a neural network changes. It is the same as with us humans. After all, we are only the sum of our experiences. If in the future more and more texts created by an AI are placed on the web without being reflected, the AI will form its knowledge from its own derivations. The originals fade with the time there they by ever smaller references in weighting lose. If someone would flood the Internet with topics like flat earth and lizard people, programs like ChatGPT would inevitably react to it and let this flow into their texts. These texts could be published then either independently by the AI in the net automated or find their spreading by unreflective persons accordingly. We have thus created a spiral that can only be broken if people have not given up their ability to exercise judgment out of convenience.

So we see that the warnings for caution in dealing with AI are not unfounded. Even if I consider scenarios like in the movie WarGames from 1983 [4] as improbable, we should consider very well how far we want to go with the technology of AI. Not that it happens to us like the sorcerer’s apprentice and we have to find out that we cannot master it any more.

References

Links are only visible for logged in users.

Expressions for Source Control Management Systems

Abstract: In the last decades, many standards were established to increase productivity during Software Lifecycle Management. All these techniques and methodologies promise a higher success rate in software projects which could affirm themselves in the case the involved protagonists are willing to follow the instances recommended. Semantic Versioning, for example, addresses the information leak between functional changes, BugFixes and compatibility of existing and future releases of artifacts. Diving deeper into the daily craftsmanship of software projects enables us to identify the Source Control Management Systems (SCM) as a big treasure box. Much information can be extracted from these repositories, which are currently ignored for project analyzing. Expressions on SCM Commit Messages represent a new formalism that is both human-readable and machine-processable. Such a standard also forms a bridge between the code base and the requirements management and release management, since these activities are identified by a freely expandable vocabulary in the SCM. Another advantage of this strategy is the clear and compact expressiveness for development teams. A very practical aspect of my proposal is the easy applicability of the presented solution in real software development projects. As with the Semantic Versioning methodology already mentioned, there are no additional technical requirements to be met, since commit messages are a fundamental function of SCM systems. This paper discuss the option to improve data collection for controlling software projects and knowledge sharing in collaborative teams.

To cite this article: Marco Schulz. Expressions for Source Control Management Systems. American Journal of Software Engineering and Applications. Vol. 11, No. 2, 2022, pp. 22-30. doi: 10.11648/j.ajsea.20221102.11

Download the PDF: https://www.sciencepg.com/journal/paperinfo?journalid=137&doi=10.11648/j.ajsea.20221102.11

1. Introduction

Thinking about SCM systems we have to keep in mind, that since the first roll out of CVS in the early 1990‘s and today, many things have changed. Searching the free online encyclopedia Wikipedia, presents a page ”Comparison of Version Control Software” which contains an overview of version control software of more than 30 SCM tools. This gives an idea why software companies usually have around three or more different SCM systems in work – of course the real amount depends on how many years they are in business.

The possibility to attach every revision in SCM Systems with a commit message allows the developer to inform other users with a short explanation of his work. This feature is extremely helpful by browsing the history manually in search of special code changes. If these commit messages well structured there exist a possibility to grab automated information of project growth. In this paper on expressions is introduced as solution for structured commit messages which could processed by software and also helps developers to resume their work more efficient.

The list of research on SCM is quite overwhelming and covers multiple aspects. The work of Walter F. Tichy on RCS [2] presents a deep fundamental insight into technical aspects of SCM systems. Abdullah Uz Tansel et al. gives in his research a brief history and builds a bridge to nowadays SCM systems [11]. The paper of Christian Bird et al. describes the ideas why companies deal with various SCM solutions [12]. Many existing papers like the one from Filip Van Rysselberghe and Serge Demeyer already identified SCM repositories as a significant information storage [5], which contains more than a simple history of source code. The approach from Louis Glassy to observe the growth of students in the software development process by using SCM techniques [6] demonstrates another method to grab implicit information from SCM. Alongside the fundamental research in software engineering, there exists a great resource of Blogs, articles and books from people who are directly involved in the topic. They describe experiences and best practice to make the next release come true, as referred towards the web resources in the footnotes. A small selection of related practitioners books is also included in the reference list.

Let us take a closer look at how processes for SCM could be improved. For this reason, section II defines the terminology of this paper and talks in detail about merging and branching strategies. Section III remind some basic knowledge on SCM and gives a simple idea about how complex build and deploy pipelines interact. Following this quick journey, section IV draws a picture about real problems that occur in software development projects and explains possible Points of Interest (POI) inside an SCM repository. These fundamentals allow a definition of the vocabulary we introduce in section V. A real world example will demonstrate in VI the cardinality of the expression and gives ideas about its usage. After all, section VII will reflect and summarize these thoughts. The last section talks about ideas how future work could be continued.

Figure 1: Branch and Merge.

The definitions in this section are based on the English dictionary Merriam Webster with a contextual relation to SCM systems. The term Source Control Management System (SCM) is applied in this paper to describe tools like CVS, Subversion (SVN) or Git. Many other names have appeared over the years in literature for this type of tools. All these terms like Version Control System (VCS) or Revision Control System (RCS) are considered as equal to each other.

Artifact “A USUALLY SIMPLE OBJECT (SUCH AS A TOOL OR ORNAMENT) SHOWING HUMAN WORKMANSHIP OR MODIFICATION AS DISTINGUISHED FROM A NATURAL. OBJECT; “ESPECIALLY: AN OBJECT REMAINING FROM A PARTICULAR PERIOD”. In the context of SCM, an artifact is a binary result of the build process. Artifacts can be libraries, applications and so on.

Repository “A PLACE, ROOM, OR CONTAINER WHERE IS DEPOSITED OR STORED”. In software engineering a repository denotes a managed storage. We can distinguish repositories for source code and for binary artifacts.

Revision “A CHANGE OR A SET OF CHANGES THAT CORRECTS OR IMPROVES SOMETHING”. Each successful commit from a user to the SCM represents a change of the internal state in the SCM. These different states are revisions. Subversion for example increments an internal number after each commit [18]. This unique identifier is called revision number. Git on the other hand manages the revision number smarter and creates SHA-1 Hashes from each commit as an identifier [15]. This brings more flexibility for dealing with branches.

Release “TO GIVE PERMISSION FOR PUBLICATION, PERFORMANCE, EXHIBITION, OR SALE OF; ALSO: TO MAKE AVAILABLE TO THE PUBLIC”. A release defines a set of functional assertions for an artifact. When all functions are implemented, a test procedure is started to exclude as many failures as possible. After the termination of testing and corrections, the artifact gets packed for delivery. To distinguish the different versions of an artifact, it gets labeled by a unique version number. By convention, it is not allowed to have more than one artifact with the same version number.

Tag “A DESCRIPTIVE OR IDENTIFYING EPITHET”. -A Tag is a label to a special revision, like a release, and is used as bookmark.

Trunk “THE CENTRAL PART OF ANYTHING”. A trunk is a common convention and means the main branch, where the current development happens [17]. In Git this branch is called master for the local repository and orgin in the remote repository. Branching and Merging is one of the major feature in SCM systems and also a high sophisticated operation. It is not so unusual that developers and also Configuration Managers struggle with this. The paper of Shaun Phillips et al. contains a developer comment about the dealing with SCM and the pain of merging [10].

“We are a team of four senior developers (by which I mean we’re all over 40 with 20+ years each of development experience) and not one of us has had a positive experience in the past with branching the mainline… The branch is easy – it’s the merge at the end that’s painful.”

This shows that even persons with many years of experience need a detailed explanation of a seemingly trivial procedure. A simple understanding how branches typically have to be used and how they represent the evolution of a real software project is of high relevance for this paper. Figure 1 explains the optimal interaction between branches and the trunk which is described by Chuck Walrad and Darrel Strom as Branch by Release Model [3]. In addition to the context of branching and merging there is a version tree sample graph explained by Yongchang Ren et al. in their paper [8].

In order to give a comprehensive explanation of the process we assume a simple Java library project. As build tool Apache Maven is chosen which is successfully used for years by many different commercial and Open Source projects. Maven defines many standards for the software development process and implements them. Its success feature is a highly efficient dependency management.

The information about the artifact version number is managed in the pom.xml, the Maven build file. For this reason the POM has our special attention. In the context of Maven a versions number is labeled SNAPSHOT while it is still under development. This convention allows in collaborative teams the sharing of non official published artifacts. After removing the label SNAPSHOT the artifact is released. By convention it is not possible to have more than one artifact with the same version number. In section III this topic is discussed in more detail. For the moment it is necessary to know that this convention takes effect in collaborative processes. The correct way to share artifacts is the usage of a Repository Manager. The most common Repository Manager is Sonatype Nexus OSS which is used for Maven Central [19] to deliver dependencies. Nexus will refuse the request if a developer tries to publish an already existing release of an artifact. With this infrastructure it is not necessary to transfer binary artifacts to the SCM. This tool chain is a simple example for a highly complex infrastructure to build and deliver software in large companies.

In figure 1 the development starts with version 1.0-SNAPSHOT. After the release of this version, the development of the next version 1.1-SNAPSHOT continues in trunk. The revision of the released version 1.0 gets branched to fix some bugs. The branch will not be created automatically during the release, rather it gets created when there is a need, for example BugFixes. The branch will be named by its minor version 1.0 to stay flexible for further corrections. After a correct BugFix the changes get merged back to trunk and so on. It is very important to keep in mind, that after a release, no new functionality can be added to the versions 1.0.X, only corrections are allowed.

The merging of failure corrections can lead to complications if there already exist deployed versions. When a bug is detected down to an existing version it will be necessary to fix all following versions and increment their version number as part of the correction. For example if there exist released versions 1.0.2, 1.1.1, 1.2.3 & 2.0.1. and the fix has been done in version 1.0.2 it will have to be renamed 1.0.3 for release. The merge direction is always from the lower to the higher version which means that the version numbers of all following involved artifacts have to be increased. By this it can be assured that only fixes will be exchanged and no functionality is moving form an higher to a lower version within the merging process.

In this model the case of parallel feature development is missing. This happens when a very complex functionality is planned and the implementation cannot be finished in one release cycle. This especially often occurs in agile projects with a short time line between releases. Feature Branches address this requirement as well. The process is a simple extension of the Branch by Release Model. The Feature Branch will be created from the trunk and will be named like the feature. To test compatibility this branch at least needs to be merged from the trunk after each release. A merge can also be performed if the trunk provides important new features – whenever necessary.

A very useful advanced usage of branches is the stash command, that comes as build-in with Git. Indeed this feature is not so common but simple and powerful. Imagine a developer is working on some implementation with the urgency of having to deliver a BugFix for another release. He needs to switch his workspace to this branch but the current work needs to be saved without a direct commit to the trunk. The solution is create a branch and check in the current work and hence switch the branch for the fix. After all is done he will have to switch to the stashed branch, finish the work and merge the result to the trunk. An often observed procedure for developers are simultaneous checkouts of different branches and just switching the IDE workspace. By experience in large companies, this is very time consuming and error prone. By the law of Murphy, the only needed branch is the one not present in a local checkout collection.

To get in touch with branch models more profoundly, the website of the Git SCM [20] presents different branching workflows. Also at [21] exists a very detailed explanation for Git branch and merge best practices.

3. Quick Survey on SCM Basics

As described, there exists a huge amount of Source Control Management solutions. Even just picking out the most popular systems, we are able to identify many differences in detail. These may be the reasons why some tools have become more popular than others. Naturally, all of these systems do the job and are based on common ideas. A very early and fundamental work on SCM systems done by Tichy gives a deep insight about the Theory on how an SCM should be constructed [2]. Today, based on the approach of how things are done, we can classify them. Directory and file based systems, like Microsoft Visual Source Safe, are part of the less effective group of SCM. In commercial environments this group has low relevance because quite often it causes inconsistencies of the repository. This leads us to the category of Client-Server solutions. Client-Server SCM systems have two manifestations: centralizedand distributed. SVN is the most famous representative for centralized solutions. In new projects the choice of the day will very often be Git, a very popular distributed SCM tool. In “Transition from Centralized to Decentralized Version Control Systems” the authors describes why decentralized SCM systems are favored by developers [12]. Interviews of developers have shown the benefits and risks of applicated SCM systems. They deliver a well elaborated explanation why distributed SCM has a higher learning curve. This finding is a important principle for dealing with SCM.

SCM systems are designed to handle plain text files, like those used for source code. After a file has undergone configuration management and had an initial transfer into the repository, the system stores only a delta of the changes for every new transaction. With this requirement the repository is more efficient and needs less disk storage. This implies binary files like office documents should not be stored in SCM repositories because the system cannot calculate a delta and will always store a complete new copy of the file, if it has been changed. A solution for dealing with binaries, like dependencies or third party libraries, are Repository Managers which were introduced in section II.

Figure 2: Changes in the POM, based on Semantic Versioning.

At this point some performance issues for SCM have to be taken in consideration. This is of outstanding importance, because it defines how a repository should be organized. Large projects with a code repository up to 1 GB take a long time for a checkout, even though there is only a small subset of files that are chosen. 20 minutes and more are very common. The reason for this effect is the size of the repository itself. When it contains a lot of files it takes more time to calculate the internal tree. The best solution for a high performance repository is: Only text files and just one independent project or module per repository.

In continuation surges question how files are represented in a SCM. As an example we remember the small Java library project with the Maven build logic. The build logic is represented as an XML file and contains the entry <version>. This entry defines the version number of the artifact and starts with an initialization of 1.0.0-SNAPSHOT. The procedure to increase the version number strictly follows the Semantic Versioning. Figure 2 visualizes several steps between two releases. For each revision a label describes the process and the version number show the value in the POM file. This graphic is an extension with a detailed view of figure 1.

In reality things are never like explained in theory. Initial assumption often create a big dilemma in automation processes when it comes to execution. It is very easy to claim, that in a repository, the entry for version in the POM for releases is unique. For example, it means that there should not exist two revisions with a released version 1.0. But where humans work, mistakes will happen. For this reason we have the option to create tags into the SCM. Every revision in the SCM which represents a deployed release, will be tagged with the correct version number. Deployed releases are defined by a successful transfer of the binary artifact into the Repository Manager for collaborative usage.

4. Scenarios on Real Problems

We should focus our activities on special points in respect to the evolution of software projects. It is not useful to pay attention on each single revision. Let us highlight the Points of Interest (POI) and why they are special. In real projects with collaborative teams, it is quite common that a developer breaks the current build. The good news are: when Continuous Integration (CI) is applied in the process, these kind of problems will be detected very quickly and can be solved at the instance of them appearing [16]. But how a developer is able to break a build? This occurs when the changes get committed into the repository and some files are not included in the commit. A repair can easily and fast be done by adding a new commit with the missing files needed. In this case it is very important to realize that only the one who delivered an incomplete package is able to add the missing parts. Problems arise when this happens on a Friday evening and the person responsible is leaving the office for vacations the next two or tree weeks without checking that everything is in order, causing unnecessary pain in the continuation of the project. These things happen much more often than anyone would expect.

Another effect is called fast shots. These small and often repeated commits typically change only a few lines in just one or two files. This happens when a user for some reason is not able to test his code or settings locally on his own machine. A simple scenario could be the manipulation of the CI Server build output without direct access.

A work flow for developers is the usage of particular commits in order to preserve intermediate steps of the work and allow an easy rollback. This procedure is only applicable in distributed systems or in environments without collaboration. The effect is quit similar. It will produce many revisions inside the SCM, which could get summarized to a single revision.

The Continuous Delivery approach for modern Web Applications is a quite different method compared to the classical release process [14]. This technique requires special strategies like the Feature Toggle Pattern [22] and a highly automated deploy pipeline. Also the usage of the SCM system is very advanced. Each feature is developed in its own branch and the Configuration- or Build Manager creates for each deployment a proper Integration Branch. The biggest challenge in this methodology are fast responses towards urgent problems arising. In the worst case it could be necessary to push out very quickly a new deployment with a full or partial rollback. During deployments database changes are very critical. This aspect could be discussed in a further paper. Databases are not implicitly part of the SCM, but there also exist techniques [23] to keep them under configuration management.

Figure 3: Structure of a commit naming.

As mentioned before, a release R inside an SCM is defined by several commits to the SCM. These commits are identified by the revision r. The lowest amount of revisions between two release is one, but there is no limit concerning to the upper boundary. Special Points of Interests inside an SCM are released revisions which can formally defined by (2).

  • R := {r 1, r 2, r 3, r n+1,…, r x } (1)
  • POI:= ∆ Release (R; R + 1) (2)

By this interpretation we are able to develop metrics which show a real project growth and do not just produce an output [13]. The paper of P. Kaur and H. Singh contains a collection of metrics related to their VVCT SCM [9]. An adapted suggestion for possibilities to compare project evolution is:

  1. the amount of BugFix releases in a minor branch,
  2. an count of revisions between two release,
  3. the growth between minor and major release (e.g. Line of Codes),
  4. a direct comparison between the current trunk and a previous release,
  5. two selected releases,
  6. a comparison of an release R and its replacement.

For example the amount of BugFix releases for a minor release allows a conclusion about the quality situation of a project. It is very important to understand the reasons to improve program stability and reduce the number of BugFixes. A classification for changes is described by Swanson [1]. An overview of the project based on these classifications of BugFixes should detect the issues that have to be changed to accomplish high quality.

5. A Vocabulary for SCM Commit Messages

In the early times SCM systems were used for synchronizing source code between developers. Typically users were not paying too much attention to write well formulated explanations about their changes. In many instances they were not leaving any description about what they did. Another extreme was that comments like update build logic frequently appeared in the history. An explanation of everything and nothing without saying what was changed or why. It could either be a version update of an existing library or the addition of a new dependency leading to a heavy time-consuming work in order to identify the points of interest in the commit history. Manual checks between the version with a Diff Tool would be necessary to locate the Line of Code that may have to be changed again. Guidelines have been introduced on how to write a well formulated commit message to solve this problems. A short selection of these guides published on the internet: [24, 25, 26] It was discovered by companies that the approach to apply well formulated descriptions of SCM revisions can improve productivity in teams. By exploring new projects on Source Code Hosting Services like GitHub or Sourceforge the quality of commit messages was increasing in the last years.

Based on these recommendations and the experience gained as of today, a vocabulary should be introduced for writing easier and more efficient commit messages. This simple-to-use standardization could help to visualize the evolution of a project more clearly. By very precise and short explanation of every revision readers do not get flooded with information. This allows analysts to see patterns of process leaks more quickly and increases the team productivity. The usage of a defined structure also allows an automatism to parse the commit messages. The result can generate programmatic presentations of diagrams readable by humans. Naturally this approach is not only limited to SCM. Another usage could be for communication in meetings with strict time limitations, for example in the agile method Scrum.

The vocabulary for SCM Commit Messages follows a defined structure which is shown in figure 3. The composition contains a mandatory first line and includes a FunctionID, label and a short specification. The second and third line is optional and contains the TaskID from the Issue Management System and a description of the more detailed explanation. Our suggestion for the vocabulary covers most SCM work flows. It may will be that some companies need adoptions to implement this solution in their processes. For this reason the definition is flexible and allows extensions.

  • #INIT – the repository or a release.
    • repro:documentation / configuration…
    • archetype:jar / war / ear / pom / zip…
    • version:<version>
  • #IMPLEMENT – a functionality.
    • function:<clazz>
  • #CHANGE – a functionality.
    • function:<clazz>
  • #EXTEND – a functionality.
    • function:<clazz>
    • attach:<clazz>
  • #BUGFIX – a functionality.
    • priority:critical / medium / low / design
  • #REVIEW – an implementation.
    • refactor:<function>
    • analyze:<quality>migrate:<function>
    • format:<source>
  • #RELEASE – an artifact.
    • version:<version>
  • #REVERT – a commit.
    • commit:<id>
  • #BRANCH – create.
    • create:<name>
    • stash:<branch>
  • #MERGE – from another branch.
    • from:<branch>
    • to:<branch>
  • #CLOSE – a branch.
    • branch:<name>

As first entry a FunctionID is recommended and not the TaskID of the Issue Management. This decision is based on the experience that functionality could spread in different tasks. In longtime projects it could happen that for some reason the Issue Management System needs to be replaced by another one. Not all projects are connected to Issue Management, especially when they are small or just a prototype. These circumstances proved to be decisive to define the TaskId as optional and move it to the second line. With a FunctionID it is easier to identify parts that should be linked. Sometimes there exist transfers into the repository that cannot be assigned to a dedicated function. These commits are often related to activities of the Build- and Configuration Manager. As best practice an ID should be established which corresponds to these activities. Some examples related to the defined labels are:

  • [CM-00] INIT;
  • [CM-10] REVIEW;
  • [CM-20] BRANCH;
  • [CM-30] MERGE;
  • [CM-40] RELEASE;
  • [CM-50] build management.

The mightiness of this approach is its simplicity and how it can be included in existing projects. The rule set does not contain any additional complexity and the process is quite easy to understand. A short example will demonstrate the usage and a full example is provided in section VI. A change in the POM file to update the version of the test framework could be commented as follows:

[CM-50] #CHANGE ’function:pom’
<QS-23231>
{Change version number of the dependency JUnit from 4 to 5.0.2}

6. Release Process

The sample project in section II is not only fictive. The Together Platform (TP) available on GitHub [26] was initiated to study techniques on real conditions. Hence Git is the SCM tool of the choice. As client SmartGit is recommended because of platform independence and it offers plentiful advanced functionality.

For better comprehension of our approach of writing commit expressions we use the TP-CORE project, from initialization of the repository to its first release. No TaskIDs for the revisions exist due to the project not being connected to an Issue Management System. We use an excerpt of TP-CORE to demonstrate the approach because between the initial commit and the first published release 1.0.2 exist over 70 revisions in the repository. The project also contains a set of 12 functions which do not need to be included completely in our sample. Only three functions were selected for demonstration:

  • CORE-01 Logger;
  • CORE-02 genericDAO;
  • CORE-05 ApplicationConfiguration.

This cuts the revisions in half and shows enough complexity avoiding readers falling asleep.

The condition for a first release was the implementation of all 12 functionalities. The overall test coverage has reached more than 85%. Code smells detected with checks by Findbugs, Checkstyle, PMD et cetera have been removed. For an facilitate explanation, we add a revision number before the FunctionID. TP-CORE Commit Messages:

01  [CM-00] #INIT ’archtype:jar’
{Initial the repository for Java JAR library.}
02  [CORE-01] #IMPLEMENT ’function:Logger’
{Application wide standard logger.}
03  [CORE-02] #IMPLEMENT
{Generic Data Access Object Pattern for centralized database access.}
04  [CORE-05] #IMPLEMENT ’function:AppConfigDO’
{Domain Object for application configuration.}
05  [CM-10] #REVIEW ’analyze:quality’
{Formatting, fix Checkstyle hints, JavaDoc & test coverage}
06  [CORE-05] #IMPLEMENT ’function:ConfigurationDAO’
{Add the ConfigurationDAO implementation.}
07  [CORE-05] #EXTEND ’attach:tests’
{Create test cases for Bean Validation.}
08  [CORE-01] #EXTEND ’function:Logger’
{Add new Method to detect the configured LogLevel.}
09  [CORE-05] #EXTEND ’function:AppConfigDO’
{Change Primary Key to UUID and extend tests.}
10  [CORE-05] #CHANGE ’function:AppConfigDO’
{Rename to ConfigurationDO and define table indexes.}
11  [CORE-02] #EXTEND ’function:GenericDAO’
{Add flushTable, countEnties and optimize.}
12  [CORE-05] #EXTEND ’attach:tests’
{Update test cases for application configuration.}
13  [CORE-05] #EXTEND ’function:ConfigurationDAO’
{Update the implementation for ConfigurationDAOImpl.}
14  [CORE-01] #EXTEND ’function:Logger’
{Add method for exception handling.}
15  [CORE-05] #EXTEND ’function:ConfigurationDO’
{Add field mandatory.}
16  [CM-10] #REVIEW ’migrate:JUnit’
{Migrate Test cases from JUnit4 to JUnit5.}
17  [CM-10] #REVIEW ’analyze:quality’
{Fix JavaDoc, Checkstyle & Findbugs.}
18  [CM-50] #EXTEND ’function:POM’
{Update SCM connection to GitHub.}
19  [CM-50] #EXTEND ’attach:APIguards’
{Attach annotation for API documentation.}
20  [CORE-05] #REVIEW ’refactor:ConfigurationDO’
{FindBugs: optimize constructor parameters.}
21  [CORE-02] #BUGFIX ’priority:design’
{Fix FindBugs hint: visible modifier.}
22  [CM-50] #EXTEND ’attach:site’
{Extend MVN site configuration.}
23  [CORE-02] #BUGFIX ’priority:high’
{Fix spring DAO configuration.}
24  [CORE-05] #IMPLEMENT ’function:ConfigurationService’
{Implement basic functionality for
ConfigurationService.}
25  [CM-10] #REVIEW ’analyze:quality’
{Remove all compiler warnings, FindBugs,
Checkstyle & PMD Hits.}
26  [CORE-05] #EXTEND
’attach:ConfigurationService’
{A  dd JGiven test scenarios.}
27  [CM-40] #RELEASE ’version:1.0’
{Release artifact to version 1.0}
28  [CM-40] #RELEASE ’version:1.0.1’
{Change POM GroupId to Maven Central conventions.}
29  [CM-00] #INIT ’version:1.1’
{Start implementation of version 1.1.0.}
30  [CM-50] #MERGE ’from:1.0.1’
{Integrate GAV POM changes to trunk.}
31  [CM-40] #RELEASE ’version:1.0.2’
{Include PGP signing.}
32  [CM-20] #CHANGE ’function:Constraints’
{Add Constraints.VERSION to 1.1}
33  [CORE-01] #EXTEND ’function:Logger’
{Default loader for logback.xml configuration files in the application DIR.}

Considering the previous example, we see that a limitation to around 80 – 100 characters for the first line is recommendable. Displaying the history with any client could get very messy if the first line has no size restrictions. The log output of the commit messages does not display the branch and tag operation, a behavior of Git. These revisions do not appear in any history list by browsing GitHub. Revision 28 is a branch based on revision 27. The branch is named as 1.0. Releases are published in consonance with the convention to be labeled, revision 31 tagged as Release 1.0.2. The revisions 28 and 31 are part of branch 1.0.

In this constellation we are able to see an important detail for dealing with branches. A branch will only be created when it is necessary. Usually BugFix branches do not have their own build plans on CI Servers and are managed manually. The primary arguments for this practice are to reduce the administrative overhead for the CI Servers. Companies that orchestrate their applications by web services or modules loose capacities by binding their recourses in this kind of activities.

7. Conclusion

“There is nothing permanent except change.” – Heraclitus

The whole infrastructure of commercial software projects contains a lot of independent fragments which share information over all development cycle. In projects we are overloaded by documentation production processes. The high amount of all this information inhibits profoundly comprehension and handling capabilities. Applications are getting more complex and bigger resulting in the necessity to establish more efficient ways to deal with information accumulation. There exists a giant overhead of managing documents like release notes, release plan, issue management, quality reports, statistics & metrics, documentation, architectural documents and BugFix lists. Typically each tool stores its data in its own structure. This makes changes to other tools, that might fit better, risky and expensive.

Companies know the effect that developers feel uncomfortable having to track their work in Issue Management tools like JIRA resulting in them trying to hide their part of the work flow as much as possible. Tasks will be opened up when they are almost done or already finished. The information on how many project days were spent for a function covers more the expectations and less the reality with the intent that developers can escape a bit from the daily pressure of productivity. Often developers are forced to spend their time with data acquisition for management controlling instead of programming resulting in low cost efficiency of a project and even additional and unplanned costs. Developers dislike this kind of activities because it keeps them away from their actual work: development. This is what makes the simple approach towards human readable and machine processable commit messages attractive and more convenient. The most important fact is that no extra costs are generated applying this method to existing processes.

We are enabled to generate several reports based on real data if SCM repositories can be populated with additional information. Impact assessments could be more efficient and accurate when they are created by facts and not emotionally blended.

Future Work

The idea to make information inside SCM systems more transparent is not just limited to commit messages. Another obvious point for future research is the history command. In the paper of Abram Hindle and Daniel M. German a query language for source control is introduced [7]. The idea of SCM Language could be picked up and transformed applying it to a specific solution. This work would use the Domain Driven Development paradigm to model an own SCM language based on Domain Specific Language (DSL) concepts – leading to the discovery of real world DSL solutions allowing for quick construction of a viable prototype or application based upon certain specifications.

Also a point which boldly comes to mind after reading the paper of Fischer et al., is the inclusion of released information into SCM [4]. This approach should not fully be automated due to its requirement of an advanced knowledge about branching and merging. A small self written extension could be a probable solution. A short tutorial 17 for Git suggests certain possibilities.

Acknowledgements

Special thanks to Joachim Reiter and Harald Kaufmann for spending their time to review this document. Their feedback was very productive.

References

[1] E. Burton Swanson, 1978, The Dimension of Maintenance.
[2] Walter F. Tichy, 1985, RCS – A System for Version Control.
[3] Chuck Walrad and Darrel Strom, 2002, The Importance of Branching Models in SCM.
[4] Michael Fischer, Martin Pinzger, Harald Gall, 2003, Populating a Release History Database from Version Control and Bug Tracking Systems.
[5] Filip Van Rysselberghe and Serge Demeyer, 2004, Mining Version Control Systems for FACs (Frequently Applied Changes).
[6] Louis Glassy, 2005, Using version control to observe student software development processes.
[7] Abram Hindle and Daniel M. German, 2005, SCQL: a formal model and a query language for source control.
[8] Yongchang Ren, Tao Xing, Qiang Quan, Ying Zhao, 2010, Software Configuration Management of Version Control Study Based on Baseline.
[9] Parminder Kaur and Hardeep Singh, 2011, A Model for Versioning Control Mechanism in Component- Based Systems
[10] Shaun Phillips, Jonathan Sillito, Rob Walker, 2011, Branching and merging: an investigation into current version control practices.
[11] Abdullah Uz Tansel and Ali Koc, 2011, A Survey of Version Control Systems.
[12] Christian Bird et al., 2014, Transition from Centralized to Decentralized Version Control Systems A Case Study on Reasons, Barriers, and Outcomes.
[13] Norman E. Fenton and Shari Lawrence Pfieeger, 1997, PWS Publishing Company, Software Metrics – A Rigorous and Practical Approach 2nd Edition, ISBN O·534·95425·1.
[14] Jez Humble and David Farley, 2010, Addison-Wesley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, ISBN 0-321-60191-2.
[15] Scott Chacon and Ben Straub, 2014, Apress, Pro Git 2nd Edition, ISBN 978-1-4842-0077-3.
[16] Mike Clark, 2004, The Pragmatic Bookshelf, Pragmatic Project Automation, ISBN 0-9745140-3-9.
[17] Dave Thomas and Andy Hunt, 2003, The Pragmatic Bookshelf, Pragmatic Version Control with CVS, ISBN 0-9745140-0-4.
[18] Mike Mason, 2010, The Pragmatic Bookshelf, Pragmatic Guide to Subversion, ISBN 1-934356-61-1.
[19] https://search.maven.org
[20] https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows
[21] https://nvie.com/posts/a-successful-git-branching-model/
[22] https://www.martinfowler.com/articles/feature-toggles.html
[23] https://flywaydb.org
[24] https://chris.beams.io/posts/git-commit/
[25] http://who-t.blogspot.mx/2009/12/on-commit-messages.html
[26] https://github.com/ElmarDott/TP-CORE/

Biography

Marco Schulz, also kown by his online identity Elmar Dott is an independent consultant in the field of large Web Application, generally based on the JavaEE environment. His main working field is Build-, Configuration- & Release-Management as well as software architecture. In addition his interests cover the full software development process and the discovery of possibilities to automate them as much as possible. Over the time of the last ten years he has authored a variety of technical articles for different publishers and speaks on various software development conferences. He is also the author of the book “Continuous Integration with Jenkins” published 2021 by Rheinwerk.

Learn to walk with Docker and PostgreSQL

After some years the virtualization tool Docker proofed it’s importance for the software industry. Usually when you hear something about virtualization you may could think this is something for administrators and will not effect me as a developer as much. But wait. You’re might not right. Because having some basic knowledge about Docker as a developer will helps you in your daily business.

Step 0: create a local bridged network

docker network create -d bridge --subnet=172.18.0.0/16 services
Bash

The name of the network is services an bind to the IP address range 172.18.0.0 to 172.18.0.255. You can proof the success yourself by typing:

docker network ls
Bash

An output like the one below should appear:

NETWORK ID     NAME       DRIVER    SCOPE
ac2f58175229   bridge     bridge    local
a01dc5513882   host       host      local
1d3d3ac42a40   none       null      local
82da585ee2df   services   bridge    local
Bash

The network step is important, because it defines a permanent connection, how applications need to establish a connect with the PostgreSQL DBMS. If you don’t do this Docker manage the IP address and when you run multiple containers on your machine the IP addresses could changed after a system reboot. This depends mostly on the order how the containers got started.

Step 1: create the container and initialize the database

docker run -d --name pg-dbms --restart=no \
--net services --ip 172.18.0.20 \
-e POSTGRES_PASSWORD=s3cr3t \
-e PGPASSWORD=s3cr3t \
postgres:11
Bash

If you wish that your PostgreSQL is always up after you restart your system, you should change the restart policy form no to always. The second line configure the network connection we had define in step 0. After you created the instance pg-dbms of your PostgreSQL 11 Docker image, you need to cheek if it was success. This you can do by the

docker ps -a
Bash

command. When your container is after around 30 seconds still running you did everything right.

Step 2: copy the initialized database directory to a local directory on your host system

docker cp pg-dbms:/var/lib/postgresql/data /home/ed/postgres
Bash

The biggest problem with the current container is, that all data will got lost, when you erase the container. This means wen need to find a way how to save this data permanently. The easiest way is to copy the data directory from your container to an directory to your host system. The copy command needs tow parameters source and destination. for the source you need to specify the container were you want to grab the files. in our case the container is named pg-dbms. The destination is a PostgreSQL folder in the home directory of the user ed. If you use Windows instead of Linux it works the same. Just adapt the directory path and try to avoid white-spaces. When the files appeared in the defined directory you’re done with this step.

Step 3: stop the current container

docker stop pg-dbms
Bash

In the case you wish to start a container, just replace the word stop for the word start. The container we created to grab the initial files for the PostgreSQL DBMS we don’t need no longer, so we can erase it, but to do that as first the running container have to be stopped.

Step 4: start the current container

docker start pg-dbms
Bash

After the container is stopped we are able to erase it.

Step 5: recreate the container with an external volume

docker run -d --name pg-dbms --restart=no \
--net services --ip 172.18.0.20 \
-e POSTGRES_PASSWORD=s3cr3t \
-e PGPASSWORD=s3cr3t \
-v /media/ed/memory/pg:/var/lib/postgresql/data \
postgres:11
Bash

Now we can link the directory with the exported initial database to a new created PostgreSQL container. that’s all. The big benefit of this activities is, that now every database we create in PostgreSQL and the data of this database is outside of the docker container on our local machine. This allows a much more simpler backup and prevent losing information when a container has to be updated.

If you have instead of PostgreSQL other images where you need to grab files to reuse them you can use this tutorial too. just adapt to the image and the paths you need. The procedure is almost the same. If you like to get to know more facts about Docker you can watch also my video Docker Basics in less then 10 Minutes. In the case you like this short tutorial share it with your friends and colleagues. To stay informed don’t forget to subscribe to my newsletter.

Links sind nur für eingeloggte Nutzer sichtbar.

Modern Times (for Configuration Manager)

Heavy motivation to automate everything, even the automation itself, is the common understanding of the most DevOps teams. There seems to be a dire necessity to automate everything – even automation itself. This is common understanding and therefore motivation for most DevOps teams. Let’s have a look on typical Continuous Stupidities during a transformation from a pure Configuration Management to DevOps Engineer.

In my role as Configuration and Release Manager, I saw in close to every project I joined, gaps in the build structure or in the software architecture, I had to fix by optimizing the build jobs. But often you can’t fix symptoms like long running build scripts with just a few clicks. In his post I will give brief introduction about common problems in software projects, you need to overcome before you really think about implementing a DevOps culture.

  1. Build logic can’t fix a broken architecture. A huge amount of SCM merging conflicts occur, because of missing encapsulation of business logic. A function which is spread through many modules or services have a high likelihood that a file will be touched by more than one developer.
  2. The necessity of orchestrated builds is a hint of architectural problems.Transitive dependencies, missing encapsulation and a heavy dependency chain are typical reasons to run into the chicken and egg problem. Design your artifacts as much as possible independent.
  3. Build logic have developed by Developers, not by Administrators. Persons which focused in Operations have different concepts to maintain artifact builds, than a software developer. A good anti pattern example of a build structure is webMethofs of Software AG. They don‘ t provide a repository server like Sonatype Nexus to share dependencies. The build always point to the dependencies inside a webMethods installation. This practice violate the basic idea of build automation, which mentioned in the book book ‚Practices of an Agile Developer‘ from 2006.
  4. Not everything at once. Split up the build jobs to specific goals, like create artifact, run acceptance tests, create API documentation and generate reports. If one of the last steps fail you don’t need to repeat everything. The execution time of the build get dramatically reduced and it is easier to maintain the build infrastructure.
  5. Don’t give to much flexibility to your build infrastructure. This point is strongly related to the first topic I explains. When a build manager have less discipline he will create extremely complex scripts nobody is able to understand. The JavaScript task runner Grunt is a example how a build logic can get messy and unreadable. This is one of the reason, why my favorite build tool for Java projects is always decided to Maven, because it takes governance of understandable builds.
  6. There is no requirement to automate the automation. By definition have complex automation levels higher costs than simple tasks. Always think before, about the benefits you get of your automation activities to see if it make sens to spend time and money for it.
  7. We do what we can, but can we what we do? Or in the words by Gardy Bloch „A fool with a tool is still a fool“. Understand the requirements of your project and decide based on that which tool you choose. If you don’t have the resources even the most professional solution can not support you. If you understood your problem you are be able to learn new professional advanced processes.
  8. Build logic have run first on the local development environment. If your build runs not on your local development machine than don’t call it build logic. It is just a hack. Build logic have to be platform and IDE independent.
  9. Don’t mix up source repositories. The organization of the sources into several folders inside a huge directory, creates just a complex build whiteout any flexibility. Sources should structured by technology or separate independent modules.

Many of the point I mentioned can understood by comparing the current Situation in almost every project. The solution to fix the things in a healthy manner is in the most cases not that complicated. It needs just a bit of attention and well planning. The most important advice I can give is follow the KISS principle. Keep it simple, stupid. This means follow as much as possible the standard process without modifications. You don’t need to reinvent the wheel. There are reasons why a standard becomes to a standard. Here is a short plan you can follow.

  • First: understand the problem.
  • Second: investigate about a standard solution for the process.
  • Third: develop a plan to apply the solution in the existing process landscape. This implies to kick out tools which not support standard processes.

If you follow step by step you own pan, without jumping to more far ten the ext point, you can see quite fast positive results.

By the way. If you think you like to have a guiding to reach a success DevOps process, don’t hesitate to contact me. I offer hands on Consulting and also training to build up a powerful DevOps team.


InPerson (remote) Training’s (Courses) Deutsch / English

A Fool with a Tool is still a Fool

Even though considerable additional effort has been expended on testing in recent years in order to improve the quality of software projects [1], the path to continuously repeatable successes cannot be taken for granted. Stringent and targeted management of all available resources was and still is indispensable for reproducible success.

(c) 2016 Marco Schulz, Java aktuell Ausgabe 4, S.14-19
Original article translated from Deutsch

It is no secret that many IT projects are still struggling to reach a successful conclusion. One might well think that the many new tools and methods that have emerged in recent years offer effective solutions for dealing with the situation. However, if one takes a look at current projects, this impression changes.

The author has often been able to observe how this problem was supposed to be mastered by introducing new tools. Not infrequently, the efforts ended in resignation. The supposed miracle solution quickly turned out to be a heavyweight time robber with an enormous amount of self-management. The initial euphoria of all those involved quickly turned into rejection and not infrequently culminated in a boycott of its use. It is therefore not surprising that experienced employees are skeptical of all change efforts for a long time and only deal with them when they are foreseeably successful. Because of this fact, the author has chosen as the title for this article the provocative quote from Grady Booch, a co-founder of UML.

Companies often spend too little time establishing a balanced internal infrastructure. Even the maintenance of existing fragments is often postponed for various reasons. At the management level, companies prefer to focus on current trends in order to attract customers who expect a list of buzzwords in response to their RFP. Yet Tom De Marco already described it in detail in the 1970s [2]: People make projects (see Figure 1).

We do what we can, but can we do anything?

The project, despite best intentions and intensive efforts find a happy end, is unfortunately not the rule. But when can one speak of a failed project in software development? An abandonment of all activities due to a lack of prospects of success is of course an obvious reason, but in this context it is rather rare. Rather, one gains this insight during the post-project review of completed orders. In controlling, for example, weak points come to light when determining profitability.

The reasons for negative results are usually exceeding the estimated budget or the agreed completion date. Usually, both conditions apply at the same time, as the endangered delivery deadline is countered by increasing personnel. This practice quickly reaches its limits, as new team members require an induction period, visibly reducing the productivity of the existing team. Easy-to-use architectures and a high degree of automation mitigate this effect somewhat. Every now and then, people also move to replace the contractor in the hope that new brooms sweep better.

A quick look at the top 3 list of major projects that have failed in Germany shows how a lack of communication, inadequate planning and poor management have a negative impact on the external perception of projects: Berlin Airport, Hamburg’s Elbe Philharmonic Hall and Stuttgart 21. Thanks to extensive media coverage, these undertakings are sufficiently well known and need no further explanation. Even if the examples cited do not originate from information technology, the recurring reasons for failure due to cost explosion and time delay can be found here as well.

Figure 1: Problem solving – “A bisserl was geht immer”, Monaco Franze

The will to create something big and important is not enough on its own. Those responsible also need the necessary technical, planning, social and communication skills, coupled with the authority to act. Building castles in the air and waiting for dreams to come true does not produce presentable results.

Great success is usually achieved when as few people as possible have veto power over decisions. This does not mean that advice should be ignored, but every possible state of mind cannot be taken into account. This makes it all the more important for the person responsible for the project to have the authority to enforce his or her decision, but not to demonstrate this with all vigor.

It is perfectly normal for a decision-maker not to be in control of all the details. After all, you delegate implementation to the appropriate specialists. Here’s a brief example: When the possibilities for creating larger and more complex Web applications became better and better in the early 2000s, the question often came up in meetings as to which paradigm should be used to implement the display logic. The terms “multi-tier”, “thin client” and “fat client” dominated the discussions of the decision-making bodies at that time. Explaining the advantages of different layers of a distributed web application to the client was one thing. But to leave it up to a technically savvy layman to decide how to access his new application – via browser (“thin client”) or via a separate GUI (“fat client”) – is simply foolish. Thus, in many cases, it was necessary to clear up misunderstandings that arose during development. The narrow browser solution not infrequently turned out to be a difficult technology to master, because manufacturers rarely cared about standards. Instead, one of the main requirements was usually to make the application look almost identical in the most popular browsers. However, this could only be achieved with considerable additional effort. Similar observations were made during the first hype of service-oriented architectures.

The consequence of these observations shows that it is indispensable to develop a vision before the start of the project, the goals of which also correspond to the estimated budget. A reusable deluxe version with as many degrees of freedom as possible requires a different approach than a “we get what we need” solution. It’s less about getting lost in the details and more about keeping the big picture in mind.

Particularly in German-speaking countries, companies find it difficult to find the necessary players for successful project implementation. The reasons for this may be quite diverse and could be due, among other things, to the fact that companies have not yet understood that experts rarely want to talk to poorly informed and inadequately prepared recruitment service providers.

Getting things done!

Successful project management is not an arbitrary coincidence. For a long time, an insufficient flow of information due to a lack of communication has been identified as one of the negative causes. Many projects have their own inherent character, which is also shaped by the team that accepts the challenge in order to jointly master the task set. Agile methods such as Scrum [3], Prince2 [4] or Kanban [5] pick up on this insight and offer potential solutions to be able to carry out IT projects successfully.

Occasionally, however, it can be observed how project managers transfer planning tasks to the responsible developers for self-management under the pretext of the newly introduced agile methods. The author has frequently experienced how architects have tended to see themselves in day-to-day implementation work instead of checking the delivered fragments for compliance with standards. In this way, quality cannot be established in the long term, since the results are merely solutions that ensure functionality and, because of time and cost pressures, do not establish the necessary structures to ensure future maintainability. Agile is not a synonym for anarchy. This setup likes to be decorated with an overloaded toolbox full of tools from the DevOps department and already the project is seemingly unsinkable. Just like the Titanic!

It is not without reason that for years it has been recommended to introduce a maximum of three new technologies at the start of a project. In this context, it is also not advisable to always go for the latest trends right away. When deciding on a technology, the appropriate resources must first be built up in the company, for which sufficient time must be planned. The investments are only beneficial if the choice made is more than just a short-lived hype. A good indicator of consistency is extensive documentation and an active community. These open secrets have been discussed in the relevant literature for years.

However, how does one proceed when a project has been established for many years, but in terms of the product life cycle a swing to new techniques becomes unavoidable? The reasons for such an effort may be many and vary from company to company. The need not to miss important innovations in order to remain competitive should not be delayed for too long. This consideration results in a strategy that is quite simple to implement. Current versions are continued in the proven tradition, and only for the next major release or the one after that is a roadmap drawn up that contains all the necessary points for a successful changeover. For this purpose, the critical points are worked out and tested in small feasibility studies, which are somewhat more demanding than a “hello world” tutorial, to see how an implementation could succeed. From experience, it is the small details that can be the crumbs on the scale to determine success or failure.

In all efforts, the goal is to achieve a high degree of automation. Compared to constantly recurring tasks that have to be performed manually, automation offers the possibility of producing continuously repeatable results. However, it is in the nature of things that simple activities are easier to automate than complex processes. In this case, it is important to check the cost-effectiveness of the plans beforehand so that developers do not indulge completely in their natural urge to play and also work through unpleasant day-to-day activities.

He who writes stays

Documentation, the vexed topic, spans all phases of the software development process. Whether for API descriptions, the user manual, planning documents for the architecture or learned knowledge about optimal procedures – describing is not one of the favored tasks of all protagonists involved. It can often be observed that the common opinion seems to prevail that thick manuals stand for extensive functionality of the product. However, long texts in a documentation are more of a quality defect that tries the reader’s patience because he expects precise instructions that get to the point. Instead, they receive vague phrases with trivial examples that rarely solve problems.

Figure 2: Test coverage with Cobertura

This insight can also be applied to project documentation and has been detailed by Johannes Sidersleben [6], among others, under the metaphor about Victorian novellas. Universities have already taken up these findings. For example, Merseburg University of Applied Sciences has established the course of study “Technical Writing” [7]. It is hoped to find more graduates of this course in the project landscape in the future.

When selecting collaborative tools as knowledge repositories, it is always important to keep the big picture in mind. Successful knowledge management can be measured by how efficiently an employee finds the information they are looking for. For this reason, company-wide use is a management decision and mandatory for all departments.

Information has a different nature and varies both in its scope and in how long it remains current. This results in different forms of presentation such as wiki, blog, ticket system, tweets, forums or podcasts, to list just a few. Forums very optimally depict the question and answer problem. A wiki is ideal for continuous text, such as that found in documentation and descriptions. Many webcasts are offered as video, without the visual representation adding any value. In most cases, a well-understood and properly produced audio track is sufficient to distribute knowledge. With a common and standardized database, completed projects can be compared efficiently. The resulting knowledge offers a high added value when making forecasts for future projects.

Test & Metrics – the measure of all things

Just by skimming the Quality Report 2014, one quickly learns that the new trend is “software testing”. Companies are increasingly allocating contingents for this, which take up a volume similar to the expenditure for the implementation of the project. Strictly speaking, one extinguishes fire with gasoline at this point. On closer inspection, the budget is already doubled at the planning stage. It is often up to the skill of the project manager to find a suitable declaration for earmarked project funds.

Only your consequent check of the test case coverage by suitable analysis tools ensures that in the end sufficient testing has been done. Even if one may hardly believe it: In an age in which software tests can be created more easily than ever before and different paradigms can be combined, extensive and meaningful test coverage is rather the exception (see Figure 2).

It is well known that it is not possible to prove that software is free of errors. Tests are only used to prove a defined behavior for the scenarios created. Automated test cases are in no way a substitute for manual code review by experienced architects. A simple example of this are nested “try catch” blocks that occur from time to time in Java, which have a direct effect on the program flow. Sometimes nesting can be intentional and useful. In this case, however, the error handling is not limited to the output of the stack trace in a log file. The cause of this programming error lies in the inexperience of the developer and the bad advice of the IDE at this point, for an expected error handling to enclose the instruction with an own “try catch” block instead of supplementing the existing routine by an additional “catch” statement. To want to detect this obvious error by test cases is an infantile approach from an economic point of view.

Typical error patterns can be detected inexpensively and efficiently by static test procedures. Publications that are particularly concerned with code quality and efficiency in the Java programming language [8, 9, 10] are always a good starting point for developing your own standards.

The consideration of error types is also very informative. Issue tracking and commit messages in SCM systems of open source projects such as Liferay [11] or GeoServer [12] show that a larger part of the errors concern the graphical user interface (GUI). These are often corrections of display texts in buttons and the like. The reporting of primarily display errors can also lie in the perception of the users. For them, the behavior of an application is usually a black box, so they deal with the software accordingly. It is not at all wrong to assume that the application has few errors when the number of users is high.

The usual computer science figures are software metrics that can give management a sense of the physical size of a project. Used correctly, such an overview provides helpful arguments for management decisions. For example, McCabe’s [13] cyclic complexity can be used to derive the number of test cases needed. Also statistics about the Lines of Code and the usual counts of packages, classes and methods show the growth of a project and can provide valuable information.

A very informative processing of this information is the project Code-City [14], which visualizes such a distribution as a city map. It is impressive Figure 3: Maven JDepend Plugin – Numbers with little significance to recognize where dangerous monoliths can arise and where orphaned classes or packages occur.

Figure 3: Maven JDepend plugin – numbers with little meaning

Conclusion

In day-to-day business, one is content to spread hectic bustle and put on a stressed face. By producing countless meters of paper, personal productivity is subsequently proven. The energy consumed in this way could be used much more sensibly through a consistently considered approach.

Loosely based on Kant’s “Sapere Aude”, simple solutions should be encouraged and demanded. Employees who need complicated structures to emphasize their own genius in the team may not be supporting pillars on which joint successes can be built. Cooperation with unteachable contemporaries is quickly reconsidered and, if necessary, corrected.

Many roads lead to Rome – and Rome was not built in a day. However, it cannot be denied that at some point the time has come to break ground. The choice of paths is not an undecidable problem either. There are safe paths and dangerous trails on which even experienced hikers have their fair share of trouble reaching their destination safely.

For successful project management, it is essential to lead the pack on solid and stable ground. This does not fundamentally rule out unconventional solutions, provided they are appropriate. The statement in decision-making bodies: “What you are saying is all correct, but there are processes in our company to which your presentation cannot be applied” is best rebutted with the argument: “That is quite correct, so it is now our task to work out ways of adapting the company processes in line with known success stories, instead of spending our time listing reasons for keeping everything the same. I’m sure you’ll agree that the purpose of our meeting is to solve problems, not ignore them.” … more voice

References

Links are only visible for logged in users.

Automation options in software configuration management

Software development offers some extremely efficient ways to simplify recurring tasks through automation. The elimination of tedious, repetitive, monotonous tasks and a resulting reduction in the frequency of errors in the development process are by no means all facets of this topic.

(c) 2011 Marco Schulz, Materna Monitor, Ausgabe 1, S.32-34
Original article translated from Deutsch

The motivation for establishing automation in the IT landscape is largely the same. Recurring tasks are to be simplified and solved by machine without human intervention. The advantages are fewer errors in the use of IT systems, which in turn reduces costs. As simple and advantageous as the idea of processes running independently sounds, implementation is less trivial. It quickly becomes clear that for every identified possibility of automation, implementation is not always feasible. Here, too, the principle applies: the more complex a problem is, the more complex its solution will be.

In order to weigh up whether the economic effort to introduce certain automatisms is worthwhile, the costs of a manual solution must be multiplied by the factor of the frequency of this work to be repeated. These costs must be set against the expenses for the development and operation of the automated solution. On the basis of this comparison, it quickly becomes clear whether a company should implement the envisaged improvement.

Tools support the development process

Especially in the development of software projects, there is considerable potential for optimization through automatic processes. Here, developers are supported by a wide range of tools that need to be skillfully orchestrated. Configuration and release management in particular deals in great detail with the practical use of a wide variety of tools for automating the software development process.

The existence of a separate build logic, for example in the form of a simple shell script, is already a good approach, but it does not always lead to the desired results. Platform-independent solutions are necessary for such cases, since development is very likely to take place in a heterogeneous environment. An isolated solution always means increased adaptation and maintenance effort. Finally, automation efforts should simplify existing processes. Current build tools such as Maven and Ant take advantage of this platform independence. Both tools encapsulate the entire build logic in separate XML files. Since XML has already established itself as a standard in software development, the learning curve is steeper than with rudimentary solutions.

Die Nutzung zentraler Build-Logiken bildet die Grundlage für weitere Automatismen während der Entwicklungsarbeit. Einen Aspekt bilden dabei automatisierte Tests in Form von UnitTests in einer Continuous-Integration-(CI)-Umgebung. Eine CI-Lösung fügt alle Teile einer Software zu einem Ganzen zusammen und arbeitet alle definierten Testfälle ab. Konnte die Software nicht gebaut werden oder ist ein Test fehlgeschlagen, wird der Entwickler per E-Mail benachrichtigt, um den Fehler schnell zu beheben. Moderne CI-Server werden gegen ein Versionsverwaltungssystem, wie beispielsweise Subversion oder Git, konfiguriert. Das bewirkt, dass der Server ein Build erst dann beginnt, wenn auch tatsächlich Änderungen im Sourcecode gemacht wurden.

Complex software systems usually use dependencies on external components (libraries) that cannot be influenced by the project itself. The efficient management of the artifacts used in the project is the main strength of the build tool Maven, which has contributed to its strong distribution. When used properly, this eliminates the need to archive binary program parts within the version control system, resulting in smaller repositories and shorter commit times (successful completion of a transaction). New versions of the libraries used can be incorporated and tried out more quickly without error-prone manual copy actions. Libraries developed in-house can be easily distributed in a protected manner in the company network with the use of an own repository server (Apache Nexus) in the sense of reuse.

When evaluating a build tool, the possibility of reporting should not be neglected. Automated monitoring of code quality using metrics, for example through the Checkstyle tool, is an excellent tool for project management to realistically assess the current status of the project.

Not too much new technology

With all the possibilities for automating processes, several paths can be taken. It is not uncommon for development teams to have long discussions about which tool is particularly suitable for the current project. This question is difficult to answer in general terms, since every project is unique and the advantages and disadvantages of different tools must be compared with the project requirements.

In practical use, the limitation to a maximum of two novel technologies in the project has proven successful. Whether a tool is suitable is also determined by whether people with the appropriate know-how are available in the company. A good solution is a list approved by management with recommendations of the tools that are already in use or can be integrated into the existing system landscape. This ensures that the tools used remain clear and manageable.

Projects that run for many years must undergo modernization of the technologies used at longer intervals. In this context, suitable times must be found to migrate to the new technology with as little effort as possible. Sensible dates to switch to a newer technology are, for example, a change to a new major release of the own project. This procedure allows a clean separation without having to migrate old project statuses to the new technology. In many cases, this is not easily possible.

Conclusion

The use of automatisms for software development can, if used wisely, energetically support the achievement of the project goal. As with all things, excessive use is associated with some risks. The infrastructure used must remain understandable and controllable despite all the technologization, so that project work does not come to a standstill in the event of system failures.