Test First?

java Aktuell 2024.03

When I started test-driven programming over 10 years ago, I was aware of many different concepts in theory. But this approach of first writing test cases and then implementing them was somehow not the way I got on well with. To be honest, this is still the case today. So I found an adaptation of Kent Beck’s TDD paradigm that works for me. But first things first. Perhaps my approach is also quite helpful for one or the other.

I originally come from environments for highly scalable web applications to which all the great theories from the university cannot be easily applied in practice. The main reason for this is the high complexity of such applications. On the one hand, various additional systems such as in-memory cache, database and identity and access management (IAM) are part of the overall system. On the other hand, many modern frameworks such as OR Mapper hide complexity behind different access layers. As developers, we need to master all of these things. That is why there are robust, and practice proven solutions that are well known but rarely used. Kent Beck is one of the most important voices for the practical use of automated software testing.

If we want to get involved with the concept of TDD, it is important not to put too much weight on every character. Not everything is set in stone. What is important is the result at the end of the day. For this reason, it is essential to keep the objective of all efforts in mind in order to achieve personal added value. So let’s start by looking at what we want to achieve in the first place.

Success proves us right

When I first started out as a developer, I needed constant feedback on whether what I was putting together was really working. I mostly generated this feedback by spreading during my implementation countless console outputs on the one hand and on the other hand I always tried to integrate everything into a user interface and then ‘click through’ manually. Basically a very cumbersome test setup, which then has to be removed again at the end. If later bug fixes had to be made, the whole procedure started all over again. Everything was somehow unsatisfactory and far removed from a productive way of working. Somehow this had to be improved without having to reinvent yourself every time.

Finally, my original approach has exactly two significant weaknesses. The most obvious one is the commenting in and out of debug information via the console.

But the second point is much more serious. Because all the knowledge acquired about this particular implementation is not preserved. It is therefore in danger of fading over time and ultimately being lost. However, such specialized knowledge is extremely valuable for many subsequent process steps in software development. By this I explicitly mean the topic of quality. Refactoring, code reviews, bug fixes and change requests are just some of the possible examples where in-depth detailed knowledge is required.

For me personally, there is also the fact that monotonously repetitive work quickly tires me out and I would like to avoid it. Clicking through an application again and again with the same test procedure is a far away from what constitutes a fulfilling working day for me. I want to discover new things. But I can only do that if I’m not trapped in the past.

Conference Talk

But they dare to do something

But before I go into how I have spiced up my day-to-day development work with TDD, I have to say a few words about responsibility and courage. In conversations others told me frequently that I am right, but they can’t take action to follow my recommendations because the project manager or some other superior doesn’t give a green light.

Such an attitude is extremely unprofessional in my eyes. I don’t ask an marketing manager which algorithm terminate as best. He simply has no idea what I’m talking about, because it is not his area of responsibility. A project manager who speaks out against test-driven work in the development team has also missed his job. Nowadays, test frameworks are so well integrated into the build environment that even inexperienced people can prepare for TDD in a matter of moments. It is therefore not necessary to make a big deal of the project. I can promise that even the first attempts will not take any longer than with the original approach. On the contrary, there will be a noticeable increase in productivity very quickly.

The first stage of evolution

As already mentioned, logging is a central part of test-driven development for me. Whenever it makes sense, I try to output the status of objects or variables on the console. If we use the means provided by the programming language used for this, this means that we must at least comment out this system output after the work has been done and comment it in again later when searching for errors. A redundant and error-prone procedure.

If, on the other hand, we use a logging framework right from the start, we can confidently leave the debug information in the code and deactivate it later in productive operation via the setting log level.

I also use logging as a tracer. This means that each constructor of a class writes a corresponding log entry by the log level info while it is being called. This allows me to see the order in which objects are instantiated. From time to time I have also become aware of the excessively frequent instantiation of a single object. This is helpful for performance and memory optimization measures.

I log errors that are thrown during exception handling as errors or warnings, depending on the context. This is a very helpful tool for tracking down errors later in operation.

So if I have a database access, I write a log output in the log level debug as the associated SQL was assembled. If this SQL leads to an exception because it contains an error, this exception is written with the log level error. If, on the other hand, a simple search query with correct SQL syntax takes place and the result set is empty, this event is classified as either Debug or Warning, depending on requirements. For example, if it is a login request with an incorrect user name or password, I tend to opt for the Log Level Warning, as this may contain security-related aspects during operation.

In the overall context, I tend to configure the logging for the test case execution very loquaciously and limit myself to a pure console output. During operation, the logging information is written to a log file.

The chicken or egg

Once we have laid the foundations for an additional feedback loop with logging, the next step is to decide what to do next. As already mentioned, I find it very difficult to first write a test case and then find a suitable implementation for it. Many other developers who start with TDD also face this problem.

One thing I can already anticipate is the problem of making sure that an implementation is testable. Once I have the test case, I immediately realize whether what I am creating is really testable. Experienced TDD developers have quickly learned in flesh and blood how testable code should look like. The most important point here is that methods should always have a return value that is preferably not null. This can be achieved, for example, by returning an empty list instead of null.

The requirement to have a return value is due to the way unit test frameworks work. A test case compares the return value of a method with an expected value. The test assertion has different characteristics and can therefore be: equal, unequal, true or false. Of course, there are also different variations here. For example, it may be possible to test methods that have no return value by using exceptions. All these details become clear in a very short time during using TDD. So that everyone can get started immediately without lengthy preparations.

When reading the book Test Driven Development by Example by Kent Beck, we also quickly find an explanation as to why the test cases should be written first. It is a psychological factor. It should help us to cope better with the usual stress that arises in the project. It creates a mental state in us about the status and progress of the current work. It guides us in an iterative process to expand and improve the existing solution step by step via the various test cases.

For those who, like me, have no concrete idea of the final result at the start of an implementation, this approach is difficult to implement. The intended effect of relaxation turns into a negative one. As we humans are all different, we have to find out what makes us tick in order to achieve the best possible result. It’s the same with learning strategies. Some people process information better visually, others more haptically and still others extract everything important from spoken words. So let’s try not to bend ourselves against our nature in order to produce mediocre or poor results.

Drawing the first line

A topic only becomes clear to me while I’m working on it. So I try my hand at an implementation until I need some initial feedback. That’s when I write the first test. This approach automatically gives rise to questions, each of which is worth its own test case. Can I find all available results? What happens if the result set is empty? How can the result set be narrowed down? These are all points that can be noted on a piece of paper and ticked off step by step. I had the idea of writing down a to-do list on a piece of paper a long time before I rode about it in the book by Kent Beck mentioned above. It helps me to preserve quick thoughts without being distracted from what I am currently doing. It also gives me a sense of accomplishment at the end of the day.

Since I don’t wait until I’ve implemented everything to write the first test, this approach also results in an iterative approach. I also notice very quickly if my design is not sufficiently testable, as I receive immediate feedback. This results in my own interpretation of TDD, which is characterized by the permanent change between implementing and writing tests.

As a result of my early TDD attempts, I already noticed a speeding up of my working methods in the first week. I also became more confident. But the way I program also started to change very early on. I have noticed that my code has become more compact and robust. Things that had only become apparent over time emerged during activities such as refactoring and extensions. Failed test cases have saved me from unpleasant surprises.

Start without overzealousness

If we decide to use TDD in an existing project, it is a bad idea to start writing test cases for existing functionality. Apart from the time that has to be planned for this, the result will not fulfill the high expectations.

One of the problems is that you now have to familiarize yourself with each functionality and this is very time-consuming. The quality of the resulting test cases is also inadequate. The problem also arises from missing experience. When the experience is first built up, the quality of the test cases is also not quite optimal and code may also have to be rewritten to make it testable. This creates a lot of risks that are problematic for day-to-day project business.

A proven procedure for introducing TDD is simply to use it for the current implementation you are currently working on. The current state of the current problem is documented by automated tests. Since you are already in familiar territory, you do not have to familiarize yourself with a new topic, so you can concentrate fully on formulating meaningful tests. Apart from the fact that you take responsibility for other people’s work without being asked when you implement test cases for them.

Existing functionality is only supplemented with test cases when errors are corrected. For the correction, you have to deal with the implementation details anyway, so that there is sufficient knowledge here of how a functionality should behave. The resulting tests also document the correction and ensure that the behavior does not change in the future during optimization work.

If you follow this procedure in a disciplined manner, you will not lose yourself in so-called hectic activity, which in turn is the opposite of productivity. In addition, you quickly acquire knowledge of how effective and meaningful tests can be implemented. Only when sufficient experience has been gained and possibly extensive refactoring are planned you can consider how test coverage can be gradually improved for the entire project.

Quality level

Just because test cases are available does not mean that they are meaningful. Nor does a high test coverage prove that a program is error-free. A high test coverage only ensures that a program behaves within the scope of the tests.

So how can you ensure that the existing tests are really an enrichment and have good informative value? The first and, in my opinion, most important point is to keep test cases as short as possible. In concrete terms, this means that a test only answers one explicit question, e.g. What happens if the result set is empty? The test method is then named according to the question. The added value of this approach arises when the test case fails. If the test is very short, it is often possible to get to know from the test method what the problem is without having to spend a lot of time familiarizing yourself with a test case.

Another important point in the TDD procedure is to check the test coverage for lines of code as well as for branches for my implemented functionality. If, for example, I cannot simulate the occurrence of a single condition in an IF statement, this condition can be deleted without hesitation.

Of course, you also have enough dependencies on external libraries in your own project. Now it can happen that a method from this library throws an exception that cannot be simulated by any test case. This is exactly the reason why you should strive for high test coverage but not despair if 100% cannot be achieved. Especially when introducing TDD, a good measure of test coverage greater than 85% is common. As the development team gains experience, this value can be increased up to 95%.

Finally, however, it should be noted that you should not get too carried away. Because it can quickly become excessive and then all the advantages gained are quickly lost. The point is that you don’t write tests that in turn test tests. This is where the cat bites its own tail. This also applies to third-party libraries. No tests are written for these either. Kent Beck is very clear about this: “Even if there are good reasons to distrust other people’s code, don’t test it. External code requires more of your own implementation logic”.

Lessons learned

The lessons that can be learned when trying to achieve the highest possible test coverage are the ones that will have an impact on future programming. The code becomes more compact and robust.

Productivity increases simply due to the fact that error-prone and monotonous work is avoided through automation. There are no additional work steps because old habits are replaced by newer, better ones.

One effect that I have observed time and again is that when individual members of the team have opted for TDD, their successes are quickly recognized. Within a few weeks, the entire team had developed TDD. Each individual according to their own abilities. Some with Test First, others as I have just described. In the end, it’s the result that counts and it was uniformly excellent. When the work is easier and at the end of the day each individual has the feeling that they have also achieved something, this gives the team an enormous motivation boost, which gives the project and the working atmosphere a huge boost. So what are you waiting for? Try it out for yourself right away.

No post found

Goodbye privacy, goodbye liberty

The new terms of conditions for Microsoft services released on October 2023 caused an outcry in the IT world. The reason was a paragraph who said, that now all Microsoft Services are powered by artificial intelligence. This A. I. supposed to be used to detect copyright violations. This includes things like Music, Movies, Graphics, E-Books and Software. In the case this A. I. Detect copyright violations on your system, those files supposed to got deleted automatically from the ‘system’. At this time it is not clear if this rule applies to your own local disk storage or just to the files on the Microsoft Cloud. Microsoft also declared that user which violates the copyright rule will be suspended from all Microsoft Services.

This exclusion has different flavors. The first questions rise up to my mind is what will happened with paid subscriptions like Skype? They will block me and refund my unused credits? A more worst scenario is may I will loose also all my credits and digital properties like access to games and other things. Or paid subscriptions will not be affected? Until now this part not clear.

If you are an Apple user my you could think this things will not affect you but better be sure you may use a Microsoft Service you don’t know its Microsoft. Not every Product include the companies name. Think about it, because who knows if those products spying around on your system. Some applications like Skype, Teams, Edge Browser and Visual Studio Code are available for other platforms like Apple and Linux.

Microsoft also owned the Source Code hosting Platform GitHub and an social network for professionals called LinkedIn. Whit Office 360 you can use the entire Microsoft Office Suite via Web Browser as Cloud solution and all your documents will be stored in the Microsoft Cloud. The same Cloud where US Government institutions like the CIA, NSA and many others keep their files. Well seems it will be a secure place for all your thought you place inside a office document.

This small detail about Office documents leads us to a little side note in the new terms of condition from Microsoft. The fight against hate speech. Whatever that means. Public insults and defamation have always been strictly enforced by the legislature. This means that it is not a trivial offense but rather a criminal offense. So it’s not clear to me what all this talk about hate speech means. Maybe it’s an attempt to introduce public censorship of freedom of expression.

But well back to the side notice from Microsoft term of conditions about hate speech. Microsoft wrote something like: if we detect hate speech we will warn the user and if the violations occur several times the Microsoft account of the user will be deactivated.

If you may think this is just something happen now by Microsoft, be sure many other companies working to introduce equal services. The communication platform Zoom for example included also A. I. techniques to observe the user communication for training purposes.

With all those news is still a big questions needed to be answered: What can I do by myself? The solution is simple. Move back from the digital universe into the real world. Turn the brain back on. Use pen and paper, pay in cash, leave your smartphone at home and there never on the bedside table. If you don’t use it turn it off. Meet your friend physically when ever it is possible and don’t bring your smartphone. There will be no government, no president and no messiahs to bring a change. It’s up to us.

Installing NextCloud with Docker on a Linux Server

On our first LiveStream we explain shortly what is Docker and how fast it can...

How to reduce the size of a PDF document

When you own a big collection of PDF files the used storage space can increasing...

Goodbye privacy, goodbye liberty

The more often it has to be repeated how good our freedom of speech is...

Docker Basics in less than 10 minutes

This short tutorial covers the most fundamental steps to use docker in your development tool...

README – how to

README files have a long tradition in software projects. These originally plain text files contained license information and instructions on how to compile the corresponding artifact from the source code or important notes on installing the program. There is no real standard how to build such a README file.

Since GitHub (acquired by Microsoft in 2018) started its triumphant march as a free code hosting platform for open source projects, there was quite early the function that the README file as the start page of the repository display. All that is required is to create a simple text file called README.md in the root directory of the repository.

In order to be able to structure the README files more clearly a possibility for a simple formatting was looked for. Quickly the markdown notation was chosen, because it is easy to use and can be rendered quite performant. Thus, the overview pages are easier to read for people and can be used as project documentation.

It is possible to link several such markdown files together as project documentation. So you get a kind of mini WIKI that is included in the project and also versioned via Git.

The whole thing became so successful that self-hosting solutions such as GitLab or the commercial BitBucket have also adopted this function.

Now, however, the question arises as to what content is best written in such a README file so that it also represents real added value for outsiders. The following points have become established over the course of time:

  • Short description of the project
  • Conditions under which the source code may be used (license)
  • How to use the project (e.g. instructions for compiling or how to include the library in own projects)
  • Who are the authors of the project and how to contact them
  • What to do if you want to support the project

Meanwhile, so-called badges (stickers) are very popular. These often reference external services such as the free Continuous Integration Server TravisCI. These help to assess the quality of the project.

On GitHub there are also various templates for README files. However, you also have to look a little at the actual circumstances of your own project and judge which information is really relevant for users. But such templates help a lot to find out if you might have missed a point.

The fact that pretty much every manufacturer of source control management server solutions has integrated the function to display the README.md file as the project start page for the code repository means that a README.me is also a useful thing for commercial projects.

Even if the syntax for markdown is easy to learn, it can be more comfortable to use a MARKDOWN editor directly for extensive editing of such files. You should make sure that the preview is displayed correctly and not only a simple syntax highlighting is offered.

Links are only visible for logged in users.

Links are only visible for logged in users.

Docker Basics in less than 10 minutes

This short tutorial covers the most fundamental steps to use docker in your development tool…

Learn to walk with Docker and PostgreSQL

After some years the virtualization tool Docker proofed it’s importance for the software industry. Usually…

README – how to

README files are text files and are formatted in markdown notation. The code hosting portal…

Treasure chest – Part 2

In the previous part of the article treasure chest, I described how the database connection…

Tooltime: SCM-Manager

This article introduces a new tool for the DevOps CI /CD pipeline. The SCM-Manager can…

Version Number Anti-Patterns

In this article, we discuss some best-practices for working with version numbers for software artifacts…

The dark side of artificial intelligence

published also on DZone 10/2023

As a technician, I am quite quickly fascinated by all sorts of things that somehow blink and beep, no matter how useless they may be. Electronic gadgets attract me like moths to the light. For a while now, a new generation of toys has been available to the masses. Artificial intelligence applications, more precisely artificial neural networks. The freely available applications are already doing remarkable things and it is only the beginning of what could get possible in the future. Many people have not yet realized the scope of A.I. based applications. This is not surprising, because what is happening in the A.I. sector will change our lives forever. We can rightly say that we are living in a time that is making history. It will be up to us to decide whether the coming changes will be good or whether they will turn out to be a dystopia.

When I chose artificial intelligence as a specialization in my studies many years ago, the time was still characterized by so-called expert systems. These rule-based systems were highly specialized for their domain and were designed for corresponding experts. The system was supposed to support the expert in making decisions. Meanwhile, we also have the necessary hardware to create much more general systems. If we consider applications like ChatGPT, they are based on neural networks, which allows a very high flexibility in usage. The disadvantage, however, is that we as developers can hardly understand what output a neural network produces for any given input. A circumstance that makes most programmers I know rather take a negative attitude. Because they are no longer master of the algorithm and can only act on the principle of trial and error.

Podcast

Der philosphische Frühschoppen

Künstliche Intelligenz

In den nächsten 30 Minuten befassen wir uns mit philosophischen Fragen wie „Was ist Intelligenz?“ oder „Was bedeutet Lernen?“. Die Antwort auf diese Fragen lässt uns besser verstehen, wo Künstliche Intelligenz heutzutage zum Einsatz kommt, welche Grenzen es gibt und wohin schlussendlich die Reise führt. Continue reading

German Folge 1 – GERMAN

Nevertheless, the power of neural networks is astounding. The time seems gone now when one can make fun of clumsy automated, software-supported translations. Frommy own experience I remember how tedious it was to let the Google Translator translate a sentence from German into Spanish. To get a usable result you could either use the English – Spanish option. Alternatively, if you speak only rudimentary English for vacation use, you could still formulate very simple German sentences that were at least correct in content. The time saved for automatically translated texts is considerable, even though you have to proofread them and adjust some wording if necessary.

As much as I appreciate being able to work with such powerful tools, we have to be aware that there is also a downside. The more we do our daily tasks with A.I. based tools, the more we lose the ability to do these tasks manually in the future. For programmers, this means that over time they will lose their ability to express themselves in source code via A.I. based IDEs. Of course, this is not a process that happens overnight, but is gradual. Once this dependency is created, the question arises whether the available dear tools will remain free of charge or whether existing subscriptions will possibly be subject to drastic price increases. After all, it should be clear to us that commercially used tools that significantly improve our productivity are usually not available at low prices.

I also think that the Internet as we are used to it so far, will change very much in the future. Many of the free services that have been financed by advertising will disappear in the medium term. Let’s take a look at the StackOverFlow service as an example. A very popular knowledge platform among developer circles. If we now in the future the research to questions of programming ChatGPT or other neural networks are questioned for StackOverFlow the visitor numbers sink continuously. The knowledge base in turn ChatGPT uses is based on data from public forums like StackOverFlow. So for the foreseeable future StackOverFlow will try to make its service inaccessible to AIs. There could certainly also be an agreement with compensation payments. So that the omitted advertising revenues are compensated. As technicians, we do not need to be told that an offer like StackOverFlow incurs considerable costs for operation and development. It then remains to be seen how users will accept the offer in the future. If no new data is added to StackOverFlow, the knowledge base for A.I. systems will also become uninteresting. I therefore suspect that by around 2030, it will be primarily high-quality content on the Internet that will be subject to a charge.

If we look at the forecast of the medium-term trend in the demand for programmers, we come to the question of whether it will be a good recommendation in the future to study computer science or to start an apprenticeship as a programmer. I actually see a positive future here and would encourage anyone who sees education as a vocation and not as a necessity to make a living. In my opinion, we will continue to need many innovative minds. Only those who instead of dealing with basics and concepts prefer to quickly learn a current framework in order to keep up with the emerging hyphe of the market, will certainly achieve only limited success in the future. However, I have already made these observations before the wide availability of A.I. systems. Therefore, I am firmly convinced that quality will always prevail in the long run.

I consider it a virtue to approach all kinds of topics as critically and attentively as possible. Nevertheless, I must say that some fears in dealing with A.I. are quite unfounded. You have already seen some of my possible visions of the future in this article. Statements that A.I. will one day take over our world by subtly influencing uninitiated users to motivate them to take action are, in my opinion, pure fantasy for a period up to 2030 and, given the current state of knowledge, unfounded. Much more realistically I see the problem that if resourceful marketing people litter the Internet with inferior non-revised A.I. generated articles to spice up their SEO ranking and this in turn as a new knowledge cab of the neural networks the quality of future A.I. generated texts significantly reduced.

The A.I. systems that have been freely available so far have one decisive difference compared to humans. You lack the motivation to do something on your own initiative. Only through an extrinsic request by the user does the A.I. begin to work on a question. It becomes interesting when an A.I. dedicates itself to self-selected questions and also researches them independently. In this case the probability is very high that the A.I. will develop a consciousness very fast. If such an A.I. then still runs on a high performance quantum computer, we do not have sufficient reaction time to recognize dangerous developments and to intervene. Therefore, we should definitely keep the play “The Physicists” created by Dürrenmatt in our consciousness. Because the ghosts I called once, I will possibly not get rid of so fast again.

Basically, I have to admit that the topic of A.I. continues to fascinate me and I am very curious about future developments. Nevertheless, I think it is important not to close our eyes to the dark side of artificial intelligence and to start an objective discourse in order to exploit the existing potential of this technology as harmlessly as possible.

No post found

Latest won’t always be greatest

For more than a decade, it has been widely accepted that computer systems should be kept up to date. Those who regularly install updates reduce the risk of having security gaps on their computer that could be misused. Always in the hope that manufacturers of software always fix in their updates also security flaws. Microsoft, for example, has imposed an update requirement on its users since the introduction of Windows 10. Basically, the idea was well-founded. Because unpatched operating systems allow hackers easy access. So the thought: ‘Latest is greatest’ prevailed a very long time ago.

Windows users had little leeway here. But even on mobile devices like smartphones and tablets, automatic updates are activated in the factory settings. If you host an open source project on GitHub, you will receive regular emails about new versions for the libraries used. So at first glance, this is a good thing. However, if you delve a bit deeper into the topic, you will quickly come to the conclusion that latest is not always the best.

The best-known example of this is Windows 10 and the update cycles enforced by Microsoft. It is undisputed that systems must be regularly checked for security problems and available updates must be installed. That the maintenance of computer systems also takes time is also understandable. However, it is problematic when updates installed by the manufacturer paralyze the entire system and a new installation becomes necessary because the update was not sufficiently tested. But also in the context of security updates unasked function changes to the user to bring in I consider unreasonable. Especially with Windows, there are a lot of additional programs installed, which can quickly become a security risk due to lack of further development. That means with all consequence forced Windows updates do not make a computer safe, since here the additionally installed software is not examined for weak points.

If we take a look at Android systems, the situation is much better. However, there are enough points of criticism here as well. The applications are updated regularly, so the security is actually improved significantly. But also with Android, every update usually means functional changes. A simple example is the very popular Google StreetMaps service. With every update, the map usage becomes more confusing for me, as a lot of unwanted additional information is displayed, which considerably reduces the already limited screen.

As a user, it has fortunately not yet happened to me that application updates on Android have paralyzed the entire phone. Which also proves that it is quite possible to test updates extensively before rolling them out to users. However, this does not mean that every update was unproblematic. Problems that can be observed here regularly are things like an excessively increased battery consumption.

Pure Android system updates, on the other hand, regularly cause the hardware to become so slow after almost two years that you often decide to buy a new smartphone. Although the old phone is still in good condition and could be used much longer. I have noticed that many experienced users turn off their Android updates after about a year, before the phone is sent into obsolescence by the manufacturer.

How do you get an update muffler to keep his systems up to date and secure? My approach as a developer and configuration manager is quite simple. I distinguish between feature update and security patch. If you follow the semantic versioning in the release process and use a branch by release model for SCM systems like Git, such a distinction can be easily implemented.

But I also dedicated myself to the question of a versionable configuration setting for software applications. For this, there is a reference implementation in the project TP-CORE on GitHub, which is described in detail in the two-part article Treasue Chest. After all, it must be clear to us that if we reset the entire configuration made by the user to factory settings during an update, as is quite often the case with Windows 10, quite unique security vulnerabilities can arise.

This also brings us to the point of programming and how GitHub motivates developers through emails to include new versions of the libraries used in their applications. Because if such an update is a major API change, the problem is the high migration effort for the developers. This is where an also fairly simple strategy has worked for me. Instead of being impressed by the notifications about updates from GitHub, I regularly check via OWASP whether my libraries contain known risks. Because if a problem is detected by OWASP, it doesn’t matter how costly an update can be. The update and the associated migration must be implemented promptly. This also applies to all releases that are still in production

However, one rule of thumb applies to avoid update hell from the start: Only install or use what you really need. The fewer programs are installed under Windows and the fewer apps there are on the smartphone, the fewer security risks there are. This also applies to program libraries. Less is more from a security perspective. Apart from that, we get a free performance measurement by dispensing with unnecessary programs.

Certainly, for many private users the question of system updates is hardly relevant. Only new unwanted functions in existing programs, performance degradations or now and then shot operating systems cause more or less strong displeasure. In the commercial surrounding field quite fast substantial costs can develop, which can affect also the straight implementing projects negatively. Companies and people who develop software can improve user satisfaction considerably if they differentiate between security patches and feature updates in their release publications. And a feature update should then also contain all known security updates.

Docker Basics in less than 10 minutes

This short tutorial covers the most fundamental steps to use docker in your development tool…

Learn to walk with Docker and PostgreSQL

After some years the virtualization tool Docker proofed it’s importance for the software industry. Usually…

README – how to

README files are text files and are formatted in markdown notation. The code hosting portal…

Treasure chest – Part 2

In the previous part of the article treasure chest, I described how the database connection…

Tooltime: SCM-Manager

This article introduces a new tool for the DevOps CI /CD pipeline. The SCM-Manager can…

Version Number Anti-Patterns

In this article, we discuss some best-practices for working with version numbers for software artifacts…

The digital toolbox

  • [English en] This article is only visible for logged in Patreons. With your Patreon membership you help me to keep this page up to date.
  • [Deutsch de] Dieser Artikel ist nur für eingeloggte Patreons sichtbar. Mit ener Patreon Mitgliedschaft helft Ihr mir diese Seite stets aktuell zu halten.
  • [Español es] Este artículo sólo es visible para los Patreons registrados. Con tu suscripción a Patreon me ayudas a mantener esta página actualizada.
To view this content, you must be a member of Elmar's Patreon at $2.99 USD or more
Already a qualifying Patreon member? Refresh to access this content.

Conway’s law

During my work as a Configuration Manager / DevOps for large web projects, I have watched companies disregard Conway’s Law and fail miserably. Such failure then often manifested itself in significant budget overruns and missed deadlines.

The internal infrastructure in the project collaboration was exactly modeled on the internal organizational structures and all experiences and established standards were ‘bent’ to fit the internal organization. This resulted in problems that made the CI/CD pipelines particularly cumbersome and resulted in long execution times. But also adjustments could only be made with a lot of effort. Instead of simplifying existing processes and aligning them with established standards, excuses were made to leave everything as it was before. Let’s take a look at what Conway’s Law is and why we should know it.

The US American researcher and programmer Melvin E. Conway received his doctorate from Case Western Reserve University in 1961. His area of expertise is programming languages and compiler design.

In 1967, he submitted to The Harvard Business Review his paper “How Do Committees Invent?” and was rejected. The reason given was that his thesis was not substantiated. However, Datamation, the largest IT magazine at the time, accepted his article and published it in April 1968. And this paper has since been widely cited. The core statement is:

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.

Conway, Melvin E. “How do Committees Invent?” 1968, Datamation, vol. 14, num. 4, pp. 28–31

When Fred Brooks cited the essay in his legendary 1975 book, The Mythical Man-Month, he called this key statement Conway’s Law. Brooks recognized the connection between Conway’s Law and management theory. In this regard, we find the following example in the article:

Because the design which occurs first is almost never the best possible, the prevailing system concept may need to change. Therefore, flexibility of organization is important to effective design.

The Mythical Man-Month: Essays on Software Engineering

An often-cited example of an “ideal” team size in terms of Conway’s Law is Amazon’s two-pizza rule, which states that individual project teams should have no more members than two pizzas can fill in one meeting. The most important factor to consider in team alignment, however, is the ability to work across teams and not live in silos.

Conway’s Law was not intended as a joke or Zen koan, but is a valid sociological observation. Take a look at structures from government agencies and their digital implementation. But also processes found in large corporations have been emulated by software systems. Such applications are considered very cumbersome and complicated, so that they find little acceptance among users and they prefer to fall back on alternatives. Unfortunately, it is often impossible to simplify processes in large organizational structures for politically motivated reasons.

Among other things, there is a detailed article by Martin Fowler, who deals explicitly with software architectures and elaborates the importance of the coupling of objects and modules.The communication of the developers among themselves plays a substantial role, in order to obtain best possible results. This circumstance over the importance of communication was taken up also by the agile software development and converted as essential point.Especially when distributed teams work on a joint project, the time difference is a limiting factor in team communication.This must then be designed particularly efficiently.

In 2010, Jonny Leroy and Matt Simons coined the term Inverse Conway Maneuver in the article “Dealing with creaky legacy platforms”:

Conway’s Law … can be summarized as “Dysfunctional organizations tend to create dysfunctional applications.” To paraphrase Einstein, you can’t fix a problem from within the same mindset that created it, so it is often worth investigating whether restructuring your organization or team would prevent the new application from displaying all the same structural dysfunctions as the original. In what could be termed an “inverse Conway maneuver,” you may want to begin by breaking down silos that constrain the team’s ability to collaborate effectively.

Since the 2010s, a new architectural style has entered the software industry. The so-called microservices, which are created by small agile teams. The most important criterion of a microservice compared to a modular monolith is that a microservice can be seen as an independently viable module or subsystem. On the one hand, this allows the microservice to be reused in other applications. On the other hand, there is a strong encapsulation of the functional domain, which opens up a very high flexibility for adaptations.

However, Conway’s law can be applied to many other areas and is not exclusively limited to the software industry. This is what makes the work so valuable.

Ressourcen

Links are only visible for logged in users.

Docker Basics in less than 10 minutes

This short tutorial covers the most fundamental steps to use docker in your development tool…

Learn to walk with Docker and PostgreSQL

After some years the virtualization tool Docker proofed it’s importance for the software industry. Usually…

README – how to

README files are text files and are formatted in markdown notation. The code hosting portal…

Treasure chest – Part 2

In the previous part of the article treasure chest, I described how the database connection…

Tooltime: SCM-Manager

This article introduces a new tool for the DevOps CI /CD pipeline. The SCM-Manager can…

Version Number Anti-Patterns

In this article, we discuss some best-practices for working with version numbers for software artifacts…

WordPress REST API is unreachable

  • [English en] This article is only visible for logged in Patreons. With your Patreon membership you help me to keep this page up to date.
  • [Deutsch de] Dieser Artikel ist nur für eingeloggte Patreons sichtbar. Mit ener Patreon Mitgliedschaft helft Ihr mir diese Seite stets aktuell zu halten.
  • [Español es] Este artículo sólo es visible para los Patreons registrados. Con tu suscripción a Patreon me ayudas a mantener esta página actualizada.
To view this content, you must be a member of Elmar's Patreon at $2.99 USD or more
Already a qualifying Patreon member? Refresh to access this content.

The spectre of artificial intelligence

The hype surrounding artificial intelligence has been going on for several years. Currently, companies like OpenAI are causing quite a stir with freely accessible neural networks like ChatGPT. Users are fascinated by the possibilities and some intellectual figures of our time are warning humanity about artificial intelligence. So what is it about the specter of AI? In this article, I explore this question and you are invited to join me on this journey. Let’s go and follow me into the future.

In the spring of 2023, reports about the performance capabilities of artificial neural networks overflowed. This trend is continuing and, in my opinion, will not abate any time soon. In the midst of the emerging gold rush mood, however, there are also isolated bad news doing the rounds. For example, Microsoft announced a massive investment in artificial intelligence on a grand scale. This announcement was underlined in the spring of 2023 with the dismissal of just under 1000 employees and gave rise to familiar fears of industrialization and automation. Things were less spectacular at Digital Ocean, which laid off its entire content creation and documentation team. Quickly, some people rightly asked whether AI would now make professions like programmers, translators, journalists, editors and so on obsolete? For now, I would like to answer this question with a no. In the medium term, however, changes will occur, as history has already taught us. Something old passes away while new things come into being. So follow me on a little historical excursion.

To do this, we first look at the various stages of industrialization, which originated in England in the second half of the 18th century. Already the meaning of the original Latin term Industria, which can be translated with diligence, is extremely interesting. Which leads us to Norbert Wiener and his 1960 book ern God and Golem Inc [1]. He publicly pondered whether people who create machines that in turn can create machines are gods. Something I do not want to subscribe from my feeling. But let’s come back to industrialization for the time being.

The introduction of the steam engine and the use of location-independent energy sources such as coal enabled precise mass production. With cheaper automation of production by machines, manual home workplaces were displaced. In exchange, cheaper products were now available in stores. But there were also significant changes in transportation. The railroad allowed for faster, more comfortable and cheaper travel. This catapulted mankind into a globalized world. Because goods could now also travel long distances in a short time without any problems. Today, when we look back at the discussions that took place when the railroad began its triumphal march, we can only smile. After all, some intellectuals of the time argued that speeds in a train of more than 30 kilometers per hour would literally crush the human occupants. A fear that fortunately turned out to be unfounded.

While people in the first industrial revolution could no longer earn an income from working at home, they found an alternative to continue earning a living by working in a factory.

The second industrial revolution is characterized by electrification, which further increased the degree of automation. Machines became less cumbersome and more precise. But new inventions also entered daily life. Fax, telephone and radio spread information at a rapid pace. This brought us into the Information Age and accelerated not only our communication, but also our lives. We created a society that is primarily characterized by the saying “time is money”.

The third industrial revolution blessed mankind with a universal machine, which determined its functionality by the programs (software) running on it. Nowadays, computers support us in a wide range of activities. Modern cash register systems do much more than just spit out the total amount of the purchase made. They log money and flow of goods and allow evaluations for optimization with the collected data. This is a new quality of automation that we have achieved in the last 200 years. With the widespread availability of artificial neural networks, we are now on our way out of this phase, which is why we are currently in the transformation to the fourth industrial revolution. How else do we as humans intend to cope with the constantly growing flood of information?

Even though Industry 4.0 focuses on the networking of machines, this is not a real revolution. The Internet is only a consequence of the previous development to enable communication between machines. We can compare this with the replacement of the steam engine by electric motors. The real innovation was in electric machines that changed our communication. This is now happening in our time through the broad field of artificial intelligence.

In the near future, we will no longer use computers the way we have been doing. That’s because today’s computers owe their existence to the previously limited communication between humans and machines. The keyboard and mouse are actually clumsy input devices. They are slow and prone to error. Voice and gesture control via microphone and camera will replace mouse and keyboard. We will talk to our computers the way we talk to other people. This also means that today’s computer programs will become obsolete. We will no longer have to fill out tedious input masks in graphical user interfaces in order to reach our goal. Gone are the days where I type my articles. I will type them in and my computer will visually display them for me to proofread. Presumably, the profession of speech therapist will then experience a significant upswing.

There will certainly also be enough outcries from people who fear the disintegration of human communication. This fear is not at all unfounded. Let’s just look at the development of the German language in the period since the turn of the millennium. This was marked by the emergence of various text messaging services and the optimization of messages by using as many abbreviations as possible. This in turn only created question marks on the foreheads of parents when it came to deciphering the content of their children’s messages. Even though the current trend is away from text messages to audio messages, it does not mean that our language will not continue to change. I myself have observed for years that many people are no longer able to express themselves correctly in writing or to extract content from written texts. In the long run, this could lead to the unlearning of skills such as reading and writing. Thus also classical print articles such as books and magazines become obsolete. Finally, content can also be produced as video or podcast. Our intellectual abilities will degenerate in the long run.

Since the turn of the millennium, it has become easier and easier for many people to use computers. So first the good news. It will become much easier to use computers in the future because human-machine interaction is becoming more intuitive. In the meantime, we will see more and more major Internet portals shutting down their services because their business model is no longer viable. Here’s a quick example.

As a programmer, I often use the website StackOverflow to find help with problems. The information on this website about programming issues is now so extensive that you can quickly find suitable solutions to your own concerns by searching Google and the like, without having to formulate questions yourself. So far so good. But if you now integrate a neural network like ChatGPT into your programming environment to find the answer to all questions, the number of visitors for StackOverflow will continuously decrease. This in turn has an impact on advertising campaigns to be able to offer the service free of charge on the net. Initially, this will be compensated by the fact that operators of AI systems that access the data from StackOverflow will pay a flat fee for the use of the database. However, this will not stop the dwindling number of visitors. Which will lead to either a payment barrier preventing free use or the service being discontinued completely. There are many offers on the Internet that will encounter similar problems, which will ensure in the long term that the Internet as we know it has disappeared in the future.

Let’s imagine what a future search query for the search term ‘industrial revolution’ might look like. I ask my digital assistant: What do you know about industrial revolution? – Instead of searching through a seemingly endless list of thousands of entries for relevant results, I am read a short explanation with a personalized address that matches my age and level of education. Which immediately raises the question of who is judging my level of education and how?

This is a further downgrading of our abilities. Even if it is perceived as very comfortable in the first moment. If we no longer have the need to focus our attention on one specific thing over a long period of time, it will certainly be difficult for us to think up new things in the future. Our creativity will be reduced to an absolute minimum.

It will also change the way data is stored in the future. Complicated structures that are optimized and stored in databases will be the exception rather than the rule. Rather, I expect independent chunks of data that are concatenated like lists. Let’s look at this together to get a good idea of what I mean.

As a starting point, let’s take Aldous Huxley’s book ‘Brave New World’ from 1932. In addition to the title, the author and the year of publication, we can add English as the language to the meta information. This is then followed by the entire contents of the book including preface and epilogue as plain ASCII text. Generic or changeable things like table of contents or copyright are not included at this stage. With such a chunk, we have defined an atomic datum that can be uniquely identified by a hash value. Since Huxley’s Brave New World was originally written in English, this datum is also an immutable source for all data derived and generated from it.

If the work of Huxley is now translated into German or Spanish, it is the first derivation with the reference to the original. It can happen that books have been translated by different translators in different epochs. This results in a different reference hash for the German translation by Herbert E. Herlitschka from 1933 with the title ‘Brave New World’ than for the translation by Eva Walch published in 1978 with the same title ‘Brave New World’.

If audio books are now produced from the various texts, these audio books are the second derivative of the original text, since they represent an abridged version. A text is also created as an independent version before the recording. The audio track created from the abridged original text has the director as its author and refers to the speaker(s). As in theater, a text can be interpreted and staged by different people. Film adaptations can be treated in the same way.

Books, audio books and films in turn have graphics for the cover. These graphics again represent independent works, which are referenced with the corresponding version of the original.

Quotations from books can also be linked in this way. Similarly, critiques, interpretations, reviews and all kinds of other variations of content that refer to an original.

However, such data blocks are not only limited to books, but can also be applied to music scores, lyrics, etc. The decisive factor is that one can start from the original as far as possible. The resulting files are optimized exclusively for software programs, since they do not contain any formatting that is visible to the human eye. Finally, the corresponding hash value about the content of the file is sufficient as file name.

This is where the vision of the future begins. As authors of our work, we can now use artificial intelligence to automatically create translations, illustrations, audio books and animations even from a book. At this point, I would like to briefly refer to the neural network DeepL [2], which already delivers impressive translations and even improves the original text if handled skillfully. Does DeepL now put translators and editors out of work? I mean no! Because also like us humans, artificial intelligences are not infallible. They also make mistakes. That’s why I think that the price for these jobs will drop dramatically in the future, because these people can now do many times more work than before, thanks to their knowledge and excellent tools. This makes the individual service considerably cheaper, but because more individual services are possible through automation in the same period of time, this compensates for the price reduction for the provider.

If we now look at the new possibilities that are open to us, it doesn’t seem to be so problematic for us. So what are people like Elon Musk trying to warn us about?

If we now assume that the entire human knowledge will be digitized by the fourth industrial revolution and that all new knowledge will only be created in digital form, computer algorithms will be free to use suitable computing power to change these chunks of knowledge in such a way that we humans will not notice. A scenario loosely based on Orwell’s Ministry of Truth from the novel 1984. If we unlearn our abilities out of convenience, we also have few possibilities of verification.

If you think this would not be a problem, I would like to refer you to the lecture “Trust no scan” by David Kriesel [3].What happened? In short, it was about the fact that a construction company noticed discrepancies in copies of their construction plans. This resulted in different copies of the same original, in which the numerical values were changed. A very fatal problem in a construction project for the executing trades. If the bricklayer gets different size data than the concrete formers. The error was finally traced back to the fact that Xerox used an AI as software in their scanners for the OCR and the subsequent compression, which could not reliably recognize the characters read in.

But also the quote from Ted Chiang “Think of ChatGPT as a blurry jpeg of all the text on the Web.” should make us think. Certainly, for people who only know AI as an application, the meaning is hard to understand what is meant by saying “ChatGPT is just a blurry jpeg of all the text on the web”. However, it is not as difficult to understand as it seems at the first moment. Due to their structure, neural networks are always only a snapshot. Because with every input the internal state of a neural network changes. It is the same as with us humans. After all, we are only the sum of our experiences. If in the future more and more texts created by an AI are placed on the web without being reflected, the AI will form its knowledge from its own derivations. The originals fade with the time there they by ever smaller references in weighting lose. If someone would flood the Internet with topics like flat earth and lizard people, programs like ChatGPT would inevitably react to it and let this flow into their texts. These texts could be published then either independently by the AI in the net automated or find their spreading by unreflective persons accordingly. We have thus created a spiral that can only be broken if people have not given up their ability to exercise judgment out of convenience.

So we see that the warnings for caution in dealing with AI are not unfounded. Even if I consider scenarios like in the movie WarGames from 1983 [4] as improbable, we should consider very well how far we want to go with the technology of AI. Not that it happens to us like the sorcerer’s apprentice and we have to find out that we cannot master it any more.

References

Links are only visible for logged in users.

No post found

Date vs. Boolean

When we designing data models and their corresponding tables appears sometimes Boolean as datatype. In general those flags are not really problematic. But maybe there could be a better solution for the data design. Let me give you a short example about my intention.

Assume we have to design a simple domain to store articles. Like a Blog System or any other Content Management. Beside the content of the article and the name of the author could we need a flag which tells the system if the article is visible for the public. Something like published as a Boolean. But there is also an requirement of when the article is scheduled a date for publishing. In the most database designs I observed for those circumstances a Boolean: published and a Date: publishingDate. In my opinion this design is a bit redundant and also error prone. As a fast conclusion I would like to advice you to use from the beginning just Date instead of Boolean. The scenario I described above can also transformed to many other domain solutions.

For now, after we got an idea why we should replace Boolean for Date datatype we will focus about the details how we could reach this goal.

Dealing with standard SQL suggest that replacing a Database Management System (DBMS) for another one should not be a big issue. The reality is unfortunately a bit different. Not all available data types for date like Timestamp are really recommendable to use. By experience I prefer to use the simple java.util.Date to avoid future problems and other surprises. The stored format in the database table looks like: ‘YYYY-MM-dd HH:mm:ss.0’. Between the Date and Time is a single space and .0 indicates an offset. This offset describes the time zone. The Standard Central European Timezone CET has an offset of one hour. That means UTC+01:00 as international format. To define the offset separately I got good results by using java.util.TimeZone, which works perfectly together with Date.

Before we continue I will show you a little code snippet in Java for the OR Manager Hibernate and how you could create those table columns.

@Table(name = "ARTICLE")
public class ArticleDO {

    @CreationTimestamp
    @Column(name = "CREATED")
    @Temporal(TemporalType.DATE)
    private Date created;

    @Column(name = "PUBLISHED")
    @Temporal(TemporalType.DATE)
    private Date published;

    @Column(name = "DEFAULT_TIMEZONE")
    private String defaultTimezone;

    //Constructor
    public ArticleDO() {
        TimeZone.setDefault(Constraints.SYSTEM_DEFAULT_TIMEZONE);
        this.defaultTimezone = "UTC+00:00";
        this.published = new Date('0000-00-00 00:00:00.0');
    }

    public Date isPublished() {
        return published;
    }

    public void setPublished(Date publicationDate) {
    	if(publicationDate != null) {
        	this.published = publicationDate;
    	} else {
    		this.published = new Date(System.currentTimeMillis());
    	}
    }
}    

Java

-- SQL
INSERT INTO ARTICLE (CREATED, PUBLISHED, DEFAULT_TIMEZONE)
    VALUES ('1984-04-01 12:00:01.0', '0000-00-00 00:00:00,0', 'UTC+00:00);

SQL

Let get a bit closer about the listing above. As first we see the @CreationTimestamp Annotation. That means when the ArticleDO Object got created the variable created will initialized by the current time. This value never should changed, because an article can just once created but several times changed. The Timezone is stored in a String. In the Constructor you can see how the system Timezone could grabbed – but be careful this value should not trusted to much. If you have a user like me traveling a lot you will see in all the places I stay the same system time, because usually I never change that. As default Timezone I define the correct String for UTC-0. The same I do for the variable published. Date can also created by a String what we use to set our default zero value. The Setter for published has the option to define an future date or use the current time in the case the article will published immediately. At the end of the listing I demonstrate a simple SQL import for a single record.

But do not rush to fast. We also need to pay a bit attention how to deal with the UTC offset. Because I observed in huge systems several times problems which occurred because developer was used only default values.

The timezone in general is part of the internationalization concept. For managing the offset adjustments correctly we can decide between different strategies. Like in so many other cases there no clear right or wrong. Everything depends on the circumstances and necessities of your application. If a website is just national wide like for a small business and no time critical events are involved everything become very easy. In this case it will be unproblematic to manage the timezone settings automatically by the DBMS. But keep in mind in the world exist countries like Mexico with more than just one timezone. An international system where clients spread around the globe it could be useful to setup each single DBMS in the cluster to UTC-0 and manage the offset by the application and the connected clients.

Another issue we need to come over is the question how should initialize the date value of a single record by default? Because null values should avoided. A full explanation why returning null is not a good programming style is given by books like ‘Effective Java’ and ‘Clean Code’. Dealing with Null Pointer Exceptions is something I don’t really need. An best practice which well works for me is an default date – time value by ‘0000-00-00 00:00:00.0’. Like this I’m avoiding unwanted publishing’s and the meaning is very clear – for everybody.

As you can see there are good reasons why Boolean data types should replaced by Date. In this little article I demonstrated how easy you can deal with Date and timezone in Java and Hibernate. It should also not be a big thing to convert this example to other programming languages and Frameworks. If you have an own solution feel free to leave a comment and share this article with your colleagues and friends.

No post found