Databases: the agony of choice

The permanent storage of data is called persistence. To be able to access this data again, software is needed that structures and searches it. Such software is called a Database Management System (DBMS). To access a database from a programming language like Java, Ruby, Python, or PHP, a corresponding driver is required. This driver is often referred to as a client, because the DBMS is the server that allows access for multiple clients. This article does not cover how to connect to a particular database from a particular programming language; it focuses on the different database technologies and their applications.

[Relational DB (rows, columns) | GIS DB | embedded DB]
[NoSQL | Key Value Store | Document DB (JSON, XML) | Graph DB | Time Series Server]

There are now numerous solutions to choose from for classic database systems, the so-called relational databases. Both commercial and professional free open-source options are vying for users’ attention. Most web hosting providers offer their users the choice between the free DBMS MySQL (Oracle) and MariaDB (a fork of MySQL after its acquisition by Oracle) for data storage. However, those who can manage their own servers can, of course, opt for the more professional PostgreSQL.

PostgreSQL is rather unsuitable for most standard PHP applications, although WordPress and Joomla do support this database system. Problems usually arise with extensions: instead of going through the application’s database interfaces, their developers often hard-code MySQL-specific commands.

In commercial application development, Oracle or Microsoft SQL Server are typically used, the latter mainly by teams that are at home in the Microsoft Windows environment. The main reason for choosing a commercial database server is the paid support that is available when vulnerabilities and bugs are discovered. For business-critical applications, whose failure can threaten the existence of a company and its customers, the speed at which security patches are delivered is a particularly significant argument for commercial software.

The functionality of relational databases is defined by tables. The columns of a table define the properties, and a row of the table represents a data record. To access a specific data record, one column (the primary key) must contain unique entries that do not appear again in that column. This property of the primary key is called uniqueness. Primary keys allow relationships, or relations, to be established between tables: a row in one table refers to a row in another table by storing that row’s primary key value (a foreign key). To keep this article from becoming excessively long, I will limit my discussion of the functionality of relational databases to this point and move on to the next category.
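The uniqueness of a primary key can be illustrated with a minimal sketch in plain Java. This is not how a DBMS works internally; the class ArticleTable, its Row type, and the sample values are hypothetical names chosen for illustration, and the map key simply plays the role of the primary key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory "table": the map key plays the role of the primary key.
class ArticleTable {

    // One data record (row) with three properties (columns).
    static final class Row {
        final String supplier;
        final String article;
        final double price;

        Row(String supplier, String article, double price) {
            this.supplier = supplier;
            this.article = article;
            this.price = price;
        }
    }

    private final Map<Integer, Row> rows = new LinkedHashMap<>();

    // Rejects a second row with an already used primary key (uniqueness).
    void insert(int id, Row row) {
        if (rows.putIfAbsent(id, row) != null) {
            throw new IllegalStateException("duplicate primary key: " + id);
        }
    }

    // Looks up exactly one row by its primary key, or null if it does not exist.
    Row findById(int id) {
        return rows.get(id);
    }

    public static void main(String[] args) {
        ArticleTable table = new ArticleTable();
        table.insert(13, new Row("MongoDB", "JSON", 7.88));
        table.insert(21, new Row("Xindice", "XML", 15.67));
        System.out.println(table.findById(21).supplier);
    }
}
```

A second insert with the ID 13 would fail with an exception, which roughly corresponds to a DBMS rejecting a row that violates the primary key constraint.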

Of course, there are also relational databases that operate in a column-oriented rather than row-oriented manner. This enables more efficient queries and analyses, especially with large datasets. Here are some of the main features and benefits of column-oriented databases:

  • Data organization: Stores data in columns, which speeds up the processing of specific columns in queries.
  • Compression: Often offers better compression rates for columnar data because similar data types are stored together.
  • Analytical queries: Optimized for analysis and aggregate queries that need to quickly process large amounts of data.
  • Reduced I/O: Reduces the amount of data that needs to be read from disk, as only the required columns are retrieved.

Column-oriented databases include Apache Cassandra (strictly speaking a wide-column store), SAP HANA, IBM Db2, and Google BigQuery, with classic use cases for:

  • Business Intelligence: Ideal for databases that need to process large amounts of data for analytical purposes.
  • Data Warehousing: Efficient storage and analysis of historical data.
  • Real-time analytics: Suitable for applications that require rapid decisions based on current data.

  ID  supplier  article  price  package  amount
  13  MongoDB   JSON     7.88   piece    1
  21  Xindice   XML      15.67  piece    1

// Row-oriented DBMS
[{13, MongoDB, JSON, 7.88, piece, 1} {21, Xindice, XML, 15.67, piece, 1}]

// Column-oriented DBMS
[{13,21} {MongoDB, Xindice} {JSON, XML} {7.88, 15.67} {piece, piece} {1, 1}]
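
The difference between the two layouts can be sketched in plain Java, assuming the two sample records from above. The class name StorageLayouts and its methods are hypothetical and only illustrate the access pattern; a real DBMS stores the data on disk in a far more elaborate form.

```java
// Sketch only: the same two records laid out row-wise and column-wise.
class StorageLayouts {

    // Row-oriented: one array per record (id, supplier, article, price).
    static final Object[][] ROWS = {
        {13, "MongoDB", "JSON", 7.88},
        {21, "Xindice", "XML", 15.67}
    };

    // Column-oriented: one array per column.
    static final int[] IDS = {13, 21};
    static final String[] SUPPLIERS = {"MongoDB", "Xindice"};
    static final String[] ARTICLES = {"JSON", "XML"};
    static final double[] PRICES = {7.88, 15.67};

    // Row layout: every complete record is visited although only the price is needed.
    static double sumPricesRowWise() {
        double sum = 0.0;
        for (Object[] row : ROWS) {
            sum += (Double) row[3];
        }
        return sum;
    }

    // Column layout: only the price column is read.
    static double sumPricesColumnWise() {
        double sum = 0.0;
        for (double price : PRICES) {
            sum += price;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumPricesRowWise());
        System.out.println(sumPricesColumnWise());
    }
}
```

Both methods return the same total, but the column-wise variant never touches the supplier or article values, which is the reason for the reduced I/O mentioned in the list of benefits.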

To provide data for geographic information systems (GIS) like Google Maps, so-called geospatial databases are used. Geospatial databases are extensions of relational databases that provide tables and relations optimized and standardized for geometric objects. The GIS extension for PostgreSQL is called PostGIS. The datasets for the freely available OpenStreetMap are in a specialized XML format but can also be transformed into geospatial data structures.

Key-value stores are often used in configuration files. However, if you want to build a fast caching system, you need a bit more complexity. This is because the key/value relationship can range from simple strings to complex objects. Basically, a store consists of a unique key to which values can be assigned depending on the data type. Data types can be strings, numbers (integers, floats), Boolean values, and lists. Key-value databases belong to the NoSQL database family because, unlike relational databases, queries are not performed using SQL but are database- and vendor-specific.

Typical key-value databases include Redis, Memcached, Amazon DynamoDB, and the somewhat outdated Berkeley DB, which was acquired by Oracle. A characteristic of in-memory key-value databases such as Redis is that data is held in memory and backed up to disk at regular intervals. Keeping data in RAM naturally requires a machine with sufficient RAM. Especially with large applications, an enormous amount of data can accumulate for caching.
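
The principle of a key-value cache can be sketched in a few lines of plain Java. The class TinyKeyValueCache is a hypothetical illustration and not the API of Redis or any other product; the expiry time (TTL) is an assumption added here because caches typically discard stale entries, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory key-value cache with a simple expiry time per entry.
class TinyKeyValueCache {

    private static final class Entry {
        final Object value;
        final long expiresAtMillis;

        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Stores a value (string, number, Boolean, list, ...) under a unique key.
    // An existing value for the same key is overwritten.
    void put(String key, Object value, long nowMillis, long ttlMillis) {
        store.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Returns the value, or null when the key is unknown or the entry has expired.
    Object get(String key, long nowMillis) {
        Entry entry = store.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            store.remove(key);
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        TinyKeyValueCache cache = new TinyKeyValueCache();
        long now = System.currentTimeMillis();
        cache.put("greeting", "hello", now, 1_000);
        cache.put("tags", List.of("nosql", "cache"), now, 1_000);
        System.out.println(cache.get("greeting", now));
        System.out.println(cache.get("tags", now));
    }
}
```

Everything lives in a HashMap, so the content is lost when the process ends. This is exactly the gap that the periodic backup to disk closes in a real product.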

Another category of databases is embedded databases. “Embedded” refers to the database server itself. Specifically, this means that the database system is not a standalone installation but rather a library integrated into the application. The advantage of this solution is a simpler application installation process. However, this often comes at the expense of security, as many embedded databases lack a dedicated user management layer. This is particularly true for SQLite and the Java-implemented H2. Even the previously mentioned NoSQL database Berkeley DB, available as a Java or C library, lacks user management. This means that anyone with access to the application can use a client to read data from the database. Therefore, these systems are not suitable for applications requiring a high level of security.

Regarding the Java version of Berkeley DB, the last available implementation dates back to 2017 and is available as Java source code with an Apache Ant build, but this code must be compiled manually. An official binary from Oracle is no longer available, but unofficial versions can be found in the Maven Central Repository.

Anyone wanting to integrate a fully functional relational database into their application can use the embedded version of PostgreSQL – pgx – which provides all the functions of the PostgreSQL server locally.

The next class of databases belongs to the NoSQL category: document-based databases. The two DBMSs, MongoDB and CouchDB, are quite similar in their feature set, but there are significant differences.

  • MongoDB is often chosen for applications requiring complex queries and real-time analytics due to its comprehensive query language and high performance.
  • CouchDB is particularly well-suited for applications that require reliability, a distributed architecture, and easy replication, especially in scenarios where offline access is essential.

The fundamental way document databases work is that the schema is derived from the underlying data structure. These data structures are usually in JSON format and are accessed accordingly. Documents of the same data structure are assigned to a collection. Therefore, these databases don’t store classic office documents, but rather formats like JSON and XML. Document databases that specialize in XML include Oracle XML DB and Apache Xindice.

Many web developers specializing in front-end (UX/UI) development frequently use document databases. This allows them to store data in JSON format to simulate RESTful access and thus populate the dynamic content of the user interface.

Graph databases, which represent data as graphs, are a rather exotic variant of NoSQL databases. This storage format allows for the efficient storage of information according to relationships. Such relationships can be links between websites or a person’s connections on social media. Even the complex relationships used in recommendation systems can be represented as graphs. The following figure shows a simple example of a graph database implemented in Java using Neo4j, to illustrate its use case.

Other graph databases include Amazon Neptune and ArangoDB.
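
The idea of storing relationships as a graph can also be sketched without any database. The following plain Java example is explicitly not the Neo4j API; the class FollowGraph and the names in it are hypothetical. It keeps "follows" relationships as an adjacency list and answers the question of whether one person is connected to another through a chain of such relationships.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical directed graph: nodes are people, edges mean "follows".
class FollowGraph {

    private final Map<String, List<String>> edges = new HashMap<>();

    // Adds the relationship "from follows to" and registers both nodes.
    void addFollows(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Breadth-first search: is there a chain of "follows" edges from start to target?
    boolean isReachable(String start, String target) {
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        visited.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                return true;
            }
            for (String next : edges.getOrDefault(current, List.of())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FollowGraph graph = new FollowGraph();
        graph.addFollows("alice", "bob");
        graph.addFollows("bob", "carol");
        System.out.println(graph.isReachable("alice", "carol"));
        System.out.println(graph.isReachable("carol", "alice"));
    }
}
```

A graph database answers this kind of traversal question natively, whereas a relational database would need a series of joins or a recursive query.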

Finally, I’d like to introduce time series. Since monitoring has become essential, especially in the context of application operation, data presented as time series has gained in importance. Typical databases that specialize in processing time series are Prometheus and InfluxDB. However, there are also corresponding extensions for classic relational databases. The PostgreSQL database, which has already been mentioned several times, also has a corresponding extension for this use case called TimescaleDB.
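
What distinguishes time series data is that every value is tied to a timestamp and that queries usually ask for a time window. A minimal sketch in plain Java, assuming a hypothetical class CpuLoadSeries with timestamps in epoch seconds; it has nothing to do with the query languages of Prometheus or InfluxDB:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical time series: timestamp (epoch seconds) -> measured value.
class CpuLoadSeries {

    // TreeMap keeps the entries sorted by timestamp, which makes window queries cheap.
    private final TreeMap<Long, Double> points = new TreeMap<>();

    void record(long epochSecond, double value) {
        points.put(epochSecond, value);
    }

    // Average of all values with fromInclusive <= timestamp < toExclusive.
    // Returns NaN when the window contains no measurement.
    double averageBetween(long fromInclusive, long toExclusive) {
        Map<Long, Double> window = points.subMap(fromInclusive, toExclusive);
        if (window.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double value : window.values()) {
            sum += value;
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        CpuLoadSeries series = new CpuLoadSeries();
        series.record(0, 0.5);
        series.record(60, 1.5);
        series.record(120, 4.0);
        System.out.println(series.averageBetween(0, 120));
    }
}
```

Specialized time series databases add what this sketch leaves out, for example compression, retention policies, and downsampling of old data.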

Of course, much more could be said about this topic. After all, countless books on databases fill several meters of library shelves. However, this should suffice for an introduction and an overview of the various database systems and NoSQL solutions. With the information from this article, you now have an idea of which database is suitable for your specific use case. We have also seen that relational databases, especially the free and open-source database PostgreSQL with its available extensions, are very versatile. Further topics related to databases include data modeling and security against hacker attacks.


The internet never forgets.

The internet has its own unique memory that forgets almost nothing. Part of this memory is archive.org, a project initiated by Brewster Kahle in 1996, which has made it its mission to archive the internet. A central component of archive.org is the Wayback Machine.

According to its own figures, the Wayback Machine has access to a database of approximately 1 trillion web pages. Similar to Google, the Wayback Machine is operated via a simple search field. In this search field, you can search for either a specific internet domain or a specific keyword. If something related to the search term is stored in archive.org’s database, the calendar view shows the date on which a so-called snapshot was created. All content from a domain that was freely accessible on that day was included in the snapshot. This makes it easy to recover content that has already been deleted.
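
Besides the search field, the Wayback Machine can also be queried programmatically. The following sketch assumes the publicly documented availability endpoint at archive.org/wayback/available, which takes a url and an optional timestamp parameter and answers with JSON. The class WaybackLookup and its method names are hypothetical, and the example domain is a placeholder.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper around the Wayback Machine availability endpoint.
class WaybackLookup {

    // Builds the request URI; the timestamp has the form YYYYMMDD (for example 20200101).
    static URI availabilityUri(String pageUrl, String timestamp) {
        String encoded = URLEncoder.encode(pageUrl, StandardCharsets.UTF_8);
        return URI.create("https://archive.org/wayback/available?url=" + encoded
                + "&timestamp=" + timestamp);
    }

    // Sends the request and returns the raw JSON answer. Requires network access.
    static String fetchClosestSnapshot(String pageUrl, String timestamp) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(availabilityUri(pageUrl, timestamp))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Only prints the URI; call fetchClosestSnapshot(...) when network access is available.
        System.out.println(availabilityUri("example.com", "20200101"));
    }
}
```

The JSON answer describes the snapshot closest to the requested date, if one exists. Parsing it is left out here to keep the sketch free of third-party libraries.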

However, when working with the Wayback Machine, you need to be aware of certain conditions. While archive.org is a non-profit organization that is financed by donations, there are still some limitations. Furthermore, archive.org is headquartered in the United States. Considering the enormous costs incurred simply for collecting and storing the data, it’s more than just a suspicion that this project has close ties to government agencies. Official bodies also have considerable reasons for wanting such a service without having to adhere to the strict regulations of official government organizations.

One problem that arises from working with the Wayback Machine is the frequency of changes to the archived homepages. Especially with small websites, several changes are made between snapshots. But even seemingly large websites, like spiegelonline.de, don’t have a daily snapshot, as one might expect. The reasons for this are quite varied. In addition, there are various mechanisms that prevent crawlers from indexing the website. The purpose of such efforts can be, among other things, to limit traffic on the server itself, so that resources are available to readers and not blocked by bots.

Another issue arising from this massive amount of data is, of course, its potential use for training large language models (LLMs). Large platforms fear losing their users, an aspect I addressed back in 2023. In February 2026, there was also a public discussion on this topic between Wayback Machine board member Mark Graham and Nieman Lab, which can also be found as a blog post at archive.org. Most website operators face this problem, as creating and publishing content costs both time and money. In the case of elmar-dott.com, this includes expenses for the server, domain, books, and various subscriptions. Since we explicitly oppose automated content creation, all articles on elmar-dott.com are based on concrete experience and in-depth research into the respective topics. This also means that many of the solutions described are actually used by the authors themselves. To prevent AI from harvesting the content and thus limiting our visitors to web crawlers, high-quality information is only accessible via subscription. This applies particularly to references, source code, and selected articles.

Another aspect, of course, is the trustworthiness of the stored content. Even though archive.org presents itself as a non-profit organization committed to a freely accessible internet, this doesn’t mean that archive.org doesn’t potentially pursue other, unofficial interests. Electronically stored content is known to be easily manipulated. Therefore, the content collected via archiving services should be considered more of an indicator than proof. Of course, there are ways to protect the collected content from alteration. Blockchain technology would be one such way to detect manipulation.

In the premium article “Harvest Time,” I describe how to gather information using various free and paid APIs. The Wayback Machine can also be used for sensitive research tasks. Because, as is so often the case, mistakes happen in business. Small mishaps are simply human, and sometimes companies can ‘accidentally’ publish sensitive internal information. This could be error messages on the website that reveal which DBMS or server is in use. As soon as you become aware that potentially misusable information appears in any database, the first step is to contact the database owner and request its removal. Often, an explanation and a friendly word are all it takes.

Of course, archive.org isn’t solely focused on websites. Its goal is to create a comprehensive library, which naturally includes digitizing copyright-free books, similar to Project Gutenberg. But films, audio, and software can also be found in the archive. Interestingly, archive.org can also be reached on the Tor network under its own .onion domain.

Of course, archive.org isn’t the only organization trying to preserve the internet. The website archive.today also has this goal. However, archive.today’s database isn’t as comprehensive. On the other hand, you can quickly submit your own URL via an input field, and your website will be added to their archive.

As we can see, there are certainly some gems on the internet. You don’t have to be a journalist to delve deeply into research techniques. The field of reconnaissance in cybersecurity also requires a certain amount of intuition. There’s a reason they say: knowledge is power.


Java Enterprise in brief

If you plan to get started with Java Enterprise, it may seem a bit overwhelming and confusing at the beginning. But don’t worry, it’s not as bad as it seems. To get started, you just need to know some basics about the ideas and concepts.

The Java Series

First of all, Java EE is neither a tool nor a compiler that you download and use in the same manner as the Java Development Kit (JDK), also known as the Software Development Kit (SDK). Java Enterprise is a set of specifications. Those specifications are backed by an API, and the API has a reference implementation. The reference implementation is a bundle you can download, called an application server.

Since Java EE 8, the Eclipse Foundation has maintained Java Enterprise. Oracle and the Eclipse Foundation were not able to reach an agreement on the use of the Java trademark, which is owned by Oracle. The short version of the story is that the Eclipse Foundation renamed Java EE to Jakarta EE. This also affects older projects, because the package namespace changed from javax to jakarta in Jakarta EE 9. Jakarta EE 9.1 raised the supported Java version from JDK 8 to JDK 11.

If you want to start developing Jakarta Enterprise [1] applications, you need to meet a few prerequisites. First, you have to choose the right version of the JDK. The JDK already contains the runtime environment, the Java Virtual Machine (JVM), in the same version as the JDK, so you don’t need to install the JVM separately. A good choice for a JDK is always the latest LTS version. The Java 17 JDK was released in 2021 and is supported for three years, until 2024.

If you want to avoid the Oracle license restrictions, you can switch to a free open-source implementation of the JDK. One of the best-known freely available variants is the OpenJDK build from Adoptium [2]. Another interesting implementation is GraalVM [3], which is built on top of OpenJDK. The enterprise edition of GraalVM can speed up your application by a factor of 1.3. For production systems, a commercial license for the enterprise edition is required. GraalVM also includes its own compiler.

  Version      Year               JSR      Servlet  Tomcat  JavaSE
  J2EE 1.2     1999               -        -        -       -
  J2EE 1.3     2001               JSR 58   -        -       -
  J2EE 1.4     2003               JSR 151  -        -       -
  Java EE 5    2006               JSR 244  -        -       -
  Java EE 6    2009               JSR 316  -        -       -
  Java EE 7    2013               JSR 342  -        -       -
  Java EE 8    2017               JSR 366  -        -       -
  Jakarta 8    2019               -        4.0      9.0     8
  Jakarta 9    2020               -        5.0      10.0    8 & 11
  Jakarta 9.1  2021               -        5.0      10.0    11
  Jakarta 10   2022               -        6.0      10.1    11
  Jakarta 11   2023               -        6.1      11.0    17
  Jakarta 12   under development  -        6.2      -       21

The table above is not complete, but the most important current versions are listed. Feel free to send me a message if you have additional information that is missing from this overview.

Be aware that the Jakarta EE specification requires a certain Java SDK, while the application server may need a different JDK as its runtime. The two Java versions don’t have to be the same.

Dependencies (Maven):

<dependency>
    <groupId>jakarta.platform</groupId>
    <artifactId>jakarta.jakartaee-api</artifactId>
    <version>${version}</version>
    <scope>provided</scope>
</dependency> 
<dependency>
    <groupId>org.eclipse.microprofile</groupId>
    <artifactId>microprofile</artifactId> 
    <version>${version}</version>
    <type>pom</type>
    <scope>provided</scope>
</dependency>

In the next step, you have to choose the Jakarta EE environment implementation, which means deciding on an application server. It’s very important that the application server you choose can run on the JVM version installed on your system. The reason is quite simple: the application server itself is implemented in Java. If you plan to develop a Servlet project, it’s not necessary to run a full application server; a simple Servlet container like Apache Tomcat (Catalina) or Jetty contains everything that is required.

Jakarta Enterprise implementations include Payara (a fork of GlassFish), WildFly (formerly known as JBoss), Apache Geronimo, and Apache TomEE. Apache Tomcat and Jetty, as Servlet containers, cover only the web-related part of the specifications.

You may have heard about MicroProfile [4]. Don’t let it confuse you; it’s not as difficult as it seems in the beginning. In general, you can think of MicroProfile as a subset of Jakarta EE for running microservices. MicroProfile has been extended with some technologies to trace, observe, and monitor the status of a service. Version 5 was released in December 2021 and is fully compatible with Jakarta EE 9.


Core Technologies

Plain Old Java Objects

POJOs are simple Java objects without any business logic. This type of Java bean contains only attributes and their corresponding getters and setters. POJOs do not:

  • Extend pre-specified classes: e. g. public class Test extends javax.servlet.http.HttpServlet is not considered to be a POJO class.
  • Contain pre-specified annotations: e. g. @javax.persistence.Entity public class Test is not a POJO class.
  • Implement pre-specified interfaces: e. g. public class Test implements javax.ejb.EntityBean is not considered to be a POJO class.
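
A class that satisfies all three rules could look like the following minimal sketch. The class name Customer and its attributes are hypothetical and chosen only for illustration.

```java
// A POJO: no framework superclass, no framework annotations, no framework interfaces.
class Customer {

    private String name;
    private int age;

    // Public no-argument constructor, as is customary for Java beans.
    public Customer() {
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setName("Ada");
        customer.setAge(36);
        System.out.println(customer.getName() + " (" + customer.getAge() + ")");
    }
}
```

Because the class depends on nothing but the JDK, it can be instantiated and tested without an application server.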

(Jakarta) Enterprise Java Beans

An EJB component, or enterprise bean, is a body of code that has fields and methods to implement modules of business logic. You can think of an enterprise bean as a building block that can be used alone or with other enterprise beans to execute business logic on the Java EE server.

Enterprise beans are either (stateless or stateful) session beans or message-driven beans. Stateless means, when the client finishes executing, the session bean and its data are gone. A message-driven bean combines features of a session bean and a message listener, allowing a business component to receive (JMS) messages asynchronously.

(Jakarta) Servlet

Java Servlet technology lets you define HTTP-specific Servlet classes. A Servlet class extends the capabilities of servers that host applications accessed by way of a request-response programming model. Although Servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers.

(Jakarta) Server Pages

JSP is a UI technology that lets you put snippets of Servlet code directly into a text-based document. JSP files are translated by the JSP compiler into Java Servlets.

(Jakarta) Server Pages Standard Tag Library

The JSTL encapsulates core functionality common to many JSP applications. Instead of mixing tags from numerous vendors in your JSP applications, you use a single, standard set of tags. JSTL has iterator and conditional tags for handling flow control, tags for manipulating XML documents, internationalization tags, tags for accessing databases using SQL, and tags for commonly used functions.

(Jakarta) Server Faces

JSF technology is a user interface framework for building web applications. JSF was introduced to solve a problem of JSP, where program logic and layout were heavily intermixed.

(Jakarta) Managed Beans

Managed Beans, lightweight container-managed objects (POJOs) with minimal requirements, support a small set of basic services, such as resource injection, lifecycle callbacks, and interceptors. Managed Beans represent a generalization of the managed beans specified by Java Server Faces technology and can be used anywhere in a Java EE application, not just in web modules.

(Jakarta) Persistence API

The JPA is a Java standards–based solution for persistence. Persistence uses an object/relational mapping approach to bridge the gap between an object-oriented model and a relational database. The Java Persistence API can also be used in Java SE applications outside of the Java EE environment. EclipseLink is the reference implementation of JPA; Hibernate is another widely used implementation.

(Jakarta) Transaction API

The JTA provides a standard interface for demarcating transactions. The Java EE architecture provides a default auto commit to handle transaction commits and rollbacks. An auto commit means that any other applications that are viewing data will see the updated data after each database read or write operation. However, if your application performs two separate database access operations that depend on each other, you will want to use the JTA API to demarcate where the entire transaction, including both operations, begins, rolls back, and commits.

(Jakarta) API for RESTful Web Services

The JAX-RS defines APIs for the development of web services built according to the Representational State Transfer (REST) architectural style. A JAX-RS application is a web application that consists of classes packaged as a servlet in a WAR file along with required libraries.

(Jakarta) Dependency Injection for Java

Dependency Injection for Java defines a standard set of annotations (and one interface) for use on injectable classes, comparable to what Google Guice or the Spring Framework offer. In the Java EE platform, CDI provides support for Dependency Injection. Specifically, you can use injection points only in a CDI-enabled application.

(Jakarta) Contexts & Dependency Injection for Java EE

CDI defines a set of contextual services, provided by Java EE containers, that make it easy for developers to use enterprise beans along with Java Server Faces technology in web applications. Designed for use with stateful objects, CDI also has many broader uses, allowing developers a great deal of flexibility to integrate different kinds of components in a loosely coupled but typesafe way.

(Jakarta) Bean Validation

The Bean Validation specification defines a metadata model and API for validating data in Java Beans components. Instead of distributing validation of data over several layers, such as the browser and the server side, you can define the validation constraints in one place and share them across the different layers.

(Jakarta) Message Service API

JMS API is a messaging standard that allows Java EE application components to create, send, receive, and read messages. It enables distributed communication that is loosely coupled, reliable, and asynchronous.

(Jakarta) EE Connector Architecture

The Java EE Connector Architecture is used by tools vendors and system integrators to create resource adapters that support access to enterprise information systems that can be plugged in to any Java EE product. A resource adapter is a software component that allows Java EE application components to access and interact with the underlying resource manager of the EIS. Because a resource adapter is specific to its resource manager, a different resource adapter typically exists for each type of database or enterprise information system.

The Java EE Connector Architecture also provides a performance-oriented, secure, scalable, and message-based transactional integration of Java EE platform–based web services with existing EISs that can be either synchronous or asynchronous. Existing applications and EISs integrated through the Java EE Connector Architecture into the Java EE platform can be exposed as XML-based web services by using JAX-WS and Java EE component models. Thus JAX-WS and the Java EE Connector Architecture are complementary technologies for enterprise application integration (EAI) and end-to-end business integration.

(Jakarta) Mail API

Java EE applications use the JavaMail API to send email notifications. The JavaMail API has two parts:

  • An application-level interface used by the application components to send mail
  • A service provider interface

The Java EE platform includes the JavaMail API with a service provider that allows application components to send Internet mail.

(Jakarta) Authorization Contract for Containers

The JACC specification defines a contract between a Java EE application server and an authorization policy provider. All Java EE containers support this contract. The JACC specification defines java.security.Permission classes that satisfy the Java EE authorization model. The specification defines the binding of container-access decisions to operations on instances of these permission classes. It defines the semantics of policy providers that use the new permission classes to address the authorization requirements of the Java EE platform, including the definition and use of roles.

(Jakarta) Authentication Service Provider Interface for Containers

The JASPIC specification defines a service provider interface (SPI) by which authentication providers that implement message authentication mechanisms may be integrated in client or server message-processing containers or runtimes. Authentication providers integrated through this interface operate on network messages provided to them by their calling containers. The authentication providers transform outgoing messages so that the source of each message can be authenticated by the receiving container, and the recipient of the message can be authenticated by the message sender. Authentication providers authenticate each incoming message and return to their calling containers the identity established as a result of the message authentication.

(Jakarta) EE Security API

The purpose of the Java EE Security API specification is to modernize and simplify the security APIs by simultaneously establishing common approaches and mechanisms and removing the more complex APIs from the developer view where possible. Java EE Security introduces the following APIs:

  • SecurityContext interface: Provides a common, uniform access point that enables an application to test aspects of caller data and grant or deny access to resources.
  • HttpAuthenticationMechanism interface: Authenticates callers of a web application, and is specified only for use in the servlet container.
  • IdentityStore interface: Provides an abstraction of an identity store that can be used to authenticate users and retrieve caller groups.

(Jakarta) Java API for WebSocket

WebSocket is an application protocol that provides full-duplex communications between two peers over TCP. The Java API for WebSocket enables Java EE applications to create endpoints using annotations that specify the configuration parameters of the endpoint and designate its lifecycle callback methods.

(Jakarta) Java API for JSON Processing

The JSON-P enables Java EE applications to parse, transform, and query JSON data using the object model or the streaming model.

JavaScript Object Notation (JSON) is a text-based data exchange format derived from JavaScript that is used in web services and other connected applications.

(Jakarta) Java API for JSON Binding

The JSON-B provides a binding layer for converting Java objects to and from JSON messages. JSON-B also supports the ability to customize the default mapping process used in this binding layer through the use of Java annotations for a given field, JavaBean property, type or package, or by providing an implementation of a property naming strategy. JSON-B is introduced in the Java EE 8 platform.

(Jakarta) Concurrency Utilities for Java EE

Concurrency Utilities for Java EE is a standard API for providing asynchronous capabilities to Java EE application components through the following types of objects: managed executor service, managed scheduled executor service, managed thread factory, and context service.

(Jakarta) Batch Applications for the Java Platform

Batch jobs are tasks that can be executed without user interaction. The Batch Applications for the Java Platform specification is a batch framework that provides support for creating and running batch jobs in Java applications. The batch framework consists of a batch runtime, a job specification language based on XML, a Java API to interact with the batch runtime, and a Java API to implement batch artifacts.

Resources

Notice: I try to keep this post up to date, but mistakes can happen. Please feel free to drop me a message if you spot a mistake or have suggestions. If you like this article, it would be great if you left a thumbs up and shared it with friends and colleagues.

It doesn’t always have to be Kali Linux!

Kali Linux [1] and Parrot Linux [2] are considered the first choice among Linux distributions when it comes to security and penetration testing. Many relevant programs are already preinstalled on these distributions and can be used out of the box, so to speak.

However, it must also be said that Kali and Parrot are not necessarily the most suitable Linux distributions for everyday use due to their specialization. For daily use, Ubuntu for beginners and Debian for advanced users are more common. For this reason, Kali and Parrot are usually set up and used as virtual machines with VirtualBox or VMware Player. This is a very practical approach, especially when you want to take a look at the distribution before installing it natively on the computer.

In my opinion, the so-called distribution hopping that some people do under Linux is more of a hindrance to getting used to a system in order to be able to work with it efficiently. Which Linux you choose depends primarily on your own taste and the requirements of what you want to do with it. Developers and system administrators will likely have an inclination toward Debian, a version from which many other distributions were derived. Windows switchers often enjoy Linux Mint, and the list goes on.

If you want to feel like a hacker, you can opt for a Kali installation. Things like privacy and anonymous surfing on the Internet are often the actual motives. I had already introduced Kodachi Linux, which specializes in anonymous surfing on the Internet. Of course, it must be made very clear that there is no real anonymous communication on the Internet. However, you can massively reduce the number of possible eavesdroppers with a few easy-to-implement measures. I have addressed the topic of privacy in several articles on this blog. Even if it is an unpopular opinion for many: a Linux VM that is used for anonymous surfing on top of an Apple or Windows host operating system completely misses its purpose.

The first point in the “privacy” section is the internet browser. No matter which one you use and how much the different manufacturers emphasize privacy protection, the reality is like the fairy tale “The Emperor’s New Clothes”. Most users know the Tor / Onion network by name. Behind it is the Tor Browser, which you can easily download from the Tor Project website [3]. After downloading and unpacking the archive, the Tor Browser can be started from the console using the start script.

./Browser/start-tor-browser

Anyone using the Tor network can visit URLs ending in .onion. A large number of these sites are known as the so-called dark web and should be surfed with great caution. You can come across very disturbing and illegal content here, but you can also fall victim to phishing attacks and the like. Without going into too much detail about exactly how the Tor network works, you should be aware that you are not completely anonymous here either. Even if the big tech companies are largely ignored, authorities certainly have resources and options, especially when it comes to illegal actions. There are enough examples of this in the relevant press.

If you now think about how the Internet works in broad terms, you will find the next important point: proxy servers. A proxy server acts as an intermediary. Similar to the Tor network, requests are not sent directly to the target website, but rather via a third-party server that forwards the request and then returns the answer. For example, if you access the Google website via a proxy, Google will only see the IP address of the proxy server. Even your own provider only sees that you have sent a request to a specific server. The provider does not see in its own log files that this server then makes a request to Google. Only the proxy server appears on both sides, at the provider and on the target website. As a rule, proxy server operators assure that they do not store any logs with the original IP of their clients. Unfortunately, there is no guarantee for these statements. In order to further reduce the probability of being detected, you can connect several proxies in series. With the console program proxychains, this can be easily implemented. On Debian-based distributions, proxychains is quickly installed using the APT package manager.

sudo apt-get install proxychains4

Using it is just as easy. The behavior of proxychains is specified via the configuration file /etc/proxychains4.conf. If you change the working mode from strict_chain to random_chain, a different combination of the listed proxy servers is randomly assembled for each connection. At the end of the configuration file you can enter the individual proxy servers. Some examples are included in the file. To use proxychains, you simply call it via the console, followed by the application (the browser), which then establishes the connection to the Internet via the proxies.

proxychains4 firefox

## RFC6890 Loopback address range
## if you enable this, you have to make sure remote_dns_subnet is not 127
## you'll need to enable it if you want to use an application that
## connects to localhost.
# localnet 127.0.0.0/255.0.0.0
# localnet ::1/128

The real challenge is finding suitable proxy servers. To get started, you can find a large selection of free proxies worldwide at [4].

Using proxies alone for connections to the Internet only offers limited anonymity. In order for two computers to communicate, an IP address is required that can be linked via the Internet access provider to the correct geographical address where the computer is located. However, additional information is sent to the network via the network card: the so-called MAC address, with which a computer can be directly identified. Since you don’t want to install a new network card every time you restart your computer just to get a different MAC address, you can use a small, simple tool called macchanger. Like proxychains, it can easily be installed via APT. After installation you can enable the autostart, and you have to decide whether you always want to use the same MAC address or a randomly generated one each time.
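To illustrate what a randomly generated MAC address looks like, here is a minimal Java sketch. It is not how macchanger works internally; the class and method names are my own, hypothetical choices. The only rule it relies on is that in the first byte the second-lowest bit marks an address as locally administered, while the lowest bit must stay 0 for a unicast address.

```java
import java.security.SecureRandom;

class RandomMac {

    // Generates a random, locally administered unicast MAC address
    // (illustration only, this does not change any network interface).
    static String randomMac(SecureRandom random) {
        byte[] mac = new byte[6];
        random.nextBytes(mac);
        mac[0] |= 0x02;  // set the "locally administered" bit
        mac[0] &= ~0x01; // clear the multicast bit -> unicast address

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mac.length; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02x", mac[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomMac(new SecureRandom()));
    }
}
```

Each call prints a different address in the usual notation of six hexadecimal pairs separated by colons.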

Of course, the measures presented so far are only of any use if the connection to the Internet is encrypted. This happens via Transport Layer Security (TLS), the successor to the so-called Secure Sockets Layer (SSL). If you do not connect to the Internet via a VPN and the websites you access only use http instead of https, anyone can use a packet sniffer (e.g. the Wireshark program) to record the communication and read its content in plain text. In this way, passwords or confidential messages are intercepted on public networks (WiFi). We can safely assume that Internet providers run all of their customers’ communications through so-called packet filters in order to detect suspicious actions. With https connections, these filters cannot look into the packets.

Now you could come up with the idea of illegally connecting to a foreign network using all the measures described so far. After all, no one knows that you are there and all activities on the Internet are assigned to the connection owner. For this reason, I would like to expressly point out that in pretty much all countries such actions are punishable by law and if you are caught doing so, you can quickly end up in prison. If you would like to find out more about the topic of WiFi security in order to protect your own network from illegal access, you will find a detailed workshop on Aircrack-ng in the members’ area (subscription).

The next item on the privacy list is email. For most people, running their own email server is simply not possible. The effort is enormous and not entirely cost-effective. That’s why offers from Google, Microsoft and Co. to provide an email service are gladly accepted. Anyone who does not use this service via a local client and does not cryptographically encrypt the emails sent can be sure that the email provider will scan and read the emails. Without exception! Since configuring a mail client with functioning encryption is more of a geek topic, just like running your own mail server, the options here are very limited. The only solution is the Swiss provider Proton [5], which also provides free email accounts. Proton promotes the protection of its customers’ privacy and implements this through strict encryption. Everyone has to decide for themselves whether they should still send confidential messages via email. Of course, this also applies to the available messengers, which are now used a lot for telephony.

Many people have googled themselves to find out what digital traces they have left behind on the Internet. Of course, this is only scratching the surface, as HR people at larger companies and corporations use more effective ways. Maltego is a very professional tool, but there is also a powerful tool in the open source area that can reveal a lot of things. There is also a corresponding workshop for subscribers on this subject. Because if you find your traces, you can also start to cover them up.

As you can see, the topic of privacy and anonymity is very extensive and is only covered superficially in this short article. Nevertheless, the depth of information is sufficient to get a first impression of the matter. It’s not nearly enough to set up a system like Kali if you don’t know the basics needed to use the tools correctly. Because if you don’t put the different pieces of the puzzle together accurately, the hoped-for effect of more privacy on the Internet through anonymity will fail to materialize. This article also explains my personal point of view on a technical level as to why there is no such thing as secure, anonymous electronic communication. Anyone who wants to familiarize themselves with the topic will achieve success more quickly with a sensible strategy and their own system, which is gradually expanded, than with a ready-made all-round tool like Kali Linux.

Resources


Age verification via systemd in Linux distributions

Since 2025, several countries have already introduced age verification for using social media and the internet in general. Australia and the United Kingdom are leading the way in this trend. Several US states have also followed suit. Age verification is slated to be rolled out across the EU by 2027. Italy and France have already passed corresponding laws. The new government that has been in power in Germany since the beginning of 2025 also favors this form of paternalism. This was demonstrated by a clause in the coalition agreement that stipulates the nationwide introduction of eID in Germany. In this article, I will outline the social and technical aspects that will inevitably affect us citizens.

Under the guise of protecting minors, children and young people under 16 are to be denied access to harmful content such as pornography. Social media platforms like Facebook, X, and others will also be affected by these measures. Already, various types of content on YouTube are only accessible to registered users.

If the well-being of children were truly the priority, the focus would be on fostering their development into stable and healthy personalities. This begins with balanced, healthy school meals, which should be available to every student at an affordable price. Teaching media literacy in schools would also be a step in the right direction. These are just a few examples demonstrating that the justification for introducing age verification is a smokescreen and that fundamentally different goals are being pursued.

It’s about paternalism and control over every single citizen. It’s a violation of the right to self-determination. Because one thing must be clear to everyone: to ensure that a person is indeed of legal age for accessing restricted content, everyone who wants to view it must provide proof of age. This proof will only be possible with an eID. Once a critical mass is reached using their eID, this will become the standard for payments and all sorts of other things. It sounds somewhat prophetic, especially if you’re familiar with the Book of Revelation in the New Testament.

The second beast caused everyone—great and small, rich and poor, master and slave—to receive a mark on their right hand or forehead. Without this mark, no one could buy or sell anything. Revelation 13:16

It is therefore foreseeable that an individual’s refusal to accept the eID will completely exclude them from the digital world. Simultaneously, opportunities that provide alternatives in real life, the so-called analog realm, will disappear. However, I don’t want to be too prophetic here. Everyone can imagine for themselves what consequences the introduction of the digital ID will have on their own lives. I will now delve into some technical details and offer some food for thought regarding civic self-defense. Because I am quite certain that there is broad acceptance of the eID. Even if the specific reasons vary, they can be reduced to personal comfort and convenience. Anyone who continues reading from here on is fully responsible for implementing things independently and acquiring the necessary knowledge. There will be no quick, easy, off-the-shelf solution. But you don’t have to be a techie either. The willingness to think independently is perfectly sufficient to quickly understand the technical connections. It’s not rocket science, as they say.

People who rely on Apple or Microsoft products have no choice but to switch to open-source operating systems. Smartphones simply don’t offer a practical alternative to banking apps and messaging services. There’s a reason why you need a working phone number to register for Telegram and Signal Messenger: chats are synchronized from the phone to the desktop application. So, you’re left with your computer, which ideally shouldn’t be newer than 2020. I’ve already published an article on this topic.

All Linux distributions run smoothly on older and even low-performance hardware. Switching to Linux is now easy, and you’ll be used to the new system in just a few weeks. So far, so good.

However, since calendar week 13 of 2026, the Linux community has been up in arms across all social media. The systemd project merged a commit into its public source code repository that adds a birth date field for age verification. Anyone thinking, “Oh well, just one program, I’ll ignore it,” should know that systemd stands for System Daemon. Besides the kernel, it’s one of the most important programs in a Linux distribution. Among other things, it’s responsible for starting necessary services and programs when the computer is turned on.

This is the same record that already holds basic user metadata like realName, emailAddress, and location. The field stores a full date in YYYY-MM-DD format and can only be set by administrators, not by users themselves.

Lennart Poettering, the creator of systemd, has clarified that this change is:

An optional field in the userdb JSON object. It’s not a policy engine, not an API for apps. We just define the field, so that it’s standardized iff people want to store the date there, but it’s entirely optional.

Source: It’s FOSS

All these events also shed new light on the meeting between Linus Torvalds and Bill Gates on June 22, 2025, their first personal encounter in 30 years. It’s absolutely unacceptable in the Linux community to patronize computer users and infringe on their privacy. And there are strong voices opposing the systemd project. However, it’s impossible to predict how strong this resistance will remain if government pressure is exerted on these staunch dissenters.

The first approach to solving this problem is to use a Linux distribution that doesn’t use systemd. Well-known distributions that manage without systemd include Gentoo, Slackware, and Alpine Linux. Those who, like myself and many others, use a pure Debian system might want to take a look at Devuan (version 6.1 Excalibur for March 2026), which is a fork of current Debian versions that doesn’t use systemd.

It’s also worth mentioning that systemd has always been viewed critically by hardcore Linux users. It’s simply considered too bloated. Those who have been running their distribution for a while often hesitate to switch. Linux is like a fine wine. It matures with time, and fresh installations are considered unnecessary by power users, as everything can easily be repaired. Migrations to newer major versions are also generally trouble-free. Therefore, replacing systemd with the more lightweight SysVinit is no problem. The only requirement is that you’re not afraid of the Linux Bash shell. However, there are limits here as well. Those using the GNOME 3 desktop should first switch to a desktop environment that isn’t based on systemd. Devuan Linux shows us the alternatives: KDE Plasma, MATE (a GNOME 2 fork), Cinnamon (for Windows switchers), or the rudimentary Xfce. Before starting, you should at least back up your data for security reasons and, if possible, clone your hard drive to restore the original state in case of problems.

Since I haven’t yet found the time to try out the tutorial myself due to the topic’s current relevance, I refer you to the English-language website linuxconfig.org, which provides instructions on replacing systemd with sysVinit in Debian.

It’s probably like so many things: things are never as bad as they seem. I don’t think the mandatory digital ID will arrive overnight. It will likely be a gradual process that makes life difficult for those who resist total control by authoritarian authorities. There will always be a way for determined individuals to find a solution. But to do so, one must take action and not passively wait for the great savior. He was here before, a very long time ago.

A Handful of JAVA Key Features of Each Version

The object-oriented programming language Java was designed by James Gosling. The first version was released in 1995 by Sun Microsystems. After Oracle acquired Sun Microsystems in 2010, Java became part of Oracle’s product portfolio.


The Java Series

  • Java 21 LTS
  • Java 20
  • Java 19
  • Java 18
  • Java 17 LTS
  • Java 16
  • Java 15
  • Java 14
  • Java 13
  • Java 12
  • Java 11 LTS
  • Java 10
  • Java 9
  • Java 8
  • Java 7
  • Java 6
  • Java 5 / J2SE 5.0
  • Java 1.4
  • Java 1.3
  • Java 1.2 / Java2

To run Java programs on your computer, you need a virtual machine (JVM). The download for this JVM is called the JRE (Java Runtime Environment). If you want to develop in the Java programming language yourself, you need the corresponding compiler, which is known by the general term SDK (Software Development Kit) or, more specifically, JDK (Java Development Kit). The JDK naturally includes the corresponding runtime environment (the JVM).
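If you are unsure which runtime is actually installed, a few lines of Java can tell you. The following is a minimal sketch with a hypothetical class name; it assumes Java 10 or newer, because Runtime.version().feature() does not exist in older versions. ToolProvider.getSystemJavaCompiler() returns null when no compiler is available, which is typically the case on a runtime-only installation.

```java
import javax.tools.ToolProvider;

class RuntimeInfo {

    // Feature release number of the running JVM, e.g. 17 or 21.
    static int featureVersion() {
        return Runtime.version().feature();
    }

    // A JDK ships the Java compiler; a pure runtime typically does not.
    static boolean hasCompiler() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    public static void main(String[] args) {
        System.out.println("Java feature version: " + featureVersion());
        System.out.println("JVM: " + System.getProperty("java.vm.name"));
        System.out.println("Compiler available (JDK): " + hasCompiler());
    }
}
```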

With the release of Java 9 in September 2017, Oracle announced a six-month release cycle for future Java versions. This means a new version of the popular programming language is released every year in March and September. But don’t worry, you are not forced to update your own Java version on this cycle. To provide companies with sufficient stability for their IT infrastructure, selected releases are designated as LTS (Long Term Support) versions and are maintained for several years. Initially this was every sixth release, i.e. one LTS version every three years; since Java 17, every fourth release, i.e. one every two years, is an LTS version. I have already published a more detailed article on this topic.

With the frequent release cycle, each new Java version naturally brings several new key features to the core language. It’s easy to lose track of everything. Therefore, I’ve compiled a brief overview. For those who would also like an overview of the individual Java Enterprise versions, I recommend my corresponding article on Java EE.

Version overview

To avoid getting bogged down in details, I’ll start with version 1.2, also known as Java 2, which was released in 1998. The most important features of Java 2 were the graphical Swing API and the introduction of the Just-In-Time (JIT) compiler.

In 2000, version 1.3 was released, featuring JNDI (Java Naming and Directory Interface) and JPDA (Java Platform Debugger Architecture).

Just two years later, in 2002, version 1.4 was released, expanding the available standard library with features such as regular expressions, a logging API, an integrated XML & XSLT processor, an Image I/O API, and New I/O.
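As a small illustration of two of these additions, the following sketch combines regular expressions (java.util.regex) with the logging API (java.util.logging). The class name, the method, and the sample text are my own assumptions for demonstration purposes.

```java
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexAndLogging {

    private static final Logger LOG =
            Logger.getLogger(RegexAndLogging.class.getName());

    // Matches version numbers of the form "major.minor", e.g. "1.4".
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)");

    // Returns the minor number of the first version found in the text, or -1.
    static int minorVersion(String text) {
        Matcher matcher = VERSION.matcher(text);
        if (!matcher.find()) {
            LOG.warning("no version number found in: " + text);
            return -1;
        }
        return Integer.parseInt(matcher.group(2));
    }

    public static void main(String[] args) {
        System.out.println(minorVersion("Java 1.4 was released in 2002"));
    }
}
```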

With the release of version 1.5 in 2004, the version numbering system changed. Since then, Java has been numbered by its major version, so this release is referred to as Java 5 (officially J2SE 5.0). Just like version 1.2, Java 5 brings a host of changes for everyday development. These include autoboxing/unboxing, annotations, enums, and generics.
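The four features named above are easiest to understand side by side. The following sketch is my own minimal example, with hypothetical class and method names, and deliberately uses the pre-Java-7 syntax without the diamond operator.

```java
import java.util.ArrayList;
import java.util.List;

class Java5Features {

    // enum: a type-safe set of constants
    enum Level { LOW, MEDIUM, HIGH }

    // generics: the element type is checked at compile time
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (Integer n : numbers) { // enhanced for loop, also new in Java 5
            total += n;             // auto-unboxing: Integer -> int
        }
        return total;
    }

    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // annotation: the compiler verifies that a method is really overridden
        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(1); // autoboxing: int -> Integer
        numbers.add(2);
        numbers.add(3);
        System.out.println(sum(numbers));
        System.out.println(Level.valueOf("HIGH"));
        System.out.println(new Point(1, 2));
    }
}
```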

Released in 2006, Java6 SE extended XML functionality with the StAX parser, introduced JAX-WS for web services, and brought scripting language support, enabling scripting languages like JavaScript to run on the JVM. Independently of this scripting support, languages that compile to JVM bytecode became popular; the most important examples are Scala (2004) and Kotlin (2011, JetBrains).

Five years later, and for the first time under Oracle’s direction, Java7 SE was released in 2011. This Java release introduced the diamond operator <>. Further measures to simplify the language and increase expressiveness included simplified handling of varargs methods and the use of strings in switch statements. The concerns that exceptions are slow and should be used sparingly were also addressed, and exception handling was improved with multi-catch and try-with-resources.

// Diamond Operator
List<String> myList = new ArrayList<>();

// varargs (variable-length arguments)
public void myMethod(String... args) { }
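Strings in switch statements and the improved exception handling of Java 7 can be sketched as follows. The class, the method names, and the sample logic are hypothetical; the example only shows the syntax and makes no claim to completeness.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class Java7Features {

    // strings in switch statements
    static int daysIn(String month) {
        switch (month) {
            case "February":
                return 28; // leap years are ignored for brevity
            case "April":
            case "June":
            case "September":
            case "November":
                return 30;
            default:
                return 31;
        }
    }

    // try-with-resources closes the reader automatically;
    // multi-catch handles several exception types in one block
    static int parseFirstLine(String input) {
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line = reader.readLine();
            return line == null ? -1 : Integer.parseInt(line.trim());
        } catch (IOException | NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(daysIn("June"));
        System.out.println(parseFirstLine("42\nsecond line"));
    }
}
```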

Java8 SE took another significant step forward with its release in 2014. In this release, Oracle provided the Java community with long-awaited features such as lambda functions and the Stream API, to name just a few of the most important new features. Furthermore, the Date & Time API was completely redesigned, with changes inspired by the then widely used JODA-Time library.

// ForEach Lambda
myList.forEach(element -> System.out.println(element));

//Stream API
myList.stream()
    .filter(element -> element.startsWith("A"))
    .forEach(System.out::println);
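The redesigned Date & Time API lives in the package java.time. A minimal sketch, assuming an arbitrary example date and a hypothetical helper method, shows its most important property: the objects are immutable, so every calculation returns a new instance.

```java
import java.time.LocalDate;
import java.time.Period;
import java.time.format.DateTimeFormatter;

class DateTimeExample {

    // plusMonths() does not modify the original date, it returns a new one
    static LocalDate sixMonthsLater(LocalDate date) {
        return date.plusMonths(6);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2017, 9, 1); // arbitrary example date
        LocalDate later = sixMonthsLater(start);

        System.out.println(start.format(DateTimeFormatter.ISO_LOCAL_DATE));
        System.out.println(later.format(DateTimeFormatter.ISO_LOCAL_DATE));
        System.out.println(Period.between(start, later).getMonths());
    }
}
```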

The 2017 release of Java9 SE marked a turning point. Due to the introduction of the module system, the entire standard library was reworked so that the individual APIs were no longer bundled into a single, gigantic JAR file. The resulting modules now required less memory. These massive changes demanded enormous effort from many long-standing projects to migrate to the new Java version. Furthermore, Java 9 introduced the Java Shell, or JShell for short, a command-line tool that allows Java functions to be executed as scripts.

// module-info.java
module com.example.myapp {
  requires java.base;
  exports com.example.myapp;
}

Further significant changes arrived with Java10 SE, which was released in March 2018, just six months after Java 9, and necessitated additional migration efforts. This release introduced local variable type inference with its associated keyword var. Furthermore, Release 10 also marked the start of Time-Based Versioning for the Java language. A new major version is released every six months, typically in March and September. An LTS (Long Term Support) version was initially planned every three years.

// the var KeyWord
public static void main(String[] args) {
  var message = "Hello World, Java10!";
  System.out.println(message);
}

The first Long Term Support version for Java was released in the fall of 2018. Release Java11 carries the LTS designation and receives updates for three years. Due to the short release cycles, this version contains only a few API changes. The String class, in particular, received many new methods designed to simplify working with strings.

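// String functions
String text = " Hello Java 11 LTS world! ";
text.strip();          // returns the text without leading and trailing whitespace
text.isBlank();        // false, the text contains more than whitespace
text.lines().count();  // 1, the text consists of a single line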

The most important change in the Java12 SE release, released in spring 2019, besides several API extensions, is the ability to use expressions in switch statements.

// switch expressions
var day = 1;
String weekday = switch (day) {
    case 1 -> "Monday";
    case 2 -> "Tuesday";
    // ... remaining days
    default -> "";
};

Java13 SE was released as planned in autumn 2019 and improved expressions in switch statements. It also introduced text blocks, which eliminate the need for resource-intensive string concatenations.

// textblock
String text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
aliquip ex ea commodo consequat. Duis aute irure dolor in 
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla 
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in 
culpa qui officia deserunt mollit anim id est laborum.
""";

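With the release of Java14 SE in March 2020, records were introduced as a preview feature. These data structures are intended to make code more compact and readable: the constructor, the accessor methods, equals(), hashCode(), and toString() are generated automatically. Because records are immutable, there are no setters.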

// records
public record Person (String name, String address) {}
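What the compiler generates for such a record becomes visible when you use it. The following sketch nests the record in a hypothetical demo class and adds a compact constructor for validation, which is my own addition and not required. It assumes Java 16 or newer, where records are a standard feature.

```java
class RecordExample {

    record Person(String name, String address) {
        // compact constructor: runs before the fields are assigned
        Person {
            if (name == null || name.isBlank()) {
                throw new IllegalArgumentException("name must not be blank");
            }
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Ada", "London");
        Person b = new Person("Ada", "London");

        System.out.println(a.name());    // generated accessor without "get" prefix
        System.out.println(a.equals(b)); // generated equals() compares the values
        System.out.println(a);           // generated toString()
    }
}
```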

The release of Java15 SE in autumn 2020 standardized text blocks and introduced sealed classes as a preview feature. Sealed means that a class can only be extended by the subclasses explicitly named in its permits clause.

// inheritance protection
public sealed class Shape permits Circle {
    // Class body
}

public final class Circle extends Shape {
    // Class body
}

In 2021, the release of Java16 SE standardized various features, including pattern matching for instanceof and records. Sealed classes remained a preview feature and were finalized in Java 17.

// Pattern Matching instanceof with auto cast
if (obj instanceof String s) {
    System.out.println(s);
}

Java17 SE LTS was released as planned in autumn 2021, replacing Java 11 SE LTS after three years. In this version, the Java Applet API and the Security Manager were marked as deprecated.

Java18 SE, released in spring 2022, established UTF-8 as the default charset of the Java APIs. The Vector API, first incubated in Java 16, continued in this release as an incubator module with experimental status.

// Java Vector API (incubator module: compile and run with
// --add-modules jdk.incubator.vector)
import jdk.incubator.vector.*;

public class VectorExample {
    // Use preferred vector species for optimal CPU performance
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Vectorized implementation using the Vector API
    static void sqrtsumVector(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length); // Efficient loop bound

        // Process data in chunks matching the vector length
        for (; i < upperBound; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va)           // a[i]²
                      .add(vb.mul(vb))    // + b[i]²
                      .neg();             // - (a[i]² + b[i]²)
            vc.intoArray(c, i);           // Store result
        }

        // Handle remaining elements (tail case) with scalar loop
        for (; i < a.length; ++i) {
            c[i] = -(a[i] * a[i] + b[i] * b[i]);
        }
    }
}   

Released in September 2022, Java19 SE included several optimizations and integrated the Loom project to enable virtual threads, initially as a preview feature. The Foreign Function & Memory API was also incorporated as a preview API.

// Native C memory allocation & slicing in Java (API as finalized in Java 22)
Arena arena = Arena.ofAuto();
MemorySegment memorySegment = arena.allocate(12);

MemorySegment segment1 = memorySegment.asSlice(0, 4);
MemorySegment segment2 = memorySegment.asSlice(4, 4);
MemorySegment segment3 = memorySegment.asSlice(8, 4);

VarHandle intHandle = ValueLayout.JAVA_INT.varHandle();

intHandle.set(segment1, 0, Integer.MIN_VALUE);
intHandle.set(segment2, 0, 0);
intHandle.set(segment3, 0, Integer.MAX_VALUE);

Java20 SE, released as usual in the spring of 2023, didn’t introduce any new features but focused on stabilizing existing ones.

In the fall of 2023, Java21 SE LTS, a Long Term Support version, was released. This release finalized Virtual Threads.

import java.util.concurrent.Executors;

public class VirtualThreadsExample {
  public static void main(String[] args) {
    // try-with-resources: close() waits until all submitted tasks are done
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
      executor.submit(() ->
        System.out.println("Running in virtual thread"));
      executor.submit(() ->
        System.out.println("Running another virtual thread"));
    }
  }
}

High-performance hardware under Linux for local AI applications

Anyone wanting to experiment a bit with local LLMs will quickly run into hardware limitations. Not everyone has a massively upgraded desktop PC with 2 TB of RAM and a CPU that could fry an egg under full load. A laptop with 32 GB of RAM, or in my case, a Lenovo P14s with 64 GB of RAM, is more typical. Despite this generous configuration, such a machine often fails to load a more demanding AI model, as 128 GB of RAM is fairly standard for many of these models. And you can’t upgrade the RAM in current laptops because the chips are soldered directly onto the motherboard. We have the same problem with the graphics card, of course. That’s why I’ve made it a habit when buying a laptop to configure it with almost all the available options, hoping to be set for 5-8 years. The quality of the Lenovo ThinkPad series, in particular, hasn’t disappointed me in this regard. My current system is about two years old and is still running reliably.

I’ve been using Linux as my operating system for years, and I’m currently running Debian 13. Compared to Windows, Linux and Unix distributions are significantly more resource-efficient and don’t use their resources for graphical animations and complex gradients, but rather provide a powerful environment for the applications they’re used in. Therefore, my urgent advice to anyone wanting to try local LLMs is to get a powerful computer and run Linux on it. But let’s take it one step at a time. First, let’s look at the individual hardware components in more detail.

Let’s start with the CPU. LLMs, CAD applications, and even computer games all perform calculations that can be processed very effectively in parallel. For parallel calculations, the number of available CPU cores is a crucial factor. The more cores, the more parallel calculations can be performed.
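How many cores the JVM can actually use, and how a calculation is spread across them, can be tried out with a few lines of Java. The following sketch is my own example with hypothetical names; it sums the squares of the numbers from 1 to n once sequentially and once with a parallel stream. It demonstrates the principle and is not a benchmark.

```java
import java.util.stream.LongStream;

class ParallelSum {

    // Sums i * i for i = 1..n, optionally distributed across the CPU cores.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(i -> i * i).sum();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available CPU cores: " + cores);
        System.out.println(sumOfSquares(1_000, false));
        System.out.println(sumOfSquares(1_000, true));
    }
}
```

Both calls return the same result; only the way the work is distributed differs.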

Of course, the processors need to be able to quickly request the data for the calculations. This is where RAM comes into play. The more RAM is available, the more efficiently the data can be provided for the calculations. Affordable laptops with 32 GB of RAM are already available. Of course, the purchase price rises steeply with more RAM. While there are certainly some high-end gaming devices in the consumer market, I wouldn’t recommend them due to their typically short lifespan and comparatively high price.

The next logical step in the hardware chain is the storage drive. Simple SSDs significantly accelerate data transfer to RAM, but there is still room for improvement. NVMe drives with 2 TB of storage capacity or more can reach speeds of up to 7000 MB/s in the 4th generation.

We have some issues with graphics cards in laptops. Due to their size and the required performance, the graphics cards built into laptops are more of a compromise than a true highlight. A good graphics card would be ideal for parallel calculations, such as those performed in LLMs (Large Language Models). As a solution, we can connect the laptop to an external graphics card. Thanks to Bitcoin miners in the crypto community, considerable experience has already been gained in this area. However, to connect an external graphics card to the laptop, you need a port that can handle that amount of data. USB 3 is far too slow for our purposes and would severely limit the advantages of the external graphics card due to its low data rate.

The solution to our problem is Thunderbolt. Thunderbolt ports look like USB-C, but are significantly faster. You can identify Thunderbolt by the small lightning bolt symbol (see Figure 1) on the cables or connectors. These are not the power supply connections. To check if your computer has Thunderbolt, you can use a simple Linux shell command.

ed@local: $ lspci | grep -i thunderbolt
00:07.0 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #0
00:07.2 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #2
00:0d.0 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 USB Controller
00:0d.2 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #0
00:0d.3 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #1

In my case, my computer’s output shows that two Thunderbolt 4 ports are available.

To connect an external graphics card, we need a mounting system onto which a PCI card can be inserted. ANQUORA offers a good solution here with the ANQ-L33 eGPU Enclosure. The board can accommodate a graphics card with up to three slots. It costs between €130 and €200. A standard ATX power supply is also required. The required power supply wattage depends on the graphics card’s power consumption. It’s advisable not to buy the cheapest power supply, as the noise level might bother some users. The open design of the board provides ample flexibility in choosing a graphics card.

Selecting a graphics card is a whole other topic. Since I use Linux as my operating system, I need a graphics card that is supported by Linux. For accelerating LLMs, a graphics card with as many GPU cores as possible and a correspondingly large amount of video memory (VRAM) is necessary. To make the purchase worthwhile and actually notice a performance boost, the card should be equipped with at least 8 GB of VRAM. More is always better, of course, but the price of the card then rises steeply. It’s definitely worth checking the used market.

If you add up all the costs, the investment for an external GPU amounts to at least 500 euros. Naturally, this only includes an inexpensive graphics card. High-end graphics cards can easily exceed the 500-euro price point on their own. Anyone who would like to contribute their expertise in the field of graphics cards is welcome to contribute an article.

To avoid starting your shopping spree blindly and then being disappointed with the result, it’s highly advisable to consider beforehand what you want to do with the local LLM. Supporting programming requires less processing power than generating graphics and audio. Those who use LLMs professionally can save considerably by buying a high-end graphics card and running self-hosted models instead of paying for, say, a cloud-based coding assistant. The hardware requirements of an LLM depend primarily on its number of parameters: the more parameters, the more accurate the responses tend to be and the more computing power is required. The numerical precision in which the parameters are stored is a further factor:

  • FP32 (Single-Precision Floating Point): Standard precision, requires the most memory. (32 bits per parameter / 4 bytes)
  • FP16 (Half-Precision Floating Point): Half the precision, halves the memory requirement compared to FP32, but can slightly reduce accuracy. (16 bits per parameter / 2 bytes)
  • BF16 (Brain Floating Point): Another 16-bit format, often preferred in deep learning because it keeps the exponent range of FP32 at the cost of fewer mantissa bits. (16 bits per parameter / 2 bytes)
  • INT8/INT4 (Integer Quantization): Even lower precision, drastically reduces memory requirements and speeds up inference, but can lead to a greater loss of accuracy. (INT8: 8 bits per parameter / 1 byte; INT4: 4 bits per parameter / 0.5 bytes)

Other factors influencing the hardware requirements for LLMs include:

  • Batch Size: The number of input requests processed simultaneously.
  • Context Length: The maximum length of text that the model can consider in a query. Longer context lengths require more memory because the entire context must be held in memory.
  • Model Architecture: Different architectures have different memory requirements.
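
To make the influence of batch size and context length more tangible, here is a rough sketch that estimates the key/value cache of a standard transformer decoder. The function name and the example values (32 layers, 32 key/value heads, head dimension 128) are illustrative assumptions and not taken from a specific model card; real inference engines add further overhead on top.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate key/value cache size of a standard transformer decoder."""
    # Factor 2: every layer stores one tensor for keys and one for values.
    return (2 * layers * kv_heads * head_dim
            * context_len * batch_size * bytes_per_value)


# Illustrative architecture values (assumption), 16-bit cache entries.
size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, context_len=4096)
print(f"{size / 1024**3:.1f} GiB")  # prints "2.0 GiB"
```

The cache grows linearly with both context length and batch size, which is why long contexts quickly eat into the available VRAM.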

To estimate the memory consumption of a model, you can use the following calculation: number of parameters * bytes per parameter (precision) = memory consumption for the model weights.

7,000,000,000 parameters * 2 bytes/parameter (BF16) = 14,000,000,000 bytes = 14 GB
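
The same calculation can be sketched as a small Python helper. The names `BYTES_PER_PARAMETER` and `model_memory_gb` are made up for this example, and the result covers only the model weights, not the context or runtime overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAMETER = {
    "FP32": 4.0,
    "FP16": 2.0,
    "BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}


def model_memory_gb(parameters, precision):
    """Memory for the model weights in GB (1 GB = 1,000,000,000 bytes)."""
    return parameters * BYTES_PER_PARAMETER[precision] / 1e9


for precision in BYTES_PER_PARAMETER:
    print(f"{precision}: {model_memory_gb(7_000_000_000, precision):.1f} GB")
```

For a 7-billion-parameter model this yields 28 GB in FP32, 14 GB in FP16 or BF16, 7 GB in INT8 and 3.5 GB in INT4.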

When considering hardware recommendations, you should refer to the model’s documentation. This usually only specifies the minimum or average requirements. However, there are general guidelines you can use.

  • Small models (up to 7 billion parameters): A GPU with at least 8 GB of VRAM should be sufficient, especially if you are using quantization.
  • Medium-sized models (7-30 billion parameters): A GPU with 16 GB to 24 GB of VRAM is recommended.
  • Large models (over 30 billion parameters): Multiple GPUs, each with at least 24 GB of VRAM, or a single GPU with a very large amount of VRAM (e.g., 48 GB, 80 GB) are required.
  • CPU-only: For small models and simple experiments, the CPU may suffice, but inference will be significantly slower than on a GPU. Here, a large amount of system RAM is crucial (32 GB or more).

We can see that using locally running LLMs can be quite realistic if you have the necessary hardware available. It doesn’t always have to be a supercomputer; however, most solutions from typical electronics retailers are off-the-shelf and not really suitable. Therefore, with this article, I have laid the groundwork for your own experiments.


Risk Cloud & Serverless

The cloud is one of the most innovative developments since the turn of the millennium and enables us to make widespread use of neural networks, which we popularly refer to as Large Language Models (LLMs). This technological leap can only be surpassed by quantum computing. But enough of the buzzwords for SEO. Instead, let’s take a look behind the scenes, start with what the cloud actually is and put all the marketing terms aside.

The best way to imagine the cloud is as a gigantic supercomputer made up of many small computers like building blocks. This theoretically allows you to combine any amount of CPU power, RAM and hard drive space. On this supercomputer, which runs in a data center, virtual machines can now be provided that simulate a real computer with freely definable hardware. In this way, the physical hardware resources can be optimally distributed among the provided virtual machines.

When it comes to the cloud, we roughly distinguish between three different operating levels: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). The image below gives an idea of how these levels are divided.

Put simply, with IaaS the provider supplies only the hardware: CPU, RAM, hard drive and internet connection. Using administration software, e.g. Kubernetes, you can create your own virtual machines or containers and install the corresponding operating systems and services yourself. The entire responsibility for security and network routing lies with the customer. PaaS, on the other hand, already provides a rudimentary virtual machine including the selected operating system. What you install on this system above the operating system level is up to you. Here too, security is largely in the hands of the customer. For most hosting providers, typical PaaS products are so-called virtual servers. Users have the least freedom with SaaS. Here you usually only have permission to use software through a user account. Very typical SaaS products are email accounts, but also so-called managed servers. Managed servers are mostly used to host your own websites. The versions of the programming language and the database are specified by the server operator.

Managed servers in particular have a long tradition. They emerged at the turn of the millennium to provide an immediately usable environment for dynamic PHP websites with a MySQL database connection. The situation is similar with the serverless products that have recently become fashionable. Depending on your level of experience, you can now buy corresponding products from the major providers AWS, Google and Microsoft Azure.

The idea is to no longer operate your own servers for the services and thus outsource the entire hardware, operation and security effort to the cloud operators. In principle, this isn’t a bad idea, especially when it comes to small companies or startups that don’t have a lot of financial resources at their disposal or simply lack the administrative know-how for networks, Linux and server security.

Of course, serverless offerings that are completely managed externally quickly reach their limits. Especially if you want to deploy your own custom serverless software in the cloud with as little effort as possible, you will come across many a stumbling block. A common problem is flexible extensibility when requirements change. You can certainly buy products from the various providers’ portfolios and combine them like building blocks, but the costs incurred can quickly add up.

Basically, there is nothing wrong with a pay per use model (i.e. pay for what you use). At first glance, this is not a bad solution for people and organizations with small budgets. But here too, it’s the little details that can quickly grow into serious problems.

Whichever cloud provider you choose, you are well advised to avoid its proprietary management and automation products and instead use established, provider-independent products where possible. If you commit yourself fully to one provider, switching to another will only be possible with great effort. Changes to the terms and conditions or continuously increasing costs are possible reasons for a forced change. So, as the saying goes, look carefully before you bind yourself forever.

Careless use of resources in cloud systems, e.g. due to incorrect configurations or unfavorable deployment strategies, can also lead to an explosion in costs. If the provider offers the option to set limits, you are well advised to activate them, so that you are notified once a certain amount is reached and only a fixed quota remains available. However, especially with highly available services that suddenly receive an enormous number of new users, such limits can quickly lead to the service being taken off the network. It is therefore always a good idea to use two cloud solutions, one for development and a separate one for the production system, in order to minimize the offline risk.

Similar to stock market trading, you can also define limits for cloud services like AWS. Stop-loss orders on the stock market prevent you from selling a stock too cheaply or buying it too expensively. With the pay-per-use model, it’s not much different in the cloud. Here, you need to set appropriate limits with your provider to prevent bills from exceeding your available budget. These limits are also dynamic in the cloud. This means that the framework conditions are constantly changing, requiring the necessary limits to be regularly adjusted to meet current needs. To identify bottlenecks early, a robust monitoring system should be in place. In Kubernetes, the minimum resources reserved for a container are determined by its requests, while the upper bound of the resources it may use is defined by its limits. Tools like IBM’s Kubecost can largely automate cost monitoring in Kubernetes clusters.
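
What such a budget check could look like is sketched below in Python. This is a deliberately simple illustration and not a provider API: the function names, the linear projection and the 80 % warning threshold are all assumptions made for this example.

```python
def projected_month_end_cost(spend_so_far, day_of_month, days_in_month):
    """Linear projection: assume the average daily spend so far continues."""
    return spend_so_far / day_of_month * days_in_month


def budget_status(spend_so_far, day_of_month, days_in_month, budget,
                  warn_ratio=0.8):
    """Classify the projected bill as 'ok', 'warning' or 'over budget'."""
    projected = projected_month_end_cost(spend_so_far, day_of_month,
                                         days_in_month)
    if projected > budget:
        return "over budget"
    if projected > budget * warn_ratio:
        return "warning"
    return "ok"


# 120 spent after 10 of 30 days projects to 360, above a budget of 300.
print(budget_status(spend_so_far=120.0, day_of_month=10,
                    days_in_month=30, budget=300.0))  # prints "over budget"
```

Because the framework conditions keep changing, both the budget and the warning threshold would have to be reviewed regularly rather than set once.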

For cloud development environments, you should also keep a close eye on your own development and DevOps team. If an NPM Docker container of over 2 GB is created on the fly every time for a simple JavaScript Angular app, this strategy should definitely be questioned. Even if the cloud can allocate seemingly infinite resources dynamically, that doesn’t mean that this happens for free.

The issue of security is also an important factor. Of course, you can trust the cloud operator when he says that everything is encrypted and that access to customer data and business secrets is not possible. One can certainly assume that the information held by most ventures rarely has any interesting or even exciting content that could be of interest to large cloud operators. If you still want to be on the safe side, you should write off the idea of serverless completely and consider running your own cloud. Thanks to modern and free software, this is now easier than expected.

I have learned from personal experience that, given the complexity of modern web applications, efficient monitoring with Grafana and Prometheus or other solutions such as the ELK Stack or Splunk is essential. But some DevOps teams have difficulties with data collection and proper evaluation. IT decision-makers in particular are called upon to gain a technical overview so as not to fall for the nice-sounding marketing traps of cloud and serverless.


The Future of Build Management

It’s not just compiled languages, which need to convert source code into machine code to make it executable, that require build tools. These tools are now also available for modern scripting languages like Python, Ruby, and PHP, as their scope of responsibility continues to expand. Looking back at the beginnings of this tool category, one inevitably encounters make, the first official representative of what we now call a build tool. Make’s main task was to generate machine code and package the files into a library or executable. Therefore, build tools can be considered automation tools. It’s logical that they also take over many other recurring tasks that arise in a developer’s daily work. For example, one of the most important innovations responsible for Maven’s success was the management of dependencies on other program libraries.

Another class of automation tools that has almost disappeared is the installer. Products like Inno Setup and Wise Installer were used to automate the installation process for desktop applications. These installation routines are a special form of deployment. The deployment process, in turn, depends on various factors. First and foremost, the operating system used is, of course, a crucial criterion. But the type of application also has a significant influence. Is it, for example, a web application that requires a defined runtime environment (server)? We can already see here that many of the questions being asked now fall under the umbrella of DevOps.

As a developer, it’s no longer enough to simply know how to write program code and implement functions. Anyone wanting to build a web application must first get the corresponding server running on which the application will execute. Fortunately, there are now many solutions that significantly simplify the provisioning of a working runtime. But especially for beginners, it’s not always easy to grasp the whole topic. I still remember questions in relevant forums from people who wanted to download Java Enterprise but could only find the application server.

Where automation solutions were lacking in the early 2000s, the challenge today is choosing the right tool. There’s an analogy here from the Java universe. When the Gradle build tool appeared on the market, many projects migrated from Maven to Gradle. The argument was that it offered greater flexibility. Often, the ability to define orchestrated builds was needed—that is, the sequence in which subprojects are created. Instead of acknowledging that this requirement represented an architectural shortcoming and addressing it, complex and difficult-to-manage build logic was built in Gradle. This, in turn, made customizations difficult to implement, and many projects were migrated back to Maven.

Out of DevOps automation, so-called pipelines have become established. Pipelines can also be understood as processes, and these processes can, in turn, be standardized. The best example of a standardized process is the build lifecycle defined in Maven, also known as the default lifecycle. This process defines 23 sequential steps, which, broadly speaking, perform the following tasks:

  • Resolving and deploying dependencies
  • Compiling the source code
  • Compiling and running unit tests
  • Packaging the files into a library or application
  • Running integration tests
  • Deploying the artifact locally for use in other local development projects
  • Deploying the artifacts to a remote repository server.

This process has proven highly effective in countless Java projects over the years. However, if you run this process as a pipeline on a CI server like Jenkins, you won’t see much. The individual steps of the build lifecycle are interdependent and cannot be triggered individually. It’s only possible to exit the lifecycle prematurely. For example, after packaging, you can skip the subsequent steps of local deployment and running the integration tests.
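
The behavior described here, where requesting a phase always runs everything before it, can be sketched in a few lines of Python. The list is a simplified subset of the phase names in Maven’s default lifecycle, in the order Maven documents them; the function name `phases_to_run` is invented for this illustration.

```python
# Simplified subset of Maven's default lifecycle phases, in execution order.
PHASES = [
    "validate", "compile", "test", "package",
    "integration-test", "verify", "install", "deploy",
]


def phases_to_run(target):
    """Return every phase that is executed when `target` is requested."""
    if target not in PHASES:
        raise ValueError(f"unknown phase: {target}")
    # All phases up to and including the target run; none can be skipped.
    return PHASES[: PHASES.index(target) + 1]


print(phases_to_run("package"))
# prints ['validate', 'compile', 'test', 'package']
```

Stopping at `package` is the "premature exit" mentioned above: integration tests, local installation and remote deployment simply never start.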

A weakness of the build process described here becomes apparent when creating web applications. Web frontends usually contain CSS and JavaScript code, which should also be optimized automatically. To convert variables defined in SCSS into correct CSS, a Sass preprocessor must be used. Furthermore, it is very useful to compress CSS and JavaScript files as much as possible. This minification process improves the loading times of web applications. However, there are already countless libraries for CSS and JavaScript that can be managed with the NPM tool. NPM, in turn, provides so-called development libraries like Grunt, which enable CSS processing and optimization.

We can see how complex the build process of modern applications can become. Compilation is only a small part of it. An important feature of modern build tools is the optimization of the build process. An established solution for this is creating incremental builds. This is a form of caching where only changed files are compiled or processed.
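
The idea behind incremental builds can be illustrated with a minimal Python sketch. It assumes a content-hash cache stored in a JSON file; real build tools may use timestamps or more elaborate dependency graphs instead, and the names `file_digest` and `changed_files` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(sources, cache_file):
    """Return the sources whose content changed since the last call.

    The digests of the current run are written back to `cache_file`,
    so the next call only reports files modified in the meantime.
    """
    cache_path = Path(cache_file)
    previous = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    current = {str(path): file_digest(path) for path in sources}
    changed = [path for path in current if previous.get(path) != current[path]]
    cache_path.write_text(json.dumps(current, indent=2))
    return changed
```

On the first run every file counts as changed; afterwards only the files returned by `changed_files` would need to be compiled or processed again.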

Jenkins Pipelines

But what needs to be done during a release? This process is only needed once an implementation phase is complete, to prepare the artifact for distribution. While it’s possible to include all the steps involved in a release in the build process, this would lead to longer build times. Longer local build times disrupt the developer’s workflow, making it more efficient to define a separate process for this.

An important condition for a release is that all libraries used must also be in their final release versions. If this isn’t the case, it cannot be guaranteed that later builds of this version are identical. Furthermore, all test cases must run correctly, and a failure will abort the process. Additionally, a corresponding revision tag should be set in the source control repository. The finished artifacts must be signed, and API documentation must be created. Of course, the rules described here are just a small selection, and some of the tasks can even be parallelized. By using sophisticated caching, creating a release can be accomplished quickly, even for large monoliths.
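
The first of these rules is easy to check automatically. The following Python sketch assumes Maven’s convention that development versions end in `-SNAPSHOT`; the function name and the dependency list are hypothetical and only serve as an illustration.

```python
def snapshot_dependencies(dependencies):
    """Return all dependencies that are still development (SNAPSHOT) versions."""
    return [
        f"{name}:{version}"
        for name, version in dependencies.items()
        if version.endswith("-SNAPSHOT")
    ]


# Hypothetical dependency list for illustration.
deps = {
    "org.example:core": "1.4.0",
    "org.example:utils": "2.0.0-SNAPSHOT",
}

blocking = snapshot_dependencies(deps)
if blocking:
    print("Release blocked by:", ", ".join(blocking))
# prints "Release blocked by: org.example:utils:2.0.0-SNAPSHOT"
```

A release process would abort at this point, in the same way it aborts on a failing test case.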

For Maven, for example, no complete release process comparable to the build process has been defined. Instead, the community has developed a special plugin that allows for the semi-automation of simple tasks that arise during a release.

If we take a closer look at the topic of documentation and reporting, we find ample opportunities to describe a complete process. Creating API documentation would be just one minor aspect. Far more compelling about standardized reporting are the various code inspections, some of which can even be performed in parallel.

Of course, deployment is also essential. Due to the diversity of potential target environments, a different strategy is appropriate here. One possible approach would be broad support for configuration tools like Ansible, Chef, and Puppet. Virtualization technologies such as Docker and LXC containers are also standard in the age of cloud computing. The main task of deployment would then be provisioning the target environment and deploying the artifacts from a repository server. A wealth of different deployment templates would significantly simplify this process.

If we consistently extrapolate from these assumptions, we conclude that there can be different types of projects. These would be classic development projects, from which artifacts for libraries and applications are created; test projects, which in turn contain the created artifacts as dependencies; and, of course, deployment projects for providing the infrastructure. The area of automated deployment is also reflected in the concepts of Infrastructure as Code and GitOps, which can be taken up and further developed here.


Clean Desk – More Than Just Security

As a child, I liked to reply to my mother that only a genius could master chaos when she told me to tidy my room. A very welcome excuse to shirk my responsibilities. When I started an apprenticeship in a trade after finishing school, the first thing my master craftsman emphasized was: keeping things tidy. Tools had to be put back in their bags after use, opened boxes of the same materials had to be refilled, and of course, there was also the need to sweep up several times a day. I can say right away that I never perceived these things as harassment, even if they seemed annoying at first. Because we quickly learned the benefits of the motto “keep things clean.”

Tools that are always put back in their place give us a quick overview of whether anything is missing. So we can then go looking for it, and the likelihood of things being stolen is drastically reduced. With work materials, too, you maintain a good overview of what’s been used up and what needs to be replaced. Five empty boxes containing only one or two items not only take up space but also lead to miscalculations of available resources. Finally, it’s also true that one feels less comfortable in a dirty environment, and cleanliness demonstrates to the client that one works in a focused and planned manner.

Due to this early experience, when the concept of Clean Desk was introduced as a security measure in companies a few years ago, I didn’t immediately understand what was expected of me. After all, the Clean Desk principle had been second nature to me long before I completed my computer science degree. But let’s start at the beginning. First, let’s look at what Clean Desk actually is and how to implement it.

Anyone who delves deeply into the topic of security learns one thing early on: most successful attacks aren’t carried out using complicated technical maneuvers. They’re much more mundane and usually originate from within, not from the outside. True to the adage, opportunity makes the thief. When you combine this fact with the insights of social engineering, a field primarily shaped by the hacker Kevin Mitnick, a new picture emerges. It’s not always necessary to immediately place your own employees under suspicion. In a building, there are external cleaning staff, security personnel, or tradespeople who usually have easy access to sensitive areas. Therefore, the motto should always be: trust is good, but control is better, which is why a Clean Desk Policy is implemented.

The first rule is: anyone leaving their workstation for an extended period must switch off their devices. This applies especially at the end of the workday. For shorter absences, at least the desktop should be locked. The concept behind this is quite simple: a switched-off device offers no security vulnerabilities that could be exploited from the outside to break into the company network. Furthermore, it reduces power consumption and prevents fires caused by short circuits. To prevent the devices from being physically stolen, they are secured to the desk with special locks. I’ve personally experienced devices being stolen during lunch breaks.

Since I myself have stayed in hotels a lot, my computer’s hard drive is encrypted as a matter of course. This also applies to all external storage devices such as USB sticks or external SSDs. If the device is stolen, at least no one can access the data stored on it.

It goes without saying that secure encryption is only possible with a strong password. Many companies have specific rules that employee passwords must meet. It’s also common practice to assign a new password every 30 to 90 days, and this new password must be different from the last three used.

It’s often pointed out that passwords shouldn’t be written on a sticky note stuck to the monitor. I’ve never personally experienced this. It’s much more typical for passwords to be written under the keyboard or mousepad.

Another aspect to consider is notes left on desks, wall calendars, and whiteboards. Even seemingly insignificant information can be quite valuable. Since it’s rather difficult to decide what truly needs protecting and what doesn’t, the general rule is: all notes should be stored securely at the end of the workday, inaccessible to outsiders. Of course, this only works if lockable storage space is available. In sensitive sectors like banking and insurance, the policy even goes so far as to prohibit colleagues from entering their vacation dates on wall calendars.

Of course, these considerations also include your own wastebasket. It’s essential to ensure that confidential documents are disposed of in specially secured containers. Otherwise, the entire effort to maintain confidentiality becomes pointless if you can simply pull them out of the trash after work.

But the virtual desktop is also part of the Clean Desk Policy. Especially in times of virtual video conferences and remote work, strangers can easily catch a glimpse of your workspace. This reminds me of my lecture days when a professor had several shortcuts to the trash on his desktop. We always joked that he was recycling, with separate trash folders for Word files, Excel files, and so on.

The Clean Desk Policy has other effects as well. It’s much more than just a security concept. Employees who consistently implement this policy also bring more order to their thoughts and can thus work through tasks one by one with greater focus, leading to improved performance. Personal daily planning is usually structured so that all started tasks can be completed by the end of the workday. This is similar to the trades. Tradespeople also try to complete their jobs by the end of the workday to avoid having to return for a short time the next day, because a considerable amount of time would otherwise be spent on preparation again.

Implementing a Clean Desk Policy follows the three Ps (Plan, Protect & Pick). At the beginning of the day, employees decide which tasks need to be completed (Plan), and select the corresponding documents and necessary materials for easy access. At the end of the day, everything is securely stored. During working hours, it must also be ensured that no unauthorized persons have access to information, for example, during breaks. This daily, easy-to-implement routine of preparation and follow-up quickly becomes a habit, and the time required can be reduced to just a few minutes, so that hardly any work time is wasted.

With a Clean Desk Policy, the overwhelming piles of paper disappear from your desk, and by considering which tasks need to be completed each day, you can focus better on them, which significantly improves productivity. At the end of the day, you can also mentally cross some items off your to-do list, leading to greater satisfaction.