Installing Python programs via PIP on Linux

Many years ago, the scripting language Python, named after the British comedy troupe Monty Python, replaced the venerable Perl as the default scripting language on Linux. As a result, every Linux distribution ships with a Python interpreter by default. A pretty convenient thing, really. Or so it seems, if it weren’t for the pesky issue of security. But let’s start at the beginning, because this short article is intended for people who want to run software written in Python on Linux but have no Python knowledge or programming experience. So here is a little background information to help you understand what this is all about.

All current Linux distributions derived from Debian, such as Ubuntu, Mint, and so on, throw a cryptic error when you try to install a Python program with pip. A safeguard has been built in to prevent important system libraries written in Python from being overwritten by additionally installed packages, which could cause malfunctions in the operating system. Unfortunately, as is so often the case, the devil is in the details.

ed@P14s:~$ python3 -m pip install ansible
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.13/README.venv for more information.

As a solution, a virtual environment will now be set up. Debian 12, and also Debian 13, which was just released in August 2025, use Python version 3. Python 2 and Python 3 are not compatible with each other. This means that programs written in Python 2 will not run in Python 3 without modification.

If you want to install a program written in Python, this is done by the language’s so-called package manager; most programming languages have such a mechanism. The package manager for Python is called PIP. This is where the first complications arise: there are pip, pip3, and pipx. Similar naming inconsistencies can also be found with the Python interpreter itself. Version 2 is started in the console with python, and version 3 with python3. Since this article refers to Debian 12 / Debian 13 and their derivatives, we know that at least Python 3 is in use. To find out the exact Python version, you can enter python3 -V in the shell, which shows version 3.13.5 in my case. If you try python or python2 instead, you get an error message that the command could not be found.

Let’s first look at what pip, pip3, and pipx actually mean. PIP itself simply stands for Package Installer for Python [1]. Up to Python 2, pip was used, and from version 3 onwards we have pip3. PIPX [2] is quite special: it is designed for isolated environments, which is exactly what we need. Therefore, the next step is to install PIPX. We can easily do this using the Linux package manager: sudo apt install pipx. To check which pipx version is installed on the system, we use the following command: python3 -m pipx --version, which in my case outputs 1.7.1. This means that I have the distribution’s Python 3 installed on my system, along with PIPX.

With this prerequisite in place, I can now install all kinds of Python modules using PIPX. The basic command is pipx install <module>. As a practical example, we will now install Ansible. The use of pip and pip3 should be avoided here, as they would first have to be installed separately and lead to the cryptic error mentioned earlier.

Ansible [3] is a program written in Python and migrated to Python 3 starting with version 2.5. Here’s a brief overview of what Ansible itself is. Ansible belongs to the class of configuration management programs and allows for the fully automated provisioning of systems using a script. Provisioning can be performed, for example, as a virtual machine and includes setting hardware resources (RAM, HDD, CPU cores, etc.), installing the operating system, configuring the user, and installing and configuring other programs.

First, we need to install Ansible with pipx install ansible. Once the installation is complete, we can verify its success with pipx list, which in my case produced the following output:

ed@local:~$ pipx list
venvs are in /home/ed/.local/share/pipx/venvs
apps are exposed on your $PATH at /home/ed/.local/bin
manual pages are exposed at /home/ed/.local/share/man
   package ansible 12.1.0, installed using Python 3.13.5
    - ansible-community

The installation isn’t quite finished yet, as the command ansible --version returns an error message. The problem here is related to the Ansible edition. As we can see from the output of pipx list, we have the Community Edition installed. Therefore, the command is ansible-community --version, which currently shows version 12.2.0 for me.

If you prefer to type ansible instead of ansible-community in the console, you can do so using an alias. Setting the alias isn’t entirely straightforward, as parameters need to be passed to it. How to do this will be covered in another article.

Occasionally, Python programs cannot be installed via PIPX. One example is streamdeck-ui [4]. For a long time, Elgato’s StreamDeck hardware could be used under Linux with the Python-based streamdeck-ui. However, there is now an alternative called Boatswain, which is not written in Python and should be used instead. Unfortunately, installing streamdeck-ui fails with an error due to its dependency on the ‘pillow’ library. If you look at the installation script in the streamdeck-ui Git repository, you’ll find that it relies on pip3 to obtain streamdeck-ui. When you then get to the point of executing the command pip3 install --user streamdeck_ui, you receive the error message “externally-managed-environment” that I described at the beginning of this article. Since we’re already using PIPX, creating another virtual environment for Python programs isn’t productive, as it only leads to the same error with the pillow library.

As I’m not a Python programmer myself, but I do have some experience with complex dependencies in large Java projects, and I actually found streamdeck-ui to be better than Boatswain, I took a look around the GitHub repository. The first thing I noticed is that the last activity was in spring 2023, which makes a revival of the project rather unlikely. Nevertheless, let’s take a closer look at the error message to get an idea of how to narrow down the problem when installing other programs.

Fatal error from pip prevented installation. Full pip output in file:
    /home/ed/.local/state/pipx/log/cmd_pip_errors.log

pip seemed to fail to build package:
    'pillow'

A look at the corresponding log file reveals that the dependency on pillow is pinned to less than version 7 and greater than version 6.1, resulting in the use of version 6.2.2. Investigating what pillow actually is, we learn that it is the Python 3 fork of PIL, the old Python 2 library for image processing, and that it will be available in version 12 by the end of 2025. The problem could potentially be resolved by using a more recent version of pillow. However, this will most likely require adjustments to the streamdeck-ui code, as incompatibilities in the functions used have probably accumulated since version 6.2.2.

This analysis shows that the probability of getting streamdeck-ui to run under pip3 is the same as with pipx. Anyone who gets the idea to downgrade to Python 2 just to get old programs running should try this in a separate, isolated environment, for example, using Docker. Python 2 hasn’t received any support through updates and security patches for many years, which is why installing it alongside Python 3 on your operating system is not a good idea.

So we see that the error message described at the beginning isn’t so cryptic after all if you simply use PIPX. If you still can’t get your program to install, a look at the error message will usually tell you that you’re trying to use an outdated and no longer maintained program.

Resources

This content is only available to subscribers.


PDF generation with TP-CORE

PDF is arguably the most important interoperable document format between operating systems and devices such as computers, smartphones, and tablets. The most important characteristic of PDF is the immutability of the documents. Of course, there are limits, and nowadays, pages can easily be removed or added to a PDF document using appropriate programs. However, the content of individual pages cannot be easily modified. Where years ago expensive specialized software such as Adobe Acrobat was required to work with PDFs, many free solutions are now available, even for commercial use. For example, the common office suites support exporting documents as PDFs.

Especially in the business world, PDF is an excellent choice for generating invoices or reports. This article focuses on this topic. I describe how simple PDF documents can be used for invoicing, etc. Complex layouts, such as those used for magazines and journals, are not covered in this article.

Technically, the freely available openPDF library is used. The concept is to generate a PDF from valid HTML code contained in a template. Since we’re working in the Java environment, Velocity is the template engine of choice. TP-CORE bundles all these dependencies in its PdfRenderer functionality, which is implemented as a facade. This encapsulates the complexity of PDF generation and makes it easier to replace the underlying library.

While I’m not a PDF generation specialist, I’ve gained considerable experience over the years, particularly with this functionality, in terms of software project maintainability; I even gave a conference presentation on the topic. Many years ago, when I decided to support PDF in TP-CORE, the only available library was iText, which at that time was still freely available in version 5. Then I found myself in a situation similar to that of Linus Torvalds with his original source control management system: iText became commercial, and I needed a new solution. Well, I’m not Linus, who can just whip up Git overnight, so after some waiting I discovered OpenPDF, a fork of iText 5. Now I had to adapt my existing code accordingly. This took some time, but thanks to the encapsulation it was a manageable task. However, during this adaptation I ran into problems that had already been causing me difficulties in my small environment, so I released TP-CORE version 3.0 to achieve functional stability. Anyone using TP-CORE version 2.x will therefore still find iText 5 as the PDF solution. But that’s enough about the development history. Let’s look at how we can currently generate PDFs with TP-CORE 3.1. Here, too, my main goal is to achieve the highest possible compatibility.

Before we can begin, we need to include TP-CORE as a dependency in our Java project. This example demonstrates the use of Maven as the build tool, but it can easily be switched to Gradle.

<dependency>
    <groupId>io.github.together.modules</groupId>
    <artifactId>core</artifactId>
    <version>3.1.0</version>
</dependency>

To generate an invoice, for example, we need several steps. First, we need an HTML template, which we create using Velocity. The template can, of course, contain placeholders for names and addresses, enabling mail merge and batch processing. The following code demonstrates how to work with this.

<h1>HTML to PDF Velocity Template</h1>

<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet.</p>

<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>

<h2>Lists</h2>
<ul>
    <li><b>bold</b></li>
    <li><i>italic</i></li>
    <li><u>underline</u></li>
</ul>

<ol>
    <li>Item</li>
    <li>Item</li>
</ol>

<h2>Table 100%</h2>
<table border="1" style="width: 100%; border: 1px solid black;">
    <tr>
        <td>Property 1</td>
        <td>$prop_01</td>
        <td>Lorem ipsum dolor sit amet,</td>
    </tr>
    <tr>
        <td>Property 2</td>
        <td>$prop_02</td>
        <td>Lorem ipsum dolor sit amet,</td>
    </tr>
    <tr>
        <td>Property 3</td>
        <td>$prop_03</td>
        <td>Lorem ipsum dolor sit amet,</td>
    </tr>
</table>

<img src="path/to/myImage.png" />

<h2>Links</h2>
<p>here we try a simple <a href="https://together-platform.org/tp-core/">link</a></p>

template.vm

The template contains three properties: $prop_01, $prop_02, and $prop_03, which we need to populate with values. We can do this with a simple HashMap, which we insert using the TemplateRenderer, as the following example shows.

Map<String, String> properties = new HashMap<>();
properties.put("prop_01", "value 01");
properties.put("prop_02", "value 02");
properties.put("prop_03", "value 03");

TemplateRenderer templateEngine = new VelocityRenderer();
String directory = Constraints.SYSTEM_USER_HOME_DIR + "/";
String htmlTemplate = templateEngine
                .loadContentByFileResource(directory, "template.vm", properties);

The TemplateRenderer’s loadContentByFileResource method requires three arguments:

  • directory – The full path where the template is located.
  • template – The name of the Velocity template.
  • properties – The HashMap containing the variables to be replaced.

The result is valid HTML code, from which we can now generate the PDF using the PdfRenderer. A special feature is the constant SYSTEM_USER_HOME_DIR, which points to the user directory of the currently logged-in user and handles differences between operating systems such as Linux and Windows.

PdfRenderer pdf = new OpenPdfRenderer();
pdf.setTitle("Title of my own PDF");
pdf.setAuthor("Elmar Dott");
pdf.setSubject("A little description about the PDF document.");
pdf.setKeywords("pdf, html, openPDF");
pdf.renderDocumentFromHtml(directory + "myOwn.pdf", htmlTemplate);

The code for creating the PDF is clear and quite self-explanatory. It’s also possible to hardcode the HTML. Using the template allows for a separation of code and design, which also makes later adjustments flexible.

The default format is A4, defined as inline CSS with the dimensions size: 21cm 29.7cm; margin: 20mm 20mm;. This value can be customized using the setFormat(String format) method; the first value represents the width and the second the height.
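
For example, switching to a smaller page size could look like the following sketch; the exact format string is an assumption derived from the default shown above and may need to be adapted.

// assumption: width first, then height, followed by the margins as inline CSS
pdf.setFormat("size: 14.8cm 21cm; margin: 15mm 15mm;");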

The PDF renderer also allows you to delete individual pages from a document.

PdfRenderer pdf = new OpenPdfRenderer();
PdfDocument original = pdf.loadDocument( new File("document.pdf") );
PdfDocument reduced = pdf.removePage(original, 1, 5);
pdf.writeDocument(reduced, "reduced.pdf");

We can see, therefore, that great emphasis was placed on ease of use when implementing the PDF functionality. Nevertheless, there are many ways to create customized PDF documents. This makes the solution of defining the layout via HTML particularly appealing. Despite the PdfRenderer’s relatively limited functionality, Java’s inheritance mechanism makes it very easy to extend the interface and its implementation with custom solutions. This possibility also exists for all other implementations in TP-CORE.

TP-CORE is also free for commercial use and has no restrictions thanks to the Apache 2 license. The source code can be found at https://git.elmar-dott.com and the documentation, including security and test coverage, can be found at https://together-platform.org/tp-core/.

RTFM – usable documentation

An old master craftsman used to say: “He who writes, remains.” His primary intention was to obtain accurate measurements and weekly reports from his journeymen. He needed this information to issue correct invoices, which was crucial to the success of his business. This analogy can also be readily applied to software development. It wasn’t until Ruby, the programming language developed in Japan by Yukihiro Matsumoto, had English-language documentation that Ruby’s global success began.

We can see, therefore, that documentation can be of considerable importance to the success of a software project. It’s not simply a repository of information within the project where new colleagues can find necessary details. Of course, documentation is a rather tedious subject for developers. It constantly needs to be kept up-to-date, and they often lack the skills to put their own thoughts down on paper in a clear and organized way for others to understand.

I myself first encountered the topic of documentation many years ago through reading the book “Software Engineering” by Johannes Siedersleben. Ed Yourdon was quoted there as saying that before methods like UML, documentation often took the form of a Victorian novella. During my professional life, I’ve also encountered a few such Victorian novellas. The frustrating thing was: after battling through the textual desert—there’s no other way to describe the feeling than as overcoming and struggling—you still didn’t have the information you were looking for. To paraphrase Goethe’s Faust: “So here I stand, poor fool, no wiser than before.”

Here we already see a first criticism of poor documentation: inappropriate length and a lack of information. We must recognize that writing isn’t something everyone is born with. After all, you became a software developer, not a book author. For the concept of “successful documentation,” this means that you shouldn’t force anyone to do it and instead look for team members who have a knack for it. This doesn’t mean, however, that everyone else is exempt from documentation tasks. Their input is essential for quality. Proofreading, pointing out errors, and suggesting additions are all necessary tasks that can easily be shared.

It’s highly advisable to provide occasional rhetorical training for the team or individual team members. The focus should be on precise, concise, and understandable expression. This also involves organizing your thoughts so they can be put down on paper and follow a clear and coherent structure. The resulting improved communication has a very positive impact on development projects.

Up-to-date documentation that is easy to read and contains important information quickly becomes a living document, regardless of the format chosen. This is also a fundamental concept for successful DevOps and agile methodologies. These paradigms rely on good information exchange and address the avoidance of information silos.

One point that really bothers me is the statement: “Our tests are the documentation.” Not all stakeholders can program and are therefore unable to understand the test cases. Furthermore, while tests demonstrate the behavior of functions, they don’t inherently demonstrate their correct usage. Variations of usable solutions are also usually missing. For test cases to have a documentary character, it’s necessary to develop specific tests precisely for this purpose. In my opinion, this approach has two significant advantages. First, the implementation documentation remains up-to-date, because changes will cause the test case to fail. Another positive effect is that the developer becomes aware of how their implementation is being used and can correct a flawed design in a timely manner.

Of course, there are now countless technical solutions that are suitable for different groups of people, depending on their perspective on the system. Issue and bug tracking systems, such as the commercial JIRA or the open-source Redmine, map entire processes. They allow testers to assign identified problems and errors in the software to a specific release version. Project managers can use release management to prioritize fixes, and developers document the implemented fixes. That’s the theory. In practice, I’ve seen in almost every project how the comment function in these systems is misused as a chat to describe the change status. The result is a bug report with countless useless comments, and any real, relevant information is completely missing.

Another widespread technical solution in development projects is the use of enterprise wikis. They enhance simple wikis with navigation and allow the creation of closed spaces where only explicitly authorized user groups receive granular permissions such as read or write. Besides the widely used commercial solution Confluence, there’s also a free alternative called BlueSpice, which is based on MediaWiki. Wikis allow collaborative work on a document, and individual pages can be exported as PDFs in various formats. To ensure that the wiki pages remain usable, it’s important to maintain clean and consistent formatting. Tables should fit their content onto a single A4 page without unwanted line breaks. This improves readability. There are also many instances where bulleted lists are preferable to tables for the sake of clarity.

This brings us to another very sensitive topic: graphics. It’s certainly true that a picture is often worth a thousand words. But not always! When working with graphics, it’s important to be aware that images often require a considerable amount of time to create and can often only be adapted with significant effort. This leads to several conclusions to make life easier. A standard program (format) should be used for creating graphics. Expensive graphics programs like Photoshop and Corel should be avoided. Graphics created for wiki pages should be attached to the wiki page in their original, editable form. A separate repository can also be set up for this purpose to allow reuse in other projects.

If an image offers no added value, it’s best to omit it. Here’s a small example: It’s unnecessary to create a graphic depicting ten stick figures with a role name or person underneath. Here, it is advisable to create a simple list, which is also easier to supplement or adapt.

But you should also avoid overloaded graphics. Created according to the motto “more is better,” overly detailed graphics tend to cause confusion and can lead to misinterpretations. A recommended book is “Documenting and Communicating Software Architectures” by Stefan Zörner. In it, he effectively demonstrates the importance of different perspectives on a system and which groups of people are addressed by a specific viewpoint. I would also like to take this opportunity to share his seven rules for good documentation:

  1. Write from the reader’s perspective.
  2. Avoid unnecessary repetition.
  3. Avoid ambiguity; explain notation if necessary.
  4. Use standards such as UML.
  5. Include the reasons (why).
  6. Keep the documentation up-to-date, but never too up-to-date.
  7. Review the usability.

Anyone tasked with writing the documentation, or ensuring its progress and accuracy, should always be aware that it contains important information and presents it correctly and clearly. Concise and well-organized documentation can be easily adapted and expanded as the project progresses. Adjustments are most successful when the affected area is as cohesive as possible and appears only once. This centralization is achieved through references and hyperlinks, so that changes in the original document are reflected in the references.

Of course, there is much more to say about documentation; it’s the subject of numerous books, but that would go beyond the scope of this article. My main goal was to raise awareness of this topic, as paradigms like Agile and DevOps rely on a good flow of information.


BugChaser – The limits of test coverage

The paradigms now established in software engineering, such as Test-Driven Development (TDD) and Behavior-Driven Development (BDD), along with correspondingly easy-to-use tools, have opened up a new, pragmatic perspective on the topic of software testing. Automated tests are a crucial factor in commercial software projects. Therefore, in this context, a successful testing strategy is one in which test execution proceeds without human intervention.

Test automation forms the basis for achieving stability and reducing risk in critical tasks. Such critical tasks include, in particular, refactoring, maintenance, and bug fixes. All these activities share one common goal: preventing new errors from creeping into the code.

In his 1972 article “The Humble Programmer,” Edsger W. Dijkstra stated the following:

“Program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence.”

Therefore, simply automating test execution is not sufficient to ensure that changes to the codebase do not have unintended effects on existing functionality. For this reason, the quality of the test cases must be evaluated. Proven tools already exist for this purpose.

Before we delve deeper into the topic, let’s first consider what automated testing actually means. This question is quite easy to answer. Almost every programming language has a corresponding unit test framework. Unit tests call a method with various parameters and compare the return value with an expected value. If both values ​​match, the test is considered passed. Additionally, it can also be checked whether an exception was thrown.
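
To make this concrete, a minimal sketch of such a unit test could look like the following; JUnit 5 is assumed, and the Calculator class is made up for illustration.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.Test;

class CalculatorTest {

    @Test
    void divideReturnsExpectedValue() {
        // compare the actual return value with the expected one
        assertEquals(2, Calculator.divide(10, 5));
    }

    @Test
    void divideByZeroThrowsException() {
        // verify that the expected exception is thrown
        assertThrows(ArithmeticException.class, () -> Calculator.divide(10, 0));
    }
}

class Calculator {
    static int divide(int dividend, int divisor) {
        return dividend / divisor;
    }
}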

If a method has no return value or does not throw an error, it cannot be tested directly. Methods marked as private, or inner classes, are also not easily testable, as they cannot be called directly. These must be tested indirectly through public methods that call the ‘hidden’ methods.

When dealing with methods marked as private, accessing and testing them directly via techniques such as the Reflection API is not a sensible option. We should also keep in mind that such methods often merely encapsulate code fragments in order to avoid duplication.

public boolean method() {
    boolean success = false;
    List<Integer> collector = new ArrayList<>();
    collector.add(1);
    collector.add(2);
    collector.add(3);

    sortAsc(collector);
    // after an ascending sort the smallest element is at the front
    if (collector.getFirst().equals(1)) {
        success = true;
    }
    return success;
}

private void sortAsc(List<Integer> collection) {
    // sort in ascending (natural) order
    collection.sort((a, b) -> a.compareTo(b));
}

Therefore, to effectively write automated tests, it is necessary to follow a certain coding style. The preceding Listing 1 simply demonstrates what testable code can look like.
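
As an illustration, a component test for the method from Listing 1 might look like the following sketch; JUnit 5 is assumed, and the enclosing class name Demo is made up.

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class DemoTest {

    @Test
    void methodReturnsTrueForSortedCollector() {
        // calls the public method, which internally exercises the private sortAsc()
        assertTrue(new Demo().method());
    }
}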

Since developers also write the corresponding component tests for their own implementations, the problem of difficult-to-test code is largely eliminated in projects that follow a test-driven approach. The motivation to test now lies with the developer, as this paradigm allows them to determine whether their implementation behaves as intended. However, we must ask ourselves: Is that all we need to do to develop good and stable software?

As we might expect with such questions, the answer is no. An essential aid in evaluating the quality of tests is achieving the highest possible test coverage, where a distinction is made between branch and line coverage. To illustrate the difference, let’s briefly look at the pseudocode in Listing 2.

if (Expression-A OR Expression-B) {
    print('allow');
} else {
    print('decline');
}

Our goal is to execute every line of code if possible. To achieve this, we already need two separate test cases: one that enters the IF branch and one that enters the ELSE branch. However, to achieve 100% branch coverage, we must also cover all variations of the IF condition. In this example, that means one test that makes Expression A true and another test that makes Expression B true. This results in a total of three different test cases.
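
Translated into Java, the pseudocode and the three test cases could look like the following sketch; the class Gate and the method check() are illustrative and not taken from an existing project.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class Gate {
    String check(boolean expressionA, boolean expressionB) {
        if (expressionA || expressionB) {
            return "allow";
        } else {
            return "decline";
        }
    }
}

class GateTest {

    private final Gate gate = new Gate();

    @Test
    void expressionATriggersAllow() {
        assertEquals("allow", gate.check(true, false));
    }

    @Test
    void expressionBTriggersAllow() {
        assertEquals("allow", gate.check(false, true));
    }

    @Test
    void bothFalseTriggersDecline() {
        assertEquals("decline", gate.check(false, false));
    }
}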

The screenshot from the TP-CORE project shows what such test coverage can look like in ‘real-world’ projects.

Of course, this example is very simple, and in real life, there are often constructs where, despite all efforts, it’s impossible to reach all lines or branches. Exceptions from third-party libraries that need to be caught but cannot be triggered under normal circumstances are a typical example.

For this reason, while we strive to achieve the highest possible test coverage and naturally aim for 100%, there are many cases where this is not feasible. However, a test coverage of 90% is quite achievable. The industry standard for commercial projects is 85% test coverage. Based on these observations, we can say that test coverage correlates with the testability of an application. This means that test coverage is a suitable measure of testability.

However, it must also be acknowledged that the test coverage metric has its limitations. Regular expressions and annotations for data validation are just a few simple examples of where test coverage alone is not a sufficient indicator of quality.

Without going too deeply into the implementation details, let’s imagine we had to write a regular expression to validate input against the 24-hour time format. If we don’t keep the correct interval in mind, our regular expression may be wrong. The valid interval for the 24-hour format is 00:00 – 23:59; examples of invalid values are 24:00 or 23:60. If we are unaware of this, errors can remain hidden in our application despite existing test cases, only to surface and cause problems once the application is actually used.
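
To make this concrete, a small validation sketch in Java, including the two invalid boundary cases mentioned above, could look like this; the regular expression is a common pattern for the 24-hour format and not taken from a specific project.

import java.util.regex.Pattern;

public class TimeFormatCheck {

    // hours 00-23, minutes 00-59
    private static final Pattern TIME_24H = Pattern.compile("([01]\\d|2[0-3]):[0-5]\\d");

    public static boolean isValid(String input) {
        return TIME_24H.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("23:59")); // true
        System.out.println(isValid("24:00")); // false
        System.out.println(isValid("23:60")); // false
    }
}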

“… In a few cases, participants were unable to think of alternative solutions …”

The question here was whether error correction always represents the optimal solution. Beyond that, it would be necessary to clarify what constitutes an optimal solution in commercial software development projects. The statement that there are cases in which developers only know or understand one way of doing things is very illustrative. This is also reflected in our example of regular expressions (RegEx). Software development is a thought process that cannot be accelerated. Our thinking is determined by our imagination, which in turn is influenced by our experience.

This already shows us another example of sources of errors in test cases. A classic example is incorrect comparisons in collections, such as comparing arrays. The problem we are dealing with here is how variables are accessed: call by value or call by reference. With arrays, access is via call by reference, i.e., directly to the memory location. If you now assign an array to a new variable and compare both variables, they will always be the same because you are comparing the array with itself. This is an example of a test case that is essentially meaningless. However, if the implementation is correct, this faulty test case will never cause any problems.
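
A sketch of such a meaningless test in Java could look like this; the first assertion always passes because both variables point to the same array object, while the second shows a meaningful element-by-element comparison.

import static org.junit.jupiter.api.Assertions.assertArrayEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class ArrayComparisonTest {

    @Test
    void meaninglessComparison() {
        int[] expected = {1, 2, 3};
        int[] actual = expected;              // copies only the reference
        assertTrue(actual.equals(expected));  // always true: the array is compared with itself
    }

    @Test
    void meaningfulComparison() {
        int[] expected = {1, 2, 3};
        int[] actual = computeSomething();    // result produced independently of 'expected'
        assertArrayEquals(expected, actual);  // compares element by element
    }

    private int[] computeSomething() {
        return new int[] {1, 2, 3};
    }
}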

This realization shows us that blindly striving for complete test coverage is not conducive to quality. Of course, it’s understandable that this metric is highly valued by management. However, we have also been able to demonstrate that one cannot rely on it alone. We therefore see that there is also a need for code inspections and refactorings for test cases. Since it’s impossible to read and understand all the code from beginning to end due to time constraints, it’s important to focus on problematic areas. But how can we find these problem areas? A relatively new technique helps us here. The theoretical work on this is already somewhat older; it just took a while for corresponding implementations to become available.

Disk-Jock-Ey II

After discussing more general issues such as file systems and partitions in the first part of this workshop, we’ll turn to various diagnostic techniques in the second and final part of the series. Our primary tool for this will be Bash, as the following tools are all command-line based.

This section, too, requires the utmost care. The practices described here may result in data loss if used improperly. I assume no liability for any damages.

Let’s start with the possibility of finding out how much free space we still have on our hard drive. Anyone who occasionally messes around on servers can’t avoid the df command. After all, the SSH client doesn’t have a graphical interface, and you have to navigate the shell.

With df -hT, all storage, physical and virtual, can be displayed in human-readable format.

ed@local:~$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs   32G     0   32G   0% /dev
tmpfs          tmpfs     6.3G  2.8M  6.3G   1% /run
/dev/nvme0n1p2 ext4      1.8T  122G  1.6T   8% /
tmpfs          tmpfs      32G  8.0K   32G   1% /dev/shm
efivarfs       efivarfs  196K  133K   59K  70% /sys/firmware/efi/efivars
tmpfs          tmpfs     5.0M   16K  5.0M   1% /run/lock
tmpfs          tmpfs     1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs          tmpfs      32G   20M   32G   1% /tmp
/dev/nvme0n1p1 vfat      975M  8.8M  966M   1% /boot/efi
tmpfs          tmpfs     6.3G  224K  6.3G   1% /run/user/1000

As we can see in the output, the mount point / is an ext4 file system on an NVMe SSD with a capacity of 1.8 TB, of which approximately 1.6 TB is still free. If other storage devices, such as external hard drives or USB drives, were connected, they would also appear in the list. It certainly takes some practice to sharpen your eye for the relevant details. In the next step, we’ll practice a little diagnostics.

lsblk

If the output of df is too confusing, you can also use the lsblk tool, which provides a more understandable listing for beginners.

ed@local:~$ sudo lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0          7:0    0     4K  1 loop /snap/bare/5
loop1          7:1    0  73.9M  1 loop /snap/core22/2139
loop2          7:2    0 516.2M  1 loop /snap/gnome-42-2204/226
loop3          7:3    0  91.7M  1 loop /snap/gtk-common-themes/1535
loop4          7:4    0  10.8M  1 loop /snap/snap-store/1270
loop5          7:5    0  50.9M  1 loop /snap/snapd/25577
loop6          7:6    0  73.9M  1 loop /snap/core22/2133
loop7          7:7    0  50.8M  1 loop /snap/snapd/25202
loop8          7:8    0   4.2G  0 loop 
└─veracrypt1 254:0    0   4.2G  0 dm   /media/veracrypt1
sda            8:0    1 119.1G  0 disk 
└─sda1         8:1    1 119.1G  0 part 
sr0           11:0    1  1024M  0 rom  
nvme0n1      259:0    0   1.9T  0 disk 
├─nvme0n1p1  259:1    0   976M  0 part /boot/efi
├─nvme0n1p2  259:2    0   1.8T  0 part /
└─nvme0n1p3  259:3    0  63.7G  0 part [SWAP]

S.M.A.R.T

To thoroughly test our newly acquired storage before use, we’ll use the so-called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) tools. This can be done either with the Disks program introduced in the first part of this article or, with more detailed information, via Bash. With df -hT we’ve already identified the SSD as /dev/nvme0n1, so we can call smartctl on the underlying NVMe device /dev/nvme0.

ed@local:~$ sudo smartctl --all /dev/nvme0 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.48+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZVL22T0HDLB-00BLL
Serial Number:                      S75ZNE0W602153
Firmware Version:                   6L2QGXD7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Utilization:            248,372,908,032 [248 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 b63101bf9d
Local Time is:                      Sat Oct 25 08:07:32 2025 CST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.41W       -        -    0  0  0  0        0       0
 1 +     8.41W       -        -    1  1  1  1        0     200
 2 +     8.41W       -        -    2  2  2  2        0     200
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    43,047,167 [22.0 TB]
Data Units Written:                 25,888,438 [13.2 TB]
Host Read Commands:                 314,004,907
Host Write Commands:                229,795,952
Controller Busy Time:               2,168
Power Cycles:                       1,331
Power On Hours:                     663
Unsafe Shutdowns:                   116
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               37 Celsius
Temperature Sensor 2:               37 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

A very useful source of information, especially if you’re planning to install a used drive. Fortunately, my system SSD shows no abnormalities after almost two years of use.

fdisk

The classic among hard drive programs is fdisk, which also exists on Windows systems. With fdisk, you can not only partition drives but also extract some information about them; for this purpose, there are the -l parameter for a listing and -x for more details. The fdisk program is quite complex, and for partitioning and formatting disks I recommend the graphical tools Disks and Gparted, presented in the first part of this article. With a graphical interface, the likelihood of making mistakes is much lower than in the shell.

ed@local:~$ sudo fdisk -l
Disk /dev/nvme0n1: 1.86 TiB, 2048408248320 bytes, 4000797360 sectors
Disk model: SAMSUNG MZVL22T0HDLB-00BLL              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 347D3F20-0228-436D-9864-22A5D36039D9

Device              Start        End    Sectors  Size Type
/dev/nvme0n1p1       2048    2000895    1998848  976M EFI System
/dev/nvme0n1p2    2000896 3867305983 3865305088  1.8T Linux filesystem
/dev/nvme0n1p3 3867305984 4000796671  133490688 63.7G Linux swap

Disk /dev/sda: 119.08 GiB, 127865454592 bytes, 249737216 sectors
Disk model: Storage Device  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xa82a04bd

Device     Boot Start       End   Sectors   Size Id Type
/dev/sda1        2048 249737215 249735168 119.1G 83 Linux

Basic repair with consistency check: fsck

If, contrary to expectations, problems arise, you can use the fsck (File System Consistency Check) tool to check the file system and repair it if necessary. However, you must specify the relevant partition.

sudo fsck /dev/sdc1

As you can see in the screenshot, some time ago I had a partition with a defective superblock, making it impossible to access the data. The reason for the error was a defective memory cell in the SSD. With a little effort, I was able to access the data and copy it to another storage device. This doesn’t always work. Therefore, I would like to give a little advice: always be well prepared before such rescue operations. This means having a sufficiently large, functioning target drive ready so that you can immediately create a backup if the operation is successful. Many of the operations presented here change the data on the storage device, and it is not certain whether subsequent access will be successful.

Linux sysadmin joke

Linux system administrators often recommend that beginners delete the French language files, which aren’t needed, to save space. To do this, type the command sudo rm -fr / in the console and press Enter. This should not be done under any circumstances, as the command deletes the entire hard drive. It is considered the most dangerous thing you can do in Linux. You initiate the deletion with rm, the parameters -f and -r stand for force and recursive, respectively, and the inconspicuous / refers to the root directory.

Fake Check

Sometimes it happens that you’ve purchased storage devices that claim a high capacity, but that capacity isn’t even close to being there. These are so-called fake devices. The problem with these fake devices is that the data written to the device for which there is no longer any capacity ends up in oblivion. Unfortunately, you don’t receive an error message and often only notice the problem when you want to access the data again at a later time.

A very unpleasant way to end up with fake devices is through a trick involving Amazon’s logistics. To ensure reliable and fast shipping, Amazon offers its sellers the option of storing goods in its own fulfillment centers. Sellers who use this option are also given priority on the Amazon website. The problem is that identical products from different sellers all end up in the same bin, which from a logistics point of view makes perfect sense. Criminals shamelessly exploit this situation and send their fake products to Amazon. Afterwards, it’s impossible to identify the original supplier.

This content is only available to subscribers.

Network Attached Storage (NAS)

Another scenario for dealing with mass storage under Linux involves so-called NAS hard drives, which are connected to the router via a network cable and are then available to all devices such as televisions, etc. To ensure that access to the files is only granted to authorized users, a password can be set. Affordable solutions for home use are available, for example, from Western Digital with its MyCloud product series. It would be very practical if you could automatically register your NAS during the boot process on your Linux system, so that it can be used immediately without further login. To do this, you need to determine the NAS URL, for example, from the user manual or via a network scan. Once you have all the necessary information such as the URL/IP, login name, and password, you can register the NAS with an entry in the /etc/fstab file. We already learned about the fstab file in the section on the SWAP file.

First, we install support for the network file systems commonly used by NAS systems; depending on the device, this means NFS or, as with the SMB/CIFS share in the WD MyCloud example below, the corresponding CIFS utilities.

This content is only available to subscribers.

In the next step, we need to create a file that enables automatic login. We’ll call this file nas-login and save it to /etc/nas-login. The contents of this file are our login information.

user=accountname
password=s3cr3t

Finally, we edit the fstab file and add the following information as the last line:

This content is only available to subscribers.

The example is for a Western Digital MyCloud drive, accessible via the URL //wdmycloud.local/account; the account must be configured for the corresponding user. The mount point under Linux is /media/nas, and in most cases you must create this directory beforehand with sudo mkdir /media/nas. As credentials, we reference the file with our login information, /etc/nas-login. After a reboot, the NAS storage will be displayed in the file browser and can be used. Depending on the network speed, this can extend the boot process by a few seconds; it takes even longer if the computer is not connected to the home network and the NAS is unavailable. You can also build your own NAS with a Raspberry Pi, but that could be the subject of a future article.

Finally, I would like to say a few words about Western Digital’s service. I have already had two replacement devices, which were replaced by WD every time without any problems. Of course, I sent the service department my analysis in advance via screenshot, which ruled out any improper use on my part. The techniques presented in this article have helped me a great deal, which is why I came up with the idea of ​​writing this text in the first place. I’ll leave it at that and hope the information gathered here is as helpful to you as it was to me.



Cryptography – more than just coincidence

In everyday language, we use the word “coincidence” rather unreflectively. Phrases like, “I happened to be passing by here” or “What a coincidence to meet you here” are familiar to everyone. But what do we mean by that? What we’re actually trying to say is that we didn’t expect the current situation.

Chance is actually a mathematical concept that we’ve adopted into everyday language. In mathematics, randomness means something unpredictable, such as the exact location of a particular electron in an atom at a given moment. While the path I take to reach a particular destination can be arbitrary, preferences can be derived from probabilities, which then make the choice quite predictable.

Factors in such a scenario can be the distance, my personal state (time pressure, discomfort, or boredom), or external circumstances (the weather: sunshine or rain). If I’m bored AND the sun is shining, I choose an unknown route for a bit of distraction and out of curiosity. If I’m short on time AND it’s raining, I choose the shortest route I know, or one that is as sheltered as possible. This means that the better you know a person’s habits, the more predictable their decisions are. But predictability contradicts the concept of chance.

It’s nothing new that mathematical terms with very strict definitions are adopted into our everyday language as a fad. I’d like to briefly address a very popular example, one already cited by Joseph Weizenbaum: the term chaos. In mathematical terms, chaos describes the fact that a very small change in the initial conditions distorts the result so significantly over long distances or time spans that it can’t even be used as an estimate or approximation. A typical application is astronomy: if I point a laser beam from Earth at the moon, a deviation of just a few millimeters causes the beam to miss the moon by kilometers. To explain such facts to a broader audience, popular science used the image that a butterfly flapping its wings in Tokyo can cause a storm in Berlin. Unfortunately, there are quite a few pseudoscientists who seize on this image and sell it to their audience as fact. This is, of course, nonsense. The flapping of a butterfly’s wings cannot create a storm on the other side of the globe; just think of the consequences this would have for our world, given all the birds that take to the air every day.

“Why did the mathematician’s marriage fail? His wife was unpredictable.”

But why is randomness so important in mathematics? Specifically, it’s about the broad topic of cryptography. If we choose combinations for encryption that are easy to guess, the protection is quickly lost. Here’s a small example.

Web pages are stateless. This means that after a page is accessed and a link is clicked to go to the next page, all information from the previous page is lost. To still be able to provide things like an online shop with a shopping cart and all the other necessary functions, there is the option of storing data on the server in so-called sessions. This data often includes the user’s login. To distinguish between sessions, each one has an identifier (ID), and the programmer specifies how this ID is generated. One property of these IDs is that they must be unique; no ID can occur twice.

Now, one might think of using the timestamp, including the milliseconds, to generate a hash. The hash prevents anyone from immediately recognizing that the ID is created from a timestamp. A patient hacker, however, will uncover this secret relatively quickly with a little diligence. Added to that is the probability that two users could create a session at the same moment, which would lead to an error.

Now, one might come up with the idea of assembling the session ID from various segments such as a timestamp plus the username and other details. Although increasing complexity offers a certain degree of protection, this is not true security: professionals have methods to guess these supposed secrets with manageable effort. The only real protection is the use of cryptographically secure randomness as a segment that cannot be guessed, no matter how much effort is put into it.

Before I reveal how we can address the problem, I would like to briefly discuss the typical attack vector and the damage caused by guessable session IDs. If a session ID has been guessed by an attacker while the session is still active, the attacker can take over this session in their own browser. This process is called session hijacking. An attacker who has managed to take over an active session is logged into an online service as a foreign user, with a profile that does not belong to them. This allows them to perform all the actions that the legitimate user could. It would, for example, be possible to place an order in an online shop and have the goods shipped to a different address. This is a situation that must be prevented at all costs.

There are various strategies used to prevent the theft of an active session. Each of these strategies offers a certain level of protection, but the full strength is only achieved by combining the various options, as hackers are constantly evolving and looking for new opportunities. In this short article, we will only consider the aspect of how to generate a cryptographically secure session ID.

Almost all common programming languages have a random() function that generates a random number, though the implementation varies. Unfortunately, the numbers generated this way are not random enough to withstand a determined attacker. Developers should therefore avoid this simple random function. Instead, there are cryptographically secure implementations of random number generators for backend languages such as PHP and Java.

For Java programs, you can use the java.security.SecureRandom class. An important feature of this class is the ability to choose from various cryptographic algorithms [1]. Additionally, the starting value can be specified using the so-called seed. To demonstrate its use, here is a short code snippet:

This content is only available to subscribers.
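
As a minimal sketch (not the original listing), generating a session ID with java.security.SecureRandom could look like the following; the class name, the token length of 32 bytes, and the hex encoding are my own assumptions.

import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.HexFormat;

public class SessionIdGenerator {

    public static String createSessionId() throws NoSuchAlgorithmException {
        // cryptographically strong RNG; the JVM selects a suitable algorithm
        SecureRandom random = SecureRandom.getInstanceStrong();
        byte[] token = new byte[32];             // 256 bits of entropy
        random.nextBytes(token);
        return HexFormat.of().formatHex(token);  // hex-encoded session ID (requires Java 17+)
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(createSessionId());
    }
}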

As we can see, the use is quite simple and can easily be adapted. Generating randomness is even easier in PHP: simply call the function random_int($min, $max) [2], where the desired interval is passed via $min and $max.

Thus, we see that the assumption held by many people that our world is largely computable is not entirely true. In many areas of the natural sciences, there are processes that we cannot calculate. These, in turn, form the basis for generating ‘true’ randomness. For applications that require very strong protection, hardware is often used; these might be devices that measure the radioactive decay of a low-radiation isotope.

The fields of cryptography and web application security are, of course, much more extensive. This article is intended to draw attention to the necessity of this topic using a fairly simple example. In doing so, I have avoided confusing and ultimately alienating potential interested parties with complicated mathematics.

Resources

This content is only available to subscribers.


Disk-Jock-Ey

Working with mass storage devices such as hard disks (HDDs), solid-state drives (SSDs), USB drives, memory cards, or network-attached storage devices (NAS) isn’t as difficult under Linux as many people believe. You just have to be able to let go of old habits you’ve developed under Windows. In this compact course, you’ll learn everything you need to master potential problems on Linux desktops and servers.

Before we dive into the topic in depth, a few important facts about the hardware itself. The basic principle here is: Buy cheap, buy twice. The problem isn’t even the device itself that needs replacing, but rather the potentially lost data and the effort of setting everything up again. I’ve had this experience especially with SSDs and memory cards, where it’s quite possible that you’ve been tricked by a fake product and the promised storage space isn’t available, even though the operating system displays full capacity. We’ll discuss how to handle such situations a little later, though.

Another important consideration is continuous operation. Most storage media are not designed to be switched on and used 24 hours a day, 7 days a week. Hard drives and SSDs designed for laptops quickly fail under constant load. Therefore, for continuous operation, as is the case with NAS systems, you should specifically look for such specialized devices. Western Digital, for example, has various product lines. The Red line is designed for continuous operation, as is the case with servers and NAS. It is important to note that the data transfer speed of storage media is generally somewhat lower in exchange for an increased lifespan. But don’t worry, we won’t get lost in all the details that could be said about hardware, and will leave it at that to move on to the next point.

A significant difference between Linux and Windows is the file system, the mechanism by which the operating system organizes access to information. Windows uses NTFS as its file system, while USB sticks and memory cards are often formatted with FAT. One difference is that NTFS can store files larger than 4 GB, which FAT cannot. FAT is preferred by device manufacturers for navigation systems or car radios due to its stability. Under Linux, the ext3 and ext4 file systems are primarily found; of course, there are many other specialized formats, which we won’t discuss here. The major difference between the Linux and Windows file systems is the security concept: controlling who may create, open, or execute files and directories is a fundamental, built-in concept of ext3 and ext4, whereas the Windows world handles such permissions quite differently.

Storage devices formatted with NTFS or FAT can easily be connected to Linux computers, and their contents can be read. To avoid any risk of data loss when writing to network storage, which is often formatted as NTFS for compatibility reasons, the SMB protocol is used, which is implemented on Linux by Samba. Samba is usually already part of many Linux distributions and can otherwise be installed in just a few moments; no special configuration of the service is required.

Now that we’ve learned what a file system is and what it’s used for, the question arises: how to format external storage in Linux? The two graphical programs Disks and Gparted are a good combination for this. Disks is a bit more versatile and allows you to create bootable USB sticks, which you can then use to install computers. Gparted is more suitable for extending existing partitions on hard drives or SSDs or for repairing broken partitions.

Before you read on and perhaps try to replicate one or two of these tips, it’s important that I offer a warning here. Before you try anything with your storage media, first create a backup of your data so you can fall back on it in case of disaster. I also expressly advise you to only attempt scenarios you understand and where you know what you’re doing. I assume no liability for any data loss.

Bootable USB & Memory Cards with Disks

One scenario we occasionally need is the creation of bootable media. Whether it’s a USB flash drive for installing a Windows or Linux operating system, or installing the operating system on an SD card for use on a Raspberry Pi, the process is the same. Before we begin, we need an installation medium, which we can usually download as an ISO from the operating system manufacturer’s website, and a corresponding USB flash drive.

Next, open the Disks program and select the USB drive on which we want to install the ISO file. Then, click the three dots at the top of the window and select Restore Disk Image from the menu that appears. In the dialog that opens, select our ISO file for the Image to Restore input field and click Start Restoring. That’s all you need to do.
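For the sake of completeness: the same result can be achieved on the command line with dd, which writes the ISO image block by block onto the stick. The following is only a minimal sketch; the file name debian.iso and the device /dev/sdX are placeholders that you must replace with your own values, and dd overwrites the target without asking, so check the device name carefully first.

# identify the USB stick first (e.g. sdb, sdc ...)
lsblk
# write the ISO image directly onto the device (not onto a partition)
sudo dd if=debian.iso of=/dev/sdX bs=4M status=progress conv=fsync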

Repairing Partitions and the MFT with GParted

Another scenario you might encounter is that data on a flash drive, for example, is unreadable. If the data itself isn’t corrupted, you might be lucky and be able to solve the problem with GParted. In some cases, (A) the partition table is corrupted and the operating system simply doesn’t know where to start. Another possibility is (B) that the Master File Table (MFT) is corrupted. The MFT is the index that records where on the storage medium each file is located. Both problems can often be resolved quickly with GParted.

Of course, it’s impossible to cover the many complex aspects of data recovery in a general article.

Now that we know that a hard drive consists of partitions, and that these partitions contain a file system, we can say that all information about a partition and the file system formatted on it is stored in the partition table. To locate files and directories within a partition, the operating system uses an index, the so-called Master File Table. This connection leads us to the next point: the secure deletion of storage media.

Data Shredder – Secure Deletion

When we delete data on a storage medium, only the entry where the file can be found is removed from the MFT. The file therefore still exists and can still be found and read by special programs. Securely deleting files is only possible if we overwrite the free space multiple times. Since we can never know where a file was physically written on a storage medium, we must overwrite the entire free space multiple times after deletion. Specialists recommend three write processes, each with a different pattern, to make recovery impossible even for specialized labs. A Linux program that also sweeps up and deletes “data junk” is BleachBit.
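On the command line, this can be done with shred from the GNU coreutils, for example. A minimal sketch, assuming the device to be wiped is /dev/sdX; replace it with the correct device, because all data on it will be destroyed:

# three overwrite passes with random data, a final pass of zeros, with progress output
sudo shred -v -n 3 -z /dev/sdX

Note that on SSDs, wear leveling makes overwriting less reliable than on classic hard disks.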

Securely overwriting deleted files is a somewhat lengthy process, depending on the size of the storage device, which is why you will only want to do it occasionally. However, you should definitely wipe old storage devices completely when they are sorted out and then either disposed of or passed on to someone else.

Mirroring Entire Hard Drives 1:1 – CloneZilla

Another scenario we may encounter is the need to create a copy of an entire hard drive. This becomes relevant when the existing hard drive or SSD needs to be replaced with a new one with a higher capacity. Windows users often take this opportunity to reinstall their system from scratch. Those who have been working with Linux for a while appreciate that Linux systems run very stably, so a reinstallation is only rarely necessary. It therefore makes sense to copy the data from the current drive bit by bit to the new one. This works for HDDs and SSDs in any combination.

We can accomplish this with the free tool Clonezilla. To do so, we create a bootable USB stick with Clonezilla and start the computer in the Clonezilla live system. We then connect the new drive to the computer using a SATA/USB adapter and start the data transfer. Before we open up the computer and physically swap the disks, we change the boot order in the BIOS and check whether the cloning was successful. Only if the computer boots smoothly from the new disk do we proceed with the physical replacement. This short guide describes the basic procedure; I have deliberately omitted a detailed walkthrough, as the interface and operation may differ between Clonezilla versions.
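For illustration, the bit-by-bit idea behind this can also be reproduced directly with dd; Clonezilla essentially adds a menu, safety checks, and compression on top. A rough sketch, assuming /dev/sdX is the old disk and /dev/sdY the empty new one; double-check both with lsblk, because mixing up source and target destroys your data:

# clone the old disk sdX onto the new disk sdY, block by block
sudo dd if=/dev/sdX of=/dev/sdY bs=4M status=progress conv=fsync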

SWAP – The Paging File in Linux

At this point, we’ll leave the graphical user interface and turn to the command line. We’ll deal with a very special kind of storage that sometimes needs to be enlarged: the swap space, which on Linux exists either as a swap partition or as a swap file. It corresponds to what Windows calls the paging file. The operating system moves data that no longer fits into RAM to the swap space and reads it back into RAM when it is needed again. It can happen that the swap space is too small and needs to be extended. But that’s not rocket science, as we’ll see shortly.
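As a first orientation before the detailed walkthrough, the current swap situation can be inspected from the shell, and a swap file can in principle be enlarged with on-board tools. The following is a minimal sketch only; the path /swapfile and the size of 2 GB are example values, not a recommendation for your system, and it assumes a common ext4 root filesystem:

# show active swap areas and current memory usage
swapon --show
free -h

# create and activate a 2 GB swap file (example values)
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

To make such a swap file survive a reboot, it also needs an entry in /etc/fstab.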

Abonnement / Subscription

This content is only available to subscribers.

We’ve already discussed quite a bit about handling storage media under Linux. In the second part of this series, we’ll delve deeper into the capabilities of command-line programs and look, for example, at how NAS storage can be permanently mounted in the system. Strategies for identifying defective storage devices will also be the subject of the next part. I hope I’ve piqued your interest and would be delighted if you would share the articles from this blog.


Pathfinder

So that we can call console programs anywhere on the system without having to specify their full path, we use the so-called path variable: we store the directory that contains the executable program, the so-called executable, in this variable, and from then on the command alone is enough on the command line. By the way, the file extension exe, which is common on Windows, derives from the word executable. Here we also see a significant difference between the two operating systems: while Windows decides whether a file is a plain text file or an executable based on its extension, such as txt or exe, Linux uses the file’s meta information to make this distinction. That’s why file extensions like txt and exe are rather unusual under Linux.

Typical use cases for extending the path variable are programming languages such as Java or tools such as the Maven build tool. If we download Maven from the official homepage, for example, we can unpack the program anywhere on our system. On Linux the location could be /opt/maven, and on Microsoft Windows it could be C:/Program Files/Maven. This installation directory contains a subdirectory /bin in which the executable programs are located. The executable for Maven is called mvn, and without an entry in the path variable, printing the version on Linux would require the full command /opt/maven/bin/mvn -v. That is admittedly a bit long. Adding the Maven installation directory to the path shortens the whole thing to mvn -v. This mechanism applies to every program that we want to use as a command in the console.

Before I get to how the path variable can be adjusted under Linux and Windows, I would like to introduce another concept: the system variable (also called an environment variable). System variables are global variables that are available to us in Bash; the path variable is one of them. Another system variable is HOME, which points to the logged-in user’s home directory. System variables are written in capital letters, with words separated by an underscore. For our example of adding the Maven executable to the path, we can also define our own system variable: the convention is M2_HOME for Maven and JAVA_HOME for Java. As a best practice, you bind the installation directory to a system variable and then use this self-defined variable to extend the path. This approach is quite typical for system administrators, because these global variables can also be read by automation scripts, which simplifies server installations.

The command line, also known as the shell, Bash, console, or terminal, offers an easy way to output the value of a system variable with echo. Using the path variable as an example, we can immediately see the difference between Linux and Windows: on Linux the command is echo $PATH, on Windows it is echo %PATH%.

ed@local:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin:/home/ed/Programs/maven/bin:/home/ed/.local/share/gem//bin:/home/ed/.local/bin:/usr/share/openjfx/lib

Let’s start with the simplest way to set the path variable. On Linux, we just need to edit the hidden .bashrc file in our home directory. At the end of the file, we add the following lines and save it.

export M2_HOME="/opt/maven"
export PATH=$PATH:$M2_HOME/bin

We bind the installation directory to the M2_HOME variable and then extend the path variable by $M2_HOME/bin, the subdirectory containing the executables. This procedure is also common on Windows systems, as it allows the installation path of an application to be found and adjusted more quickly. After modifying the .bashrc file, the terminal must be restarted (or the file reloaded with source ~/.bashrc) for the changes to take effect. Because the entries live in .bashrc, they survive a reboot of the computer.

Under Windows, the challenge is simply to find the input mask where the system variables can be set. In this article I will limit myself to the version for Windows 11. It may of course be that the way to edit the system variables has changed in a future update. There are slight variations between the individual Windows versions. The setting then applies to both the CMD and PowerShell. The screenshot below shows how to access the system settings in Windows 11.

To do this, we right-click on an empty area on the desktop and select the System entry. In the System – About submenu you will find the system settings, which open the System properties popup. In the system settings we press the Environment Variables button to get the final input mask. After making the appropriate adjustments, the console must also be restarted for the changes to take effect.
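Alternatively, Windows can set such a variable from the console with the built-in setx command, which stores it permanently for the current user. The Maven path below is just an example and must match your actual installation directory:

setx M2_HOME "C:\Program Files\Maven"

As with the graphical dialog, a newly opened console is required before the variable becomes visible.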

In this short guide, we learned what system variables are for and how to store them permanently on Linux and Windows. We can quickly check the success of our efforts in the shell by outputting the contents of the variables with echo. And with that, we are one step closer to becoming an IT professional.


PHP Elegant Testing with Laravel

The PHP programming language has been the first choice for many developers in the field of web applications for decades. Since the introduction of object-oriented language features with version 5, PHP has come of age. Large projects can now be implemented in a clean and, above all, maintainable architecture. A striking difference between commercial software development and a hobbyist who maintains a club’s website is the automated verification that the application adheres to its specification. This brings us into the realm of automated software testing.

A key principle of automated software testing is that it verifies, without additional interaction, that the application exhibits a predetermined behavior. Software tests cannot guarantee that an application is error-free, but they do increase quality and reduce the number of potential errors. The most important aspect of automated software testing is that behavior already defined in tests can be quickly verified at any time. This ensures that if developers extend an existing function or optimize its execution speed, the existing functionality is not affected. In short, we have a powerful tool for ensuring that we haven’t broken anything in our code without having to laboriously click through all the options manually each time.

To be fair, it’s also worth mentioning that the automated tests have to be developed, which initially takes time. However, this ‘supposed’ extra effort quickly pays off once the test cases are run multiple times to ensure that the status quo hasn’t changed. Of course, the created test cases also have to be maintained.

If, for example, an error is detected, you first write a test case that replicates the error. The repair is complete once the test case passes. However, changes in the behavior of existing functionality always require the associated tests to be adapted accordingly. This concept of writing tests in parallel with implementing the functionality is feasible in many programming languages and is called test-driven development. From my own experience, I recommend a test-driven approach even for relatively small projects. Small projects usually lack the complexity of large applications, which also demands some testing skills, so they offer a manageable setting in which to develop those skills.

Test-driven software development is nothing new in PHP either. Sebastian Bergmann’s unit testing framework PHPUnit has been around since 2001. The PEST testing framework, released around 2021, builds on PHPUnit and extends it with a multitude of new features. PEST stands for PHP Elegant Testing and defines itself as a next-generation tool. Since many agencies, especially smaller ones, that develop their software in PHP generally limit themselves to manual testing, I would like to use this short article to demonstrate how easy it is to use PEST. Of course, there is a wealth of literature on the topic of test-driven software development, which focuses on how to optimally organize tests in a project. This knowledge is ideal for developers who have already taken their first steps with testing frameworks. These books teach you how to develop independent, low-maintenance, and high-performance tests with as little effort as possible. However, to get to this point, you first have to overcome the initial hurdle: installing the entire environment.

A typical environment for self-developed web projects is the Laravel framework. When creating a new Laravel web project, you can choose between PHPUnit and PEST. Laravel takes care of all the necessary details. A functioning PHP environment is required as a prerequisite. This can be a Docker container, a native installation, or the XAMPP server environment from Apache Friends. For our short example, I’ll use the PHP CLI on Debian Linux.

sudo apt-get install php-cli php-mbstring php-xml php-pcov

After executing the command in the console, you can test the installation success using the php -v command. The next step is to use a package manager to deploy other PHP libraries for our application. Composer is one such package manager. It can also be quickly deployed to the system with just a few instructions.

php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php -r "if (hash_file('sha384', 'composer-setup.php') === 'ed0feb545ba87161262f2d45a633e34f591ebb3381f2e0063c345ebea4d228dd0043083717770234ec00c5a9f9593792') { echo 'Installer verified'.PHP_EOL; } else { echo 'Installer corrupt'.PHP_EOL; unlink('composer-setup.php'); exit(1); }"
php composer-setup.php
php -r "unlink('composer-setup.php');"

This downloads the current version of the composer.phar file into the directory in which the command is executed and automatically verifies the installer against the published hash. Note that the hash shown above matches the installer version current at the time of writing; the up-to-date command sequence can always be copied from getcomposer.org/download. To make Composer globally available on the command line, you can either add its location to the path variable or link composer.phar into a directory that is already part of the path. I prefer the latter option and achieve this with:

ln -s "$PWD/composer.phar" "$HOME/.local/bin/composer"

If everything was executed correctly, composer list should now display the version as well as the available commands. If this is the case, we can install the Laravel installer globally into the Composer repository.

composer global require laravel/installer

To call the Laravel installer from Bash, the environment variable COMPOSER_HOME must be set. To find out where Composer created its global home directory, simply run composer config -g home. The resulting path, which in my case is /home/ed/.config/composer, is then bound to the variable COMPOSER_HOME.
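To set the variable permanently, the same .bashrc pattern as for Maven above can be used; the path below is the one from my system and may differ on yours:

export COMPOSER_HOME="$HOME/.config/composer"
# optional: make globally installed Composer tools directly callable
export PATH=$PATH:$COMPOSER_HOME/vendor/bin

After reloading the shell, the Laravel installer can be called as follows: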

php $COMPOSER_HOME/vendor/bin/laravel new MyApp

Running this command in an empty directory creates a new Laravel project called MyApp. The corresponding console output looks like this:

ed@P14s:~/Downloads/test$ php $COMPOSER_HOME/vendor/bin/laravel new MyApp

   _                               _
  | |                             | |
  | |     __ _ _ __ __ ___   _____| |
  | |    / _` |  __/ _` \ \ / / _ \ |
  | |___| (_| | | | (_| |\ V /  __/ |
  |______\__,_|_|  \__,_| \_/ \___|_|


 ┌ Which starter kit would you like to install? ────────────────┐
 │ None                                                         │
 └──────────────────────────────────────────────────────────────┘

 ┌ Which testing framework do you prefer? ──────────────────────┐
 │ Pest                                                         │
 └──────────────────────────────────────────────────────────────┘

Creating a "laravel/laravel" project at "./MyApp"
Installing laravel/laravel (v12.4.0)
  - Installing laravel/laravel (v12.4.0): Extracting archive
Created project in /home/ed/Downloads/test/MyApp
Loading composer repositories with package information

The directory structure created in this way contains the tests folder, where the test cases are stored, and the phpunit.xml file, which contains the test configuration. Laravel defines two test suites: Unit and Feature, each of which already contains a demo test. To run the two demo test cases, we use the artisan command-line tool [1] provided by Laravel. To run the tests, simply enter the php artisan test command in the root directory.

In order to assess the quality of the test cases, we need to determine the corresponding test coverage. We obtain the coverage with the same artisan test command, supplemented by the --coverage parameter. A coverage driver such as PCOV, which we installed above via php-pcov, must be available for this.

php artisan test --coverage

For the demo test cases provided by Laravel, this prints a short report listing each class with its coverage percentage and an overall total at the end.

Unfortunately, artisan’s capabilities for executing test cases are very limited. To utilize PEST’s full functionality, the PEST executor should be used right from the start.

php ./vendor/bin/pest -h

The PEST executable can be found at vendor/bin/pest, and the -h parameter displays the help. Apart from this detail, our attention belongs to the tests folder already mentioned. In the initial setup, two test suites are preconfigured via the phpunit.xml file. The test files themselves should end with the suffix Test, as in ExampleTest.php.
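To give a feel for the syntax before we move on, a classic unit test in PEST could look like the following minimal sketch; the test is invented for illustration and is not part of the Laravel demo tests:

<?php

// tests/Unit/ExampleTest.php — a hand-written example test
it('adds 19 percent VAT to a net price of 100', function () {
    $gross = (int) round(100 * 1.19);

    expect($gross)->toBe(119);
});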

Compared to other testing frameworks, PEST tries to support as many concepts of automated testing as possible. To maintain clarity, each test level should be kept in its own test suite. In addition to classic unit tests, browser tests, stress tests, architecture tests, and even the newly emerging mutation testing are supported. Of course, this article can’t cover all aspects of PEST, and there are now many high-quality tutorials for writing classic unit tests with PEST. I will therefore limit myself to an overview and a few less common concepts.

Architecture Tests

The purpose of architectural tests is to provide a simple way to verify whether developers are adhering to the specifications. This includes, among other things, ensuring that classes representing data models are located in a specified directory and may only be accessed via specialized classes.

test('models')
    ->expect('App\Models')
    ->toOnlyBeUsedOn('App\Repositories')
    ->toOnlyUse('Illuminate\Database');

Mutation Tests

This form of testing is fairly new. The idea is to create so-called mutants by making small changes to the original implementation, for example by inverting conditions. If the tests covering the mutated code still pass instead of failing, this is a strong indication that the test cases are too weak and lack meaningful assertions.

Original: if(TRUE) → Mutant: if(FALSE)
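A small hand-made illustration of the idea; the isAdult function and both tests are invented for this example and not generated by a mutation tool:

<?php

// implementation under test
function isAdult(int $age): bool
{
    return $age >= 18;   // a mutation tool would, for example, turn >= into >
}

// this test also passes for the mutant ($age > 18), so the mutant survives
it('accepts adults', function () {
    expect(isAdult(30))->toBeTrue();
});

// this test checks the boundary value and therefore kills the mutant
it('accepts someone who is exactly 18', function () {
    expect(isAdult(18))->toBeTrue();
});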

Stress Tests

Stress tests, sometimes also called load tests, focus specifically on the performance of an application. This allows you to ensure that a web app, for example, can handle a defined number of simultaneous accesses.

Of course, there are many other helpful features available. For example, you can group tests and then run the groups individually.

// definition
pest()->extend(TestCase::class)
    ->group('feature')
    ->in('Feature');

// calling
php ./vendor/bin/pest --group=feature

For those who don’t work with the Laravel framework but still want to test PHP code with PEST, the PEST framework can also be integrated into an existing application. All you need to do is add PEST as a development dependency to the project’s Composer configuration (see the command below) and then initiate the initial test setup in the project’s root directory.
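According to the Pest documentation, adding the dependency looks roughly like this; the --with-all-dependencies flag allows Composer to adjust related packages if necessary:

composer require pestphp/pest --dev --with-all-dependencies

The initial setup is then triggered with: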

php ./vendor/bin/pest --init

As we’ve seen, the options briefly presented here alone are very powerful. The official PEST documentation is also very detailed and should generally be your first port of call. In this article, I focused primarily on minimizing the entry barriers for test-driven development in PHP. PHP now also offers a wealth of options for implementing commercial software projects very efficiently and reliably.

Resources

Take professional screenshots

Over the course of the many hours they spend in front of this amazing device, almost every computer user will find themselves in need of saving the screen content as a graphic. The process of creating an image of the monitor’s contents is what seasoned professionals call taking a screenshot.

As with so many things, there are many ways to achieve a screenshot. Some very resourceful people solve the problem by simply pointing their smartphone at the monitor and taking a photo. Why not? As long as you can still recognize something afterwards, everything’s fine. But this short guide doesn’t end there; we’ll take a closer look at the many ways to create screenshots. Even professionals who occasionally write instructions, for example, have to overcome one or two pitfalls.

Before we get to the nitty-gritty, it’s important to mention that it makes a difference whether you want to save the entire screen, the browser window, or even the part of a website that lies outside the visible area. The solution presented for the web browser works pretty much the same in all common browsers on all operating systems. Screenshots that are meant to capture the monitor area rather than a web page, however, rely on the tools of the operating system in use. For this reason, we also differentiate between Linux and Windows. Let’s start with the most common scenario: browser screenshots.

Browser

Especially when ordering online, many people feel more comfortable when they can additionally document their purchase with a screenshot. It’s also not uncommon to occasionally save instructions from a website for later use. When taking screenshots of websites, one often encounters the problem that a single page is longer than the area displayed on the monitor. Naturally, the goal is to save the entire content, not just the displayed area. For precisely this case, our only option is a browser plugin.

Fireshot is such a plug-in, available for all common browsers such as Brave, Firefox, and Microsoft Edge, and it has been on the market for a very long time. It allows us to create screenshots of websites including the content outside the visible area. The free version is already sufficient for the scenario described. Anyone who also needs an image editor when taking screenshots, for example to highlight areas and add labels, can use the paid Pro version. The integrated editor has the advantage of significantly speeding up workflows in professional settings, such as when creating manuals and documentation. Of course, similar results can be achieved with an external image editor like GIMP, a free program that is comparable in power to the paid Photoshop and is available for Windows and Linux.

Linux

If we want to take screenshots outside of the web browser, we can simply use the operating system’s built-in tools. On Linux, you don’t need to install any additional programs; everything you need is already there. Pressing the Print key on the keyboard opens the screenshot tool. You simply drag the mouse around the area you want to capture and press Capture in the control panel that appears. It’s not a problem if this control panel lies within the selected area; it won’t appear in the saved image. On German keyboards, the key is usually labeled Druck instead of Print. The finished screenshot ends up in the Screenshots folder, a subfolder of Pictures in your home directory, with a timestamp in the file name.

Windows

The easiest way to take screenshots in Windows is to use the Snipping Tool, which is usually included with your Windows installation. It’s also intuitive to use.

Another very old way in Windows, without a dedicated screenshot program, is to press the Print Screen key, which copies the entire screen to the clipboard (Alt + Print Screen captures only the active window). Then open a graphics program such as Paint, which is included in every Windows installation, press Ctrl + V in the drawing area, and the screenshot appears and can be edited immediately.

Depending on the tool and its settings, these screenshots are saved in PNG or JPG format. JPG is a lossy compression method, so you should check readability after taking the screenshot. Especially with current monitors with resolutions around 2000 pixels and more, using the image on a website usually requires manual post-processing. One option is to reduce the width from just under 2000 pixels to the roughly 1000 pixels that are common on a website. Ideally, the scaled and edited graphic is then saved in the WEBP format. WEBP is a modern image format that supports both lossy and lossless compression and usually produces noticeably smaller files than JPG at comparable quality, which is very beneficial for website loading times.
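On the shell, this post-processing can be done, for example, with the cwebp reference encoder, which on Debian-based systems is usually found in the webp package. The file names and values below are only placeholders:

sudo apt install webp
# scale the screenshot to a width of 1000 pixels (height calculated automatically)
# and save it as WebP with quality 80
cwebp -resize 1000 0 -q 80 screenshot.png -o screenshot.webp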

This already covers a good range of possibilities for taking screenshots. Of course, more could be said about this, but that falls into the realm of graphic design and the efficient use of image editing software.