qemu-img shipped with Fedora corrupts qcow2 disk images

Up to and including F28 there were no problems with using qemu-img to compress disk images.

On both F29 and F30, using qemu-img to compress a qcow2 disk image results in severe virtual disk corruption, visible as virtual machines dying a slow death from IO errors and ‘qemu-img check’ on the disk image reporting lots of errors.

How to repeat:

  1. take a qcow2 disk image that ‘qemu-img check’ reports has no issues (‘qemu-img check diskname.qcow2’ to ensure no errors)
  2. use qemu-img to create a re-compressed image of that virtual disk (‘qemu-img convert -c -f qcow2 -O qcow2 diskname.qcow2 diskname_compressed.qcow2 -p’)
  3. and check if the new qcow2 file is corrupt (‘qemu-img check diskname_compressed.qcow2’)
  4. result: the new qcow2 virtual disk is unusable
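
For reference, the same steps collected into copy-and-paste form (diskname.qcow2 is a placeholder for your own image):

qemu-img check diskname.qcow2
qemu-img convert -c -f qcow2 -O qcow2 diskname.qcow2 diskname_compressed.qcow2 -p
qemu-img check diskname_compressed.qcow2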

Unfortunately disk compression is essential; it is one of the main reasons people use qcow2 format disk images. The first reason is massive disk space savings: from my own use, a 100Gb virtual disk after compression only uses 26Gb of disk space.

Another reason compression is needed is that qcow2 images grow; for example the 100Gb image referred to above is only a 70Gb disk image. qcow2 disks are ‘copy on write’ (that's the ‘cow’ part of the qcow2 name), which simply means that if you edit or replace a 1Gb file an additional 1Gb of space is used by the qcow2 disk image on the host filesystem. Over time this means the qcow2 virtual disk image can consume much more space on the host filesystem than the disk size actually allocated to the qcow2 disk; a 70Gb qcow2 disk can easily use 100Gb of host filesystem space. Compression to reclaim space is important.
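
You can see how far an image has grown by comparing the virtual size reported by ‘qemu-img info’ with the space the file actually occupies on the host (diskname.qcow2 is a placeholder, the figures will of course differ):

qemu-img info diskname.qcow2      # reports 'virtual size' (what the guest sees) and 'disk size' (host space used)
du -h diskname.qcow2              # host filesystem space actually consumed by the image file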

I found options to try to repair a corrupt qcow2 disk at http://www.geekpills.com/virtulization/repair-qcow2-disk; after running the repair against the corrupt disk image ‘qemu-img check’ reported no errors in the image. However the virtual machine using the ‘repaired’ disk image started reporting IO errors within a few hours of starting the VM, even though ‘qemu-img check’ could find no further errors in the virtual disk.
Yes, filesystem checks were run by the guest OS against the disk on boot, before the filesystems were mounted (using the kernel boot parameter I posted about earlier), and again via the /forcefsck file. No errors were found in the filesystems, so it is the disk structure itself that is corrupted when qemu-img is used to compress a qcow2 file.

Workaround

The only workaround I could find to reclaim space was to copy the image between formats without compression, as below.

  1. take a qcow2 disk image that ‘qemu-img check’ reports has no issues (‘qemu-img check diskname.qcow2’ to ensure no errors)
  2. convert the qcow2 file to a RAW file format (‘qemu-img convert -f qcow2 -O raw diskname.qcow2 diskname.raw -p’)
  3. convert the RAW file back to qcow2 format (‘qemu-img convert -f raw -O qcow2 diskname.raw disknamenew.qcow2 -p’)
  4. check the new qcow2 file to ensure there are no errors (‘qemu-img check disknamenew.qcow2’)
  5. and the file should have no errors; I have been running a VM with a qcow2 image shrunk this way for 4hrs without any IO errors (they normally occur within a few hours with a corrupt image). The commands are collected below
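
Again for reference, the workaround steps in copy-and-paste form (diskname is a placeholder):

qemu-img check diskname.qcow2
qemu-img convert -f qcow2 -O raw diskname.qcow2 diskname.raw -p
qemu-img convert -f raw -O qcow2 diskname.raw disknamenew.qcow2 -p
qemu-img check disknamenew.qcow2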

The only issue with this approach of course is that there is no compression; you can only reclaim the space used by the copy-on-write portion of the disk image, so this approach will only reclaim space if you have been working with some huge files. Below is what I achieved using this method; I shaved around 50Gb off a disk image (compression saved around 75Gb, but as noted, from F29 onward compressed disks are corrupted and unusable).

[root@vmhost1 VM_disks]# ls -ltr *osprey*
-rw-r--r--. 1 qemu qemu 103274446848 Nov 17 14:53 osprey_good.qcow2
-rw-r--r--. 1 qemu qemu 107374182400 Nov 18 11:48 osprey.raw
-rw-r--r--. 1 qemu qemu  47915925504 Nov 18 19:43 osprey.qcow2                     
-rw-r--r--. 1 root root  26832449024 Nov 18 19:36 osprey_compressed.qcow2          # corrupted
[root@vmhost1 VM_disks]# 

Update 18 Nov 2019

The Fedora 31 repositories have been updated with a new version of the qemu-img package.

  Upgrading        : qemu-img-2:4.1.0-6.fc31.x86_64      # new version is even worse
  Cleanup          : qemu-img-2:4.1.0-5.fc31.x86_64      # definitely buggy
  Running scriptlet: qemu-img-2:4.1.0-5.fc31.x86_64   
  Verifying        : qemu-img-2:4.1.0-6.fc31.x86_64  
  Verifying        : qemu-img-2:4.1.0-5.fc31.x86_64  

Upgraded:
  qemu-img-2:4.1.0-6.fc31.x86_64                                                                                                            

The new version of qemu-img creates a compressed qcow2 file on which ‘qemu-img check’ reports no errors. However it is worse than the version it replaced: compressed disks output by the latest version start causing IO errors almost immediately, and after the first boot it is impossible to boot off the disk again. The major change in the new version seems to be that ‘qemu-img check’ has been changed to report no errors when the disk image is a mess.

So it is still not possible to compress qcow2 disk images using qemu-img. My workaround is the only way I can manage my files for now.

I will investigate how this utility behaves under CentOS7 when I can get a spare physical machine (physical, as creating a VM with a 300Gb virtual disk just to try to compress a 100Gb virtual disk is a waste of space).

Posted in Unix | Comments Off on qemu-img shipped with fedora corrupts qcow2 disk images

Free OpenSource Project Managers for small business and home users running Linux servers, I chose WebCollab

This is by no means a comprehensive post; it only covers three options: two I looked at and discarded, and the one I finally settled on for home use.

OpenProject – discarded, I did not evaluate it

First, mention must be made of the OpenProject application. From what I can see it is the most popular and fully featured project management application available, easily rivalling Microsoft Project. It claims to support both waterfall and agile development.

The reason I did not look at this is that it requires a server running a PostgreSQL database; they no longer maintain mysql/mariadb compatibility, having decided to focus only on PostgreSQL. As I run only mariadb and do not want any other database installed I could not evaluate this.

Collabtive – discarded, too complicated for home use

Another full featured option is Collabtive which is another highly rated application for project management.

It is easy to install and set up, but it is incredibly difficult to use at first. To set up, simply download from https://collabtive.o-dyn.de/ and…

...Unzip the files into a new folder on your webserver
create a directory on your webserver, copy the zip file into it, unzip it
check the instructions in ./doc/ for your language
chmod 777 files template_c config/standard/config.php
chown -R apache:apache the directory you created

...Create the databases and database user
MariaDB [(none)]> create database collabtive collate utf8_general_ci;
MariaDB [(none)]> create user 'collabtive'@'localhost' identified by 'insertagoodpasswordhere';
MariaDB [(none)]> grant all privileges on collabtive.* to 'collabtive'@'localhost';
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> \q

...Initial installation and configuration
Point your web browser at the web location you installed it to and run install.php
use the mysql dbname/dbuser/dbpassword you set above
create the first admin user and password
login as that admin user
select the spanner icon, then the add customer icon, and add a customer (Internal) -- a customer is needed before you can add a project
select the spanner icon, then the add user icon, and add a new non-admin user
logout of the web application

...secure the config file
cd back to the install directory and chmod 755 config/standard/config.php

 

At this point you can logon to the web application as the non-admin user you created.

I found this incredibly complicated; for example I could create a project and task but then find no way to edit or delete the task entry. It was only when I was about to uninstall it that I somehow clicked onto a time-tracker page where you could enter start/stop times for the things you are working on.

It is targeted at businesses. When creating a new customer you can set things like hourly rates per customer, and as projects can only be created for a customer I guess everything can be magically calculated. If you are a business that can take the time to figure it out and train staff then this may be the one for you, but it is not for home users.

Webcollab – I have started using this one

The Webcollab application is totally intuitive and easy to use, ideal for home users/developers to manage their activities, and even for small businesses that cost a ‘total project’ and so don’t need to track every minute spent working on a project.

Not only is it easy to use, its features include

  • the home page for a user by default lists all the active projects they are able to work on, with percentage complete, plus all uncompleted tasks associated with those projects, so it is easy to see what is happening (screen shot at the end of this post)
  • by default only active projects are shown on a user's home page, but all completed projects can also be reviewed simply by selecting the ‘show all projects’ option at the top of the home page
  • agile workflow is supported by tasks (and projects) having an ‘I don’t want it’ option, so if a project owner has been assigning tasks randomly a user can un-assign the task from themselves and let someone else in the taskgroup pick it up; plus any other user in the group can select ‘take over task’ to assign it to themselves if they have free time. And of course, as each project and task has a public user forum section, notes and comments can be shared amongst team members
  • a notes/forum section for each project and each task that all members of the group can contribute to during the project life
  • files and notes associated with the project can be attached at the project or even individual task level
  • every action you would ever need to take for a project or task is on the appropriate page, no need to hunt for frequently used actions in menubars
  • it is not customer-centric, roles are defined by usergroups and taskgroups, a usergroup owns a project and a task can be assigned to usergroups and taskgroups, presumably to allow the usergroup to keep an eye on it while an unrelated taskgroup can view the task and work on it
  • admin level users have an option on their home page to view all uploaded files, so if somebody gets too enthusiastic with large files an admin user can clean up
  • a simple calendar view available for users showing which tasks are due soon
  • it is easy to edit project and task details at any time, as well as add new tasks

Some very minor irritations (not issues, it is working as designed)

  • if you attach a file to a task entry it is only visible when viewing that task, not from the files displayed for the project entry. My current resolution is just to attach files to the project itself and not the task, for small projects anyway
  • I had a project at 75% complete with no remaining tasks showing. The issue was that even though the remaining task was assigned to me I could not see it; I had to use the admin user to edit the task and mark it as viewable by all users before my non-admin userid could see and update the task. As all other tasks were OK I must have been playing with the default check-boxes or did not assign a correct user or task group when adding that one. It is important to note that projects/tasks can be assigned to (and only visible to) customer and task action groups, which has the benefit of keeping a user's home page of projects and tasks reasonably clean and is actually a good design, if you add the task correctly in the first place

Installation is simply a case of following the instructions at https://webcollab.sourceforge.io/manual_install.html with one exception.

MariaDB [(none)]> create user 'webcollab'@'localhost' identified by 'somegoodpassword';
MariaDB [(none)]> create database webcollab collate utf8_general_ci;
MariaDB [(none)]> grant all privileges on webcollab.* to 'webcollab'@'localhost';
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]>\q
 
This step in the instructions will fail
[root@vosprey2 db]# cd db
[root@vosprey2 db]# mysql -uwebcollab -psomegoodpassword < schema_mysql_innodb.sql
ERROR 1046 (3D000) at line 2: No database selected

replace that step with
[root@vosprey2 db]# mysql -uwebcollab -psomegoodpassword
MariaDB [(none)]> use webcollab;
MariaDB [webcollab]> \. schema_mysql_innodb.sql
MariaDB [webcollab]> \q
Bye

And continue on

[root@vosprey2 db]# cd ../..
[root@vosprey2 php]# chown -R apache:apache webcollab-3.50
[root@vosprey2 php]# chcon -R system_u:object_r:httpd_sys_content_t:s0 webcollab-3.50

Manually Customise the config file for your site

[root@vosprey2 db]# cd webcollab-3.50/config
[root@vosprey2 db]# vi config.php      (and set the correct values for the below blank entries)
  define('BASE_URL', "" );
  define('DATABASE_NAME', "" );
  define('DATABASE_USER', "" );
  define('DATABASE_PASSWORD', "" );
                                       (also change the below for your site)
  define('FILE_BASE', "/var/www/html/webcollab/files/filebase" );

Web Browse to your sites http://..../webcollab-3.50/index.php page

Follow instructions for 'edit user details' to change admin password
Follow instructions for updating the 'Admin Config'
Add a new usergroup ('Internal') with the default member of admin
Add a new taskgroup ('Internal')
Add a new non-admin user with the only available usergroup you have just created ('Internal')

 

You can now login to the web application as the non-admin user and start creating projects and tasks for those projects.

A user's Webcollab home page: all active projects, and all uncompleted tasks visible to the user for those projects, are shown in one place.

Posted in Unix | Comments Off on Free OpenSource Project Managers for small business and home users running Linux servers, I chose WebCollab

Obtaining CherryTree under f31

Like many people, when Fedora no longer included the essential desktop note taking app BasketNotes in its repositories I migrated to the CherryTree application, which was functionally compatible with BasketNotes and could import the BasketNotes data files, so no data was lost during the application migration.

CherryTree used to be available in the Fedora repositories until F30.
Not only is it no longer in the Fedora repositories (from F31+), but due to dependencies it must be uninstalled from an F30 system before an upgrade to F31 will succeed, fortunately with no loss of user data.

While it is not in the F31 repositories it can still be installed via COPR.

The F30 RPM information is

[root@phoenix xfer]# rpm -qi cherrytree-0.38.5-5.fc30.noarch
Name        : cherrytree
Version     : 0.38.5
Release     : 5.fc30
Architecture: noarch
Install Date: Mon 04 Nov 2019 12:30:48 NZDT
Group       : Unspecified
Size        : 3919968
License     : GPLv3+
Signature   : RSA/SHA256, Thu 07 Feb 2019 17:22:51 NZDT, Key ID ef3c111fcfc659b9
Source RPM  : cherrytree-0.38.5-5.fc30.src.rpm
Build Date  : Fri 01 Feb 2019 04:33:33 NZDT
Build Host  : buildhw-03.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://www.giuspen.com/cherrytree/
Bug URL     : https://bugz.fedoraproject.org/cherrytree
Summary     : Hierarchical note taking application
Description :
CherryTree is a hierarchical note taking application, featuring rich text and
syntax highlighting, storing all the data (including images) in a single XML
file with extension ".ctd".

The URL referenced by the package has a download page that shows the workaround for F31+.

dnf copr enable bcotton/cherrytree
dnf install cherrytree

So F31 users can still use the CherryTree application via the COPR version. Also, when running the COPR version the user's local settings from the previous CherryTree install still exist, and it happily opens up the last note taking session you were using.

To the best of my knowledge there is no similar application available.
And the COPR repository page https://copr.fedorainfracloud.org/coprs/bcotton/cherrytree/ already showed 43 downloads within a few days of the release of f31, which shows a lot of people are actively searching for the location of this application.

Update 20 Jan 2020 The application developer found time to update the app and CherryTree is now in the Fedora repositories again.

Update 14 May 2020 CherryTree is not available in the Fedora 32 repositories, and is no longer available in copr. To use CherryTree on F32 you must use the snap packages application. How to do this is documented at https://snapcraft.io/install/cherrytree/fedora, however you must first run “setenforce 0”, as the selinux rules downloaded as package snapd-selinux when you install snapd from the f32 repo do not work. Then run “systemctl enable snapd.seeded” and “systemctl start snapd.seeded”, and periodically “systemctl status snapd.seeded” until you get the message “systemd[1]: Finished Wait until snapd is fully seeded.”; only then can you use “snap install cherrytree” without getting the error message “error: too early for operation, device not yet seeded or device model not acknowledged”. You must also add /var/lib/snapd/snap/bin to your path. CherryTree will then install, unfortunately missing the fonts needed to be usable… and with lots of errors about cgroups2 not being fully supported. So your choices are to wait until it may appear in the f32 repos, or wait until it is correctly packaged for fedora as a snap application. The steps are collected below.
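
For reference, a sketch of the snap steps described above collected in one place (package and service names are as shipped with f32; adjust to your own environment):

dnf install snapd
setenforce 0                                  # the snapd-selinux rules from the f32 repo do not work
systemctl enable snapd.seeded
systemctl start snapd.seeded
systemctl status snapd.seeded                 # repeat until "Finished Wait until snapd is fully seeded."
snap install cherrytree
export PATH=$PATH:/var/lib/snapd/snap/bin     # needed to find the snap installed binaries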

Posted in Unix | Comments Off on Obtaining CherryTree under f31

How to fsck a Linux system at boot time

The old method of creating a file with “touch /forcefsck” should still work on modern systems even though it is a hangover from the old sysvinit days, however it obviously relies on the root partition being mountable in order to read that flag.

I have found a new method mentioned on lists.fedoraproject.org which I can confirm works perfectly on Fedora(30).

At the grub boot menu, use “e” to edit the boot entry to be used, and at the end of the kernel boot parameter line add fsck.mode=force. This will fsck all the filesystems, which is useful if the root partition was not mountable at boot time; an illustration is below.
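
As an illustration only (the kernel and root device values are placeholders for whatever is already on your boot line), the edited linux line ends up looking something like:

linux /vmlinuz-<your kernel version> root=<your root device> ro rhgb quiet fsck.mode=force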

Posted here for my later reference as it is the sort of thing I will need again.

Posted in Unix | Comments Off on How to fsck a Linux system at boot time

Fedora31 released, buggy as always

In the upgrade from f29 to f30 I lost ownCloud, which is not supported with the version of php shipped with f30. While a pain I could live with that.

However in the upgrade from f30 to f31, while it was a painless upgrade, yet more essential tools are no longer supported.

The one that is a show-stopper for me is that Docker is not supported under f31 due to changes in cgroups. This was a deliberate choice by the Fedora release team, documented at https://bugzilla.redhat.com/show_bug.cgi?id=1746355. Additionally the workaround of adjusting kernel parameters in /etc/sysconfig/grub does not work for me, so no Docker.
Looks like podman is supported for containers (uses runc instead of the docker daemon) but that is a huge change in workflow and results in no global overview of what containers are running.

Fedora is losing relevance.

* Linux is the kernel
* a working unix operating system (that many call Linux) is the kernel plus GNU utilities
* a useable system is a working OS plus useful utilities that enable work to be done

As each release of Fedora causes the loss of more useful (even essential) utilities, its relevance as a viable linux solution decreases.

At this point, due to the additional tools provided by rpmfusion, it is still a viable Desktop only solution, although useful tools are being dropped from there also; for example I had to replace basketnotes with cherrytree when basketnotes was dropped from the repositories, and now cherrytree is also gone in f31, leaving me no choice but to keep my desktop and laptop on f30.

But Fedora is no longer suitable as a server operating system, as more and more essential tools will be dropped over time.

The server I upgraded to f31 has been reverted to f30 so the containers I use can keep running while I embark on my next large project, migrating a web server that has been running on fedora since fedora-core-3 onto a CentOS environment; as I need a working server environment that is stable, or at least does not break software as often as the Fedora quick release cycle does.

Posted in Unix | Comments Off on Fedora31 released, buggy as always

Using apache rewrite to automatically add iptables drop rules

In these days of firewalld not many people still use native iptables rules, but they certainly still have their place. I still use them on my main webserver simply because of the ease with which new drop rules can be added.

My old method of having all the rules in a simple script file and reloading everything when a new DROP rule needed to be added had to be thrown out when I started using docker, as my method flushed all chains including the docker chain, which of course immediately started causing problems. That makes my post of many years ago on dynamic blacklisting out of date, so I thought I would document the main changes I had to make here for later reference, as a glimpse into my reasoning for doing so.

First I documented the existing firewall chains in a flow diagram, made easy as the script file was of course fully commented :-). I then decided that rather than updating existing chains I would insert a new one, so I

  • created a completely new chain called “blacklist”. I decided to insert this chain into the chaining path used for external internet traffic only; it was jumped to instead of the normal chain for this path, and if there were no drop matches it then jumped onward to the original chain for this path
  • changed my apache rewrite rules script to flush only the contents of that chain when replacing blacklist DROP rules, and of course reload the rules within that chain
  • changed my daily batch job that removes duplicate DROP entries and adds new rules based on apache logfile scans to work only on that new blacklist chain

I had issues getting that working, due to a completely unrelated change: I had switched back to SELinux enforcing mode. In SELinux enforcing mode, despite the fact there are zero AVC denies in any of the audit logs, it is not possible to use the “sudo” command. I had forgotten about that, but after a bit of debugging tracked it down. I spent ages trying to figure out what SELinux was doing, but with no denies being logged anywhere got nowhere, and reverted to SELinux permissive mode.
Normally that would not be an issue, as I have regular batch jobs checking for denies to determine if new rules are needed to accommodate changes I make; in this case, as there are no denies, it is an issue to investigate later in a separate post.

In this case, after reverting to permissive mode everything appeared to be working as expected; at least the apache logs showed that requests for resources that were obviously hacking attempts were being added to the new blacklist chain. There were however two issues with my placement of the new chain, which were

  • a minor issue, another chain depth added which was hard to describe with script comments so relied on the flow diagram to show where it fitted (the flow diagram has been made part of the script comments)
  • a slightly larger issue was that, as it was in the internet traffic chain, I could not actually test it was working without leaving the house and connecting to a public wi-fi network

So I just moved the blacklist chain directly onto the top of the global INPUT chain, and at the end of the drop statements replaced the jump to another chain with a simple RETURN back to the INPUT chain. A minor benefit of that is a simple jump/return at the top of the INPUT chain makes the flow diagram and commenting in the script much easier; the major benefit of course is that as the chain is used on all INPUT, not just traffic filtered as being from an internet source, I am able to test that the automatic blacklisting is working from my internal network. While a downside is that the drop rules are now parsed even for my trusted internal network, iptables processing is fast and has no visible impact on normal traffic.

So, my setup for web server monitoring and automatic blacklisting is now

  • the main firewall setup script run at server boot time to set up the iptables chains and rules, including the blacklist chain and the DROP statements from the existing file built by the steps below
  • apache rewrite rules in the httpd configuration; these rewrite rules are in a separate file sourced into the httpd.conf with an include statement. Having them in a separate file allows them to be managed by puppet and deployed to any server I feel needs them. The rewrite rules trigger a script to automatically blacklist the ip-address that triggered a rewrite rule
  • the above mentioned script logs why the ip-address was blacklisted, adds an iptables command to add a DROP for that source ip-address to the end of a list of blacklisted ip iptables commands, and uses “sudo” to run a separate script to update the iptables blacklist chain (as of course the apache user cannot do that itself)
  • the script run by “sudo”, which flushes the blacklist chain rules, reloads the DROP statements from the file built by the rewrite rules script, and adds the needed final RETURN statement for that chain
  • yes, there has to be one. A batch job that runs daily to remove duplicate ip-addresses from the file built by the rewrite rules; it also scans the apache access and error logs searching for other messages that indicate the need for ip-addresses to be blacklisted, such as clients trying to force negotiation downgrades. That batch job then runs the script above to flush the blacklist chain rules, reload the DROP statements from the file built by the rewrite rules script, and add the needed final RETURN statement for that chain

The above allows almost real-time blacklisting of client ip-addresses that attempt to access URLs that are obviously hacking attempts.
Not quite real-time, in that it takes time for the scripts to run and hacking attempts hit with quite a few requests per second, which results in 4-5 duplicate ip-addresses being added to the blacklist iptables command file before the first trigger completes and blocks them; that is why one of the tasks in the batch job is to clean up duplicate ip-addresses. The batch job of course is also able to scan for problems in the apache error logs which the apache rewrite rules cannot detect.

By using a seperate chain I can flush and change the rules as needed without upsetting the chaining added/deleted as the docker service start/stops; which was the goal of my changes.

Obviously you should not rely just on apache log files to detect hacking attempts, even in the case where you only have http ports exposed to the internet. I would recommend at a minimum the use of the community edition of “snort” (even though you need to compile it and some pre-requisites from source) as an additional logging mechanism. In my environment, where I run the webserver in a KVM instance, the below message was logged by snort on both the host machine and the guest KVM instance running the webserver, which is fortunately not Drupal :-),


10/03-20:07:09.079690  [**] [1:46316:4] SERVER-WEBAPP Drupal 8 remote code execution attempt [**] [Classification: Attempted Administrator Privilege Gain] [Priority: 1] {TCP} 122.114.49.139:41577 -> 192.168.1.189:80

so additional tools are always of use, never rely on just one. In my case I run snort on all my machines and KVM instances, with a weekly batch job to refresh the rules and a custom nagios plugin to monitor when alerts are written to the snort alert file so they are detected quickly in a totally hands-off way.

Examples

A few cut down examples of how to set up automatic blacklisting. These are not working examples, although they probably will work. These are “cut-down” examples based on my scripts with functions inlined, directory paths changed, and all error handling removed in order to make them more readable for this post. These examples have not been tested, as I obviously use my much larger and much more complex scripts.

Also, all these examples assume you are familiar enough with iptables to have created the “blacklist” chain used. At its simplest, to create an empty chain that just returns to the calling chain the commands would be “iptables -N blacklist” and “iptables -A blacklist -j RETURN”. To implement the chain, ideally you would append a jump to the chain in the section of your undoubtedly complex iptables rules that suits you best; to simply make it the first entry in your INPUT chain, as I do, the very first entry for the INPUT chain should be “iptables -A INPUT -j blacklist”. Should you have failed to correctly script your rules you can force the blacklist chain to be the first entry in the INPUT chain by using “iptables -I INPUT 1 -j blacklist” to insert it above rule 1 in the INPUT chain, but you should really make sure you understand your scripts and insert it correctly where you wish it to be in the first place. The commands are collected below.
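
The same chain setup commands in one place (a minimal empty chain; as described above, the reload script later flushes this chain and re-adds the DROP rules and the final RETURN):

iptables -N blacklist                 # create the empty blacklist chain
iptables -A blacklist -j RETURN       # anything not dropped returns to the calling chain
iptables -A INPUT -j blacklist        # make the jump to it the first entry in the INPUT chain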

You will also see in the examples that I create the files I use under the webserver directory; that is definitely not recommended but will work. It is recommended to place any sensitive file outside your webroot, but as this post is not going into things like the httpd service PrivateTmp flag and complicated SELinux rules, the examples are written in a way that should just work.

apache rewrite rules example


     RewriteEngine on
     RewriteCond %{REQUEST_URI} (phpMyAdmin) [NC]
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]
     RewriteCond %{REQUEST_URI} (etc?passwd) [NC]
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]
     RewriteCond %{REQUEST_URI} (http:) [NC]
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]
     RewriteCond %{REQUEST_METHOD} ="CONNECT"
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]
     RewriteCond %{REQUEST_URI} (login.jsp) [NC]
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]
     RewriteCond %{REQUEST_URI} (shell.php) [NC]
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]
     RewriteCond %{REQUEST_URI} (cmd.php) [NC]
     RewriteRule ^(.*)? /cgi-bin/block_ip.sh [NC,L]

script to reload blacklist example

You need to have created a “blacklist” chain in your iptables rules for this to work of course.


#!/bin/bash
# Flush our dynamically changing blacklist chain
iptables -F blacklist
# load all the blacklist rules
bash /var/www/html/data/blacklist.sh 
# Anything not blacklisted RETURNs to the main INPUT chain
iptables -A blacklist -j RETURN
exit 0

script run by apache rewrite rules example

This is the /cgi-bin/block_ip.sh script targeted by the rewrite rules above; it displays a warning page to the requester, appends a DROP rule for the requesting ip-address to the blacklist command file, and then uses “sudo” to run the reload script above.

#!/bin/bash
BLACKLIST="/var/www/html/data/blacklist.sh" # This should be run by your system firewall startup scripts
LOGFILE="/var/www/html/data/blacklist.log" # This should be automatically checked for changes

# ----------------------------------------------
# Display a web page back to the requesting user
# Yes it needs the blank line after the content type.
# Do this before reloading the firewall rules or
# the hacker will never see it.
# ----------------------------------------------
cat << EOF
Content-Type: text/html
Status: 404 NotFound
<html>
<head><title>Unauthorised hacking attempt detected</title></head>
<body bgcolor="yellow">
<h1>Site access violation</h1>
<hr>
<p>
Unethical activity has been detected from your ip-address, specifically
a request to attempt to access<br />
${REQUEST_URI} <br />which this site considers unacceptable behaviour.
</p>
<p>
Your ip-address <b>${REMOTE_ADDR}</b> has been automatically added to this sites blacklist.
</p>
</body>
</html>
EOF
# ----------------------------------------------
# Now block all traffic to and from the ipaddress
# that triggered this script.
# Append a DROP rule for the requesting ip-address to the blacklist
# command file (duplicate entries are cleaned up later by the daily
# batch job), then reload the blacklist chain via "sudo".
# ----------------------------------------------
# log why the ip-address was blacklisted (the file named in LOGFILE above)
echo "`date` blacklisted ${REMOTE_ADDR} for requesting ${REQUEST_URI}" >> ${LOGFILE}
# record into the blacklist file to use on system restarts
echo "/sbin/iptables -A blacklist -j DROP -s ${REMOTE_ADDR}" >> ${BLACKLIST}
# and "sudo" the script to add the updated drop rules file
# /etc/rc.firewall can be run by apache with no password (configured in sudoers file)
/usr/bin/sudo /etc/rc.firewall_blacklist
# All done
exit 0

Batch jobs

You should create batch jobs to remove duplicate ip-addresses from the backlist command script file, and also to create new blacklist drop rules from parsing the apache error logs.

No full sample script is provided for that as mine is really too complicated to use as an example (a minimal sketch of the duplicate-removal step is shown below), but things you should be looking for in the error logs are messages such as “AH02042: rejecting client initiated renegotiation”, “AH01071: Got error ‘Primary script unknown'”, any 404 errors for attempts to access pages that do not exist, and many other conditions depending on what your environment is. And yes, you should have a batch job for it; you do not really want to look at the logs every day, just check them periodically for new conditions to add into the checks in your batch job.
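
As mentioned, a minimal sketch of just the duplicate-removal part of such a batch job might look like the below; it assumes the same file and script paths used in the earlier examples and none of the log scanning my real job does:

#!/bin/bash
BLACKLIST="/var/www/html/data/blacklist.sh"
# remove duplicate DROP commands while keeping the original order
awk '!seen[$0]++' ${BLACKLIST} > ${BLACKLIST}.dedup && mv ${BLACKLIST}.dedup ${BLACKLIST}
# reload the blacklist chain from the cleaned file
/etc/rc.firewall_blacklist
exit 0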

Update 11Dec2019

A faster and more efficient way of automatically blacklisting hackers is to make the apache configured 404 (page not found) error handler also use the blacklist cgi script when a page is not found rather than display a static error page.

That is what I have moved to now, after extensive website scanning to try to locate bad links before doing so. This method is a lot more effective as it will capture all the hacking attempts not in any rewrite rules in real time.

There is a lot more manual effort required afterward however, as you must review the blacklist logs often to determine whether it was actually a bad link on the website causing the error (in which case the user needs to be unblocked and the bad link fixed). In a well maintained website there should be no bad links of course, but inevitably they will pop up occasionally.
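
For reference, pointing the 404 handler at the blacklist script is a single directive in the apache configuration (shown here against the cgi script path used in the earlier examples):

ErrorDocument 404 /cgi-bin/block_ip.sh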

Posted in Automation, Unix | Comments Off on Using apache rewrite to automatically add iptables drop rules

Docker and issues with using minimal Fedora base images

It is recommended when creating docker images that minimal images be used. In the case of Fedora it is recommended that for the smallest images the minimal base image should be used, with “microdnf” used instead of the full blown “dnf” package manager.

One major issue I have hit with using microdnf is that if a package in the main Fedora repositories is at a version below that shipped with the minimal image, “microdnf” cannot handle downgrading the package when required.

Re-doing a build that had been working for over four months, it suddenly started failing with a conflict. The error I hit is below.

.....
Upgrading:
 glibc-2.29-22.fc30.x86_64                             updates      4.2 MB
 glibc-common-2.29-22.fc30.x86_64                      updates    858.5 kB
 glibc-minimal-langpack-2.29-22.fc30.x86_64            updates     48.4 kB
 libxcrypt-4.4.10-1.fc30.x86_64                        updates    125.3 kB
Downgrading:
 libstdc++-9.0.1-0.10.fc30.x86_64                      fedora     583.9 kB
Transaction Summary:
 Installing:      204 packages
 Reinstalling:      0 packages
 Upgrading:         4 packages
 Removing:          0 packages
 Downgrading:       1 packages
Downloading packages...
Running transaction test...
error: Error running transaction: package libstdc++-9.1.1-1.fc30.x86_64 (which is newer than libstdc++-9.0.1-0.10.fc30.x86_64) is already installed

The “microdnf” tool does not support the useful options “dnf” supports for resolving conflicts; in the case of conflicts like this the only solution is to switch to using the full “dnf” package manager. It requires changing the Dockerfile from…

FROM registry.fedoraproject.org/fedora-minimal:30
...lots of stuff...
RUN microdnf -y install perl procps-ng vim-minimal && microdnf clean all

to the below…

FROM registry.fedoraproject.org/fedora-minimal:30
...lots of stuff...
RUN microdnf -y install dnf
RUN dnf -y --allowerasing --obsoletes install perl procps-ng vim-minimal && dnf clean all && microdnf clean all

This results in an image size around 42Mb larger than just using microdnf, but it is unfortunately the only way to handle the issue.

Posted in Automation, Unix | Comments Off on Docker and issues with using minimal Fedora base images

Installing a F30 network install and recovery server

In these days of cloud images being launched at the push of a button, and customised via heat patterns or user configuration scripts, network install via pxe boot seems to have dropped out of the news. As most home users will be using KVM virtual machines rather than running their own cloud infrastructure at home it still has a place; if you are creating a lot of new KVMs, or even installing a new physical machine, it is much easier to just network install rather than copy install ISO files about.

The documentation at https://docs.fedoraproject.org/en-US/fedora/f30/install-guide/advanced/Network_based_Installations/ provides a good starting guide, with the following main exceptions

  • a major issue is the section on creating a boot menu for UEFI clients is wrong as the directory mentioned does not exist. I believe that file should be /var/lib/tftpboot/pxelinux.cfg/efidefault but am not 100% sure, I am sure following the instructions on the fedoraproject documentation page will not work; at the time this post was written anyway
  • a minor issue is the ‘default’ example used references a kickstart file at example.com, which of course does not exist. Therefore copying the example provided should be considered a non-working example rather than a working solution

This post is based upon the documentation linked above, and provides additional configuration and customisation tips that will make the network install server more useful.

After following the fedoraproject documentation and installing the shim and grub2-efi-x64 packages into an alternate root, and after copying the required files out to the tftpboot directories, you can simply remove the entire alternate root directory to clean up the packages without affecting your real root rpm database.

The documentation does correctly say only one dhcp server should exist per network, so if your router is assigning dhcp addresses you should reconfigure it not to do so before starting the dhcp server, or ensure your dhcp server is on an isolated network. Obviously if you reconfigure your router to no longer serve dhcp assigned addresses that will prevent wireless devices such as your smart phone and smart TVs connecting to the router unless you have them configured with static ip-addresses, so you may wish to do as I do and simply turn that feature off on your router only while your tftpboot server is running.

What the documentation does not mention in the example configuring the initial /etc/dhcp/dhcpd.conf is the significance of the “next-server” parameter. This parameter identifies the address of the server tftp will use to download the network boot files, which may not be the same server as the dhcp server. It makes sense for it to be the same server but it does not have to be; the tftpboot packages could be installed on a completely different server to the dhcp server.
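
As an illustration only (the addresses match those used later in this post; the range and router values are placeholders for your own network), a minimal subnet declaration using next-server would look something like:

subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.200 192.168.1.220;
    option routers 192.168.1.1;
    next-server 192.168.1.175;      # the tftp server the boot files are downloaded from
    filename "pxelinux.0";          # boot file for bios clients
}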

If you have devices using dhcp that you wish to have persistent ip-addresses assigned, these should be defined to the dhcp server with entries in /etc/dhcp/dhcpd.conf such as the below for each device; obviously you must use the correct MAC address. This will ensure that whenever they request a dhcp assigned ip-address they are always given the same one.


host yourhostname1 {
   hardware ethernet 52:54:00:67:ea:35;
   fixed-address 192.168.1.189;
   option host-name "yourhostname1";
}

The documentation at fedoraproject.org linked to above provides a non-working example for the default file in /var/lib/tftpboot/pxelinux.cfg, but I would also recommend adding another entry to the menu list to allow you to PXE boot a failed server into rescue mode when needed, to avoid having to run around looking for a boot DVD. That entry is as below.


label rescue30
menu label ^Boot F30 in rescue mode
kernel f30/vmlinuz
append initrd=f30/initrd.img ramdisk_size=9216 noapic acpi=off linux rescue

Also, when replacing the non-working “server” entry with a custom one as below, I found when testing it that during the install it defaults to being able to install all versions of F30 (desktop, lxde, server etc) from the closest mirror, not just the server software from the install media I put in place.
However that is not what I wanted, so during the install I changed the method from “closest mirror” to “url” and pasted the stage2 url I wanted to use, which refreshed the software available list to just what is on the server install media (and allowed an install without going near the internet, which is what I wanted).


label server
menu label ^Install Fedora 30 ( Any Flavor )
menu default
kernel f30/vmlinuz
append initrd=f30/initrd.img inst.stage2=http://192.168.1.175/tftpboot/install_ISO/Fedora-Server-dvd-x86_64-30-1.2/ ip=dhcp

The anaconda-ks.cfg file generated by the test install I simply copied out to use as a starting point for a ks=xxx file for further hands-off tftp installs of virtual machines; minimal changes were needed, such as changing “clearpart none” to “clearpart all”, setting the pv definition to “--size=0 --grow” instead of “--size=nnnn” to handle different disk sizes, and adding a custom %post section.

Obviously the ‘default’ file needs to be updated to use working install sources; you can create local install sources based on the normal install media, as I used in the example above. That is discussed in a later section.

Creating customised configurations for each server

As well as the default configuration file you have created, you will probably want to create custom configurations per host. This is achieved by creating a file in that same directory /var/lib/tftpboot/pxelinux.cfg named for the MAC address of each PXE booting machine that you want to customise. The way it works is that the PXE boot process will first look for a configuration file specific to the requesting MAC address and use it if found; if one is not found it will revert to using the default.

Using the MAC address in the example above, the file we would create for that server would be 01-52-54-00-67-ea-35 (the ‘01-’ prefix is the ethernet hardware type, and the hex digits in the file name should be lowercase) and contain something like the below


prompt 1
default linux
timeout 100

label linux
kernel f30/vmlinuz
append initrd=f30/initrd.img ramdisk_size=9216 noapic acpi=off ks=http://192.168.1.175/tftpboot/configs/yourhostname1.cfg

# PXE boot for yourhostname1 mac addr

The obvious reason for creating a file per machine is that you can provide a customised kickstart file for each machine, as shown in the example above. The kickstart file does not have to reside on the tftpboot server, but it makes sense to keep everything controlled from one place. You can also place in a customised kickstart file something like “network --device eth0 --bootproto static --ip 192.168.1.101 --netmask 255.255.0.0 --gateway 192.168.1.1 --hostname=servername --activate” to ensure that if you are re-installing a server it keeps the same network configuration.

Kickstart files should be used for servers with complex install requirements; the “%post” section of the kickstart file can do anything a shell script can do: customise configuration files, add additional repositories and packages, wget database backups or normal backups and recreate databases and filesystems, etc. At one point my entire webserver was re-built that way, with a simple reboot used to start from formatting the disks and recreate the entire environment from backups; it was also the method I used to migrate new changes from test to production and simply overwrite anything a hacker may have introduced, so kickstart is very powerful.

In these days of software configuration management a Kickstart file can be as simple as the below, set the network config, set a root password, format the disks and install the boot loader, install minimal packages, then just start the puppet agent and let it install additional packages and do all the customisations… although if you do not have a handy puppet/chef/ansible server to do your configuration for you it can easily be done in scripting in the “%post” section, I have used kickstart files with well over 100 lines of scripting with no problems.

However you do need a kickstart config per server you intend to ‘hands-off’ install this way; depending on whether you are installing a physical machine or a KVM, and even on the hardware, things like disk names change (hda/sda/vda) and ethernet device names change, so they will not always be eth0.


#version=DEVEL
ignoredisk --only-use=vda
# System bootloader configuration
bootloader --location=mbr --boot-drive=vda
# Partition clearing information
clearpart --all --initlabel
# Use graphical install
graphical
# Use network installation
url --url="http://192.168.1.175/tftpboot/install_ISO/Fedora-Server-dvd-x86_64-30-1.2/"
# Keyboard layouts
keyboard --vckeymap=us --xlayouts='us'
# System language
lang en_NZ.UTF-8

# Network information
network  --bootproto=dhcp --device=link --gateway=192.168.1.1 --hostname=localhost.localdomain --nameserver=192.168.1.1 --activate
# Root password
rootpw --iscrypted $6$AdsjmM2lq//fLiLu$n.Fx7hdO.inVPNsfqCRVsLv9QCYL5I0dcJcxjyZu766qOaGTd/0FSXPRzS8O2VDJAj9OOovEINycMiwuEKHiK/
# Run the Setup Agent on first boot
firstboot --enable
# Do not configure the X Window System
skipx
# System services
services --enabled="chronyd"
# System timezone
timezone Pacific/Auckland --isUtc
user --groups=wheel --name=mark --password=$6$s0l.7uikser6VGT5$PPuBEBS7aOrctU6Pr1HyP8DwUCyemRHTegQ5G9rEjMMNKjv530DSJtOQ8CTT5.XQhNMKQ9iWKAvOX3roLSSiR1 --iscrypted --gecos="Mark Dickinson"
# Disk partitioning information
part pv.111 --fstype="lvmpv" --ondisk=vda --size=0 --grow
part /boot --fstype="ext4" --ondisk=vda --size=1024
volgroup fedora_server00 --pesize=4096 pv.111
logvol swap --fstype="swap" --size=1024 --name=swap --vgname=fedora_server00
logvol / --fstype="ext4" --size=1024 --grow --name=root --vgname=fedora_server00

%packages
@^server-product-environment
@editors
@guest-agents
@headless-management

%end

%addon com_redhat_kdump --disable --reserve-mb='128'

%end

%anaconda
pwpolicy root --minlen=6 --minquality=1 --notstrict --nochanges --notempty
pwpolicy user --minlen=6 --minquality=1 --notstrict --nochanges --emptyok
pwpolicy luks --minlen=6 --minquality=1 --notstrict --nochanges --notempty
%end

%post
(
dnf -y install rsyslog
systemctl enable rsyslog
systemctl start rsyslog
) >> /root/custom_install.log 2>&1
%end

Obviously for interactive installs you do not need a kickstart file, nor a unique MAC based configuration file. If there is no MAC named configuration file in /var/lib/tftpboot/pxelinux.cfg for a server, the PXE boot will use the ‘default’ entry you have created, which will perform a normal interactive install.

Using local install media

And back to that ‘default’ configuration. You will have seen when you copied the example from the fedoraproject website that the install sources are at download.fedoraproject.org (and at least one is non-working). If you are planning on doing a lot of installs (which you will be doing if you are testing kickstart files) you should create copies of the install media under your /var/lib/tftpboot directory to be accessed by URL so you can perform local installs.

You can in most cases just copy the contents of the install media into a local directory with a meaningful name. One thing to watch out for in doing so is that a ‘cp -rp’ will not copy all the files; there is a .treeinfo file on the server install DVD that must be copied explicitly, as the cp omits that hidden file. Another issue to be aware of is that the live desktop install media Fedora-WS-Live-30-1-2.iso only supports UEFI systems and will refuse to run on a bios system; it does not have a compatible .img file in the images directory for bios machines, which is a pain. An example of copying the server DVD contents is below.
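
As an illustration (the target path matches the install_ISO directory used in the ‘server’ menu entry above; the ISO file name is whatever you downloaded):

mkdir -p /var/lib/tftpboot/install_ISO/Fedora-Server-dvd-x86_64-30-1.2
mount -o loop Fedora-Server-dvd-x86_64-30-1.2.iso /mnt
cp -rp /mnt/* /var/lib/tftpboot/install_ISO/Fedora-Server-dvd-x86_64-30-1.2/
cp -p /mnt/.treeinfo /var/lib/tftpboot/install_ISO/Fedora-Server-dvd-x86_64-30-1.2/    # hidden file missed by the cp above
umount /mnt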

To make your local copies accessable by URL simply install httpd and create the file /etc/httpd/conf.d/tftpboot.conf that contains the below and start the httpd service.



Alias /tftpboot /var/lib/tftpboot

<Directory "/var/lib/tftpboot">
    AllowOverride None
    # Allow open access:
    Require all granted
</Directory>

Additional information you need to know

By default fedora server installs block all ports not explicitly opened, so you will need to “firewall-cmd --add-service tftp --permanent”, and “firewall-cmd --add-service tftp” to start using it immediately without restarting firewalld to pick up the new permanent rule. Also, if you are using the same server to provide the install images, you need to do the same for the “http” service. The commands are collected below.
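
The same firewall commands collected together (using the documented --add-service=<name> form), for both the runtime and permanent configurations:

firewall-cmd --add-service=tftp --permanent
firewall-cmd --add-service=tftp
firewall-cmd --add-service=http --permanent
firewall-cmd --add-service=http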

There are no selinux rules to support tftp booting, quite possibly because there are so many different customisations you can do that it would be impossible to cater for all uses, so they don’t. Despite this I did make an effort to create rules for my use, to the point where there were zero AVC denial entries being logged, but I could still only get it working after a “setenforce 0”; so if you want to use network boot for installs, ensure you have selinux in permissive mode.

And finally, you will have created lots of nice menu entries in your /var/lib/tftpboot/pxelinux.cfg/default file; but do not expect to see a menu. All you will see is a “boot:” prompt, at which point you can type in the entry you wish to use, such as “server”, to select from the menu… so you need to remember what they are, so don’t have too many entries in your default file :-).

And the biggest limitation of a network install is that Fedora30 needs at least 2Gb of memory assigned to the server being network booted; 1.5Gb is not enough to unpack the initramfs to perform the install.

Posted in Automation, Unix | Comments Off on Installing a F30 network install and recovery server

Docker container network isolation can be a pain.

I have been embarking on an exercise to migrate some of the smaller applications I use into Docker containers.

This is the reverse of my prior, more secure approach where I wanted 3rd party apps that may be insecure but were internet accessible to be isolated in their own KVM instances to prevent any impact to other services should they indeed prove to be insecure and let someone into the system; in fact many of my KVMs were created specifically to isolate such applications and move them off my main web server. However memory is finite, and with each OS upgrade each KVM needs even more resources to run just the operating system, so I have hit a limit and need to consolidate again, and Docker containers seem to be the way to go.

So far I have containerised my ‘live’ hercules system (no savings, as it already ran on the web server, and worse, the additional overhead of docker running on the web server) plus my IRC server (inspircd), which I have containerised so the dedicated KVM instance for it could be decommissioned (a saving, as that KVM had 256Mb of memory allocated and was swapping badly, while I can cap the docker container at 20Mb and it runs fine… and I prefer to run my own customised image, but for those interested there is an official container image).

Trying to get a container that needs ip forwarding to work, reverse route needed

The next KVM instance I want to migrate is my openvpn server; from working on it to date the entire thing runs in a docker container capped at 32Mb, so being able to decommission that server would be of benefit also. However it obviously needs to be able to pass traffic between the openvpn network and the internal network, which docker is not keen on without a bit of tweaking.

As I want the container image to be portable, and obviously not to contain any server keys, the container requires that its /etc/openvpn directory be provided as a read only overlay, which not only makes it portable but allows configurations to be switched/tested easily (ie: the source filesystem can be /home/fred/openvpn/files_server1 or /home/fred/openvpn/files_server2 for the overlay on the docker run command without the container image needing to change). This allows the container (or multiple of the same containers) to be started on any docker host with customised configurations managed outside of the image.
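
A sketch of what such a docker run command could look like; the image name is hypothetical, and the NET_ADMIN capability and /dev/net/tun device are the usual extras an openvpn container needs in addition to the read-only configuration overlay and the 32Mb memory cap mentioned above:

docker run -d --name openvpn \
    -v /home/fred/openvpn/files_server1:/etc/openvpn:ro \
    --cap-add=NET_ADMIN --device /dev/net/tun \
    -p 1194:1194/udp \
    -m 32m \
    myrepo/openvpn-server:latest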

And that has made finding this issue easy. To assist with tracking down the issue I simply installed docker on the existing VPN server so I could test the docker image there; easy in that the standalone and containerised openvpn processes are on a host with exactly the same host and networking configuration, plus exactly the same configuration files. Standalone works, and containerised needs extra steps on the docker host.

In the following standalone configuration everything works perfectly (for ping).

  • The openvpn server can ping and connect to everything in the 10.0.1.0/24 network
  • The openvpn server can ping and connect to external network addresses via the 10.0.1.0/24 networks external gateway
  • Clients connected to the server can ping and connect to all addresses available in the above two points

                                    +----------------------------------+
                                    |                                  |
                                    |         OPENVPN SERVER           |
                                    |                                  |
Outside world< -->10.0.1.0/24 < --->|      eth0               tun0     |< ----------->vpn clients
                    network         | 192.168.1.173         10.8.0.1   |            tun0 10.8.0.n
                                    |  gw routes to                    |
                                    |  10.0.1.0/24                     |
                                    |                                  |
                                    +----------------------------------+

However in the following container configuration ping responses cannot traverse the return path unless a static route is added on the docker host routing into the container.

Note that this is the same server used above, and while the application has been moved into a container the configuration files are identical, as they are provided to the container by a filesystem directory overlay; the container uses exactly the same files as in the non-container example above. Absolutely everything is identical between the two configurations apart from the docker ‘bridge’ connection between the container eth0 and the docker host docker0… which is working OK, as the container itself can see the external networks.

  • The openvpn container can ping and connect to everything in the 10.0.1.0/24 network
  • The openvpn container can ping and connect to external network addresses via the 10.0.1.0/24 networks external gateway
  • Clients connected to the server can ping the tunnel interface and the container 172.17.0.2 interface but cannot ping the host docker0 or any other external to the container addresses the container itself can ping.
    Fixed: on the docker host a route needs to be added to the container ip running the openvpn network (in the example below ‘route add -net 10.8.0.0/24 gw 172.17.0.2’ allows all servers within the 10.0.1.0/24 network to be pinged)

                                      +--------------------------------------------------------------------+
                                      |                                                                    |
                                      |                  OPENVPN SERVER / DOCKER HOST                      |
                                      |                                                                    |
                                      |                         +----------------------------------+       |
                                      |                         |                                  |       |
                                      |     eth0                |          OPENVPN CONTAINER       |       |
                                      |  192.168.1.173          |                                  |       |
                                      |    gw routes to         |                                  |       |
Outside world< -->10.0.1.0/24< ---->  |  10.0.1.0/24            |        eth0             tun0     |< ----------->vpn clients
                    network           |                         |     172.17.0.2        10.8.0.1   |       |    tun0 10.8.0.n
                                      |                         |    gw routes to                  |       |
                                      |   docker0 < --bridge--> |    172.17.0.1                    |       |
                                      |  172.17.0.1             |                                  |       |
                                      |                         +----------------------------------+       |
                                      |                                           ^                        |
                                      |                                           |                        |
                                      |  STATIC ROUTE 10.8.0.0/24 gw 172.17.0.2---+                        |
                                      |      (needed for ping reverse traversal)                           |
                                      +--------------------------------------------------------------------+

With the static route added, the container configuration behaves much like the native KVM instance running the openvpn server, and vpn clients can ping all hosts in the internal 10.0.1.0/24 network.
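
To make that fix concrete, here is a minimal sketch of the commands involved, assuming the container was assigned 172.17.0.2 as in the diagram above (the startup script further down obtains the address automatically):

# on the docker host: route the openvpn tunnel network via the container address
sudo route add -net 10.8.0.0/24 gw 172.17.0.2
# the iproute2 equivalent would be: sudo ip route add 10.8.0.0/24 via 172.17.0.2
# confirm the route is present
route -n | grep "^10.8.0.0"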

PING does not mean it is all working

The ‘ping’ traffic returns to the originator by reversing back down the route it travelled to reach the target host; this behaviour is specific to utilities like ping and does not apply to normal tcp traffic. Normal traffic requires routing to provide a return path, so it would be easiest to implement an openvpn server on the same host that is the default gateway for all your servers, which is unlikely to happen in the real world.

In the real world I believe each server you need to access would require a route back to the container host for the 172.17.0.0/24 network plus a route to the same host for the 10.8.0.0/24 network (or, for a native KVM install, simply the 10.8.0.0/24 route) in order for application traffic to find a return path.
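
As an untested sketch only (I cannot verify this in my lab, as explained below), the return routes I believe each internal server would need look like this, where 10.0.1.50 is a made-up placeholder for the docker host address on the internal network:

# on each internal server that vpn clients need to reach (10.0.1.50 is hypothetical)
sudo route add -net 10.8.0.0/24 gw 10.0.1.50     # the openvpn tunnel network
sudo route add -net 172.17.0.0/24 gw 10.0.1.50   # the docker bridge network
# for a native KVM running the openvpn server only the 10.8.0.0/24 route should be needed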

Unfortunately I cannot really test that in my lab environment. While I can remove routes to the 10.0.1.0/24 network and confirm that traffic to that network does enter it via the openvpn connection, that is an existing network with routing already defined back to my home network via a different path, so it is impossible for me to reconfigure return routing via the openvpn network without severely damaging my lab setup.

One other point of note is that the working examples are for a native KVM or physical host running the container.
Almost the same configuration running in a container on an OpenStack KVM host simply does not work. The only difference in configuration is that the container host is on the 10.0.1.0/24 network itself; tcpdump shows the ping trying to leave the openstack kvm for another kvm on the same internal network but never completing the final hop to the target host, and that is with security rules allowing all traffic on both hosts. The internal network is the same 10.0.1.0/24 network that works perfectly for ping when the container runs on a native KVM host on a separate network with a route to 10.0.1.0/24, yet it does not work when the container runs on an OpenStack host inside that same network; lots of pain there. I believe this is an issue with OpenStack networking rather than Docker, simply because it works on a native KVM, but as google throws up lots of posts about difficulties with Docker traffic escaping the host machine I cannot be sure.

Starting the container

As the docker host needs a route to the ip-address assigned to the container, my script to start the container is shown below. It uses docker exec to extract the ip-address assigned to the container after it has started, and if the route into the container for the openvpn network does not already exist it is added.


#!/bin/bash
# ======================================================================================
# Start the openvpn docker container
#    which is not as simple as you may think.
#
# Issues: all hosts on the network that are to be accessible via the VPN tunnel need
#         to know to route traffic to the openvpn network address range via the
#         docker host running the container.
#         And the docker host running the container needs to route into the container
#         to access the openvpn tunnel (that is automated by this script so at least
#         ping to all hosts will work, as ping traverses back up the route it travelled
#         down and so returns to the container host to look for the openvpn network;
#         most ip traffic does not behave that way and needs complete routing entries).
# ======================================================================================

# --------------------------------------------------------------------------------------
# Start the Docker container
# * the configuration files for /etc/openvpn are supplied by a read only overlay
# * we also need to overlay the modules directory as `uname` in the docker container
#   reports the docker host kernel, not the kernel used to build the container image,
#   also read only.
# * we need access to the docker hosts tun device, and capability SYS_MODULE to load
#   the tun driver in the container.
# --------------------------------------------------------------------------------------
docker run -d --memory=32m --cpus="0.3" --rm --name openvpn1 -p 1194:1194 \
  --cap-add=SYS_MODULE --cap-add=NET_ADMIN \
  --device /dev/net/tun:/dev/net/tun \
  --network bridge \
  -v /home/fedora/package/openvpn/files/etc_openvpn:/etc/openvpn:ro \
  -v /lib/modules:/lib/modules:ro \
  openvpn

# --------------------------------------------------------------------------------------
# Docker host needs routing back to the openvpn network via the eth0 interface that
# docker assigned to the container. Use 'docker exec' to query the container for the
# ip address that was assigned.
#
# If a route already exists, we need to do nothing.
#
# ping replies that traverse back up the network path need to be able to route via
# the openvpn network, the docker host does not know about it so add a route.
# Clients can now get replies to pings...
# HOWEVER while pings traverse back up the path they came down, application traffic
# behaves differently and all servers that need to be contacted across tcp will need
# to be able to route traffic via the openvpn network... so the docker host machine
# should ideally be a default gateway for all servers.
# --------------------------------------------------------------------------------------
gw_addr=`docker exec openvpn1 ifconfig eth0 | grep inet | grep -v inet6 | awk '{print $2}'`
if [ "${gw_addr}." == "." ];
then
   echo "**** Unable to obtain eth0 address from within container ****"
   exit 1
fi
gw_exists=`route -n | grep "^10.8.0.0"`
if [ "${gw_exists}." == "." ];
then
   myuserid=`whoami`
   if [ "${myuserid}." != "root." ];
   then
      echo "---- Performing sudo: Enter your password to add the required route ----"
      sudo route add -net 10.8.0.0/24 gw ${gw_addr}
   else
      route add -net 10.8.0.0/24 gw ${gw_addr}
   fi
fi
gw_exists=`route -n | grep "^10.8.0.0"`
if [ "${gw_exists}." == "." ];
then
   echo "**** Failed to add reqired route to openvpn network within container ****"
   echo "Manually (as root or using sudo) enter : route add -net 10.8.0.0/24 gw ${gw_addr}"
fi
# --------------------------------------------------------------------------------------
# Done
# --------------------------------------------------------------------------------------
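
For completeness, stopping it is just the reverse; a hedged sketch using the same names as the script above, where removing the route avoids leaving a stale gateway behind if the container is later restarted and docker assigns it a different address:

docker stop openvpn1
sudo route del -net 10.8.0.0/24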

I have yet another unrelated pain

I will need another KVM server if I decommission the existing dedicated VPN server anyway.

The already-converted IRC container and the mvs38j/TK4- container I can happily run on my web server, as ip traffic is only between those containers and their clients.

An openvpn application requires access to the internal network, and I have locked down my webserver with iptables rules that prevent it from initiating any connections to the internal network, which I obviously do not want to change. So if I containerise openvpn it will still need its own server, and I may as well just continue using the existing working KVM server even though host memory is becoming an issue. The alternative of running it as a container on the KVM host itself rather than in a KVM would release resources, but it is an extremely insecure option I prefer not to consider: if there is misuse I would rather kill a KVM than a host that would affect multiple KVMs.

Posted in Unix | Comments Off on Docker container network isolation can be a pain.

Docker Isolation, and non-Isolation

Docker is not KVM; there are major security trade-offs with a container. The key ones are shown below.

Processes are not isolated

The processes that are run by containers run, for all intents and purposes, as processes on the Docker host machine. This is an issue as an admin on the host machine may inadvertently cause failure of applications within a container; for example, in a memory shortage situation on the host an admin may kill a memory-hog process without realising it was spawned from a container application.

The display below is a process display on the host; every single one of the results (apart from the grep itself) is a process launched from within the container.

[root@vosprey2 log]# ps -ef | grep herc
root      1203  1185  0 15:45 ?        00:00:00 /bin/bash /home/mark/hercules/tk4-minus/start_system_wrapper.sh
root      1240  1203  0 15:45 ?        00:00:00 /usr/bin/su -c bash /home/mark/hercules/tk4-minus/start_system.sh > /var/tmp/hercstart.log 2>&1 mark
mark      1241  1240  0 15:45 ?        00:00:00 bash -c bash /home/mark/hercules/tk4-minus/start_system.sh > /var/tmp/hercstart.log 2>&1
mark      1242  1241  0 15:45 ?        00:00:00 bash /home/mark/hercules/tk4-minus/start_system.sh
mark      1244  1203  0 15:45 ?        00:00:00 SCREEN -t hercules -S hercules -p hercules -d -m hercules -f mark/marks.conf
mark      1246  1244  9 15:45 pts/0    00:00:49 hercules -f mark/marks.conf
mark      3321  1246  0 15:46 pts/0    00:00:00 hercules -f mark/marks.conf
mark      3322  3321  0 15:46 pts/0    00:00:00 /bin/bash /home/mark/hercules/tk4-minus/mark/scripts/printer_interface.sh
mark      3333  3322  0 15:46 pts/0    00:00:00 /bin/bash /home/mark/hercules/tk4-minus/mark/scripts/printer_interface.sh
root     11963 10154  0 15:53 pts/0    00:00:00 grep --color=auto herc
[root@vosprey2 log]# 
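
One way to trace a host process back to the container that launched it is via its cgroup, which embeds the container id. This is a minimal sketch, assuming PID 1246 from the display above and a cgroup layout that includes the id (the exact path format differs between cgroup v1 and v2):

# extract the first 12 characters of the container id from the process cgroup
grep -o -E 'docker[-/][0-9a-f]{12}' /proc/1246/cgroup | head -1
# match that short id against the running containers
docker ps --format '{{.ID}}  {{.Names}}'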

Network connections are hidden

In a complete reversal of the above issue, network connections to applications within a Docker container are not visible on the host machine. That can make diagnosing network connectivity issues difficult, as the admin would normally look on the host for established tcp sessions, but established sessions to container applications do not show up there.

Refer to the output below. The container application listens on port 3270, and while netstat on the docker host shows no established sessions, a computer remote to the docker host does have an established connection via the host into the container on that port. A docker exec into the container lets us see that established connection from 192.168.1.187, and it is dangerous that the connection is not displayed on the docker host. The last command in the output below was run on the computer that established the session; it correctly shows it is connected to the docker host (189) on the port mapped to the container, even though displays on the docker host do not show the connection.

By dangerous I mean that in a large environment, when there are network issues to be resolved, an admin cannot be expected to attach to or exec into every container to see what established connections exist on a host (assuming the container is even built to contain the netstat command); not being able to display established connections to a host on the host itself I would consider a serious issue. And ‘ip netns’ shows no network namespaces in use, so I really don’t know how the connection is being hidden.

[root@vosprey2 log]# netstat -an | grep 3270
tcp6       0      0 :::3270                 :::*                    LISTEN     

[root@vosprey2 log]# docker exec -ti mvs38j1 /bin/bash
bash-5.0# netstat -an | grep 3270
tcp        0      0 0.0.0.0:3270            0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:3270          127.0.0.1:39892         ESTABLISHED
tcp        0      0 127.0.0.1:39896         127.0.0.1:3270          ESTABLISHED
tcp        0      0 172.17.0.2:3270         192.168.1.187:48842     ESTABLISHED
tcp        0      0 127.0.0.1:39892         127.0.0.1:3270          ESTABLISHED
tcp        0      0 127.0.0.1:3270          127.0.0.1:39896         ESTABLISHED
tcp        0      0 127.0.0.1:39894         127.0.0.1:3270          ESTABLISHED
tcp        0      0 127.0.0.1:3270          127.0.0.1:39894         ESTABLISHED
unix  2      [ ACC ]     STREAM     LISTENING     16297755 /run/screen/S-mark/71.c3270A
unix  2      [ ACC ]     STREAM     LISTENING     16297819 /run/screen/S-mark/76.c3270B
bash-5.0# exit
exit

[root@vosprey2 log]# netstat -an | grep 3270 
tcp6       0      0 :::3270                 :::*                    LISTEN     
[root@vosprey2 log]#
[root@vosprey2 log]# ip netns
[root@vosprey2 log]#

[mark@phoenix mvs38j]$ netstat -an | grep 3270
tcp        0      0 192.168.1.187:48842     192.168.1.189:3270      ESTABLISHED
[mark@phoenix mvs38j]$ 
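
The connections can however be inspected from the host by entering the container network namespace via the container PID; a minimal sketch, assuming the container name mvs38j1 used in the output above:

# find the PID on the host of the container init process (run as root)
pid=$(docker inspect -f '{{.State.Pid}}' mvs38j1)
# run netstat inside the container network namespace without exec-ing into the container
nsenter --target "${pid}" --net netstat -an | grep 3270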

The dangers of running apps in containers as non-root

Everybody will tell you user applications should never be run as the root user, and applications in containers should also follow that rule.

There is a major issue with running applications as a non-root user in containers however.

Remember that processes launched within a container run as actual processes on the docker host, and as containers are supposed to be portable there is no way to guarantee that UIDs assigned to users within the container will match UIDs on the docker host.

Refer to the output shown below. Within the container the application runs under the userid ‘ircserver’ with a uid of 1000; all good, right?

[root@vosprey2 log]# docker exec -ti ircserver1 /bin/bash
bash-5.0# ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 03:28 ?        00:00:00 /bin/bash /home/ircserver/marks_irc_server
ircserv+    23     1  0 03:28 ?        00:00:00 /home/ircserver/IRC/inspircd-3.3.0/run/bin/inspircd --config=/home/ircserver/IRC/inspircd-3.3.0/run/conf/inspircd.conf
root        81     1  0 03:58 ?        00:00:00 sleep 600
root        82     0  7 04:01 ?        00:00:00 /bin/bash
root        87    82  0 04:01 ?        00:00:00 ps -ef
bash-5.0# grep ircserver /etc/passwd
ircserver:x:1000:1000:Used to run the inspircd IRC server:/home/ircserver:/bin/bash
bash-5.0# 

Wrong! Displaying the process on the docker host shows a different story. On the docker host it does run under UID 1000 as expected, but there the ircserver user does not exist and another user is assigned uid 1000.

[root@vosprey2 log]# ps -ef | grep IRC | grep -v grep
mark     17875 17834  0 15:28 ?        00:00:00 /home/ircserver/IRC/inspircd-3.3.0/run/bin/inspircd --config=/home/ircserver/IRC/inspircd-3.3.0/run/conf/inspircd.conf
[root@vosprey2 log]# 
[root@vosprey2 log]# grep 1000 /etc/passwd
mark:x:1000:1000:Mark Dickinson:/home/mark:/bin/bash
[root@vosprey2 log]# 

The obvious major issue here is that unless all container applications are run as root they cannot be considered portable, as there is no way to ensure UIDs either match or do not exist on every docker host that may ever run the image.

Imagine again, from an admin perspective, trying to troubleshoot a memory-hog process: ‘ps’ shows the offending process is being run by user ‘fred’, but fred swears he never started it. He may well not have started it; it could have been spawned from a container using a UID matching fred’s. Admittedly, if they were all root processes admins would still have to track down the container causing the impact, but they would not have been sidetracked into wasting time chasing after fred.

Also, let’s not forget that ‘fred’ can at any time kill any of those processes he has effectively been granted ownership of, causing issues for the application within the container.
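
A minimal illustration of that, using PID 17875 from the display above and assuming no user-namespace remapping is configured (the docker default): the kernel permission check only compares numeric UIDs, so the host user holding uid 1000 can signal the container process directly.

# run on the docker host as 'mark' (uid 1000), not inside the container
kill -TERM 17875    # succeeds, terminating the inspircd process inside the container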

Operating system dependent

Containers are not truly portable; they should be considered operating system dependent. A container built on Fedora 30 should only run on a Fedora 30 host, and the container and host should be on similar patch levels. If an OS upgrade is done all containers should be rebuilt.

Obviously this depends on the complexity of your container applications. One I have been working on requires --device mapping and overlaying the host /lib/modules directory over the container, because when the container is run ‘uname’ reports the host kernel level rather than the kernel version the container was built from, so the container does not have the correct modules. But it is fair to say a container OS must be a very close match to the host OS in order to function.
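
A quick way to see that shared-kernel behaviour is below; a hedged example assuming a fedora:30 image is available locally, and both commands should print the same host kernel release:

uname -r                             # kernel release on the docker host
docker run --rm fedora:30 uname -r   # the container reports the same host kernel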

Summary

If you are security conscious you would not consolidate applications running on KVM hosts into Docker containers.

If your environment is secure and totally locked down then docker containers can be used to consolidate applications running on KVM instances. You will get no application memory savings from moving an application (if an app needs 1Gb to run on a KVM instance it will still need 1Gb to run in a Docker container), but you will get back the roughly 750Mb of OS-only overhead from each KVM instance you shut down if you are migrating from KVM to Docker.

And of course containers start faster than booting a KVM, which I personally do not consider a selling point. If designed properly, images enable copies of applications to be started in containers on multiple hosts with minimal effort; of course KVM live migration has been a thing for a long time now, so that’s not really a major selling point either. Being able to encapsulate an entire online application in an image is useful.

Docker networking between Docker containers in a multihost docker engine environment is much hyped, and much is touted about docker swarms, so I’m sure docker internal networking works well although I have not needed to play with any of that stuff.

External networking from the docker containers is another issue. One such issue, from a troubleshooting viewpoint on the host, I have already highlighted above.

There are many other networking issues I will probably cover in a later post, when I figure out how to resolve them. Let’s just say that if your docker container needs pass-through access to external hosts outside the internal docker network, prepare for a long, frustrating time.

Posted in Unix | Comments Off on Docker Isolation, and non-Isolation