Docker is not KVM; there are major security trade-offs when running applications in containers. The key ones are described below.
Processes are not isolated
Processes run by containers are, for all intents and purposes, processes on the Docker host machine. This is an issue because an admin on the host may inadvertently cause the failure of applications within a container. For example, in a memory shortage situation on the host, an admin may kill a process that is a memory hog without realising it was spawned by a container application.
The display below is a process display on the host; every single one of the results (apart from the grep itself) is a process launched from within the container.
[root@vosprey2 log]# ps -ef | grep herc
root      1203  1185  0 15:45 ?        00:00:00 /bin/bash /home/mark/hercules/tk4-minus/start_system_wrapper.sh
root      1240  1203  0 15:45 ?        00:00:00 /usr/bin/su -c bash /home/mark/hercules/tk4-minus/start_system.sh > /var/tmp/hercstart.log 2>&1 mark
mark      1241  1240  0 15:45 ?        00:00:00 bash -c bash /home/mark/hercules/tk4-minus/start_system.sh > /var/tmp/hercstart.log 2>&1
mark      1242  1241  0 15:45 ?        00:00:00 bash /home/mark/hercules/tk4-minus/start_system.sh
mark      1244  1203  0 15:45 ?        00:00:00 SCREEN -t hercules -S hercules -p hercules -d -m hercules -f mark/marks.conf
mark      1246  1244  9 15:45 pts/0    00:00:49 hercules -f mark/marks.conf
mark      3321  1246  0 15:46 pts/0    00:00:00 hercules -f mark/marks.conf
mark      3322  3321  0 15:46 pts/0    00:00:00 /bin/bash /home/mark/hercules/tk4-minus/mark/scripts/printer_interface.sh
mark      3333  3322  0 15:46 pts/0    00:00:00 /bin/bash /home/mark/hercules/tk4-minus/mark/scripts/printer_interface.sh
root     11963 10154  0 15:53 pts/0    00:00:00 grep --color=auto herc
[root@vosprey2 log]#
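Before killing a suspect process, an admin can at least check whether it belongs to a container. A minimal sketch, assuming cgroups v1 (where Docker embeds the container ID in the cgroup path), using PID 1246 from the display above, and assuming this hercules container is the mvs38j1 container used in the next section:

# Does this host PID belong to a container? On cgroups v1 the cgroup
# path of a containerised process contains "docker" and the container ID.
grep docker /proc/1246/cgroup

# Or list a known container's processes as seen from the host;
# docker top shows their host PIDs.
docker top mvs38j1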
Network connections are hidden
In a complete reversal of the above issue, network connections to applications within a Docker container are not visible on the host machine. That can make diagnosing network connectivity issues difficult, as an admin would normally look on the host for established TCP/IP sessions; but established sessions to container applications are not visible there.
Refer to the output below. The container application listens on port 3270, and while netstat on the host shows no established sessions, a computer remote to the Docker host does have an established connection via the host into the container on that port. A docker exec into the container lets us see that established connection from 192.168.1.187; it is dangerous that the connection is not displayed on the Docker host. The last command in the output below was run on the computer that established the session; it correctly shows it is connected to the Docker host (189) to access the port mapped to the container, even though displays on the Docker host do not show the connection.
By dangerous I mean that in a large environment with network issues to be resolved, an admin cannot be expected to attach to or exec into every container to see what established connections exist on a host (assuming the container is even built to contain the netstat command). Not having established connections to a host displayable on the host is what I would consider a serious issue. And 'ip netns' shows no network namespaces in use, so at first glance it is not obvious how the connection is being hidden.
[root@vosprey2 log]# netstat -an | grep 3270
tcp6       0      0 :::3270                 :::*                    LISTEN
[root@vosprey2 log]# docker exec -ti mvs38j1 /bin/bash
bash-5.0# netstat -an | grep 3270
tcp        0      0 0.0.0.0:3270            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:3270          127.0.0.1:39892         ESTABLISHED
tcp        0      0 127.0.0.1:39896         127.0.0.1:3270          ESTABLISHED
tcp        0      0 172.17.0.2:3270         192.168.1.187:48842     ESTABLISHED
tcp        0      0 127.0.0.1:39892         127.0.0.1:3270          ESTABLISHED
tcp        0      0 127.0.0.1:3270          127.0.0.1:39896         ESTABLISHED
tcp        0      0 127.0.0.1:39894         127.0.0.1:3270          ESTABLISHED
tcp        0      0 127.0.0.1:3270          127.0.0.1:39894         ESTABLISHED
unix  2      [ ACC ]     STREAM     LISTENING     16297755 /run/screen/S-mark/71.c3270A
unix  2      [ ACC ]     STREAM     LISTENING     16297819 /run/screen/S-mark/76.c3270B
bash-5.0# exit
exit
[root@vosprey2 log]# netstat -an | grep 3270
tcp6       0      0 :::3270                 :::*                    LISTEN
[root@vosprey2 log]#
[root@vosprey2 log]# ip netns
[root@vosprey2 log]#
[mark@phoenix mvs38j]$ netstat -an | grep 3270
tcp        0      0 192.168.1.187:48842     192.168.1.189:3270      ESTABLISHED
[mark@phoenix mvs38j]$
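The connections are in fact hidden in a per-container network namespace; 'ip netns' comes back empty simply because Docker does not register its namespaces under /var/run/netns. Assuming the nsenter command is available on the host (it is part of util-linux), the container's connections can be displayed from the host without exec'ing into the container, so the container does not even need to contain netstat. A sketch:

# Find the host PID of the container's init process.
CPID=$(docker inspect --format '{{.State.Pid}}' mvs38j1)

# Run the host's netstat inside the container's network namespace.
nsenter -t "$CPID" -n netstat -an | grep 3270

# Alternatively, register the namespace so 'ip netns' can see it.
mkdir -p /var/run/netns
ln -s /proc/"$CPID"/ns/net /var/run/netns/mvs38j1
ip netns exec mvs38j1 netstat -an | grep 3270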
The dangers of running apps in containers as non-root
Everybody will tell you user applications should never be run as the root user, and applications in containers should follow that rule as well.
However, there is a major issue with running applications as a non-root user in containers.
Remember that processes launched within a container run as actual processes on the Docker host, and as containers are supposed to be portable there is no way to guarantee that UIDs assigned to users within the container will match UIDs on the Docker host.
Refer to the output shown below. Within the container the application runs under the userid 'ircserver' with a UID of 1000. All good, right?
[root@vosprey2 log]# docker exec -ti ircserver1 /bin/bash
bash-5.0# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 03:28 ?        00:00:00 /bin/bash /home/ircserver/marks_irc_server
ircserv+    23     1  0 03:28 ?        00:00:00 /home/ircserver/IRC/inspircd-3.3.0/run/bin/inspircd --config=/home/ircserver/IRC/inspircd-3.3.0/run/conf/inspircd.conf
root        81     1  0 03:58 ?        00:00:00 sleep 600
root        82     0  7 04:01 ?        00:00:00 /bin/bash
root        87    82  0 04:01 ?        00:00:00 ps -ef
bash-5.0# grep ircserver /etc/passwd
ircserver:x:1000:1000:Used to run the inspircd IRC server:/home/ircserver:/bin/bash
bash-5.0#
Wrong! Displaying the process on the Docker host tells a different story. On the host the process does run under UID 1000 as expected, but the user ircserver does not exist there; another user entirely is assigned UID 1000.
[root@vosprey2 log]# ps -ef | grep IRC | grep -v grep
mark     17875 17834  0 15:28 ?        00:00:00 /home/ircserver/IRC/inspircd-3.3.0/run/bin/inspircd --config=/home/ircserver/IRC/inspircd-3.3.0/run/conf/inspircd.conf
[root@vosprey2 log]#
[root@vosprey2 log]# grep 1000 /etc/passwd
mark:x:1000:1000:Mark Dickinson:/home/mark:/bin/bash
[root@vosprey2 log]#
The obvious major issue here is that unless all container applications are run as root they cannot be considered portable, as there is no way to ensure the UIDs they use either match or are unused on every Docker host that may ever run the image.
Imagine again, from an admin perspective, trying to troubleshoot a memory-hog process: a 'ps' shows the offending process is being run by user 'fred', but fred swears he never started it. He may well not have; it could have been spawned from a container whose internal UID happens to match fred's. Admittedly, if they were all root processes admins would still have to track down the container causing the impact, but they would not have been sidetracked into wasting time chasing after fred.
Also, let's not forget that 'fred' can at any time kill any of those processes he has been granted ownership of, causing issues for the application within the container.
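For what it is worth, Docker does offer a partial mitigation: user namespace remapping, which offsets container UIDs into a dedicated subordinate UID range on the host, so UID 1000 inside the container no longer lands on fred's UID 1000. It is a daemon-wide setting with side effects of its own (volume file ownership, for one), so treat the below as a sketch rather than a recommendation:

# Enable user namespace remapping daemon-wide. With "default", Docker
# creates a dockremap user and allocates it subordinate UID/GID ranges
# in /etc/subuid and /etc/subgid.
# (Assumes no existing daemon.json; merge by hand if there is one.)
cat > /etc/docker/daemon.json <<'EOF'
{
  "userns-remap": "default"
}
EOF
systemctl restart docker

After the restart, a process that is UID 1000 inside a container shows up on the host as the base of the dockremap range plus 1000, a high UID that should belong to nobody.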
Operating system dependent
Containers are not truly portable; they should be considered operating-system dependent. A container built on Fedora 30 should only be run on a Fedora 30 host, and the container and host should be at similar patch levels. If an OS upgrade is done, all containers should be rebuilt.
Obviously this depends on the complexity of your container applications. One I have been working on requires --device mapping and overlaying the host /lib/modules directory over the container's, because when the container is run 'uname' reports the host kernel level rather than the kernel version the container was built from, so the container does not have the correct modules. But it is fair to say a container OS must be a very close match to the host OS in order to function.
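The underlying reason is that containers share the host kernel rather than booting their own, so 'uname' inside any container always reports the host kernel. This is easy to demonstrate, assuming a fedora:30 image is available locally:

# Kernel version on the host.
uname -r

# Kernel version reported inside a container: identical to the host,
# regardless of what kernel the image was built against.
docker run --rm fedora:30 uname -r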
Summary
If you are security conscious you would not consolidate applications running on KVM hosts into Docker containers.
If your environment is secure and totally locked down, then Docker containers can be used to consolidate applications running on KVM instances. You will get no application memory savings from moving an application (if an app needs 1GB to run on a KVM instance, it will still need 1GB to run in a Docker container), but you will get back around 750MB of OS overhead from each KVM instance shut down if you are migrating from KVM to Docker.
And of course containers start faster than booting a KVM, which I personally do not consider a selling point. If designed properly, images enable copies of applications to be started in containers on multiple hosts with minimal effort; then again, KVM live migration has been a thing for a long time now, so that's not really a major selling point either. Being able to encapsulate an entire online application in an image is useful, though.
Docker networking between containers in a multi-host Docker engine environment is much hyped, and much is touted about Docker swarms, so I'm sure Docker internal networking works well, although I have not needed to play with any of that stuff.
External networking from Docker containers is another issue. I have already highlighted one such network issue above, from a troubleshooting viewpoint on the host.
There are many other networking issues I will probably cover in a later post, once I figure out how to resolve them. Let's just say that if your Docker container needs pass-through access to external hosts outside the internal Docker network, prepare for a long, frustrating time.