Advice for Newbies

I originally wrote this as a reply to a Reddit post, but by the time I saved it, comments had been locked.

Give yourself little tasks and projects to do. Think of it like building model kits. You start with the easy kits, like a plane with just a few pieces, and as you get better you pick up new skills like painting and sanding, and eventually move on to bigger, better kits.
So, start with small things. For example, write a small program with a for-loop and get to know what all of the commands are really doing. This is your basic kit. Add in some variables. Add in user input, and keep going, trying new things. Eventually, challenge yourself by learning how to work with a GUI. Sometimes your program will break. This isn’t a bad thing; it teaches you how to debug. What’s important is to take your time and experiment.
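Here is what a first “basic kit” might look like as a small shell script: a variable, a for-loop, and user input. It is only one possible starting point; the function name `greet` is made up for this example, and any language with a loop and a way to read input works just as well.

```shell
#!/bin/sh
# A first "kit": a variable, a for-loop, and user input.

greet() {
    greeting="Hello"            # a variable

    printf 'What is your name? '
    read -r name                # user input from the keyboard
    name=${name:-stranger}      # fall back to a default on empty input

    # A for-loop that runs three times.
    for i in 1 2 3; do
        echo "$greeting, $name! (pass $i)"
    done
}

greet
```

Save it as, say, `kit.sh` and run `sh kit.sh`. Then break it on purpose, for example by misspelling `done`, and read the error message the shell gives you.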
The same goes for aspiring systems engineers. Learn how to create a virtual machine and install Linux in it. Then learn how to set up a web server, then how to get PHP and MySQL working with it, and so on.
A computer course can teach you how to write good code, or what the system services you rely on actually do, but what’s most important is that you don’t give up and never lose your curiosity.

Creating the Ultimate Container Playground: LXD on Kubic

Introduction

LXC (Linux Containers) are whole-system containers. They are meant to do just about anything you can do with a VM, using a fraction of the system resources and with a tiny startup time.

During Installation

During installation, you can accept the defaults for almost everything, with two exceptions: you will need to create two additional btrfs subvolumes, and if you gave your VM more than 30G of disk space, you will need to specify that manually, because the installer will only use 30G by default.

Create btrfs subvolumes for:
/snap
/media

After Installation

Add the snappy repo

sudo zypper addrepo --refresh http://download.opensuse.org/repositories/system:/snappy/openSUSE_Tumbleweed/ snappy

Create the last subvolume needed for snappy

sudo btrfs subvolume create /var/lib/snapd

Install snappy

sudo transactional-update pkg install snapd

Reboot into the new snapshot that transactional-update created

sudo reboot
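If you’d rather script the steps up to the reboot, here is a minimal sketch, not a tested installer. It assumes a Kubic system with a btrfs root. `DRY_RUN`, `run`, and `phase1` are hypothetical names chosen for this example; by default the script only prints each command, so you can review it before running it for real with `DRY_RUN=0`. The existence checks are there because `btrfs subvolume create` fails if the path already exists, and zypper refuses a second repository with the same alias.

```shell
#!/bin/sh
# Sketch of the pre-reboot steps for snapd on openSUSE Kubic.
# Assumption: DRY_RUN (a name chosen for this sketch) defaults to 1,
# which only prints each command; set DRY_RUN=0 to execute them.
set -eu

SNAPPY_REPO="http://download.opensuse.org/repositories/system:/snappy/openSUSE_Tumbleweed/"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

phase1() {
    # zypper keeps each repo in /etc/zypp/repos.d/<alias>.repo and
    # rejects a duplicate alias, so only add "snappy" once.
    if [ ! -e /etc/zypp/repos.d/snappy.repo ]; then
        run sudo zypper addrepo --refresh "$SNAPPY_REPO" snappy
    fi
    # 'btrfs subvolume create' fails if the path already exists.
    if [ ! -e /var/lib/snapd ]; then
        run sudo btrfs subvolume create /var/lib/snapd
    fi
    # The package lands in a new snapshot that is used after reboot.
    run sudo transactional-update pkg install snapd
    echo "Now reboot into the new snapshot: sudo reboot"
}

phase1
```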

Enable and start the snapd service

sudo systemctl enable snapd && sudo systemctl start snapd

Install the LXD snap

sudo snap install lxd

Setup

Initialize LXD

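lxd init

Choose the defaults to make life easier the first time. Run this command, and the lxc commands below, as root (for example with sudo) unless your user is a member of the lxd group.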

Create your first LXC container. The first time you create a container from a given image, LXD will download that image. After that, any new containers built from the same image will start very quickly.

lxc launch images:opensuse/42.3 opensuse

Enter into your first container

lxc exec opensuse bash
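The post-reboot steps can also be collected into a script. This is a minimal sketch, not a tested installer: `DRY_RUN`, `run`, and `phase2` are hypothetical names, and in dry-run mode (the default) the script only prints each command. Two commands go slightly beyond the steps above and are worth checking against the snapd and LXD documentation: `snap wait system seed.loaded`, which avoids the “device not yet seeded” error a freshly started snapd can return, and `lxd init --auto`, a non-interactive stand-in for accepting the defaults.

```shell
#!/bin/sh
# Sketch of the post-reboot steps on Kubic: start snapd, install LXD,
# and launch a first container.
# Assumption: DRY_RUN (a name chosen for this sketch) defaults to 1,
# which only prints each command; set DRY_RUN=0 to execute them.
set -eu

IMAGE="images:opensuse/42.3"   # older releases may no longer be published; adjust as needed
NAME="opensuse"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

phase2() {
    # Same effect as 'systemctl enable' followed by 'systemctl start'.
    run sudo systemctl enable --now snapd
    # Block until snapd has finished its initial seeding.
    run sudo snap wait system seed.loaded
    run sudo snap install lxd
    # Non-interactive init with default settings (these may differ
    # slightly from the answers the interactive prompts suggest).
    run sudo lxd init --auto
    # The first launch downloads the image; later launches reuse it.
    run sudo lxc launch "$IMAGE" "$NAME"
    echo "Enter the container with: sudo lxc exec $NAME -- bash"
}

phase2
```

Once the printed commands look right, run it with `DRY_RUN=0 sh lxd-setup.sh` (the file name is arbitrary).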

Why Should I Patch My Server?

Wait, what?

Why should I patch my server?

Because otherwise you won’t get bug fixes.

But my server works just fine. If it’s not broken, don’t fix it, right?

What about security fixes?

My server isn’t accessible from the internet. I’m not worried about it getting hacked.

But it’s still possible, right? Do you really want to take that chance?

Yeah, patching requires time and money. It could even require downtime. It’s more important for us to keep the server going than to worry about that stuff. We’re safe.

Over several months, memory was slowly drained by a rogue application that never released it. A month before, some users were experiencing slowness, but nothing too bad. The week before, a ticket was opened with the helpdesk. It wasn’t really a major issue. It was probably their laptops, not the server. And then it happens: the ticket goes from P3 to P1. The server is down and nobody knows why. Nobody can log in. The ticket is escalated to L2 and then to L3 support. When someone opens a remote terminal to the server, they see OOM-Killer alerts and then nothing. There is no response from the console or over SSH. The server is rebooted. The applications are running again and nobody really knows what happened. They open a ticket with the OS vendor so the vendor can tell them why their server crashed.

The logs show nothing. Running journalctl shows business as usual and then a reboot. There are no OOM-Killer alerts, no memory errors, and no kernel core dump. There’s nothing in the application logs. The OS vendor reports that the cause is inconclusive, except that the server was never patched. In the years since the server was set up, hundreds of patches were created and released. The issue could have come from any number of places. To paraphrase Arthur Conan Doyle, to get to the truth you must first eliminate the impossible. The easiest way to do that is to make sure your systems are patched against known issues, so that when a new issue comes along, it is easier to diagnose and repair.

If the company had applied all of these patches, could this still have happened? Yes, of course. Bug fixes can only fix issues that have already been found and reported. Another part of good system hygiene is simply keeping an eye on the server. Of course, you can’t log in to a few hundred servers and watch them around the clock. Instead, you set up a monitoring system such as Nagios or one of a dozen others. If CPU load, memory or swap usage, or network latency starts climbing, you have a chance to do something about it before the system crashes. That’s when you want to start your investigation, not after you’ve experienced downtime. A monitoring system isn’t terribly costly, but it’s not completely free either: it takes time to set up and resources to host.
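Nagios and compatible tools run checks as small plugins: programs that print a one-line status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). As a minimal sketch of the kind of memory check described above, here is one in shell. It assumes a Linux kernel that reports `MemAvailable` in `/proc/meminfo` (3.14 and later); `check_mem`, `MEMINFO`, `WARN`, and `CRIT` are hypothetical names chosen for this example, and the 80%/90% thresholds are arbitrary.

```shell
#!/bin/sh
# Minimal Nagios-style memory check: a sketch, not a production plugin.
# Plugin convention: exit 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# Assumptions: Linux with MemAvailable in /proc/meminfo (kernel 3.14+);
# MEMINFO, WARN and CRIT are hypothetical variable names for this sketch.

check_mem() {
    meminfo=${MEMINFO:-/proc/meminfo}
    warn=${WARN:-80}    # warn at this percentage of memory in use
    crit=${CRIT:-90}    # go critical at this percentage

    # Both values are reported in kB.
    total=$(awk '/^MemTotal:/ {print $2}' "$meminfo" 2>/dev/null)
    avail=$(awk '/^MemAvailable:/ {print $2}' "$meminfo" 2>/dev/null)

    if [ -z "$total" ] || [ -z "$avail" ] || [ "$total" -le 0 ]; then
        echo "MEMORY UNKNOWN - cannot read MemTotal/MemAvailable from $meminfo"
        return 3
    fi

    # "Used" here means memory the kernel cannot easily reclaim.
    used_pct=$(( (total - avail) * 100 / total ))

    if [ "$used_pct" -ge "$crit" ]; then
        echo "MEMORY CRITICAL - ${used_pct}% used"
        return 2
    elif [ "$used_pct" -ge "$warn" ]; then
        echo "MEMORY WARNING - ${used_pct}% used"
        return 1
    fi
    echo "MEMORY OK - ${used_pct}% used"
    return 0
}

check_mem
```

A slow leak like the one in the story would show up here as a status that drifts from OK to WARNING over weeks. A fuller setup would also watch swap, CPU load, and network latency.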

To answer the original question: you patch your server to save money in the long run, by protecting against system and application crashes and against security breaches. If you don’t, you’re doing yourself a disservice, and the resulting loss in revenue could ultimately cost significantly more than the time spent patching.