A Systems Policy


Recently I talked to a couple of friends, all of whom complained quite a bit about their operations or internal IT departments.

Most of these teams had to fight with some very basic things. They lacked a decent monitoring system, or had no monitoring at all. They didn’t deploy systems; they installed them by hand. Systems were not documented, and so on.

So here are some guidelines I try to live up to with my team. This is by no means a complete list of the things you need to run successful operations, but it should give you a fair idea of what it takes.

Also, please note that you might want to adapt your own policy a bit to fit your needs. I come from the web industry, but we still run our own hardware, so this might not fit a typical cloud-based infrastructure particularly well.

Systems

A system is considered the lowest part of our infrastructure and services. All rules defined here should be taken into account in all other policies.

A system….

Hardware

A piece of hardware can be anything from a big server to a small temperature sensor in your server room.

A piece of hardware…

All tools needed to open and repair any part of the system are available.

Servers

A server…

Switches

A switch…

Operating Systems

An operating system (OS) is considered to be everything running on a server or instance to support a service or an application.

An operating system…

Hostnames

Hostnames exist to identify every part of your infrastructure uniquely. They are used to refer to systems in your configurations and in discussions. You should think about a naming convention, but here are some rough guidelines.

Hostnames …
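
As a rough illustration of the uniqueness requirement, here is a minimal Python sketch that checks a list of hostnames for duplicates and for basic syntactic validity. The syntax rules follow the usual DNS host name conventions (labels of 1 to 63 letters, digits or hyphens, no leading or trailing hyphen, at most 253 characters overall). The function names and the example hostnames are hypothetical and not part of any particular naming convention.

```python
import re

# One DNS label: letters, digits, hyphens; 1-63 chars; no leading/trailing hyphen.
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def normalize(name):
    """Lower-case a hostname and drop a single trailing dot, if present."""
    name = name.strip().lower()
    return name[:-1] if name.endswith(".") else name


def is_valid_hostname(name):
    """Return True if every label of the hostname is syntactically valid."""
    name = normalize(name)
    if not name or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


def find_duplicates(hostnames):
    """Return the sorted hostnames that occur more than once (case-insensitive)."""
    seen = set()
    duplicates = set()
    for name in hostnames:
        key = normalize(name)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = ["web01.example.com", "WEB01.example.com", "db01.example.com"]
    print(find_duplicates(inventory))
    print([h for h in inventory if not is_valid_hostname(h)])
```

A check like this could run against an inventory file before a new host is added, so that clashes are caught before they reach configurations or discussions.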

Services

A service is considered to be everything running on a server’s operating system to provide continuous functionality (e.g. a script or an application).

A service…

Networks

A network is considered to be any part of the infrastructure that is used to interconnect servers or systems (layers 1, 2, 3, 4, …).

A network…

Class     Description
net       Internet/upstream network
mgmt      Management network (monitoring, remote access)
traffic   Site local traffic network
backup    Traffic network for backups
voip      VoIP telephony network
clients   Network with client workstations
devel     Network with development machines
staging   Network with staging equipment
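
The network classes above could, for example, be kept in a small lookup table so that scripts can reject networks that carry an unknown class. The following Python sketch assumes a simple mapping from network name to class; the names `vlan10` and `vlan20` and both helper functions are hypothetical, for illustration only.

```python
# Network classes and their descriptions, taken from the table above.
NETWORK_CLASSES = {
    "net": "Internet/upstream network",
    "mgmt": "Management network (monitoring, remote access)",
    "traffic": "Site local traffic network",
    "backup": "Traffic network for backups",
    "voip": "VoIP telephony network",
    "clients": "Network with client workstations",
    "devel": "Network with development machines",
    "staging": "Network with staging equipment",
}


def describe_network_class(name):
    """Return the description of a network class, or raise ValueError."""
    key = name.strip().lower()
    if key not in NETWORK_CLASSES:
        raise ValueError(f"unknown network class: {name!r}")
    return NETWORK_CLASSES[key]


def unknown_classes(assignments):
    """Return the {network: class} entries whose class is not in the table."""
    return {
        network: cls
        for network, cls in assignments.items()
        if cls.strip().lower() not in NETWORK_CLASSES
    }


if __name__ == "__main__":
    # Hypothetical assignments of networks to classes.
    assignments = {"vlan10": "mgmt", "vlan20": "dmz"}
    print(describe_network_class("mgmt"))
    print(unknown_classes(assignments))
```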

To round off this article, here is an example checklist we use to peer-review new systems:

Example Review Checklist

Every newly deployed host or instance should undergo a peer-review process. The checklist below provides a few basic acceptance criteria and helps ensure a certain level of quality. Give it to another sysadmin and ask him or her to check the system before it’s put into production.

-- Physical Host --
