The Operations Report Card

A. Public Facing Practices

1. Are user requests tracked via a ticket system?
2. Are "the 3 empowering policies" defined and published?
3. Does the team record monthly metrics?

B. Modern Team Practices

4. Do you have a "policy and procedure" wiki?
5. Do you have a password safe?
6. Is your team's code kept in a source code control system?
7. Does your team use a bug-tracking system for their own code?
8. In your bugs/tickets, does stability have a higher priority than new features?
9. Does your team write "design docs?"
10. Do you have a "post-mortem" process?

C. Operational Practices

11. Does each service have an OpsDoc?
12. Does each service have appropriate monitoring?
13. Do you have a pager rotation schedule?
14. Do you have separate development, QA, and production systems?
15. Do roll-outs to many machines have a "canary process?"

D. Automation Practices

16. Do you use configuration management tools like cfengine/puppet/chef?
17. Do automated administration tasks run under role accounts?
18. Do automated processes that generate e-mail only do so when they have something to say?

E. Fleet Management Processes

19. Is there a database of all machines?
20. Is OS installation automated?
21. Can you automatically patch software across your entire fleet?
22. Do you have a PC refresh policy?

F. Disaster Preparation Practices

23. Can your servers keep operating even if 1 disk dies?
24. Is the network core N+1?
25. Are your backups automated?
26. Are your disaster recovery plans tested periodically?
27. Do machines in your data center have remote power / console access?

G. Security Practices

28. Do Desktops, laptops, and servers run self-updating, silent, anti-malware software?
29. Do you have a written security policy?
30. Do you submit to periodic security audits?
31. Can a user's account be disabled on all systems in 1 hour?
32. Can you change all privileged (root) passwords in 1 hour?
  

1. Are user requests tracked via a ticket system?

This is so basic it pains me that I have to explain it.

Humans can't remember as well as computers. Expecting sysadmins to remember all user requests is the direct route to dropping requests.

Keeping requests in a database improves sharing within the team. It prevents two people from working on the same issue at the same time. It enables sysadmins to divide work amongst themselves. It enables passing a task from one person to another without losing history. It lets a sysadmin back-fill for one that is out, unavailable or on vacation.

It enables better time management. Every interruption a sysadmin receives sets his or her work back 7 minutes. A ticket system prevents interruptions from users that want to make requests or ask for status of requests. It lets sysadmins prioritize their work instead of responding to the loudest complainer.

It lets managers be better managers. Stalled requests become visible so that managers can intervene. It reveals trends and frequent requests so that they may be eliminated via new automation or process improvements. Micro-managers can get the information they want via software rather than bothering their sysadmins. It replaces "user whining" with evidence-based discussions: If a request really has taken too long the manager can properly address the issue; if the user only perceives the issue is taking a long time the evidence will shut them down. If they claim "it's been a problem for months but I only opened the ticket yesterday" then the manager has an... opportunity to educate the user about the non-existence of time-travel, mind-reading, and other supernatural powers.

It helps you identify systemic problems. I once had a boss query the ticket system and discover that 3 out of our 1,000 users opened 10% of all tickets. A little investigation and we were able to solve the fundamental issues causing this.

It helps users help themselves. Frequently when a user writes down what their problem is they realize what the solution is and no ticket is opened. When this doesn't happen, it helps them think through the problem so they can communicate it more clearly, which makes the actual transaction more efficient.

I spent the 1990s being the "radical" encouraging people to set up ticket systems. I spend the 200xs pleased to see it become accepted practice. We are well into the third decade. If you don't have a way to track user requests at this point shame on you.

 
Community Spotlight
LISA15