Why incidents occur. What we are doing to prevent them.  

Posted by Dino

I've come across an interesting video that highlights in the simplest terms possible why incidents occur: because a change has been made to the system.

The interesting segment is from the time 2:20 onwards:




The key parts of the video are where they mention:

Change is what leads to problems and incidents. The vast majority of incidents are due to someone making a change, mistakenly believing that the change is not going to lead to an outage.

Being intelligent about change will make a big difference to IT.

25% of problems come from infrastructure... the vast majority of the rest comes from changes that are wrong that have been made with the best of intentions.

The speaker goes on to tell a story about three programmers working together to produce punchcards for a system. Starting from the punchcards they already had, they made very careful, well-thought-out changes. Even so, things still went wrong.


We have been experiencing a number of problems in the environments managed by the Operations team I lead. These environments are a series of components, both software applications and hardware, each with particular inputs and outputs. The components are connected to each other in various ways and configured through configuration files. To get a particular service playing through the TV, an engineer needs to hand-craft the configuration of this series of components, ensuring that the inputs, outputs and configuration of each component are correct. This is much like the punchcard system described in the video.

The problems that we experience in our environments are almost certainly due to slight misconfigurations that the engineers sometimes make. As a consequence, we have started looking at automating these changes. If we are successful, we will be able to use a web interface to make the changes that are normally made by hand. We should then be able to expect the changes made by this automated system to be accurate and correct every time, which in turn should dramatically reduce the number of problems caused by mistaken human changes.
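
As a rough illustration of the kind of check such an automated system could run before applying a change, here is a minimal Python sketch. It is not our actual tooling: the `Component` class, the `find_misconfigurations` function and the component and stream names are all made up for the example, and it assumes each component simply declares the named streams it consumes and produces.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A hypothetical component in the chain, with named inputs and outputs."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def find_misconfigurations(components):
    """Return a list of human-readable problems found in the chain."""
    problems = []
    produced = {}  # output name -> name of the component that produces it

    # Every output should be produced by exactly one component.
    for component in components:
        for output in component.outputs:
            if output in produced:
                problems.append(
                    f"{component.name}: output '{output}' is already "
                    f"produced by {produced[output]}"
                )
            else:
                produced[output] = component.name

    # Every input should match an output produced somewhere in the chain.
    for component in components:
        for wanted in component.inputs:
            if wanted not in produced:
                problems.append(
                    f"{component.name}: input '{wanted}' is not produced "
                    f"by any component"
                )
    return problems


if __name__ == "__main__":
    # Illustrative chain with a deliberate typo ('video-fed') in one input.
    chain = [
        Component("encoder", outputs=["video-feed"]),
        Component("multiplexer", inputs=["video-fed"], outputs=["transport-stream"]),
        Component("modulator", inputs=["transport-stream"]),
    ]
    for problem in find_misconfigurations(chain):
        print(problem)
```

A check like this only catches wiring mistakes such as a mistyped stream name. It says nothing about whether the values inside each configuration file are right, which would need component-specific rules.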


Getting the team talking through The Daily Scrum  

Posted by Dino

Communication is key to a Service Desk: team members need to feel comfortable exchanging ideas about how to solve problems. However, in a flexitime working environment, the members of the team crawl into work one by one. Some start studiously looking through their email. Others put on their headphones as they get on with their work for the day. In an apathetic environment like this, the practice of having a Daily Scrum meeting becomes the heartbeat that kicks off talk and chatter among the team for the rest of the day.

The Daily Scrum is a concept taken from Scrum, the agile project management method used most frequently for software development. There are plenty of articles on the Internet that describe how it works, as well as articles on how it does not work. In essence it is a daily stand-up meeting held in a circle, where each member of the team in turn tells the team what they have been working on since the previous Daily Scrum, what they plan to work on until the next one and anything blocking them from completing their work.

Unlike long-running projects, a Service Desk environment is based on incidents, so it is often difficult to predict what team members are going to be working on. However, the meeting acts as a useful way for members to synchronise with the rest of the team, giving everyone greater visibility of each other's work and providing an opportunity for them to help each other. Particularly when there are major incidents, the meeting keeps everyone focused and aware of progress.


A glimpse of a slick, professional team  

Posted by Dino

I consider note taking to be a key behaviour of members of a world class service management desk. Note taking while investigating issues creates an audit trail that gives the engineer working on an issue, as well as others in the team, a trace of how something has been investigated. It allows others to be included in the investigation and to make contributions.

Without the key behaviour of note taking, the service management desk becomes prone to common problems that frustrate stakeholders of less well managed service desks:

  • Lack of visibility of issues raised by end users
  • Engineers progressing issues in isolation, making it difficult to track the progress they have made
  • Difficulty for other engineers in picking up and progressing issues worked on by a colleague
  • Difficulty in peer reviewing or retrospectively reviewing the work done by an engineer
  • Over-reliance on specific engineers for specific tasks

I am currently building a service desk team, with many members of the team inherited from elsewhere. Adoption of the note taking practice has been slow, but today I've started to see a glimpse of the kind of slick, professional team we are working towards: able to pass issues between engineers easily, with clear visibility to anyone interested of the technical investigation done and confidence in the capability of the team rather than individuals.

Specifically, the glimpse that I "saw" was this:

  • Engineer 1 completed some work to build a new server to a very particular specification. He had recorded the details of his investigation on the ticket that was raised, #12. At first glance, the notes on the ticket seem excessive and as though not much thought had gone into them. Notes are rarely excessive, though, and the key is that there are always notes, not necessarily their quality. The build of the server took almost 3 weeks to complete, in between Engineer 1's work on other things.
  • Recently, almost 3 months later, a similar request, #354, came in for a machine of the same specification to be built. In the past, the engineer picking up the issue would have had to reinvestigate and re-determine how to build such a machine. In fact, the task might have fallen to the same engineer who had worked on the earlier issue, as that engineer might remember some of the details of what they had done 3 months previously.
  • However, because there are sufficient details on #12, a new engineer (Engineer 2) was able to pick up the new ticket, #354, and complete the work for the new server. I'm sure he sought clarification from Engineer 1 on some things, but there is enough in #12 to work confidently on this similar issue on his own. He was also able to complete the work for ticket #354 more quickly than #12: days rather than weeks. This is because he did not have to repeat the investigation done for #12.
  • This alone I thought was a great improvement in working practices... but it gets better! Engineer 2 was away today and a further request was made on #354 by the user who logged the issue. In the past, this might have had to wait for Engineer 2 to return to work because no one would have been quite sure of what had been done. However, Engineer 2 had also made notes on #354 as he progressed the issue, meaning a third engineer, Engineer 3, could respond to the request and progress the issue further.

We still have some way to go before stories like this are true of every incident that we deal with. However, I think it is encouraging that we are now starting to see the note taking behaviour being adopted, along with its benefits.


Jack Welch: Mountains Do Move  

Posted by Dino

In his book titled Winning, Jack Welch dedicates a chapter to Change, subtitling it Mountains Do Move. Welch distills his wisdom for bringing about change into four practices:

1. Attach every change to a clear purpose or goal. Change for change's sake is stupid and enervating. People need to understand in their heads and hearts the need for change. This is easiest when the reason for change is obvious, for example bad media headlines. When it is not obvious, it becomes necessary to collect data and relentlessly communicate the rationale for change.

2. Hire and promote only true believers and get-on-with-it types. To find these change agents, Welch suggests looking for their characteristics: being brash, high-energy and more than a little paranoid about the future. They tend to be curious and forward looking, asking questions that start with the phrase "Why don't we...". They often invent their own change initiatives or ask to lead them.

3. Ferret out and get rid of resisters, even if their performance is satisfactory. Welch believes it is necessary to get the right people by your side and to get rid of those resisting change. These resisters usually lower the morale of those who support change and foster an underground resistance. They waste their own time in a company whose vision they don't share, and they should be encouraged to find one whose vision they do. Even if they have a specific skill-set, they should not be held on to: they only get more diehard and their followers more entrenched over time.

4. Look at car wrecks. Make the most of opportunities, even capitalising on other people's unpredictable disasters. It is possible to acquire resources cheaply from bankruptcies, for example.

These practices, though they may seem simplistic, do make sense. As well as point 1, behavioural change in the team should be the priority. From point 2, the person in my team who is embracing change the most is of the type Welch describes: always asking questions about improvements. Point 3 confirms a suspicion I have that if the person who is the biggest resister to change stays on my team, I am not likely to accomplish the change I need. Point 4 is definitely something that I will bear in mind.


Recruitment: Would you marry someone based on a one hour interview in a singles bar?  

Posted by Dino

A recurring theme in the new jobs that I take on seems to be hiring new people to accomplish significant goals. Seth Godin asks, "Would you marry someone based on a one hour interview in a singles bar?", and then explains that far too often people are hired because they interviewed well, not necessarily because they can do the job well. A bad hire can be costly, as the resources and investment spent on them prove fruitless. Instead of interviews, Godin recommends actually trialling potential new hires on the job:

There are no one-on-one-sit-in-my-office-and-let’s-talk interviews. Boom, you just saved 7 hours per interview. Instead, spend those seven hours actually doing the work. Put the person on a team and have a brainstorming session, or design a widget or make some espressos together. If you want to hire a copywriter, do some copywriting. Send back some edits and see how they’re received.

If the person is really great, hire them. For a weekend. Pay them to spend another 20 hours pushing their way through something. Get them involved with the people they’ll actually be working with and find out how it goes. Not just the outcomes, but the process. Does their behavior and insight change the game for the better? If they want to be in sales, go on a sales call with them. Not a trial run, but a real one. If they want to be a rabbi, have them give a sermon or visit a hospital.

Yes, people change after you hire them. They always do. But do they change more after an unrealistic office interview or after you’ve actually watched them get in the cage and tame a lion?


Cultural Change as Behavioural Change  

Posted by Dino

In a previous post I discussed the need for cultural change as the necessary underpinning of any implementation of "best practice" and how I have tried thus far to overcome the resistance to change. Without some level of cultural change, the team or organisation becomes stuck in "current practice", not "best practice", and the implementation will not succeed.

In researching this area, I've come across Dr Leandro Herrero's Chalfont Project. Dr Herrero's perspectives on the area of cultural change are a contrast to other thinking, for example challenging the view that cultural change has to be slow and painful.

Dr Herrero's site has a video and a number of articles, but the highlights are:

  • Cultural change programmes concentrate on creating new mindsets and attitudes. A lot of time and effort is spent in rolling out new processes and tools. There is also a lot of communication and training, rationalising the logical need for change. In all this, an assumption is made that the new behaviours will follow to support these changes.
  • In reality, behavioural changes have to come before "cultural change". People need to be performing behaviours that are specifically "collaborating", for example, and this behaviour needs to be encouraged and spread. Once the behaviour becomes widespread, it can be considered that cultural change has taken place.
  • In this form of change programme, it is necessary to identify the key behaviours that will produce the required change. These behaviours must be reinforced and encouraged.
  • To make cultural change happen across an organisation, it is necessary to take advantage of the few people in the organisation who are connected to many people. These people need to be demonstrating and spreading the behavioural change. Dr Herrero compares this to the spread of an infectious disease, virally through a network.
  • The best thing that can happen in this kind of programme is dropping the terms "culture" and "change". People have preconceptions about these labels. Instead, people's natural tendency to copy well-regarded behaviours is used.

Dr Herrero's ideas, which he calls Viral Change, make more sense to me than anything else I have come across in the area of cultural change. Perhaps what is more impressive is that I have previously seen a team undergo dramatic changes in working practices, and what I observed, even at a distance, fits quite well with Dr Herrero's thoughts.


Customer Service Reviews  

Posted by Dino

When I was a Support Engineer, my manager would make a point of visiting as many customers as possible to find out what they thought of the department's service. He would not just visit the senior managers at the customer; more importantly, he would want to talk to the people who used the service of the Support Department on an everyday basis. At the time, I would think to myself, "Are these visits necessary? Surely it is obvious whether you are providing a good service or not". Later on, when I managed the department, it became apparent that I sometimes did not have sufficient visibility of the pain experienced by the customers or even how and why they were using the service of the Support Department in a particular way.

I'm five months into my current role in a new company and I am realising that these kinds of "Service Reviews" are more important than ever. It is easy to drop into a false sense of security, because the "customers" for this Service Desk are internal. I speak to the managers and team leads of these "customers" regularly, but I am only now realising that I don't get from them the full detailed picture or understanding of the customer's requirements.

The elements I am covering in these reviews include:

  • How the customers find dealing with the Service Desk. Some of the typical feedback I have got includes things like "not being given estimates of how long things will take; this means the customer is not sure whether to get on with other work".
  • Frustrations the customers are finding with the applications/services supported by the Service Desk. Feedback in this area includes issues relating to the instability of the applications and certain recurring incidents in the infrastructure.

The feedback from these areas feeds into the continual improvement of the Service Desk. Ultimately, it does not matter how elegant the infrastructure is or how well things work: the real measure of your success depends entirely on what your customers think of you.
