Tuesday, 7 May 2019

An introduction to Agile

In this article I will discuss how to get started with Agile in the most hands-on way possible, with no discussion of frameworks and methodologies. I believe it is important to understand the essence of Agile first, as it is easy to be overwhelmed with all of the techniques and tools that have evolved from it (Scrum, XP, SAFe, etc.).

The goal then is to create an iterative software development process that can be improved upon continuously. The only tool you are going to need is a stack of post-its and some wall space or a whiteboard where the team can work together.

Start small, which means starting at the team level. Learning to work in an Agile way will also require some experimentation as every team works differently. The point being that you will need to create some slack in the team's schedule if you want to change the way they work. Finally, you or the team lead will take on the roll of Agile Team Coach.

I should also mention that there's lots of help out there: blogs, forums and books. One excellent resource is the Agile and Lean Software Development Group on LinkedIn. Now let's get started!

Step 1: Visualisation

First the team should start by visualising the their work, this is especially true in software development which by nature is very abstract. By visualisation, I am not referring to traditional documentation which tries to capture an entire scope such as requirements or test cases. What you want to visualise here is what the team is doing right now. For this you use post-its. Every team member writes down what they are working on, big or small, together with their initials in the corner, and sticks it onto a whiteboard or wall.

Now the team have an opportunity to discuss the work, make adjustments, add or remove post-its. The team can try to group related activities for instance. Spend about 5-10 minutes at this, no more; just enough time to smooth out the rough edges.

Step 2: Create flow

The next thing the team need to consider is what the definition of "Done" is for each post-it note. By "Done" we mean that the team is finished with the work item; it could be putting software in production, writing a manual, upgrading a database, etc. In reality, a lot of teams starting out with Agile do not have a clear definition of Done for their work items, so don't sweat it too much yet. Fixing this will be part of the improvement process mentioned later on.

Create three columns on the whiteboard or wall and label them: Backlog, Doing, Done. Now each team member places each of their Post-its into one of the three columns. Finished? Great! You have now created your first Kanban board. (Here we introduce the most elementary and useful of Agile tools, the Kanban Board.)


So now the team has visualised their work and created a flow of work from left to right on the board. Congratulations!

Step 3: Reflection

The whole exercise above shouldn't take more than an hour for a team of 10 people. Stand back and take a look at it. It may be obvious that some items on the board have unclear scope and some items are very large (or small). We'll come back to these issues later.

One final exercise, sum the number of post-its in the Backlog and Doing columns and divide them by the number of members on the team. This will give you some indication of how much multitasking is going on and how much overhead is being created due to context-switching.

Step 4: Focusing on the goal

OK, the team have taken the important first steps in becoming Agile. And they will continue taking small steps, applying well-proven techniques that will improve the flow of work. But let's discuss the goal; where is the team trying to get to? In the book The Phoenix Project, Bill is inspired by Lean manufacturing techniques used on production lines. Bill's goal becomes the creation of a factory production line for his IT Department. As stated in Agile Principle #8:
Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
In other words the team shall create a process that they can reuse to build whatever software solutions the organisation or customers need now and in the future. The team can now take their new Kanban board and visualised work flow and use it to build a software factory!

Step 5: Breaking down the work

In our first iteration the team's work items had unclear scope and different sizes. The team should deal with the scope problems first. This can be solved by breaking these work items into smaller items, each with a clear definition of "Done".  Spend 1-2 hours on this step, starting with the most important work items.

A classic problem is that a work item involves input from people outside the team. The Kanban board should not contain items that are assigned to "outsiders". If this external work is a prerequisite for completing a team work item, then it should be added as a dependency to a team work item only. It is essential that the team have control over the work items on their board, even if they are currently blocked by external dependencies. The Kanban board should be used to focus on the team's work!

Sizing of work items is about creating items that are of roughly equal size. As a rule-of-thumb, a work item should take about 2-3 days to complete, up to a maximum of two weeks. There are techniques for standardising work sizes, but for now I recommend a simple consensus from the team on whether an item is large or small or somewhere in between. Remember, if the definition of "Done" is software in production, then this must include coding, testing, etc.

In the worse case, a work item is so badly scoped and sized that it may not be possible to continue working on it in its current state and some more analysis (of requirements or architecture) is needed. If work stops altogether on such items then it should be moved into the backlog. This is one of the hardest things to do in Agile, but really knowing when a work item is ready for execution is one of the great benefits Agile brings.

A clear definition of Done for each item together with creating items of roughly equal size will build team confidence. By breaking down work items into smaller chunks and visualising them on the Kanban board it becomes possible for every team member (and stakeholders!) to understand what the team is going to deliver. And getting items to Done will make everyone happy.

Step 6: Limiting Work in process (WIP)

At this point the team have broken down the work into similar size chunks and this probably means that there are many more post-its on the board. (There are many tools available for creating digital Kanban boards, but this is still a low priority for now; wait 2-4 weeks before taking that step.) What the team needs to focus on next is WIP. This exercise should take about 30-60 minutes.



Earlier the team calculated how many work items were being done per team member. Ideally, each team member should be working on one item at a time, i.e. sequentially; so for a team of 10, the number of work items in the "Doing" column would be 10. In practice the figure is higher and the team need to think about what that number is.

In Agile terms, we are talking about the team capacity. We use this figure to set a work in progress (or process) limit (WIP limit). In other words, the team cannot start a new work item until they have finished a work item that is already in progress (unless an item is blocked). Remember, the team have a clear definition of Done for every work item, so they are supposed to be able to complete them before starting something new.


WIP limits are extremely important in creating flow. It follows that if the team tries to complete 20 work items at the same time it will take twice as long as if they were working on just 10 items.

For now, there is just one column with work in progress ("Doing"). The team should try to estimate how many man-days of work is in that column. Anything more than 30-40 days (3-4 days x 10 people) worth of work should be moved to the Backlog, and this means prioritising what needs to get done first. Prioritising is the responsibility of the Product Owner or Business Manager responsible for the product being developed, so naturally they need to be involved. Agile creates visibility for both the team and stakeholders!

Step 7: Daily stand-ups

Book 10-15 minutes with the team every morning for a stand-up in front of the Kanban board to discuss the day's activities. The stand-up is for the team only, but guests can be invited on occasion. Longer discussions should be saved for break-out sessions with those involved. The focus of the stand-up is to make sure everybody knows what they are doing, if there are any blockers that need to be escalated, and to check that the Kanban board is up-to-date.

In case it's not obvious, the Kanban board has now become the most important tool the team have for organising and visualising their work. Well done!

Conclusion

The team have made great progress! They have managed to visualise their work, create flow, size their work items and limit their work in progress. This demonstrates the concept of Continuous Improvement ("Kaizan") as preached in Lean manufacturing, meaning that the team are constantly looking for ways to improve the flow of work.

In Agile we use Retrospectives to specifically discuss how well the flow of work is, well, working. All the team are involved in suggesting improvements, and then some or all of the team are responsible for implementing at least one improvement right away. Process automation (e.g. test automation) is a classic example of improving flow.

There are many, many other techniques that are used as part of Agile such as User Stories, Storyboarding, Minimum Viable Products (MVPs), backlog refinement and measuring velocity to create an iterative software development process. Scrum is a subject onto itself. But these are topics for another article.

Further reading

I highly recommend the following books:
  • The Phoenix Project by Gene Kim
  • User Story Mapping by Jeff Patton
  • Lean from the trenches by Henrik Kniberg
  • Accelerate by Nicole Forsgren, Jez Humble and Gene Kim

Wednesday, 10 April 2019

The Agile factory


In the book The Phoenix Project there is a part where Bill is discussing lead times with Wes and Patty. Even though a certain task only takes 30 seconds, the lead time is still hours due to the time spent waiting for a resource to become available (Brent). Bill draws a graph to demonstrate the problem: if a resource is fully (100%) utilised then wait times become very long.



This statement threw me for a bit until I did some research. The graph definitely nails the problem with trying to maximise utilisation of resources. I mean, it is counter-intuitive for a manager to allow resources to idle. So what is the graph actually showing?

The graph shows that when the average utilisation goes over 80-90% then wait times become very long. So over time, a very high average utilisation will cause the queue to grow very long, increasing lead times dramatically. In other words, if a resource is very busy then it cannot cope with a workload that varies. The team must be allowed some slack so that lead times are still reasonable even when the workload is sometimes higher than average. In short, there is a trade-off between workload variability and resource utilisation.

This is well-understood in the manufacturing industry, but it is intrinsically true for software development. Everything that is built in software development is a one-off, unique. This creates huge variability in the job times of the development process; no two projects are ever the same. The challenge then is to reduce this variability so that we can create a more predictable workload and push up utilisation.

In Agile we use techniques such as storyboarding, MVPs and backlog grooming to manage variability and ensure that we maintain flow. WIP limits and Velocity are our KPIs that let us know how well we are succeeding in maintaining flow. Flow refers both to managing the variability of the arrival time of work (i.e. breaking down the work into smaller deliverables) and the execution time of the job (e.g. sizing of User Stories).

The science

Back to Bill's graph. Where does it come from? It is actually based on Kingsman's formula which is from the domain of queueing theory. In layman's terms the wait time is made up of three parts:


So what Bill is actually saying is that, given a certain variability and a certain job time, then the wait time will be a function of the utilisation as shown in the graph above. Bill wants to focus on utilisation, so he normalises the other parameters (variation and job time) as follows:


For an excellent explanation of the Kingsman's formula have a look at EuroLEAN+'s Youtube tutorials.

Reducing variability

Variation is the norm in software development, so it has to be dealt with. There are several ways to mitigate variability. Reducing utilisation is one option, but we can do better than having our developers and testers idling.

Kingsman's formula shows that adding more work centres reduces sensitivity to variation (as well as increasing capacity obviously). However, this is probably more feasible in manufacturing than in software development, because a work centre (i.e. a development team) often has domain expertise, i.e. no two teams have exactly the same capabilities. But this approach may be more applicable in larger organisations.

We have already mentioned reducing variability using Agile techniques such as storyboarding, MVPs and backlog grooming, and this should be the primary focus of the team coach in creating and optimising flow.

Another option available to development teams is to have a technical backlog containing work that is lower priority that can be used to fill idle time and bring utilisation closer to 100%. The kind of tasks in this backlog should be small and independent. For example, it could involve refactoring, writing automated tests, learning about a new technology, and so on.

In summary, it is the combination of these techniques that allows development teams to be fully utilised. What Lean teaches us, is that the same discipline and structure that is used to optimise manufacturing flows, applies even more so to software development.

Tracking the team's velocity gives us insights into both utilisation and the amount of planned work vs. unplanned work. We can also track the velocity of both Stories and Epics to see how good we are at sizing our MVPs. (An Epic is always an MVP in my book; this makes it clear what the definition of Done is for an Epic).

Skipping the queue

One of the early problems Bill had to deal with was departments trying to skip the queue. This is the result of a chronic failure of the development process. If lead times become unacceptably long (due to high utilisation, high variability or both), then eventually people will try to find shortcuts. This just makes a bad problem even worse, and represents a total breakdown in the chain-of-command. That kind of short circuit has to dealt with before any other improvements have a chance of succeeding. Hence the need to start by visualising all of the work in process.

Inventory

I was almost going to write that inventory doesn't cost anything in software development, after all it is virtual. We don't have to purchase raw materials and we don't have to store anything in warehouses. (Yes, GitHub costs something, but it is a negligible cost in this context.)

But there is still inventory in software. The raw materials are just ideas, one-liners that take up virtually no space at all and until the team commits to building (analysing, developing and testing) something, the backlog can be reorganised and priorities changed as often as desired.

The rest of the inventory is in the queues between work centres, e.g. when handing over from development to test. This inventory does represent an investment in time and effort, e.g. breaking down the problem, defining an MVP and coding a solution. The cost of having this inventory is that the knowledge about the solution disappears over time; no amount of documentation can replace the shared understanding that existed when the team were working actively on the solution. Furthermore, TTM is probably the single most important factor for success nowadays. So to sum up, a lot of inventory, or WIP, is bad in software development. Watch those WIP limits!

Cycle time

Here is an example of a good article describing Lead times and Cycle times. However, the difference between the two is not very clear in my opinion. A new Initiative will contain an unknown amount of work, that's why we analyse it and break it down into reasonably sized chunks; we are reducing variability and minimising risk. A task (e.g. User story) is only added to the backlog when it is somewhat well-defined and so the Lead time for every deliverable is a reasonably well-understood and managed parameter. Otherwise, Lead times just becomes guess work and that is not so useful.


But the backlog doesn't only contain planned work, it also contains unplanned work; bugs and outages which must be dealt with immediately. This increases the Lead time for planned work in ways that can be hard to manage. While unplanned work cannot be avoided completely, it can be mitigated using a small iterative release process, i.e. continuous delivery, continuous improvements to the delivery process, as well as detective and preventative security controls.

So ideally, Lead Time only applies to tasks that are MVP-sized and, we should also have a WIP limit on new work to control Lead time. It does not make sense to fill up the backlog with Tasks that will be delivered years from now. Doing this, we achieve an understanding of the team's capacity and that long Lead times indicate the need for an increase in team capacity, the need for more teams, or a change in priorities.

My agile development team was involved in storyboarding and backlog grooming for all new tasks, not just development and testing. The team were constantly managing the flow of deliverables at all stages on the Kanban board, both when there were too few and too many tasks in a queue. So the difference between "Task created" and "Work started" was really very small, and therefore Cycle time should be uninteresting.

In queueing theory there is a formula known as Little's Law which is used to calculate the Cycle time. So does this formula still have relevance even if we are not interested in Cycle time?


The term "Cycle time" is somewhat non-intuitive. But if you think about it, the cycle time is also the average time it will take to deliver everything that's on your Kanban board right now. For example, if your WIP is 10 and your average throughput is 2 tasks/day, then your cycle time is 5 days/task. Or put another way, the team can deliver everything on the Kanban board within the next 5 days. Now that's a rather powerful statement. So Cycle time is also the Turnover time for all WIP.


The better the team get at breaking down the backlog into equal-sized chunks (i.e. minimising variability), the more relevant the turnover figure becomes.

And so if Cycle times converge with Lead times, then we are much more sure of our commitments to the business side of the organisation. Roll-on Big Room Planning!

Thursday, 7 March 2019

Integration bloat

Integration platforms create a useful abstraction layer and are a prerequisite for building a Service Oriented Architecture. The integration platform is often the domain of an "Integration team" which may reside in-house or be out-sourced.

When building new services, one of the first things that has to be done is to create the service specification, which defines how the integration platform will publish your service. For SOAP web services this is done using WSDL. The integration team is then responsible for translating messages between systems, mapping fields, etc.

In some cases the integration work involves packaging a specific functionality of an existing legacy service and publishing it as a more intuitive and lightweight service that can be more easily consumed by modern clients. If the clients are under development, then the scope for the integration team may not be 100% specified. To compensate, the integration team can include mappings that might be needed. This can result in a service that contains more functionality than is strictly necessary to create a working solution.



When end-to-end testing is performed any problems found will be fixed, but only for that portion of the new service that is actually used by the client. Furthermore, the integration team may not have tested all or indeed any of the features of the service they created, instead relying on the end-to-end testing to find problems.

The result is an integration service that fulfils the client's requirements but includes features that are untested. The integration team document the entire service but have no idea how much of the service has actually been verified to work. This creates a maintenance headache when the service must be modified.

The presence of superfluous fields is an obvious problem. A more subtle issue are fields that support specific values (like enums)  where clients use some values but not all. The service provider might allow values A,B,C,D,E,F, the integration documentation might only advertise A,B,C,D, and the client might only use A and B. In reality, the integration may allow all values if no validation is applied; however all that has been tested are A and B. Since the integration team do not have in-depth knowledge of the client behaviours, they have no alternative but to rely on their own code and documentation to understand the scope of the service.

In conclusion, once a service has been created that is too big for purpose, it is difficult if not impossible to reduce its functionality. Ideally, the service should be built up incrementally in an agile way-of-working, this ensures that the client and the integration are fully meshed. This method may not be possible with out-sourced integration teams. Another alternative is for the integration team to create a mock client that verifies the whole service even if no client actually exists that will use all of the service's functionality. This at least would enforce a cost constraint on the integration team that will hinder the creation of services that are larger than necessary. Tools such as SoapUI and Postman can be used for this purpose.