Wednesday, May 16, 2018

DevOps Release Management


The purpose of release management is to ensure that the risks associated with deploying software releases are managed.
 
Waterfall software development is a phased and gated process, and so waterfall release management is also phased and implemented as a gate. Waterfall processes also seldom use much automation for testing and deployment, and so test reports tend to be manually assembled. In the end, the release management gate consists of the various stakeholders attesting that their test processes have completed satisfactorily and that all governance artifacts have been completed.
These methods work well for releases that occur a few times per year. However, organizations today are moving toward “continuous delivery”, in which releases are frequent: possibly monthly, possibly many times per day. (Amazon releases software to production every ten seconds.) Manual processes, governed by attestation meetings, are too cumbersome for continuous delivery. A new approach is needed.

Managing Risk in a Continuous Manner

Continuous delivery requires that nearly all software and system tests are automated, and that all deployment processes are also automated. As such, it is expected that all of those processes are fully tested and exercised prior to production deployment. The primary means of managing risk for automated processes is that those processes are tested, and the primary metric for risk management is test coverage. This applies to all areas of risk that are amenable to automation, including functional testing, performance testing, failure mode testing, and security scanning.
The goal of continuous delivery is that release to production should be a “business decision” – that is, any software build that passes all of its tests should be considered releasable, and the decision whether to release (and deploy) it is therefore based only on whether stakeholders feel that the user community and other business stakeholders are ready for the new release. Risk management has been automated!
For the above process to work, the software tests must be trustworthy: that is, there must be confidence that the tests are accurate and that they are adequate. Adequacy is normally expressed as a “coverage” metric. Accuracy is typically ensured by code review of the tests (or spot-checking them) and by enforcing a separation of duties, so that acceptance tests for a given feature are not written by the same people who write the application code for that feature. For very high-risk code, additional methods can be used to ensure accuracy. In the end, however, tangible metrics should be used, preferably metrics that can be measured automatically. (See the article series, Real Agile Testing in Large Organizations.)
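
For example, a coverage threshold can be enforced automatically in the delivery pipeline. The following is a minimal sketch, assuming a Cobertura-style coverage.xml report (a format many test tools can emit) and an illustrative 80% threshold; both the file name and the number are assumptions, not prescriptions:

    # Minimal sketch of an automated coverage gate. Assumes a Cobertura-style
    # coverage.xml report; the 80% threshold is illustrative only.
    import sys
    import xml.etree.ElementTree as ET

    THRESHOLD = 0.80  # minimum acceptable line coverage

    root = ET.parse("coverage.xml").getroot()
    line_rate = float(root.get("line-rate", 0))

    if line_rate < THRESHOLD:
        print(f"Line coverage {line_rate:.0%} is below the {THRESHOLD:.0%} gate; failing the build.")
        sys.exit(1)
    print(f"Line coverage {line_rate:.0%} meets the gate.")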

Is Attestation Still Needed?

In a continuous delivery process, attestation is still needed, but the attestation should be on the testing process – not on the result. Specifically, risk management attestation focuses on whether the process for creating and assessing tests ensures that the tests are accurate and that they have sufficient coverage. Attestation does not occur for the releases themselves, because they arrive too frequently. Instead, attestation is done at the process level.

Are Gates Still Needed?

Since release to production is a business decision, humans make the decision about whether to release a given software build. In addition, there are sometimes tests that fail, or quality criteria that are not fully met, but release to production might still be justified. Therefore, for most businesses, release to production will still be governed by a gated process, even when all tests have been automated. Release to production can only be fully automated and gateless if one automates all of the production release decision criteria and places quality control on those automated decision criteria.
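
To make that concrete, the decision can be thought of as automated risk criteria plus an explicit human approval; only if every criterion were automated (and itself quality-controlled) could the human step be dropped. A rough sketch follows; the criteria names are hypothetical placeholders, not any particular tool’s API:

    # Illustrative sketch of a gated release decision: automated risk criteria
    # combined with an explicit business approval. All names are placeholders.
    from dataclasses import dataclass

    @dataclass
    class BuildStatus:
        all_tests_passed: bool       # functional, performance, and failure-mode tests
        coverage_ok: bool            # coverage metric meets the agreed gate
        security_scan_clean: bool    # automated security scanning found no blockers

    def is_releasable(status: BuildStatus) -> bool:
        # Automated criteria: any build satisfying these is a release candidate.
        return status.all_tests_passed and status.coverage_ok and status.security_scan_clean

    def release_decision(status: BuildStatus, business_approval: bool) -> bool:
        # The human gate: a releasable build ships only when stakeholders decide
        # the business and the user community are ready.
        return is_releasable(status) and business_approval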

What About Documents?

Some things cannot be automated. For example, design documentation must be created by hand. Design documentation is important for managing the risks associated with maintainability.
The continuous delivery approach to such things is to shift assessment into the Agile sprint. As such, updating artifacts such as a design document is part of the “definition of done” for each team’s Agile stories. To manage the risk that these documents might not be updated, one embeds risk mitigation practices into the Agile team’s workflow. For example, one way to ensure that design documents are updated is to include a design review in the workflow for the team’s Agile stories. Thus, overall assessment and attestation of the risk should occur on the process – not on the set of documents that are produced. If the process is shown to be robust, then when the code passes its tests, it should be “good to go” – ready to release – assuming that releasing it makes business sense.

What About “Lower Environments”?

In many organizations that are early in their adoption of DevOps methods and therefore still use static test environments, teams push code to the various test environments. That is a legacy practice that should be eliminated. In a DevOps approach, build artifacts (not code) are pulled into a test environment. Further, each test process is performed by a single build orchestration task (a VSTS task or Jenkins job), and only that task should have the security permission required to pull artifacts into that environment. Thus, it should not be possible to push into an environment at all. This eliminates the need for any kind of release management for the lower environments.
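
To make the pull-based flow concrete, here is a rough sketch of what such an orchestration job might do: pull a versioned build artifact from an artifact repository, using a credential that only the job’s service account holds, and then deploy it. The URL, token variable, and paths are hypothetical placeholders, not any particular product’s API:

    # Sketch of a pull-based update of a test environment, performed by a single
    # build orchestration job. The artifact URL, token variable, and destination
    # path are hypothetical placeholders.
    import os
    import urllib.request

    ARTIFACT_URL = "https://artifacts.example.com/myapp/1.4.2/myapp.tar.gz"
    # Only the orchestration job's service account holds this credential;
    # developers cannot push into the environment directly.
    TOKEN = os.environ["TEST_ENV_DEPLOY_TOKEN"]

    def pull_artifact(url: str, dest: str) -> None:
        request = urllib.request.Request(url, headers={"Authorization": "Bearer " + TOKEN})
        with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
            out.write(response.read())

    pull_artifact(ARTIFACT_URL, "/tmp/myapp.tar.gz")
    # Unpacking the artifact and restarting services in the test environment
    # would follow here.
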
Many of these issues go away once one starts to use dynamically provisioned environments. Until then, it is absolutely critical that one control updates to the various test environments, using an orchestrated process as described here. The surest path to unreliability is to provide direct access to static test environments.

How Does Agile Deal With Requirements Traceability?

Feature traceability is a holdover practice from waterfall, in which requirements are checked against tests. If one is using an Agile testing practice such as, say, Behavior Driven Development (BDD), then feature traceability is redundant and a great time waster. This is because in an Agile process, each Agile story has a set of acceptance criteria, and in the BDD process, each acceptance criterion has one or more associated test scenarios. Thus, there is a clear one-to-many relationship between acceptance criteria—which you could think of as requirements—and tests.

On the other hand, there is no table of this mapping, and so organizations that use traditional (waterfall) controls will often insist on a requirements traceability matrix (RTM). This is a real nuisance, because it is make-work for an Agile team and adds zero value; however, if you are forced to create an RTM, and you use BDD, there are techniques that you can use to lessen the burden. I’ll explain some below.

First of all, create a unique ID for each story (stories usually have IDs anyway, so this is easy), and also for each acceptance criterion. Then, given that you are using BDD, each story has a Gherkin feature file (that is not strictly true, as discussed below, but assume it is for the moment): tag the feature with the story’s ID, and tag each scenario with the ID of the acceptance criterion that it pertains to.
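
For example, a tagged feature file might look like the following; the ID scheme and tag names are purely illustrative, and any convention that your BDD tool supports will do:

    @STORY-142
    Feature: Transfer funds between accounts

      @AC-142-1
      Scenario: Transfer succeeds when the source account has sufficient funds
        Given an account with a balance of 100
        When the user transfers 40 to another account
        Then the source account balance is 60

      @AC-142-2
      Scenario: Transfer is rejected when the source account has insufficient funds
        Given an account with a balance of 100
        When the user transfers 140 to another account
        Then the transfer is rejected with an insufficient-funds error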

You now have an explicit linkage between an acceptance criterion and one or more test scenarios. The challenge is to provide that linkage as a deliverable. If the linkage must be provided as a table, then you will need to write a script that parses the feature files and assembles the table, with each table column an acceptance criterion ID and each row a Feature/Scenario ID combination. A much better situation, however, is if you can get the organization to accept a different process, whereby a test lead or business analyst reviews each story’s feature file and attests, via a comment in the Agile story tool, that the acceptance criteria are all covered by the feature file. That approach does not require creating a separate RTM.
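
As a rough sketch of such a script, assuming the illustrative tagging convention above (a @STORY tag on the feature and @AC tags on each scenario) and feature files under a features/ directory, the following walks the files and writes a flat CSV listing that can then be pivoted into whatever matrix layout is required; the paths and names are assumptions to adapt:

    # Rough sketch: extract a requirements traceability listing from tagged
    # Gherkin feature files. The tag conventions, paths, and output format are
    # illustrative assumptions.
    import csv
    import glob
    import re

    rows = []  # (feature file, scenario, story tags, acceptance-criteria tags)

    for path in glob.glob("features/**/*.feature", recursive=True):
        story_tags, pending_tags = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                stripped = line.strip()
                if stripped.startswith("@"):
                    pending_tags.extend(re.findall(r"@\S+", stripped))
                elif stripped.startswith("Feature:"):
                    story_tags = [t for t in pending_tags if t.startswith("@STORY")]
                    pending_tags = []
                elif stripped.startswith(("Scenario:", "Scenario Outline:")):
                    ac_tags = [t for t in pending_tags if t.startswith("@AC")]
                    name = stripped.split(":", 1)[1].strip()
                    rows.append((path, name, " ".join(story_tags), " ".join(ac_tags)))
                    pending_tags = []

    with open("rtm.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["Feature file", "Scenario", "Story", "Acceptance criteria"])
        writer.writerows(rows)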

Of course, any such process needs to accommodate a situation in which a story’s acceptance criteria or feature file change after the feature file has been attested to. Unfortunately, the tool for creating a feature file is usually a text editor (some BDD tools allow you to use a Word document), and so there is probably no workflow built into the tool. The feature file should be in the project’s source code repository, so you can create a trigger using a build tool to perform some action whenever the feature file is changed, such as posting an email to the team lead or the test lead.
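
One minimal way to implement such a trigger is a small check that the build tool runs on every push, flagging modified feature files for re-review. This is a sketch only; the commit range and the plain print-out are placeholders for whatever notification mechanism (email, chat message, ticket comment) the team actually uses:

    # Sketch of a change trigger: list feature files modified since the previous
    # commit so the team lead or test lead can be notified and the attestation
    # revisited. The commit range and the print-out are placeholders.
    import subprocess

    diff = subprocess.run(
        ["git", "diff", "--name-only", "HEAD~1", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    changed = [f for f in diff.stdout.splitlines() if f.endswith(".feature")]

    if changed:
        print("Feature files changed since the last commit; re-review is needed:")
        for f in changed:
            print("  " + f)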

Adding features over time

Another consideration is what happens over time, as features accumulate and tests accumulate. Eventually you get to a point at which a new story impacts an existing feature. The question then is, Do you modify the current feature file, or do you add a new one? When your project started, you created a feature file for each story, but now some stories are changing the features that were implemented by prior stories.

If you continue to create a new feature file for each story, you will have good traceability between a requirement and a feature file. However, this becomes difficult to maintain, because the programmers will have to work out which feature files are impacted by a code change. That is generally impractical, and so teams prefer to update existing feature files, which breaks the correspondence between story and feature file. For this reason, contrary to what I said earlier, do not create a feature file for each story: instead, for each story, identify and name the set of one or more features that the story implements or changes; then make sure there is one and only one feature file for each such feature. In each feature file, tag the feature with the IDs of the stories that affected that feature. This makes it possible to search in both directions (find the features pertaining to a story, and the stories pertaining to a feature) without needing a table.
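
As a quick sketch of what that search can look like, using the same illustrative story-tag convention as above, the story-to-feature mapping can be recovered on demand by scanning the feature files, so no separate table has to be maintained:

    # Illustrative sketch: recover the story-to-feature mapping from tagged
    # feature files. Tag convention and paths are the same illustrative
    # assumptions as in the earlier sketch.
    import glob
    import re
    from collections import defaultdict

    stories_to_features = defaultdict(set)

    for path in glob.glob("features/**/*.feature", recursive=True):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        match = re.search(r"^\s*Feature:\s*(.+)$", text, re.MULTILINE)
        feature_name = match.group(1).strip() if match else path
        for story in set(re.findall(r"@STORY-\d+", text)):
            stories_to_features[story].add(feature_name)

    for story, features in sorted(stories_to_features.items()):
        print(story, "->", ", ".join(sorted(features)))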

A ramification of this is that the analyst who reviews feature files will sometimes have to review multiple feature files for a single story, and so each story should state which features it impacts.

Assessing coverage

One thing that is very important to acknowledge is that programmers write terrible behavioral tests. I am not talking about unit tests; I am talking about functional integration tests, which are what behavioral tests are. It is therefore essential to have a testing expert or QA person examine the feature files and assess the coverage, to determine whether there are enough test scenarios covering edge conditions, error paths, and so on. One test scenario for each acceptance criterion is not enough! Having an independent person with a testing mindset assess the test scenarios will increase your quality enormously.

Tuesday, May 15, 2018

Is DevOps Agile?

So often I hear things like,
“The Agile methodology is different from SAFe”
or
“We are replacing our waterfall methodology with the Agile methodology”
Agile coaches will sigh at these kinds of statements, because they know that Agile is not a methodology. Those who are familiar with SAFe will sigh as well, knowing that SAFe is also not a methodology, but rather a framework for considering how to adjust one’s organization to accommodate Agile teams.

And then there is DevOps: is it a methodology? Is it an extension of Agile, or an evolution of Agile, or something different?

DevOps evolved independently of Agile because the Agile community drifted away from its technical roots and therefore failed to keep up with technical advances such as cloud computing. Organizations needed to know how to scale Agile—not merely in terms of the number of teams, but also in terms of the complexity of the systems that they could build. The scale of computing had greatly increased: companies today can have hundreds of millions or even billions of users accessing their websites, and this so-called “Internet scale” needed new approaches. The Agile movement began with eXtreme Programming (XP), which was highly centered on technical practices such as test-driven development (TDD) and continuous integration, but the Agile community failed to say how Internet-scale applications could be developed. Instead, the Agile community became mired in an excessive focus on process and team behavior, such as the way that teams plan their work, epitomized by Scrum.

And so DevOps arose apart from the Agile community. Initially it did not have a name: it was merely a set of solutions that organizations invented to solve real problems. Some of these solutions included (not a complete list),
    1.    Dynamic provisioning of virtual environments—aka “cloud computing”—to enable rapid integration testing and to enable new deployment approaches.
    2.    Containerization, in order to enable rapid turnaround in standing up new app instances on demand, and to provide isolation and repeatability for deployments.
    3.    Automated integration testing.
    4.    Large-scale code repositories, and real-time integration of functionality across thousands of application components.
    5.    Extremely thorough automated functional and non-functional testing, to enable continuous delivery and reliable and trustworthy continuous deployment, and ownership of deployment and support by dev teams.
    6.    Extensive real time monitoring of applications, to support a metrics based philosophy.
    7.    Increased knowledge of—and responsibility for—Web security by application teams.

All that time, the Agile community was busy debating whether organizations should first change their culture before trying to adopt Agile—and by “Agile” they meant Scrum, because Scrum took over the Agile community to such a degree that it almost became synonymous with Agile.

The rise of Scrum did great harm to the Agile movement, and indeed many have said that Agile is now broken as a result. (See my article here, and Dave Thomas's talk here.) For one thing, the availability of certification for Scrum resulted in a large number of people who had never programmed in their lives obtaining an Agile certification, which they then used to get jobs as Agile coaches and Scrum Masters. Think about it: people with no programming experience telling programming teams how to do their job. It is no surprise that that did not work well. Today, there is a glut of Agile coaches, forcing down compensation; yet a very large percentage of those coaches know only the Scrum process. I personally would prefer a coach who has years of experience on a programming team, as well as other leadership experience (of the servant leadership kind), perhaps even sales experience and P&L responsibility, because those kinds of experience make one sensitive to the real needs of a business. But what skills a programming team coach needs is a lengthy topic in its own right.

The rise of Scrum occurred during the 2000s, with cloud computing coming onto the scene around the same time; but cloud computing was not really understood by most organizations until after 2010, and that is when the term DevOps came into being. By that time, the Agile community had finally figured out that Scrum could not just be inserted into a large organization, but that other things also had to change to enable Scrum teams to work in that organization. The Agile community, by and large, did not understand large organizations, so its response was generally something like, “Don’t do Agile, be Agile”. However, that was not very helpful, and it reflected the Agile community’s lack of knowledge of how to engage with organizations at the levels needed to make the necessary changes.

What happened next was that frameworks such as LeSS and SAFe came onto the scene. LeSS was generally supported by the Scrum community because it echoed that community’s ideology, which one could characterize as team-centric, with a strong preference for self-organization and an abhorrence of any kind of explicit authority. SAFe, in contrast, proposes a lot of structure, and the Scrum community has been very derisive of SAFe from the start. It remains so to this day, to the extent that if someone wants to be a Scrum trainer, they are not allowed to also be a SAFe trainer—the Scrum Alliance will not certify someone who is known to be a SAFe trainer. How Agile is that? (If anyone wants to challenge my assertion, I have an actual case to back it up.)

Regardless of whether one prefers SAFe or LeSS or another framework, the important point is that the Agile community finally realized that its early dogma was too simplistic. It now has answers for how to “be” Agile, instead of simply saying that one should, as if it were the fault of the companies (the Agile coach’s customers) that they are not Agile—as if they have a disease. Finally, the Agile community has useful answers to the question, What should we do?—other than “start doing Scrum”. It finally realized that to be something, you have to do something.

Meanwhile, the rise of DevOps took the Agile community by surprise, and now the Agile community has embraced DevOps as if DevOps were merely an extension or evolution of Agile, while the truth is that DevOps evolved completely independently, out of the need to operate at Internet scale and to deploy changes to Internet-scale apps in rapid succession. Fortunately, DevOps and Agile mesh together very well, because while the Agile community has chosen to focus mostly on process and behavior, DevOps practices mostly address the technical questions of how to scale and how to deploy rapidly. Thus, DevOps has added a technical dimension back to the Agile viewpoint—a dimension that had been lost.