
Wednesday, May 16, 2018

DevOps Release Management


The purpose of release management is to ensure that the risks associated with deploying software releases are managed.
 
Waterfall software development is a phased and gated process, and so waterfall release management is also phased and implemented as a gate. Waterfall processes also seldom use much automation for testing and deployment, and so test reports tend to be manually assembled. In the end, the release management gate consists of the various stakeholders attesting that their test processes have completed satisfactorily and that all governance artifacts have been completed.
These methods work well for releases that occur a few times per year. However, organizations today are moving toward “continuous delivery”, in which releases are frequent, possibly monthly, possibly many times per day. (Amazon releases software to production every ten seconds.) Manual processes, governed by attestation meetings, are too cumbersome for continuous delivery. A new approach is needed.

Managing Risk in a Continuous Manner

Continuous delivery requires that nearly all software and system tests are automated, and that all deployment processes are also automated. As such, it is expected that all of those processes are fully tested and exercised prior to production deployment. The primary means of managing risk for automated processes is that those processes are tested, and the primary metric for risk management is test coverage. This applies to all areas of risk that are amenable to automation, including functional testing, performance testing, failure mode testing, and security scanning.
The goal of continuous delivery is that release to production should be a “business decision” – that is, any software build that passes all of its tests should be considered releasable, and the decision whether to release (and deploy) it is therefore based only on whether stakeholders feel that the user community and other business stakeholders are ready for the new release. Risk management has been automated!
For the above process to work, the software tests must be trustworthy: that is, there must be confidence that the tests are accurate, and that they are adequate. Adequacy is normally expressed as a “coverage” metric. Accuracy is typically ensured by code review of the tests, or spot checking them, and ensuring a separation of duties so that acceptance tests for a given code feature are not written by the same people who write the application code for that feature. For very high risk code, additional methods can be used to ensure accuracy. In the end, however, tangible metrics should be used, and preferably metrics that can be measured automatically. (See the article series, Real Agile Testing in Large Organizations.)
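To make the idea of a tangible, automatically measurable metric concrete, here is a minimal sketch of computing acceptance-criterion coverage. The criterion IDs and data are invented purely for illustration:

```python
# Minimal sketch: acceptance-criterion coverage as a measurable metric.
# The story/criterion IDs below are hypothetical.

def criterion_coverage(criteria, scenarios_by_criterion):
    """Fraction of acceptance criteria that have at least one test scenario."""
    if not criteria:
        return 1.0
    covered = sum(1 for c in criteria if scenarios_by_criterion.get(c))
    return covered / len(criteria)

criteria = ["AC-101", "AC-102", "AC-103", "AC-104"]
scenarios = {
    "AC-101": ["happy path", "invalid input"],
    "AC-102": ["happy path"],
    "AC-103": [],  # no scenarios yet: a coverage gap, like the missing AC-104
}

coverage = criterion_coverage(criteria, scenarios)
print(f"criterion coverage: {coverage:.0%}")
```

A real pipeline would compute this from the test tooling's own reports rather than hand-built dictionaries, but the principle is the same: the metric is produced automatically, not assembled by hand.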

Is Attestation Still Needed?

In a continuous delivery process, attestation is still needed, but the attestation should be on the testing process – not on the result. Specifically, risk management attestation focuses on whether the process for creating and assessing tests ensures that the tests are accurate and that they have sufficient coverage. Attestation does not occur for the releases themselves, because they arrive with too much frequency. Instead, attestation is done at a process level.

Are Gates Still Needed?

Since release to production is a business decision, humans make the decision about whether to release a given software build. In addition, there are sometimes tests that fail, or quality criteria that are not fully met, but release to production might still be justified. Therefore, for most businesses, release to production will still be governed by a gated process, even when all tests have been automated. Release to production can only be fully automated and gateless if one automates all of the production release decision criteria and places quality control on those automated decision criteria.
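As a sketch of what automating the production release decision criteria might look like, consider the following. The specific criteria and thresholds are illustrative assumptions, not a prescribed set:

```python
# Hedged sketch: release-decision criteria expressed as code.
# The criteria names and thresholds are illustrative assumptions.

def releasable(results):
    """Apply each decision criterion; return (decision, reasons for a 'no')."""
    checks = {
        "all functional tests passed": results["failed_tests"] == 0,
        "coverage meets threshold": results["coverage"] >= 0.90,
        "no critical security findings": results["critical_vulns"] == 0,
        "performance within budget": results["p95_latency_ms"] <= 500,
    }
    reasons = [name for name, ok in checks.items() if not ok]
    return (len(reasons) == 0, reasons)

build = {"failed_tests": 0, "coverage": 0.93,
         "critical_vulns": 0, "p95_latency_ms": 410}
ok, reasons = releasable(build)
print("release" if ok else f"hold: {reasons}")
```

Note that such decision code is itself subject to quality control: it must be reviewed and tested just as the application tests are.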

What About Documents?

Some things cannot be automated. For example, design documentation must be created by hand. Design documentation is important for managing the risks associated with maintainability.
The continuous delivery approach to such things is to shift assessment into the Agile sprint. As such, updating artifacts such as a design document is part of the “definition of done” for each team’s Agile stories. To manage the risk that these documents might not be updated, one embeds risk mitigation practices into the Agile team’s workflow. For example, one way to ensure that design documents are updated is to include a design review in the workflow for the team’s Agile stories. Thus, overall assessment and attestation of the risk should occur on the process – not on the set of documents that are produced. If the process is shown to be robust, then when the code passes its tests, it should be “good to go” – ready to release – assuming that releasing it makes business sense.

What About “Lower Environments”?

In many organizations that are early in their adoption of DevOps methods and therefore use static test environments, teams push code to the various test environments. That is a legacy practice that should be eliminated. In a DevOps approach, build artifacts (not code) are pulled into a test environment. Further, each test process is performed by a single build orchestration task (a VSTS task or Jenkins job), and only that task should have the security permission required to pull into that environment. Thus, it should not be possible to push into an environment. This eliminates the need for any kind of release management for the lower environments.
Many of these issues go away once one starts to use dynamically provisioned environments. Until then, it is absolutely critical that one control updates to the various test environments, using an orchestrated process as described here. The surest path to unreliability is to provide direct access to static test environments.

How Does Agile Deal With Requirements Traceability?

Feature traceability is a holdover practice from waterfall, where requirements are checked against tests. If one is using an Agile testing practice such as, say, Behavior-Driven Development (BDD), then feature traceability is redundant and a great time waster. This is because in an Agile process, each Agile story has a set of acceptance criteria, and in the BDD process, each acceptance criterion has one or more associated test scenarios. Thus, there is a clear one-to-many relationship between acceptance criteria—which you could think of as requirements—and tests.

On the other hand, there is no table of this mapping, and so organizations that use traditional (waterfall) controls will often insist on a requirements traceability matrix (RTM). This is a real nuisance, because it is make-work for an Agile team and adds zero value; however, if you are forced to create an RTM, and you use BDD, there are techniques that you can use to lessen the burden. I’ll explain some below.

First of all, create a unique ID for each story (stories usually have IDs anyway, so this is easy), and also for each acceptance criterion. Then, given that you are using BDD, each story has a Gherkin feature file (that is not actually true, but assume it is for the moment): tag the feature with the story’s ID, and tag each scenario with the ID of the acceptance criterion that it pertains to.

You now have an explicit linkage between an acceptance criterion and one or more test scenarios. The challenge is to provide that linkage as a deliverable. If the linkage must be provided as a table, then you will need to write a script that parses the feature files and assembles the table, with each table column an acceptance criterion ID and each row a Feature/Scenario ID combination. A much better situation, however, is if you can get the organization to accept a different process, whereby a test lead or business analyst reviews each story’s feature file and attests, via a comment in the Agile story tool, that the acceptance criteria are all covered by the feature file. That approach does not require creating a separate RTM.
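The parsing script mentioned above might look like the following sketch. The @Story-/@AC- tag naming convention is an assumption made for the example; Gherkin tags themselves carry no prescribed format:

```python
# Sketch: extract @Story-/@AC- tags from a Gherkin feature file and
# assemble the acceptance-criterion-to-scenario mapping.
# The tag convention and the feature file content are illustrative.
import re

FEATURE = """\
@Story-12
Feature: Password reset

  @AC-12.1
  Scenario: Reset link is emailed
    Given a registered user
    When she requests a password reset
    Then a reset link is emailed to her

  @AC-12.2
  Scenario: Expired link is rejected
    Given an expired reset link
    When the user follows it
    Then an error page is shown
"""

def traceability(feature_text):
    """Map each acceptance-criterion tag to its scenario names."""
    mapping = {}
    current_tags = []
    for line in feature_text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            current_tags = re.findall(r"@[\w.-]+", line)
        elif line.startswith("Scenario"):
            name = line.split(":", 1)[1].strip()
            for tag in current_tags:
                if tag.startswith("@AC-"):
                    mapping.setdefault(tag, []).append(name)
            current_tags = []
    return mapping

print(traceability(FEATURE))
```

A real script would walk the repository for all `.feature` files and emit the table in whatever format the auditors require; the parsing core stays this small.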

Of course, any such process needs to accommodate a situation in which a story’s acceptance criteria or feature file change after the feature file has been attested to. Unfortunately, the tool for creating a feature file is usually a text editor (some BDD tools allow you to use a Word document), and so there is probably no workflow built into the tool. The feature file should be in the project’s source code repository, so you can create a trigger using a build tool to perform some action whenever the feature file is changed, such as posting an email to the team lead or the test lead.
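The trigger’s core check can be quite simple. Here is a hedged sketch, assuming the build tool supplies the list of paths changed by a commit (for example, from `git diff --name-only`); the file names are invented for the example:

```python
# Sketch of the build-trigger check described above: flag feature files
# whose attestation should be revisited after a commit. Paths are illustrative.

def features_needing_reattestation(changed_paths, attested_features):
    """Return attested feature files that were modified by this commit."""
    changed_features = {p for p in changed_paths if p.endswith(".feature")}
    return sorted(changed_features & attested_features)

changed = ["src/reset.py", "features/password_reset.feature", "README.md"]
attested = {"features/password_reset.feature", "features/login.feature"}

for path in features_needing_reattestation(changed, attested):
    # In a real pipeline this would email the team lead or test lead,
    # or post a comment on the associated Agile story.
    print(f"re-review needed: {path}")
```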

Adding features over time

Another consideration is what happens over time, as features accumulate and tests accumulate. Eventually you get to a point at which a new story impacts an existing feature. The question then is, Do you modify the current feature file, or do you add a new one? When your project started, you created a feature file for each story, but now some stories are changing the features that were implemented by prior stories.

If you continue to create a new feature file for each story, you will have good traceability between a requirement and a feature file. However, it will become difficult to maintain because the programmers will have to find out which feature files are impacted by a code change. That is generally impractical, and so teams prefer to update existing feature files, and this breaks the correspondence between story and feature file. For this reason, contrary to what I said earlier, do not create a feature file for each story: instead, for each story, identify and name a set of one or more features that the story implements or changes; then, make sure there is one and only one feature file for each such feature. In each feature file, tag the feature with the IDs of the stories that affected that feature. This makes it possible to search and find features pertaining to stories and stories pertaining to features - without needing a table.

A ramification of this is that the analyst who reviews feature files will now have to sometimes review multiple feature files for a story, and so each story should state which features are impacted by the story.

Assessing coverage

One thing that is very important to acknowledge is that programmers write terrible behavioral tests. I am not talking about unit tests - I am talking about functional integration tests, which are what behavioral tests are. It is therefore essential to have a testing expert or QA person examine the feature files and assess the coverage to determine if there are enough test scenarios, covering edge conditions, error paths, etc. One test scenario for each acceptance criterion is not enough! Having an independent person with a testing mindset assess the test scenarios will increase your quality enormously.

Tuesday, May 15, 2018

Is DevOps Agile?

So often I hear things like,
“The Agile methodology is different from SAFe”
or
“We are replacing our waterfall methodology with the Agile methodology”
Agile coaches will sigh at these kinds of statements, because they know that Agile is not a methodology. Those who are familiar with SAFe will sigh as well, knowing that SAFe is also not a methodology, but rather a framework for considering how to adjust one’s organization to accommodate Agile teams.

And then there is DevOps: is it a methodology? Is it an extension of Agile, or an evolution of Agile, or something different?

DevOps evolved independently of Agile because the Agile community drifted away from its technical roots, and therefore failed to keep up with technical advances such as cloud computing. Organizations needed to know how to scale Agile—not merely in terms of the number of teams, but also in terms of the complexity of systems that they could build. The scale of computing had greatly increased: companies today can have hundreds of millions or even billions of users accessing their websites, and this so-called “Internet scale” needed new approaches. The Agile movement began with eXtreme Programming (XP), which was highly centered on technical practices such as test-driven development (TDD) and continuous integration, but the Agile community failed to say how Internet-scale applications could be developed. Instead, the Agile community became mired in an excessive focus on process and team behavior, such as the way that teams plan their work, epitomized by Scrum.

And so DevOps arose apart from the Agile community. Initially it did not have a name: it was merely a set of solutions that organizations invented to solve real problems. Some of these solutions included (not a complete list),
1. Dynamic provisioning of virtual environments—aka “cloud computing”—to enable rapid integration testing and to enable new deployment approaches.
2. Containerization, in order to enable rapid turnaround in standing up new app instances on demand, and to provide isolation and repeatability for deployments.
3. Automated integration testing.
4. Large-scale code repositories, and integration of functionality across thousands of application components in real time.
5. Extremely thorough automated functional and non-functional testing, to enable continuous delivery, reliable and trustworthy continuous deployment, and ownership of deployment and support by dev teams.
6. Extensive real-time monitoring of applications, to support a metrics-based philosophy.
7. Increased knowledge of—and responsibility for—Web security by application teams.

All that time, the Agile community was busy debating whether organizations should first change their culture before trying to adopt Agile—and by “Agile” they meant Scrum, because Scrum took over the Agile community to a degree that Scrum almost became synonymous with Agile.

The rise of Scrum did great harm to the Agile movement, and indeed many have said that Agile is now broken as a result. (See my article here, and Dave Thomas's talk here.) For one thing, the availability of certification for Scrum resulted in a large number of people who had never programmed in their life obtaining an Agile certification, which they then used to get jobs as Agile coaches and Scrum Masters. Think about it: people with no programming experience telling programming teams how to do their job. It is no surprise that that did not work well. Today, there is a glut of Agile coaches, forcing down compensation; yet a very large percentage of those coaches only know the Scrum process. I personally would prefer a coach who has years of experience on a programming team, as well as other leadership experience (of the servant leadership kind), perhaps even sales experience and P&L responsibility, because those kinds of experience make one sensitive to the real needs of a business. But the skills needed by a programming team coach are a lengthy topic in their own right.

The rise of Scrum occurred during the 2000s, with cloud computing coming onto the scene around the same time; but cloud computing was not really understood by most organizations until after 2010, and that is when the term DevOps came into being. By that time, the Agile community had finally figured out that Scrum cannot just be inserted into a large organization, but that other things also had to change to enable Scrum teams to work in that organization. The Agile community, by and large, did not understand large organizations, so its response was generally something like, “Don’t do Agile, be Agile”. However, that was not very helpful and it reflected the Agile community’s lack of knowledge of how to engage with organizations at the levels needed to make the necessary changes.

What happened next was that frameworks such as LeSS and SAFe came onto the scene. LeSS was generally supported by the Scrum community because it echoed that community’s ideology, which one could characterize as team-centric, with a strong preference for self-organization and an abhorrence of any kind of explicit authority. SAFe, in contrast, proposes a lot of structure, and the Scrum community has been very derisive of SAFe from the start. It continues to be so to this day, to the extent that if someone wants to be a Scrum trainer, they are not allowed to also be a SAFe trainer—the Scrum Alliance will not certify someone who is known to be a SAFe trainer. How Agile is that? (If anyone wants to challenge my assertion, I have an actual case to back it up.)

Regardless of whether one prefers SAFe or LeSS or another framework, the important point is that the Agile community finally realized that its early dogma was too simplistic. It now has answers for how to “be” Agile, instead of simply saying that one should, as if it is the fault of the companies (the Agile coach’s customer) that they are not Agile—like they have a disease. Finally, the Agile community has useful answers to the question, What should we do?—other than “start doing Scrum”. It finally realized that to be something, you have to do something.

Meanwhile, the rise of DevOps took the Agile community by surprise, and now the Agile community has embraced DevOps as if DevOps is merely an extension or evolution of Agile, while the truth is that DevOps evolved completely independently out of the need to scale to Internet scale and deploy changes to Internet scale apps in rapid sequence. Fortunately, DevOps and Agile mesh together very well, because while the Agile community has chosen to focus mostly on process and behavior, DevOps practices are mostly around the technical questions of how to scale and rapidly deploy. Thus, DevOps has added a technical dimension back to the Agile viewpoint—a dimension which had been lost.

Wednesday, April 13, 2016

Does DevOps Change Agile?

Yes and no.

Some time back, on my first day supporting a CIO on an Agile transformation, the CIO said that he wanted to implement DevOps. In his mind, they were the same thing - that DevOps was merely the latest Agile model for how to arrange IT functions. Indeed, it is.

But it is also different. On the one hand, Agile is defined by a set of values and principles enshrined in the Agile Manifesto. In that sense, DevOps is Agile. On the other hand, Agile has become defined by a set of practices that are almost universal to Agile implementations - things such as standups, team rooms, test-driven development, a product owner, iterations, and so on. In that sense, DevOps does change Agile, in a very substantial way.

Pipeline Versus Team

Agile project culture is very team focused. DevOps does not change that, but it shifts the focus to something broader: the end-to-end "value stream", also known traditionally as a "value chain", which consists of the sequence of activities that happen from requirement inception to actual deployment of the consequent features. That value stream is best thought of as a pipeline of activities, the term being borrowed from the computer science concept of "pipelining". If drawn, a DevOps pipeline looks like a waterfall process, consisting of requirements followed by implementation, followed by various kinds of system-level testing, followed by deployment. What makes it not waterfall, however, is the fact that the pipeline operates continuously. That is, at any moment, every portion of the pipeline has something in it - it is a non-stop flow.

To make a value stream work, the development team must enlarge its horizon, to consider actors beyond the team - such as enterprise security, operations, and so on. The world does not revolve around the development team: it revolves around the value pipeline, and that means there must be lots of ongoing collaboration with parties beyond the team. This is perfectly consistent with Agile values and principles, but it is a different viewpoint than is traditional for Agile in the way that it is normally practiced. For example, should those parties that are external to the development team be in the standup? Normal Agile practices do not answer that question, and there are many other similar questions that need to be answered to make DevOps work.

Behavior-Driven Development Versus Test-Driven Development

The practice of Test-Driven Development (TDD) is deeply entrenched in Agile culture. Teams that practice it are often viewed as advanced, whereas teams that do not are considered by many in the Agile community to be less advanced. Alas, TDD has come under fire, and there are many people who feel that it is not the best approach for everyone in all cases. Regardless of that, it turns out that DevOps does not need TDD for an effective pipeline. What DevOps needs is Behavior-Driven Development (BDD).

Loosely speaking, BDD consists of defining tests from a user perspective. Thus, tests are "end-to-end" because they are defined in terms of the behavior of the system - not the behavior of system components or even more granular "units". Historically it has been difficult to implement a BDD approach because one needs to assemble the entire system as an integrated unit to perform the tests. However, virtualization - the technology that has enabled cloud computing and DevOps - makes it possible - even easy - to create local instances of system components, so that one can assemble an integrated system very early. That makes BDD possible. Thus, one can use a test-first approach in which tests are defined at a behavior level, instead of a very granular unit level, as is the case with TDD.
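Loosely, a behavior-level test reads as Given/When/Then steps exercised against the assembled system. The following sketch illustrates the style in plain Python, using a tiny in-memory system invented purely for the example (real BDD tooling such as Cucumber would express the steps in Gherkin and bind them to code):

```python
# Illustration of a behavior-level test in Given/When/Then style.
# The AccountSystem class is invented for the example; in practice the
# test would drive the real integrated system (e.g., via its API).

class AccountSystem:
    def __init__(self):
        self.balances = {}

    def open_account(self, owner, deposit):
        self.balances[owner] = deposit

    def transfer(self, src, dst, amount):
        if self.balances.get(src, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[src] -= amount
        self.balances[dst] += amount

def test_transfer_between_accounts():
    # Given two funded accounts
    system = AccountSystem()
    system.open_account("alice", 100)
    system.open_account("bob", 20)
    # When Alice transfers 30 to Bob
    system.transfer("alice", "bob", 30)
    # Then both balances reflect the transfer
    assert system.balances["alice"] == 70
    assert system.balances["bob"] == 50

test_transfer_between_accounts()
print("behavioral test passed")
```

Note that the test says nothing about how transfers are implemented internally; it is stated entirely in terms of externally observable behavior, which is what makes it a behavioral test rather than a unit test.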


TDD might still be useful - that is a different discussion - but it is no longer necessary for having a high coverage automated test process. One can measure test coverage for behavioral tests just as easily as for unit level tests. In fact, DevOps teams typically use BDD as the foundation of their continuous integration test cycle: developers run the BDD tests - which are end-to-end - for a story before checking their code in. This is shown in figure 1.


Figure 1: Comparison of traditional continuous integration (CI) and how CI is often done today in a DevOps setting.


To be able to perform end-to-end tests locally before checking one's code in, developers need a local build process that instantiates an integrated system locally (or perhaps in a developer-specific cloud instance - see figure 2). Linux container technology now makes this even easier: one merely starts a container for each system component and runs one's tests - starting a container takes a fraction of a second, so there is little delay from instantiating the integrated system locally. If developers work in Linux (locally or in a cloud), they can run Docker and create all the containers they need in a fraction of a second, test with those, and then destroy the containers. Indeed, testing in the cloud right from the beginning of the CI process is increasingly useful, given that one of the main things that needs to be tested is the orchestration definition - which is currently tied to the type of target operating environment (e.g., AWS, Azure, GCE), although there are efforts to unify orchestration definitions.


 Figure 2: Traditional development environment versus DevOps development environment.
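For example, an integrated local system might be described in a Docker Compose file along these lines (the service names and images are illustrative assumptions, not a recommendation):

```yaml
# Illustrative Compose file for standing up an integrated system locally.
version: "3.8"
services:
  app:
    build: .
    ports:
      - "8080:8080"
    depends_on:
      - db
      - cache
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
  cache:
    image: redis:7
```

A developer would run `docker compose up -d`, execute the end-to-end tests against the running services, and then tear everything down with `docker compose down`.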

Again, all this is perfectly consistent with Agile values and principles. What it conflicts with, however, is a longstanding Agile practice of focusing on comprehensive unit testing as the bedrock of the test automation cycle. DevOps does not preclude unit tests, which are still valuable, but it shifts the focus to test-first development using end-to-end tests, done early and continually, using locally provisioned instances of all system components instead of shared test instances or "mocks".

Testing the System Versus Testing Stories

The "user story" is the cornerstone of any Agile process. A story defines a requirement, from a user's perspective, in terms of an end result. Agile teams work against a backlog of stories, and a story is considered to be "done" when it has passed all of its tests; the tests in turn map one-to-one to a set of acceptance criteria that are attached to the story. A release of the system is considered to be "done" when all of its stories are "done".

DevOps changes the last part. A DevOps pipeline begins with stories, but as soon as work passes through the continuous integration portion of the pipeline, the idea of a story ceases to have meaning: at that point, the entire system is being tested, and a test failure might be easy to correlate to a story or it might not be. As an example, consider figure 3: suppose that a story's tests have passed when run in the continuous integration (CI) environment, but subsequently the same tests are run again "downstream" in a more production-like environment, and one of the tests fails in that environment. Is the problem with the story, or the environment? It is not clear. Also, suppose a performance test fails: such a failure might be difficult to trace to a particular story. Thus, tests performed after continuous integration are system-level tests - not story-level tests. This means that the release is not "done" when all the story tests have passed - the release is "done" when all of the story tests have passed, and all of the system-level tests have passed, in all of the applicable environments. A typical full DevOps pipeline is shown in figure 3.


Figure 3: A typical full DevOps pipeline.

And again, this is consistent with Agile values and principles, but it departs from traditional Agile practice.

Is DevOps Itself a Sign Of Immaturity?

Organizations that deploy continuously have fully automated processes and so one might consider whether there are even separate "development" and "operations" teams. There are. A man who used to work for me went on to become head of deployment at Amazon for several years. In a large organization, you still need someone - a team - to focus on the challenges of operations. Leaving it to each team without any oversight or centralized support is a recipe for disintegration and a "tragedy of the commons" where things that are not urgent for a team but that are urgent for the organization as a whole fall by the wayside.

Netflix has a very large "tools" team that supports all of the many different development teams. Netflix focuses on automating things, so that instead of manual monitoring they have alert systems that let the right people know when something is wrong. As Adrian Cockcroft of Netflix puts it, "[dev teams] never have to have a meeting with ITops, or file a ticket asking someone from ITops to make a change to a production system". Yet Josh Evans is Director of Operations Engineering, and he oversees the way that their hundreds of microservices are integrated; much of operations has, in effect, been replaced by an operational architecture team: the "Ops" has shifted to defining an architecture that is highly decoupled, that alerts the right people, and that is elastic and self-healing. This approach is sometimes called "NoOps" because the ops people are more engineers than operations people: they build and operate automated deployment systems that enable development teams to deploy continuously with minimal human intervention. What NoOps is not is a group of autonomous development teams that operate without any centralized operational support.

DevOps entails a considerable shift in responsibility, in that operational responsibility is now shared by development: programmers need to build systems that are easier to operate, and they need to test in production-like environments so that deployment does not reveal problems that could have been found earlier. Similarly, operations needs to focus mostly on automating operations, and needs to deploy with the same scripts and processes that are used by development. These two camps must design the pipeline together, and work together to continually improve it. The pipeline becomes their shared platform.

Some Agile Ideas That Are Even More Important Now

There is one idea that is not in the Agile Manifesto but has nevertheless been strongly embraced by the Agile community, and that is even more important for DevOps: the idea of continuous learning. In order to implement a DevOps cloud-based testing pipeline, one must use a lot of tools, and those tools turn over at a rapid rate. For example, two years ago the most important tools for provisioning environments were Chef, Puppet, Vagrant, and a few others. Today, the entire idea of provisioning environments is in question, because containers have emerged as a better alternative to VMs, and so the need to deploy software to VMs is going away - instead, one deploys container images. Those images are built locally, and so the need for remote software provisioning has been replaced by remote container management. This all happened essentially overnight, and it means that DevOps teams are going to have to eventually replace all of their Chef/Puppet/Vagrant code with orchestration files and Dockerfiles. Surely something else will take the place of these in a few years. We are now in an era in which tools change more rapidly than ever before, and teams need to accept that they are always learning new tools. The ability and willingness to abandon one's expertise and plunge into a new area is essential. This point is highlighted in Cal Newport's bestseller, Deep Work. According to him,
"...because these technologies change rapidly, this process of mastering hard things never ends: You must be able to do it quickly, again and again." (page 30)

Summary

DevOps needs us to change how we think about Agile. It requires us to loosen our hold on established practices, and go back to first principles. The technologies that enable DevOps - virtualization, cloud services, and behavioral testing tools - empower us to do things in an even more Agile way, but that requires changing some practices that are historically equated with "Agile".

Monday, February 2, 2015

Interview: Dean Leffingwell

Cliff: Today I am talking with Dean Leffingwell, creator of the Scaled Agile Framework, commonly known as SAFe. Dean, can you please tell me a little about your background?

Dean: My degrees are in aerospace and biomedical engineering, so I see myself as a systems engineer dedicated to software. In one form or another, I’ve spent my entire 40+ year career being responsible for software and complex systems development.

Cliff: I can see the imprint of that: SAFe seems to have a systems view of things.

With regard to your origins, you created RELA and Colorado Medtech.

Dean: In 1977, I founded RELA, and later absorbed the publicly held company Cybermedic. The melding of those two organizations resulted in Colorado Medtech, also public. That was my first 20 years as a CEO; we built complex medical devices and other fun stuff, including one of the coolest adventure rides on the planet for a major theme park. I became a software quality methodologist by necessity, because we were building systems that could literally save people’s lives or, if defective, kill them. Since then, the focus on software quality has been a driving passion that informs everything I do. One of the reasons that I so enjoyed the transformation to Agile—especially after starting with eXtreme Programming—was the intense focus on the quality of code.

My exposure to Agile came through XP first and Scrum second, and I saw two things that I had not seen earlier: in XP a set of courageous software practices that were technically sound, and in Scrum, a simple and lightweight project management method. I thought that the two applied together were really cool, but I immediately noticed a clash between the user communities. Technically, I never really understood the basis for the competition, because to be effective you have to have both approaches. But methodologists don’t agree with other methodologists.

I’ve also been involved in Lean throughout my career. I was chairman of a company that made a lean version of MRP [Material Requirements Planning]. Colorado Medtech had a manufacturing capability, so I also cut my teeth on Lean manufacturing. I attended a workshop on the Theory of Constraints, learning from Goldratt himself. Lean helped save a division of our company that was critical to our ultimate success.

Fast-forward to the present day, where we find ourselves building the world’s most complex systems with this incredibly robust body of knowledge that includes Lean, Product Development Flow – Don Reinertsen – Agile, XP, Scrum, and Kanban. SAFe integrates and builds on that pool of knowledge to help address growing systems complexity. And after all, what software developer today should not understand all of these aspects? That is one reason that you see some of the heat around SAFe; we don’t see it as a zero-sum game. It’s all good. In fact, the next version of SAFe will incorporate kanban for teams, in addition to Scrum and XP, bringing optionality, balance, and integration of the best of the best to the team’s choice of approach.

Cliff: As you pointed out, there seems to be quite a division in the industry. I don’t think that is healthy. Being from the same time period as you, I see all these things as different schools of thought that overlap in terms of ideas. These recent approaches do seem to complement each other. There is a long view that many people need to see.

Dean: Your website reflects the larger perspective, which is why I agreed to invest some time in this interview. I don’t usually engage in defending the framework, because it speaks for itself through the website, scaledagileframework.com, and the case studies. You don’t need a methodologist’s or service provider’s opinion about it; everyone can decide for themselves. There is nothing hidden.

For example, about a year ago, a blogger was criticizing SAFe. That’s fine; there are plenty of improvements to be made. I criticize it myself. But he wrote that he could not figure out what he didn’t like about SAFe and then he realized that it was because there were no people in it. To me, that is like looking at the Eiffel Tower and seeing no iron. When you look at the Big Picture and click through to the articles behind it, you will see more people than anything else.

To simplify SAFe, I’ll share a discussion that happened just yesterday with one of the world’s largest software companies. They have some enormous challenges in a really large program—involving 400–500 developers and stakeholders from many aspects of the business—and are looking to SAFe for help. Absolutely mission critical. I said that the first thing we could do is organize, train, and empower Agile teams. Second, communicate the mission, provide some UX and architectural guidance for consistency of purpose and usage, and then let them define it, design it, plan it, build it, validate it, and gather customer feedback in a continuous series of two-week iterations. And finally, we facilitate largely face-to-face planning, feedback, JAD, and problem-solving sessions for the entire program, every ten weeks or so, on a fixed cadence. How could that not work?

“There are at least three hundred thousand
Agile practitioners using SAFe today.”

When you look at it in such a simplified form, you wonder whether criticisms of SAFe are based on its fundamental constructs, or on misconceptions, or perhaps on the fact that we are in a competitive marketplace for thoughts and services. SAFe is clearly a disruptive change to the industry. But SAFe is built on Agile teams. Period. There are at least three hundred thousand Agile practitioners using SAFe today, who were previously locked out. They were living in waterfall SDLCs. Just last week someone told me that there was a time when they could not use the word “Agile” in their company. Of course they use it now with SAFe. For many, their organization’s very survival now depends on it. We hear things like: “You’ve given us a wee bit of hope for this company,” and, “It used to be easier to write software that didn’t work than it was to change a requirement.” We changed that with SAFe.

That kind of personal feedback, along with the business results, are what motivate us every day. When we hear controversy about the method, we look at the measurable results companies are getting, see what can be improved, and move on to the next revision. And because SAFe is a work in process, we don’t get all emotional about it, rather, we learn and adapt. If I remember correctly, there’s an agile principle that speaks to continuous improvement, so I assume that applies to methods as well.

Cliff: One of the objections seems to be that SAFe is very prescriptive.

Dean: SAFe documents a set of proven success patterns. For instance, the SAFe Big Picture depicts teams iterating and delivering value every two weeks, and every so often—8, 10, 12 weeks—they check in with their end customers and larger stakeholders to validate the net accumulation of those iterations. Then they run a larger Inspect and Adapt workshop at the Program level to address the larger issues. Is that prescriptive? Sure, but can you imagine that you shouldn’t do that?

What’s more, XP and Scrum are very prescriptive as to how the teams do their work. Clearly, we need Agile guidance for people above the Team level, because it takes more than development teams to deliver end user value. Sometimes when you have a headache, maybe you need to take an aspirin. Might help with the pain.

Cliff: I think the push back is, why tell them that it has to be a certain number of weeks?

Dean: When someone interprets SAFe so literally it usually means they haven’t taken the time to click beyond the Big Picture to learn about the actual intention. For instance, if you click on “Develop on Cadence,” you’ll find the following: “ … while the Big Picture illustrates five sprints per program increment, that is arbitrary and programs can pick whatever cadence that best suits their abilities and context”. That’s guidance around a principle, not a prescription of how many aspirins to take.

And by analogy, can you fully understand a software system by simply looking at a sketch of the domain model? Obviously that’s not enough to reason about the underlying system. So it is with SAFe. The Big Picture is the domain model, but the principles and implementation live deeper.

Cliff: It looks prescriptive if you don’t click through and read it. But what you are saying is that there is a lot of judgment in these things.

Dean: Absolutely. It’s just a framework.

Cliff: Is SAFe hierarchical?

Dean: There has to be a top of a picture and a bottom. Where does strategy come from?

Cliff: Strategy is inherently hierarchical because it is the outermost level of intent, in terms of what an organization is trying to do.

Dean: Strategy and investment funding comes from the top. The teams don’t pay their own salaries or decide what business the enterprise should be in. I think there is also a misconception about what is meant by the SAFe core value of alignment. Is “alignment” management telling the teams what to do and how to do it? Or is it guiding the mission for the program? What’s the alternative: no mission, misalignment?

Cliff: I have actually heard from some camps a rejection of alignment. The objection seems to be about bottom-up versus top-down, and about self-organization versus coherence.

Dean: Therein lies the crux of the issue. A key principle of product development flow is that overall alignment delivers more value than local optimization. To achieve that alignment, empower teams, and speed value delivery, SAFe fosters decentralized decision-making under an umbrella of common mission and some architectural governance.

For example, the group I spoke with last week has about 400–500 people working on a platform that processes billions of dollars of revenue and needs a major revitalization. They will be launching three or four Agile Release Trains with some 50 teams. Don’t you think there has to be some guidance that says, “Here is what we need to accomplish. These are the features that drive the most important behavior. These are the common UX patterns for user navigation. Here is our view of an architecture that will hold it all together”? Do we believe that 500 people can independently and emergently arrive at a common conclusion? If you answer those questions truthfully, you’ll acknowledge that the vision for the new platform must be driven by the overarching business strategy. And there have to be people responsible for that. People who shoulder the ultimate responsibility for success of the enterprise. And yes, they tend to live at the top of the organizational chart.

Cliff: Is there a sweet spot for SAFe, or a range of organizations that it is a best fit for?

Dean: SAFe was designed in real world context at places like John Deere ISG, BMC, Navteq, Nokia Siemens Networks, etc.; places where there are 300–1,000 practitioners that need to collaborate on their work. Last week I was at the Scaling Agile for the Enterprise Consortium in Brussels. There were a couple of Agile thought leaders on stage who, when asked about scaling, basically said “Don’t do it. Don’t scale agile. Don’t get that big.” Seriously, can you imagine the response from the enterprise, “Sorry, it’s too late, we are already incredibly successful, and we are already big.”

If teams don’t need to collaborate on a common mission, that’s a different issue. For example, if there are even large numbers of teams building largely independent products, the level of governance in SAFe may not be necessary – though a common way of working may well be. But if you are building, say, a field crop combine with many hundreds of people involved, and there is a virtual rat’s nest of complexities and dependencies—the electronics, transducers, computer systems, actuators, the control system that gets fed from GPS to move the combine straight down the field, the engine control unit, real time vehicle service information and status reporting—well, you get the idea. That’s what SAFe is designed for.

And in the IT and ISV world, can a few agile teams build a significant enterprise-class product these days? Should we compartmentalize the development of such solutions into isolated teams, or should we build a team of teams that synchronizes and works toward a common mission?

“We sometimes work with applications so large that it takes
multiple instances of SAFe to support them.”

To meet that enterprise demand, all of a sudden there is a lot of energy going into various methods for scaling Scrum, and some are public about their belief that they are competing against a “big, one-size-fits-all framework.” That’s SAFe, of course, but I have news: we sometimes work with applications so large that it takes multiple instances of SAFe to support them. SAFe fits well for 8–10 release trains working together, but beyond that you’ll have a different level of problem of scale, and you’ll need multiple instances of SAFe. That’s one of the reasons we put Strategic Themes in 3.0, as a connector to other instances of SAFe, and to the enterprise’s overall business strategy.

That makes SAFe a highly scalable framework, but it is not designed to solve the problems faced by just a few teams looking to align their sprints. It is the larger enterprises that need SAFe, many of which are in the process of a SAFe transformation. If you name almost any ten Global 1000 companies, SAFe is already being deployed in a number of them.

Like with any disruptive technology, there is an adjustment phase that will come more easily to some than others – think S-curve ‘early adopters,’ ‘early majority,’ etc. When SAFe is deployed for the first time, it can feel top-down to the Scrum coaches who are coaching the teams, because with SAFe their teams need to be aligned on a release train. Is it worth it? The results say so. And the teams quickly get into it as they realize they are empowered to contribute to the larger value. Working on a team of teams that is delivering enterprise value faster is simply more satisfying for all. SAFe is successful in the market for only one reason: it works. Check out the case studies pages [here] for the objective measures of the value of SAFe. Simply, winning is more fun.

Cliff: What got my attention when I first discovered it was the picture. It was the first picture that filled in the pieces of the puzzle.

Dean: Didn’t it make sense to you when you saw it? Although if we both looked at that very first picture now, I’d be a bit embarrassed. It looks like Fred Flintstone might have drawn it. Oh, I guess that was me. But SAFe evolves. Version 4 will be out this summer.

Cliff: It does make sense to me.

What are the reasons you would have multiple instances of SAFe? Is it because of different portfolios? Different sources of funding?


Dean: It is typically driven by the different value streams, business units, and operating budgets. In a really large business, say a $25B company, each business unit may have a few hundred million in revenue, and each business unit will invest a percentage of that in IT and software development. But because of the organizational challenges of managing large numbers of practitioners, and because many are working in largely separate domains, they tend to naturally fall into pockets of 300 to 1,000 people, each with an instance of SAFe.

Cliff: Do you end up with a SAFe steering committee that oversees the multiple SAFe instances?

Dean: Well that’s up to the enterprise architecture and enterprise portfolio strategy, currently a bit outside the scope of SAFe. And in any case, they wouldn’t be steering SAFe instances so much as they would be defining and coordinating portfolio investments that business units use SAFe to realize.

The root cause of much of this debate comes down to a discussion about who decides what gets built. If 3-5 teams are working together in a domain they know, can they largely determine together what gets built? Probably. Would 100 development teams in a global healthcare company be the ones to decide if the company should enter a different market, and provision teams to address that opportunity? Of course not. Strategy and investment funding is a centralized concern.

Cliff: Teams are not always aware of the long discussions and analysis that take place before that point.

Dean: And perhaps we leaders cause some of that lack of visibility when we fail to have a systems, rather than a parochial or functional view – when we mandate waterfall SDLCs, when we fail to communicate a clear strategy and compelling mission, and worst of all, when we overload the teams with unrealistic and unachievable commitments. That’s why the SAFe model depends on Lean-Agile leaders, and the emphasis on taking a systems view, implementing flow, and empowering teams by constantly communicating vision and strategy.

Cliff: One hears a lot in the Agile community about culture, and about being Agile versus doing Agile. What role does mindset or leadership style play in a SAFe implementation?

Dean: It plays a huge role. [Dean calls up the SAFe website and clicks on “Implementing”.]  Look at steps 2 and 3 of a SAFe rollout. [They are, “Train All Executives, Managers, and Leaders,” and “Train Teams and Launch Agile Release Trains,” respectively.] Let’s start with Executives. In a SAFe rollout, the process is as follows: Over on the left here [he points to “Train Lean-Agile change agents”], everyone needs to understand the principles behind SAFe. This is what the process of building that mindset looks like: We train change agents (SPCs, both internal and external) to teach “Leading SAFe,” a course that introduces managers and executives to Lean and Agile thinking. Those leaders participate in a release planning simulation and do an exercise sprint, right in the classroom. They study the Agile Manifesto. They learn about Lean and Product Development Flow. Then they learn about Agile Teams, Agile Release Trains and how to implement an agile portfolio. The last two hours is a leadership module. We finish there, because if they are not ready to lead, rather than follow, success will be limited. Then the teams are trained in SAFe Scrum and XP, and organized around value streams that can more reliably deliver value on demand.

Is there a new mindset required to be
successful with SAFe? Yes.

Is there a new mindset required to be successful with SAFe? Yes. Do we achieve it? Almost always. Are we dependent upon it? Absolutely. How else would companies get the results they are getting? But you can’t see mindsets in the SAFe diagram, you gotta click!

Cliff: The site is pretty rich, there is a lot of stuff. How does servant leadership fit into this?

Dean: We use “Lean-Agile Leadership” as our metaphor, which emphasizes taking a systems view, embracing the Agile Manifesto, product development flow, creating a learning organization, and enabling knowledge workers. Operating as servant leaders is part of that.

Cliff: There seems to be a misunderstanding in the industry about some core Agile concepts, such as what servant leadership is. Books I have read on servant leadership—and indeed ways in which I have experienced effective servant leadership—stipulate that servant leadership is not about totally leaving things up to the team: it is a style of leadership.

Dean: It is indeed a style of leadership. It is not passive; it’s supportive, but it still has responsibility for outcomes. Managers do not abdicate their responsibility, or indeed their authority, just because we better understand the power, and indeed humanity, of self-organization and empowerment. We must have both: leaders who lead, and teams and programs that are largely self-organizing and self-managing.

We are now dealing with hundreds of thousands of practitioners who have embraced a new method, indeed a better way, more empowering, more fulfilling, and potentially far more effective: a new belief system. But it is a huge danger to exclude or belittle management. Perhaps this is because they haven’t been managed well in the past; and yes, we still see plenty of that. Perhaps they assume managers cannot learn new behavior; and perhaps some cannot. But we also see the opposite: an emergence of a new form of leadership, one that is based on common principles. We see that every day too, and that inspires us to keep moving forward.

That includes new ways of planning work, which SAFe provides via large-scale face-to-face planning. It’s absolutely key to what we do, and we take that to a level where some have said, “Well, it’s not Agile to have 100 people planning together.” But face-to-face communication is a key tenet of Agile. For example, look at the group in the photo on our website [here]; I know that group. They get as many as 175 people “together” every ten weeks, in multiple locations, and they plan simultaneously. Every ten weeks they pull together folks from the US, India, and Serbia, and they bring the business owners in to participate. See that table in the middle? Those are the business executives. Every ten weeks they spend a part of two days with the teams. Frankly, it’s exhilarating. There is nothing like it, and if you read Lyssa Adkins’ article in InfoQ [here], she attended one and noted that she had never seen anything like it. She called it an “agile accelerant.”

Cliff: That was quite an inspiring article. She talks about the arc of her thinking on it, and how it changed as she experienced SAFe.

With regard to the release planning meeting, how flexible is that? Is it always a two-day session?

Dean: Two days is very standard, but it depends a bit on scope. It isn’t just planning and alignment; it’s a joint requirements and design session. The group in the photo takes two and a half days, because they have a lot of folks in Mumbai with a 12.5-hour time difference.

Cliff: Does it ever take longer? When I have done release planning with teams, it can take a week.

Dean: It doesn’t take that long with SAFe. You are planning only the next Program Increment; you will do it again in about 10 weeks. Take a small bite. Limit the batch size. One big international company plans across five trains at the same time—and they still do it in 2.5 days or so, but they plan together because they are interdependent. This is one aspect of SAFe that is prescriptive: you plan together every PI. Face-to-face communication is part of the Manifesto. You aren’t SAFe without it.

Cliff: How much preparation is needed?

Dean: It depends on whether it’s your first time or tenth. Your first time can be more challenging. Alignment has to occur in management, development, system design, operations, etc. and it might not be present prior to the meeting. Some upfront preparation is going to be required.

For example, I was recently talking to a company that had been worried about their first big room planning experience. They were afraid that the first session would be chaotic because they were obviously not fully ready—you never really are—and people were wavering in their support. What would happen if they met for two days and nothing useful came of it? Well, the CIO was a Lean leader. He said, “This is going to be a really critical learning experience for us. It’s just us, so what can possibly go wrong, really? Let’s get going with that first meeting.”

Because of his leadership and what took place in release planning, they now have a common way of working, they have an aligned view amongst the executives and teams, and they have eliminated much of the excess work in process. You cannot overestimate the value of finding and addressing program-level bottlenecks, identifying otherwise hidden dependencies, finding the way to flow, and navigating competing priorities.

Cliff: How do you deal with the organizational structure issues? These organizations must have existing functions for QA and Testing and release management and so on.

Dean: The release train is usually virtual, at least initially. Just a group of the right people who agree to plan together, commit together, execute together, and inspect and adapt. A business is not going to close down the business unit and merge with IT. The DevOps IT/OM group is still going to have a director who runs operations and deployment. But they have to operate as an extended team to accomplish the mission. [Dean points to this page, and “Finding the Value Stream.”]

Cliff: So it is kind of a matrix?

Dean: It is. It typically starts as a virtual organization. In some cases—the easier ones—they are already organized on lines of business.

Cliff: Is that a natural path?

Dean: In some cases. Let’s say an automotive components supplier has four BUs building four product lines. Most of the business people, devs, testers, architects, etc. are all in the BU. That’s a pretty straightforward value stream, and the virtual and the physical organization are basically the same.

But if someone is implementing single sign-on across a suite of products, or trying to improve the supply chain, you’ll have to bring people in from a number of areas and different platforms. And you can’t just create a new organization for the purpose of this large-scale initiative, even if it’s long-lived. So release trains are often virtual.

Cliff: How does this compare to, say, Spotify?

Dean: I’ve talked to them a bit, and I’ve followed the method. We have even discussed SAFe for scaling further. If you look at their organizational metaphor, they have Squads, which are 5-10 individuals and a product owner, pretty much the same as SAFe agile teams. They have Tribes, organized groups of Squads that deliver largely independent solution value of a type. There are up to 100 or so people in a tribe—a fairly natural social limit, a direct parallel to a SAFe Agile Release Train. As I understand it, Guilds are basically communities of practice: they advance skill development, which we don’t model in SAFe. At Agile Israel 2014, Spotify’s Head of People Operations gave a talk about their self-organizing and self-managing Squads, and he showed some of what happened initially. Lots of fast success stories, for sure. Then he showed the initial UIs for Android, iOS, Windows, etc. They looked like they were built by different teams, because of course, they were. He noted that this was suboptimal for the user experience, and then described how they went back and reworked those apps with a common UX governance, so the user experience was largely the same across devices.

We are all learning the same lessons by doing Agile at scale. Is great design emergent or intentional? Both! I have a great respect for those guys. I don’t see it as a competitive method; I see it as a different set of labels for accomplishing the same thing. We absolutely support communities of practice and what they are able to accomplish, but, for the time being, they are outside the scope of SAFe. Besides, if we added them, we’d be too prescriptive :)

Cliff: Have you ever had any organizations struggle in getting SAFe going, and if so, what are some of the suggestions that you have to avoid that?

Dean: We have a very simple mantra: if you train everyone and launch Agile Release Trains, you will succeed. If execs, business owners, architects, and product managers think that Agile is just a process for developers, it’s not going to work. Everybody has to understand what they are doing and what everyone’s role is. We believe that the success of the initiative is ultimately dependent not on the framework, not on the consultants and coaches, and not even solely on the teams—it also depends on leadership. If leadership is trained in Lean-Agile thinking, people accomplish great things with SAFe.

Cliff: Is SAFe “Lean Systems Engineering” (LSE) already out, or is that coming out?

Dean: It’s well under way. A lot of the content has been developed. For example, the principles of Lean Systems Engineering are built in explicitly, whereas they are a little more hidden in SAFe. We have defined most of the core concepts, what’s a system, a system of systems, how to express systems intent without over-specifying, adaptive requirements and design, set-based development, MBSE, kanban for teams and systems work, etc. We met with about 20 systems engineers yesterday in a feedback session, and they showed us some things we need to change. It will be available to the public sometime soon, but we have not fixed the date, because we are not sure when we will reach an MVP. But I think we are over half way there.

SAFe is designed for large-scale software solutions—
banking, financial, insurance, ISVs, etc.

Cliff: Why would someone use SAFe versus SAFe LSE?

Dean: SAFe is designed for large-scale software solutions—banking, financial, insurance, ISVs, etc. But if you are building a satellite, where you have the satellite itself, the ground station, and the web farm feeding data to the users, then that is really a system of systems, and you have to understand how the subsystems are built and interact: how one system may impose requirements on another, and how capabilities span subsystems. It’s a different problem; the systems and subsystems and their interfaces are physical and tangible, and the notion of value streams is not necessarily the right abstraction at the highest level.

The large systems builders—industrial, defense, automotive, home automation, and such—come to class to learn about SAFe and how best to apply it to their context. But they also note that “We don’t really have a Portfolio level concern here; it’s just one really big system.” We are learning from them how to model things differently. Systems, subsystems, components, capabilities, and features all play a role.

Cliff: And you have hardware-in-the-loop testing.

Dean: You still design with fast iterations and integrations. You build in small batch sizes. But you also probably have IV&V teams who may be the only ones that can put the whole thing together and test it. You have supplier subsystems and internal programs that may or may not be using Agile. You might have a customer that says “here is the system and software requirements specification, do it like it says.” You have delivery milestones for certain. You have a whole different set of constraints. It is not as free form as it is in SAFe, but you still want the benefits of a Lean and Agile approach. That’s the challenge of the modern systems builder and we think we can help with SAFe LSE.

Cliff: Is there anything that you would like to mention about what people can look forward to?

Dean: You can look forward to the continuing evolution of SAFe. The next release, SAFe 4.0, will be out this summer. It will include a number of new constructs and content elements. For instance, it will integrate kanban guidance for Teams, alongside Scrum, so SAFe teams have a clearer choice of methods, and can even combine them as they see fit.

And by the way, we are not so sure how well that news will be received in the Scrum and Kanban communities, but we think the teams deserve that choice. Let’s say you are adopting SAFe at a major systems builder, and you have a small group of 3-5 optics engineers: do they need a Scrum Master and Product Owner? Not clear. Do they need to visualize work and understand flow? Do they need to integrate with the rest of the system every two weeks? Absolutely.

But personally, I look forward to SAFe Version 8.0! That should be awesome. Some of the people in my SPC class asked me why I don’t just call version 4 version 8, but we all know that would be cheating. And of course, we have SAFe LSE 1.0 that will launch sometime this year, along with companion courseware.

Are there going to be multiple ways to scale, be it Scrum and others? Of course. Are we learning new and better ways to deliver bigger and better systems more quickly? I sure hope so. As of now, we are on Version 3.0 of SAFe, and it works. It has a large footprint of customers experiencing success, and a global community of consultants, partners, and practitioners implementing it, supporting it, and telling us how to improve it. We’ll keep listening and evolving. That’s a pretty good launching point for a next set of innovations.

Saturday, January 10, 2015

Cloud-based apps are extremely vulnerable - here's what to do

And the Two Design Patterns That All Developers Should Know


15 per cent of business cloud users have been hacked.

That is according to a recent Netskope report (article here).

Recent debates about whether cloud storage is secure have focused on the infrastructure of the cloud: that is, if your data is in the cloud, can other cloud users see it? Can the cloud provider see it?

But there has been little attention to the even more important issue of whether cloud apps themselves are secure. This is so important because if your data is in the cloud, then it is not behind your company’s firewall – it is accessible over the Internet. All one needs is the password. So if you think things were bad before, just wait until hackers shift their focus to the cloud.

Executives think that IT staff know how to
write secure software applications
– but most don’t.

Companies spend huge amounts of money trying to make their infrastructure secure, but they invest essentially nothing in making sure that their application code itself is secure. As I wrote in my book High-Assurance Design,
•    The average programmer is woefully untrained in basic principles related to reliability and security.

•    The tools available to programmers are woefully inadequate to expect that the average programmer can produce reliable and secure applications.

•    Organizations that procure applications are woefully unaware of this state of affairs, and take far too much for granted with regard to security and reliability.

The last bullet is the most important one: Executives of companies think that IT staff know how to write secure software applications – that to do otherwise would be unethical, and their staff are definitely not unethical. But this attitude is the heart of the problem, because the fact is, most software developers – and even most senior software architects – know very little about how to write secure software. Security just isn’t that interesting to most programmers: no one rewards you for writing secure code – not like you get rewarded for writing more features. And no one is asking for it, because there is an assumption – an incorrect one – that programmers create secure code in the course of their work, just as plumbers create well sealed pipes in the course of plumbing. True for plumbers, in general, but not true for programmers.

Recently I was on a DevOps team in which the client was very concerned about security. The client ran its own scans of our servers in the cloud, and found many issues that needed to be fixed. All of these issues were infrastructure related: primarily OS hardening. None had to do with the design of the application. The general feeling of the team was that the security of the application itself would not be questioned, so we did not have to worry about it. At one point, one of our databases in our cloud test environment was hacked. The database was shut down and a forensic analysis was supposedly performed (we were never told what they found). There was no impact on the team’s work – it was business as usual.

If we don’t fix this dysfunction in our industry,
then the Internet Of Things (IOT) will be a disaster.

This state of affairs is unsustainable. If we don’t fix this deep-rooted dysfunction in our industry, then the Internet Of Things (IOT) will be a disaster: Imagine having every device you own connected to the Internet – to a cloud service of some kind – and all of these devices and accounts hackable. And imagine the continuous software updates needed to keep pace with newly discovered security vulnerabilities. This is not a future that I want – do you? Not only is George Jetson’s car a pain to maintain – with constant software updates – but it might come crashing down. People will be afraid to drive their car or use these IOT devices.

The only way to fix this is for organizations to demand that developers learn how to write secure software. You cannot scan for application level security: doing so is not effective. Having a “security officer” oversee things is not effective either – not unless that person intends to inspect every line of code written by every programmer – and that is not feasible in an Agile setting, where the code changes every day. The only way to produce secure software in an Agile environment is for the programmers to know how to do it.

It is not that resources are lacking – my own textbook included. There are tons of books, there are online resources – notably OWASP – and there are even certifications – and these certifications are the real deal: these are not fluff courses.

People like magic bullets. Unfortunately, there is no magic bullet for security: knowledge is the only path. But if I were asked what two things software developers should know to make their code more secure, I would have to say that they should know about these two design patterns: (1) Compartmentalization, and (2) Privilege Separation.

Your systems will be hacked. The only question is,
What will the hackers get away with?

Your systems will be hacked. There is no question about that. The only question is, What will the hackers get away with? Will they be discovered right away through intrusion detection monitoring and shut down? And if not, will they be able to retrieve an entire database of information – all of your customers’ personal data? That is, will one compromised account enable them to pull down a complete set of information?

Compartmentalization is an old concept: In the context of computers, it was first formalized by the Bell LaPadula model for security. It became the basis for security in early military computer systems, and it formalizes the essential concept used by the military and intelligence communities for protecting sensitive information. It is based on the concept that a person requesting access to information should have (A) sufficient trust level – i.e., they have been vetted with a defined level of thoroughness – and (B) a need to know: that is, they have a legitimate reason for accessing the information. No one – not even the most senior and trusted person – can automatically have access to everything: they must have a need to know. Thus, if someone needs information, you don’t open the whole filing cabinet: you open only those files that they have an immediate need for. To open others, you have to request permission for those.
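The two-part test described above – sufficient trust level plus a need to know – can be sketched in a few lines. This is a hypothetical illustration, not code from any real system: the names `TRUST_LEVELS`, `may_read`, and the compartment labels are all invented for the example.

```python
from dataclasses import dataclass

# Ordered trust (clearance) levels: higher number = more trusted.
TRUST_LEVELS = {"public": 0, "internal": 1, "restricted": 2, "secret": 3}

@dataclass
class User:
    name: str
    trust_level: str
    compartments: set   # the compartments this user has a need to know

@dataclass
class Record:
    classification: str
    compartment: str

def may_read(user: User, record: Record) -> bool:
    # Access requires BOTH conditions; neither alone is sufficient.
    cleared = TRUST_LEVELS[user.trust_level] >= TRUST_LEVELS[record.classification]
    need_to_know = record.compartment in user.compartments
    return cleared and need_to_know

# Even the most trusted user cannot open every "file cabinet":
alice = User("alice", "secret", {"acct-42"})
payroll = Record("restricted", "payroll")
assert not may_read(alice, payroll)  # high clearance, but no need to know
```

The point of the sketch is the conjunction: a senior, fully vetted user is still denied access to data outside the compartments they have a legitimate reason to use.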

Military computing systems are onerous to use because of the layers of security, but in a civilian setting for business applications there are ways to adopt the basic model but make parts of the process automatic. For example, restrict the amount of information that an individual can access in one request: don’t allow someone to download an entire database – regardless of what level of access they have. And if they start issuing a lot of requests – more than you would expect based on their job function – then trigger an alarm. Note that to implement this type of policy, you have to design the application accordingly: this type of security is not something that you can bolt on, because it requires designing the user’s application in such a way that they only access what they need for each transaction and are not given access to everything “in that file cabinet”.
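The two automatic checks just described – a cap on how much data one request may return, and an alarm when a user’s request rate exceeds what their job function would predict – might look like the following sketch. The thresholds, function names, and error types are assumptions chosen for illustration; real values would come from profiling actual job roles.

```python
import time
from collections import defaultdict, deque

MAX_ROWS_PER_REQUEST = 100   # never hand back a whole table in one call
REQUESTS_PER_MINUTE = 30     # more than this suggests scripted bulk retrieval

_recent = defaultdict(deque)  # user -> timestamps of their recent requests

def guarded_fetch(user, run_query):
    """Run a query on the user's behalf, enforcing both limits."""
    now = time.time()
    log = _recent[user]
    # Discard timestamps older than the sliding one-minute window.
    while log and now - log[0] > 60:
        log.popleft()
    log.append(now)
    if len(log) > REQUESTS_PER_MINUTE:
        # In a real system this would alert intrusion monitoring, not just fail.
        raise PermissionError(f"ALARM: abnormal request rate for {user!r}")
    rows = run_query()
    if len(rows) > MAX_ROWS_PER_REQUEST:
        raise PermissionError("request would return more data than allowed")
    return rows
```

Because every data access goes through `guarded_fetch`, a stolen password lets an attacker see individual records, but bulk extraction trips the alarm long before the database is emptied.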

The other key concept that programmers need to know is “privilege separation”. No one should be able to access a large set – e.g., a table – of sensitive data directly: instead, they should have to access a software service that does it for them. For example, if a user needs to examine a table to find out which rows of the table meet a set of criteria, the user should not be able to access or peruse the table directly: the user should only be able to initiate the filter action and receive the single result. The filter action is a software service that performs the required action under the privileged account of the server – which the user does not have access to. The user performs his or her work using an account that is only able to initiate the software service. If the user’s account is obtained through a phishing attack, that account cannot be used to obtain the raw data in the database: retrieving the entire table would require a huge number of calls to the service and intrusion monitoring should be watching for abnormal use such as that. This does not prevent hacking, but it greatly limits what can be lost when a hack occurs.

These measures are not sufficient, but they are a start, and they provide a foundation for how to think about application level security, from which programmers can learn more. The key is to start with an access model based on the kinds of actions that users need to perform and the subsets of data that they need direct access to for each transaction – access is not simply based on their overall level of trust or general need to access an entire class of data.

Organizations are completely to blame for the current state of affairs – and organizations can fix it.

Organizations are completely to blame for the current state of affairs: If organizations demand that programmers know how to write secure code, then programmers will respond. People are merely focusing on what their bosses are telling them is important.

So if you are an executive in an IT organization, it is up to you. The industry will not fix things: You need to make security a priority. You need to tell your teams that you expect them to learn how to write secure code. You need to create incentives for programmers and software architects to become knowledgeable and even certified in secure coding. You need to create a culture that values security. Security is up to you.

Saturday, January 3, 2015

Real Agile Testing, In Large Organizations – Part 4

(Continued from Part 3)

Is everyone a tester?

This is one of the greatest debates about Agile testing: who does it? One camp claims that everyone on an Agile team is a tester: there should be no “tester” role. The Scrum camp is perhaps most adamant about this. Another camp claims that there are testers, and that the separate role is very important.

Again, the right answer depends. Even Jeff Sutherland – the inventor of Scrum – has complimented the performance of projects that had separate test teams. This one stands out. So if it is ok with Dr. Sutherland, it should be ok for Scrum adherents. The question is, when does it make sense, and when does it not make sense?

For Morticia’s website (see Part 1), Thing did most of the testing but we all chipped in, and that was fine. But for the EFT Management Portal, the testing is so complex that it would sure make sense to have a test lead, and several people to focus on pulling together the performance testing, the security testing strategy, the testing of the multiple back end partner interfaces, the testing of the legal compliance rules, and so on. These things are like projects in their own right and so they need leads. But don’t forget about learning: some people might want to learn about new types of testing and test automation, even if they have not done it before, so allow team members to change roles with appropriate supervision (e.g., through pairing).

Saying “You should never have a test lead” is not very Agile,
and saying “You should always have a test lead”
is not very Agile either.

If unsure, use common sense: don’t use doctrine. Agile is first and foremost about applying judgment: that is why the Agile Manifesto is written the way it is, with phrases like “While there is value in the items on the right, we value the items on the left more.” In other words, it is not prescriptive. They wanted us to keep our thinking flexible: e.g., saying “You should never have a test lead” is not very Agile, and saying “You should always have a test lead” is not very Agile either.

Who needs to understand these things

One of the most important aspects of Agile in general is the conversations among the team – the exchange of ideas, the helping each other, and the talking through of issues. This is critically important as it relates to testing and understanding when we have met the needs of the end users. Ensuring that the team understands – not just reads – the testing strategy is critical, so that the testing strategies are ever present in the minds of the developers. Developers need to think about testability as they design and code, and they need to be thinking about failure modes and how those are going to be tested, because developers often identify situations that testers have overlooked. Communicating testing concerns across team roles is extremely important.

In Part 1 of this article we pointed out that an Agile test strategy is developed and maintained by the team, in collaboration with external stakeholders. The team leader(s) (e.g., Scrum Master, coach, project manager, tech lead, etc. – however the team(s) is/are constituted) need(s) to understand what an Agile test strategy is for, so that it is accounted for during iteration planning. If the organization has support functions such as Security, Testing, Architecture, etc., those support functions need to understand how an Agile test strategy is different from a traditional test plan: the collaborative nature of Agile testing, the need to test continually, the need to automate as much as possible, and the need to evolve the testing strategies as the team learns more about the application and its requirements. The support function managers need to know all this so that they can make sure that they provide staff to collaborate with the team in the initial development of the testing strategies and throughout software development.

The support functions will need to come to terms with the fact that their role changes significantly with respect to waterfall development: waterfall teams obtain services from support functions, services such as Testing, Architecture, etc., and those services operate largely independently. In an Agile setting, the support functions need to operate in a collaborative way, working side by side with the team. In fact, much of their work should shift from “doing” to “teaching” – i.e., the support functions need to coach the team in how to perform the things that the support function used to do – to the extent that that is practical. Thus, support functions become coaching centers and resource centers. (In the article How To Rapidly Infuse Technical Practices Into Your Agile Teams we talk about how to transition waterfall oriented support functions to Agile support functions.)

Agile coaches need to work with the various support functions to help them to think through these changes. Agile will impact the types of people who work in the various support functions – they need to be more people-oriented, with an interest in helping others instead of doing the work themselves. The support function staff will also have to learn about Agile practices and automation tools. Agile will impact how the support functions are measured by senior management: they will need to be measured on how effectively they help teams to become self sufficient in technical practices, and the support functions also need to be measured in terms of whether they stay current in the rapidly evolving landscape of Agile tools. Given these changes, Agile will therefore impact funding for these functions. It will shift the balance of power in the organization, and that is why the CIO needs to be the driver for these discussions. In a successful Agile transformation, the support functions are not eliminated: they are transformed and reorganized. Knowledge must increase – not decrease – and to make continual learning a sustainable practice, it really helps to have organizational functions that focus on helping practitioners – the teams – to continue to learn new things in an endless cycle of learning, doing, and improving.

Creating a learning organization

In a discussion thread in the LinkedIn group Agile and Lean Software Development, Claes Jonsson, a Continuous Deployment architect at TPG Objektfabriken in Sweden, asked this pertinent question:
How is [assurance achieved] in an organization that is committed to delivering the right thing, with extremely high quality, minimal waste and with the shortest possible time to market using Lean Startup principles and Continuous Release practices? And do note that this does NOT mean unstructured, or disorganized, it instead relies on high organizational alignment, and extreme discipline.

The only way for an organization to preserve assurance – that is, manage risk – while becoming more Agile is to elevate people's knowledge. E.g., consider security: the process of having "gates" for security review is antithetical to Agile because gates impose risk management at discrete points instead of integrating it into the development process itself. But if you teach developers how to write secure code, then you don't need the gates anymore! The same thing applies to other areas of assurance.

But wait: I am using a little bit of hyperbole here: gates are a form of oversight, and it is not really that you don’t need any kind of oversight – you do – but it takes a different form. In Part 2 of this article we talked about the concept of test coverage, and the role that a quality assurance (QA) function might play in an Agile setting. We explained that an Agile form of QA is still independent, but that it works alongside a development team – not as a gated phase. Again, Morticia’s website probably doesn’t need such a setup, but the EFT management portal probably does: you need to make a judgment about how much independent quality oversight is necessary to properly manage all of the risks in your project or program. Agile is about transparently and collaboratively making those kinds of judgments – not blindly following a plan or procedure.

Many large organizations utilize gated software development processes for historical reasons. These “gates” typically include a review phase for things such as security and regulatory compliance adherence. What we find when working with these organizations – and this is different from small companies and startups – is that the gates are used in lieu of conversations about how the software will meet compliance requirements: i.e., the transparent and collaborative discussions about what process to use for the project do not take place. By shifting to a culture of learning through conversations, gated processes can eventually be reduced to a few minimal stages or eliminated entirely.

There is an artificial comfort in gated processes.

There is an artificial comfort in gated processes: one feels secure because the gates are in place, but the comfort is naïve because the gates do not address the underlying reason the gates were created in the first place: that those who are building systems either do not know how to implement the compliance and risk management requirements, or they are not testing sufficiently for these things. Learning organizations move past this dilemma by ensuring that there is a much broader understanding of the requirements and how to test for them.

Agile transformation is really about systematically creating a learning organization. You have to identify the things that people need to know, and make sure that there are people who know those things embedded in the development process, by creating a system for people to learn and share that knowledge (here is one approach). Ideally, everyone knows everything, but that is not practical, so there is a balance that needs to be achieved between specialists and generalists. But all need to be involved in real time or near real time.

The chart at the end of this article lists some of the things that each part of the IT organization will need to learn. As you can see, it is a lot, and that is why learning – not process re-engineering – is the “long pole in the tent” for Agile transformation. Learning is one of the first steps on the long road of changing to an Agile culture.

Conclusions

Morticia’s website and the EFT Management Portal are two extremes – as business systems go. Most business applications are somewhere in between. That is why there is no single answer to how one should plan and execute testing under Agile.

Trust the team – that works fine for Morticia’s website. Top down planning – quite a bit of that is needed for the EFT Management Portal, although we try to approach it in an Agile way: by putting the team in the driver’s seat, by keeping the documentation light, by allowing things to evolve to a degree, by testing repeatedly and with as much automation as possible, and by implementing a learning strategy built around coaching, to help teams learn what they need to know to address all of the things that need to be tested.

In the case of the EFT Management Portal we also found that external parties demand – have a right to – oversight, and their risk management team will want to talk to our development leaders – hence we need to have development team leaders if only for the purpose of interfacing to these risk management folks – and the risk management folks will want to review our testing strategies and the ways that we measure test coverage. They will also be watching closely when our first release is deployed for demonstration: by that time, we should have already tested the application at scale in our cloud based test environment and so we should know what the outcome will be – there should be no surprises – but the first release is still a major visible milestone, even if it is not a production release, and people’s credibility rides on it to a large degree.

The fact that credibility rides on a first release is cultural – but it is also human, and to a large extent independent of culture – and so even though Agile encourages tolerance for failure, that has its limits: the idea is to fail early to learn so that you do not fail when it counts and everyone is watching – including the risk management people who are tasked with finding out if you know what you are doing. Testing is Agile’s primary tool for ensuring early failure (feedback), so it is crucial to do the early planning needed to make sure that testing is thorough.

One sign that your organization is embracing early contained failure as a strategy for ensuring long term success is when teams start using demonstrations as a way of “proving” that compliance requirements are met. Teams often look forward to being able to do this. This then continues to strengthen trust within the organization. The more an organization implicitly trusts its teams, the less process rigor is needed – but this will only occur if continual learning is implemented as a strategic way of ensuring that the teams have and continue to have the knowledge that they need.

Authors (alphabetically):
Scott Barnes
Cliff Berg

As PDF.