transition 2 agile: Real Agile Testing, In Large Organizations

(Continued from Part 1)

Last time we saw that there is no single answer to the level of test planning needed for Agile projects – it depends!

We also remembered that the whole point of testing is to achieve an acceptable level of assurance that the system meets the actual business need – in every way that matters to the organization.

This time we will look at a kind of template for the pieces of an Agile test strategy. You can then add and subtract from this template for your own project – and perhaps even dispense with it altogether for a very simple project – but in that case it at least provides food for thought.

What about technical stories?

Many teams use “technical stories” to specify non-functional requirements. This is OK, except that these are not really stories: you never finish them, because they are actually cross-functional acceptance criteria. But casting non-functional requirements as acceptance criteria does not work perfectly either: that means that no story is done until all of the non-functional criteria are done, and that is not a practical way to run an iteration.

Thus, while the above approaches can work, it is often better to treat non-functional requirements as just that: requirements. Don’t try to fit that round peg into the story square hole. Instead, create a “theme” for each type of non-functional requirement, e.g., “performance”, “security”, etc., with theme level acceptance criteria – i.e., requirements! Then write stories for the work that needs to be done; but do not skip creating a strategy for how to test the requirements for each of these themes. A strategy (high level plan) is needed too, in order to think through the non-functional testing and other activities in an end-to-end manner. This is a strategy that the team should develop. The strategy is the design for the testing aspects of the release train. Without it, you will find it difficult to discuss testing issues and dependencies that arise during testing, because there will be no conceptual context, and you will also find it difficult to communicate the testing strategies to stakeholders.

There is a side benefit. If you treat the testing pipeline as a system, then you are in a good position to identify ways to monitor the application. For example, exploratory performance testing will reveal bottlenecks, and the application can then be enhanced to monitor those bottlenecks during operation of the system. Monitoring platforms such as Sensu can be used to consolidate the monitors across the many components of the application. Thus, in your testing strategy, you can define exploratory testing activities that provide feedback to the application’s monitoring theme, resulting in stories pertaining to operational monitoring. Identifying this ahead of time – at a large grain level – is important for making sure that this type of work is not a surprise to the Product Owner and that it receives sufficient priority. The key is to treat the testing pipeline as an aspect of the development pipeline, and design it with feedback loops, minimum latency, and sufficient coverage of each requirement category.
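
As a rough sketch of what such a monitoring story might produce: suppose exploratory performance testing showed that a work queue backs up under load. A small check like the one below could then watch that queue during operation. It follows the plugin exit-status convention used by Sensu checks (0 for OK, 1 for warning, 2 for critical); the queue-depth metric, the thresholds, and the function names are hypothetical, chosen only for illustration.

```python
# Exit statuses follow the plugin convention that Sensu checks use.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_queue_depth(depth, warn_at=100, crit_at=500):
    """Map a measured queue depth to a check status.

    The thresholds are hypothetical; in practice they would come from
    what exploratory performance testing revealed about the bottleneck.
    """
    if depth >= crit_at:
        return CRITICAL
    if depth >= warn_at:
        return WARNING
    return OK


def check_output(depth, warn_at=100, crit_at=500):
    """Return (exit_status, message) for a monitoring agent to report."""
    status = classify_queue_depth(depth, warn_at, crit_at)
    return status, f"queue_depth {STATUS_NAMES[status]}: depth={depth}"
```

A wrapper script would print the message and exit with the status. The thresholds are the part that the testing strategy feeds: they would be revisited as later performance testing teaches the team more about the bottleneck.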

The What, Why, Where/When, How/Who, and Coverage

Let’s look at the What, Why, Where, When, How/Who, and Coverage.

The “What” is the category of testing, such as “functional acceptance testing”, “performance testing”, or “exploratory testing”. If you like, these can be grouped together according to the “testing quadrants”.

The “Why” is the aspect of requirements that this type of testing is intended to address, such as “story acceptance criteria”, “system-wide performance requirements”, or “anomaly identification”.

The “Where” is the environment(s) in which the testing will occur. In a CI/CD process, most types of testing will occur in multiple environments (as shown in Figure 1), but not necessarily all – e.g., in the example shown in Figure 1, performance testing is only being done in the “SCALE” environment. Your test strategy should reference a table or information radiator depicting all of the identified test environment types.

The “When” is the event(s) that will trigger the testing, and the frequency if the triggering event is calendar or time based. (Examples are shown in Table 1.)

The “How” is the strategy to be used for those types of tests, such as “Use JBehave/Java, Selenium”, or “Use JMeter in cloud instances”. The “How” should include “Who” – i.e., who will do what: that is, who will write the tests, who will perform them if there are manual tests, etc.

The Where/When and How/Who columns are especially important if you interface with specialized “enterprise” testing functions of any kind, e.g., Security, Performance Testing, Independent Acceptance Testing, etc.: you want to integrate these groups in an Agile “pipeline” manner so that no one is ever waiting on anyone, and that requires that everyone have a very clear idea of what everyone will be doing and when.

Table 1: Sample lines of a test strategy table.

What	Why	Where, When	How (Strategy), Who	Coverage: 1. How measured; 2. How assessed; 3. Sufficiency
Functional acceptance testing	Story acceptance criteria	• LOCAL (Workstation or personal cloud). Continually. • Cloud “CI”. When code pushed. • Cloud TEST. Daily.	Use JBehave/Java, Selenium. Acceptance test programmer must not be the story programmer.	• How measured: Cobertura. • How assessed: Use Gherkin executable test specs, to ensure that no acceptance criteria are missed. • Sufficiency: Need 100% coverage.
Performance testing	System-wide performance requirements	• “PERF” (in cloud). Nightly.	Use JMeter in cloud instances. Performance test team and architect.	QA will verify coverage of executable test specs.
Exploratory	To detect unanticipated anomalies	• DEMO. Any time.	Manual. Anyone who volunteers – but not the story’s programmer.	Amount of time/effort should be indicated by the story.

The final column, “Coverage”, is really about thoroughness. It has three parts: (1) how test coverage will be measured, (2) how coverage will be assessed, and (3) what level of coverage is considered to be sufficient. This gets into an important issue for testing: How do you know when you are done testing? How do you know how much testing is enough?
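
If the team keeps the strategy table in a machine-readable form next to the wiki page, gaps in it become easy to spot. The sketch below is one possible shape for a row, mirroring the columns of Table 1; the class names, field names, and the `missing_columns` helper are our own illustrative assumptions, not part of any tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Coverage:
    how_measured: str   # part 1 of the Coverage column
    how_assessed: str   # part 2
    sufficiency: str    # part 3


@dataclass(frozen=True)
class StrategyRow:
    what: str
    why: str
    where_when: tuple   # (environment, trigger) pairs
    how: str
    who: str
    coverage: Coverage


def missing_columns(row):
    """Return the names of the columns that were left blank."""
    missing = []
    for f in fields(row):
        value = getattr(row, f.name)
        if isinstance(value, Coverage):
            missing += [f"coverage.{c.name}"
                        for c in fields(value) if not getattr(value, c.name)]
        elif not value:
            missing.append(f.name)
    return missing


# First row of Table 1, transcribed.
functional = StrategyRow(
    what="Functional acceptance testing",
    why="Story acceptance criteria",
    where_when=(("LOCAL", "Continually"),
                ("CI", "When code pushed"),
                ("TEST", "Daily")),
    how="JBehave/Java, Selenium",
    who="Acceptance test programmer (not the story programmer)",
    coverage=Coverage("Cobertura", "Gherkin executable test specs", "100%"),
)

# The Exploratory row only states sufficiency, so the other parts are blank.
exploratory = StrategyRow(
    what="Exploratory",
    why="To detect unanticipated anomalies",
    where_when=(("DEMO", "Any time"),),
    how="Manual",
    who="Anyone who volunteers, but not the story's programmer",
    coverage=Coverage("", "", "Time/effort indicated by the story"),
)
```

Calling `missing_columns(exploratory)` reports the two blank coverage parts, which is a prompt for discussion rather than an error: for some kinds of testing, leaving a part blank may be a deliberate decision.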

How much testing is enough?

In a traditional waterfall development setup, there is often a separate Quality Assurance (QA) function that independently assesses whether the test plan is adequate. This is usually implemented as a gate, such that QA performs its assessment after the application has been tested. That whole approach is a non-starter for Agile projects – and even more so for continuous delivery (CD) – where working, tested software is produced frequently and can be deployed with little delay. But let’s not throw the baby out with the bath water by discarding the whole concept of QA. QA can provide something vital to Agile teams: independence.

My mother Morticia knows the people who are building her website: they are our cousins, and she trusts them implicitly. But the EFT Management Portal is another matter. In that case, an external technical auditor has been engaged to provide independent assessment. But what about in-between situations? What about run-of-the-mill mission critical applications developed by most organizations? Should you just “trust the team”?

To answer that question, we need to clear up a common point of confusion. The Agile adage that one should “trust the team” does not mean to have blind trust: if there is a lot at stake, then blind trust would be illogical and naïve. It means to give the team substantial (but not absolute) leeway to do the work in the way that it finds most effective. That does not relieve the team from explaining their processes, or from the responsibility to convince stakeholders that the processes (especially testing) will result in the required level of assurance. After all, some of those stakeholders are paying the bill – it’s their system.

Self-directing teams are never without leadership and vision. Leaders need to ensure that teams have a clear understanding of the end goal (product) and why the business needs the product (vision). When vision and goals are clear, acceptance criteria and intent become much clearer. By producing what stakeholders have described and by being provided a clear set of goals and a vision for a product, teams typically are able to build significant trust with their stakeholders and the business. This trust continues and the team feels empowered.

When clear goals and vision (leadership) are missing, there tend to be longer testing cycles because the testing starts to focus on ensuring the requirements are correct instead of ensuring the requirements are met.

Another consideration is that teams are under great pressure to create features for the Product Owner. If the Product Owner will not have to maintain the application, then the Product Owner will not be very concerned with how maintainable the system is – that is “IT’s problem”. (Product Owners perform the role much better when they are the person responsible for the application. Product Owners who are not responsible for the product being produced, but only for the delivery of a project, are no longer Product Owners: they are back to being Project Managers.) Further, the Product Owner will not have the expertise to ask about things such as “concurrency testing”, for checking that the application works correctly when multiple users try to update the same data. In fact, some software teams do not know too much about that either – so should you simply “trust the team”? Teams cannot always be staffed with all of the skill sets that are needed – resources might be constrained. These reasons are why organizations need independent, trustworthy assessment of testing – as a second pair of eyes on whether the testing has been sufficient. It is just common sense.

If we don’t implement QA as a gate, then how should we do it? The Agile way to do it is to have QA work with the team on a continuing basis, examining the team’s test strategies, spot checking actual testing code, and discussing the testing strategies with the various stakeholders to get their thoughts. QA should have a frequent ongoing presence as development proceeds, so that when the application is ready for release, no assessment is needed – it has already been done – and in fact it has been used to help the team to refine its approach to testing. QA effectively becomes an independent test on the testing process itself – a feedback loop on the testing feedback loop.

How does QA decide how much testing is enough? I.e., how does QA decide what level of coverage is sufficient? That is a hard question. For functional testing there is a fairly straightforward answer: the organization should start tracking the rate at which bugs are found in production, and correlating that metric with the test coverage that was measured when the software was built. Over time, the organization will build a history of metrics that can provide guidance about how much coverage is effective in preventing production bugs – with sufficient assurance for that type of application.
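
To make the idea concrete, here is a minimal sketch of correlating the two metrics once some history has accumulated. The `pearson` function is the standard correlation coefficient written out by hand; the parameter names and the notion of “production bugs per release” are our assumptions about how an organization might record the history.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        raise ValueError("a series with no variation has no correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coverage_vs_bugs(coverage_pct_by_release, prod_bugs_by_release):
    """Correlate measured test coverage with bugs later found in production.

    A strongly negative value suggests that, for this kind of application,
    releases built with higher coverage tended to have fewer production bugs.
    """
    return pearson(coverage_pct_by_release, prod_bugs_by_release)
```

A correlation over a handful of releases is weak evidence, and it says nothing about cause, so a result like this is best treated as one input to QA’s judgment rather than as a threshold to enforce.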

In the previous article we mentioned that story development should include analysts, developers and testers: that one should consider testing strategy as an output of each story’s development, since each story might have unique testing requirements. We have found it very effective when testing or QA teams contribute to that discussion, so that the test plan evolves during the story writing rather than after software has been produced. The testers help write acceptance criteria during the story writing sessions. One of the great advantages of this is that the developer knows exactly how the story will be tested, which helps guide the implementation.

This is a little harder to do for other kinds of requirements, e.g., security requirements, performance requirements, maintainability requirements, and so on, but the concept is the same: accumulate operational robustness metrics over time and use those to inform judgment about the level of testing that is needed. We suggest that leveraging architectural themes will help teams keep an eye on key issues such as these.

Back to the table’s “Coverage” column. Consider the example for functional tests shown as row one in the table: we specify Cobertura for #1 (measuring coverage). But Cobertura checks code paths traversed: it does not check that you have actually coded everything that needs to be tested. Thus, #2 should be something like, “Express story level acceptance criteria directly in JBehave Gherkin”. That ensures that nothing gets left out. In other words, we will be using “executable test specs”, or “behavioral specifications”. Finally, as to what coverage is sufficient, we might specify that since we want a really, really robust application, we need 100% code coverage.
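
The gap that a code coverage tool cannot see is an acceptance criterion for which nobody wrote a test. One simple way to assess that, sketched below, is to give each acceptance criterion an identifier and tag each executable scenario with the criteria it exercises. The identifier scheme (`AC-1`, `AC-2`, ...) and the function name are hypothetical; how scenarios are tagged depends on the tooling a team actually uses.

```python
def uncovered_criteria(criteria_ids, scenario_tags):
    """Return acceptance criteria that no scenario claims to exercise.

    criteria_ids:  iterable of criterion identifiers from the story.
    scenario_tags: mapping of scenario name -> list of criterion identifiers.
    """
    covered = set()
    for tags in scenario_tags.values():
        covered.update(tags)
    return sorted(set(criteria_ids) - covered)
```

For example, with criteria `AC-1` through `AC-3` and scenarios tagged only with `AC-1` and `AC-2`, the function returns `["AC-3"]`: a criterion that 100% code coverage of the existing tests would never have flagged.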

The real intent behind coverage is that the more important parts of the application are well covered. We typically do not see 100% coverage over entire application code bases, but that is a nice stretch goal. The most important part of coverage though is that coverage does not stagnate at anything less than 70% and steadily grows over time.
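
A build can check this rule of thumb automatically. The sketch below is one possible reading of it, and the reading is our assumption: coverage at or above the floor passes, while coverage below the floor passes only if it has grown since the previous measurement.

```python
def coverage_is_healthy(history, floor=70.0):
    """Check a series of coverage percentages, oldest first.

    Returns True if the latest value is at or above the floor, or if it is
    below the floor but higher than the previous measurement. A single
    measurement below the floor fails, because growth cannot be shown yet.
    """
    if not history:
        raise ValueError("no coverage measurements recorded")
    latest = history[-1]
    if latest >= floor:
        return True
    return len(history) >= 2 and latest > history[-2]
```

A team that also wants growth above the floor could compare the latest value against the best value seen so far instead; the point is only that stagnation at a low level should be visible, not silently accepted.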

Coverage is more difficult to specify for non-functional types of testing. For example, how would you do it for performance tests? The requirement is most likely expressed in SLA form, such as, “The system shall be up 99.99% of the time”, and “The response time will not be more than 0.1 second 99% of the time”.

To test the response time, we can write “executable” specs of the form, “Given that the system has been operating for one hour under a normal load profile (to be defined), when we continue that load for one hour, then the response time is less than 0.1 second for 99% of requests.” Of course, not all performance testing tools provide a language like Gherkin, but one can still express the criteria in executable “given, when, then” form and then write matching scripts in the syntax required by the load testing tool.
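
Whatever tool generates the load, the “then” clause comes down to a check over the recorded response times. A minimal sketch of that check follows; it assumes the load tool can export per-request response times in seconds, and the function names are ours.

```python
def fraction_within(response_times_s, limit_s=0.1):
    """Fraction of requests that completed in less than limit_s seconds."""
    if not response_times_s:
        raise ValueError("no response times recorded")
    within = sum(1 for t in response_times_s if t < limit_s)
    return within / len(response_times_s)


def meets_response_time_spec(response_times_s, limit_s=0.1,
                             required_fraction=0.99):
    """The 'then' clause: at least 99% of requests under 0.1 second."""
    return fraction_within(response_times_s, limit_s) >= required_fraction
```

The “given” and “when” clauses (warming the system for an hour, then holding the load for another hour) belong in the load testing tool’s own scripts; this function only judges the samples collected during the second hour.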

Testing the up-time requirement is much harder: the only way to do it is to run the system for a long time, and to design in hot recovery mechanisms and lots of redundancy and elasticity. Defining coverage for these kinds of requirements is subjective and is basically a matter of checking that each requirement has a reasonable test.
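
Some quick arithmetic shows why a long run is unavoidable. The helper below converts an availability percentage into the downtime it permits over a period; the function name is ours.

```python
def allowed_downtime_minutes(availability_pct, period_days):
    """Downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)
```

At 99.99%, a 365-day year allows only about 52.6 minutes of downtime, and a 30-day month allows about 4.3 minutes. A test run of a few days that shows no outage therefore says very little about whether the target will hold over a year.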

The coverage requirement for Exploratory testing is interesting: In the example of Table 1, we list it as “Amount of time/effort should be indicated by the story”. In other words, for exploratory testing, decide this when the story is written: decide at that time how thorough the exploratory testing should be for that story. This gets back to writing stories that focus on outcomes, as we discussed in Part 1.

Most likely some narrative will be needed to explain the table entries, which need to be concise to fit in a cell. If the test strategy is maintained on a wiki (strongly encouraged), it is good Agile practice to use it as the scratchpad for thinking about testing and for recording decisions on how testing is actually being done. It is not a document that one creates and then forgets about: it is a living, evolving thing.

(Note: We consider SharePoint to be a wiki if (a) everyone on a team can edit the pages, (b) everyone on the team can create new pages, and (c) full change history is maintained; but if you use SharePoint, don’t upload documents: create the content right in SharePoint, as pages.)

The test strategy should inform the decisions on what environments are needed: this is an integrated decision that should be driven by the testing strategy. Since it takes time to provision environments, or to write scripts that provision cloud environments, this means that the testing strategy is something that is needed very early – enough to allow for the lead time of getting the environments set up. That is why testing strategy should be addressed during release planning, aka “sprint 0”. Ideally, a team continues to use a very similar testing process from one release to the next, or one project to the next, so that the environment architecture stays pretty much the same, and that way you always know what types of environments you will need.

We believe that the term “QA” is a misnomer. We prefer the term “Quality Informers”. Because the vast majority of people who make up QA teams are not allowed to touch source code, not allowed to hire and fire, and not allowed to impact or alter budgets, they clearly have no enforceable means of quality assurance. What they do really well, though, is inform us about the current state of the system under test. This is an important point when you consider the previous paragraphs where we talked about informing and feedback.

Not everything can be automated

Automation is central to Agile, and it is essential for continuous delivery. But not everything can be automated. For example, these things cannot usually be automated:
1.    Exploratory testing.
2.    Focus group testing and usability testing.
3.    Security penetration testing. (Basic automation is possible.)
4.    Up-time testing.
5.    Testing on every single target mobile platform.

To deal with these in the context of continuous integration (CI) and continuous delivery, the CI/CD process needs to focus on the tests that need to be run repeatedly, versus tests that can be done with sufficiently high confidence that things will not change too much. By automating tests that are repeatable, we free up more time for the type of testing that cannot be automated. For example, if usability testing is done once a month, that might be sufficient unless the entire UX paradigm changes. Security penetration (“pen”) testing should be done on a regular basis, but real (manual) penetration testing is an expensive process and so there is a cost/benefit tradeoff to consider – in many cases (depending on the level of risk), automated pen testing is sufficient, with perhaps expert manual pen testing done on a less frequent basis. Up-time can really only be tested in production, unless you are testing a new airplane’s system software before the first shipment and have the luxury of being able to run the software non-stop for a very long time.

Today’s mobile devices present a large problem for platform compatibility testing. There are many different versions of Android and of Apple’s iOS out there, and many versions of Android differ significantly from one another. Fortunately, there are online services that will run mobile app tests on many different devices in a “cloud”. Even Apple’s OS X operating system can now be run in a cloud.

End Of Part 2

At this point, test-driven development (TDD) proponents are jumping up and down, feeling that their entire world has been sidestepped by this article, so in part 3 of this article we will start off with that. Also, while the Who column of the test strategy table provides a means for coordinating testing-related activities performed by multiple parties, we have not talked about who should do what. For example, some testing activities – such as performance testing and security testing and analysis – might require special skills; and if there are multiple teams, perhaps each working on a separate sub-component or sub-system (possibly some external teams), then how should integration testing be approached? We will examine these issues in a later part of this series.

Authors (alphabetically):
Scott Barnes
Cliff Berg

Monday, December 8, 2014

Real Agile Testing, In Large Organizations – Part 2
