Interview with Madhur Kathuria

Madhur Kathuria has coached nearly 300 teams for almost 75 clients across the US, Europe, South East Asia, Malaysia and Thailand. In this interview he talks about some of the cultural challenges for agile adoption. Read it here.

Interview with Elena Yatzeck

Elena was Chief Agilist for JP Morgan Chase Treasury Services and is now a VP of Corporate Compliance Tech. Find out how JP Morgan Chase reconciles agile with compliance and risk management demands. Read it here.

Sunday, September 28, 2014

Key Transformation Step: Establish Agile Release Planning

They come to us at the last minute, and then complain that we can’t create their production test environment when they want it!

They come to us at the last minute, and then despair that we don’t have any testers available and also wonder why they don’t have any automated testing set up!

They come to us at the last minute, and then wonder why the security review is going to delay their application’s deployment!

They scheduled their performance testing for the end, and then wonder why they have no time to fix the problems that were found!

They ignored the enterprise architect’s advice and designed their app with no regard for enterprise standards, and then wonder why people are upset!

They talk about continuous delivery, but we are the ones who have to operate their app, and they have not designed it to be easy to operate, and so they wonder why we are having trouble restarting instances!


These are common laments from “functional managers” in large organizations that are trying to go agile. Functional managers are those who are responsible for traditional development lifecycle “steps”, such as enterprise architecture, security, testing, quality assurance, release management, deployment, operations, and other “silos” of a typical legacy IT organization. Agilists often blame these silos for the problems, but that is not helpful, and in fact agilists are brought in to help to streamline things, so they need to have answers about how to replace these functions with their agile equivalents.

Replacing functions is a huge step, however. A great deal of learning is required for an organization to go agile, and most organizations need intermediate steps. One of the most effective intermediate steps – one that fosters the required learning – is beefing up the release planning process to make it more inclusive. In other words, invite each of these silos: get them in a room and talk through the issues, so that nothing is left for the end of the release when there is no time left. Give people advance notice of what is coming, so that they have time to manage their own processes and their own workloads.

Busy functional managers will not always have time to attend each project’s release planning session. They will want to send someone on their behalf – one of the technical staff. That means that you need to give them advance notice of a release planning session and what the project is about, so that they can see who is available and send the right person. You also should tell them that it will be optimal if that person can be the one who continues to work with the team throughout development, as needed: that might affect who the functional manager sends, based on that function’s resource work schedule.

Not everyone needs to be in a release planning meeting from beginning to end. A release planning meeting can take from an hour to several weeks, depending on whether the project is new and how complex it is. If you expect it to take more than a few hours, it is best to plan which issues will be discussed when, and invite the right functional representatives for those time slots: they will be appreciative. However, there might be some topics for which you want everyone there. More on that in a moment.

Release Planning Topics

So what happens during this type of release planning? What things should you talk about?

The answer is conceptually simple: talk through all of the top level issues that affect what will be done, who will do what, how it will be done, and how things will work. I call these the conceptual pillars:
1.    The core vision and high level requirements.
2.    The team roles and collaborative processes.
3.    The core design concept.
4.    The development pipeline.
5.    The testing strategies.
6.    The external dependencies.

Most agile teams focus a great deal of attention on #1 – the release backlog. That is indeed the foundational element: it is what you will be building. So I am not going to say any more about that: everyone reading this knows all about it. There are books on how to do it, and it is part of core agile training. Teams also generally spend time on #2 – team roles – but not sufficiently, so I will discuss that below.

The rest of the pillars are things that teams often miss. These things are really important for continuous delivery (CD). Without nailing these, you will have a hard time getting to CD: you will find things not working well. Let’s take them one by one.

#2: Team roles and collaborative processes

Many teams come up with a list of team roles beyond the fairly common Scrum roles of Product Owner, Scrum Master, and Team Member. The additional roles accommodate the “silo” processes that are imposed by the organization. Remember, baby steps ;-) Such “extended team” roles might include test programmer, acceptance tester, QA analyst, security analyst, agile coach, project manager, tech lead, enterprise architect, release management liaison, data center engineer, and so on. The point of this discussion is not to get into what each of these roles might do and whether it is needed: it is assumed here that the organization currently requires these roles, and the discussion here is how to accommodate them so that everyone can do their job in the most agile way possible.

Agile teams make a lot of decisions in ad hoc discussions. The problem is, when there are extended team roles such as those listed above, those roles are easily excluded from the ad hoc discussions, yet the discussions often impact those functions. It works the other way too: the extended team roles often make choices that affect the team. All of this calls for more communication: the extended team and the immediate team (the development team) must collaborate on an ongoing basis, as needed. During release planning, it is important to discuss this issue with each of the extended team roles, and collectively decide on the best method for collaborating in an ongoing manner and keeping each other up to date. Simply asking everyone to join the standup might not be the best way: there might be too many people, and they might not all be available at the standup time. Work out what makes sense. I am not going to propose an approach here, because there are so many ways, and each functional area will have different collaboration needs.

#3: The core design concept

The Scaled Agile Framework (SAFe) talks about “architecture runway”. Scott Ambler has long talked about “agile architecture” and “agile modeling”. Feature Driven Development (FDD) talks about the importance of modeling the business at the start of a project (great explanation here) to establish a shared conceptual understanding of key data and relationships. (Note: I was on the Singapore project that the article refers to.) Having early, face-to-face discussions about the primary models and design patterns is extremely important. This is not “big design up front” (BDUF), in which too many details are figured out too early. Instead, early discussions about models and design get everyone on the same page, using “broad brush strokes”. This greatly catalyzes future discussion and increases the “emotional intelligence” of the team as a whole. It echoes Peter Senge’s work in The Fifth Discipline: surfacing and exchanging mental models of how things work, and working to reconcile them.

Up front high level modeling and analysis also informs decisions about what kinds of testing will probably be needed, what components and services will probably be needed, what the major interfaces of the system will probably be, and who needs to be involved when, because all of these choices vary with different technology stacks and different design requirements. And I emphasize the word “probably” in all this because up front decisions are always subject to change: they are merely a starting point.

There is no better time to establish the architecture runway than at the start of the project (or release), in an all team discussion, including those members of the extended team who might be affected. For example, some technologies are easier to deploy using automation than others, and a data center representative should be present for discussions about how the design might affect deployment – that is, if continuous delivery is important to you, and if the development team does not have direct access to the production deployment environment.

#4: The development pipeline

The “pipeline” is the end-to-end process from the Product Owner’s conceptualization of features through deployment, operation, and maintenance of an application. It even extends farther than that, but this scope is fine for this discussion.

Defining the pipeline consists of talking through and deciding what will happen when, why it is happening, how it will be done, who will do it, where it will occur, and what the acceptance criteria are. It extends to everything that is involved in the planning, creation, release, and operation of the software. Not all of these decisions need to be made up front, but it is crucial to get everyone together and agree on the basic outlines of the process, and identify decisions that still need to be made and who “has point” on those decisions. This is about designing the continuous integration and continuous delivery process as a whole, looking at it as a system.

When defining the pipeline, make sure that you also define how progress will be tracked for every aspect of the pipeline. The software development team uses agile work management tools such as a story board, which makes its progress very visible. There needs to be equivalent visibility for the work of the extended team, aggregated so that the development team, as well as management, can see at a glance what the rate of progress is (“velocity”) and what is holding things up (“blockages” or “impediments”). It might be hard to aggregate this information because each silo area generally has its own work management tool, but this aggregation is perhaps something that the project manager can do, if there is one. Alternatively – and this is more agile – each extended team member can update a shared task board, possibly on a project wiki, that tracks the external tasks that the development team is depending on.
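To make the shared task board idea concrete, here is a minimal sketch of such an aggregation. The functional areas, task names, and statuses are all hypothetical; a real board would live in a wiki or tracking tool, but the point is the same: one structure that surfaces progress and blockages across all the silo areas at a glance.

```python
# A toy shared task board: each extended-team area reports its tasks and
# statuses into one list, so anyone can see progress and blockages at a glance.
# All areas, tasks, and statuses below are invented for illustration.

BOARD = [
    {"area": "security",     "task": "threat model review",   "status": "done"},
    {"area": "release mgmt", "task": "schedule prod window",  "status": "blocked"},
    {"area": "data center",  "task": "provision test env",    "status": "in progress"},
    {"area": "security",     "task": "pen test sign-off",     "status": "in progress"},
]

def blockages(board):
    """Tasks that are currently holding things up."""
    return [t["task"] for t in board if t["status"] == "blocked"]

def progress(board):
    """Fraction of external tasks completed so far."""
    done = sum(1 for t in board if t["status"] == "done")
    return done / len(board)
```

With this board, `blockages(BOARD)` immediately shows that the production window has not been scheduled, and `progress(BOARD)` gives a crude completion rate for the extended team's work.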

#5: Testing strategies

Agile teams know a lot about testing. Agile turns testing into programming: manual testing is replaced by test programming. This is not new – I did lots of this during the 1980s and I am sure many others did it way before that – but agile emphasizes its importance as an enabler for continuous integration, whereby code is not checked in until it passes all the tests, and this is done frequently – often many times a day.
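As a minimal illustration of test programming, here is a sketch in which the behavior of a hypothetical fare-calculation function is captured as executable tests. A CI server could run these on every check-in and refuse the build if any fail; the function and test names are invented for the example.

```python
# "Test programming": behavior is captured as executable checks, so a CI
# server can run them on every check-in. The function is a hypothetical
# stand-in for real application code.

def calculate_fare(base_price, discount_percent):
    """Apply a percentage discount, rejecting out-of-range discounts."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount must be between 0 and 100")
    return round(base_price * (1 - discount_percent / 100), 2)

def test_full_discount_yields_zero():
    assert calculate_fare(100.0, 100) == 0.0

def test_invalid_discount_rejected():
    try:
        calculate_fare(100.0, 150)
        assert False, "expected ValueError"
    except ValueError:
        pass  # the invalid input was correctly refused
```

In practice these would be collected and run automatically by a test runner such as pytest as part of the continuous integration build.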

There is a gap here though. Most agile testing practices focus on functional testing. Continuous delivery requires that we expand that, to include every kind of testing: failure mode testing, scalability testing, security testing, and whatever other kind of testing is required for production deployment. Also, many agile teams forget to plan for test data. If you will need complex test data from a business area, invite that business area to the release planning session and pin down how the test data will be created and when it will start to be available.

The Definition Of Done – Revisited

Teams often define a “definition of done” (DOD) that lists the criteria that all stories must meet in order for them to be considered “done”. The DOD usually specifies that all functional tests have passed. This is not easily extended to other kinds of tests, because non-functional tests are often not story-specific, and some kinds of tests, e.g., full performance tests, are not tests that you want to run many times during an iteration. We therefore need to expand the DOD concept in some way.

One approach that I have seen work is to have the DOD apply only to tests that are story-specific. These are generally acceptance tests. There needs to be an automated suite of acceptance tests to make this feasible, and they must be organized by story, built around each story’s acceptance criteria. That is pretty common agile practice. Other tests that are not story-specific should not be covered by the DOD, but instead should be run on a regular schedule during an iteration: nightly for some, or less frequently for others if they take a long time to run. Integration tests fall into this category, and they are often run when code is checked in. Again, it depends on the duration of, and resources required by, the tests.

Some failure mode tests are story specific, and others are not. For example, stress tests are generally not story specific, and they should be run on a regular basis, but not necessarily every time code is checked in.

Security tests are especially problematic. You might even wonder, What are security tests? Many teams now add code scanning to their CI process. Code scanning is not enough, though, if you want to have secure code. To have secure code, you need to make an “assurance argument” for each feature that you design, and you need to verify through analysis that the assumptions of that assurance argument are met. (There is a good discussion here, but an “agile” approach to this should be less structured and more based on the programmer’s knowledge of secure design patterns. That is the primary topic of my book High-Assurance Design. For secure design patterns, see also the book Core Security Patterns by Steel et al., and visit owasp.org for Web app security techniques.) Note that this is an analytical process – not a testing process. However, it can be turned into a testing process by testing that the assumptions of each assurance argument remain true. Tools such as Fortify can be used for this purpose. Not many agile teams do this today, but as security becomes more important, it is essential that teams start to learn these techniques, because security scanning will never be sufficient: it catches only a small fraction of vulnerabilities. (See this article.)

Test Coverage

To achieve continuous delivery, you have to have confidence that your automated test suite is verifying that everything that presents a substantial risk is covered. Notice that I said “substantial”. Life is full of tradeoffs: continuous delivery carries the great benefit that you can push changes to users as quickly as you want to, but it does not eliminate all risk. Some things will slip through your tests, so you need to make sure that the risk is acceptable – not zero.

The question is, how do you know how complete your tests are? For functional tests, we measure test coverage. Test coverage needs to measure how completely the requirements (functional and non-functional) are met. For continuous delivery, the non-functional tests become just as important as the functional ones. But how do you measure “coverage” for security? For performance? For reliability? For enterprise architecture compliance? For maintainability?

The first step is to actually define those requirements. The next is to add them as acceptance criteria, either at a story level or at a system level. The system level acceptance criteria apply to the tests that are not story-specific. Coverage for performance means that all performance requirements are met; the same applies to each other area. Work with the extended team members to define what coverage should mean for each of their areas, and how it can best be measured or assessed, as automatically and repeatably as possible.
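As a rough sketch of what non-functional “coverage” could look like, the requirements are listed explicitly and coverage is simply the fraction that have an automated check mapped to them. The requirement and check names below are invented for illustration; each area's real definition of coverage would come out of the discussion with its extended team member.

```python
# A sketch of coverage for non-functional requirements: coverage is the
# fraction of explicitly listed requirements that have an automated check.
# All requirement and check names are hypothetical.

NONFUNCTIONAL_REQUIREMENTS = [
    "p95_response_under_2s",
    "handles_1000_concurrent_users",
    "passwords_stored_hashed",
    "survives_single_node_failure",
]

AUTOMATED_CHECKS = {
    "p95_response_under_2s":  "perf_suite.test_p95_latency",
    "passwords_stored_hashed": "security_suite.test_password_hashing",
}

def nonfunctional_coverage():
    """Return (coverage ratio, requirements with no automated check)."""
    uncovered = [r for r in NONFUNCTIONAL_REQUIREMENTS
                 if r not in AUTOMATED_CHECKS]
    ratio = 1 - len(uncovered) / len(NONFUNCTIONAL_REQUIREMENTS)
    return ratio, uncovered
```

The uncovered list is the useful output: it tells the team exactly which risks are currently unverified by the pipeline, which is far more actionable than a single percentage.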

#6: The external dependencies

Dependencies on events beyond the development team represent immense risks that are generally beyond the control of the team. I am not talking about dependencies on the work that needs to be done by the extended team, because we already discussed that. Here I am referring to things that are beyond the control of even the extended team: things like the arrival of equipment from a supplier, the availability of a test instance of a system from a third party, etc. This is an area in which a project manager – if you have one – can really help: by persistently pursuing external dependencies.

Recap

Functional silos like this are not compatible with agile. Agile and continuous delivery work best if the development teams have full responsibility for, and control over, every step of the solution development and delivery pipeline, rather than relying on external parties to perform isolated steps independently. However, for the development teams to have full control, they need to learn the many skills and concerns that are represented by each silo area. QA exists for a reason. Release Management exists for a reason. That learning takes time.

Eventually the silos can be converted into training and coaching teams that teach development teams about security, regulatory compliance rules, test automation, deployment, and all the other things that these silos currently do. Some of the silo functions will go away entirely, or will transform: QA can shift from checking documents to performing actual assessment of risk coverage by the automated tests – including the non-functional tests. Security can shift from scrutinizing security plans to teaching teams how to conduct threat modeling and how to use tools such as Fortify more effectively. Testing can change from providing manual testers to teaching teams how to set up and use test automation tools. The silos change from being functions that “do” to functions that “teach”. That takes time though: all those functions need to change to make that possible. They must learn about using more automation, they must learn about teaching and coaching, and they must learn about how development teams work. Setting up a collaborative relationship between the teams and the silos at the start of each project is a crucial first step.

PDF here.

Monday, September 15, 2014

New section launched: Agile Around the World

With today's interview of Madhur Kathuria we launch a new section, Agile Around the World. In this section we focus on the way that regional and international cultures impact agile adoption efforts and the strategies that are most effective. Read the interview here!


Thursday, September 11, 2014

How Can Agile Support Business Agility?

At the end of 2012 Iberia Airlines embarked on a transformation to return to profitability. Some of the highlights of the transformation plan were:
1.    Stem Iberia's cash losses by mid-2013.
2.    Turnaround in profitability of at least €600 million from 2012 levels to align Iberia with IAG's target return on capital of 12 per cent by 2015.
3.    Network capacity cut by 15 per cent in 2013 to focus on profitable routes.
4.    Downsizing its fleet by 25 aircraft - five long haul and 20 short haul.
5.    Reduction of 4,500 jobs to safeguard around 15,500 posts across the airline. This is in line with capacity cuts and improved productivity across the airline.
6.    New commercial initiatives to boost unit revenues including increased ancillary sales and website redesign.
7.    Discontinue non-profitable third party maintenance and retain profitable ground handling services outside Madrid.
8.    The transformation will be funded from Iberia's internal resources.

Basically, Iberia needed to pivot – and pivot hard and fast. Here is a quote from Rafael Sánchez-Lozano, Iberia’s CEO: “Time is not on our side. We have set a deadline of January 31, 2013 to reach agreement with our trade unions. We enter those negotiations in good faith. If we do not reach consensus we will have to take more radical action which will lead to greater reductions in capacity and jobs”.

Negotiations with the union were ultimately successful and the company has nearly returned to profitability. One of the reasons cited for the success was that, “Iberia’s move to make 3100 staff redundant and axe 23 aircraft helped it cut its operating loss by €185 million to €166 million.” [http://www.independent.co.uk/news/business/news/iag-returns-to-profitability-on-iberia-overhaul-9160554.html] But according to the same article, “…cargo revenues collapsed by almost 12 per cent, and Walsh [CEO of IAG – Iberia’s parent company] warned not to expect too much, too soon. ‘It’s going to be a better year, but the environment is still lower than where we were at the peak.’ ”

From the article, “BMI and Vueling have been the only acquisitions, and today Walsh added: ‘We haven’t got anything planned at the moment, but we are always keeping our eyes open,’ and called on the Government to change its rules because ‘people want to come to London, but many don’t because they struggle with the visa regime.’ ”

How might agile fit into any of this? Certainly not in the union negotiations – not unless you change the collective bargaining process, but as they say, one should “pick one’s battles”. Certainly not in the lobbying of the government to change visa policies – that is an art that happens behind the scenes with little transparency (and that itself is not very agile and is a problem – but changing that is surely an even steeper hill to climb).

And one might presume that agile is not applicable in the consideration of possible acquisitions – or is it? Merger and acquisition (M&A) analysis is a highly complex financial and strategic analysis process, often supported by simulation models – along with judgment on the part of seasoned executives who have prior experience in M&A. Those kinds of decisions are not made using a “business model canvas” – there is too much at stake, and one cannot usually pivot if things don’t work out: it is game over – at least for the chief executive. Can agile apply to this process?

The kinds of complex financial modeling that these kinds of businesses do for an M&A involves defining multiple scenarios and creating a model for each, and often running simulations of the scenarios to account for external variables that might change over time (interest rates, costs, demand, and indirect factors that affect these) as well as various types of future events that would affect the outcome (e.g., a new competitor appears on the scene, or there is a labor strike). The parameters of these scenarios are subject to debate, as are the scenarios themselves. Collaboration between executives and the economic analysts would surely be very valuable, rather than trusting a team of analysts to come up with the numbers and merely present them as fact. High quality models should explicitly account for uncertainty, and that uncertainty is also open for debate, because this kind of modeling is an art – not a science. The scenarios are up for debate because a scenario presumes a strategy: strategies are defined as scenarios and then compared. Strategies could be cast in understandable terms, and then the details filled in by the economic modelers, with key parameters called out. Pivoting might even work in some cases: for example, an airline could try a new strategy on some routes before scaling up the strategy. However, if things have come to a point of crisis, there might not be time for trying things out or “failing early” – one might be forced into a situation of making an existential gamble.
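As a toy illustration of this kind of scenario modeling, here is a Monte Carlo sketch that compares two hypothetical strategies under uncertain demand growth and fuel costs. Every number is invented for the example; a real M&A or route-planning model would have far more variables, correlated risks, and explicitly debated parameter ranges, which is exactly where the executive–analyst collaboration comes in.

```python
import random

# A toy Monte Carlo comparison of two strategies under uncertainty.
# All parameters and units are invented for illustration only.

def simulate_profit(demand_growth_mean, fuel_cost_mean, runs=10000, seed=42):
    """Average profit across many simulated runs of one scenario."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        demand = rng.gauss(demand_growth_mean, 0.05)  # uncertain demand growth
        fuel = rng.gauss(fuel_cost_mean, 0.10)        # uncertain fuel cost index
        revenue = 1000 * (1 + demand)
        costs = 800 * (1 + fuel)
        total += revenue - costs
    return total / runs  # expected profit, arbitrary units

# Two hypothetical strategies, each expressed as a scenario:
expand = simulate_profit(demand_growth_mean=0.08, fuel_cost_mean=0.05)
consolidate = simulate_profit(demand_growth_mean=0.02, fuel_cost_mean=0.01)
```

The point of the sketch is that the scenarios and their parameter assumptions are what should be debated collaboratively, rather than having analysts present a single set of numbers as fact.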

Iberia was clearly in a crisis: according to Sánchez-Lozano, “As well as halting Iberia’s financial decline we will establish a viable business that can grow profitably in the long term.” Thus, the first element of the plan was to stop the bleeding, making cuts by “suspending loss making routes and frequencies and ensuring there is effective feed for profitable long haul flights.” This immediate triage required decisive executive action – an important form of leadership in a crisis but not always the best form for long term durable changes.

Managing in a crisis is actually easier than otherwise, because people will shift all of their priorities without resistance: survival is at stake. Things get more difficult when the crisis has passed. Enduring change for systemic problems is much harder to achieve. That’s why one must ask, How did we get here? Walsh said, “The boom years are definitely behind us, and I can’t see anything that will bring us back to those levels of growth.” So the environment changed: it was external factors in this case. Or was it? Why did Iberia not position itself for more resiliency? Airlines are highly competitive and operate on very thin margins, and Iberia has a union, so maybe it was not possible to build in a buffer for hard times. We cannot know from the outside.

Item 6 of the plan also gets my attention: new commercial initiatives. This is always a fertile area for agile methods – including the business model canvas. Another area that comes to mind is increasing efficiency, using Lean techniques. Especially given that the airline had to restructure, and implement mergers, there was surely an opportunity for collaborative management and the application of Lean methods for defining new and better ways to operate – perhaps even innovative non-hierarchical decision-making structures. And of course, having an IT function that is ready to execute with rapid delivery is the most obvious way to support business agility.

Even though much of agile has its roots in business, most people in the agile community today work in an IT setting, and that setting is the bulk of their experience. If you are an agilist in an IT setting, and would like to advocate for the use of agile ideas in a broader business context, the question is, How should one engage with business stakeholders on these issues? Business stakeholders generally have years of experience in business operations and often have MBAs and financial training. They also often have market facing experience that provides them with judgment about external opportunities and risks, as well as profit-and-loss responsibility experience that grounds their judgment about operational risks and makes them very results oriented. An IT agilist seeking to have a dialog with someone from a business area on how to apply agile in a business context – rather than merely an IT software development context – needs to win the trust of the business stakeholder. That stakeholder rightly feels that they know a lot and that they understand their business much better than someone from IT does. Talking to them is therefore an opportunity to collaborate – not to teach.

A good approach is to express an interest and listen, and bring lots of humility. These people do not need to be told that if they would merely “go agile” that all of their challenges would be solved. Nor will they be interested in learning an entirely new vocabulary, so it is best for an agilist to learn business equivalents for agile terms wherever there is one. Instead of “velocity”, say “throughput”. Instead of “spike”, say “prototype”. Instead of “fail fast, fail early”, say “do a prototype”. Instead of “team happiness”, say “team morale”. According to Linda Berens, an expert on human agility and a partner in HolacracyOne, “Communicate in the style the client most likely wants and using the words they most easily understand.” [https://lindaberens.com/consulting/communicating-well-with-non-technical-people/]

Business agility is a complex topic and we need to be careful not to presume that agile – as understood and experienced in an IT context – has all of the answers for how to run a business. However, many of the ideas are greatly applicable. The best way to find out which ones is to start a dialog, show an interest, listen before you speak, and then share your own experiences. Business people need to learn about today’s IT as much as IT people need to learn about today’s business – they simply do not know what they don’t know (that applies to both camps) – but to get them to listen, you need to get them to trust you, and the best way to do that is to acknowledge their experience, knowledge, vocabulary, and points of view.

As PDF.

Sunday, September 7, 2014

DevOps and Security – How Is It Possible?

Traditional IT approaches to security entail careful step-wise processes of design, review, testing, and monitoring. These steps are usually separated by gates: software is not permitted to progress to the next step until it has met the criteria of a gate review. How can such a careful and step-wise process be used in concert with a DevOps pipeline, which requires that software be delivered to production continually and with little delay?

The challenge is made more difficult by a fundamental fact about security: security is about negative requirements. That is, security requirements often state that a certain thing cannot be done, rather than that it can be done. Such negative requirements are notoriously difficult to test for, and creating automated tests for them is even more difficult. Yet DevOps relies on the ability to create automated tests for everything.
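As a small illustration of what testing a negative requirement might look like, consider a hypothetical access-control function: the interesting test is the one that asserts the forbidden action is refused, not the one that confirms the allowed action works. Such tests only cover the cases someone thought to write, which is precisely why negative requirements remain hard.

```python
# Testing a negative requirement: a regular user must NOT be able to read
# another user's records. The function is a hypothetical stand-in for a
# real authorization layer.

def can_read_record(requesting_user, record_owner, role):
    """Only the record's owner or an administrator may read it."""
    return role == "admin" or requesting_user == record_owner

def test_user_cannot_read_others_records():
    # The negative case: assert that the forbidden action is refused.
    assert can_read_record("alice", "bob", role="user") is False

def test_owner_can_read_own_records():
    assert can_read_record("alice", "alice", role="user") is True
```

Note the asymmetry: one missing negative test leaves a hole, whereas one missing positive test merely leaves a feature unverified.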

Before I go any further, I should say a little about the lay of the land – the scope of what is encompassed by IT security. IT security is often thought of in terms of the levels of a “stack”, such as (starting from the bottom) network, OS, application platform, applications, and user (this is an oversimplification). It is really, really important for applications to have a secure application platform, OS, and network: otherwise, the application cannot trust the services that it uses from those levels. In my discussion here, I am only going to talk about the application platform and up, because those are the things that I know something about.

Application platforms are software systems that enable programmers to create customized applications. Examples of such systems include databases and application servers. It takes a lot of time and effort to secure these platforms, and so it is advisable to create standard configurations that can be reused across all of your software projects. In the control-oriented parlance that is used by government agencies, you can have a fixed set of application platforms that meet the requirements of the applicable controls, and those controls can then be “inherited” by systems that use those platforms – assuming that the configurations are not changed. This approach is very “DevOps friendly” because no testing of the application platforms is required during development or deployment if the platforms use the standard configuration that has already been verified as secure. The only issue then is when to upgrade the platform when a new version is available: that is not a simple issue, but it is beyond the scope of this discussion.

The undiscovered country – the application level

The next level of the stack is the application level, and that is where the fun begins. However, before worrying too much about application level security, you should ask yourself how much you are worried about a targeted attack. A targeted attack is one in which someone singles out your organization and undertakes to penetrate it, spending perhaps months to achieve their goal. Not every organization is a likely target for this level of attack. Likely targets include organizations that handle large volumes of slightly sensitive information (for example, credit card numbers), or small volumes of highly sensitive information (for example, plans for nuclear weapons). If you are not in these categories, then your organization is probably safe from a targeted attack, and it is unlikely that someone will go to the trouble to discover the unique vulnerabilities in your custom application code.

If your organization does store sensitive information, then you are a potential target at the application level, and you should be thinking about how to secure the processes used to code and deploy your custom software applications. From a DevOps perspective, the goal is to automate as much security testing as possible.

The most common approach to automated security testing of application code is to use code scanning tools, aka “static analysis” or “static application security testing” (SAST). This is an old method, but it is continually refreshed to support new application languages. Static analysis has its roots in code quality analysis tools, going back to “lint” – a Unix tool that scans C language programs and finds potential bugs. Nowadays, one of the most widely used tools for code level security scanning is Fortify, but there are many others. One caveat: static analysis does not work very well for dynamic languages such as Ruby and Python.

Static code scanning is crucial for security, but it is not a silver bullet. In fact, it has severe limitations: (1) it finds only a fraction of the vulnerabilities [http://samate.nist.gov/docs/SA_tool_effect_QoP.pdf]; and (2) it generates a lot of false positives – and by a lot, I mean a lot: thousands of false positives for every actual problem found.

The false positive problem is not as bad as it sounds, because some tools allow a programmer to flag a false positive as having been checked, so that it is not generated when the tool is run again later, although doing that carries its own risks. The more severe problem is that static analysis only finds a fraction of the vulnerabilities. Consider the latest IT security debacle in the news, in which private photos of movie stars were released to the public. Many of these photos appear to have come from the personal Apple iCloud accounts of the actors, and there is evidence that those accounts were penetrated using brute force guessing of passwords (a “dictionary attack”) in the Find My iPhone web application. This web application was apparently vulnerable to this type of attack because it permitted the user to try passwords again and again, without limit. [http://theconversation.com/novice-mistake-may-have-been-the-cause-of-the-icloud-naked-celebrities-hack-31272] Static analysis would not have detected this type of error: it was due to an application logic choice made by the programmer.
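To see why static analysis misses this class of flaw, consider a minimal sketch of a login check. Nothing in the vulnerable version is syntactically suspicious – allowing unlimited password attempts is a missing business rule, not a code defect a scanner can pattern-match. The fix is an application-logic choice: lock the account after a few consecutive failures. The names below are illustrative, and a real system would compare salted password hashes and add delays and alerting.

```python
# Sketch: an application-logic defense (account lockout) against
# dictionary attacks. Static analysis would not flag its absence,
# because "allow unlimited attempts" is perfectly valid code.

MAX_ATTEMPTS = 5
failed_attempts = {}  # username -> consecutive failed attempts

def check_password(username, password, stored_password):
    """Return True on a correct password; lock out after repeated failures."""
    if failed_attempts.get(username, 0) >= MAX_ATTEMPTS:
        return False  # locked out; a real system would also alert and delay
    if password == stored_password:  # real code: compare salted hashes
        failed_attempts[username] = 0
        return True
    failed_attempts[username] = failed_attempts.get(username, 0) + 1
    return False
```

With the lockout in place, a brute force guesser gets at most five tries; even the correct password is rejected once the account is locked, forcing an out-of-band reset.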

There are other automated tools that can come to the rescue. For example, password cracking tools would have discovered the Find My iPhone flaw; but to know to use that tool requires familiarity with the range of security testing tools and what they are used for – there are so many tools, and one cannot blindly use them all. There are also “dynamic” tools that will try to penetrate a system while it is running, and those can be helpful. However, it is often difficult for these tools to know that something that seems ok is actually not ok. For example, how would a security scanning tool know that if an administrator logs in, that the administrator should not be able to read the data of users of the system? It cannot know, because such a constraint is a business rule, and tools cannot anticipate business rules.

The challenge here is quite large: any programmer can write a line of code that will subvert all of the security tools that are in place. Whether this is done intentionally or unintentionally (that is, whether it is an insider attack or merely an error), scanning tools cannot read the minds of programmers and product owners and deduce what is an intended action versus an inappropriate one. The only real solution is for programmers to educate themselves about application security, so that they make fewer unintentional mistakes. There are voluminous online resources for learning about application security, such as https://www.owasp.org. (See in particular [https://www.owasp.org/index.php/Testing_Guide_Introduction].) The organization could also provide incentives for developers to obtain CSSLP certification. This is not a lightweight certification: it requires a lot of work to obtain, and it is highly worthwhile.

Your team should also employ security analysis practices such as threat modeling: a collaborative session in which the team sits in a room and tries to come up with ways to penetrate the application, based solely on the application’s design and code. It is a thought experiment, and its value is that it gets everyone on the team thinking about security.

Your architects or technical leads should also be thinking about secure design patterns, applying concepts such as compartmentalization, least privilege, separation of duties, and privileged contexts (for a discussion and design pattern, see my book High-Assurance Design, p. 219). They should also be choosing robust security frameworks, so that application developers have a trustworthy toolset for implementing security functions such as authentication and authorization.

Before you undertake to secure all of your code, think about where the risks really are. Generally, only certain modules of an application perform sensitive operations. Identify those, and focus your security testing and code review efforts on them. It is a waste of time to do threat modeling on parts of a system that do not do anything sensitive and do not connect to anything sensitive. A good starting point is to consider what the application’s “attack surface” is.

Another thing to consider is how secure your development environment is: what good is your application level security if a hacker can get into your development git repository – possibly in a public cloud protected only by a password that is itself not well protected – so that the hacker can inspect your code for vulnerabilities or possibly even insert some malicious code into it?

It is also important that security be treated as a first class application requirement. In an agile context, that means writing security stories that are put into the backlog. (Here is a good article: http://www.infoq.com/articles/managing-security-requirements-in-agile-projects) It also means adding security acceptance criteria to all stories that have security ramifications. Knowing enough to do that requires a focus on security, as a critical application concern, from the outset. It requires having an application security expert on the team, involved from day one. And it requires a product owner who understands the importance of security: for example, if there is a stakeholder team that communicates requirements and concerns to the product owner, there should be a security stakeholder on that team.

There is also the problem of external partners: other organizations that build pieces of a system that you then connect to and trust. Attacks often occur at the weakest link. In the attack on Target, in which millions of credit card numbers were stolen, the attack began with an attack on Fazio Mechanical, a heating, air conditioning and refrigeration company – a firm that Target had contracted. [http://krebsonsecurity.com/2014/02/email-attack-on-vendor-set-up-breach-at-target/] Target had no direct control over how secure Fazio’s systems and procedures were, yet Target gave Fazio access to Target’s systems. The lesson here is that if you have external partners, treat them as a potential weak link, and only give them the least privileged access that they need, and monitor their access with intrusion detection tools.
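The “least privileged access” advice for partners can be made concrete with an explicit allow-list: a partner account is granted only the operations it genuinely needs, and everything else is denied and logged for the monitoring side. This is a sketch only; the account and operation names are hypothetical (loosely inspired by the HVAC-vendor scenario), and a real system would enforce this at the access-management layer, not in application code like this.

```python
# Sketch: least-privilege access control for an external partner account.
# Operations not explicitly granted are denied and recorded, so that
# intrusion detection can watch for a compromised partner probing around.

PARTNER_ALLOWED_OPS = {
    "hvac-vendor": {"read_store_temperature", "submit_maintenance_ticket"},
}

denied_log = []  # feed to monitoring / intrusion detection

def authorize(account, operation):
    """Allow only explicitly granted operations; log everything else."""
    if operation in PARTNER_ALLOWED_OPS.get(account, set()):
        return True
    denied_log.append((account, operation))
    return False
```

Under this model, a compromised vendor credential cannot reach anything beyond its narrow grant, and every denied probe leaves a trail.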

Deployment processes are application level code!

People often think of deployment processes as part of infrastructure and support, but DevOps changes that. An important aspect of DevOps is to implement all processes as code that can be put under configuration control and made automatically repeatable. That means that the people who create the deployment processes are writing code, and if this code is unique to the application, then it is best treated as application code, with all of the same quality considerations – including security. This means that all of the practices that you adopt with respect to writing secure code should apply to the deployment scripts and deployment configuration files.

In a DevOps world, deployment does not stop at putting the code onto servers: deployment is about building the process for putting code onto servers, and that includes testing – which should include security testing. You do not want to burden your development build process with full blown security testing each time a developer checks code in. Instead, there should be a series of progressively more thorough security tests the farther along you are in the continuous delivery pipeline. For example, the development build process should include basic static scanning, as well as other kinds of scanning by selected tools. Down the line there should be processes that run periodically to perform script based penetration testing, password cracking, and “fuzzing”. Manual penetration testing should also be done on a periodic basis – not merely at the end before release. Otherwise, there will be no time to fix problems within the scope of the agile timeline.
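The “progressively more thorough” idea can be sketched as a simple stage-to-checks mapping, where each later stage runs everything the earlier stages ran plus heavier checks. The stage and check names here are placeholders, not a prescription; a real pipeline would wire each name to an actual tool (static scanner, fuzzer, password cracker, pen-test scripts).

```python
# Sketch: progressively more thorough security checks across a
# continuous delivery pipeline. Fast checks run on every commit;
# slow, heavyweight checks run in later, less frequent stages.

PIPELINE_STAGES = [
    ("commit-build", ["static_scan"]),
    ("nightly", ["static_scan", "dynamic_scan", "dependency_check"]),
    ("pre-release", ["static_scan", "dynamic_scan", "dependency_check",
                     "fuzzing", "password_cracking", "scripted_pentest"]),
]

def checks_for(stage_name):
    """Return the security checks configured for a pipeline stage."""
    for name, checks in PIPELINE_STAGES:
        if name == stage_name:
            return checks
    raise ValueError(f"unknown stage: {stage_name}")
```

The design choice this illustrates is that developers get fast feedback on every check-in, while the expensive testing still happens often enough that findings can be fixed within the normal iteration cadence rather than piling up before release.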

Designing all of these processes is a significant architectural effort. It is not possible to have a standard process that will be appropriate for all applications. The types of security testing that are appropriate depend on the nature of the application and on its design. Design of the security aspect of the testing pipeline is therefore a companion activity of the application design. That said, it is possible to reuse the security testing pipeline to a large degree if the basic application architecture is reused.

The user – the weakest link

In the Target security breach, one can blame the security procedures of Fazio, yet it turns out that Fazio was compromised through the use of an email malware attack. This implicates users of Fazio systems – users who might have naively clicked on malware attachments or phishing hyperlinks. In the end, users are the weakest link.

The only way to mitigate security mistakes by users is to educate them. Users who do not have sufficient knowledge of the ways that they might be tricked into compromising systems should not be given access to those systems: it is that simple. Just as driving a car or flying a plane requires expertise to do safely, so does using computer systems that manage sensitive information. If you want to see the range of ways that users can be tricked, check out my own taxonomy of social engineering methods in section 5.2 of this chapter.

In the final analysis, security is not a DevOps issue: it is a human issue.

PDF available here.

Thursday, September 4, 2014

Does Organization Culture Impact Strategy?

Zappos, an online shoe retailer, has been in the news a lot lately because of its adoption of a “holacratic” operational model in lieu of a traditional management hierarchy. Zappos expects all of its employees to make their own decisions about everything and to work things out with others through collaboration. To scale this, it has defined a governance structure for overlapping teams and how those teams make decisions. It is significant that Zappos only hires people who can thrive in this type of environment. The Zappos core values are:

1.    Deliver WOW Through Service
2.    Embrace and Drive Change
3.    Create Fun and A Little Weirdness
4.    Be Adventurous, Creative, and Open-Minded
5.    Pursue Growth and Learning
6.    Build Open and Honest Relationships With Communication
7.    Build a Positive Team and Family Spirit
8.    Do More With Less
9.    Be Passionate and Determined
10.    Be Humble

If your organization is made up of people who have these values, then you pretty much have no choice but to let them have a great deal of autonomy about how they work: they will not be successful any other way, because these are all highly individualistic, creative, and self motivated people.

Now consider a government agency. The mission and core values of the US Forest Service are:

1.    Safety - Safety will never be compromised, regardless of the work at hand.
2.    Tradition, Pride and Respect - Don’t rest on the Hotshot name!
3.    Physical fitness and mental toughness - Rise to the occasion, strive to be the best.
4.    Productivity - An honest day of work in return for an honest day’s pay, regardless of assignment.
5.    Unity and Diversity - Communication and teamwork get the job done.
6.    Professionalism - Constantly being watched and evaluated… the details matter.
7.    Training - MIHC is committed to training quality leaders, both in the classroom and on the line.
   
Do you see any mention of passion? Of embracing change? And notice that “training” replaces “pursue growth and learning”, indicating that employees expect to be trained rather than learning on their own. There is nothing wrong with these values – they are all good things – but they emphasize different things such as safety (some Forest Service workers do field work), tradition, and professionalism.

This brings to mind the comments made in a recent article by conservative pundit Grover Norquist about his experiences at the very bohemian Burning Man festival in Nevada: “The story of Burning Man is one of radical self-reliance…A community that comes together with a minimum of ‘rules’ demands self-reliance – that everyone clean up after themselves and help thy neighbor. Some day, I want to live 52 weeks a year in a state or city that acts like this…This is hard work. Indeed, there is entirely too much work involved at Burning Man for lazy people to get to the Playa, nevermind build a camp or feed yourself.”

The last sentence is significant: people who are not interested in, or not able to meet, the expectations of the Burning Man ethos do not attend it. The “radical self-reliance” that one sees at Burning Man is a result of its culture and the kinds of people it attracts. Culture is not only a result of learned patterns of behavior and experiences: it is also a result of the natural tendencies of the individuals in a population. Nature versus nurture: they both matter.

Given this, one cannot expect two very different workforces to react the same way given the same approach from their executives. It is therefore absolutely crucial to understand the culture when planning for transformation.


One of the most respected methods of formal cultural assessment is the Barrett Cultural Values Assessment. The Barrett approach involves having staff take this assessment, and then a facilitator helps them to discuss the results, which enables them to account for their own cultural biases when they plan for change. In this approach, leadership is about guiding this process.

This sounds like it might be generally applicable to all organizations, but then consider the case of Steve Jobs, who was famous for being autocratic, yet he made Apple into a huge success – twice. Barrett points out that people who have exceptional “charisma” and “reputation” can get away with being autocratic because they inspire people: people will put up with command-and-control behavior if they trust that the leader is taking them somewhere and they feel that they are making a contribution. They “feed their self-esteem by association, and shared identity,” as Barrett puts it. One might call exceptional people such as Steve Jobs “unicorns” – you hear about them, but few have seen them, and so what works for them might not work in a general repeatable manner for others. The same applies to organizations: what works for “unicorns” like Netflix, Amazon, and Google – or Zappos – might not work for other organizations that have different cultures, or different levels of financial resources for making things work at any cost.

Unicorns aside, let's use an example to think this through. Consider the very last impediment listed on the chart from my July 25 post: “Technical skill silos need to be broken down”. To address that, it will likely be necessary to change how many of IT’s traditional functions work, and that will require getting senior IT functional managers to meet and discuss it. Since these individuals have been operating in silos (per the impediment’s proposition that silos need to be broken down), the manager of each silo has been operating in a kind of zero sum game with respect to the other silo managers, and so it is almost inevitable that there is an atmosphere of competition among them rather than one of collaboration.

Changing that behavioral dysfunction will take more than getting them to understand themselves better – it will require some very aggressive and meaningful action on the part of the CIO, possibly including changing compensation incentives, reorganizing the functions and how budgets are allocated, and creating team efforts that can only succeed if each of these individuals makes an effort – in other words, truly putting them all on the same team rather than arranging them as competitors. These are structural changes. The real question is how those changes will be conceived and implemented: (A) autocratically, with a dictum to come up with a solution without any facilitation or guidance – i.e., “make it so”; (B) through micromanagement that dictates the details of every change; or (C) through a facilitated collaborative process somewhere in between the extremes of A and B – one that is sensitive to the organizational culture that exists at that point in time and that allows the staff to participate in defining the changes, including the required cultural changes.

Which do you think would work best? But even if you pick C, there is still the question of how much guidance to give, and the answer to that depends on the organization's culture.