This is part 3 in a series of blog posts dedicated to helping companies learn what it takes to achieve a Zero Trust security architecture of their own, much like Google’s BeyondCorp.
For a deeper technical dive, ScaleFT is offering a free chapter of the O’Reilly book titled Zero Trust Networks: Building Secure Systems in Untrusted Networks. Get your free copy today!
My previous post in this series focused on the data you should collect to better understand your current environment and put yourself in a better position to migrate towards Zero Trust. From those exercises, you should have a concrete inventory of your users and their devices, a clear picture of your network architecture, and a good look into your traffic patterns. This information will lead you to form the right access policy framework for your organization.
In their BeyondCorp papers, Google talks extensively about the challenges they faced when forming their own policy framework. To be effective, the policies have to be enforceable company-wide, cover a wide range of environments, and be easy to manage. This was no easy task for them, but we can learn from their experience.
Before diving into the policies themselves, it’s a good idea to level set by focusing on the desired outcome - a more secure corporate architecture that doesn’t impact user productivity. We know that removing trust from the network eliminates common attack vectors, but one must be careful when deploying sensitive company applications and services to the public Internet. With Zero Trust, every request must be fully authenticated, authorized, and encrypted, with trust being determined based on dynamic user and device conditions. This model places extra emphasis on the authorization process as we’re no longer making a binary decision based on the network. It’s the role of the access policies to enable smarter decision making that factors in dynamic user and device conditions across ever changing landscapes.
Creating Job Stories to Understand Behavioral Patterns
We have one more exercise to perform to help define your policy framework. As I often mention in this context, I’m a proponent of the Jobs To Be Done framework for product development, and believe that the principles apply here. It’s a similar concept to creating user stories for various personas, but the idea is to do so in the context of the end user doing a job. In doing so, you can ensure you are delivering real value to those using your product.
Considering that the Zero Trust model is centered around an access model for corporate resources, the end user experience is crucial to get right. This was made abundantly clear from the top at Google during the BeyondCorp project: Sergey told the team it has to work and the users have to love it. What they learned throughout their 7-year implementation is that contextual decision making is better done with contextual understanding. In other words, the more you know about how your employees work, the easier it will be to create an effective policy framework.
To get there for your own implementation, the exercise to perform is to select a few of your users and a few of your corporate resources to create stories around how they should and should not be accessed. For example, Alice is a build engineer who needs access to the Jenkins instance. During a build, she is known to be logged in to the dashboard from her company-issued laptop, as well as logged in to the server over SSH to watch the logs. Following the Job Story format, this would read as:
When a release is ready, I want to open the Jenkins dashboard, so I can prepare and execute the build.
When a release is ready, I want to log in to the build server over SSH, so I can inspect the build log in real time.
These stories will tell you that a member of the Engineering team from a known device should be granted privileged access to the build server, and should be granted web access to the Jenkins instance during a release. Let’s break that down – you can extract a policy decision that factors in the user, her device, her role, the resources being accessed, and even a valid time window.
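To make that breakdown concrete, here is a minimal Python sketch of how Alice's two Job Stories could translate into a policy check. Everything in it is an assumption made for illustration: the AccessRequest and ReleaseWindow types, the resource names, and the "engineering" team label are hypothetical and not taken from BeyondCorp or any product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRequest:
    user: str
    team: str
    device_known: bool   # is the device in the company inventory?
    resource: str        # hypothetical resource name
    access_type: str     # "web" or "ssh"
    at: datetime         # when the request was made


@dataclass(frozen=True)
class ReleaseWindow:
    start: datetime
    end: datetime

    def contains(self, moment: datetime) -> bool:
        return self.start <= moment <= self.end


# Hypothetical mapping of each resource to the access type its story calls for.
ALLOWED_ACCESS = {"jenkins-dashboard": "web", "build-server": "ssh"}


def is_allowed(request: AccessRequest, window: ReleaseWindow) -> bool:
    """Grant access only when user, device, resource, and time all line up."""
    return (
        request.team == "engineering"
        and request.device_known
        and ALLOWED_ACCESS.get(request.resource) == request.access_type
        and window.contains(request.at)
    )


window = ReleaseWindow(datetime(2024, 1, 10, 9), datetime(2024, 1, 10, 17))
alice_ssh = AccessRequest(
    "alice", "engineering", True, "build-server", "ssh", datetime(2024, 1, 10, 12)
)
print(is_allowed(alice_ssh, window))  # True
```

The point is not the specific checks but that each clause maps back to something the Job Story told you about how the work actually gets done.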
Getting a few of these stories under your belt only further enhances your ability to make smarter policy decisions based on context. Try a few out for yourself and see what you get.
Defining Your Access Policy Framework
Every exercise so far has been preparation for the next phase - forming the right access policy framework. Up until now, we’ve looked to Google’s BeyondCorp as a guide, which we will continue to do, but it’s time to really start thinking independently, because what works for Google in this regard may not work for you. The results of this Job Story exercise will vary from company to company, and the associated policy framework should be able to adapt accordingly.
What elements to factor
First things first, you need to decide what elements you want to account for in the decision making process – a combination of attributes and state. This will form the foundation of the policy framework, and determine the authentication and authorization workflows as the decision will ultimately be based on data that is gathered.
The most common, and what most everyone is already familiar with, is user conditions – such as team membership, role, and whether they are in good standing with the company. With a Zero Trust model, user conditions are paired with a device for a better definition of identity at the time of the request. This includes device attributes – such as whether it’s in inventory and managed by the company, and device state – such as whether the OS is up-to-date and if the disk is encrypted.
You could easily start there and have the basis of a solid Zero Trust policy framework, but it’s worth considering additional elements such as multi-factor auth, geolocation rules or time-based controls. Be careful, though, not to introduce so many possibilities that the framework becomes hard for anyone to grasp. It’s my recommendation to start with the basics when first implementing access controls so that you can identify patterns and adjust accordingly.
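As a rough illustration of the basic elements described above, the Python sketch below groups them into user conditions, device attributes, and device state. The type and field names are my own assumptions, chosen to mirror the examples in the text; they are not a schema from any real system.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class UserConditions:
    teams: FrozenSet[str]
    role: str
    in_good_standing: bool


@dataclass(frozen=True)
class DeviceAttributes:
    in_inventory: bool
    company_managed: bool


@dataclass(frozen=True)
class DeviceState:
    os_up_to_date: bool
    disk_encrypted: bool


@dataclass(frozen=True)
class RequestContext:
    """User and device are paired to describe identity at request time."""
    user: UserConditions
    device_attributes: DeviceAttributes
    device_state: DeviceState


def meets_baseline(ctx: RequestContext) -> bool:
    """A hypothetical starting point: only the basic elements, nothing extra."""
    return (
        ctx.user.in_good_standing
        and ctx.device_attributes.in_inventory
        and ctx.device_attributes.company_managed
        and ctx.device_state.os_up_to_date
        and ctx.device_state.disk_encrypted
    )
```

Additional elements such as multi-factor auth or geolocation could later become extra fields on the context, which keeps the starting set small without closing the door on growth.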
How granular to make the rules
With the elements to factor under consideration, the next step is to determine the rules. Looking back to our Job Story exercise, we want to have rules that are closely associated with behaviors. This is where your potential threat model enters the picture as it’s important to grasp the sensitivity of the resources themselves, and how they should or should not be accessed. What would you do if a request from Alice came from an unknown device trying to get into a Finance application? You may also have specific compliance considerations such as a billing system under PCI-DSS or a user database under HIPAA.
A common pitfall to avoid when making a policy framework is setting too many fine-grained rules. Google ran into this challenge, and eventually found the right balance between coarse-grained rules handled by the access proxy service, and fine-grained rules handled by the resource backends. Their two examples from the research papers were:
Global rules: Usually coarse-grained and affect all services and resources. For example, “Devices at a low tier are not allowed to submit source code.”
Service-specific rules: Specific to each service or hostname; usually involve assertions about the user. For example, “Vendors in group G are allowed access to Web application A.”
If the rules are too complicated or too specific to an individual resource, it will be too challenging to eventually create the language to support them. Not to pick on AWS, but the IAM spec is a beast of a language, and requires a deep level of expertise to grasp. With your company-wide Zero Trust implementation, what you really want is a set of rules that anyone can understand and actually put into policy. In a similar vein as the elements to choose, it’s my recommendation to start with coarse-grained rules that can be applied during the request authorization process. Resource-specific rules are then part of the application logic. As you get more comfortable, you can start to introduce RBAC and ABAC rules into the global policies if identifiable patterns emerge.
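The split between global and service-specific rules can be sketched in a few lines of Python. This is an illustration under my own assumptions, not how Google's access proxy is implemented: the Request type, the tier and action strings, and the service and group names are all hypothetical, and the service rule is simplified to "only group G may reach application A".

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Request:
    user_groups: FrozenSet[str]
    device_tier: str   # illustrative values: "low" or "high"
    service: str
    action: str


# A rule returns True when the request is acceptable.
Rule = Callable[[Request], bool]


def low_tier_cannot_submit_code(req: Request) -> bool:
    """Global rule: devices at a low tier may not submit source code."""
    return not (req.device_tier == "low" and req.action == "submit-source-code")


def vendor_group_g_only(req: Request) -> bool:
    """Service rule (simplified): only vendors in group G may access the app."""
    return "vendors-g" in req.user_groups


GLOBAL_RULES: List[Rule] = [low_tier_cannot_submit_code]
SERVICE_RULES: Dict[str, List[Rule]] = {"web-app-a": [vendor_group_g_only]}


def authorize(req: Request) -> bool:
    """Apply the coarse global rules first, then any rules for this service."""
    rules = GLOBAL_RULES + SERVICE_RULES.get(req.service, [])
    return all(rule(req) for rule in rules)
```

Keeping the global list short is what makes the policy readable; anything that only one resource cares about stays in that resource's own list or in its application logic.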
You still have to grant access
Zero Trust doesn’t mean you never attest trust; it just moves the determination away from the perimeter to the endpoints themselves. Our goal is to make smarter trust decisions based on dynamic user and device conditions across an always changing landscape. The elements you are evaluating against and the rules you create still need to come together to form a decision - whether to grant or deny access. In a Zero Trust environment, however, it’s rarely a binary decision, leaving you with, ahem – a decision to make.
Google went with a tier-based model that follows the least privilege principle with BeyondCorp. They are continually processing user and device data, placing each into a Trust Tier. Every resource then has an associated minimum Trust Tier that must be met to gain access. For example, a user’s mobile phone may not be up to date, so the system will place it in a low Trust Tier. Accessing a sensitive HR application might require a higher Trust Tier, so any requests from that phone will be denied until the user self-remediates. Another similar model might be a scoring system, where each resource has a specific threshold and access is denied until it is met. The combination of elements makes up the score.
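Both models reduce to a comparison against a minimum, which a short Python sketch can show. The tier names, signal names, and weights below are assumptions I made up for illustration; the BeyondCorp papers do not prescribe these values.

```python
from enum import IntEnum
from typing import Dict


class TrustTier(IntEnum):
    # Hypothetical tier names; higher values mean more trust.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def tier_allows(assigned: TrustTier, required: TrustTier) -> bool:
    """Tier model: the assigned tier must meet the resource's minimum."""
    return assigned >= required


def score_allows(
    signals: Dict[str, bool], weights: Dict[str, int], threshold: int
) -> bool:
    """Scoring model: sum the weights of satisfied signals, compare to threshold."""
    score = sum(w for name, w in weights.items() if signals.get(name, False))
    return score >= threshold


# The out-of-date phone from the example, requesting a sensitive HR application.
outdated_phone = TrustTier.LOW
hr_app_minimum = TrustTier.HIGH
print(tier_allows(outdated_phone, hr_app_minimum))  # False
```

In both cases the hard part is not the comparison but explaining to a denied user which element pulled the tier or score down.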
While these models have been shown to work, we at ScaleFT have a different approach that we believe is more effective, and much easier to reason about. We refer to it as a Fact-based model, which means that every rule is specified as a boolean that must be true. We form each rule as a question: Is the OS up-to-date? Is the disk encrypted? When a manager adds the rule to a policy, the answer must be true for access to be granted. Each Policy can hold any number of Facts, and the authorization process follows first-match semantics, denying the request when encountering a Fact that evaluates as false.
This model is straightforward for the managers making the policies, and ends up being easier on the users because any necessary remediation can be clearly called out.
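Here is one way the Fact-based evaluation could look in Python. To be clear, this is a sketch of the idea as described above and not ScaleFT's actual implementation; the Fact and Decision types and the keys in the observed data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Observed = Dict[str, bool]  # hypothetical device data gathered at request time


@dataclass(frozen=True)
class Fact:
    question: str                       # the rule, phrased as a question
    check: Callable[[Observed], bool]   # must return True to pass


@dataclass(frozen=True)
class Decision:
    granted: bool
    failed_fact: Optional[str] = None   # the question to remediate, if denied


def evaluate(policy: List[Fact], observed: Observed) -> Decision:
    """Deny at the first Fact that evaluates as false; grant if all are true."""
    for fact in policy:
        if not fact.check(observed):
            return Decision(granted=False, failed_fact=fact.question)
    return Decision(granted=True)


os_up_to_date = Fact("Is the OS up-to-date?", lambda d: d.get("os_up_to_date", False))
disk_encrypted = Fact("Is the disk encrypted?", lambda d: d.get("disk_encrypted", False))

policy = [os_up_to_date, disk_encrypted]
decision = evaluate(policy, {"os_up_to_date": True, "disk_encrypted": False})
print(decision.failed_fact)  # Is the disk encrypted?
```

Returning the failed question alongside the denial is what lets the user see exactly what to fix.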
Now what do you do with a set of elements and rules used to make dynamic trust decisions? You write a domain-specific language, of course. Okay, don’t run away just yet - that was merely a test to make sure you were still paying attention. The answer to that question deserves its own post, which is what will be coming next. Stay tuned!