We covered the key role of attributes in my last blogpost, moving from the blunter scope of groups and roles to the more fine-grained approach of attributes. Now we’re going to take this progression a step further, as we narrow in on my favorite topic: digital context. (If you haven’t already, check out my first two posts on context, where I laid out the roadmap and looked at groups, roles, and attributes.) Our first order today is to travel back to logic class and think about predicates.* But Michel, you’re thinking, what does all this have to do with digital context? Well, one way to describe a context about something is to express it using sentences related to the question. While we will come back to the definition of context in a following post, for now let’s just say that we need some building blocks to express facts about the world, some form of sentences that can be interpreted by a computer, and logic is one of the tools for that.
Subject-Predicate-Object: First Order Logic 101
In my most recent post, we saw how the notions of groups and roles ended up in the increased use of attributes as a way to categorize or define identities. This should not be surprising. Behind this use of attributes lays a fundamental mechanism—a way to represent a simple fact. And it’s the same mechanism that we use when we reason based on the rules of formal logic, which has been in practice forever, or when we represent a fact on a computer (think SQL). In fact, one of the greatest achievements of the early 20th century has been the formalization of logic (needed for mathematic foundation) and computation. This type of logical representation is core to everything we do, as reasoned thinkers and as computer scientists.
But in case you’re a few years removed from logic class, let’s examine this mechanism at work by looking at some very simple diagrams about what we are doing when we associate some attribute with a person or an object, such as assigning a person to a group:
Or assigning a subgroup to a group:
Each of these constructs can be summarized by the following diagram:
In this diagram, a fact can be asserted by the notation: subject-predicate-object. In predicate logic (AKA first order logic), it’s conventionally written as predicate(X,Y), where the variables X and Y could be themselves objects (references to entities) and/or values (arbitrarily “quoted” labels belonging to the initial vocabulary of our logic system). For instance, in our example above, the fact that “Jane is member of the product marketing group” can be written as memberOf(“Jane”,”Product Marketing”) and subGroupOf(“Product Marketing”,“Marketing”).
These kinds of predicates are called “binary” predicates and they are quite common. So if there are binary predicates, the astute reader (that’s you!) might well wonder if there are also unary predicates and, more generally, n-ary predicates. Indeed, the unary predicate exists and generally it’s used to assign a label to an entity—so if we want to say that Jane is an executive, you would write it as executive(“Jane”). As for the n-ary predicate, well here’s where you will find the usual “n-slots” notation of entities/tables as they’re used in the relational/SQL world. So we’d see something like this: age(“Jane”, “33”) or employee(“Jane”, “33”,”product marketing”).
Now, if you look at all those diagrams above, you’ll notice they have a direction, an orientation that tells us which entity plays the role of subject, since the object for a given predicate cannot generally be substituted. This translates into a given order for the different slots of a predicate; for example, in the notation age(“Jane”, “33”), the first slot—“Jane”—is for the person, and the second—“33”—is for her age. Of course, there are always exceptions where the slots are permutable, such as the “brother binary predicate,” where if x is a brother of y—brother(“x”,”y”)— then y is also a brother of x, which could read: brother(“y”,”x”)= brother(“x”,”y”). But in general order, orientation matters.
The diagrams above form directed graphs and the orientation is essential for preserving the semantics of this representation. After all, saying that x kills y—Kill(“x”,”y”)—is very different from saying that y kills x—Kill(“y”,”x”)!
read more →
Last week, I introduced my favorite topic—digital context—and laid out a plan for how to consider the case. Today, we’ll dive in with a real-world example, looking at how freeing context from across application silos helps us make more considered, immediate, and relevant access control decisions. For those of you who have been following along (and thanks for sticking with me in my madness), this is blog 8 in response to Ian Glazer’s provocative video on killing IAM in order to save it. And if you haven’t been with me from the beginning: I’m in favor of skipping the murder and going straight to the resurrection. Those of you who are coming in late to the game, here’s the recent introduction to context, or you can catch up with the entire story in order here: one, two, three, four, five, six, seven.
It All Starts with Groups: The Simple, Not Especially Sophisticated Solution
Let’s start first with the notion of groups and their implementation. On the surface, nothing could be more straightforward: If I have to manage a sizeable set of users and assign them different rights to applications, I need to categorize those users into groups with the same profile, whether that’s by function, role, need to know, hierarchy, or some other factor. This is the simplest approach to any categorization, creating some “relevant” labels, then assigning people that fit within those label to define groups.
So let’s say we’re creating groups based work functions, such as sales, marketing, production, and administration. All we need to do is list all the people under a particular function, create a label, and then assign this label to those people. Couldn’t be easier, right? The simplicity of the process explains the huge success of groups—and although we implementers tend to make fun of groups as crude categorizations, I would guesstimate that at least 90% of our authorization policies are still implemented through groups. (So much for all that talk about advanced fine-grained authorization! But I’m getting ahead of myself here…)
In fact, we’ve become so dependent on groups that in many cases, especially with sizeable organization where the business processes are quite refined and well managed, we’re seeing that there are often more groups than users! At first glance, this seems paradoxical—after all, what’s the point of regrouping people if you have more groups than people? But the joke is on us technical people because we ignored another key reality: the business one. Sure, we could have a lot of people, but generally a well-managed and productive organization can have more activities (or different aspects of a given activity) that require the multiplication of those groups. So we gave our users a simple mechanism to categorize people into groups, and they used it—talk about being a victim of our own success! 🙂
Basically, we played the sorcerer’s apprentice and our simple formula yielded a multiplication of groups, which quickly became un-manageable. So we went back to the formula and started to tweak it, creating groups inside groups, hierarchies of groups, and nested groups; introducing Boolean operations on groups; aggregating them into roles, and so on. So what we were just saying about groups being simple? Simple for whom? Simple for the group implementers—yes, definitely. Simple for a user in charge of the initial creation of the group—sure. But add any complexity into the mix and the chaos begins.
So Much for the Digital Revolution: Every Change, Managed Manually
From a computer’s point of view, the assignment of a user to a group is totally opaque—just an explicit list entered by the person in charge of creating the group. This explicit list contains no information about why or how a user is dispatched into or associated with a group. In short, the definition of membership rests with the group owner, which is fine on the face of it. But that excludes any automated assignment of a new member to the group without manual intervention of the group owner. That means every change must be entered by hand—imagine the complexity as people constantly change roles and shift responsibilities. And imagine how easy it would be for an overworked manager to miss removing the name of the person she just fired from just one of the groups he was part of. Now imagine the security risk if that guy’s still got access to sensitive files.
Without explicitly externalizing those rules, those policies, the administration of the system becomes tied to the group owners/creators. The effort of sub-categorizing with nested groups or introducing more flexible ways to combine groups by using Boolean operators just reveals the root of the problem: When you give users better ways to characterize their groups, you are forcing those users to either make explicit the formation rules of their groups—or continue to make every single change manually, even as those changes become more complex and unmanageable.
And that’s how we (re)discovered the value of attribute-based group definitions.
Welcome to the sixth post of my series in response to Ian Glazer’s video on killing IAM in order to save it (AKA my Russian novel on identity, security, and context :)). We’ve looked at a lot of the issues surrounding identity today, and if you’ve been following along, my perspective on how to solve these issues is probably pretty clear by now (and if you haven’t, you can catch up here: one, two, three, four, and five). We at Radiant feel strongly that you need to federate your identity layer, just as we’ve all been busy federating access. But what does that look like? Here’s how the VDS, our RadiantOne virtualization engine, implements a federated identity service—we’ll be focusing on the highlighted part of this diagram today:
The common abstraction, data modeling, and mapping serve as the “backplane” for the whole system. As we see in the leftmost section of the diagram above, the system starts with the existing identity sources and transforms this identity data through three higher-level services built on top of the VDS virtualization layer.
First, the “aggregation” service regroups the different identity sources side by side. In directory talk, this is like “mounting” each different identity source into its own separated branch. Here, we don’t try to merge the different identities, we regroup them under a common virtual root, while keeping each namespace separated. Metaphorically, it’s like people putting different objects (identity sources) into a common bag (the “virtual directory”). The main advantage of this structure is that it allows you to send a global search, from the root to the top of the tree, to find an identity that can be defined in one (or more) aggregated identity sources.
Then, the “correlation” service determines if there are any commonalities across those different identity sources. Beyond the local/specific identifier for a given identity, the service discovers any correlation based on attributes and rules that can disambiguate an entity—a person or object—from across the different identity sources and representations. From a logical point of view, the correlation service is used to create a union of the different identity sources. The end result is a new identification scheme, where each entity in the system is uniquely—and uniformly—identified by a “global identifier” from across all identity silos. This global identifier does not exist in isolation; it’s also tethered back to each specific local identifier, so we always know where every piece of information lives. After the correlation stage, the entire set of existing identity sources is regrouped into a global namespace, where each identity is totally disambiguated.
Finally, the “integration” service links identities with their attributes for a complete global profile. For any given entity that exists in more than one data source (which we determined via the union operation described above), we can now take advantage of the common link—or global identifier—to “join” different attributes that are specific to each source. Through this join operation, we obtain a global profile out of each local description.
read more →
I’ve been blogging in response to Ian Glazer’s video about killing IAM in order to save it (I’m in favor of saving, even if we don’t agree on the killing part). Get caught up on my earlier posts here, here, here, and here.
A Millions Paths to Everywhere: The Internal Authentication Challenge
As we discussed in my most recent post, authenticating users and securely communicating authorization information with a cloud application—or any web-based portal—requires a common endpoint acting as the enterprise IdP. We also know you’ll need to be able to access multiple cloud applications, such as Salesforce, Workday, and Google Apps, as your enterprise moves toward this model. And we’ve seen that you’ll need many token translations on a per-application basis. But this is only one part of the requirement.
Another key function to support is being able to authenticate an incoming user against multiple internal authentication sources. Think about all the legacy applications and identity stores deployed across your infrastructure, with their various authentication methods and protocols—they’re all over the map, right? First, you encounter the AD domains, and get lost in all those forests. The authentication method here could be name/UPN and password or based on Kerberos and Windows integrated authentication. But the user could also be stored in some SQL database with a proprietary hard-coded password encryption. Chasing the user across diverse forests and data stores and knowing which authentication method is appropriate for presenting and checking credentials is a full-time job—one that predates the challenge of cloud applications. In fact, the search for a common identity structure has been a primary headache for IAM for as nearly long as the category has existed. Multiple attempts have been made to solve these issues, from in-house script-and-sync to metadirectories to virtual directories. These new requirements for supporting the cloud have just made it more acute.
No matter what you call it (or how it works)—whether it’s an enterprise directory, metadirectory, or virtual directory—the logical mechanism you need is a federated form of identity. Why federated? Because you don’t want to reinvent this layer, which already exists in a highly distributed, heterogeneous way across your identity silos. Better to tap into what already exists, while giving your underlying data more scope and flexibility by bringing it—or a flexible representation of it—into an identity hub. Now, you could implement this hub in many different ways, but we believe that a virtualization layer, based on a global data model that rationalizes and reconciles the different local views, is the most effective way to do it. And in a world where you will connect to multiple applications using “federation” standards, you need to do more than just federate access via the SaML or OpenID Connect layer—you need to federate your identity layer, as well. And the way to get to a federated identity is through virtualization.
All Roads Lead to the Hub: The Need for a Common Attribute Server and Better Provisioning
Fast forward and think about the n different applications in the cloud you need to provision to, and it’s like déjà vu all over again…with even more complications. So again, as both a final authoritative source of your identity and as an attribute server, you will need a rationalized view of your identity, a federated Identity system.
Next time we will look at the architecture of this federated identity, or “FID,” and then we’ll end this series with a deep dive into my favorite topic—context—along with graph and protocols support.