In my last blog post (was it really more than two months ago? I plead CEO overload!), we looked at how SQL and data integration are essential to developing a truly useful customer profile. At the end, I promised to step through the process of nurturing relationships, where we guide prospects and customers through each stage, sharing and collecting information in a step-wise cadence. So here goes. Note that I’m using the vocabulary and categorizations from Salesforce, one of the main customer relationship management (CRM) apps on the market:
- First, a set of information is collected from an interested party—also known as a lead—and further information is sent to match the needs of that lead.
- After that, the lead is qualified as a prospect, and the sales rep conducts further qualification discussions to move that prospect to the next stage of the pipeline.
- At this point, enough information is known about the needs of the prospect to determine whether an opportunity for a sale exists. If yes, the sales rep takes the final qualification step by negotiating the terms of a deal.
- When (and if) a deal is struck, the opportunity is closed and the prospect becomes a customer.
What we can see in this nurturing process, as in most business processes or complex transactions, is that the whole operation is built around a series of steps, or a business workflow. At each step, specific information is gathered, and you move to the next step only when the information requirement of the current step is fulfilled, as we see below:
What I am describing here is obvious at the business level (or the “conceptual level,” in the parlance of the data-modeling world). However, when it comes to the details of low-level implementation at the data structure or database level, things are not so cleanly delineated, and as a result, currently deployed solutions are far from optimal. So let’s revisit this pattern as it applies to the integration of a user profile at the level of SQL.
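The gating rule can be sketched in a few lines of Python with the standard-library `sqlite3` module. This is a hypothetical illustration, not a description of any product. The stage names follow the Salesforce-style pipeline above. The single `profile` table, the `REQUIRED_FIELDS` mapping, and the `advance` function are assumptions made up for the example. A profile moves to the next stage only when every field its current stage requires is filled in.

```python
import sqlite3

# Hypothetical pipeline: each stage lists the fields that must be filled
# before a profile may leave that stage.
STAGES = ["lead", "prospect", "opportunity", "customer"]
REQUIRED_FIELDS = {
    "lead": ["email", "interest"],
    "prospect": ["company", "budget"],
    "opportunity": ["deal_terms"],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE profile (
           id INTEGER PRIMARY KEY,
           stage TEXT NOT NULL,
           email TEXT, interest TEXT,
           company TEXT, budget INTEGER,
           deal_terms TEXT)"""
)

def advance(profile_id: int) -> str:
    """Move a profile to the next stage only if the current stage's fields are set."""
    (stage,) = conn.execute(
        "SELECT stage FROM profile WHERE id = ?", (profile_id,)
    ).fetchone()
    if stage == STAGES[-1]:
        return stage  # already a customer: nothing left to qualify
    fields = REQUIRED_FIELDS[stage]
    # Column names come from the whitelist above, never from user input.
    row = conn.execute(
        f"SELECT {', '.join(fields)} FROM profile WHERE id = ?", (profile_id,)
    ).fetchone()
    if any(value is None for value in row):
        return stage  # information requirement not met: stay put
    next_stage = STAGES[STAGES.index(stage) + 1]
    conn.execute("UPDATE profile SET stage = ? WHERE id = ?", (next_stage, profile_id))
    return next_stage

conn.execute("INSERT INTO profile (id, stage, email) VALUES (1, 'lead', 'ann@example.com')")
print(advance(1))  # interest is still missing, so the profile stays a lead
conn.execute("UPDATE profile SET interest = 'analytics' WHERE id = 1")
print(advance(1))  # requirement met: the lead becomes a prospect
```

The sketch keeps every stage's fields as nullable columns in one wide table because that is the simplest layout to show. A real schema would more likely split stage-specific data into separate tables that are joined back together.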
Last week, we took a look at the challenges faced by “traditional IAM” vendors as they try to move into the customer identity space. Such vendors offer web access management and federation packages that are optimized for LDAP/AD and aimed at employees. Now we should contrast that with the new players in this realm and explore how they’re shaping the debate—and growing the market.
Beyond Security with the New IAM Contenders: Leveraging Registration to Build a More Complete Customer Profile
So let’s review the value proposition of the two companies that have brought us this new focus on customer identity: Gigya and Janrain. For these newcomers, the value is not only about delivering security for access or a better user experience through registration. They also aim to leverage that registration process to collect data for a complete customer profile, moving from a narrow security focus to a broader marketing/sales focus. This has some consequences for the identity infrastructure and services needed to support these kinds of operations.
For these new contenders, security is a starting point for better customer knowledge, more complete profiles, and the entire marketing and sales lifecycle. So for them it is not only about accessing or recording customer identities; it’s about integrating this information into the rest of the marketing value chain, using applications such as Marketo and others to build a complete profile. One of the key values here is collecting customer identity data and integrating it with the rest of the marketing and sales activities.
At the low level of storage and data integration, that means the best platform for accomplishing this would be SQL—or better yet, a higher-level “join” service that’s abstracted or virtual, as in the diagram below. It makes sense that you’d need some sort of glue engine to join identities with the multiple attributes that are siloed across the different processes of your organization. And we know that LDAP directories alone, without some sort of integration mechanism, are not equipped for that. In fact, Gigya, the more “pure play” in this space, doesn’t even use LDAP directories; instead, they store everything in a relational database because SQL is the engine for joining.
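To make the “glue” idea concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The three tables stand in for silos owned by different processes (registration, marketing, and orders). Their names, columns, and rows are invented for illustration. A single SQL query joins them into one profile row per registered customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical silos: registration, marketing, and order history.
    CREATE TABLE registration (customer_id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    CREATE TABLE marketing (customer_id INTEGER, campaign TEXT, score INTEGER);
    CREATE TABLE orders (customer_id INTEGER, total REAL);

    INSERT INTO registration VALUES (1, 'ann@example.com', 'Ann'), (2, 'bob@example.com', 'Bob');
    INSERT INTO marketing VALUES (1, 'spring-webinar', 80);
    INSERT INTO orders VALUES (1, 120.0), (1, 30.0), (2, 15.5);
    """
)

# The join is the glue: one row per registered customer, enriched with whatever
# the other silos know. LEFT JOIN keeps customers that a silo has never seen.
PROFILE_QUERY = """
SELECT r.customer_id, r.name, r.email,
       m.campaign, m.score,
       COALESCE(o.order_total, 0) AS order_total
FROM registration AS r
LEFT JOIN marketing AS m ON m.customer_id = r.customer_id
LEFT JOIN (
    SELECT customer_id, SUM(total) AS order_total
    FROM orders GROUP BY customer_id
) AS o ON o.customer_id = r.customer_id
ORDER BY r.customer_id
"""

profiles = conn.execute(PROFILE_QUERY).fetchall()
for row in profiles:
    print(row)
```

The query aggregates orders in a subquery before the join, so a customer with several orders still yields a single profile row. Joining the raw `orders` table directly would multiply rows.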
So if we look at the customer identity market through this lens of SQL and the join operation, I see a couple of hard truths for the traditional IAM folks:
- First, if we’re talking about using current IAM packages in the security field for managing customer access, performance and scalability are an issue due to the “impedance” problem. Sure, your IAM package “supports” SQL, but it’s optimized for LDAP. So unless you migrate (or virtualize) your customers’ identities from SQL to LDAP in the large volumes that are characteristic of this market, you’ll have problems with the scalability and stability of your solution. (And this does not even begin to cover the need for flexibility or ease of integration with your existing applications and processes dealing with customers.)
- And second, if you are looking at leveraging the customer registration process as a first step to build a complete profile, your challenge is more in data/service integration than anything else. In that case, I don’t see where there’s a play for “traditional WAM” or “federation” vendors that stick to an LDAP model, because no one except those equipped with an “unbound” imagination would use LDAP as an engine for integration and joining… 🙂
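The “impedance” shows up as soon as a relational row has to be presented as a directory entry. The Python sketch below uses a hypothetical `customer` table and base DN, and renders each row as an LDIF entry with the `inetOrgPerson` object class. Every customer record would have to pass through a mapping like this before an LDAP-optimized package could use it, whether as a bulk migration or through a virtualization layer. For brevity, the sketch skips the DN escaping and base64 encoding that real LDIF requires for special or non-ASCII characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, login TEXT, "
    "first_name TEXT, last_name TEXT, email TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'ann', 'Ann', 'Lee', 'ann@example.com')")

BASE_DN = "ou=customers,dc=example,dc=com"  # hypothetical branch of the directory tree

def row_to_ldif(login, first_name, last_name, email):
    """Render one relational row as an LDIF entry (no DN escaping or base64 here)."""
    return "\n".join([
        f"dn: uid={login},{BASE_DN}",
        "objectClass: top",
        "objectClass: person",
        "objectClass: organizationalPerson",
        "objectClass: inetOrgPerson",
        f"uid: {login}",
        f"cn: {first_name} {last_name}",
        f"sn: {last_name}",
        f"givenName: {first_name}",
        f"mail: {email}",
    ])

rows = conn.execute("SELECT login, first_name, last_name, email FROM customer ORDER BY id")
entries = [row_to_ldif(*row) for row in rows]
print("\n\n".join(entries))
```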
The Nature of Nurturing: An Object Lesson in Progressive, Contextual Disclosure
Before we give up all hope on directories (or at least on hierarchies, graphs, and LDAP), let’s step beyond the security world for a second and look at the marketing process of nurturing prospect and customer relationships. Within this discipline, a company deals with prospects and customers in a progressive way, guiding them through each stage of the process in a series of steps and disclosing the right amount of information within the right context. And of course, it’s natural that such a process could begin with the registration of a user.
We’ll step through this process in my next post, so be sure to check back for more on this topic…
Current Web Access Management Solutions Will Work for the Customer Identity Market—If We Solve the Integration Challenge
I find it ironic that within the realm of IAM/WAM, we’re only now discovering the world of customer identity, when the need for securing customer identity has existed since the first business transactions began happening on the Internet. After all, the e-commerce juggernauts from Amazon to eBay and beyond have figured out the nuances of customer registration, streamlined logons, secure transactions, and smart shopping carts that personalize the experience, remembering everything you’ve searched and shopped for in order to serve up even more targeted options at the moment of purchase.
It reminds me of a parable from a classic book on investing*: Imagine a Wall Street insider at the Battery in New York, pointing out all the yachts that belong to notorious investment bankers, brokers, and hedge fund managers. After watching for a while, one lone voice pipes up and asks: “That’s great—but where are the customers’ yachts?”
Could this new focus on “customer identity” be an attempt by IAM/packaged WAM vendors to push their solution toward what they believe is a new market? Let’s take a look at what would justify their bets in the growing customer identity space.
Customer Identity: The Case for the WAM Vendors
The move to digitization is unstoppable for many companies and sectors of the economy, opening opportunities for WAM vendors to go beyond the enterprise employee base. As traditional brick-and-mortar companies move to a new digitized distribution model based on e-commerce, they’re looking for ways to reach customers without pushing IT resources into areas where they have no expertise.
While there are many large e-commerce sites that have “grown their own” security, many of the companies behind this growing demand will not have the depth and experience of the larger Internet “properties.” So a packaged security solution makes a lot of sense, with lower cost and lower risk. And certainly, the experience of enterprise WAM/federation vendors, with multiple packaged solutions to address the identity lifecycle, could be transferred successfully to this new market. However, such a transition will need to address a key challenge at the level of the identity infrastructure.
The Dilemma for WAM Vendors: Directory-Optimized Solutions in a World of SQL
As we know, the current IAM/WAM stack is tightly tied to LDAP and Active Directory—these largely employee-based data stores are bolted into the DNA of our discipline, and, in the case of AD, offer an authoritative list of employees that’s at the center of the local network. This becomes an issue when we look at where the bulk of customer identities and attributes are stored: in a SQL database.
So if SQL databases and APIs are the way to access customer identities, we should ask ourselves whether the current stack of WAM/federation solutions, built on LDAP/AD to target employees, would work as well with customers. Otherwise, we’re just selling new clothes to the emperor, and this new gear is just as invisible as those customers’ yachts.
Stay tuned over the next few weeks as I dive deeper into this topic—and suggest solutions that will help IAM vendors play in the increasingly vital world of customer identity data services.
*Check out “Where Are the Customers’ Yachts? or A Good Hard Look at Wall Street” by Fred Schwed. A great read, and it’s even funny!
How a Federated ID Hub Helps You Secure Your Data and Better Serve Your Customers
Welcome back to my series on bringing identity back to IAM. Today we’re going to take a brief look at what we’ve covered so far, then surf the future of our industry, as we move beyond access to the world of relationships, where “identity management” will help us not only secure but also know our users better—and meet their needs with context-driven services.
We began by looking at how the wave of cloud services adoption is leading to a push for federation, using SAML or OpenID Connect as the technology for delivering cloud SSO. But as I stressed in this post, for most medium-to-large enterprises, deploying SAML will require more than just federating access. By federating and delegating authentication from the cloud provider to the enterprise, your organization must act as an identity provider (IdP). That’s a formidable challenge for many companies dealing with a diverse array of distributed identity stores, from AD and legacy LDAP to SQL and web services.
It’s becoming clear that you must federate your identity layer as well. Handling all these cloud service authentication requests in a heterogeneous and distributed environment means you’ll have to invest some effort into aggregating identities and rationalizing your identity infrastructure. Now, you could always create a point solution for a narrow set of sources, building what our old friend Mark Diodati called an “identity bridge.” But how many of these ad hoc bridges can you build without a systematic approach to federating your identity? Do you really want to add yet another brittle layer to an already fragmented identity infrastructure, simply for the sake of expediency? Or do you want to seriously rationalize your infrastructure instead, making it more fluid and less fragile? If so, think hub instead of bridge.
Beyond the Identity Bridge: A Federated Identity Hub for SSO and Authorization
This identity hub gives you a federated identity system where identity is normalized while your existing infrastructure is respected. Such a system offers the efficiency of a “logical center” without the drawbacks of inflexible modeling and centralization that we saw with, say, the metadirectory. In my last post, we looked at how the normalization process requires some form of identity correlation that can link global IDs to local IDs, tying everything together without having to modify existing identifiers in each source. Such a hub is key for SSO, authorization, and attribute provisioning. But that’s not all the hub gives you: it’s also a way to get and stay ahead of the curve, evolving your identity to meet new challenges and opportunities.
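Here is a minimal Python sketch of that correlation step. For illustration only, it assumes that a normalized email address is a trustworthy matching key. Real correlation rules are usually richer. The source names, attribute names, and `gid-N` identifier format are hypothetical. The point is that the hub only records links between a global ID and each source’s own local ID, and no source record is modified.

```python
from collections import defaultdict

# Hypothetical local identity stores, each with its own identifier scheme.
SOURCES = {
    "ad": [{"sAMAccountName": "alee", "mail": "ann.lee@example.com"}],
    "crm_sql": [{"contact_id": 4711, "email": "Ann.Lee@example.com"}],
    "legacy_ldap": [
        {"uid": "lee01", "mail": "ann.lee@example.com"},
        {"uid": "smith02", "mail": "bo.smith@example.com"},
    ],
}
LOCAL_ID_ATTR = {"ad": "sAMAccountName", "crm_sql": "contact_id", "legacy_ldap": "uid"}
MATCH_ATTR = {"ad": "mail", "crm_sql": "email", "legacy_ldap": "mail"}

def correlate(sources):
    """Build global ID -> {source: local ID} links without touching the sources."""
    by_key = defaultdict(dict)
    for source, records in sources.items():
        for record in records:
            key = record[MATCH_ATTR[source]].strip().lower()  # normalized match key
            by_key[key][source] = record[LOCAL_ID_ATTR[source]]
    # Assign one global ID per correlated person (sorted for repeatable output).
    return {
        f"gid-{n}": links
        for n, (_key, links) in enumerate(sorted(by_key.items()), start=1)
    }

hub = correlate(SOURCES)
print(hub)
```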
The Future’s Built In: The Hub as Application Integration Point and Much More
Another huge advantage of federating your identity? Now that you can tie back the global ID to all those local representations, the hub can act as a key integration point for all your applications. Knowing who’s who across different applications allows you to bring together all the specific aspects of a person that have been collected by those applications. So while it begins as a tool for authentication, the hub can also aggregate attributes about a given person or entity from across applications. So yes, the first win beyond authentication is also in the security space: those rich attributes are key for fine-grained authorization. But security is not our only goal. I would contend that this federated identity system is also your master identity table—yes, read CDI and MDM—which is essential for application integration. And if you follow this track to its logical conclusion, you will move toward the promised land of context-aware applications and semantic representations. I’ve covered this topic extensively, so rather than repeat myself, I will point you to this series of posts I did last spring—think of it as Michel’s Little Red Book on Context… 😉
- First we introduced the topic of Context as the Next Frontier of Your Digital Identity.
- Then we went From Groups to Roles to Context, looking at the Emergence of Attributes in Authorization.
- Then we explored Attributes, Predicates, and Sentences as the Building Blocks of Context.
- And finally, we achieved Valhalla: Man and Machine, Speaking the Same Language.
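The Python sketch below shows how a hub that already knows the global-to-local links could assemble one profile from several applications. It then evaluates a fine-grained rule that no single application could answer on its own. The applications, the attributes, and the `can_view_account` rule are hypothetical.

```python
# Hypothetical output of the hub's correlation step: global ID -> local IDs.
LINKS = {"gid-1": {"hr": "E100", "crm": 4711}}

# Hypothetical per-application attribute stores, keyed by their own local IDs.
HR = {"E100": {"department": "sales", "clearance": 2}}
CRM = {4711: {"region": "EMEA", "accounts": ["acme", "globex"]}}
APPS = {"hr": HR, "crm": CRM}

def aggregate(global_id):
    """Merge every application's view of one person into a single profile."""
    profile = {"global_id": global_id}
    for app, local_id in LINKS[global_id].items():
        for attr, value in APPS[app][local_id].items():
            profile[attr] = value  # on a name clash, the later app's value wins
    return profile

def can_view_account(profile, account, account_region):
    """A fine-grained rule that needs attributes coming from two different apps."""
    return (
        profile.get("department") == "sales"            # from HR
        and profile.get("region") == account_region     # from CRM
        and account in profile.get("accounts", [])      # from CRM
    )

profile = aggregate("gid-1")
print(profile)
print(can_view_account(profile, "acme", "EMEA"))
```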
So the way we see it here at Radiant, the emergence of the hub puts you on the path toward better data management and down the road to the shining Eldorado of semantic integration, where your structured and unstructured data comes together to serve you better. But you don’t have to wait for that great day to realize a return—your investment starts to pay off right away as you secure your devices and cloud services.
In my last post on digital context, we took a trip back to logic class, looking at how we could begin to describe our world using “sentences” based on first-order logic. This essential “predicate semantics” is the foundation of all mathematics, and hence of computing. In fact, it’s the basis for our most important data storage mechanisms (think SQL). With so much structured information already encoded in this predicate representation, we have an excellent foundation for more semantically driven contextual computing.
Let’s Begin at the Beginning: What is Context, Anyway?
According to my Webster’s, the word “context” comes from the Latin “contextus,” which means a joining or weaving together. There are a couple of different types of context:
- There’s context as represented through language, or “the parts of a sentence, paragraph, or discourse immediately next to or surrounding a specified word or passage and determining its exact meaning (e.g., to quote a remark out of context)”.
- And there’s the context we glean through perceptions, meaning “the whole situation, background, or environment relevant to a particular event, personality, creation, etc…”
It’s this second aspect, the perceptual side, that most would agree upon as the meaning of context. Using this definition, our animal friends are “context-aware” up to a point, able to “read” a situation and act accordingly. But we also have the first aspect, language, which allows us to describe the world in sentences and share contextual information. So context can be represented by a set of related sentences about a given subject: that’s our “parts of a…discourse immediately next to or surrounding a specified word.” What makes this especially interesting from my perspective, which begins in the narrow field of security, is that a “security context” is a set of facts about a given “subject,” represented by attributes and relations between entities. As such, a security context can be represented as a subset of first-order logic, or by sentences in a limited, constrained form of English.
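To illustrate the idea that a security context is a set of facts, the Python sketch below stores a context as (subject, predicate, object) triples. It evaluates one authorization rule over them and renders the facts as short sentences in a constrained form of English. The subjects, the predicates, and the rule itself are invented for the example. This is a toy closed-world lookup, not a general first-order logic engine.

```python
# A security context as a set of facts: (subject, predicate, object) triples.
# All names here are hypothetical.
CONTEXT = {
    ("alice", "is_member_of", "finance"),
    ("alice", "has_role", "approver"),
    ("finance", "owns", "ledger"),
}

def holds(subject, predicate, obj):
    """True if the fact is recorded in the context (closed-world lookup)."""
    return (subject, predicate, obj) in CONTEXT

def may_approve(subject, resource):
    """Rule: X may approve R if X has role approver and a group of X owns R."""
    groups = {o for (s, p, o) in CONTEXT if s == subject and p == "is_member_of"}
    return holds(subject, "has_role", "approver") and any(
        holds(group, "owns", resource) for group in groups
    )

def as_sentence(fact):
    """Render a fact as a sentence in a constrained form of English."""
    subject, predicate, obj = fact
    return f"{subject} {predicate.replace('_', ' ')} {obj}."

for fact in sorted(CONTEXT):
    print(as_sentence(fact))
print(may_approve("alice", "ledger"))
```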
So if you can find a way to extract information for a given subject from a structured system and represent it as sentences, then you are, in fact, extracting the underlying “application” context for this subject. And (drumroll, please) that’s just what we’ve done! Basically, we’ve returned to first principles here at Radiant, devising a “contextual and computational language” method to reverse-engineer metadata from an application and represent it in a way that’s as easy to interpret at the human level as it is to execute at the machine level.
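The following Python sketch gives a rough idea of what extracting sentences from a structured system can look like. It is not Radiant’s method. It reads one row from a small SQLite schema and uses the metadata returned by SQLite’s `PRAGMA foreign_key_list` to tell plain attributes apart from relationships, then verbalizes both. The tables are assumptions, as is the convention that a `name` column labels the subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        title TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (10, 'Finance');
    INSERT INTO employee VALUES (1, 'Alice', 'Controller', 10);
    """
)

def sentences_for(table, key_column, key):
    """Verbalize one row: plain columns become attribute sentences, and
    foreign keys (read from the schema metadata) become relationship sentences."""
    # Table and column names come from our own schema, not from user input.
    row = conn.execute(f"SELECT * FROM {table} WHERE {key_column} = ?", (key,)).fetchone()
    foreign_keys = {
        fk["from"]: (fk["table"], fk["to"])
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})")
    }
    subject = row["name"]
    sentences = []
    for column in row.keys():
        if column in (key_column, "name"):
            continue
        if column in foreign_keys:
            ref_table, ref_column = foreign_keys[column]
            target = conn.execute(
                f"SELECT name FROM {ref_table} WHERE {ref_column} = ?", (row[column],)
            ).fetchone()
            sentences.append(f"{subject} belongs to {ref_table} {target['name']}.")
        else:
            sentences.append(f"{subject} has {column} {row[column]}.")
    return sentences

for sentence in sentences_for("employee", "emp_id", 1):
    print(sentence)
```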
Now, this wasn’t my idea alone. If you follow the developments around the semantic web, you know that the idea of semantically encoding the web (HTML/text), so that our machines can more meaningfully interpret our descriptions and intentions, rests on this same foundation. But standards such as RDF and OWL depend on adoption, which cannot be controlled and is currently confined to a minuscule part of the web. On top of that, they serve a different purpose. While they tag text the same way we do (subject, attribute or verb, and value or other object), their objective is for machines to be able to interpret these tags. Our goal is bigger: we want to create sentences that are readable by both man and machine. So unless you can read the XML behind RDF as if it were your own language, why not speak in plain English instead, rather than working at the interface level and supporting RDF at the generation phase? But we’ll get to that part a little later on…
From Database Standards to Semantics: Making Structured Data Searchable Across Silos
There’s no single standard data representation in our enterprises: you have vital data stored across SQL databases, LDAP directories, XML, web services/REST, and more. While each is useful on its own, this “Tower of Babel” of protocols, data representations, and abstractions makes it difficult, if not impossible, to connect information across different application kingdoms. Why is this so important? Because each silo holds plenty of contextual richness that we can leverage well beyond the scope of that application.
This is essential because even in the very specialized scope of security, you can’t adequately protect a system of applications if you don’t have a clear picture of what’s really enforced at the level of each application, and of how all your applications are interrelated. This is why, despite lots of tools for creating roles and policies, progress in authorization has been extremely slow. The challenge is not just knowing what you want to enforce (that’s the easy part); you must first understand what exists and what is really enforceable, both at the level of a single application and across a complicated process made up of multiple applications. For instance, when I talk to people in the banking sector about their compliance efforts, what I hear is that it’s not only about defining what they want to enforce, it’s about understanding what they have in the first place.
Context is also vital because this structured data is so valuable. It represents perhaps only 10% of the data in the world, but 90% of the value that we derive from automation. Without structured data, automation would be extremely limited, and the productivity that we derive from automation would evaporate. So wouldn’t it be great if we could understand, at the layman’s level, what exists in an application (beyond just forms and interfaces), and link it to the rest of the infrastructure?
Think about what HTML did for text and other unstructured data on the web, making it searchable, discoverable, and so much more useful. Now imagine your structured data, all that incredible process-driven information and context trapped in application silos. What if we could read all that information, link all that data, and free all those contextual relationships that exist between silos? After all, it’s not only the facts, it’s the links between facts that build up a context. Go back to the etymology we discussed above: “context” is from the Latin contextus and it means the joining, the weaving together.
Again, these ideas are not mine alone: there’s a whole discipline within the semantic web dealing with “linked data,” based on how you can link information once it’s tagged in the form of RDF, that is, as subject-verb-object or subject-attribute-value statements. (See my last post for an in-depth look at this.)
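Here is a small Python sketch of the linked-data idea. Facts from two hypothetical silos are expressed as subject-verb-object triples, and the context of a subject is gathered by following object links from one fact to the next. With the HR silo alone, the employee’s context stops at the department. Once the IT silo’s facts are linked in, the context reaches the application and the place where it is hosted. The identifiers and predicates are invented for illustration.

```python
# Facts from two hypothetical silos, already tagged as subject-verb-object triples.
HR_FACTS = [
    ("emp:42", "worksIn", "dept:finance"),
    ("emp:42", "hasName", "Alice"),
]
IT_FACTS = [
    ("dept:finance", "uses", "app:ledger"),
    ("app:ledger", "hostedIn", "dc:paris"),
]

def context_of(subject, facts, max_hops=3):
    """Collect the facts reachable from `subject` by following object links."""
    found, frontier, seen = [], [subject], {subject}
    for _ in range(max_hops):
        next_frontier = []
        for s, p, o in facts:
            if s in frontier:
                found.append((s, p, o))
                if o not in seen:  # follow each linked node only once
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return found

print(context_of("emp:42", HR_FACTS))             # one silo: two facts
print(context_of("emp:42", HR_FACTS + IT_FACTS))  # linked silos: a longer chain
```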
I know I’ve been the Old Man of Novato, ranting about context all these years, but the market, the industry, and, most importantly, the technology are finally evolving in this direction. For the longest time, it was just me and the usual suspects in academia and elsewhere, muttering in our corners about the Semantic Web, but now we’re hearing about context-aware computing from every direction. While I’ve refined a set of slides on context that I’ve delivered to groups large and small over the years, along with a demo of our Context Browser technology, now seems like a great time to put everything I know down in writing.
Although my French heritage and math background prefer to start from theory and illustrate through examples, my newly American pragmatic tinkerer side plans to sketch a quick roadmap here, then look at examples from our existing systems and, through them, make the theoretical case. It’ll take a few posts to get there, but then, I’ve really been enjoying blogging lately, as my manifesto in response to Ian Glazer will testify. Read it from the beginning if you’d like a peek into my recent madness: one, two, three, four, five, six.
Context Matters: Where We’re At, Where We’re Headed
We’ve already seen the word creeping into marketing materials, but one of these days (okay, maybe months or years) it’s going to be more than a promise: digital context will be everything. As we get closer to digitizing our entire lives, we’re also moving toward a context-aware computing world. Now, whenever we’ve talked about context-aware computing so far, it has seemed like one of those woolly concepts straight from a hyper-caffeinated analyst’s brain (or an over-promising marketer’s pen). But the truth is, any sizeable application that’s not somehow context-aware is pretty useless or poorly designed.
Sure, there are pieces of code or programs that exist to provide some transition between observable states and, as such, are “stateless.” And I know that on the geeking edge, it’s trendy to talk about stateless systems, which are an important part of the whole picture. In reality, however, the world needs to record all kinds of states, because a stateless world also means a world without any form of memory: no past, present, or future. So it’s not that most of our programs and applications lack context-awareness. They have it, and most of the time they’re pretty good at managing their own context.
The problem is that we move from context to context. In the digital world, this means that unless those programs, agents, and devices share their context, we face a stop-and-go experience where the loss of context can be as annoying, or as dangerous, as an interrupted or broken service. The lack of context integration can mean a bad user experience, or a dead patient due to a wrong medication. In a world where actions and automated decisions can be taken in a split second, this absence of context integration is a huge challenge. Nowhere is the issue more acute than in security: in authentication and authorization.
I’ve been blogging on Gartner analyst Ian Glazer’s recent video on whether we need to kill IdM in order to save it. It seems appropriate to begin this post with the words of Martin Kuppinger, responding to the Glazer video and my first post on this topic:
“…. I believe in approaches that build on existing investments. IAM has to change, no doubt about that. But there will still be a lot of “old school” IAM together with the “new school” parts. Time and time again it has been proven that change without a migration path is an invitation to disaster. Embrace and extend is the classical migration methodology for classical technical transformative strategies.”
Anyone who’s been following along understands that Martin echoes my perspective here—I am all about taking what still works and making it work better. So before we dive into the global picture about what is dysfunctional with current IdM deployments amid the clamor for access (SSO and cloud), access rights (authorization), and governance, I want to share some final thoughts on identity representation and storage.
SQL and LDAP: Strengths and Weaknesses
As I mentioned in my most recent post, if we’re focusing only on finding the most flexible, non-redundant, easy-to-maintain system to represent and record identity (and information in general), it’s tough to beat the relational model. And as a result, the majority of our applications are based on SQL, which means the largest chunk of key identity information is also managed by SQL.
While it’s one thing to accurately capture information, however, querying and retrieving this data is another story. Fast lookups—especially those involving some form of contextual navigation and link-hopping—require a lot of “joins.” And joins are a very slow and expensive operation with SQL, because each join means a “multiplication of links” between objects. So after two or three hops, you have exponential growth of your database operations—and even on the fastest computer in the world, an open and potentially infinite number of operations will take forever.
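The “multiplication of links” can be made visible with a short Python experiment on SQLite. It builds a hypothetical tree in which every node has five children, then runs one chained self-join per hop, starting from the root. In this naive formulation, the number of rows the joins must produce grows as 5^k after k hops. Indexes and query planners can lower the cost per row, but the fan-out still compounds with every hop.

```python
import sqlite3

def build_tree(conn, fanout, depth):
    """Edge table for a complete tree: every node links to `fanout` children."""
    conn.execute("CREATE TABLE link (parent INTEGER, child INTEGER)")
    next_id, level = 1, [0]
    for _ in range(depth):
        new_level = []
        for parent in level:
            for _ in range(fanout):
                conn.execute("INSERT INTO link VALUES (?, ?)", (parent, next_id))
                new_level.append(next_id)
                next_id += 1
        level = new_level

def hop_query(hops):
    """Chain one self-join per hop, starting from node 0."""
    sql = "SELECT COUNT(*) FROM link AS l1"
    for i in range(2, hops + 1):
        sql += f" JOIN link AS l{i} ON l{i}.parent = l{i - 1}.child"
    return sql + " WHERE l1.parent = 0"

conn = sqlite3.connect(":memory:")
build_tree(conn, fanout=5, depth=4)
counts = {hops: conn.execute(hop_query(hops)).fetchone()[0] for hops in range(1, 5)}
print(counts)  # rows produced after each hop
```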
The key problem with navigating hierarchies or graphs in SQL is that the paths are generated on the fly, and this is precisely where a hierarchical or graph database is strongest. In such systems, the links are hard-coded and stored. Because the paths are predefined through indexing, you can always pick your starting point and navigate quickly along these pre-established paths.
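The contrast between generated and stored paths can be sketched in Python with SQLite. First, a recursive common table expression rebuilds a node’s chain of ancestors at query time, with one join per level. Then the sketch computes each node’s full path once and stores it in a column (a rough analogue of a DN). After that, the same question becomes a single read, and a whole subtree becomes a prefix match. The table and node names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO node VALUES
        (1, 'com', NULL), (2, 'example', 1), (3, 'people', 2),
        (4, 'alice', 3), (5, 'groups', 2);
    """
)

# 1) Relational style: the path is generated on the fly, one join per level.
ANCESTORS = """
WITH RECURSIVE up(id, name, parent_id, depth) AS (
    SELECT id, name, parent_id, 0 FROM node WHERE name = ?
    UNION ALL
    SELECT n.id, n.name, n.parent_id, up.depth + 1
    FROM node AS n JOIN up ON n.id = up.parent_id
)
SELECT name FROM up ORDER BY depth
"""
on_the_fly = [name for (name,) in conn.execute(ANCESTORS, ("alice",))]

# 2) Hierarchical style: compute every root-to-node path once and store it.
ALL_PATHS = """
WITH RECURSIVE down(id, path) AS (
    SELECT id, name FROM node WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, down.path || '/' || n.name
    FROM node AS n JOIN down ON n.parent_id = down.id
)
SELECT id, path FROM down
"""
paths = conn.execute(ALL_PATHS).fetchall()
conn.execute("ALTER TABLE node ADD COLUMN path TEXT")
conn.executemany("UPDATE node SET path = ? WHERE id = ?", [(p, i) for i, p in paths])

# Ancestors are now a single read: split the stored path, leaf first.
(alice_path,) = conn.execute("SELECT path FROM node WHERE name = 'alice'").fetchone()
from_stored = alice_path.split("/")[::-1]

# A whole subtree is a prefix match on the stored column, with no joins.
subtree = [name for (name,) in conn.execute(
    "SELECT name FROM node WHERE path LIKE ? ORDER BY path", ("com/example/%",))]

print(on_the_fly, from_stored, subtree)
```

The price shows up on writes. Renaming or moving `example` would mean rewriting the stored path of every node beneath it, which is the update and redundancy problem of the hierarchical model.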
And here we have the dilemma: on one hand, we have a system that reflects reality in a faithful, non-redundant way, not only capturing the information about a given object, but also reflecting the relationships around that object. And it can also generate all the paths to and from that object, whether in a graph or a tree. But if there are a lot of relationships—a lot of context surrounding that object—those queries will be slow and, in some cases, with a response time that’s difficult to determine. And that doesn’t work in a world where users expect immediate access to resources.
Or you have a system that’s easy to navigate once you get the data in, but far from optimal when it comes to writes and updates. Redundancy is one big challenge, and being able to flexibly modify the structure (deleting or updating a link) is another.
- What makes hierarchies, along with graphs, so important (think contextual relationships).
- How SQL databases can capture information without redundancy, but still need a materialization layer (think cache) to deliver contextual views at the speeds required for security and context-aware applications (think mobile).
- Why virtualizing both identity and context is key to breathing new life into IdM, while respecting your current identity investments (think Radiant).
These can be regrouped into two topics. The first is identity representation and how we should record it, which is a problem of structure and storage. What is the ideal directory structure and storage for identity, and how could we better integrate this lifecycle with the larger field of data management? Or, as Ian stresses, how could IdM be a part of IT, instead of apart from it? The second is architecture and real-life deployment, or how in practice we could and should manage the identity lifecycle in a world of silos. I’d argue that the current fragmented identity infrastructure landscape requires a more rational solution, one that would involve what we at Radiant call a federated identity service, based around a virtualization layer for identity and context.
Today, we’ll be considering the structure and storage side of this equation: how to build the most modern, flexible, and future-oriented way of representing, storing, and sharing essential identity information.
The Rise (and Fall) of the Comma: On SQL, LDAP, and Graphs
As I mentioned in my previous post, although Ian wants to “kill the comma”—and don’t get me wrong, I agree with most of his critiques of the system—I believe there’s more life left in LDAP. To make my case, let’s take a trip back in time, because those who do not know their history are doomed to repeat yesterday’s mistakes—and miss yesterday’s lessons.
Now, we all know that in terms of programming languages, there’s been much evolution since the sixties and seventies, including C, C++, logic programming, functional programming, Java, aspect-oriented programming, Erlang, Scala, and many more, with each language offering its own strengths, specializations, and limitations. But we have not seen the same exuberant dynamism in the world of databases. In the earliest days, hierarchical systems emerged, then some forms of network database, but from the seventies on, SQL has dominated the landscape. Sure, there were some attempts at object-oriented databases in the late eighties and early nineties, but these were specialized niche products that never cracked the mainstream.
Given this history, we know that around the year 2000 or so, SQL remained the workhorse, the predominant database system in enterprises everywhere. So why did we see the emergence of LDAP, itself a simplification of X.500, a kind of “object-oriented” hierarchical database? Why did a bunch of very smart people sitting on a standards committee commissioned by many organizations (including the largest telcos in the world) bypass SQL standards to implement a look-up mechanism based on a hard-coded hierarchy? Why did they create a distributed hierarchical database for email addresses and other information about people (including pictures and preferences; hello, Facebook!), which later evolved into a repository for PKI and a source of authentication for internal networks?
Could it be that the hierarchy had some relevance that they weren’t getting from the relational model alone? I believe the much-maligned comma, or rather the DN (and don’t you love the weird jargon from the X.500 world?), still has something to tell us, even now, as identity environments grow increasingly fragmented and demand goes mobile and reaches into the cloud. So let’s look at why, where, and how LDAP is still very relevant for today’s identity infrastructures. But first we need a detour through SQL.
A Thought Experiment: Trees Inside of Tables
Imagine that we could implement a directory model—a complete LDAP schema and directory tree—in a set of SQL tables following best data management practices. Such an implementation would offer several major benefits:
- The information kept in the system would be extremely flexible. You could modify and generate as many directory trees as you need from your LDAP schema. You could also generate all the graphs you need from these schemas, including graphs containing loops, which are no longer strictly trees.
- The information in your system would be non-redundant and easy to maintain, following the normalization rules of the relational model. This is a key advantage of the relational system and one of the biggest weaknesses of the hierarchical and graph/network models.
- Your identity would be managed like all other essential business data. Basing your identity infrastructure on SQL technology also means that your IdM is now “a part” of classic data management, and your DB team is working in their most comfortable environment.
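Here is one way the thought experiment could look, as a Python sketch on SQLite. The layout is an assumption made for illustration. An `entry` table holds the tree through `parent_id` links, and an `attribute` table holds each entry’s attribute values exactly once. DNs are rebuilt by walking the parent links. Because the data is not locked into one hierarchy, the same rows can be regrouped into a different tree, here by locality (`l`).

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized layout: entries form the tree, attributes hang off entries.
    CREATE TABLE entry (id INTEGER PRIMARY KEY, rdn TEXT NOT NULL,
                        parent_id INTEGER REFERENCES entry(id));
    CREATE TABLE attribute (entry_id INTEGER REFERENCES entry(id), name TEXT, value TEXT);

    INSERT INTO entry VALUES (1, 'dc=example', NULL), (2, 'ou=people', 1),
                             (3, 'uid=alice', 2), (4, 'uid=bob', 2);
    INSERT INTO attribute VALUES (3, 'cn', 'Alice Lee'), (3, 'l', 'Paris'),
                                 (4, 'cn', 'Bob Smith'), (4, 'l', 'Austin');
    """
)

def dn(entry_id):
    """Rebuild an LDAP-style DN by walking parent links up to the root."""
    parts = []
    while entry_id is not None:
        rdn, entry_id = conn.execute(
            "SELECT rdn, parent_id FROM entry WHERE id = ?", (entry_id,)
        ).fetchone()
        parts.append(rdn)
    return ",".join(parts)

def tree_by(attribute_name):
    """Generate an alternative hierarchy from the same rows, grouped by an attribute."""
    tree = defaultdict(list)
    query = "SELECT value, entry_id FROM attribute WHERE name = ? ORDER BY value, entry_id"
    for value, entry_id in conn.execute(query, (attribute_name,)):
        tree[f"{attribute_name}={value}"].append(dn(entry_id))
    return dict(tree)

print(dn(3))
print(tree_by("l"))
```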
Given these known advantages of SQL, why did our committee from above decide to implement the X.500 data model (and hence LDAP) as a specialized hierarchical “pseudo object-oriented” database? Because of one of the prime advantages of the hierarchical model: fast look-ups (or “queries”) and fast attribute value-based search. (And here’s where the link with Hadoop, HDFS, and other big data “NoSQL” implementations will become apparent for the advanced reader.)
The Dark Secret of SQL: It Takes Forever to Navigate Complex Hierarchies and Graphs
You see, the best system to record and update data supporting the so-called ACID properties is also terrible at answering very complex queries. The beauty of the relational model is that it can capture any relationship (whether in graphs or trees) and organize data so it’s captured only once and the updates are consistent, with any change in the real world reflected faithfully in the database. Once recorded in SQL, any graph, hierarchy, or relation between entities and data can be re-generated. As a result, there is no question that the relational system cannot answer. But there is also an important hidden cost: While you always get an answer to your SQL query, receiving that answer can take an incredibly long time. Isn’t it ironic that the world’s leading provider of SQL databases is named “Oracle”? Because, just as in Delphi, when it comes to graphs, hierarchies, or rich contextual data, your questions will always be answered—at some stage, when the fates are ready to tell you.