API MANAGEMENT AND PERIMETER SECURITY FOR COTS APPLICATIONS

This post was originally published as “API Management and Perimeter Security for COTS Applications” on the Levvel Blog.

Kronos provides a suite of workforce management solutions, including a mobile application for activities like clocking in and out of work, staff scheduling for shift management, and vacation requests. Our client previously deployed the Kronos Workforce Central server software to handle time tracking for hourly employees and other employee-related activities. Recently, the client decided to deploy the Kronos Workforce Mobile application to give managers access to shift and schedule data and approval functionality without having to be in front of a corporate workstation. The Kronos mobile app was ready to solve an important business problem, but an early architecture and security review of the proposed system uncovered an issue: perimeter security.

Most security integration engagements in the mobile space that I’ve been involved with have a custom client in the form of a native mobile application. If any commercial-off-the-shelf (COTS) solutions are used, the COTS component(s) are typically on the server-side. This project was somewhat unique in that both the mobile app (API consumer) and Workforce Central app servers (API provider) were part of a COTS solution — from the same vendor.

Most of the time, enterprise integration projects have COTS components from multiple vendors, and the security integration mechanisms must map the front-end security model to the back-end security model. The goal is to have a well-defined, flexible front-end security model and a flexible back-end security model that supports a variety of potential situations. This typically involves token acquisition, mediation, credential mapping, and caching some information (not the original password or similarly sensitive information, however) about these to facilitate efficient transformation between the security models. In this case, the Kronos mobile app has its own end-to-end security mechanism, but we had to layer on an additional piece to satisfy the client’s information security standards.

The original Kronos Workforce Central deployment was purely an internal system that communicated with punch clocks at various company sites and provided a web application for administrators. When the mobile initiative came along, this internal system now had to be exposed to the Internet so that mobile devices could hit the API with which the mobile app interacts. Exposing the system to the Internet could be easily done using a reverse web proxy such as F5 APM to move the traffic into the internal network where the app servers are deployed. However, this particular organization is a large, security-conscious Fortune 100 organization whose Information Security department maintains a standing requirement for all Internet-facing systems. It states that no unauthenticated traffic is allowed into the internal corporate network. InfoSec needs the perimeter network (the DMZ) to pre-authenticate the user before any traffic enters the internal network segments. This provides the ability to enforce various access controls for users at the entry point of each network segment.

This problem had several possible solutions, including the following:

Move the application server infrastructure to a DMZ. The client didn’t have the infrastructure to support moving Kronos infrastructure to the DMZ, however. It wasn’t a pattern that had been used before.
Deploy a reverse proxy or application security gateway technology such as F5 APM or TAMeb/iSAM in the DMZ. These technologies were either unavailable or limited in their ability to implement custom security integrations. Moreover, breaking the out-of-the-box UI is always a bad idea for COTS applications, especially mobile applications that are updated often. The last thing you want is a company-specific version leading to all sorts of support and backwards compatibility issues.
Grant a security exception and let the initial traffic enter the corporate network unauthenticated. I reject this option on principle; so did they.
Use a Service or API Gateway such as CA’s API Management solution, Apigee Edge, or WebSphere DataPower/IBM API Connect to integrate the proprietary API Security model that exists between the Kronos Mobile App and Kronos Workforce Central server with a standard enterprise security model that can provide the perimeter security required. This is the approach that was taken.

The Integration and API Management layers this organization has deployed are shown in the diagram below. Each actor in the processing pipeline is required to authenticate all traffic that comes into it. The standard enterprise API security model uses JSON Web Tokens (JWT) to describe the end-user or system actor (it could be either, depending on the requirements) to propagate identity and address authentication at each layer. Authorization decisions for API requests are performed at the API Gateway in the cloud and/or at the internal ESB. All API traffic uses TLS v1.2.

The client had created a well-defined enterprise API Security model during the initial API Management project, one designed with flexibility and integration with COTS, third-party, and SaaS solutions in mind. Plus, they were already an Apigee Edge Public Cloud and WebSphere DataPower customer. Apigee Edge Public Cloud acts as a cloud-hosted API Gateway that provides a common security model, statistics gathering, monitoring, and common error handling, along with other benefits. WebSphere DataPower acts as the ingress point for API traffic to the internal network. Frankly, it plays a similar role to the one Apigee plays out in the cloud, but it protects the corporate datacenter.

Recent versions of Kronos support Azure Active Directory logins. This organization is not yet on the latest and greatest; moreover, given the previous discussion about where the Kronos application servers are located on the network, this feature doesn’t get us any closer to the perimeter security requirement.

The Kronos Mobile application API traffic exchanges XML messages with the Workforce Central application server. This API is called the Workforce Central XML API. It uses a proprietary login mechanism that puts the username and password in the XML message body of an API call. The Kronos Workforce Central application server can validate those credentials via an LDAP bind against an LDAP server or Active Directory; this organization does LDAP binds against Active Directory. After authentication and authorization of the mobile user’s session, a JSESSIONID cookie is returned to the mobile app. All subsequent API calls use this JSESSIONID cookie to track the security session.
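
To make the credential-in-body login concrete, here is a minimal Python sketch of how a gateway might pull the username and password out of a login message. The element and attribute names (Kronos_WFC, Request, Action, Username, Password) and the sample values are assumptions made for illustration; they are not taken from this project or verified against the Workforce Central XML API schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical login message; the real Workforce Central schema may differ.
SAMPLE_LOGIN = (
    '<Kronos_WFC version="1.0">'
    '<Request Object="System" Action="Logon" Username="jdoe" Password="s3cret"/>'
    '</Kronos_WFC>'
)


def extract_credentials(body: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) if the body is a login call, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # Not well-formed XML, so not a login message.
    for request in root.iter("Request"):
        if request.get("Action") == "Logon":
            username = request.get("Username")
            password = request.get("Password")
            if username and password:
                return username, password
    return None
```

A production gateway would use its own hardened XML parsing rather than a general-purpose parser, and would never log the extracted values.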

The existing OAuth2-based Enterprise Security Model uses Azure Active Directory as the Identity Provider (IdP). All features of the OAuth2 spec that are supported by Azure Active Directory are available to us. Since this organization doesn’t replicate user passwords from the internal Active Directory (AD) to Azure Active Directory (AAD), it was necessary to use another federation server that does have access to the internal Active Directory domain and is federated with Azure Active Directory. The organization uses Active Directory Federation Services (ADFS) 2.0 as its on-premises federation solution for Web Application SSO and as its WS-Trust Security Token Service.

The set of steps involved in authenticating an end user are:

Mobile app collects the end user credentials and sends them to the Kronos server via API gateway.
API Gateway intercepts the call, extracts the username and password, and in turn makes a WS-Trust authentication request to the internal federation server. Now, the end user is authenticated at the gateway and no traffic has traversed the corporate network.
The Security Gateway and ESB authenticate each request by validating the JWT that Apigee included in the API request’s Authorization header.
The call is let through to the Kronos server with the end user info, where Kronos performs its LDAP-based login on the backend. (As far as the Kronos flow goes, we did not change a thing.)

Given this set of technologies, we had to authenticate the user via a WS-Trust call to ADFS to obtain a SAML2 bearer token using the credentials extracted from the login API call. That SAML2 assertion was then placed into an OAuth2 call — as defined in the “Security Assertion Markup Language (SAML) 2.0 Profile for OAuth 2.0 Client Authentication and Authorization Grants” specification — to Azure Active Directory to obtain an OAuth2 access token (JWT) that could be used to interact with the other layers of the infrastructure.
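
The second half of that exchange can be sketched in Python as follows. The grant type URN and the base64url encoding of the assertion come from the SAML 2.0 bearer profile named above; the function name, the client_id and scope values, and the choice to send them at all are hypothetical, and what Azure Active Directory actually requires should be checked against its documentation.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Grant type URN defined by the SAML 2.0 bearer assertion profile for OAuth 2.0.
SAML2_BEARER_GRANT = "urn:ietf:params:oauth:grant-type:saml2-bearer"


def build_saml_bearer_request(assertion_xml: str, client_id: str, scope: str) -> str:
    """Build the form-encoded body of a token request carrying a SAML2 assertion.

    client_id and scope are illustrative parameters; the profile treats them
    as optional and the IdP decides which ones it needs.
    """
    # The profile requires the assertion to be base64url encoded.
    encoded = base64.urlsafe_b64encode(assertion_xml.encode("utf-8")).decode("ascii")
    return urlencode({
        "grant_type": SAML2_BEARER_GRANT,
        "assertion": encoded,
        "client_id": client_id,
        "scope": scope,
    })
```

The returned string would be sent as an application/x-www-form-urlencoded POST body to the IdP's token endpoint, and the access token would be read from the JSON response.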

Admittedly, this was more complex than it could’ve been given a different configuration; if the password hashes had been replicated to Azure Active Directory, then a resource owner password credentials grant call could have been made directly to AAD. There are security implications and other considerations to putting your organization’s password hashes in AAD or any cloud-hosted, third-party IdP, however. The resource owner password credentials grant is the appropriate choice in such a situation because it is a system component making these calls and this grant provides a non-interactive login. But we were required to go the circuitous route to get the access token we needed to route API traffic through the rest of the infrastructure.

We decided to implement the perimeter security using the Apigee Edge API Gateway in the public cloud because the distributed cache it provides between its nodes (Message Processors, or MPs) allowed the solution to work when requests were routed to different MPs. The Apigee Edge caching facility was used to create an authentication cache that kept track of the OAuth2 access token that came from the OAuth2 call during the initial login. The cache key for the authentication cache was the JSESSIONID cookie value that the Kronos App server assigned to the mobile user’s session. During response processing of the initial login request, the API Gateway has access to both the JSESSIONID value and the access token that must be included in the request for downstream processing at the Security Gateway and ESB. The security session on the Kronos application server is only valid for five minutes, so the Apigee Edge authentication cache was also configured with a 300-second expiration timeout. Whether Apigee Edge or the Kronos App Server (or another layer) returns a 401 Unauthorized error, the mobile app responds the same way, forcing the user to enter credentials again.
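
The behavior of the authentication cache can be illustrated with a small in-memory Python sketch. This is not Apigee's caching facility, which is configured through policies and shared across Message Processors; the class and method names here are hypothetical, and only the key (JSESSIONID), the value (access token), and the 300-second expiry come from the design described above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AuthCache:
    """Maps a JSESSIONID to an access token, expiring entries after a TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so expiry can be tested without sleeping.
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, jsessionid: str, access_token: str) -> None:
        """Store the token obtained at login under the session cookie value."""
        self._entries[jsessionid] = (access_token, self._clock() + self._ttl)

    def get(self, jsessionid: str) -> Optional[str]:
        """Return the cached token, or None if it is missing or expired."""
        entry = self._entries.get(jsessionid)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[jsessionid]  # Evict so the caller answers with 401.
            return None
        return token
```

On each subsequent API call the gateway would look up the incoming JSESSIONID; a miss means the session has expired and the client must log in again.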

Once a JWT has been obtained that describes the authenticated user and has the proper audience for the Kronos API, the role information (relevant to the Kronos application) is available to the authorization processing mechanism used on the API Gateway. If the user attempting to log in does not have the proper role information, the API Gateway rejects the login attempt. Normally, Apigee Edge would return a 403 Forbidden error, but analyzing the behavior of the Kronos API, we found that the mobile app client did not react properly to a 403 error or other return codes it wasn’t expecting. So, we changed all error responses generated by Apigee to be a 500 Internal Server Error.
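
The authorization decision and the error mapping might look like the Python sketch below. It assumes the token's signature and expiry have already been validated and works only on the decoded claims; the claim name "roles", the role and audience values in the usage, and the function names are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional


def is_authorized(claims: Dict[str, Any], expected_audience: str,
                  required_role: str) -> bool:
    """Check audience and role on already-validated JWT claims."""
    aud = claims.get("aud")
    # The "aud" claim may be a single string or a list of strings.
    audiences: List[str] = [aud] if isinstance(aud, str) else list(aud or [])
    roles = claims.get("roles") or []  # "roles" is an assumed claim name.
    return expected_audience in audiences and required_role in roles


def rejection_status(claims: Dict[str, Any], expected_audience: str,
                     required_role: str) -> Optional[int]:
    """Return None when the login may proceed, else the HTTP status to send.

    The mobile client mishandles 403, so every gateway-generated rejection
    is reported as 500, as described above.
    """
    if is_authorized(claims, expected_audience, required_role):
        return None
    return 500
```

Mapping every failure to one status code hides the real cause from the client, so the gateway's own logs should record the actual reason for each rejection.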

Once the request has been properly authenticated and authorized at the API Gateway layer, the request is sent to the corporate data center. The Security Gateway in the DMZ uses transport layer security with client authentication and only allows the certificate belonging to the API Gateway to connect. This satisfies the perimeter security requirement.
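
A rough Python equivalent of that restriction is sketched below. The Security Gateway in this project is an appliance configured through its own tooling, not Python; pinning the client certificate by SHA-256 fingerprint is one assumed way to express "only the API Gateway's certificate", and the function names are hypothetical.

```python
import hashlib
import hmac
import ssl


def build_server_context() -> ssl.SSLContext:
    """TLS server context that demands a client certificate over TLS 1.2+.

    A real deployment must also call load_cert_chain() for the server's own
    certificate and load_verify_locations() for the trusted CA; both need
    real files and are omitted from this sketch.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # Handshake fails without a client cert.
    return ctx


def is_gateway_certificate(peer_cert_der: bytes, pinned_sha256_hex: str) -> bool:
    """Accept only the one client certificate whose fingerprint is pinned."""
    actual = hashlib.sha256(peer_cert_der).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

After the handshake, the DER bytes would come from SSLSocket.getpeercert(binary_form=True), and any connection whose certificate fails the fingerprint check would be closed.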

By routing the API traffic through Apigee Edge, the standard statistics logging/reporting, monitoring, error handling, and other features that the API Gateway provides are available to the Kronos API traffic.

The technologies described in this post can be used to address a wide range of security integration needs. API Management solutions like Apigee Edge are versatile tools that, when combined with integration expertise and knowledge of information security practices, can produce elegant, low-cost solutions.

Do you have a custom or COTS mobile application that must be integrated with various backend components, each having their own security concerns that are not compatible out of the box? Contact us.