From Legacy to Modern – From Ops to DevOps

Our sense of time speeds up as we age: the older we get, the faster time seems to pass. While this experience of time is intriguing on the face of it, science attributes it to a general decline in the amount of new perceptual information we absorb as we age. When there are more stimuli and lots of new information, as during childhood, our brains take longer to process them and a period of time feels longer. With most adults stuck in a routine, the more familiar we become with day-to-day experiences, the faster time seems to pass. If you think you have trouble keeping up now, buckle up, as the world is about to change even faster with the pace of innovation and disruption accelerating.

Consulting firm Protiviti reports that C-level executives are most concerned, rightly so as I explain below, about digital disruption and their readiness to compete against born-digital organizations. In past centuries, industries were disrupted by those who made products smaller, faster, better or cheaper through innovations in hardware components, supply chain sourcing and manufacturing processes, superior craftsmanship, or a geopolitical advantage in securing raw materials. Over the decades, innovation has predominantly shifted from hardware to software, and today industries are being disrupted by a new force: software. In his 2011 WSJ essay, Marc Andreessen proclaimed that software is eating the world and noted how just about every industry is being upended by companies that operate as software companies at their core.

I was invited to speak at the DevOps and Test Automation Summit held in Dallas, Texas earlier this month (February 2019), and the rest of this blog article is a transcript of my talk on the topic of “From Legacy to Modern – From Ops to DevOps.”


I’m a Director of Product Management focused on building Cloud-Native service offerings, of which DevOps is a key component. My talk focused largely on painting a picture of what a DevOps journey could look like for an organization, rather than prescribing tools, as DevOps is not a one-size-fits-all approach and tools should be chosen by individual teams based on their pain points, critical needs, application environment and transformational goals.

2018 – Year of Enterprise DevOps

Before we talk about the DevOps journey, let’s evaluate the extent of DevOps adoption in the market, and also why organizations embark on this journey. This could help you make a strong case for why you need to do DevOps at a larger scale within your organization.

Source: https://www.statista.com/statistics/673505/worldwide-software-development-survey-devops-adoption/

Adoption stats from Statista indicate that DevOps has gone mainstream, with 50% of respondents having implemented DevOps. If we take a closer look, we see that 30% of organizations have almost fully adopted it, while 20% have at least a few teams fully immersed in it. Given this level of adoption, analysts like Forrester have referred to 2018 as the “year of enterprise DevOps”.

Key Pillars for Digital Transformation

Why do organizations embrace DevOps? Let’s take a step back and look at the bigger picture.

No industry is immune to disruption. New software-driven products and experiences are impacting virtually every industry, with the well-known ones being Amazon in retail, Netflix in cable TV and movies, Spotify in music, Airbnb in short-term rental and Uber in ride-sharing. So, organizations across industry verticals see digital business initiatives as critical to their success.


Businesses want to develop digital infrastructure to enable easy, personalized and 24/7 access to information and data. They want to achieve more with less through improved operational efficiency and developer productivity. They want business agility to reduce time to value, and better compete with digital natives. While businesses are rapidly innovating and driving growth and competitive advantage through new technologies and practices, they’re cognizant of an increasingly complex risk landscape and want to maintain the precarious balance between protecting against cybersecurity threats and business demand for speed. To achieve this level of digital transformation and customer centricity, every business must become a software company in the App Economy, and Speed of Application Development and Delivery is the new digital business imperative.

DevOps enables Cloud-Native applications

In the previous section, we saw that software is driving today’s innovation and disrupting entire markets. At the core of that disruption are equally new and innovative methods for developing and delivering the software itself, and those principles form the foundation of cloud-native. Cloud-native is the future of application development, as it moves an idea into production quickly and efficiently. It goes beyond IT transformation and aims at fundamentally transforming a business. Evolving toward cloud-native application development and delivery is multidimensional, affecting culture, processes, architecture, and technology.


Figure: Tenets of Cloud-native application development and deployment (Source: RedHat)

A cloud-native approach is not focused on where applications are deployed, but instead on how applications are built, deployed, and managed. It is based upon four key tenets – all of which are focused on enabling small teams to work on small batches of shippable software and deliver continuously – (1) Service-based architecture, (2) API-based communication, (3) container-based infrastructure, and (4) DevOps processes. Let us explore each of these tenets to understand them better.

Service-based architecture, such as microservices, advocates building modular, loosely coupled services. You have to be doing DevOps, and you have to be doing continuous delivery in order to do microservices.

To expose these services, lightweight, technology-agnostic APIs are used to reduce complexity and overhead of deployment, scalability, and maintenance. APIs also allow businesses to create new capabilities and opportunities internally, and externally via channel partners and customers.
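
To make the API tenet a bit more tangible, here is a minimal sketch of my own (not from the talk) of a small service exposing a lightweight REST API. It assumes Flask is installed, and the /orders endpoint and payload are purely hypothetical.

```python
# Minimal sketch of a microservice exposing a lightweight, technology-agnostic REST API.
# Assumes Flask is installed; the /orders resource and its fields are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)
orders = {}  # in-memory store, standing in for the service's own datastore

@app.route("/orders/<order_id>", methods=["GET"])
def get_order(order_id):
    order = orders.get(order_id)
    if order is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(order)

@app.route("/orders", methods=["POST"])
def create_order():
    payload = request.get_json() or {}
    order_id = str(len(orders) + 1)
    orders[order_id] = {"id": order_id, **payload}
    return jsonify(orders[order_id]), 201

if __name__ == "__main__":
    # Each microservice runs independently and is reached only through its API.
    app.run(port=8080)
```

Because the service is reached only through this API, other teams can consume it without knowing anything about its internals, which is exactly what keeps services loosely coupled.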

Cloud-native applications rely on containers for a common operational model across technology environments and true application portability across different environments and infrastructure, including public, private, and hybrid. Cloud-native applications scale horizontally, adding more capacity by simply adding more application instances, often through automation within the container infrastructure.

Achieving a successful cloud-native strategy means embracing agile, high-quality app development, enabled by DevOps.

Sample Cloud-Native Maturity Matrix


Source: https://container-solutions.com/cloud-native-maturity-matrix/

Before you initiate the cloud-native transformation, evaluate where you are today and decide where you want to go. It is important to keep the various aspects of maturity in sync as you progress along the cloud-native scale to avoid bottlenecks. You could put together your own maturity assessment matrix; I’ve shared a sample matrix just to get my message across. Above all, understand your current processes, and crucially your internal culture and its pivotal role in transformation, to avoid an expensive waste of time and resources.

Agile & DevOps


Source: http://www.effectivepmc.com/devops

I’d like to take a couple of minutes to level-set on how I use Agile, DevOps and CI/CD in my conversation. Each of these is distinct and important. When all three are used for their intended purposes, the results are transformational.

Agile and DevOps go hand in hand. Agile development focuses on business agility, with the development team collaborating with the business to create potentially shippable increments of the product. DevOps is aimed at collaboration between the development team and the operations team to produce working production increments.

DevOps & CI/CD

DevOps and CI/CD are often used interchangeably. However, DevOps is typically focused on culture and roles that emphasize collaboration and communication, while CI/CD (Continuous Integration/Continuous Delivery) is focused on tools for product lifecycle management and emphasizes automation.

Now let me talk about CI and CD.


Figure: Continuous Integration, Delivery, Deployment

Source: https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment

Developers practicing continuous integration merge their changes back to the main branch as often as possible. These software changes are validated by creating a build and running automated tests against the build. By doing so, you avoid code base instability that usually happens when people wait until it’s release time to merge their changes into the release branch. Continuous integration puts a great emphasis on testing automation to ensure that the application is not broken whenever new commits are integrated into the main branch.

Continuous delivery is an extension of continuous integration to make sure that you can release new changes to your customers quickly in a sustainable way. This means that on top of having automated your testing, you also automate your release process, and you can deploy your application at any point at the click of a button, at any desired frequency, be it daily, weekly, or whatever suits your business requirements. To reap the benefits of Continuous Delivery, you should deploy to production as early as possible to release small batches for easy validation and troubleshooting.

Continuous deployment goes one step further than continuous delivery. With this practice, every change that passes all stages of your production pipeline is released to your customers and developers can see their work go live within minutes. There’s no human intervention, and only a failed test will prevent a new change from being deployed to production. This accelerates the feedback loop with your customers and takes pressure off the team as there is neither a release day nor a maintenance window outside working hours to push software changes.
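
To make the distinction concrete, here is a minimal sketch of a pipeline driver (mine, not tied to any particular CI/CD product): continuous delivery keeps a manual approval gate in front of the production deploy, while continuous deployment removes that gate. The stage commands are placeholders.

```python
# Minimal pipeline sketch: build -> test -> deploy, with an optional manual gate.
# The shell commands are placeholders; swap in your real build/test/deploy steps.
import subprocess
import sys

STAGES = [
    ("build", ["echo", "building artifact"]),
    ("test", ["echo", "running automated tests"]),
]

def run_stage(name, cmd):
    print(f"== {name} ==")
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # A failed stage stops the pipeline; nothing broken reaches production.
        print(f"{name} failed; stopping the pipeline")
        sys.exit(result.returncode)

def deploy(continuous_deployment=False):
    if not continuous_deployment:
        # Continuous delivery: the release is always ready, a human pulls the trigger.
        answer = input("Deploy to production? [y/N] ")
        if answer.lower() != "y":
            print("Deployment deferred")
            return
    # Continuous deployment: every change that passed all stages goes live automatically.
    run_stage("deploy", ["echo", "deploying to production"])

if __name__ == "__main__":
    for name, cmd in STAGES:
        run_stage(name, cmd)
    deploy(continuous_deployment=False)
```

Flipping the single flag to continuous_deployment=True is the whole difference between the two practices: the only thing that can stop a change from going live is a failed stage.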

DevOps – Culture change

The first and foremost step in a DevOps adoption journey is being cognizant of the cultural change involved. Below are some essential changes to bring about for a successful transition.

  • Dynamic cross-functional team focused on delivering business success where everyone is empowered and accountable.
  • Commitment to a blameless culture that embraces experimentation and organizational learning, while eliminating fear of failure.
  • Close alignment between IT and business leadership, with common vision and goals shared by CIO, VP of Apps and business leaders.
  • Generative, high-trust and collaborative culture that is driven top-down.

In the older Ops model, developers are constantly seeking to create new code and respond to changing needs, while operations is focused on maintaining stability and reliability. In contrast to this, high-performing DevOps teams have a common goal – the goal of making quality, availability, and security everyone’s job, every day. Instead of throwing code over the proverbial wall to the IT operations team, the entire application development and update process is much more collaborative.

The empowerment of individuals is key to culture change. The ability for people to make decisions without feeling their job is on the line is key, even if the decision turns out to be wrong. Accountability is paramount. Without establishing a core ethos of shared responsibility, any DevOps initiative is bound to fail from the start. In order to cultivate a thriving DevOps culture, developers need to be both empowered and obligated to take ownership and responsibility for any issues caused in production. And you want this alignment to reflect top-down, starting with CIO, VP of Apps and any other relevant LOB leaders.

It’s really important to communicate without playing the blame game, shifting the mindset so that people focus on the work being done rather than the people behind the work. DevOps is about experimentation. Experiments will often fail, but each experiment is another learning point for the organization.

Generative high trust organizations actively seek and share information to better enable the organization to achieve its mission. Responsibilities are shared and failure results in reflection and genuine inquiry within such organizations.

Change, especially cultural change, doesn’t happen without executive sponsorship, and it is most effective when driven top-down rather than bottom-up.

Sample DevOps Maturity Model

If you think that you’re up for a cultural change, perform a DevOps maturity assessment to identify the right set of processes and tools, based on the project and team. Repeat the assessment periodically to measure progress and update processes to evolve to the next stage.

Figure: DevOps Maturity Model

Source: https://www.infoq.com/articles/Continuous-Delivery-Maturity-Model

DevOps – No two journeys are the same

DevOps journeys differ across enterprises, and across specific teams within a given organization, based on their current maturity level, the pain points and critical needs of specific applications, and market dynamics.

Digital transformation programs require alignment and collaboration between CIO, Chief Digital Officers and many other C-suite leaders. However, the executive sponsor and his/her functional alignment to IT vs. LOBs could heavily influence whether a particular enterprise starts its DevOps journey in Agile development, microservices architecture, implementation of CI/CD pipelines or extending beyond infrastructure automation into Infrastructure as Code (IaC). Organizations tend to derive benefits along their DevOps journey, irrespective of whether they adopt Agile development and CI/CD implementation for an existing app, resort to architectural modernization and use of microservices as part of their cloud-native journey, or automate their infrastructure delivery and package applications as containers to improve operational efficiency and infrastructure utilization.

Aligning Cloud & DevOps adoption

One of the core tenets of DevOps is the application of automation to streamline processes for increased IT operations and developer productivity. For organizations that adopt DevOps prior to moving to the cloud, teams can automate server configuration and code with tools like Ansible or Chef. However, hardware elements are much harder to automate since legacy equipment does not offer ready-to-use APIs. In contrast, cloud computing offers the ability to automate provisioning of not just software but also infrastructure elements (e.g., routers and load balancers). Through code and templates, teams can automatically summon elements as diverse as security services, databases and networking.
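
As a rough illustration of summoning infrastructure through code and templates, here is a hedged sketch of declarative, idempotent provisioning. The FakeCloud class and the resource names are hypothetical stand-ins; a real implementation would use a provider SDK or an IaC tool such as those named above.

```python
# Sketch of declarative, idempotent provisioning: declare the desired resources,
# then reconcile against what already exists. FakeCloud stands in for a provider API.
DESIRED = {
    "load_balancer": {"name": "web-lb", "type": "load_balancer"},
    "database": {"name": "orders-db", "type": "database", "engine": "postgres"},
    "firewall_rule": {"name": "allow-https", "type": "security_rule", "port": 443},
}

class FakeCloud:
    """Hypothetical stand-in for a cloud provider API."""
    def __init__(self):
        self.resources = {}

    def exists(self, name):
        return name in self.resources

    def create(self, spec):
        self.resources[spec["name"]] = spec
        print(f"created {spec['type']} '{spec['name']}'")

def reconcile(cloud, desired):
    # Idempotent: re-running the same template never creates duplicates.
    for spec in desired.values():
        if cloud.exists(spec["name"]):
            print(f"{spec['name']} already present, skipping")
        else:
            cloud.create(spec)

if __name__ == "__main__":
    cloud = FakeCloud()
    reconcile(cloud, DESIRED)
    reconcile(cloud, DESIRED)  # second run is a no-op, like re-applying an IaC template
```

The point of the sketch is the shape of the workflow: the desired state lives in version-controlled code, and applying it repeatedly converges the environment rather than duplicating it.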

Concurrent adoption of DevOps and Cloud is the most efficient and gets you to DevOps faster. It allows an organization to adopt automation at all levels concurrently, using technology to enforce new DevOps processes while maximizing efficiencies.

A Winning DevOps approach – PACE Layered

While Cloud-native helps drive fundamental transformation within an organization, not all existing applications need to move onto a DevOps pipeline.


I would suggest that you categorize applications based on Gartner’s PACE layered application strategy, with pace denoting how fast any application changes. The three application layers are –

  1. Systems of Record – These are typically third-party application software with some customization for the specific organization, and support administrative and transaction processing activities such as finance, HR, asset management, or procurement; they therefore change with the least frequency.
  2. Systems of Differentiation – These applications support processes unique to the organization or its industry. They are innovations that have matured and are going through incremental changes, at a moderate pace of once every few quarters.
  3. Systems of Innovation – This category includes applications built to support new, innovative business activities and are constructed quickly to enable enterprises to take advantage of these new ideas and opportunities in the market. These are ideally released on an ongoing basis to validate business ideas or fail fast based on learnings from customer feedback.

For best results when introducing DevOps to your IT organization, identify new applications designed for change that have a higher tolerance of risk.

The core use case of DevOps is in Agile development, though it can be applied more broadly to other categories such as Iterative and Incremental Development (IID) and citizen development (low-code app development) to benefit from improved collaboration and extensive automation.

DevOps: Where and how to start?

The previous step helps us identify the category of applications to choose from. Next, let me provide some guidance on the specific type of application to pick and how to kickstart implementation.

Start small and iterate with a project that is straightforward and not mission-critical, to find the right balance for your business. Early iterations with a new Agile-DevOps product team might take time to improve velocity. Keep team sizes small, and keep the scope of each team’s domain small and bounded. Start with one Agile-DevOps team and then move to several teams creating an application via Scrum of Scrums.

Some organizations focus on Agile-DevOps team dynamics and tackling a project, while IT groups tend to focus on automation. Assess the knowledge base and expertise of your team and its operational readiness, and experiment with different tools.

To reiterate, my learning has been that organizations which started their journey to address their biggest pain point and most critical tactical need, factoring in their industry drivers and constraints, were the most successful with DevOps implementation.

Organizing Teams for DevOps

Now that we’ve identified where to start with DevOps, let us ensure our teams are organized well to support these initiatives.


Source: Functional vs. Market Orientation Work: The DevOps Handbook, Kim et al.

To better achieve DevOps outcomes, we need to reduce the effects of functional orientation, which is optimized for cost and expert skill development, and instead enable market orientation, which is optimized for speed and quickly delivering value to the customer by having many small teams working independently. These market-oriented teams are responsible not only for feature development but for everything from idea conception through retirement, as they’re cross-functional and independent. They are able to design and run user experiments, build and deliver new features, deploy and run their service in production, and fix any defects without dependencies on other teams, thus enabling them to move faster.

I’m not suggesting that you do a large, top-down reorganization to achieve market orientation as it could potentially create fear and paralysis, and impede progress. Instead, you could plan to embed functional skills such as QA, Ops, and Infosec into each product team, or alternately provide their capabilities to these teams through automated self-service platforms that provide production-like environments, initiate automated tests, or perform deployments. This would help you move away from a model where the product team has to wait on open tickets to groups such as IT Operations or InfoSec. This model has been touted by Amazon as one of the primary reasons behind their ability to move fast even as they grow.

While market-oriented teams are highly recommended when teams are being built from the ground up, this isn’t the only way to achieve fast flow and reliability. It is possible to create effective, high-velocity organizations with functional orientation as long as everyone in the value stream is focused on organizational outcomes, regardless of which function they reside in. Etsy, Google and GitHub are among those who’ve achieved success with functional orientation, given their high-trust cultures that enable all departments to work together effectively.

DevOps – Roles & Skills

Next, let us delve deeper into the typical composition of various teams, and required skillsets to better enable DevOps adoption. I’ll also touch upon what you need to take into consideration when resourcing a team in a DevOps organization.

In organizations that have embraced DevOps, I’ve typically seen 3 different teams – Product, DevOps and IT Ops teams.

Product Team – Each product team owns a specific service and is designed to be cross-functional and independent. The core team usually consists of a product owner, scrum master, developers, test engineers, an SQA engineer, an architect, and DevOps and InfoSec engineers. Generalizing specialists, i.e. those with T-shaped skills (deep expertise in one area and broad skills across many areas) or E-shaped skills (deep expertise in a few areas, experience across many areas, proven execution skills and constant innovation), are better fits for the core product team than those with “I”-shaped skills (deep expertise in a specific area only). Be cognizant that developers have been agile far longer than operations, and many Ops teams have some catching up to do when they’re pulled into an Agile product team.

IT Ops isn’t going away with DevOps. Instead, we can expect to see IT Ops adopting SDLC best practices to automate infrastructure delivery through IaC. Given the significant operational efficiency gains through DevOps, I’ve seen Ops headcount shift to Dev to hire DevOps, Site Reliability and cloud engineers.

DevOps Team – Within the industry, we tend to agree on the need for product teams. In contrast, some would vehemently argue against a separate DevOps team, reasoning that it goes against the grain of DevOps, which is guided by collaboration between Engineering and Ops. In practice, however, while there is Ops representation in product teams through DevOps participation in Agile ceremonies, most organizations have a well-defined DevOps team with specific responsibilities outside of the product team: (a) help automate and streamline operations and processes, (b) build and maintain tools for deployment, monitoring, and operations, and (c) troubleshoot and resolve issues in dev, test and production environments.

Typically, the IT Ops function is centralized, while DevOps teams are part of individual LOBs.

IT Ops – Value to & through DevOps

DevOps teams have every interest in working in partnership with IT Ops to ensure they have access to the resources they need to deliver services in a way that avoids bottlenecks or barriers to entry.

IT Ops continues to be valuable in a DevOps organization by

  • Offering infrastructure services that DevOps can consume quickly and easily.
  • Obtaining economies of scale by serving up shared infrastructure.
  • Designing infrastructure per corporate governance requirements around business continuity, compliance and information security.
  • Serving as the broker of physical, virtual and cloud resources.

IT Ops, in turn, derives value through DevOps by limiting shadow IT, and leveraging operational efficiency and productivity boosts that DevOps brings to the organization.

CI/CD – Implementing Pipelines

Automation of release tasks is a core requirement for DevOps teams. There are tools and technologies that support automation and collaboration between teams. The below diagram captures a few sample tools for each DevOps phase.


Source: https://www.helpnetsecurity.com/2019/01/18/protecting-privileged-access/

Think of building a continuous pipeline as creating a toolchain of loosely coupled, event-driven providers and consumers to automatically move application code through these phases in the release process with zero latency. Multiple pipelines are typically built for a single application depending on the phases covered and the environment (staging, pre-prod, prod) to which the application code is released.
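
As a rough sketch of that loosely coupled, event-driven idea (not any specific product), the snippet below wires pipeline stages together through a tiny in-process event bus: each stage reacts to the event emitted by the previous one, so stages can be added or swapped without touching their neighbors. The event names and stage bodies are placeholders.

```python
# Tiny event-bus sketch: each pipeline stage subscribes to the event emitted by the
# previous stage, keeping the toolchain loosely coupled. Stage bodies are placeholders.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, payload):
        for handler in self.handlers[event]:
            handler(payload)

bus = EventBus()

def on_commit(payload):
    print(f"building commit {payload['sha']}")
    bus.publish("build.succeeded", payload)

def on_build(payload):
    print(f"running tests for {payload['sha']}")
    bus.publish("tests.passed", payload)

def on_tests(payload):
    print(f"deploying {payload['sha']} to staging")

bus.subscribe("code.pushed", on_commit)
bus.subscribe("build.succeeded", on_build)
bus.subscribe("tests.passed", on_tests)

bus.publish("code.pushed", {"sha": "abc123"})
```

In a real toolchain the bus would be your SCM and CI/CD tools emitting webhooks, but the design property is the same: producers and consumers only agree on events, not on each other.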

When selecting tools to implement a pipeline for a given application, below are a few criteria for identifying (a) relevant stages in your pipeline and (b) tools to use in each stage.

  • Identify pain points and focus on tools for relevant stages.
  • Evaluate tools based on ease of learning, implementation, out-of-box features, extensibility, applicability for your environment, marketplace and any other organizational requirements.
  • Be wary of swapping one tool out for another to fill a gap in functionality, given the ensuing solution fragility and support costs.
  • Leverage learnings from DevOps initiatives within other groups in your organization, and aim at shared tools but give ample space for individual teams to pick their tools.

Continuous Delivery is already highly adopted for application development, while databases have been left behind, with only one-third of organizations adopting similar technologies and tools for their database environments.

DevOps Tools Landscape

I’ve included this infographic just to illustrate that the DevOps tools landscape is a crowded one. So don’t get into analysis paralysis when choosing tools.

Source: harness.io

Limit the time you spend evaluating tools for your specific environment and process. Given that DevOps is not a one-size-fits-all approach, I will not recommend specific tools for each pipeline stage.

DevOps – Measuring progress

Executives higher up the management chain often hear only positive news that is not quantified, while negative updates are kept under wraps. The best countermeasures to this inaccurate communication are the mutually reinforcing pillars of automation and measurement. Make sure that your initial adoption plan includes putting together a DevOps dashboard to automate progress tracking, and use the dashboard to assess DevOps progress at a regular cadence.

Low-level operational metrics won’t do the trick. The metrics you present to upper management must tie back to business value. They’re not interested in a bunch of numbers that show how much code you’re getting out the door. They want to know how quickly you got something with quantifiable value to market.

To help with this, DevOps teams adopt digital experience monitoring and analytics solutions that correlate data — from the point of customer engagement to back-end business processes.
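
As one hedged example of turning raw pipeline data into business-facing numbers, the sketch below computes two commonly used indicators, deployment frequency and lead time from commit to production, from a handful of made-up deployment records. A real dashboard would pull these records from your pipeline and tie them to the value delivered.

```python
# Sketch: derive business-facing DevOps indicators from deployment records.
# The records below are made up; a real dashboard would query your pipeline/tooling.
from datetime import datetime, timedelta

deployments = [
    {"committed": datetime(2019, 2, 1, 9, 0), "deployed": datetime(2019, 2, 1, 15, 30)},
    {"committed": datetime(2019, 2, 4, 11, 0), "deployed": datetime(2019, 2, 5, 10, 0)},
    {"committed": datetime(2019, 2, 7, 14, 0), "deployed": datetime(2019, 2, 7, 18, 45)},
]

window_days = 7
frequency = len(deployments) / window_days
lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

print(f"Deployment frequency: {frequency:.2f} per day")
print(f"Average lead time (commit to production): {avg_lead_time}")
```

Lead time answers exactly the question executives care about, how quickly something of quantifiable value reached the market, which is why it belongs on the dashboard alongside, not instead of, operational metrics.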

DevOps – Barriers

The last three slides, starting with this one, sum up all that I’ve talked about so far.

While DevOps has gone mainstream, barriers to DevOps still exist within organizations. Such barriers vary based on the type of organization, its size, existing processes and degree of deployment scale among other aspects.

Culture: DevOps initiatives require a collaborative culture, with risk-taking, experimentation and failure forgiveness baked in. Culture is usually the hardest thing to change and has been rated as the #1 barrier to DevOps; a business-as-usual attitude may not work. To overcome this barrier, it is critical for enterprises to make conscious efforts to embrace a shift in culture during the DevOps rollout, through executive sponsorship, the appointment of champions and the adoption of collaborative tools (such as Slack), so that both a top-down and a bottom-up culture shift can happen.

Plan – In organizations that have spun up DevOps projects as bottom-up initiatives, organizational resources such as program managers, planning processes, budget allocation and executive buy-in were sporadic, and projects were not specifically tied to business goals or measurable objectives, resulting in limited success that could not build momentum. This barrier can be overcome through structured planning and by securing executive sponsorship for DevOps initiatives.

Toolchain – Fragmented toolsets, including multiple open-source tools, have hindered standardization and slowed down adoption. This problem is likely to continue for several years until the toolchain ecosystem consolidates further and DevOps governance is mainstreamed and standardized across corporate systems including IT, LoB, partners, etc.

Budget/Skillset – Most DevOps projects have under-budgeted for experimentation, failure, resources, tool support and time for transformation, while enterprise-grade support fees for open-source tools, non-standardized tool adoption and expensive DevOps talent have skewed budgets further. Being aware of this propensity should help enterprises overcome this barrier.

Legacy: Brownfield deployments with legacy infrastructure add complexity to cloud models. Infrastructure rigidity and the need for access to physical environments induce delays. Organizations that have on-premise infrastructure, proprietary tools and manual processes all suffer from legacy technical debt. A number of tools that straddle brownfield and greenfield deployments are now sprouting up, which should help ease this barrier.

DevOps – Best Practices

Here are the best practices I’d recommend for better DevOps success.

  1. Tailor your implementation to address pain points in your release process.
  2. Take stock of where you are for the given application under consideration before charging ahead with implementation.
  3. Encourage experimentation using pilot initiatives to create some quick wins, and also identify the right balance for the specific team that is now implementing DevOps, while still leveraging DevOps learnings from within the wider organization.
  4. It is highly critical to invest in building an internal DevOps community in the organization, so that teams understand what others are working on and best outcomes and toolsets are cross-pollinated.

Key Takeaways

Here is a quick explanation of key takeaways from my talk.

  1. DevOps is not a one-size-fits-all approach but needs to be tailored to address specific pain points, within the context of the overall Cloud-Native journey.
  2. The most commonly cited challenges with DevOps are making cultural change happen and the lack of a guiding transformation roadmap to drive measurable business outcomes.
  3. Automate measurement through a DevOps dashboard to track actual progress based on business impact.
  4. Change agents who help with OCM (Organizational change management) are key to instilling and coaching Agile, Collaboration and Automation mindset and helping operationalize it on a daily basis.

Finally, remember that DevOps is about culture and collaboration, rather than tools for automation.

PwC’s global strategy consulting business found that companies facing disruption generally have longer to respond than they expect, and that an effective response is typically available to them. It was once unthinkable that an enterprise could transform itself as fundamentally as we witness today in the Cloud-Native era. There is always hope to pull your organization back from the brink of extinction and turn it into a powerhouse competitor by reinventing it through Cloud-Native practices.


Navigating the maze of Cyber Security Intelligence and Analytics

Quite recently, Mint reported that intelligence agencies of the three nations – India, UK and US – did not pull together all the strands gathered by their high-tech surveillance tools, which might have allowed them to disrupt the 2008 terror strike in Mumbai. Would data mining of the trove of raw information, along the lines of US National Security Agency’s PRISM electronic surveillance program, have helped in connecting the dots for the bigger picture to emerge? As leaked by Edward Snowden, NSA has been operating PRISM since 2007 to look for patterns across a wide range of data sources, spanning multiple gateways. PRISM’s Big Data framework aggregates structured and unstructured data including phone call records, voice and video chat sessions, photographs, emails, documents, financial transactions, internet searches, social media exchanges and smartphone logs, to gain real-time insights that could help avert security threats.

In the business world, a relatively recent Verizon Data Breach Investigations Report points out that data is stolen within hours in 60% of breaches, yet goes undetected for months in 54% of them. With the mounting pace of attacks and an increasing IT attack surface due to the constant influx of new technologies such as cloud computing, BYOD and virtualization, CISOs are looking for real-time data analytics to improve threat defense through better controls, reliably detect an incident and quickly contain the breach before it inflicts an inordinate amount of damage, and provide insight into the extent of data exfiltration to quantify damages and potentially remediate the situation, all without aggravating the security staff shortages already faced by most organizations.


Figure – Timespan of Breach Detection (Source: Verizon)

The era of big data security analytics is already upon us, with large organizations collecting and processing terabytes of internal and external security information. Internet businesses such as Amazon, Google, Netflix and Facebook, which have been experimenting with analytics since the early 2000s, might contend that Big Data is just an old technology in a new disguise. In the past, the ability to acquire and deploy high-performance computing systems was limited to large organizations with dire scaling needs. However, technological advances in the last decade – which have rapidly decreased the cost of compute and storage, increased the flexibility and cost-effectiveness of data centers and cloud computing for elastic computation and storage, and produced new Big Data frameworks that let users exploit distributed computing systems storing large quantities of data through flexible parallel processing – together with major catalysts such as data growth and longer retention needs, have made it attractive and imperative for many different types of organizations to invest in Big Data Analytics.

How do traditional Security Analysis tools fare?

Over the decades, security vendors have progressively innovated on multiple aspects to – (1) protect endpoints, networks, data centers, databases, content and other assets, (2) provide risk and vulnerability management that meet governance requirements, (3) ensure policy compliance with SOX, PCI, HIPAA, GLBA, DSS and other regulations, (4) offer identity/access/device management, while aiming (5) to provide a unified security management solution that allows for configuration of its security products, and offers visibility into Enterprise security through its reporting capability. While most individual security technologies have matured to the point of commoditization, security analysis remains a clumsy affair that requires many different tools (even in SIEM deployments, that I will elaborate on below), most of which do not interoperate due to the piecemeal approach of security innovation and multi-vendor security deployments.

Today, security analysts often rely on tools such as Log Management solutions and Security Information and Event Management (SIEM) systems for network security event monitoring, user monitoring and compliance reporting – both of which focus primarily on the collection, aggregation and analysis of real-time logs from network devices, operating systems, and applications. These tools allow for parsing and search of log data for trends, anomalies and other relevant information for forensics, though SIEMs are deemed more effective for forensics given their event reduction capability.

Log Management solutions from a few prominent vendors include AlienVault USM, AlertLogic Log Manager, McAfee Enterprise Log Manager, LogRhythm, HP ArcSight ESM, Splunk, SolarWinds Log & Event Manager, Tenable Log Correlation Engine, ElasticSearch ELK, SawMill, Assuria Log Manager, BlackStratus Log Storm and EiQNetworks SecureVue. You’ll find a more complete list along with details on these offerings here.

Let me now elaborate on how Log Management and SIEM solutions differ, and point out a few shortcomings of SIEM solutions (apart from their price, of course).

While a SIEM solution is a specialized tool for information security, it is certainly not a subset of Log Management. Beyond log management, SIEMs use correlation for real-time analysis through event reduction, prioritization, and real-time alerting by providing specific workflows to address security breaches as they occur. Another key feature of SIEM is the incorporation of non-event based data, such as vulnerability scanning reports, for correlation analysis. Let me decipher these features unique to SIEMs.

  • Quite plainly, correlation means looking for common attributes and linking events together into meaningful categories. Data correlation of real-time and historical events allows for identification of meaningful security events among massive amounts of raw event data, with context information about users, assets, threats and vulnerabilities. Multiple forms of event correlation are available – e.g. a known threat described by correlation rules, abnormal behavior in case of deviation from a baseline, statistical anomalies, and advanced correlation algorithms such as case-based reasoning, graph-based reasoning and cluster analysis for predictive analytics. In the simplest case, rules in a SIEM are represented as rule-based reasoning (RBR) and contain a set of conditions, triggers, counters and an action script (a minimal sketch of such a rule follows this list). The effectiveness of SIEM systems can vary widely based on the correlation methods that are supported. With SIEMs – unlike in IDSs – it is possible to specify a general description of symptoms and use baseline statistics to monitor deviations from the common behavior of systems and traffic.
  • Prioritization involves highlighting important security events over less critical ones based on correlation rules, or through inputs from vulnerability scanning reports which identify assets with known vulnerabilities given their software version and configuration parameters. With information about any vulnerability and asset severity, SIEMs can identify the vulnerability that has been exploited in case of abnormal behavior, and prioritize incidents in accordance with their severity, to reduce false positives.
  • Alerting involves automated analysis of correlated events and production of alerts, based on configured event thresholds for incident management. This is usually seen as the primary task of a SIEM solution that differentiates it from a plain Log Management solution.
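
To make the rule-based reasoning above a little more concrete, here is a minimal sketch of my own (not any vendor's rule syntax): a correlation rule expressed as a condition, a counter threshold, a time window and an action, applied to a stream of made-up authentication events.

```python
# Minimal correlation-rule sketch: condition + counter + time window + action.
# The events, field names and thresholds are illustrative, not any vendor's syntax.
from collections import defaultdict
from datetime import datetime, timedelta

RULE = {
    "condition": lambda e: e["type"] == "auth_failure",
    "group_by": "src_ip",
    "threshold": 3,                  # counter: how many matches trip the rule
    "window": timedelta(minutes=5),  # only correlate events this close together
    "action": lambda key: print(f"ALERT: possible brute force from {key}"),
}

def correlate(events, rule):
    buckets = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        if not rule["condition"](event):
            continue
        key = event[rule["group_by"]]
        bucket = buckets[key]
        bucket.append(event["time"])
        # Drop matches that fell outside the sliding time window.
        while bucket and event["time"] - bucket[0] > rule["window"]:
            bucket.pop(0)
        if len(bucket) >= rule["threshold"]:
            rule["action"](key)
            bucket.clear()

events = [
    {"type": "auth_failure", "src_ip": "10.0.0.5", "time": datetime(2015, 3, 1, 10, 0, s)}
    for s in (0, 20, 40)
]
correlate(events, RULE)
```

Everything in this toy rule is static, which is exactly the limitation discussed in the "Real-world challenges" section below: someone has to write, tune and maintain each rule by hand.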

The below figure captures the complete list of SIEM capabilities.


Figure – SIEM Capabilities (Source: ManageEngine.com)

SIEM solutions available in the market include IBM Security’s QRadar, HP ArcSight, McAfee ESM, Splunk Enterprise, EMC RSA Security Analytics, NetIQ Sentinel, AlienVault USM, SolarWinds LEM, Tenable Network Security and the Open Source SIM which goes by the name of OSSIM. Here is probably the best source of SIEM solutions out in the market, and how they rank.

Real-world challenges with SIEMs

  • SIEMs provide the framework for analyzing data, but they do not provide the intelligence to analyze that data sensibly and detect or prevent threats in real-time. The intelligence has to be fed to the system by human operators in the form of correlation rules.
  • The correlation rules look for a sequence of events based on static rule definitions. There is no dynamic rule generation based on current conditions, so it takes immense effort to make and keep these rules useful. The need for qualified experts who can configure and update these systems increases the cost of maintenance, even if we were to assume that such expertise is widely available in the industry.
  • SIEMs throw up a lot of false notifications when correlation rules are first used, which prompts some customers to disable these detection mechanisms altogether. SIEMs need to run advanced correlation algorithms using well-written, specific correlation rules to reduce false positives. The sophistication of these algorithms also determines the capability to detect zero-day threats, which hasn’t been the forte of SIEM solutions in the market.

In spite of these challenges, a market does exist for these traditional tools, as both SIEM and Log Management play an important role in addressing organizational governance and compliance requirements related to data retention for extended periods or incident reporting. So, these solutions are here to stay and will be in demand at least by firms in specific industry verticals and business functions. Also, SIEM vendors have been exploring adjacent product categories – the ones I will cover below – to tackle the needs of the ever-so-dynamic threat landscape.

Is Security Analytics a panacea for security woes?

With the current threat landscape increasingly featuring Advanced Persistent Threats (APTs), security professionals need intelligence-driven tools that are highly automated to discover meaningful patterns and deliver the highest level of security. And there we have all the key ingredients that make up a Security Analytics solution!

First off, Security Analytics (SA) is not a replacement for traditional security controls such as NGFW, IDS, IPS, AV, AMP, DLP or even SIEMs and other existing solutions, but it can make them work better by reconfiguring security controls based on SA outcomes. For all of this to come together, and for a deeper truth about potential threats and current intrusions to emerge through Security Analytics, threat intelligence needs to be fused from multiple internal and external sources. Various levels of Indicators of Compromise (IOCs) need to be understood to capture previously neglected artifacts and correlate behavior to detect potential threats and zero-day attacks, as traditional signature-based approaches are no longer sufficient.

While use of Big Data technologies might be fundamental to any implementation of predictive analytics based on data science, it is not a sufficient condition to attain nirvana in the Security Analytics world. After wading through Big Data material, I’ve decided to keep it out of this discussion, as it doesn’t help define the essence of SA. One could as well scale up a SIEM or Log Management solution in a Big Data framework.

Now that you have a rough idea about the scope of this blog post, let me deep dive into various SA technical approaches, introduce the concepts of Threat Intelligence, IOCs and related standardization efforts, and finally present a Security Analytics architectural framework that could tie all of these together.

How have vendors implemented Security Analytics?

With Gartner yet to acknowledge that a market exists for Security Analytics – its consultants seem content with Threat Intelligence Management Platforms (TIMP) – I chose to be guided by the list of ‘Companies Mentioned’ here by ResearchAndMarkets, and additionally explored dominant security players (Palo Alto Networks, Fortinet) missing from this list, to check if there is anything promising cooking in this space. And here is what I found: less than a handful of these vendors have ventured into predictive analytics and self-learning systems using advanced data science. Most others in the market offer enhanced SIEM and unified cyber forensic solutions – which can consume threat intelligence feeds and/or operate on L2-L7 captures – as a Security Analytics package, though that is certainly a step forward. The intent of this section is to offer a glimpse into technology developments in the Security Analytics space. I doubt that my research was exhaustive [as I mostly found product pitches online that promise the world to potential undiscerning customers], and would be glad to make the picture a little more complete, so do educate me on what is going on in your firm in this space if I’ve missed any game changer!

All SA products consume and manage data in some manner, be they log data, file data, full network packet captures, and/or behavioral data from systems or networks, by integrating with other security devices and solutions in the deployment. However, there must be intelligence in the system that helps render those data into something useful. Vendors have implemented this intelligence into SA via deep inspection of L2-L7 exchanges, anomaly sensors to spot behavior deviations, diamond model of intrusion analysis, game theory and/or advanced machine learning algorithms, potentially based on Campaigns which is a “collection of data, intelligence, incidents, events or attacks that all share some common criteria and can be used to track the attacker or attack tactic” [sic] [I’m going to wait for this concept to evolve and become commonplace, before I remark on this]. I’ve used sample vendor offerings to elaborate on each of these SA implementation approaches. I’ve come across categorization of SA offerings as data analyzers vs. behavior analyzers, but I view these as building blocks for SA and not alternative solutions.

Deep data inspection – Solera Networks’ DeepSee platform (now acquired by BlueCoat) offers an advanced cyber forensics solution that reconstructs and analyzes L2-L7 data streams (captured by integrating with industry standard and leading security vendor offerings for routers, switches, next-gen firewalls, IPSs, SIEMs et al) for application classification and metadata extraction, indexed storage and analysis to provide accurate visibility into activities, applications and personas on the network. It is built on the premise that full visibility, context and content are critical to react to security incidents in real-time or back-in-time. I scoured the net to understand how it detects a breach (beyond what a scaled-up SIEM can do with similar stream captures and batch data stores) and thus goes beyond Cyber Forensics, but didn’t find any material. CyberTap Security’s (acquired by IBM) Recon boasts of a similar solution with ability to reassemble captured network data to its original form, be they documents, web pages, pictures, mails, chat sessions, VOIP sessions or social media exchanges.

Behavioral anomaly sensors – With APT attacks consisting of multiple stages – intrusion, command-and-control communication, lateral movement, data exfiltration, cover tracks and persist [the figure below has further details on the APT lifecycle] – each action by the attacker provides an opportunity to detect behavioral deviations from the norm. Correlating these seemingly independent events can reveal evidence of the intrusion, exposing stealthy attacks that could not be identified through other methods. These detectors of behavioral deviations are referred to as “anomaly sensors,” with each sensor examining one aspect of the host’s or user’s activities within an enterprise’s network. Interset and PFP Cybersecurity are among the few vendors who’ve built threat detection systems based on behavioral analytics, as reported here.


Figure – Lifecycle of Advanced Persistent Threats (Source: Sophos)

Cognitive Security (research firm funded by US Army, Navy and Air Force – now acquired by Cisco) stands out as it relies on advanced statistical modeling and machine learning to independently identify new threats, continuously learn what it sees, and adapt over time. This is a good solution to deep dive into, to understand how far any vendor has ventured into Security Analytics. It offers a suite of solutions that offer protection through the use of Network Behavior Analysis (NBA) and multi-stage Anomaly Detection (AD) methodology by implementing Cooperative Adaptive Mechanism for Network Protection (CAMNEP) algorithm for trust modeling and reputation handling. It employs Game Theory principles to ensure that hackers cannot predict or manipulate the system’s outcome, and compares current data with historical assessments called trust models to maintain a highly sensitive and low false positive detection engine. This platform utilizes standard NetFlow/IPFIX data and is said to deploy algorithms such as MINDS, Xu et al., Volume prediction, Entropy prediction and TAPS. It does not require supplementary information such as application data or user content and so ensures user data privacy and data protection throughout the security monitoring process. To reiterate, this is a passive self-monitoring and self-adapting system that complements existing security infrastructure. Cylance, among SINET’s Top 16 emerging Cybersecurity companies of 2014 as reported here, seems to have also made some headway in this artificial intelligence based SA approach.

Zipping past A-to-Z of Threat Intelligence

Definition – Here is how Gartner defines Cyber Threat Intelligence (CTI) or Threat Intelligence (TI) – “Evidence-based knowledge, including context, mechanisms, indicators, implications and actionable advice about an existing or emerging menace or hazard to assets that can be used to inform decisions regarding the subject’s response to that menace or hazard.”

How does TI help? – Clearly, TI is not just indicators; it can be gathered by human analysts or automated systems, and from internal or external sources. Such TI could help identify misuse of any corporate assets (say, as botnets), detect data exfiltration and prevent leakage of further sensitive information, spot compromised systems that communicate with C2 (i.e. Command-and-Control aka CnC) servers hosted by malicious actors, detect targeted and persistent threats missed by other defenses and ultimately initiate remediation steps, or most optimistically – stop the attacker in their tracks.

How to gather TI? – A reliable source of TI is one’s own network – information from security tools such as firewalls, IPS et al, network monitoring and forensics tools, malware analysis through sandboxing and other detailed manual investigations of actual attacks. Additionally, it can be gleaned through server and client honeypots, spam and phishing email traps, monitoring hacker forums and social networks, Tor usage monitoring, crawling for malware and exploit code, open collaboration with research communities and within the industry for historical information and prediction based on known vulnerabilities.

TI external sources could either be (a) open-source or commercial, and (b) service or feed providers. These providers could also differ on how they glean raw security data or threat intelligence as categorized below:

  • Those who have a large installed base of security or networking tools and can collect data directly from customers, anonymize it, and deliver it as threat intelligence based on real attack data. E.g. Blue Coat, McAfee Threat Intelligence, Symantec Deepsight, Dell SecureWorks, Palo Alto Wildfire, AlienVault OTX
  • Those who rely heavily on network monitoring to understand attack data. These providers have access to monitoring tools that sit directly on the largest internet backbones or in the busiest data centers, and so they are able to see a wide range of attack data as it flows from source to destination. E.g. Verisign iDefense, Norse IPViking/Darklist, Verizon
  • Few intelligence providers focus on the adversary, track what different attack groups are doing and closely monitor their campaigns and infrastructure. This type of intelligence can be invaluable because adversary focused intelligence can be proactive, knowing that a group is about to launch an attack allows their customers to prepare before the attack is launched. E.g. of these TI service providers who focus on manual intelligence gathering by employing human security experts are iSIGHT Partners, FireEye Mandiant, CrowdStrike
  • Open source intelligence providers who typically crowd-source. The best open source TI providers typically focus on a given threat type or malware family, e.g. Abuse.ch, which tracks C2 servers for Zeus, SpyEye and Palevo malware while combining domain name blocklists. Other good open sources of TI are Blocklist.de, Emerging Threats and Spamhaus. ThreatStream OPTIC intelligence is community vetted but not open source.
  • Academic/research communities such as Information Sharing and Analysis Centers (ISACs), Research and Education Networking (REN) ISAC, Defense Industrial Base Collaborative Information Sharing Environment (DCSIE)

Other TI sources of manual/cloud feeds include – malware data from VirusTotal, Malwr.com, VirusShare.com, ThreatExpert; National Vulnerability Database; Tor which provides a list of Tor node IP addresses; and others such as OSINT, SANS, CVEs, CWEs, OSVDB, OpenDNS. Few other commercial vendors include Vorstack, CyberUnited, Team Cymru and Recorded Future.

Indicators of Compromise – The Basics

Indicators of Compromise (IOCs) denote any forensic artifact or remnant of an intrusion that can be identified on a host or network, with the term ‘artifact’ in the definition allowing for observational error. Indicators could take the form of IP addresses of C2 servers, domain names, URLs, registry settings, email addresses, HTTP user agents, file mutexes, file hashes, compile times, file sizes, names, path locations etc. Different types of indicators can be combined together in one IOC [as illustrated in the below figure, and in the sketch that follows it].


Figure – Indicators of Compromise aka IOC (Source: Mandiant.com)
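
As a hedged illustration of my own (not Mandiant's), the sketch below combines several made-up indicator types in one composite IOC with OR/AND logic and evaluates it against a set of observations. Real IOCs would be expressed in a sharing format such as OpenIOC or STIX, which I cover later in this post.

```python
# Sketch of a composite IOC: several indicator types combined with boolean logic.
# Every indicator value below is a made-up placeholder, not real threat intelligence.
IOC = {
    "any_of": [                                  # OR: any one of these is a hit
        {"type": "ip", "value": "203.0.113.42"},
        {"type": "md5", "value": "00000000000000000000000000000000"},
    ],
    "all_of": [                                  # AND: both must be present together
        {"type": "domain", "value": "bad-c2.example.com"},
        {"type": "registry_key", "value": r"HKLM\Software\EvilPersistence"},
    ],
}

def matches(ioc, observed):
    """observed: dict mapping indicator type -> set of values seen on a host/network."""
    hit = lambda ind: ind["value"] in observed.get(ind["type"], set())
    any_hit = any(hit(i) for i in ioc["any_of"])
    all_hit = all(hit(i) for i in ioc["all_of"])
    return any_hit or all_hit

observed = {"ip": {"203.0.113.42"}, "domain": set(), "registry_key": set(), "md5": set()}
print("IOC matched:", matches(IOC, observed))
```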

The below pyramid stacks up the various indicators one can use to detect an adversary’s activities and how much effort it would take for the adversary to pivot and continue with the planned attack, when indicators at each of these levels are denied.


Figure – Pyramid of Pain with IOCs (Image Source: AlienVault)

Starting at the base of the pyramid, where the adversary’s pain is the lowest if detected and denied, we have Hash Values such as SHA1 or MD5, which are often used to uniquely identify specific malware samples or malicious files involved in an intrusion. The adversary could potentially change an insignificant bit and cause a different hash to be generated, thus making our earlier detected hash IOC ineffective, unless we move to fuzzy hashes.
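
For instance, a minimal hash check against a known-bad list could look like the sketch below (the blocklist value and file path are placeholders, not real intelligence); the ease of invalidating such exact-match hashes is precisely why they sit at the bottom of the pyramid.

```python
# Sketch: hash a file and compare it against a (placeholder) known-bad hash list.
import hashlib

KNOWN_BAD_SHA1 = {
    "0000000000000000000000000000000000000000",  # placeholder value, not real intel
}

def sha1_of(path, chunk_size=8192):
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    suspect = "suspect_file.bin"   # hypothetical path
    try:
        if sha1_of(suspect) in KNOWN_BAD_SHA1:
            print(f"{suspect} matches a known-bad hash")
        else:
            print(f"{suspect} is not in the blocklist (which proves very little)")
    except FileNotFoundError:
        print(f"{suspect} not found; point this at a real file to try it")
```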

Next up in the pyramid are IP addresses. These again might not take long for adversaries to recover from, as they can change the IP address with little effort. If they were to use an anonymous proxy service like Tor, then this indicator has no effect on the adversary. In comparison, Domain Names are slightly harder to change than IP addresses, as they must be registered and visible on the Internet, but it is still doable, though it might take an adversary a day or two.

Looking at it from an IOC usage perspective in security deployments, the TTL of an IP address can be very low. Compromised hosts in legitimate networks could get patched, illicitly acquired hosting space might be turned off, malicious hosts are quickly identified and blocked, or the traffic might be black-holed by the ISP. An IP address may have a TTL of 2 weeks, while domains and file hashes would have significantly longer TTLs.

Typical examples of Network Artifacts are URI patterns, C2 information embedded in network protocols, and distinctive HTTP User-Agent or SMTP Mailer values. Host Artifacts could be registry keys or values known to be created by specific pieces of malware, files or directories dropped in certain places or using certain names, names or descriptions of malicious services, or almost anything else that is distinctive. Detecting an attack using network/host artifacts can have some negative impact on the adversary, as it requires them to expend effort to identify which artifact has revealed their approach, fix it and relaunch.

Further up in the pyramid, we have Tools which would include utilities designed – say to create malicious documents for spearphishing, backdoors used to establish C2 communication, or password crackers and other host-based utilities they might want to use post their successful intrusion. Some examples of tool indicators include AV or YARA signatures, network aware tools with a distinctive communication protocol and fuzzy hashes. If the tool used by adversaries has been detected and the hole has been plugged, they have to find or create a new tool for the same purpose which halts their stride.

When we detect and respond to Tactics, Techniques and Procedures (TTPs), we operate at the level of the adversary’s behavior and tendencies. By denying them any TTP, we force them to do the most time-consuming thing possible – learn new behaviors. To quote a couple of examples: spearphishing with a trojaned PDF file or with a link to a malicious .SCR file disguised as a ZIP, and dumping cached authentication credentials and reusing them in Pass-the-Hash attacks, are TTPs.

There are a variety of ways of representing indicators – e.g. YARA signatures are usually used for identifying malicious executables, while Snort rules are used for identifying suspicious patterns in network traffic. These formats typically specify not only ways to describe basic notions but also logical combinations using boolean operators. YARA is an open source tool used to create free-form signatures that can tie indicators to actors, allowing security analysts to go beyond simple indicators such as IP addresses, domains and file hashes. YARA can also help identify commands generated by C2 infrastructure.
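
As a hedged illustration of that flexibility, the snippet below compiles and runs a made-up rule via the yara-python bindings (assumed installed); every string, byte pattern and name in the rule is a placeholder rather than a real signature:

```python
import yara  # yara-python bindings

# Purely illustrative rule: the name, strings, byte pattern and condition are placeholders
RULE = r'''
rule Illustrative_C2_Beacon
{
    meta:
        description = "Example only: combines string and byte-pattern indicators with boolean logic"
    strings:
        $ua    = "CustomAgent/1.0" ascii           // distinctive HTTP User-Agent (network artifact)
        $mutex = "Global\\svc_host_33" wide ascii  // mutex name dropped by the sample (host artifact)
        $code  = { 55 8B EC 83 EC ?? 6A 00 }       // wildcarded byte sequence from the loader
    condition:
        ($ua and $mutex) or $code
}
'''

rules = yara.compile(source=RULE)
matches = rules.match(filepath="sample.bin")  # match() also accepts data= for in-memory buffers
for m in matches:
    print("Matched rule:", m.rule)
```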

Sharing IOCs across organizational boundaries will provide access to actionable security information that is often peer group or industry relevant, support an intelligence driven security model in organizations, and force threat actors to change infrastructure more frequently and potentially slow them down.

Exchanging Threat Intelligence – Standards & Tools

Effective use of CTI is crucial to defend against malicious actors and thus important to ensure an organization’s security. To gain real value from this intelligence, it has to be delivered and used fairly quickly if not in real-time, as it has a finite shelf life with threat actors migrating to new attack resources and methods on an ongoing basis. In the last couple of years, there has been increased effort to enable CTI management and sharing within trusted communities, through standards for encoding and transporting CTI.

Generally, indicator-based intelligence includes IP addresses, domains, URLs and file hashes. These are delivered as individual blacklists or aggregated reports via email or online portals, which are then manually examined by analysts and fed into the recipient organization’s security infrastructure. In certain cases, scripts are written to bring data from VirusTotal and other OSINT platforms directly into heuristic network monitors such as Bro.

Let me touch upon various CTI sharing standards – OpenIOC, the Mitre package (CybOX, STIX, TAXII), the MILE package (IODEF, IODEF-SCI, RID) and VERIS – that aim to do away with the above TI sharing inefficiencies.

OpenIOC is an open source, extensible and machine-digestible format to store IOC definitions as an XML schema and share threat information within or across organizations. The standard provides the richest set of technical terms (over 500) for defining indicators and allows for nested logical structures, but is focused on tactical CTI. It was introduced and primarily used in Mandiant products, but can be extended by other organizations by creating and hosting an Indicator Term Document. There has been limited commercial adoption outside of Mandiant, with McAfee among the minority of vendors whose products can consume OpenIOC files. Mandiant IOC Editor, Mandiant IOC Finder and Redline are tools that can be used to work with OpenIOC.

Mitre has developed three standards that are designed to work together and enable CTI sharing – Cyber Observable eXpression (CybOX), Structured Threat Information Expression (STIX) and Trusted Automated eXchange of Indicator Information (TAXII). With STIX being accepted by industry leaders, STIX and TAXII are starting to see wide adoption. Common Attack Pattern Enumeration and Classification (CAPEC) and Malware Attribute Enumeration and Characterization (MAEC) are focused on attack patterns and malware analysis respectively.

MITRE threat formats

Figure – Threat Intelligence Formats in Mitre Family (Source: Bit9.com)

  • CybOX provides the ability to automate sharing of security intelligence by defining 70 objects (e.g. file, mutex, HTTP session, network flow) that can be used to define measurable events or stateful properties (e.g. file hashes, IPs, HTTP GET, registry keys). Objects defined in CybOX can be used in higher level schemas like STIX. While OpenIOC can effectively represent only what CybOX calls objects, CybOX also understands the notion of events, which enables it to specify event order or elapsed time and bring in the notion of behaviors.
  • STIX was designed to additionally provide context for the threat being defined, beyond observable patterns, and thus covers the full range of cyber threat information that can be shared. It uses XML to define threat-related constructs such as campaign, exploit target, incident, indicator, threat actor and TTP. In addition, extensions have been defined for other standards such as TLP, OpenIOC, Snort and YARA. The structured nature of the STIX architecture allows it to define relationships between constructs; e.g. the TTP used can be related to a specific threat actor [see the conceptual sketch after this list].
  • TAXII provides a transport mechanism to exchange CTI in a secure and automated manner, through its support for confidentiality, integrity and attribution. It uses XML and HTTP for message content and transport, and allows for custom formats and protocols. It supports multiple sharing models including variations of hub-and-spoke or peer-to-peer, and push or pull methods for CTI transfer.
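
To make the relationship idea concrete, here is a purely conceptual Python sketch; STIX 1.x content is actually expressed in XML and produced with dedicated tooling, so this illustrates how constructs reference one another rather than the real schema:

```python
# Conceptual sketch only: STIX 1.x content is XML and is normally produced with dedicated
# tooling; these dicts merely illustrate how constructs reference one another.
indicator = {
    "id": "example:indicator-1",
    "observable": {"type": "DomainName", "value": "bad.example.com"},  # CybOX-style observable
    "indicated_ttp": "example:ttp-1",
}
ttp = {
    "id": "example:ttp-1",
    "title": "Spearphishing with trojaned PDF attachment",
}
threat_actor = {
    "id": "example:threat-actor-1",
    "title": "Hypothetical actor group",
    "observed_ttps": ["example:ttp-1"],  # the relationship: this actor uses that TTP
}

package = {"indicators": [indicator], "ttps": [ttp], "threat_actors": [threat_actor]}
print(package)
```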

Managed Incident Lightweight Exchange (MILE), an IETF group, works on the data format to define indicators and incidents, and on standards for exchanging data. This group has defined a package of standards for CTI which includes Incident Object Description and Exchange Format (IODEF), IODEF for Structured Cyber Security Information (IODEF-SCI), and Real-time Inter-network Defense (RID) which is used for communicating CTI over HTTP/TLS. IODEF is an XML based standard used to share incident information by Computer Security Incident Response Teams (CSIRTs) and has seen some commercial adoption e.g. from HP ArcSight. IODEF-SCI is an extension to the IODEF standard that adds support for attack pattern, platform information, vulnerability, weakness, countermeasure instruction, computer event log, and severity.

The Vocabulary for Event Recording and Incident Sharing (VERIS) framework from Verizon has been designed for sharing strategic information and an aggregate view of incidents, but is not considered a good fit for sharing tactical data.

Many vendors and open source communities have launched platforms to share TI, e.g. AlienVault’s Open Threat Exchange (OTX), the Collective Intelligence Framework (CIF) and IID’s ActiveTrust. OTX is a publicly available sharing service for TI gleaned from OSSIM and AlienVault deployments. CIF is a client/server system for sharing TI, which is internally stored in IODEF format, and provides feeds or allows searches via CLI and RESTful APIs; it is also capable of exporting CTI for specific security tools. IID’s ActiveTrust platform is leveraged by government agencies and enterprises to confidently exchange TI and coordinate responses between organizations.

Unifying Security Intelligence & Analytics – The OpenSOC framework

So, how do organizations use the Threat Intelligence that I’ve talked about at length? With Threat Intelligence coming in from a variety of sources and in multiple formats (even if each of them is standardized), a new solution being floated in the market is the Threat Intelligence Management Platform (TIMP) or Threat Management Platform (TMP). It is tasked with parsing incoming intelligence and translating it into formats understood by the various security solutions in a deployment – e.g. malware IPs into NIDS signatures, email subjects into DLP rules, file hashes into ETDR/EPP/AV rules, Snort rules for IPS, block lists and watch lists for SIEMs/AS, signatures for AV/AM etc. – to make it suitable for dissemination. TI can also be uploaded into a SIEM for monitoring, correlation and alerting, or to augment analyses with additional context data.
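
As a rough sketch of that translation role (the rule template below uses simplified Snort-like syntax and the IP feed is hypothetical, not the output of any real TIMP):

```python
# Illustrative translation of a malicious-IP feed into Snort-style rules and a SIEM watchlist.
malicious_ips = ["198.51.100.23", "203.0.113.99"]  # hypothetical feed entries

def to_snort_rules(ips, base_sid=1000001):
    """Emit one outbound-connection alert rule per malicious IP (syntax kept deliberately simple)."""
    return [
        f'alert ip $HOME_NET any -> {ip} any '
        f'(msg:"TI hit - traffic to known C2 {ip}"; sid:{base_sid + i}; rev:1;)'
        for i, ip in enumerate(ips)
    ]

def to_watchlist_csv(ips):
    """Emit a simple CSV watchlist that a SIEM could import for correlation."""
    return "indicator,type\n" + "\n".join(f"{ip},ip" for ip in ips)

print("\n".join(to_snort_rules(malicious_ips)))
print(to_watchlist_csv(malicious_ips))
```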

Now that I’ve tied up one dangling thread, what about Security Analytics? Having zoomed in early on in this blog post on what any SA solution operates on and how its output could better the security controls in a deployment, I’ll provide a 30,000-40,000 ft view (at cruising altitude?) this time around, by introducing OpenSOC, a unified data-driven security platform that combines data ingestion, storage and analytics.

Figure – OpenSOC framework

The OpenSOC (Open Security Operations Center) framework provides the building blocks for Security Analytics to:

  1. Capture, store, normalize and link various internal security data in real-time, for forensics and remediation
  2. Enrich, relate, validate and contextualize the earlier processed data with threat intelligence and geolocation, to create situational awareness and discover new threats in a timely manner [see the sketch after this list]
  3. Provide contextual real-time alerts, advanced search capabilities and full packet extraction tools, for a security engine that implements Predictive Modeling and Interactive Analytics
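
The enrichment step (item 2 above) can be pictured with a small sketch; this is not OpenSOC code, which builds on Kafka/Storm/Hadoop components, and the lookup tables here are hypothetical stand-ins for threat-intel and GeoIP stores:

```python
# Hypothetical lookup tables; in a real pipeline these would be threat-intel and GeoIP stores.
THREAT_INTEL = {"203.0.113.99": {"source": "osint-feed", "category": "C2"}}
GEO = {"203.0.113.99": {"country": "XX", "city": "Example City"}}

def enrich(event):
    """Attach threat-intel and geolocation context to a raw telemetry event."""
    enriched = dict(event)
    dst = event.get("dst_ip")
    if dst in THREAT_INTEL:
        enriched["threat_intel"] = THREAT_INTEL[dst]
        enriched["alert"] = True  # a contextual real-time alert can be raised downstream
    enriched["geo"] = GEO.get(dst, {})
    return enriched

raw = {"src_ip": "10.1.1.5", "dst_ip": "203.0.113.99", "bytes": 4231}
print(enrich(raw))
```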

The key to increasing the ability to detect, respond and contain targeted attacks is a workflow and set of tools that allows threat information to be communicated across the enterprise at machine speed. With OpenSOC being an open source solution, any organization can customize the sources and amount of security telemetry information to be ingested from within or outside the enterprise, and also add incident detection tools to suit its tailored Incident Management and Response workflow.

I’ve trodden on uncertain ground in navigating the product-market for Security Analytics, given that it is still nascent and the product category isn’t well delineated. I would welcome any views on this post, be they validating or contradicting mine.

How is the SDN landscape shaping up? (Part-3) – A security perspective

In a fast-changing technology landscape, security is a moving target as sophisticated attackers unearth and take advantage of new weaknesses. Cybercrime has moved beyond mere vandalism and emerged as a professional industry focused on exploit kit innovations and planned attacks, given its lucrativeness and with some countries around the world involved in extensive cyber-espionage. The global hackers’ economy is pegged conservatively at around $450B, while Gartner estimates the worldwide security market for 2014 at just about $71B.

In the December 2013 breach of retail giant Target, data from as many as 40M credit cards and 70M user accounts was hijacked, while the Home Depot data breach between April and September 2014 affected 56M debit and credit cards, data from which is said to have appeared on black-market sites within days. Per Check Point’s 2014 security report, 88 percent of analyzed organizations experienced at least one potential data breach event in 2013, up from 54 percent in 2012. Verizon’s data breach investigations report identified nine major patterns that covered over 90% of security incidents in the past decade, with the dominant pattern varying across industry verticals. [Refer bottom-right of the following chart for a listing of these categories.] In recent years, the dangers of targeted attacks and advanced persistent threats (APTs) have garnered much of the attention in the information security world.

Figure – Data Breaches in organizations (Source: Verizon)

How can SDN improve security?

With the increase in network traffic, virtual machines, cloud computing and malware threats, IT manpower is a major security bottleneck, as staff simply can’t keep up with the growing demand of sorting through incidents/alerts and fine-tuning security controls based on the latest threats. This situation can be expected to exacerbate as IoT applications gain momentum. The only way to bridge this growing response-capability gap is through intelligent incident detection and automated response.

While automated security is a key driver, the excitement around SDN-enabled security is more about the opportunity for intelligent response on a granular basis – be it per flow, per application or per user – to provide Security-as-a-Service, while eliminating manual configuration errors and keeping down SecOps & NetOps staffing costs. No longer would the default system response to any severe security threat have to be ‘fully block’.

Enterprise SecOps teams are interested in using SDN to enable multiple usecases – to selectively block malicious traffic from endpoints while still allowing normal traffic flows, to centralize security policy and configuration management, and for network security policy auditing and conflict detection/resolution.

With deperimeterization, corporate networks no longer have a single point of entry or exit, driving demand for network telemetry and flow-based security solutions. SDN could be used to act on anomalies detected in flow capture data, by dynamically establishing rules to divert specific flows to either centralized or distributed enforcement points. Additionally, SDN could be used for traffic engineering, directing discrete network application flows to a specific combination of security services such as FWs, IDS/IPS and WAFs.

SDN’s capability to programmatically and dynamically insert customized security services in the network path, via service chaining of physical and virtual L2-L4 routers/switches/appliances [as illustrated in the below figure], could help minimize the performance impact of ‘bump in the wire’ security devices, while enhancing security by acting on gleaned threat intelligence.

Figure – SDN enabled network and application security (Source: devcentral.f5.com)

While the SDN controller limits its visibility and programmatic focus to the underlying physical/virtual network elements and thus directly implements network security, application-layer security is provided in this architecture by working in tandem with orchestration components to control other L4-L7 data path elements.

In a nutshell, SDN can help improve security by aligning the right security service with the right flows.

Common Security Usecases of SDN

Having gone over SDN security benefits in the previous section, let’s now deep dive into a few well-defined SDN security usecases – DDoS Mitigation, Automated Malware Quarantine and Elephant Flow mitigation.

DDoS Mitigation

Distributed Denial of Service (DDoS) attacks, typically launched from compromised systems (bots), bombard enterprise applications with fake service requests, with the intent of denying or degrading the performance of legitimate service requests. In addition to hogging the application servers, such attacks consume SP network capacity and saturate inline security devices. Addressing DDoS attacks requires detecting the fake requests, diverting traffic to a specialized scrubbing device that removes fake packets from the traffic stream, and sending legitimate packets back to the enterprise application/service.

While DDoS defense can be implemented through Remotely Triggered Black Hole (RTBH) filtering or Policy-Based Routing (PBR), the former implements filters that block all traffic to the target application, making it unavailable for legitimate users too. On the other hand, while a static PBR-based solution provides granular flow control, it requires SPs to add manual filter configuration on their backbone/edge routers and thus prolongs recovery from such attacks.

In comparison, BGP FlowSpec, which aligns with the SDN paradigm, follows a more granular approach and automates the distribution of filter lists that match a particular flow, via BGP NLRI, to the backbone/edge routers. Policy updates thus happen dynamically, and only the specific flow’s traffic (based on L3/L4 information) is stopped instead of dropping all traffic to a server, as illustrated in the below figure. SDN can thus boost security by dynamically reprogramming and restructuring a network that is suffering a distributed denial-of-service attack.
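
As a hedged sketch of the automation idea, the snippet below pushes an L3/L4 filter for the attack flow to a hypothetical controller endpoint; in a real deployment the filter would be advertised as a BGP FlowSpec NLRI (RFC 5575) by the operator’s routing infrastructure rather than via REST, and the payload shape here is illustrative only:

```python
import requests

# Hypothetical controller API; a production deployment would advertise the filter as a
# BGP FlowSpec NLRI from a route controller instead of calling a REST endpoint.
CONTROLLER = "https://flowspec-controller.example.net/api/filters"

ddos_filter = {
    "match": {                       # L3/L4 match: only the attack flow is affected
        "destination": "192.0.2.10/32",
        "protocol": "udp",
        "destination_port": 53,
    },
    "action": {"type": "rate-limit", "bytes_per_second": 0},  # rate 0 effectively discards the flow
}

resp = requests.post(CONTROLLER, json=ddos_filter, timeout=5)
print(resp.status_code)
```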

Figure – DDoS Mitigation using BGP FlowSpec (Source: Alcatel-Lucent)

Automated Malware Quarantine (AMQ)

SDN can also offer security capabilities such as automatically quarantining an endpoint or network that has been infected with malware.

In a non-SDN architecture, AMQ is typically deployed as a proprietary standalone solution where each device performs its specified function autonomously, with limited awareness of other devices in the network. This closed approach is suitable only for static traffic flows, and is inflexible in data center environments where server workloads are virtualized, traffic flows are highly dynamic and multiple simultaneous flows must be maintained.

In an SDN controller driven network, if detailed forensics by network security devices generate a high enough score to initiate a quarantine directive, the SDN controller translates the directive into a set of OpenFlow rules that is pushed down to OpenFlow enabled switches, cutting the infected host off from the production network and displaying (via the Web Proxy Notifier in the below figure) a web page with corrective actions to be performed. Once the corrective actions are completed, the rules are changed to allow the end host back into the network. Such automated reconfiguration through SDN reduces the response time to security threats, while allowing user mobility at the network edge. Given that this AMQ implementation does not require any software or hardware support beyond OpenFlow enabled devices, it is a vendor agnostic solution and ripe for deployment.
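
As an illustration of pushing such a quarantine rule, the sketch below assumes a Ryu controller with its ofctl_rest REST application enabled; the datapath ID, host address and priority are placeholders, and this is not the ONF reference implementation:

```python
import requests

# Ryu's ofctl_rest flow-programming endpoint (assumed enabled on the controller host)
RYU_API = "http://127.0.0.1:8080/stats/flowentry/add"

quarantine_rule = {
    "dpid": 1,                                               # datapath (switch) to program
    "priority": 100,
    "match": {"eth_type": 2048, "ipv4_src": "10.1.1.50"},    # the infected host
    "actions": []                                            # empty action list == drop
}

resp = requests.post(RYU_API, json=quarantine_rule, timeout=5)
print("Quarantine rule pushed" if resp.ok else f"Failed: {resp.status_code}")

# A fuller AMQ workflow would also redirect the host's web traffic to a notification proxy,
# and delete this rule once remediation is confirmed so the host rejoins the network.
```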

Figure – Advanced Malware Quarantine using SDN (Source: ONF)

Elephant flow detection/mitigation

Short-lived flows, referred to as mice, tend to be bursty and are often latency sensitive. In contrast, long-lived flows, termed elephants, normally transfer large blocks of data and are typically insensitive to packet latency. Without intelligent traffic engineering, elephant flows may fill up network pipes, causing latency and/or service disruption for mice flows.

SDN applications could select a marking action and instruct the SDN controller to push it to OpenFlow routers/switches, assigning a selected queue to traffic associated with elephant flows. Other applications include offloading elephant flows from the L2/L3 network fabric to an optical circuit-switched network, i.e. a pure photonic layer-1 network, for better performance and scalability at lower cost. In an enterprise environment, elephant flows observed during network backup of a filesystem could be prevented from taxing the firewall and slowing down its performance, without compromising on security, by dynamically implementing a firewall service bypass via SDN based on whitelisted flow parameters of “good” permitted elephant flows. The permitted elephant flows are then dynamically redirected so that they are sent directly in/out of the campus and the firewall is no longer in the forwarding path.
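
A minimal sketch of how an SDN application might flag elephant flows from collected flow statistics before choosing a marking, offload or firewall-bypass action; the thresholds and record format are illustrative, not from any specific controller:

```python
# Illustrative flow records, e.g. aggregated from NetFlow/sFlow or OpenFlow flow-stats replies
flows = [
    {"match": ("10.1.1.5", "10.2.2.9", 443), "bytes": 8500, "duration_sec": 2},
    {"match": ("10.1.1.7", "10.9.9.1", 2049), "bytes": 12000000000, "duration_sec": 900},  # backup job
]

ELEPHANT_BYTES = 1000000000   # illustrative threshold: roughly 1 GB transferred
ELEPHANT_DURATION = 60        # and long-lived (over a minute)

def classify(flow):
    """Label a flow as an elephant or a mouse based on volume and lifetime."""
    if flow["bytes"] > ELEPHANT_BYTES and flow["duration_sec"] > ELEPHANT_DURATION:
        return "elephant"
    return "mouse"

for f in flows:
    print(f["match"], "->", classify(f))
    # For flows labeled 'elephant', an SDN app could instruct the controller to re-queue them,
    # offload them to the optical layer, or (if whitelisted) bypass the firewall.
```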

Security-centric SDN offerings

While OpenFlow is a key technology enabler of the SDN security usecases we saw in the previous section, a few firms offer enhanced SDN security solutions that go beyond OpenFlow.

Illumio – a promising startup with a veteran executive team from Cisco, McAfee, Nicira, Riverbed and VMware – is built on the idea that each workload must have its own defenses. Illumio’s Adaptive Security Platform (ASP) provides visibility and enforcement services that dynamically keep pace with the motion, change and automation of the underlying IT infrastructure and applications, by attaching fine-grained security at the level of individual workloads while staying continuously aware of their context. [Refer below figure to understand workload context.] It allows enterprises to secure applications across private, public or hybrid cloud environments on any physical server or VM. Illumio has 25 customers including Morgan Stanley, Plantronics, Creative Artists Agency, UBS, Yahoo and NTT I3.

Figure – Workload context (Source: Illumio)

Illumio ASP provides two modes of operation:

  1. Illumination mode, which security administrators can use to visualize workloads and traffic flows, perform policy analyses, assess security gaps and potential impacts to application flows, and even discover unused workloads
  2. Enforcement mode, which lets administrators write security policies using natural language to describe desired communications between workloads; once the policies are enforced, workloads are locked down to interact only with the specifically allowed paths

As illustrated in the below figure, Illumio ASP is architected as a distributed and asynchronous system, with Virtual Enforcement Nodes (VENs) attached to individual workloads (running on any VM, physical server, or private/public/hybrid cloud) and a centralized policy engine (PCE). The VENs continuously listen for the context of their workload and relay the information to the PCE, which computes enforcement rules by combining the workload’s context with the configured security policies. The enforcement rules are then sent back to the VEN, which modifies the appropriate iptables or Windows Filtering Platform parameters on the workload.
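
Conceptually, the last step can be pictured as rendering the computed ‘allowed paths’ into host firewall rules; the sketch below is not Illumio’s actual implementation, just an illustration for a Linux workload using iptables:

```python
# Conceptual sketch: render an allow-list of permitted paths into local firewall (iptables)
# commands for one workload, defaulting to drop everything else. A real enforcement agent
# would manage rule state, ordering and rollback rather than emit raw commands.
allowed_paths = [
    {"peer": "10.0.2.0/24", "port": 5432, "proto": "tcp"},   # e.g. app tier -> database tier
    {"peer": "10.0.3.10/32", "port": 443, "proto": "tcp"},
]

def render_iptables(paths):
    rules = [
        f"iptables -A INPUT -p {p['proto']} -s {p['peer']} --dport {p['port']} -j ACCEPT"
        for p in paths
    ]
    rules.append("iptables -A INPUT -j DROP")  # anything not explicitly allowed is dropped
    return rules

for rule in render_iptables(allowed_paths):
    print(rule)
```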

Figure – Security-centric SDN – Illumio ASP architecture (Source: Illumio)

NetOptics, a provider of application and network visibility and monitoring solutions, offers security-centric SDN by combining an SDN controller with Network Packet Brokers (NPBs) in an architecture that allows for intelligent orchestration of the customer’s existing security appliances and solutions. The NPBs provide dynamic attack monitoring and the ability to chain solutions and distribute traffic, while the SDN controller is used to assess the network and adapt its behavior based on threats detected through security enforcement elements – either diverting suspicious traffic, changing security devices’ responses, or blocking packets altogether, all with minimal-to-no human intervention.

Radware’s Defense4All, the first open SDN security application to be integrated into OpenDaylight, offers carriers and cloud providers DoS and DDoS detection and mitigation as a native network service. The application uses the programmability of SDN to collect statistics, analyze information and control the infrastructure layer to proactively defend against network flood attacks. This allows operators to provide DoS/DDoS protection service per virtual network segment or per customer.

In today’s dynamic cloud data centers, assets and their network configurations are subject to change, yet compliance lacks automation and continues to rely on manual processes. Catbird, the leader in security policy automation and enforcement for private clouds and virtual infrastructure, offers Catbird vSecurity to bring the automation and agility of the cloud to security automation and enforcement of compliance standards such as PCI, HIPAA and FISMA. By integrating with Cisco ACI, Catbird provides IT teams a highly standardized, automated approach to provisioning security as part of the network policy, through the asset lifecycle from inception to teardown.

Most vendors including Juniper (Altor Networks acquisition rebranded as Firefly Perimeter) and Cisco offer SDN controllers that seamlessly integrate with virtual and physical routers/switches/security devices to enable dynamic provisioning and service chaining of security services. Among the vendors that have already announced OpenDaylight Helium-based products is Brocade. Helium release has multiple security capabilities including Secure Network Bootstrapping Infrastructure (SNBI) and AAA. Alongside, we have OpenFlowSec, a community focused on developing OF security applications, kernel extensions and application frameworks, to assist OpenFlow practitioners in enhancing security capabilities of OpenFlow.

Is SDN secure enough, to offer security?

We’ve seen that good potential exists to enhance security through SDN, and there are ongoing industry efforts to help realize it. But is SDN inherently secure enough to augment network and application security? Or is ‘SDN security’ an oxymoron, as skeptics put it? Let me explore this aspect and try to calm the naysayers of SDN from a security perspective, before I wind up this post.

Here are the top security concerns of the SDN architecture –

  1. Single point of failure, given the central function of the SDN controller
  2. Wide span of impact, when the network is opened up to applications
  3. Need for secure communication between controller and end nodes/applications, to stem MITM attacks

Single Point of Failure

Because the control plane, and thereby the SDN controller, plays such a central function in the SDN architecture, security strategies must focus on protecting the control plane to thwart any targeted attacks on the controller – be they attempts to saturate the control plane or to leverage policy configuration errors for infiltration and lateral movement – as access to the controller could potentially give an attacker complete control of the network.

It is vital to secure the SDN controller by carefully designing access control policies, managing authorization, and tracking and auditing usage of the controller. Where the controller resides on the network is also a big security concern. This concern of having a ‘king’ that needs to be protected is being addressed in variants of the SDN architecture through: High Availability in the controller; SSL communication between the controller and network elements; an extension to the OpenFlow data plane called connection migration, which dramatically reduces the amount of data-to-control-plane interactions that arise during such attacks; a SIEM to log everything that comes out of the system; and analytics to correlate the SIEM logs and alert the manager of any changes.

Wide span of impact

In contrast to the pre-SDN one-by-one configuration process, where an error might affect only one part of the network, SDN makes it easy to have one policy applied uniformly and in an automated way. However, opening up the network to its own applications requires their own security policy framework, governance and management to be put in place.

Business applications are vulnerable to potential threats because of the powerful SDN programming model. Multiple SDN network services may interfere with one another, compromising the forwarding behavior of the network, and such conflicts must be avoided. Security policies may be compromised at the controller, at one or more network devices, and/or elsewhere. As a result, security policies must be validated, along with the network’s configuration, behavior and performance.

While the upside is that SDN will demand clear policies, companies will need to spend time thinking about and designing those policies. These can be expected to reside as pre-provisioned security policy logic in a policy management system.

Communication between controller and end nodes/applications

To remove security barriers to SDN adoption, the OpenDaylight community launched a project to analyze the current security capabilities of its SDN implementation and provide recommendations for security enhancements. Below is a quick snapshot of the transport layer security of existing northbound (NB) and southbound (SB) plugins. For example, OpenFlow specifies the use of TLS or UDP/DTLS, which support authentication using certificates and encryption to secure the connection.

Figure – Transport Layer Security capabilities of SB & NB protocols (Source: ODL)

OpenDaylight recommendations for SDN controller security that are implemented in Helium and future releases include:

  • AAA service for applications/users/devices
  • Framework to securely and automatically discover, connect and configure the devices
  • Secure mechanism to cluster the controller for scalability and availability
  • Robust incident response support for controllers/devices/users, e.g. a southbound syslog plugin to incorporate captured logs from devices into incident analysis
  • Secure communication channel to connect to plugins, common trusted crypto key storage for all plugins, pluggable or built-in Certificate Authority

How is the scale tilted?

Securing networks is becoming more challenging for businesses, especially with BYOD and increased cloud adoption, even before other mass phenomena such as the Internet-of-Everything kick in. Organizations can certainly protect themselves better through the automated and dynamic security solutions made possible by SDN, as it provides a centralized intelligence and control model that is well suited to bring much-needed flexibility to network security deployments.

With SDN, we can add agility to network intrusion responses and go beyond the network to protect what is actually of interest, namely applications. The focus on network security was more of an interim arrangement, as there existed a visibility deficit at the higher layers. By focusing on the network all these years, however, we lost the context awareness that helps deduce whether an application or user is doing what it is allowed to do. Security-centric SDN allows an organization to deploy a quick, decisive and deep enterprise-wide response to threats on all fronts. An integrated solution comprising both network and application-layer elements will ultimately provide the comprehensive ‘top-to-bottom of the stack’ security desperately needed to defend against attackers in a dynamic threat landscape.

The key to realizing self-defending networks, however, is lower false positives combined with actionable threat intelligence, given the lack of a human element to weed out false alarms or manually correlate event logs before deciding on a course of action in an SDN-driven security architecture. The ecosystem of network security vendors, threat intelligence providers and security professionals should strive for highly accurate intelligence, so that automated security remediation decisions are made with a high degree of confidence. SDN technologies will also need to continually evolve and enhance their inherent security capabilities, to avoid being a dampener to adoption.

Adoption of SDN will force NetOps and SecOps to work together more closely, even if they aren’t merged into a single organization as a few propose, or at least foresee happening. This could be a bigger change than the driving technology per se, and one that could meet with opposition, given the organizational dynamics involved. Let’s wait and watch how these play out and whether enterprises are able to tap into security-centric SDN benefits.

Meanwhile, how do you think the balance is tilted? Will security drive or hinder SDN adoption? Also, a clarion call to security professionals out there to critique my post. Look forward to learning from your feedback too, while I try to get a better handle on security.

Internet of Things – Unraveling technology demands & developments

Every generation is said to tune into current vibration levels and raise them higher for evolution. The evolution of both technology and human life is marked by greater finesse, vividness and presence, all of it driven by the evolution of thought. The concept of the ‘Internet of Things’ (IoT) is an inspiring vision to bring together innumerable technology advancements in computing & communications, and further evolve them through innovation, to improve the quality of human life by interconnecting the physical and cyber worlds. High-profile IoT applications include Industrial Control, Home Automation, Smart Retail, Connected Health, Hi-tech Cities, Intelligent Transportation and Logistics.

Figure – IoT Applications (Source: IoT-A)

Per Business Insider Intelligence [refer figure below], the Internet of Things will connect devices at a never-seen-before scale and at a faster pace than the industry has witnessed so far with PCs/smartphones/tablets, or will in the future with wearables. And so the challenge in bringing the Internet of Things alive – which is estimated to connect over 50 billion devices by 2020, in a world populated today with 7.2 billion people and growing at a little over 1% a year – is to connect devices cost-effectively and efficiently at whopping scale, and in an open world, by addressing security needs and alleviating privacy concerns.

Figure – Growth trend & forecast for connected devices (Source: Business Insider)

How IEEE defines Internet-of-Things

Having set out to understand technology demands from IoT, let’s start with a technical definition of the key term. Here is how the IEEE IoT technical community defines it. “The Internet of Things (IoT) is a self-configuring and adaptive system consisting of networks of sensors and smart objects whose purpose is to interconnect “all” things, including everyday and industrial objects, in such a way as to make them intelligent, programmable and more capable of interacting with humans.”

While SDOs (Standards Development Organizations) use the phrase Internet-of-Things/IoT, vendors have coined different terms – ranging from Internet-of-Everything (Cisco – refer my previous article for the IoE definition) and Industrial Internet (General Electric) to Smart & Connected Communities (Cisco’s Smart Cities solution) and Smart Planet (IBM) – with either wider or narrower scope, given their industry play and applicable market segments.

IoT – Key Architectural Layers

Here is a simplified view of technology components that are needed to actualize IoT. To reiterate, IoT is a system formed of “enabling” technologies, and not a specific set of technologies per se.

Figure – Functional View of IoT Technologies (Source: Freescale.com)

Given that multi-service edge connectivity nodes have embedded processing capability, we could aggregate two or more of the building blocks in the above functional view. Thus, the key layers in the IoT architecture, as exemplified in a more granular manner in ITU-T’s IoT reference model below, are:

  1. Sensor/machine infrastructure
  2. Communication backbone
  3. M2M service layer
  4. Application platform

In addition, we have critical overlay functions – Security and Management – that span multiple layers.

Figure – IoT Reference Model (Source: ITU-T)

IoT Technology demands

Let’s now delve into the technology demands of IoT from each of its architectural layers.

The Sensor/Machine Infrastructure layer is formed of sensors, actuators and smart objects that would help onboard the physical world into Internet-of-Things.

Sensor/Machine Infrastructure Layer – Characteristics, Challenges & Technology demands
  • Lightweight, inexpensive, typically single-function, resource-constrained, miniaturized devices with little to no physical security and minimal memory capability
  • Rudimentary network connectivity, and therefore the need for low-power communication protocols
  • Limited compute that cannot support traditional security algorithms
  • Installed in remote/inaccessible locations, or embedded in physical structures, and so wireless devices that operate autonomously in the field, with secure remote management
  • Geo-location based discovery support
  • Ultra-low power circuits and communication
  • Long-lifetime devices that can potentially run on a single battery, to avoid incredibly expensive battery replacement given the enormity of the deployment scale
  • Super regenerative and ambient energy harvesting capabilities
  • Adaptive to what is happening in the real world, based on events that are either detected directly or by real-time analysis of sensor data
  • Self-organizing and autonomic working, given the scale and lack of accessibility to mounted locations

The Communication Backbone comprises M2M gateways, multi-service edge and backbone IP/MPLS core nodes, which form the network infrastructure and connect things globally. M2M gateways play a vital role in this architecture as they, along with edge nodes with fog computing capability (i.e. those with compute, storage and network resources between sensors and clouds), aggregate information from innumerable directly-connected endpoint devices with varying compute/memory/network-connectivity capabilities.

Communication Backbone – Technology demands of boundary nodes
  • Support large number of nodes in highly heterogeneous operating environments through multi-modal technologies (wired/wireless) and variety of protocols such as Zigbee, Bluetooth low energy, WiFi, 3G, 4G
  • Allow for endpoint mobility and geo-distribution
  • Low latency processing and real-time decision making through fog computing
  • Strong safeguards during communication to ensure security and privacy
  • Ruggedized M2M gateways
  • Availability and reliability, using backup distributed intelligence to work around device malfunctions
  • Distributed processing, data collection, network resource preservation and closed loop functioning to ensure that scarce network bandwidth is not wasted, while meeting response time requirements for usecases and ensuring scalability
  • Unlimited addressing capability e.g. through IPv6

The M2M Service Layer, a software layer between the transport and application protocol layers, will provide data transport, security, device discovery and device management across a multitude of vertical domains, independent of the communication technologies in the lower layers. This will help ensure connectivity between devices and various M2M applications, to realize a horizontally integrated Internet-of-Things, as against vertical silos or an ‘Intranet-of-Things’ for specific applications. This layer should also ensure semantic modeling of things by providing context for the information that “things” can provide, or the actuations they can perform. For example, while providing data from a temperature sensor for home automation, it should also describe whether it is the indoor temperature of a room, or the temperature of a fridge etc.
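
To illustrate the semantic-modeling point, the sketch below publishes a temperature reading together with its context over MQTT; it assumes the paho-mqtt client library (1.x API), and the broker address, topic and payload schema are hypothetical:

```python
import json
import paho.mqtt.client as mqtt  # assumes the paho-mqtt package (1.x Client API)

# The payload carries context alongside the raw reading, so applications can tell an
# indoor room temperature apart from, say, the temperature inside a fridge.
reading = {
    "value": 22.5,
    "unit": "celsius",
    "quantity": "temperature",
    "context": {"deployment": "home-automation", "location": "living-room", "asset": "room-air"},
    "device_id": "sensor-0042",
}

client = mqtt.Client()
client.connect("broker.example.net", 1883)               # hypothetical M2M service-layer broker
client.publish("home/living-room/temperature", json.dumps(reading))
client.disconnect()
```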

The IoT Application Platform, powered by transformative technologies such as Cloud and Big Data Analytics, will host IoT applications for global users.

The result of such a layered architecture is a globally accessible network of things, providers and consumers, who can create businesses, contribute content, generate and purchase new services.

IoT enabling technologies

Figure – IoT Applications with ‘enabling’ technologies (Source: Freescale.com)

While connectivity devices and technologies have existed for long, each of the discussed components in the IoT architecture will need to go through one or more evolution phases, until it passes the following two key litmus tests, which drive the need for innovation and eventually standardization.

  1. Is it technically & economically effective and efficient for IoT scale?
  2. Can it handle the implications of operating in an open world?

So, let’s take a look at industry efforts around innovation and standardization, in the next section.

IoT – Interoperability through Standards & Collaboration

With IoT predicted to create roughly $15 trillion in value-at-stake over the next 10 years, we have just about every vendor exploring market adjacencies to tap into this market potential. About 66% of the total value i.e. $9.5T is forecast to come from specific industry verticals (e.g. Smart Grid, Connected Healthcare), while the remaining 34% or $4.9T would be derived horizontally across industries (e.g. knowledge-worker productivity, travel avoidance). This has prompted vendors to form alliances among themselves, and with SDOs to drive vertical specific and cross-industry innovations/standardizations as applicable.

While standards bodies are effective at reaching multi-stakeholder consensus, and are efficient in comparison to letting market pick winners and losers among alternative technologies, industry-led consortia are comparatively faster, adaptive to market conditions and allow dominant players to push forward their interests. Cross-industry collaboration could also help quicken the pace of IoT adoption by increasing affordability through economy of scale benefits, and enable future applications as yet unimagined.

Below is a snapshot of the heterogeneous standards environment in IoT, as of a couple of years ago. Since then, the ecosystem has seen more protocol innovations – ModBus, IEEE WAVE 1609.2, Deterministic Ethernet, Power line communication (IEEE 1901 and IEEE 1901.2), and Ultra-Wide Bandwidth (UWB) technology, to name a few – further increasing heterogeneity. However, vendors and SDOs are involved in concerted efforts to bring interoperability among architectural layers and across IoT applications spanning industry verticals, to address this dire challenge in realizing the vision of the Internet-of-Things.

Figure – Heterogeneous standards environment in IoT (Source: IoT Research EU)

Let’s now go over various working groups that have sprouted in the IoT space, to supplement technology innovation and interoperability efforts from SDOs such as ITU-T, ETSI, IEEE and IETF typically focused on specific OSI layers, or drive joint standards across SDOs.

Working groups (with the year they became active), their charters and founding members:

IPSO Alliance (Sep 2008)
  • Charter: Establish Internet Protocol (IP) as the network to interconnect smart objects, and allow existing infrastructure to be readily used without translation gateways or proxies
  • Founding members: ARM, Atmel, Bosch, Cooper, Dust Networks, EDF, Ericsson, Freescale et al

IoT-A (2010-2013)
  • Charter: Developed an architectural reference model to allow seamless integration of heterogeneous IoT technologies into a coherent architecture, to realize an ‘Internet of Things’ rather than an ‘Intranet of Things’
  • Founding members: ALU, Hitachi, IBM, NEC, NXP, SAP, Siemens and universities – “mission accomplished” in late 2013

oneM2M (2012)
  • Charter: Develop technical specifications for a common M2M Service Layer to allow connectivity between devices and various M2M applications, and realize a horizontally integrated Internet-of-Things
  • Founding members: Leading ICT standards bodies, namely ETSI, ARIB, TTC, ATIS, TIA, CCSA and TTA

AllSeen Alliance (2013)
  • Charter: Collaborate on an open, universal IoT software framework across devices and industry applications, based on the AllJoyn open source project originally developed by Qualcomm and now released to community developers
  • Founding members: Qualcomm, in collaboration with the Linux Foundation

Industrial Internet Consortium (Mar 2014)
  • Charter: Accelerate development and adoption of intelligent industrial automation for public usecases
  • Founding members: AT&T, Cisco, GE, Intel, IBM

HomePlug Alliance (Apr 2014)
  • Charter: Develop technology specs for powerline networking to enable home connectivity
  • Founding members: AMD, 3Com, Cisco, Intel, Intellon, Texas Instruments, Motorola, Panasonic et al

HyperCat (May 2014)
  • Charter: Develop an open specification for IoT that makes data available in a way others can make use of, through a thin interoperability layer
  • Founding members: ARM, BT, IBM, Intel, Living PlanIT, et al

Open Interconnect Consortium (Jul 2014)
  • Charter: Define interoperable device communication standards (for peer-to-peer, mesh & bridging, reporting & control etc.) across verticals, and provide an open source implementation
  • Founding members: Atmel, Broadcom, Dell, Intel, Samsung and Wind River

IEEE P2413 (Jul 2014)
  • Charter: Create a standard interoperability architecture and define commonly understood data objects for information sharing across IoT systems; standardization targeted by 2016
  • Founding members: IEEE; collaborating with oneM2M, ETSI and other SDOs to evolve joint standards

Thread (2014)
  • Charter: Create an open, secure, simple, power-efficient protocol, based on a robust mesh network that runs over standard 802.15.4 radios and can support a wide variety of home products
  • Founding members: ARM, Freescale, Nest, Samsung, Silicon Labs, Yale

OMA LWM2M (2014)
  • Charter: Propose a new Lightweight M2M protocol standard, based on a client-server model, for remote management of M2M devices and related service enablement
  • Founding members: OMA

I haven’t included Apple’s HomeKit & HealthKit software platforms, though the company has inducted home/health device manufacturing partners, as it isn’t really a consortium-led effort.

As can be observed in the above listing, many more vendors came together in 2014 and formed new consortia, with well-defined charters even if overlapping ones. The developments so far have been encouraging, given that the ecosystem has been able to drive protocol innovations (e.g. IEEE 802.15.4, LWM2M, HyperCat), arrive at an all-encompassing IoT reference architecture (IoT-A), identify the need for an M2M Service Layer (oneM2M), and hopefully much more that I’m not yet aware of.

Are things looking up for IoT?

Let me wrap up the post with my key takeaways, from this analysis on IoT technology demands and developments.

Given that multiple protocols have always existed in each layer of the OSI stack, it is not realistic to expect the ecosystem to converge on a single protocol standard for each layer in the IoT reference model, considering the varied application needs and diverse physical operating conditions for connected devices. However, basic IoT requirements such as humongous scale, strong security safeguards and lightweight remote management of devices will demand that protocols in each layer be evolved or revamped. Consortia-led efforts such as those from Thread, HomePlug, HyperCat and OMA LWM2M that drive focused innovations, while keeping in mind the need to ease migration or ensure compatibility with existing protocol standards and other IoT advancements, should help quicken the pace of IoT evolution.

On the technology standardization/innovation front, the pace of development has been reassuring [as elaborated in the previous section]. There certainly aren’t any tangential efforts or cases of SDOs locking horns that could potentially dampen IoT technology development. On the contrary, multiple interop events are being jointly organized to evolve IoT specifications, especially around the IoT architecture and its critical M2M Service Layer, to accommodate IoT applications spanning industry sectors.

There is certainly a lot more for me to dig into on the IoT front, be it developments in sensors/actuators/smart objects, IoT gateways & controllers, communication protocols, security or management aspects. Now that we’ve figured out how to make sense of the Internet-of-Things world, what we know about it and what we don’t, let’s learn more about the IoT aspects that interest us and redefine the coordinates of IoT, some other day.

Until then, if you have any views on my post that could help me reconstruct Internet-of-Things better, please share those in the comments section.

Are Smart Cities becoming a reality with Internet of Everything?

Our planet is now urbanized. In the words of Ban Ki-moon, UN Secretary General, we now live in the ‘urban century’.

At the start of the 20th century, only about 1 in every 10 people lived in urban areas. It was in 2008 that the world’s population crossed the line of being more than 50% urban, a share projected to reach 75% by 2050. While this could fuel economic growth, rapid urbanization will heavily stress infrastructure in cities, given higher demands for transportation, water, energy, housing, healthcare et al. With finite resources, limited budgets, demographic shifts and climate change concerns, cities will need to use innovative technologies strategically to achieve urban efficiency, improve quality of living, and most importantly make it all economically and environmentally sustainable.

What makes a Smart City?

There is no tipping point after which a city using ICT can be termed ‘smart’. However, I’ve listed some solutions that can increase a city’s urban performance and smartness quotient, by optimizing use of natural resources, improving cost efficiencies and managing infrastructure that keeps cities running smoothly.

Figure – Smart City Concept (Source: DefenseForumIndia.com)

Energy

  • Intelligent and weather adaptive street lighting
  • Smart energy meters in homes and businesses to regulate energy consumption
  • Homes feeding any excess solar energy harvested into smart electric grid

Transportation

  • Smart traffic management by discovering emergency routes and intelligently rerouting traffic in case of adverse climate conditions, accidents or traffic jams using smart billboards
  • Monitoring vehicle and pedestrian levels to optimize driving routes, traffic lights and installation of overhead walkways
  • Monitoring of parking spaces and providing drivers assistance in locating an empty slot

Safety & Security

  • Video surveillance solutions to monitor crime levels, and automatic-sense-and-respond capabilities to prevent or contain natural disaster damages, and improve evacuation/police/ambulance/fire service response
  • Monitoring of vibrations and material conditions in buildings, bridges and historical monuments

Waste

  • Detection of waste type and fill levels in containers to optimize trash collection routes and methods
  • Automated tunneling of waste to compost plants in multi-tenant dwelling units
  • Automation of waste segregation and treatment plants

Water

  • Detection of water leaks using sensors and pipe pressure variation, to fix aging infrastructure
  • Monitoring water quality to ensure optimum level of chemicals used to treat water, and detection of impurities
  • Smart water meters for better gauging consumption levels
  • Storm water and waste water treatment plants

Air

  • Monitoring of pollutants and radiation levels in manufacturing and nuclear zones, to generate leakage alerts and avert health threats to local citizens
  • Monitoring noise levels in school, hospital and central zones

Internet-of-Everything for Cities

Traditionally, cities have built infrastructure silos to address transportation, energy, water, safety, waste management and similar needs. Also, vertical-specific IT infrastructure management solutions were deployed to reap technology benefits. However, such independent solutions rule out the possibility of sharing information, intelligence and IT resources across various city infrastructures, stunting the potential to scale and keep up with a growing urban population.

In the Smart City context, IoE will serve as a digital overlay to unify city infrastructure, its people, things and data. IoE is not a single technology, but rather a concept. Just like the Internet, the Internet of Everything (IoE) would come alive as a system that can provide Smart City services, once the stack of ICT hardware and software intelligence is added to the underlying physical infrastructure, to enable P2M (people-to-machine) and M2M (machine-to-machine) communication.

In a larger context, IoE is thought of as the confluence of consumer, business and industrial internet, and thus can enable P2P (people-to-people), P2M and M2M communications. IoE connects people, data, things, and processes in networks of billions of automatic connections, unlike today where one must proactively connect to the network and to one another, via personal devices and social media to gather information. These automatic connections would create vast amounts of data, which when analyzed (either in information-gathering end devices or fog computing nodes) and used intelligently can allow for real-time decision making, and thus have boundless applications including Smart City solutions listed in the previous section.

IoE Architecture for Smart Cities

So, for a Smart City to be built, what are the various components that need to come together?

Smart City OS is the virtual application platform that aggregates open innovations from businesses and individual citizens (e.g. Smart City apps built using intelligence from the Digital Cloud on which it runs), and serves as the ‘operations center’ for public services.

The Digital Cloud denotes the cloud platform (with embedded SW & HW) that aggregates intelligence spanning multiple Smart City usecases, and more widely across IoE applications, namely Smart Homes, Smart Transport (V2X applications), Smart Industry, Smart Health and Smart Living/Entertainment, that could benefit from composite information. Having a national digital grid for various IoE applications could help tap into gestalt-effect benefits. A high-speed communications network built on a fiber-optic and WiFi backbone will serve as the hardware platform that moves data from smart objects and sensors for aggregation in the Digital Cloud.

Sensor/Machine infrastructure will be formed of existing physical infrastructure equipped with wired/wireless sensors and M2M devices, to enable detection and notification of events to the higher layers in the IoE architecture.

Figure – IoE for Cities: Architectural Framework (Source: Cisco.com)

My understanding is that existing CAM (Cloud/Analytics/Mobility) technologies are ready to power up the ‘Digital Grid’ in the above architecture and make it functional today, though the complexities of a large-scale national deployment are yet to be understood.

The key challenge would be in identifying cost-effective, scalable and interoperable technology solutions to build the Sensor/Machine infrastructure, and ensuring raw/filtered data flows through to the Digital Grid. As can be expected, considerable work needs to be done to actualize the operations center, ‘Smart City OS’.

Smart City Pilot Projects

According to a BBC news article, IBM had nearly 2,500 smart city projects around the world in 2013. A few of the governments the company has worked with on Smart City initiatives are Dublin (parking), Dubuque (water), Rio (traffic), Singapore (traffic), Stockholm (traffic) and California (traffic). There have also been citizen/academia-led initiatives, such as those in New York (rain water/sewage) and London (pollution, weather, river levels) where the local government opened up city data to the public, or the Kickstarter Air Quality Egg project led by Pachube (now Xively), which deploys its own sensing network globally to gather air quality data. While pilot projects in existing cities have focused on specific applications, new cities have been successful in testing and deploying more comprehensive Smart City solutions.

Over the years, infrastructure has gotten smarter in existing cities, as retrofitting smart tech into existing infrastructure has been going on for a while now, in just about every main city around the world – Barcelona, Copenhagen, Vienna, Manchester, Paris, Chicago, New York, San Francisco, Philadelphia, Boston, Seattle, Orlando, Dublin, London, Amsterdam, Rio de Janeiro, Sofia, Johannesburg, Singapore, Hong Kong, Santiago de Chile, Mexico City, Bogota – to name a few that I specifically came across online, in this context.

Significant Greenfield Projects

In Asia-Pacific and Middle East, there have been a number of greenfield projects, with Songdo in South Korea and Masdar in UAE being the best known examples. In such greenfield projects, deployment of the ICT infrastructure is planned into the city’s construction from the beginning, allowing for systems to be integrated.

Work has been underway since 2004 on the $35B project termed Songdo International Business District, a 1,500-acre new city in South Korea targeted to accommodate 15,000 smart homes, 65,000 residents and 300,000 commuters by 2018. The city boasts a pneumatic system that funnels garbage to its waste-to-energy generation plant, and ICT infrastructure that allows for better monitoring of energy use, traffic, water and waste, apart from remote healthcare, virtual concierge and intelligent tutoring services made possible through a high-quality video communication infrastructure. With 33,000 residents so far, per-capita energy use in Songdo is on average 40% lower than in existing cities of comparable scale. It is reported that the developer charges a premium for these homes given their sustainability features. And so, in such private projects, affordability will be key for smart cities to be socially accepted and become the norm.

The high-tech, solar-powered Masdar City in UAE, claimed to be the world’s first sustainable city, set out to prove that cities can be sustainable even in environmental conditions as harsh as its deserts. The city is managed by the Abu Dhabi government via a subsidiary, and it boasts green educational and business districts with driverless electric vehicles, reduced demand for air conditioning, movement sensors replacing light switches and taps, clean air etc. However, it has come under fire for being sparsely populated because of a lack of affordable housing. It is expected to break ground soon on 500 smart private homes, which will grow up to 2,000 homes, 40,000 residents and 50,000 workers, at a price tag of $15-18B.

PlanIT Valley near Porto, Portugal is yet another smart city that would be built from the ground up by Living PlanIT, with smart technologies from various vendors and its own Urban Operating System (UOS) to manage daily processes and gather data from smart sensors deployed throughout the city. This smart city development has been classified as a Project of National Interest (PIN) and is backed by the local government, while Living PlanIT’s founders themselves own the land and fund it.

The Panasonic-developed Fujisawa SST (Sustainable Smart Town) in Japan is aimed at showcasing renewable (solar) energy generation while reducing greenhouse gas emissions and cutting water use. The smart town continues to grow its resident base as it builds out the infrastructure to house 1,000 households by the time the project completes in 2018. The town manages its energy needs in real time by coordinating data from sensors linked to home appliances.

Does it pay to invest in Smart Cities?

Every greenfield smart city project – be it Songdo, Masdar or Fujisawa SST – has resulted in substantial environmental benefits through energy/water savings and reduced carbon emissions. However, there have been limited disclosures on the profitability and TCO savings of such greenfield ventures, or on the lifespans assumed for those calculations.

In comparison, ROI has been captured and made public for brownfield smart city projects. For example, a smart garbage solution in Finland brought down waste collection costs by 30%; a smart lighting solution in the UK resulted in a 7% crime reduction without additional manpower investment; and immersive video conferencing in the USA lowered travel expenses by 15%. In addition, digitization is seen to fuel GDP growth and lower the unemployment rate through job creation.

Given steady demographic shifts toward an aging population in major economies, Smart City initiatives could help lower workforce costs by automating City Infrastructure Management (CIM). Also, CIM maintenance activities could potentially be outsourced to other countries around the world, as they can be remotely managed.

So, are Smart Cities becoming a reality with IoE?

Well, given significant pilot efforts around the globe, I’d say that we’ve now entered the era of Smart Cities. How soon it touches us depends on where you or I live in the world!

I started exploring this topic, as we’d soon be hosting a panel discussion on ‘Realizing the vision of Smart Cities in India’ at a local industry conference. The field of smart technologies and IoE/IoT is certainly buzzing with activity, given entrepreneurial efforts and industry-cum-government initiatives such as the IoT Living Lab in Electronic City, 100 Smart Cities in India, Digital India, etc.

This Smart City intro post was mostly focused on aspects of environmental sciences [subject seemed to lack zing and certainly not my cup of tea!]. Let me now get back to my IT roots and explore underlying IoE technologies beyond Cloud-Analytics-Mobile-Social, their maturity level, industry standardization efforts, affordability factor and thus how far/close we are to widespread adoption.

[And there goes my last bit of resolve to limit post lengths. Well, my blog posts are anyways intended to capture a well-rounded perspective on what is going on in a technology area/market segment, rather than border on technology journalism.]

I will be back with more details on IoE efforts around the globe, and closer to home, to make the world’s cities smarter, as I prep for the panel discussion in the coming weeks.

How is the SDN landscape shaping up? (Part-2) – A market perspective

Will SDN and its accompaniments (NFV/NV) bring back the glorious days prior to the telecom bust, and pull the networking industry out of its technological plateau? I presume most would say that the wave has already started. It is surmised that SDN has brought venture money back to networking after years of drought – years that were mostly sustained by bigwigs of the industry through spin-ins and strategic investments in startups. SDN has found huge favor with VCs, with startups in this space having raised nearly $500M in 2013. Successful exits in the last couple of years, including Nicira Networks (VMware), Contrail Systems (Juniper), Vyatta (Brocade), Tail-f (Cisco), Xpliant (Cavium) and Cyan (IPO), have further spurred VC funding in the SDN market.

I will explore the SDN market landscape in this post, having covered technical aspects in my previous post.

What business potential does SDN hold?

To start off with the vendor perspective, there are two facets to be understood –

  1. How will SDN impact customer spending in existing market?
  2. What is SDN’s potential for new market creation?

Roughly 30% of the total networking spend from Service Providers, Enterprises and Web Hosting companies is expected to be related to SDN. So, existing market players need to buckle up and be ready with committed SDN roadmaps and solutions to remain relevant in the SDN-driven networking market, as the technology is emerging as a significant influencer in network purchasing decisions. Customers are increasingly seen to evaluate how solutions and equipment procured today will fit into an SDN environment in the future. Market research firms (and VC firm Lightspeed Venture Partners) forecast the impact of SDN to exceed $25B per annum by 2018, potentially reaching as high as $35B. In comparison, the overall networking TAM is estimated to grow from its current value of $75B to $90B by 2018.


Figure – SDN existing market impact (Source: Plexxi)

Except for speeds and feeds, networking innovation has been lagging behind compute (servers) and storage, the other two key blocks in any data center infrastructure. The advent of SDN/NFV/NV should help effectively virtualize any IT infrastructure environment by complementing compute and storage solutions, and propel new market creation, at a pace faster than seen with server virtualization in the mid-2000s. According to IDC, the new SDN market TAM will reach a total value of $3.7B by 2016, and touch $8B by 2018.


Figure – SDN new market potential (Source: IDC, Image Source: IT World)

Moving on to the customer perspective, SDN promises businesses better revenues (by enabling network monetization through improved service velocity) and lower TCO (through automated control and higher network resource utilization). As with any emerging technology, customers are apprehensive about whether SDN can deliver on its promises, and are unsure of its potential to become mainstream. Broad incumbent support for SDN and significant open community efforts are expected to accelerate maturation of SDN technology and help realize usecases.

While SDN related needs have mostly been latent, Google (a founding member of ONF) developed an in-house solution for its inter-datacenter WAN deployment with centralized traffic engineering, using OpenFlow based SDN, as early as 2012. Other marquee SDN adopters include Amazon, eBay, Rackspace and Baidu.

What are the leading usecases?

Key usecases for SDN include public & private cloud, WAN traffic optimization, dynamic WAN interconnects and re-routes, network virtualization, automated network management, service chaining, network analytics, automated malware quarantine, granular flow based DDoS mitigation et al. [I could plan to cover each of these in detail in some future post.]

The usecases span Data Centers, Enterprise campuses, Cloud Providers, Service Providers and even SMBs, as the latter segment could amply gain from SDN’s value proposition of IT infrastructure simplicity.

It wouldn’t be an exaggeration, then, to conclude that SDN is going to impact every customer segment and usecase in the networking market, and that no customer or vendor is going to be immune to SDN-driven change.

The Ongoing Controller War

SDN has driven the emergence of a new class of products, the SDN controller. With the controller being a strategic control point of any SDN network, vendors are vying for mindshare for their respective SDN architectures and controller solutions, hoping to eventually translate it into sizeable market share.

Most dominant vendors in the industry are working on their own controller offerings [Refer next section for list of SDN controllers] to better chart the course of controller evolution and development, turn on potential software differentiation and hardware-assist features, and effectively orchestrate their range of infrastructure equipment offerings.

In addition, vendors (in collaboration with the Linux Foundation) have created an open-source platform for SDN, the OpenDaylight Project (ODP), to enable SDN adoption by accelerating technology development through ODP’s Open Controller.

A community-driven, common and trusted Open Controller would ensure network component interoperability across vendor offerings, both within and across architectural layers. The goal is also to promote multi-vendor environments, in comparison to today’s networks where each tier is typically populated with single vendor solutions. Network architects have been advocating open source/standard approaches to liberate customers from vendor lock-in challenges. Customers such as cloud and internet service providers, using such open standards based solutions, could still differentiate their end user offerings, by incorporating their secret sauce in the application layer. [Refer ‘SDN architecture’ section in my previous post for various layers].

Arguably, vendors who do not have their own controller but are participating actively in OpenDaylight community efforts are betting the most on the ODP controller taking off. Meanwhile, the controller war has heated up with the entry of other open source controllers which have been making news, namely Juniper’s OpenContrail and ON.LAB’s ONOS.

So, let us take a quick look at how these controllers differ.

ODP architecture has a single uber-controller and is primarily datacenter focused today, but could service WAN usecases as well.

ONOS targets SP WAN usecases with an architectural focus on fault-tolerance and state distribution across multiple controllers, to address high availability and bottleneck concerns with a single uber-controller. The challenge here is the need to orchestrate among these multiple controllers.

OpenContrail architecture is built for centralized control but with distributed physical components for fault tolerance. OpenContrail is very routing centric and focused on solving multi-tenant issues for SPs. Experts opine that its scope to extend to other customer segments is limited, given the lack of an abstraction layer to support multiple southbound/northbound interfaces, unlike ODP. [Refer ‘A Deep Dive into OpenDaylight components’ section in my previous post].

With vendors and communities tweaking their architectures and evolving their solutions over time, it is too early to predict any potential winners. With SDN deployments having been sporadic till date, it will be some time before contenders evolve their offerings, prove their mettle in actual deployments and emerge successful.

The gamut of SDN offerings (ODP/ONF members only)

Given that there is a whirlwind of SDN activity in every nook and corner of the industry, I’ve opted to limit my evaluation to current members of ODP & ONF. To get a feel for the number of firms out there in the SDN ecosystem, refer to the list of players.

With SDN taking the world by storm [well, that might be an exaggeration – just got carried away for a bit, but here is what I wanted to say], I think it is inevitable for networking vendors to make their existing HW equipment and OS offerings SDN-ready, if they don’t want to be left behind. Also, a new group of players has emerged with niche SDN applications and orchestration platforms (e.g. PLUMGrid with its OpenStack Networking Suite).

While adding SDN capability to (legacy/existing) equipment and appliances would ensure investment protection for customers, SDN and orchestration applications (which are still taking shape and are quite customer/segment specific) are key to delivering real customer value through SDN. However, adoption of SDN controllers, the pivotal component in the SDN architecture, is a precursor for customers to tap into the potential of SDN applications.

I’ve chosen to focus my analysis in this post, on SDN offerings from (1) those who’ve taken the plunge by putting out (or working on) SDN controllers in the market, and (2) those who have built pure software switches, that can go with generic (x86?) hardware, towards realizing the joint vision of SDN/NFV/NV.

And, here we go with the list of SDN offerings!

(1) SDN Controller Platforms

Just a quick reminder that SDN controllers are only software platforms. And, yes, they do need a host to run on. [Would welcome inputs on hosts you’ve seen being used in SDN deployments.]

I’ve removed the term ‘Controller’ from the product names in the list below; thought it was understood.

abbn – DBSM
Atto – OBelle
Big Switch – Big Tap
Brocade – Brocade Vyatta
Ciena – Agility Multilayer WAN (MLWC)
Cisco – XNC (ODP based), ONE, APIC (Insieme)
Citrix – NetScaler
Coriant – Intelligent Optical Control (IOC)
Cyan – Blue Planet
Dell – Active Fabric
ETRI – ETRI
Extreme Networks – ODP-based with extensions
HP – Virtual Application Networks
Huawei – Carrier-class SDN (SNC)
IBM – Programmable Network (PNC)
Inocybe – Sustainable SDN (ODP-based)
Juniper – OpenContrail, NorthStar Network (NNC)
NCL – Virtual Network (VNC2.0)
NEC – ProgrammableFlow PF6800
NTT – Ryu OpenFlow (used by Pica8 too)
Nuage – Virtualized Services (VSC)
Oracle – Oracle
Plexxi – Plexxi Control
VMware – NSX (Nicira)
OpenDaylight – OpenDaylight (ODP)

(2) SDN Packet Processing Platforms

Here is the list of software-based SDN packet processing platforms built to run on generic hardware. These would technically fall under the gamut of SDN-ready NFV products, though they align with the SDN vision.

6WIND – 6WindGate
Aricent – Fast Path Accelerator
Big Switch – Switch Light
Brocade – Vyatta 5400/5600 vRouter
Cisco – vPE-F
Microsoft – Hyper-V vSwitch
Midokura – MidoNet
NEC – ProgrammableFlow vSwitch
Pica8 – Integrated Open vSwitch

Interestingly, the ecosystem has also seen the entry of Intel into the Ethernet switch market, with its FM5000/FM6000 series of SDN-enabled ASICs.

Commercial SDN deployments

Now, let me run SDN through the market adoption test!

As we saw, there is no dearth of SDN offerings in the market. But has SDN really taken off? I chose to evaluate this based on public customer references. I thought I’d be amply surprised if any vendor deployed SDN commercially and didn’t get their marketing folks to put together public customer references, or if carriers/web hosting companies didn’t want to make a splash of having adopted SDN. And well, I was surprised. Anyway, more on this later.

Going by public references, here are the commercial SDN deployments of vendors. I’ve kept out in-house SDN solutions developed/deployed by customers such as Google, NTT, AT&T, Amazon, Microsoft and Facebook, given the number of makeshift offerings masquerading as SDN solutions, and the many paths to realizing the SDN benefits of programmability and improved service velocity.

SDN Vendor – Customer – Deployment Usecase
ConteXtream – Verizon – SP Carrier Network
Big Switch – CSM Research – Private Cloud
Huawei – China Telecom – Data Centers
Huawei – 21Vianet – Cloud Data Centers
NEC – NTT – Cloud Data Centers
Nicira – eBay – Data Centers (prior to VMware acquisition)
Nicira – Rackspace – Data Centers (prior to VMware acquisition)

Additionally, Cyan reports on its website that Blue Planet SDN platform has been implemented in 120 production networks, but I didn’t come across any other public material.

Seems too short a list, right? Either these are the only commercial deployments, or public references aren’t the way to go. I didn’t think that SDN customers (and not just startups like Versa Networks and GuardiCore) would be in stealth mode. If you know of any more SDN deployments, please point me to public references online.

If you could spare time for a deep-dive, do take a look at the links I’ve embedded (at the start of this section) on in-house SDN solutions being worked on by SDN “customers”. Interesting to see how boundaries are continuing to blur across vendors and customers!

May the best win (or at least each find their niche) and the ecosystem prosper!

That was quite a long post! I really hope to limit the post length in future.

Meanwhile, if you have any feedback on this one, please post a comment or drop an email. I’m certain there’d be alternate views, especially given the murky state of the SDN market.

How is the SDN landscape shaping up? (Part-1)

“May you live in interesting times!” goes a Chinese curse. And interesting times, while challenging and marked with uncertainty, have immense potential waiting to be realized. The SDN canvas – dotted with hopeful startups, open source communities and networking behemoths edging their way in (seemingly a little too early, without giving startups a chance to have a good run in the new market) – is brimming with opportunities, and promises to accelerate service velocity through automation and orchestration, while being cost-effective.

The famed SDN definition

To fast-forward through Software Defined Networking (SDN) evolution to this date, SDN started off with a compelling vision to centralize the network control plane in a network controller and strip intelligence off the distributed data plane, to provide administrators an environment that is generic, open, extensible and centrally manageable. With few takers willing to rebuild control plane solutions from scratch, the vision has evolved toward accelerating deployment of end user applications in a secure and scalable manner.

Control Plane Approaches

OpenFlow’s imperative model and OpFlex’s declarative model are the two main control plane approaches in the SDN market.

In the imperative model, as in OpenFlow, the controller fully instructs routers and switches on how to move packets based on application requests, with no control intelligence embedded in the distributed data path network elements. The imperative model suffers from the drawbacks of the centralized controller becoming a bottleneck and a single point of failure in the network.

In contrast, the declarative model, implemented in OpFlex, allows for more distributed intelligence. The controller sets a central policy based on application needs but leaves it to the network nodes to determine how best to execute those policies and meet the application needs. In this approach, the network can sustain itself even if the controller fails, allowing for better availability and resiliency. The network can also scale better as the controller is no longer the sole brain of the network. Cisco’s Application Centric Infrastructure (ACI) framework is based on the declarative model.
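To make the imperative model concrete, here is a minimal sketch of controller-driven flow programming, assuming the open-source Ryu OpenFlow controller framework (which also shows up in the controller list later in this series); the port numbers and priority are illustrative placeholders, not a recommended design.

```python
# A minimal Ryu app illustrating the imperative model: the controller
# dictates the exact forwarding behavior of the switch via OpenFlow.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class ImperativeFlowPusher(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_connect(self, ev):
        dp = ev.msg.datapath
        parser = dp.ofproto_parser
        ofp = dp.ofproto
        # Imperative instruction: send everything arriving on port 1 out port 2.
        match = parser.OFPMatch(in_port=1)
        actions = [parser.OFPActionOutput(2)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                      match=match, instructions=inst))
```

Run under ryu-manager, this app pushes the rule the moment a switch connects. Note that every forwarding decision originates in the controller, which is exactly the bottleneck and single-point-of-failure concern the declarative model tries to address.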

Open Standards and Development Communities

Open communities such as Open Networking Foundation (ONF), OpenStack and OpenDaylight have played a pivotal role in bringing together IT, cloud and telecom service providers, compute, storage and network equipment vendors & silicon providers, technologists, developers and researchers, to streamline efforts in formulating open standards and following through with open source development, promotion and adoption of SDN.

Open Networking Foundation (ONF) is credited with introducing OpenFlow, the first SDN standard and a vital element of the open SDN architecture.

OpenStack – Unlike ONF which is limited to networking and is a standards community, OpenStack encompasses compute, storage and networking (refer figure below), and is a developer community focused on cloud environment solutions. OpenStack software is an open-source cloud operating system that helps control large pools of compute, storage and networking resources throughout a datacenter, allows administrators to manage resources through the OpenStack dashboard, and empowers application users to provision resources, through a web interface, transparently orchestrating across compute, storage and networking blocks.

OpenStack is architected to provide flexibility as businesses design their public/private clouds, with no proprietary hardware or software requirements and the ability to integrate with legacy systems and third-party technologies.

OpenStack has multiple official programs targeted at specific architectural blocks such as Nova (Compute), Swift (Storage), Neutron, formerly termed Quantum (Networking), and Heat (Orchestration), which provide plugins to deliver each of these blocks as a service – e.g. Network-as-a-Service in the cloud environment – and thus the value proposition of programmability of the SDN/SDDC paradigm, though in a cloud environment.


Figure – OpenStack Architecture – Sourced from openstack.org
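To give a feel for the Networking-as-a-Service idea, here is a minimal sketch that provisions a tenant network and subnet programmatically, assuming the openstacksdk Python client; the cloud name in clouds.yaml and the CIDR are placeholders.

```python
import openstack

# Connect using a 'mycloud' entry in clouds.yaml (placeholder name).
conn = openstack.connect(cloud='mycloud')

# Create a tenant network and subnet -- the same operation the OpenStack
# dashboard performs against the Networking (Neutron) APIs.
net = conn.network.create_network(name='demo-net')
subnet = conn.network.create_subnet(
    name='demo-subnet',
    network_id=net.id,
    ip_version=4,
    cidr='10.0.10.0/24',  # placeholder address range
)
print(net.id, subnet.cidr)
```

The point is less the specific client and more that compute, storage and networking resources become API-driven building blocks that applications and orchestration tools can provision on demand.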

OpenDaylight is an open platform that any enterprise or provider can use today, to enable SDN and NFV through programmability of networks of any scale. The software is a combination of components and includes a fully pluggable controller, interfaces, protocol plug-ins and applications.

SDN Architecture

Towards realizing the SDN vision, open communities and foundations – comprising business users, vendors, technologists and researchers – have arrived at a basic SDN architecture [captured in the figure below], which consists of three main layers – Application, SDN Controller and Network Infrastructure.

The SDN controller, which houses the intelligent control plane, interfaces between user services & applications and the network on which they run (the latter denoting the distributed packet-forwarding data plane in an SDN environment), with the goal of abstracting the network so that application developers can tune it to meet application needs without having to understand its inner workings.


Figure – Basic SDN architecture – Sourced from Datacenterjournal.com

Northbound APIs are used for communication between the controller and application layer, to enable efficient orchestration and automation of the network to align with needs of different applications, while Southbound APIs are used between the controller and infrastructure layer.

To demystify the term ‘orchestration’: as in a musical orchestra, this function in the SDN architecture ensures that various resources (e.g. compute, storage and network blocks in a data center) are controlled by a common entity to align with or complement each other and work synergistically to meet business application needs.

Let us now see how OpenStack, OpenFlow and OpenDaylight fit into the SDN architecture.

OpenStack allows for orchestration in a cloud environment to deliver networking-as-a-service, through the OpenStack Neutron (formerly Quantum) APIs that I discussed earlier in this article.

OpenFlow is an open-standard Southbound communication protocol that enables an OpenFlow SDN controller to add and delete flow table entries in OpenFlow switches and routers, and thus control flows for optimal performance and eventually make the network more responsive to real-time traffic demands. More on this when I explore OpenDaylight in the next section.

OpenDaylight is a complete implementation of SDN and I think the below framework summarizes it best. Let me also repeat how I introduced OpenDaylight earlier in this article. OpenDaylight is an open source software platform that implements a pluggable controller, northbound programmatic & southbound implementation interfaces, protocol plug-ins and applications, that anyone can use today (that’s right, today!) to evaluate, commercialize, and deploy SDN (and NFV – the topic I’ve saved for a future blog post). The controller is contained within its own Java Virtual Machine, and can be deployed on any platform that runs Java.

Figure – OpenDaylight Framework – Sourced from OpenDaylight.org

A Deep Dive into OpenDaylight components

In this section, I’ll go over various protocols and stacks that you would hear of in the SDN context and see how they fit into the SDN architecture, based on the OpenDaylight framework [refer figure above].

Northbound Interfaces

Northbound APIs are the most critical in the SDN environment, as the value of SDN is closely tied to the innovative applications it can potentially support and enable. Given that they must support a wide variety of applications, multiple interfaces currently exist to control different types of applications via an SDN controller. Consolidation of these interfaces is yet to happen, given that SDN usecases are still evolving.

OpenDaylight supports the OSGi framework and bidirectional REST for its northbound APIs. The OSGi framework is used for applications that run in the same address space as the controller, while REST (HTTP-based) APIs are used for applications that do not run in the same address space, or even necessarily on the same machine, as the controller.
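As a flavor of the REST side, here is a hedged sketch that queries an OpenDaylight controller’s operational inventory (the switches it currently sees) over RESTCONF using the Python requests library; the address, port 8181 and admin/admin credentials are assumptions that vary by release and install, so adjust them for your setup.

```python
import requests

# Assumed controller address, RESTCONF port and demo credentials.
ODL = 'http://127.0.0.1:8181'
AUTH = ('admin', 'admin')

# Read the operational inventory: nodes the controller currently manages.
url = ODL + '/restconf/operational/opendaylight-inventory:nodes'
resp = requests.get(url, auth=AUTH, headers={'Accept': 'application/json'})
resp.raise_for_status()

for node in resp.json().get('nodes', {}).get('node', []):
    print(node['id'])  # e.g. 'openflow:1' for an OpenFlow-connected switch
```

An OSGi bundle, in contrast, would be packaged and loaded into the controller’s own Java runtime rather than talking to it over HTTP.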

Northbound APIs are also used to integrate the SDN controller with automation stacks such as Puppet, Chef, Salt, Ansible and CFEngine. As we saw earlier, they also help interface with orchestration platforms such as OpenStack.

SDN Applications that can be optimized via Northbound interfaces include load balancers, firewalls and other software-defined security (SDSec) services.

Southbound Interfaces

The southbound interface is capable of supporting multiple protocols for (1) managing physical and virtual network elements, (2) operating on the control plane to allow for controller-driven programmability, or communicating network state/events and (3) configuring the data forwarding plane on distributed physical and virtual network elements. Networking equipment vendors implement one or more protocols in these categories to add SDN capability to their legacy equipment, thus ensuring investment protection for their existing installed base during the move to SDN.

NETCONF, OF-CONFIG using YANG data models, SNMP and XMPP operate in the management plane and allow for network device configuration and monitoring.

I2RS, PCEP, BGP-FlowSpec and BGP-LS are protocols that operate on the control plane and either update routing tables in a programmatic way, allow for creation of MPLS-TE tunnels from a central controller and communication of computed LSP paths to network nodes, automate distribution of traffic filter lists for DDoS mitigation or help export link/topology/tunnel states through BGP to the controller.

OpenFlow (v1.0, v1.3), LISP and OVSDB are protocols that allow the controller to configure flow tables and influence the forwarding behavior of physical and virtual devices.

[I could explore these protocols in-depth in another blog post, to limit the length of this one.]
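In the meantime, to give a flavor of the management-plane category, here is a minimal sketch that pulls a device’s capabilities and running configuration over NETCONF, assuming the Python ncclient library; the host address and credentials are placeholders to adapt to your equipment.

```python
from ncclient import manager

# Placeholder device details -- substitute your switch/router's values.
with manager.connect(
    host='192.0.2.10',       # documentation/test address
    port=830,                # standard NETCONF-over-SSH port
    username='admin',
    password='admin',
    hostkey_verify=False,    # acceptable in a lab, not in production
) as m:
    # YANG models / capabilities the device advertises.
    for cap in list(m.server_capabilities)[:5]:
        print(cap)

    # Retrieve the running configuration as XML.
    reply = m.get_config(source='running')
    print(reply.data_xml[:500])
```

The control- and forwarding-plane categories are typically exercised through a controller rather than a standalone script – the Ryu sketch earlier in this post is one OpenFlow example.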

SDN Controller

As discussed earlier, the controller is the key arbitrator between network applications and network infrastructure, and forms the crux of the SDN network. To be able to effectively centralize the intelligent control plane, the controller typically implements base network functions to provide host/node service, flow service, topology service, path service (to set up and manage paths based on specified constraints), multi-tenant network virtualization, network statistics, security, centralized monitoring etc. In addition, it provides a collection of pluggable modules in the Service Abstraction Layer to support a variety of southbound interfaces.

[I’ll delve more into the SDN controller war among commercial and open source variants, in my subsequent post. I plan to post “part-2” of this topic soon, which will cover SDN market potential, overview of vendor architectures/solutions, SDN ecosystem of controllers, SDN-ready routers/switches and trending vendor solutions.]

To be continued..

Look forward to hearing your thoughts in the comments section. Would be glad to address any questions as well.

A peek into Security in the context of BYOD, Cloud and IoE

Is the IT team’s worst security nightmare unfolding, with rampant BYOD due to unsurpassed mobility, adoption of the cloud and the exploding interconnections foreseen in the world of the Internet of Everything? Gone are the days when the most effective policy was to build a controlled environment and secure oneself through limited access to the external world.

Increasingly sophisticated attacks, the dynamic nature of threats, advanced persistent threats (APTs), a thriving underworld hacking industry, and attackers innovating in lockstep with evolving technology and cloud-based solutions all stress the need for enhanced, integrated and scalable security solutions – focused on prevention, detection, mitigation and remediation of attacks across the entire span of user and network touch points, without leaving security gaps due to fragmented solutions stitched together from disparate products.

Integrated security solutions, aimed at protecting network resources and content, could be on-premises or cloud based offerings. In the new paradigm of SDN, these solutions should allow for central policy management and distributed policy enforcement.

Security solutions typically build an intelligence ecosystem by analyzing and aggregating extensive telemetry data including web requests, emails, malware samples and network intrusions, to protect networks, endpoints, mobile devices, virtual systems, web and email from known and emerging threats.

Protecting Network and Virtual Resources

Network Security Solutions have evolved beyond traditional firewalls that control ingress and egress traffic flows according to predefined rules enforcing a given security policy, be it through packet filters or application proxies. Traditional firewalls resort to stateful packet inspection (SPI) as against deep packet inspection (DPI), and are not capable of distinguishing one kind of web traffic from another. DPI helps manage application- and data-driven threats by looking deep into the packet and making granular access control decisions based on both packet header and payload data.
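To illustrate the difference in a toy way, here is a sketch assuming the Python scapy library: the first check looks only at header fields (the kind of decision a traditional packet filter can make), while the second peeks into the payload for a byte pattern, which is the essence of DPI; the blocked port and signature are made-up examples, not real detection logic.

```python
from scapy.all import IP, TCP, Raw, sniff

BLOCKED_PORT = 23        # illustrative: block telnet based on header alone
SIGNATURE = b"cmd.exe"   # illustrative payload pattern standing in for DPI

def inspect(pkt):
    if IP not in pkt or TCP not in pkt:
        return
    # Header-only decision (what a traditional packet filter can do):
    if pkt[TCP].dport == BLOCKED_PORT:
        print("filter: would drop", pkt[IP].src, "->", pkt[IP].dst)
        return
    # Payload inspection (the extra step DPI adds):
    if Raw in pkt and SIGNATURE in bytes(pkt[Raw].load):
        print("dpi: suspicious payload from", pkt[IP].src)

# Sniffing requires root/administrator privileges; count keeps the demo short.
sniff(filter="tcp", prn=inspect, count=50)
```

Production DPI engines obviously go far beyond a byte match – protocol decoding, flow reassembly and signature databases – but the header-versus-payload distinction is the core idea.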

Secure site-to-site and remote user connectivity can be enabled through IPSec, SSL, L2TP or PPTP based VPNs. User authentication for remote access VPNs is typically carried out through RADIUS, TACACS, LDAP, Active Directory, Kerberos or SmartCards.

While earlier-gen Intrusion Detection Systems (IDS) passively scan traffic and report on threats, Intrusion Prevention Systems (IPS) sit directly behind firewalls and actively analyze traffic, thwarting denial-of-service (DoS) attacks and application hijack attempts by dropping malicious packets, blocking traffic from the source address and resetting the connection, in addition to notifying the administrator as IDSs do.

Unified Threat Management (UTM) systems perform multiple security functions such as DPI, IPS, IDS, NAT etc. in a single platform. Given that this approach strings together separate internal engines, each examining a packet to perform an individual security function, packets get processed multiple times, which adds latency and degrades network performance, apart from increasing operational management overhead.

Next-gen firewalls (NGFWs) – With security requirements being critical to businesses, IT managers had to sacrifice network performance to achieve network security using UTMs, until the advent of next-gen firewall solutions. NGFWs are application-aware and so can differentiate traffic flows, even if they share a common protocol/port combination. They perform optimized DPI to detect anomalies and known malware, examining the packet only once and thus preserving performance. In addition to DPI-enabled application awareness and granular control, and the traditional firewall functions of NAT and VPN, NGFWs come with an integrated signature-based IPS engine and the ability to integrate intelligence from outside the firewall such as directory-based policy, blacklists and whitelists.

Most important of all, apart from identifying and controlling the use of predefined applications based on their signatures, NGFWs can learn new applications by watching how they behave, and alert administrators if there is any deviation from baselined normal behavior.

NGFWs perform packet inspection and SSL decryption in high-performance hardware and so can perform full packet inspection without introducing latency.

Network Access Control (NAC) – Traditionally, NAC solutions restrict which devices get on a network and were thus intended to work well in a closed, static environment with company-owned devices. The phenomenon of BYOD has pushed security up to the application layer, with IT teams enforcing access controls through mobile app wrappers and installation of device management profiles.

In addition to solutions for physical resources, sandboxing is increasingly being used in virtual environments to improve security by isolating a given application to contain damage due to malware, intruders and other users, especially in virtual desktop infrastructure solutions.

Figure – Network Security Solutions – Sourced from opennetworking.org

Securing Content

Content Security Solutions protect users, email and data from inbound and outbound web security threats, and have evolved from standalone to hosted offerings.

Email Protection – Email security appliances keep critical business email safe from spam and malware, with a good spam capture rate, minimal false positives, fast blocking of new email-borne viruses to avoid proliferation in the network, effective zero-hour antivirus solutions, and the ability to scale threat analysis.

Web Security – In addition to effective malware protection, complete web security requires solutions that provide granular and nuanced policy knobs to control how end users access the internet, and that implement proprietary and confidential data access controls through deep content analysis. Businesses can thus control access to specific features and applications such as messaging, audio and video based on users’ business requirements, without blocking access to entire websites or the internet.

Security Market

While the overall security market opportunity is very strong, content security and traditional appliance/software markets are seeing a decline. Growth in hosted/SaaS solutions is offsetting the above downward trend to keep the overall security market flat.

Network security and content security market TAMs are at $6.5B and $2.8B respectively, with each growing at around 4% YoY.

Cisco leads the network security market with nearly 35% market share, followed by Check Point with 15% share and Fortinet, Palo Alto, Juniper and McAfee capturing between 5-8% of the market each.

Leaders in the content security market are Blue Coat, McAfee (Intel), Cisco, Websense and Symantec with each of these players having captured about 10-15% market share. While no single player currently dominates the market, top vendors have been extending their market reach through strategic partnerships and acquisitions.

Where do we go from here?

In the new world driven by SDN and Big Data analytics, security solutions will be evaluated on their ability to glean and integrate threat intelligence from the ever-growing ecosystem, dynamically update privileges and trust profiles of any user, device or application in real time to thwart or remediate attacks, and most importantly scale to actualize IoE, while hiding solution complexity from IT operations. Unlike other technology areas, the majority of security innovation is embedded deep inside the hardware/software offering, invisible to the operations team or network user. Security is also a field where solution effectiveness is judged on the misses and not on instances of a job well done, and so is quite often relegated to the back burner until a major flare-up.

What important aspects of security landscape have I missed out?

Can Enterprises and Service Providers fully mitigate personal data risks due to mobile apps, social networks and cloud hosting? If not, what measures do end users have to take, and what are the technology gaps?

What insights do you have into the mobile security market, or security needs of IoE?

Feel free to share your views in the comments section.

Servers – A key block in Data Center infrastructure business

Businesses use servers to centrally host various applications such as email, collaboration, firewall, file and print in a secure manner. In this article, I will go over the server market, vendors, technologies and categories of product offerings.

Potential – To start with the market potential, the worldwide server TAM is estimated to be roughly $54B in 2014. YoY growth is forecast to be around 1.5%, as companies perform their cyclical IT infrastructure refresh following the slowdown during the financial crisis. Servers come in various price bands, starting from a few thousand dollars and going up to a couple of million dollars. Demand for servers in the public cloud is expected to be the primary driver for server market growth, while server consolidation and virtualization are foreseen to dampen unit demand. Modular servers – blade and density-optimized servers – represent distinct segments of growth for vendors in an otherwise mature market.

Unit shipments are expected to grow by 5.5% in CY14, with higher volumes from lower price bands. To get a better perspective on the size of the server market, let us compare it with overall IT spending. Gartner has projected worldwide IT spending of $143B for data center systems, and $320B for Enterprise software during the year. Servers thus account for nearly 38% of the IT systems spend, and a little over 11% of the combined HW & SW IT spending.

Demand drivers – Strategic focus by enterprises on data center and server consolidation, with the latter driven by virtualization technologies, and adoption of SMAC (social, mobile, analytics and cloud) applications are among the key trends that will determine demand for various form factors and types of servers. SaaS providers such as Facebook, Google and Baidu, along with service providers, are seen driving growth, especially for hyper-scale servers.

Server generations – The 1st generation of servers was largely based on mainframes and terminals, while the 2nd generation has been driven by the client/server model of the PC era, which uses LAN and Internet technologies for communication. The 3rd generation of servers is instead built on a foundation of SMAC technologies, with an exploding number of connected users and apps that demand hyper-scale processing capability.

Vendors – Over the years, IBM, HP and Dell have been the dominant players in the server market, with offerings for all types of servers – blade, density-optimized, rack and tower servers. Cisco, which had earlier partnered with these firms to sell its storage and networking products to Enterprise data centers, entered the server business by launching its blade server product line in 2009. The current pecking order of players by market share is HP, IBM, Dell, Oracle and Cisco.

HP holds the number 1 position in the worldwide server market with over 25% revenue share, followed by IBM with nearly 24% share, Dell at roughly 17%, and Oracle and Cisco a little lower than 6% each. IBM, which was seen to dominate the server market with over 35% share in 2012, has not only dropped revenue share, but also announced its plan to offload its low-margin x86 hardware business to Lenovo. It will continue to play in this market with high-margin System z and non-x86 servers. Sales through ODMs such as Quanta and Inventec represented nearly 7% of overall server revenue. The majority of these sales were in the US market and primarily to Google, Amazon, Facebook and Rackspace.

Geography – US is the largest market for servers with nearly 39% of worldwide server TAM, followed by EMEA region with 22% and APEJ with 20%.

Server architectures – x86-based platforms have been the predominant architectural choice, as they allow enterprises to run their non-mission-critical applications at affordable price points. Non-x86 technologies such as RISC, CISC and EPIC were typically chosen for mission-critical applications and databases including ERP, CRM, data warehouses, business intelligence and analytics, where the key considerations are reliability, availability and serviceability (RAS). The value proposition gap between non-x86 and x86 servers has been shrinking due to advancement in x86 capacity and performance capabilities, and the emergence of highly sophisticated x86 virtualization mechanisms. And so, the market for non-x86 technologies at high price points has been rapidly declining.

Market share by architecture – X86 servers account for over 78% of total server revenue. HP leads x86 market with nearly 30% market share, while Dell is next with 21% market share. Non-x86, the declining market segment, is led by IBM with nearly 70% market share. IBM controls most of CISC server market, and HP dominates EPIC server segment. IBM and Oracle are major players in RISC market, with IBM being the dominant player with 71% market share.

Product categories – Apart from technology, server offerings are also distinguished by form factor and are available as tower, rack, blade and density-optimized servers.

A tower server is a floor-standing unit with integrated processors, memory, I/O controllers and peripherals. These entry-level server offerings are typically seen in SMBs that have no delineated lab space or data center facility, and are opted for when low cost is a priority and there are limited scalability and network/storage connectivity requirements. The footprint usage of these units is limited.


Figure 1 – Tower server form factor

A rack-mounted server, as the name indicates, fits into 19” wide rack units. These servers come in heights that are multiples of ‘U’s, each U being 1.75”. They are used when each node is of fairly large capacity, the overall server configuration consists of a considerable number of nodes, and scalability is important. The footprint usage of this type of server is moderate. These servers help address fluctuating workload challenges through a varying balance of processing, memory, I/O and internal storage resources. They are plugged into a rack, and then power, networking and storage equipment are connected.


Figure 2 – Rack-mounted server form factor
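As a quick back-of-the-envelope aid, the sketch below turns the 1.75-inch-per-U figure into usable numbers; the 42U rack height is an assumption (a common full-rack size), not a figure from any vendor datasheet.

```python
U_INCHES = 1.75      # height of one rack unit
RACK_UNITS = 42      # assumed full-height rack
SERVER_UNITS = 2     # e.g. a typical 2U rack-mounted server

rack_height_in = RACK_UNITS * U_INCHES
servers_per_rack = RACK_UNITS // SERVER_UNITS

print(f"Internal mounting height: {rack_height_in:.1f} in "
      f"({rack_height_in * 2.54:.0f} cm)")
print(f"Max {SERVER_UNITS}U servers per rack: {servers_per_rack}")
```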

A blade server is a modular solution that slides into a chassis slot, and typically houses processors, memory, local hard disk storage, network connections and storage connections. Apart from servers, the enclosure also houses blades for network, storage and power. Blades are used when each node needs to be of reasonably small capacity, the overall server configuration consists of a considerable number of nodes, and scalability is important. Blade systems are advantageous in that they are simple to set up and manage, and have a small footprint.


Figure 3 – Blade Server form factor

Blade servers form a primary building block for integrated systems in the datacenter, as enterprise customers evolve toward private clouds. Converged blade platform, with a compelling value proposition of reduced IT complexity, is an opportunity for pull-through revenue for storage, networking, software and services, beyond the servers. Blade servers form a growing segment and form a key element in the DC vendor’s portfolio for both revenue and profitability.

Density-optimized servers are a hybrid of blade and rack servers where multiple server nodes are available in a 2U or 4U rack chassis; they are targeted at high-performance computing and cloud applications, and are typically used by hosting companies. Density-optimized servers are utilized by large homogeneous hyper-scale datacenters to leverage the scalability and efficiency of this form factor.


Figure 4 – Density-optimized form factor

Market split by form factor – Tower, rack, blade and density-optimized servers represent 14%, 59%, 14% and 13% respectively of the x86 units that were shipped in CY13. The blade server market is led by HP with 42% market share worldwide, followed by Cisco with 25% share and IBM with nearly 14% share.

Future action is in modular servers, namely blade and density-optimized servers, though they currently form only 17% and 6% respectively of total server revenue. Virtualization and SMAC adoption will continue to be key drivers for server demand across the globe. A converged server/storage/network offering is essential to fully tap into the business potential in data centers, in the era of Fast IT.