Tag Archives: Introspection

BU Seminar: Seamless, Unified Operational Visibility and Analytics Designed for Cloud

I will be giving a talk on our recent work on cloud monitoring & operational analytics at Boston University next week. The details are below. I am hoping to add what I learn from the day’s discussions later as well.


Seamless, Unified Operational Visibility and Analytics Designed for Cloud

Emerging cloud services allow users to define and provision complex, distributed systems with unprecedented simplicity and agility. With the push of a button, entire stacks of software can be instantiated within minutes with various configurations and customizations. Automation, continuous integration and delivery further simplify the entire lifecycle management of modern born-on-the-cloud applications. These advances also bring additional research challenges and opportunities for the cloud. Operational visibility into the complex, distributed user applications, cloud runtimes and the underlying infrastructure is becoming a persistent pain point for both the end users and the service/platform providers. As the system and configuration complexity grows, data-driven operational analytics for security, compliance, configuration and resource management are becoming key areas of interest. In both cases, operational visibility and analytics, existing traditional solutions are either ineffective or insufficient. Their assumptions are based on a different era of computing that no longer applies, with long-running, dedicated systems that can tolerate ample configuration times and resource overheads, and where it is common practice to push the monitoring and analytics burden to the end user context. In this talk, I propose a fundamentally different approach for designing seamless, unified and deep operational visibility and analytics services in the cloud. I first present Agentless System Crawler, a cloud-native framework that leverages cloud, virtualization and containerization abstractions to provide complete visibility into running entities in the cloud, without modifying, instrumenting or accessing the end user context. Our approach with crawlers is based on introspection without intrusion, and as-a-service without necessitating guest cooperation.
I demonstrate how we use VM introspection and container namespace mapping techniques to provide a “touchless”, “out-of-band” and “always-on” cloud operational visibility service that is built into the platform. Cloud users simply register for this service, without worrying about the plumbing, overheads or side effects of gaining visibility into their environments. Cloud operators can leverage this approach to provide deeper operational insights without interfering with the user environment, thus enabling a new set of cloud operational analytics as a service that are always on and available with the push of a button. In the second part of this talk, I discuss how we leverage this seamless, deep visibility today for building cloud-native operational analytics services for security, compliance, system discovery and misconfiguration analysis. Last, I present some of the open problems and some of our upcoming research directions.
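
To make the container namespace mapping idea a little more concrete, here is a minimal sketch of one way a host-side process could work out which processes belong to which container without running anything inside the container. It is not the Agentless System Crawler's actual code. It only assumes the standard Linux behavior that `/proc/<pid>/ns/<type>` is a symbolic link whose target looks like `pid:[4026531836]`; the function names and the `proc_root` parameter are hypothetical and exist only for this illustration.

```python
import os
import re
from collections import defaultdict

# Namespace links under /proc/<pid>/ns/ resolve to strings like "pid:[4026531836]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def read_namespace_id(proc_root, pid, ns_type="pid"):
    """Return the namespace inode number for a process, or None if unreadable."""
    link = os.path.join(proc_root, str(pid), "ns", ns_type)
    try:
        target = os.readlink(link)
    except OSError:
        return None  # process already exited, or we lack permission
    match = NS_LINK.match(target)
    return int(match.group(2)) if match else None


def group_pids_by_namespace(proc_root="/proc", ns_type="pid"):
    """Group host-visible PIDs by the namespace they run in (hypothetical helper)."""
    groups = defaultdict(list)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        ns_id = read_namespace_id(proc_root, int(entry), ns_type)
        if ns_id is not None:
            groups[ns_id].append(int(entry))
    return {ns_id: sorted(pids) for ns_id, pids in groups.items()}
```

Calling `group_pids_by_namespace()` on a Linux host (typically as root) returns a mapping from namespace id to the PIDs inside it. Processes that share a PID namespace other than the host's are, in the common case, members of the same container, which is the starting point for inspecting them from the outside.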

Links From the End of the Talk for Those Interested in Learning More or Contributing:


  • Operational Visibility: IC2E’14, Sigmetrics’14, VEE’15, HotCloud’15, ATC’16 (InterConnect’15)
  • Operational Analytics: BigData’14, IBM JRD’16:{SWDisc,NFM,DevOps} (InterConnect’16)



Open Source:

Try It:

Discussed Open Problems as Promised:

  • Truly Seamless OpVis: No performance impact (~/~) + Absolutely no side effects (+/-)
  • Extensibility and configurability: Deep visibility into system, application and infra
  • Scale out across runtimes and scale up to many instances; challenges & limits
  • How do you design DDoS mitigation, admission control and fair sharing in this built-in service model?
  • Privacy and data sensitivity with Ops data analytics
  • Piecemeal analytics/security solutions –> Cloud analytics/security roadmap
  • Rules/annotators –> Actually smart analytics that learn
  • good and bad configs for security, performance, availability, etc.
  • Cross-silo analytics across Time, Space, Dev/Ops [CloudSight Dream]

Additional Points from Discussion:

  • Decoupling data for security and privacy concerns: Crawl system data and not user data. Give the user the flexibility to choose what is exposed and what is not. We should find better ways of controlling what is visible to any monalytics system, whether agent-based or agentless.
  • Is the out-of-band approach really secure? Good point on never having enough security, and on the centralized point of entry. VMs vs. containers vs. unikernels.
  • TBC

CMU SDI/ISTC Seminar: Agentless, Near-realtime VM Introspection in Origami

One of my earlier talks on agentless, introspection-based monitoring for VMs. It is interesting how long we have been pushing on this, and how the recent containerization business has made these ideas resonate more. I just copied this from the SDI/ISTC Seminar Series for nostalgia:

DATE: Thursday, January 31, 2013
TIME: Noon – 1:00 pm
PLACE: Gates 8102
SPEAKER: Canturk Isci, IBM TJ Watson

TITLE: Agentless, Near-realtime VM Introspection in Origami

Enterprise data centers continue to embrace virtualization and cloud computing technologies due to their dramatic benefits for simplifying and streamlining system provisioning and management, as well as for improving the overall resource use efficiency of the underlying hardware infrastructure via virtual machine (VM) consolidation and dynamic, distributed resource management. As virtualization and cloud automation have made it cheap and simple to create, deploy and recycle VMs on a large scale, modern data centers have become much more dynamic environments than they were a few years ago. This is creating new challenges for routine maintenance operations like security and compliance scans that continue to rely on a decades-old model based on in-VM monitoring agents and rule-driven analytics. When machines become transient and fungible resources, this model breaks down, resulting in increasing operational cost and risk.

In this talk we present a different approach to maintaining a large-scale dynamic data center. We show that out-of-VM monitoring agents can approach the data fidelity and real-time view of in-VM monitoring agents. Using this foundation we show how systems can be viewed as documents, enabling data mining and ad-hoc analytics techniques to be exploited for data center maintenance operations. These technologies are under development as part of the Origami system, an ambitious project to treat systems as data, and transform the way dynamic Cloud data centers are managed.
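
As a rough illustration of the "systems as documents" idea, the sketch below flattens a snapshot of system state into sorted text lines and compares two snapshots with plain set operations. This is not Origami's data model or code: the nested-dictionary snapshot layout, the `/path=value` line format and both function names are assumptions made up for this example.

```python
def flatten(features, prefix=""):
    """Turn a nested dict of system features into sorted 'path=value' lines."""
    lines = []
    for key in sorted(features):
        path = f"{prefix}/{key}"
        value = features[key]
        if isinstance(value, dict):
            lines.extend(flatten(value, path))  # descend into sub-features
        else:
            lines.append(f"{path}={value}")
    return lines


def diff_documents(old_lines, new_lines):
    """Return (added, removed) lines between two flattened snapshots."""
    old_set, new_set = set(old_lines), set(new_lines)
    return sorted(new_set - old_set), sorted(old_set - new_set)


# Two hypothetical snapshots of the same VM, taken at different times.
before = {"package": {"openssl": "1.0.1e"},
          "config": {"sshd": {"PermitRootLogin": "yes"}}}
after = {"package": {"openssl": "1.0.1g"},
         "config": {"sshd": {"PermitRootLogin": "yes"}}}

added, removed = diff_documents(flatten(before), flatten(after))
```

Once a system is just a list of lines, ordinary text tooling (search, diff, clustering) can be pointed at it. Here `added` and `removed` each hold the single line describing the `openssl` package, which is the kind of change a compliance or drift check would care about.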