I will be giving a talk on our recent work on cloud monitoring & operational analytics at Boston University next week. The details are below. I hope to add what I learn from the day’s discussions later as well.
Seamless, Unified Operational Visibility and Analytics Designed for Cloud
Emerging cloud services allow users to define and provision complex, distributed systems with unprecedented simplicity and agility. With the push of a button, entire stacks of software can be instantiated within minutes with various configurations and customizations. Automation, continuous integration and delivery further simplify the entire lifecycle management of modern born-on-the-cloud applications. These advances also bring additional research challenges and opportunities for cloud. Operational visibility into the complex, distributed user applications, cloud runtimes and the underlying infrastructure is becoming a persistent pain point for both the end users and the service/platform providers. As system and configuration complexity grows, data-driven operational analytics for security, compliance, configuration and resource management are becoming key areas of interest. In both cases, operational visibility and analytics, existing traditional solutions are either ineffective or insufficient. Their assumptions are based on a different era of computing that no longer applies, with long-running, dedicated systems that can tolerate ample configuration times and resource overheads, and where it is common practice to push the monitoring and analytics burden to the end user context. In this talk, I propose a fundamentally different approach for designing seamless, unified and deep operational visibility and analytics services in the cloud. I first present Agentless System Crawler, a cloud-native framework, which leverages cloud, virtualization and containerization abstractions to provide complete visibility into running entities in the cloud, without modifying, instrumenting or accessing the end user context. Our approach with crawlers is based on introspection without intrusion, and as-a-service without necessitating guest cooperation.
I demonstrate how we use VM introspection and container namespace mapping techniques to provide a “touchless”, “out-of-band” and “always-on” cloud operational visibility service that is built into the platform. Cloud users simply register for this service, without worrying about the plumbing, overheads or side effects of gaining visibility into their environments. Cloud operators can leverage this approach to provide deeper operational insights without interfering with the user environment, thus enabling a new set of cloud operational analytics as a service that are always on and available with the push of a button. In the second part of this talk, I discuss how we leverage this seamless, deep visibility today for building cloud-native operational analytics services for security, compliance, system discovery and misconfiguration analysis. Last, I present some of the open problems and some of our upcoming research directions.
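To make the idea of out-of-band container visibility a bit more concrete, here is a minimal sketch of one building block: reading a process's Linux namespace identifiers from the host side through /proc, without running anything inside the container. This is an illustration only and is not taken from the Agentless System Crawler code; the function names, the `NAMESPACES` tuple and the `proc_root` parameter are assumptions made for this example.

```python
import os
from pathlib import Path

# Namespace types to inspect (an illustrative subset, chosen for this sketch).
NAMESPACES = ("mnt", "pid", "net", "uts", "ipc")


def read_namespaces(pid, proc_root="/proc"):
    """Return {namespace name: identifier} for a process, read from the host.

    On Linux, each entry under /proc/<pid>/ns is a symlink whose target
    looks like "net:[4026531992]"; processes that share a namespace
    share the same identifier.
    """
    ns_dir = Path(proc_root) / str(pid) / "ns"
    result = {}
    for name in NAMESPACES:
        try:
            result[name] = os.readlink(ns_dir / name)
        except OSError:
            # Missing entry or insufficient permissions: skip this namespace.
            continue
    return result


def isolated_namespaces(pid, host_pid=1, proc_root="/proc"):
    """List the namespaces in which `pid` differs from the host's `host_pid`."""
    target = read_namespaces(pid, proc_root)
    host = read_namespaces(host_pid, proc_root)
    return sorted(
        name for name, ident in target.items()
        if name in host and host[name] != ident
    )
```

On a Linux host, calling `isolated_namespaces(pid)` for a container's init process would typically list the namespaces that set the container apart from the host. Reading /proc/1/ns usually requires root privileges. A real crawler would go further and use this mapping to look at the container's files, processes and network state from the outside.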
Links From the End of the Talk for Those Interested in Learning More or Contributing:
Papers:
- Operational Visibility: IC2E’14, Sigmetrics’14, VEE’15, HotCloud’15, ATC’16 (InterConnect’15)
- Operational Analytics: BigData’14, IBM JRD’16:{SWDisc,NFM,DevOps} (InterConnect’16)
Blogs:
- Crawl the Cloud Like You Crawl the Web: https://developer.ibm.com/open/2015/07/18/crawl-cloud-like-crawl-web/
- Monitoring and Logging for IBM Containers. No configuration needed: https://developer.ibm.com/bluemix/2015/07/06/monitoring-and-logging-for-containers-no-config-required/
- Test Driving Built-in Monitoring and Logging in IBM Containers: https://developer.ibm.com/bluemix/2015/11/16/built-in-monitoring-and-logging-for-bluemix-containers/
- Is your Docker container secure? Ask Vulnerability Advisor!: https://developer.ibm.com/bluemix/2015/07/02/vulnerability-advisor/
Demos:
Open Source:
- dwOpen Tech Talk: https://developer.ibm.com/open/events/dw-open-tech-talk-agentless-system-crawler/
- dwOpen Page: https://developer.ibm.com/open/agentless-system-crawler/
- Agentless System Crawler: http://github.com/cloudviz/agentless-system-crawler
- PSVMI Introspection Library: https://github.com/cloudviz/psvmi
Try It:
- As-a-service today: http://www.bluemix.net
- Run it yourself: http://github.com/cloudviz/agentless-system-crawler
Discussed Open Problems as Promised:
- Truly Seamless OpVis: No performance impact (~/~) + Absolutely no side effects (+/-)
- Extensibility and configurability: Deep visibility into system, application and infra
- Scale out across runtimes and scale up to many instances; challenges & limits
- How do you design DDoS mitigation, admission control and fair sharing in this model of built-in service?
- Privacy and data sensitivity with Ops data analytics
- Piecemeal analytics/security solutions -> Cloud analytics/security roadmap
- Rules/annotators -> Actually smart analytics that learn
- good and bad configs for security, performance, availability, etc.
- Cross-silo analytics across Time, Space, Dev/Ops [CloudSight Dream]
Additional Points from Discussion:
- Decoupling data for security and privacy concerns: Crawl system data and not user data. Give the user the flexibility to choose what is exposed and what is not. We should find better ways of controlling what is visible to any monalytics system, whether agent-based or agentless.
- Is the out-of-band approach really secure? Good point on never having enough security, and on the centralized point of entry. VMs vs. containers vs. unikernels.
- TBC