Workload modeling techniques presented in Section 2 are agnostic of the logic that governs a cloud system. Explicit modeling of this logic, or part of it, for QoS prediction can help improve the effectiveness of QoS management.
Several classes of models can be used to model QoS in cloud systems. Here we briefly review queueing models, Petri nets, and other specialized formalisms for reliability evaluation. However, several other classes exist, such as stochastic process algebras, stochastic activity networks, stochastic reward nets, and models evaluated via probabilistic model checking. A comparison of the pros and cons of some popular stochastic formalisms can be found in , where the authors highlight that a given method can perform better on some system models than on others, making it difficult to make absolute recommendations on the best model to use.
3.1 Performance models
Among the performance models, we survey queueing systems, queueing networks, and layered queueing networks (LQNs). While queueing systems are widely used to model single resources subject to contention, queueing networks are able to capture the interaction among a number of resources and/or application components. LQNs are used to better model key interactions between application mechanisms, such as finite connection pools, admission control mechanisms, or synchronous request calls. Modeling these features usually requires in-depth knowledge of the application behavior. In addition, while closed-form solutions exist for some classes of queueing systems and queueing networks, the solution of other models, including LQNs, relies on numerical methods.
Queueing Systems. Queueing theory is commonly used in system modeling to describe hardware or software resource contention. Several analytical formulas exist, for example to characterize mean request waiting times or waiting buffer occupancy probabilities in single queueing systems. In cloud computing, analytical queueing formulas are often integrated in optimization programs, where they are repeatedly evaluated across what-if scenarios. Common analytical formulas involve queues with exponentially distributed service and inter-arrival times, with a single server (M/M/1) or with k servers (M/M/k), and queues with generally distributed service times (M/G/1). Scheduling is often assumed to be first-come first-served (FCFS) or processor sharing (PS). In particular, the M/G/1 PS queue is a common abstraction used to model a CPU and it has been adopted in many cloud studies, thanks to its simplicity and its suitability for multi-class workloads. For instance, an SLA-aware capacity allocation mechanism for cloud applications is derived in  using an M/G/1 PS queue as the QoS model. In  the authors propose a resource provisioning approach for N-tier cloud web applications by modeling the CPU as an M/G/1 PS queue. The M/M/1 open queue with FCFS scheduling has been used to pose constraints on the mean response time of a cloud application. Heterogeneity in customer SLAs is handled in  with an M/M/k/k priority queue, which is a queue with exponentially distributed inter-arrival times and service times, k servers, and no buffer. The authors use this model to investigate rejection probabilities and to help dimension cloud data centers. Other works that rely on queueing models to describe cloud resources include ,. The works in , illustrate the formulation of basic queueing systems in the context of discrete-time control problems for cloud applications, where system properties such as arrival rates can change in time at discrete instants. These works show an example where a non-stationary cloud system is modeled through queueing theory.
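As an illustration of how such closed-form formulas can be evaluated inside a what-if loop, the following minimal Python sketch implements the textbook mean response time of the M/M/1 FCFS, M/G/1 PS, and M/M/k queues, together with the Erlang B blocking probability of the M/M/k/k loss queue. The function names are hypothetical and chosen for this example only; the sketch does not model priority classes and is not the implementation used in any of the cited works.

```python
def mm1_response_time(lam, mu):
    """Mean response time of an M/M/1 FCFS queue (arrival rate lam, service rate mu)."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)


def mg1_ps_response_time(lam, mean_service):
    """Mean response time of an M/G/1 PS queue.

    Under processor sharing the mean depends on the service time
    distribution only through its mean.
    """
    rho = lam * mean_service
    if rho >= 1.0:
        raise ValueError("unstable queue: need utilization < 1")
    return mean_service / (1.0 - rho)


def erlang_b(offered_load, k):
    """Blocking (rejection) probability of an M/M/k/k queue, via the Erlang B recursion."""
    b = 1.0
    for n in range(1, k + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def mmk_response_time(lam, mu, k):
    """Mean response time of an M/M/k FCFS queue, via the Erlang C formula."""
    a = lam / mu  # offered load
    if a >= k:
        raise ValueError("unstable queue: need lam < k * mu")
    b = erlang_b(a, k)
    rho = a / k
    prob_wait = b / (1.0 - rho * (1.0 - b))  # Erlang C: probability of queueing
    return prob_wait / (k * mu - lam) + 1.0 / mu


if __name__ == "__main__":
    # What-if scenario: how many servers keep the rejection probability below 1%?
    for k in range(1, 8):
        print(k, round(erlang_b(2.0, k), 4))
```

Because each formula is evaluated in constant or linear time, an optimization program can call it repeatedly over candidate capacities, as in the dimensioning loop above.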
Queueing Networks. A queueing network can be described as a collection of queues interacting through request arrivals and departures. Each queue represents either a physical resource (e.g., CPU or network bandwidth) or a software buffer (e.g., admission control or connection pools). Cloud applications are often tiered and queueing networks can capture the interactions between tiers. An example of cloud management solutions exploiting queueing network models is , where the cloud service center is modeled as an open queueing network of multiclass single-server queues. PS scheduling is assumed at the resources to model CPU sharing. Each layer of queues represents the collection of applications supporting the execution of requests at each tier of the cloud service center. This model is used to provide performance guarantees when defining resource allocation policies in a cloud platform. Also,  uses a queueing network to represent a multi-tier application deployed in a cloud platform, and to derive an SLA-aware resource allocation policy. Each node in the network has exponential processing times and a generalized PS policy to approximate the operating system scheduling.
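To make the tiered model concrete, the sketch below evaluates an open multiclass network of single-server PS queues using the standard product-form result: the utilization of tier i is the sum over classes of arrival rate times service demand, and the mean residence time of a class at that tier is its demand divided by one minus the utilization. The function name and the data layout are assumptions made for this example, not taken from the cited works.

```python
def open_ps_network(arrival_rates, demands):
    """Evaluate an open multiclass network of single-server PS queues.

    arrival_rates[c] : arrival rate of class c (requests/second)
    demands[c][i]    : mean service demand of class c at tier i (seconds)

    Returns (response_times, utilizations), where response_times[c] is the
    end-to-end mean response time of class c.
    """
    n_tiers = len(demands[0])
    utilizations = [
        sum(lam * d[i] for lam, d in zip(arrival_rates, demands))
        for i in range(n_tiers)
    ]
    if any(u >= 1.0 for u in utilizations):
        raise ValueError("at least one tier is saturated")
    response_times = [
        sum(d[i] / (1.0 - utilizations[i]) for i in range(n_tiers))
        for d in demands
    ]
    return response_times, utilizations


if __name__ == "__main__":
    # Two request classes visiting a web tier and a database tier.
    rates = [10.0, 5.0]
    demands = [[0.02, 0.04], [0.06, 0.02]]
    print(open_ps_network(rates, demands))
```

A resource allocation policy could then, for instance, compare the returned per-class response times against SLA thresholds for each candidate allocation.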
Layered Queueing Networks. Layered queueing networks (LQNs) are an extension of queueing networks to describe layered software architectures. An LQN model of an application can be built automatically from software engineering models expressed using formalisms such as UML or Palladio Component Models (PCM). Compared to ordinary queueing networks, LQNs provide the ability to describe dependencies arising in a complex workflow of requests and the layering among hardware and software resources that process them. Several evaluation techniques exist for LQNs.
LQNs have been applied to cloud systems in , where the authors explored the impact of network latency on the system response time for different system deployments. LQNs are useful here to handle the complexity of geo-distributed applications that include both transactional and streaming workloads.
Jung et al. use an LQN model to predict the performance of the RUBiS benchmark application. The model is then used as the basis of an optimization algorithm that aims at determining the best replication levels and placement of the application components. While this work is not specific to the cloud, it illustrates the application of LQNs to multi-tier applications that are commonly deployed in such environments.
Bacigalupo et al. investigate a prediction-based cloud resource allocation and management algorithm. LQNs are used to predict, based on historical data, the performance of an enterprise application deployed on the cloud with strict SLA requirements. The authors also discuss the pros and cons of LQNs, identifying a number of key limitations for their practical use in cloud systems. These include, among others, difficulties in modeling caching, the lack of methods to compute percentiles of response times, and the tradeoff between accuracy and speed. Since then, evaluation techniques for LQNs that allow the computation of response time percentiles have been presented.
Hybrid models. Queueing models are also used together with machine learning techniques to achieve the benefits of both approaches. Queueing models use the knowledge of the system topology and infrastructure to provide accurate performance predictions. However, a violation of the model assumptions, such as an unforeseen change in the topology, can invalidate the model predictions. Machine learning algorithms, instead, are more robust with respect to dynamic changes of the system. The drawback is that they adopt a black-box approach, ignoring relevant knowledge of the system that could provide valuable insights into its performance.
Desnoyers et al. study the relations between workload and resource consumption for cloud web applications. Queueing theory is used to model different components of the system, while data mining and machine learning approaches ensure dynamic adaptation of the model under system fluctuations. The proposed approach is shown to achieve high accuracy in predicting workload and resource usage.
Thereska et al. propose a robust performance model architecture focusing on analyzing performance anomalies and localizing the potential source of the discrepancies. The performance models are based on queueing-network models abstracted from the system and enhanced by machine learning algorithms to correlate system workload attributes with performance attributes.
A queueing network approach is taken in  to provision resources for data-center applications. As the workload mix is observed to fluctuate over time, the queueing model is enhanced with a clustering algorithm that determines the workload mix. The approach is shown to reduce SLA violations due to under-provisioning in applications subject to non-stationary workloads.
3.2 Dependability models
Petri nets, Reliability Block Diagrams (RBDs), and Fault Trees are probably the most widely known and used formalisms for dependability analysis. Petri nets are a flexible and expressive modeling approach, which allows general interactions between system components, including synchronization of event firing times. They are also widely applied in performance analysis.
RBDs and Fault Trees aim at obtaining the overall system reliability from the reliability of the system components. The interactions between the components focus on how the faulty state of one or more components results in the possible failure of other components.
Petri nets. The suitability of Petri nets for performance and dependability analysis of computer systems has long been recognized. Petri nets have been extended to consider stochastic transitions, in stochastic Petri nets (SPNs) and generalized SPNs (GSPNs). They have recently enjoyed a resurgence of interest in service-oriented systems to describe service orchestrations.
In the context of cloud computing, there are more application examples of Petri nets for dependability assessment than for performance modeling. Applications to cloud QoS modeling include the use of SPNs to evaluate the dependability of a cloud infrastructure, considering both reliability and availability. SPNs provide a convenient way in this setting to represent energy flow and cooling in the infrastructure. Wei et al. propose the use of GSPNs to evaluate the impact of virtualization mechanisms, such as VM consolidation and live migration, on cloud infrastructure dependability. GSPNs are used to provide fine-grained detail on the inner VM behaviors, such as separation of privileged and non-privileged instructions and successive handling by the VM or the VM monitor. Petri nets are used here in combination with other methods, namely Reliability Block Diagrams and Fault Trees, for analyzing mean time to failure (MTTF) and mean time between failures (MTBF).
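The following toy example sketches how an SPN can be evaluated by simulation. It assumes a hypothetical net with two places ("up" and "down") and two exponentially timed transitions ("fail" and "repair"), where each token enables its transition independently. It is far simpler than the infrastructure models in the works cited above and is meant only to show the token-game mechanics and an availability estimate.

```python
import random


def simulate_spn_availability(n_servers, fail_rate, repair_rate, horizon, seed=0):
    """Estimate the fraction of time with at least one token in place 'up'.

    Toy stochastic Petri net (an assumption of this sketch):
      fail   : moves one token from 'up' to 'down', rate fail_rate per token
      repair : moves one token from 'down' to 'up', rate repair_rate per token
    """
    rng = random.Random(seed)
    marking = {"up": n_servers, "down": 0}
    now, time_available = 0.0, 0.0
    while True:
        rate_fail = fail_rate * marking["up"]
        rate_repair = repair_rate * marking["down"]
        total_rate = rate_fail + rate_repair
        dwell = rng.expovariate(total_rate)  # time until the next firing
        if now + dwell >= horizon:
            if marking["up"] > 0:
                time_available += horizon - now
            break
        if marking["up"] > 0:
            time_available += dwell
        now += dwell
        # Choose which transition fires, proportionally to its rate.
        if rng.random() < rate_fail / total_rate:
            marking["up"] -= 1
            marking["down"] += 1
        else:
            marking["down"] -= 1
            marking["up"] += 1
    return time_available / horizon


if __name__ == "__main__":
    # One server, MTTF = 1 time unit, MTTR = 1/9: analytic availability is 0.9.
    print(simulate_spn_availability(1, 1.0, 9.0, horizon=20000.0))
```

For a single server the estimate can be checked against the closed form repair_rate / (fail_rate + repair_rate); for larger nets with synchronization, simulation or numerical solution of the underlying Markov chain is typically needed.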
Reliability Block Diagrams. Reliability block diagrams (RBDs) are a popular tool for reliability analysis of complex systems. The system is represented by a set of inter-related blocks, connected by series, parallel, and k-out-of-N relationships.
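The three block relationships can be sketched with the standard reliability formulas, assuming statistically independent blocks (and, for the k-out-of-N case, identical blocks). The helper names and the example figures are hypothetical.

```python
from math import comb, prod


def series(reliabilities):
    """All blocks must work: product of the block reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one block must work: one minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def k_out_of_n(k, n, r):
    """At least k of n identical, independent blocks (each with reliability r) must work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # A front end in series with two redundant back ends (illustrative numbers).
    print(series([0.99, parallel([0.9, 0.9])]))
```

Nesting the functions mirrors the nesting of blocks in the diagram, so the redundant architecture in the example is evaluated in a single expression.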
In , the authors propose a methodology to evaluate data center power infrastructures considering both reliability and cost. RBDs are used to estimate and enforce system reliability. Dantas et al. investigate the benefits of a warm-standby replication mechanism in Eucalyptus cloud computing environments. An RBD is used to evaluate the impact of a redundant cloud architecture on its dependability. A case study shows how the redundant system obtains dependability improvements. Melo et al. use RBDs to design a rejuvenation mechanism based on live migration, to prevent performance degradation, for a cloud application that has high availability requirements.
Fault Trees. Fault Trees are another formalism for reliability analysis. The system is represented as a tree of inter-related components. If a component fails, it assumes the logical value true, and the failure propagation can be studied via the tree structure. In cloud computing, Fault Trees have been used to evaluate dependencies of cloud services and their effect on application reliability. Fault Trees are also combined with Markov models to evaluate the reliability and availability of fault tolerance mechanisms: Jhawar and Piuri use both formalisms to evaluate a cloud system under different deployment contexts. Based on this evaluation, the authors propose an approach to identify the best mechanisms according to users' requirements. Kiran et al. present a methodology to identify, mitigate, and monitor risks in cloud resource provisioning. Fault Trees are used to assess the probability of SLA violations.
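Failure propagation through the tree can be sketched as follows, assuming independent basic events that each appear only once in the tree, so that AND and OR gates reduce to simple probability products. The tree encoding (nested tuples) and the event names are assumptions of this example.

```python
from math import prod


def failure_probability(node, basic_events):
    """Probability that the event at `node` is true (i.e., a failure occurs).

    node is either a basic event name (str) or a tuple (gate, children),
    with gate in {"AND", "OR"}. basic_events maps names to probabilities.
    Assumes independent basic events, each used at most once in the tree.
    """
    if isinstance(node, str):
        return basic_events[node]
    gate, children = node
    probs = [failure_probability(child, basic_events) for child in children]
    if gate == "AND":  # output fails only if all inputs fail
        return prod(probs)
    if gate == "OR":  # output fails if any input fails
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")


if __name__ == "__main__":
    # The service fails if the load balancer fails OR both VM replicas fail.
    tree = ("OR", ["load_balancer", ("AND", ["vm1", "vm2"])])
    events = {"load_balancer": 0.01, "vm1": 0.1, "vm2": 0.1}
    print(failure_probability(tree, events))
```

When basic events are shared between branches or are statistically dependent, this recursion no longer applies directly, which is one reason Fault Trees are often combined with Markov models.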
3.3 Black-box service models
Service models have been used primarily in optimising web service composition, but they are now becoming relevant also in the description of SaaS applications, IaaS resource orchestration, and cloud-based business-process execution. The idea behind the methods reviewed in this section is to describe a service in terms of its response time, assuming the lack of any further information concerning its internal characteristics (e.g., contention level from concurrent requests).
Non-parametric black-box service models include methods based on deterministic or average execution time values. Several works instead adopt a description that includes standard deviations, or finite ranges of variability for the execution times. Parametric service models instead assume exponential or Markovian distributions, Pareto distributions to capture heavy-tailed execution times, or general distributions with Laplace transforms.
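As a small illustration of a black-box description based on means and standard deviations, the sketch below composes services invoked in sequence and services selected by a probabilistic branch. It assumes that the execution times of different services are independent; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ServiceModel:
    """Black-box service described only by its execution time mean and standard deviation."""
    mean: float
    std: float


def sequence(services):
    """Services invoked one after another: means add and, under independence, variances add."""
    mean = sum(s.mean for s in services)
    variance = sum(s.std**2 for s in services)
    return ServiceModel(mean, sqrt(variance))


def branch(alternatives):
    """Exactly one service is invoked; alternatives is a list of (probability, service)."""
    mean = sum(p * s.mean for p, s in alternatives)
    second_moment = sum(p * (s.std**2 + s.mean**2) for p, s in alternatives)
    return ServiceModel(mean, sqrt(max(second_moment - mean**2, 0.0)))


if __name__ == "__main__":
    auth = ServiceModel(1.0, 0.3)
    payment = ServiceModel(2.0, 0.4)
    print(sequence([auth, payment]))
```

Moment-based composition of this kind says nothing about tail percentiles, which is where parametric models such as Pareto distributions become relevant.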
Huang et al. present a graph-theoretic model for QoS-aware service composition in cloud platforms that explicitly handles network virtualization by considering virtual network services. A system model is shown to suitably characterize cloud service provisioning behavior, and an exact algorithm is proposed to optimize users' experience under QoS requirements. A comparison with state-of-the-art QoS routing algorithms shows that the proposed algorithm is both cost-effective and lightweight.
Klein et al. consider QoS-aware service composition by handling network latencies. The authors present a network model that allows estimating latencies between locations and propose a genetic algorithm to achieve network-aware and QoS-aware service provisioning.
The work in  considers cloud service provisioning from the point of view of an end user. An economic model based on discrete Bayesian Networks is presented to characterize end users' long-term behavior. The QoS-aware service composition problem is then solved by Influence Diagrams, followed by analytical and simulation experiments.
3.4 Simulation models
Several simulation packages exist for cloud system simulation. Many solutions are based on the CLOUDSIM toolkit, which allows the user to set up a simulation model that explicitly considers virtualized cloud resources, potentially located in different data centers, as in the case of hybrid deployments. CLOUDANALYST is an extension of CLOUDSIM that allows the modeling of geographically-distributed workloads served by applications deployed on a number of virtualized data centers.
EMUSIM builds on top of CLOUDSIM by adding an emulation step leveraging the Automated Emulation Framework (AEF). Emulation is used to understand the application behavior, extracting profiling information. This information is then used as input for CLOUDSIM, which provides QoS estimates for a given cloud deployment.
Some other tools have been developed to estimate data center energy consumption. For example, GREENCLOUD, which is an extension of the packet-level simulator NS2, aims at evaluating the energy consumption of the data center resources where the application has been deployed, considering servers, links, and switches.
Similarly, DCSIM is a data center simulation tool focused on dynamic resource management of IaaS infrastructures. Each host can run several VMs, and has a power model to determine the overall data center power consumption.
GROUDSIM is a simulator for scientific applications deployed on large-scale clouds and grids. The simulator is based on events rather than on processes, making it a scalable solution for highly parallelized applications.
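The event-based style of simulation mentioned above can be illustrated with a minimal, generic sketch in Python. It does not use or reproduce the API of any of the tools named in this section; it simulates a single FCFS server with a priority queue of timestamped events, and all names are assumptions of this example.

```python
import heapq
from collections import deque

DEPARTURE, ARRIVAL = 0, 1  # departures are processed first when timestamps tie


def simulate_fcfs(jobs):
    """Event-driven simulation of a single FCFS server.

    jobs is a list of (arrival_time, service_time) pairs.
    Returns the response time of each job, in input order.
    """
    events = [(arrival, ARRIVAL, idx) for idx, (arrival, _) in enumerate(jobs)]
    heapq.heapify(events)
    waiting = deque()
    busy = False
    response = [None] * len(jobs)
    while events:
        now, kind, job = heapq.heappop(events)
        if kind == ARRIVAL:
            waiting.append(job)
        else:
            response[job] = now - jobs[job][0]
            busy = False
        if not busy and waiting:
            # Start the next job and schedule its departure event.
            nxt = waiting.popleft()
            busy = True
            heapq.heappush(events, (now + jobs[nxt][1], DEPARTURE, nxt))
    return response


if __name__ == "__main__":
    print(simulate_fcfs([(0.0, 2.0), (1.0, 2.0), (5.0, 1.0)]))
```

The simulation clock jumps directly from one event to the next, so the cost grows with the number of events rather than with the simulated time span or the number of concurrently modeled processes.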
Research Challenges. A threat to workload inference on IaaS clouds is posed by resource contention by other users, which can systematically result in biased readings of performance metrics. While some bias components can be filtered out (for example using the CPU steal metric available on Amazon EC2 virtual machines), contention on resources such as cache, memory bandwidth, network, or storage is harder or even impossible to monitor for the final user. Research is needed in this domain to understand the impact of such contention bias on demand estimation.
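On Linux guests, the steal counter is exposed as the eighth value of the aggregate "cpu" line in /proc/stat. The sketch below computes the share of CPU time stolen between two samples of that line. How this share should then be used to correct performance readings is not specified in the text and is left open here; the function names and the sample values are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat.

    Returns the first eight counters: user, nice, system, idle, iowait,
    irq, softirq, steal. Guest time is already included in user/nice,
    so the trailing guest fields are ignored to avoid double counting.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:9]]


def steal_fraction(before, after):
    """Fraction of CPU time reported as steal between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    return deltas[7] / total if total > 0 else 0.0


if __name__ == "__main__":
    # Illustrative sample values, not measurements.
    sample_1 = "cpu 100 0 50 800 10 0 0 40 0 0"
    sample_2 = "cpu 160 0 70 1500 20 0 0 250 0 0"
    print(steal_fraction(sample_1, sample_2))
```

In practice the two samples would be read from /proc/stat a few seconds apart; the strings above stand in for those reads so that the example is self-contained.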
Major complications arise in workload inference on PaaS clouds, where infrastructure-level metrics such as CPU utilization are normally unavailable to the users. This is a serious obstacle for regression methods, which all depend on mean CPU utilization measurements. Methods based on statistical distributions do not require CPU utilization, but they are still in their infancy. More work and validation on PaaS data are required to mature such techniques.
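To illustrate why regression methods depend on utilization data, the sketch below estimates per-class CPU demands from the utilization law, U = sum over classes of (throughput times demand), by ordinary least squares without intercept. It is restricted to two request classes so that the normal equations can be solved in closed form. The function name and the sample data are assumptions of this example, and practical estimators typically add non-negativity constraints and noise handling.

```python
def estimate_demands_two_classes(throughputs, utilizations):
    """Least-squares fit of U = x1 * d1 + x2 * d2 over several monitoring windows.

    throughputs  : list of (x1, x2) per-class throughputs in each window
    utilizations : list of mean CPU utilizations measured in the same windows
    Returns the estimated demands (d1, d2) in seconds per request.
    """
    s11 = sum(x1 * x1 for x1, _ in throughputs)
    s22 = sum(x2 * x2 for _, x2 in throughputs)
    s12 = sum(x1 * x2 for x1, x2 in throughputs)
    b1 = sum(x1 * u for (x1, _), u in zip(throughputs, utilizations))
    b2 = sum(x2 * u for (_, x2), u in zip(throughputs, utilizations))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("throughput samples are collinear; demands are not identifiable")
    d1 = (b1 * s22 - b2 * s12) / det
    d2 = (s11 * b2 - s12 * b1) / det
    return d1, d2


if __name__ == "__main__":
    # Synthetic windows generated with demands of 0.02 s and 0.05 s.
    x = [(10, 2), (5, 8), (20, 1), (8, 8)]
    u = [0.30, 0.50, 0.45, 0.56]
    print(estimate_demands_two_classes(x, u))
```

Without the utilization vector, as is common on PaaS platforms, the right-hand side of the regression is simply missing, which is why alternative estimation methods are needed in that setting.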