
Cloud Computing, Data Centres, Green Computing
1. Fine-grain Resource Allocation Exploiting Application Diversity in Clouds
Cloud computing is best characterized by its inherently elastic and
scalable resource allocation for various applications with a
pay-as-you-go pricing model. These characteristics are realized
primarily by virtualization technologies running on (massively)
multi-core processors. As the adoption of clouds becomes widespread,
their use is becoming increasingly diverse, ranging from hosting web
services and performing data analytics to running high-performance
scientific and engineering applications. To this end, many cloud
service providers (e.g., Amazon EC2) provision a number of different
service offerings, such as High-CPU, High-Memory and Cluster GPU. This
project investigates virtual machine composition and placement that
exploit the heterogeneity in both resources and applications for
efficient resource utilization while respecting the QoS requirements of
applications. The student is expected to have an understanding of
operating systems and good algorithmic skills; prior experience with
cloud computing is not required.
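For illustration, a minimal sketch of one possible placement heuristic:
a heterogeneity-aware first-fit-decreasing pass that places VMs with
multi-dimensional (CPU, memory) demands onto hosts of different shapes.
The host types, capacities and VM demands below are made-up examples,
not data from the project.

    from dataclasses import dataclass, field

    @dataclass
    class Host:
        name: str
        cpu: float   # remaining CPU capacity (cores)
        mem: float   # remaining memory capacity (GB)
        vms: list = field(default_factory=list)

        def fits(self, cpu, mem):
            return self.cpu >= cpu and self.mem >= mem

        def place(self, vm, cpu, mem):
            self.cpu -= cpu
            self.mem -= mem
            self.vms.append(vm)

    def place_vms(vms, hosts):
        """Place (name, cpu, mem) VMs in decreasing dominant-demand order."""
        cpu_cap = max(h.cpu for h in hosts)
        mem_cap = max(h.mem for h in hosts)

        def dominant(vm):
            _, cpu, mem = vm
            return max(cpu / cpu_cap, mem / mem_cap)

        unplaced = []
        for name, cpu, mem in sorted(vms, key=dominant, reverse=True):
            for h in hosts:
                if h.fits(cpu, mem):
                    h.place(name, cpu, mem)
                    break
            else:
                unplaced.append(name)
        return unplaced

    hosts = [Host("high-cpu", cpu=32, mem=64), Host("high-mem", cpu=16, mem=256)]
    vms = [("web", 2, 4), ("analytics", 4, 96), ("hpc", 16, 32)]
    print("unplaced:", place_vms(vms, hosts))
    for h in hosts:
        print(h.name, "->", h.vms)

A real study would replace first-fit with policies that also consider
application interference and QoS, which is exactly the design space the
project explores.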
2. Exploiting Heterogeneous Computing Systems for Energy Efficiency
Large-scale distributed computing systems like clouds are best
described by their heterogeneous and dynamic characteristics, in both
resources and applications. However, it is often the case that
resources in a particular system are homogeneous. Recently, a number of
efforts have been made to build such systems with heterogeneous
resources to better manage a diverse set of workloads (or
applications). For example, a computer system may be built with both
typical high-performance processors and low-power processors like the
Intel Atom. This project investigates energy-aware scheduling and
resource allocation strategies for different workloads that exploit the
power consumption characteristics of heterogeneous processors. Students
with a good programming background, particularly in a Linux
environment, are encouraged to apply.
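As a toy illustration of the idea, the sketch below greedily assigns
each task to whichever processor minimizes energy (active power times
runtime) while still meeting the task's deadline; the speed and power
figures for the "xeon" and "atom" entries are assumed values, not
measurements.

    processors = {
        # name: (speed in work-units/s, active power in W) -- assumed figures
        "xeon": (10.0, 95.0),
        "atom": (2.5, 8.0),
    }

    def assign(tasks):
        """tasks: iterable of (name, work_units, deadline_s)."""
        plan = {}
        for name, work, deadline in tasks:
            best = None
            for proc, (speed, power) in processors.items():
                runtime = work / speed
                if runtime > deadline:
                    continue                 # would miss the deadline here
                energy = power * runtime     # E = P * t
                if best is None or energy < best[1]:
                    best = (proc, energy)
            plan[name] = best                # None => infeasible everywhere
        return plan

    tasks = [("batch-job", 50, 60), ("latency-job", 50, 6)]
    for name, choice in assign(tasks).items():
        print(name, "->", choice)

With these numbers the slack-rich batch job lands on the low-power
processor while the tight-deadline job stays on the fast one, which is
the basic trade-off the project studies at much larger scale.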
3. Fairness and Efficiency Analysis of Cloud Resource Management Algorithms
Resource allocation in cloud computing plays an important role in
improving data centre efficiency and end-user satisfaction. In this
project, we consider fairness as an important factor of user
satisfaction. Achieving efficiency and fairness simultaneously is often
difficult. Recent research shows that the Hadoop Fair Scheduler is not
efficient for applications with heterogeneous resource demands. The
project will review the performance of existing resource allocation
algorithms on these two aspects. It will further work towards a fair
and efficient resource allocation algorithm in the context of the
Pay-As-You-Go (PAYG) cloud pricing model. The student is expected to be
familiar with distributed computing concepts and comfortable with Java
programming.
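One well-known point of reference from the literature is Dominant
Resource Fairness (DRF), which generalizes max-min fairness to multiple
resource types. Below is a minimal sketch of progressive-filling DRF;
the capacities and per-task demand vectors are the illustrative numbers
commonly used to explain it, not project data.

    TOTAL = {"cpu": 9.0, "mem": 18.0}

    users = {
        # user: per-task demand vector
        "A": {"cpu": 1.0, "mem": 4.0},
        "B": {"cpu": 3.0, "mem": 1.0},
    }

    allocated = {u: {r: 0.0 for r in TOTAL} for u in users}
    free = dict(TOTAL)

    def dominant_share(user):
        return max(allocated[user][r] / TOTAL[r] for r in TOTAL)

    while True:
        # Offer the next task slot to the user with the lowest dominant share.
        u = min(users, key=dominant_share)
        demand = users[u]
        if any(free[r] < demand[r] for r in TOTAL):
            break                            # cannot fit another task
        for r in TOTAL:
            allocated[u][r] += demand[r]
            free[r] -= demand[r]

    for u in users:
        print(u, allocated[u], f"dominant share = {dominant_share(u):.2f}")

Both users end up with equal dominant shares (memory for A, CPU for B),
illustrating the kind of fairness/efficiency behaviour the project
would compare against the Hadoop Fair Scheduler under a PAYG pricing
model.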
4. Problem Diagnostic Mechanisms for Resource Provisioning in Cloud Computing
There are many ways things may go wrong when running applications in
the cloud. Software bugs and misconfigurations are common causes of
system failures. When problems like these happen, the resources these
applications run on are not used effectively, and in the worst case the
faulty applications may jeopardize the resource use of other,
well-behaved applications. This project intends to analyze the patterns
of system behaviour when these problems occur and investigate the
impact of these problems on resource provisioning mechanisms. It also
intends to develop automatic methods to detect these problems and
enhance cloud resource provisioning algorithms to deal with them. The
student is expected to have a good understanding of distributed
computing principles and be comfortable with Java programming.
5. Modelling Consolidation Behaviour of Collocated Virtual Machines in Private Clouds
Clouds are changing the way we think, program, and solve our problems.
Besides the many advantages public clouds provide to users, they also
introduce serious concerns (e.g., security) that in particular stop
many large organisations from migrating their computation to public
clouds; private clouds seem to be their solution. In private clouds,
each organization is in charge of handling its Virtual Machines (VMs)
to better provision its available resources. One of the most effective
ways to provision resources is to collocate several VMs on a Physical
Machine (PM); a process that may also degrade the performance of the
services hosted by the VMs. The project aims to investigate how the
performance degradation of VMs relates to their own loads as well as to
the loads of neighbouring VMs competing for the shared resources of the
PM. Students who are interested need moderate knowledge of programming.
Tools to monitor, benchmarks to test, and a private cloud to experiment
on are all already developed and provided; students only need to
collect raw data and model the not-so-easy ecology of such complicated
systems.
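Since the monitoring tools and benchmarks are provided, the modelling
step could start as simply as fitting a linear interference model to
the collected samples. The sketch below does this with ordinary least
squares; the column meanings and the synthetic numbers are assumptions
for illustration only.

    import numpy as np

    # rows: own CPU load, summed neighbour CPU load, observed slowdown (%)
    data = np.array([
        [0.2, 0.1,  3.0],
        [0.5, 0.4, 11.0],
        [0.7, 0.9, 26.0],
        [0.9, 1.5, 44.0],
        [0.4, 1.2, 24.0],
    ])

    X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
    y = data[:, 2]

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, b_own, b_neigh = coef
    print(f"slowdown ~= {b0:.1f} + {b_own:.1f}*own + {b_neigh:.1f}*neighbours")

    # Predict the slowdown of a candidate collocation before committing to it.
    own, neigh = 0.6, 1.0
    print("predicted slowdown:", b0 + b_own * own + b_neigh * neigh)

Real consolidation behaviour is rarely this linear (cache and I/O
contention interact), which is precisely why the project asks for
models of the "not-so-easy ecology" rather than stopping at a
first-order fit.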
6. Application Isolation Techniques in Cloud Computing Platforms
The cloud computing model allows people to use CPU,
storage and even network bandwidth from remote resource providers. These
resource providers often host large numbers of third-party applications
on tens of thousands of machines in their data centres. As many third-party
applications share physical CPUs, storage and networks, how to isolate
these applications becomes an issue. The project will review the
technologies, such as virtual machines, used by existing cloud computing
infrastructure providers for achieving application level isolation and
examine the effectiveness of these technologies. It will also investigate
how to make a cloud computing platform trustworthy.
7. Application-Specific Service Level Agreement and Energy-Efficiency Improvement in Cloud Computing Platforms
Cloud computing environments are gaining popularity
as the de facto platforms for many applications. These systems bring a
range of heterogeneous resources that should be able to function
continuously and autonomously. However, these systems consume a great
deal of energy. Thus, this project aims to develop new algorithms and
tools for energy-aware resource management and allocation for
large-scale distributed systems, enabling them to become
environmentally friendly. The
proposed framework will be 'holistic' in nature seamlessly integrating a
set of both site-level and system-level/service-level energy-aware resource
allocation schemes addressing a range of complex scenarios and different
operating conditions.
8. Interruption Prediction Models for Business Applications in Clouds
Although performance interruption of cloud-based applications during
virtual machine (VM) migration is a well-known phenomenon, hardly any
model has been proposed to predict the magnitude of such interruptions
prior to reorganizing VMs in virtualized environments. This project
starts by collecting data for modelling the transition behaviour of VMs
(cloud applications) during their live-migration process. The collected
data is then analyzed to design mathematical, empirical, and/or
analytical models to predict application transient behaviour during a
given service. The last step is to evaluate the proposed model on
business applications already running in the cloud.
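For intuition, a common first-order model of pre-copy live migration
from the literature can serve as a baseline: each round retransmits the
pages dirtied during the previous round, so with dirty rate below link
bandwidth the data volume shrinks geometrically until a stop-and-copy
pause. All parameter values below are assumptions for illustration.

    def precopy_estimate(mem_bytes, dirty_rate, bandwidth,
                         stop_threshold=64 * 2**20, max_rounds=30):
        """Return (total_migration_time_s, downtime_s)."""
        to_send = mem_bytes
        total_time = 0.0
        for _ in range(max_rounds):
            round_time = to_send / bandwidth
            total_time += round_time
            dirtied = dirty_rate * round_time   # pages dirtied meanwhile
            if dirtied <= stop_threshold or dirty_rate >= bandwidth:
                downtime = dirtied / bandwidth  # VM paused for the remainder
                return total_time + downtime, downtime
            to_send = dirtied
        return total_time, to_send / bandwidth  # forced stop after max_rounds

    GB = 2**30
    total, down = precopy_estimate(mem_bytes=4 * GB,
                                   dirty_rate=100 * 2**20,  # ~100 MB/s
                                   bandwidth=GB)            # ~1 GB/s link
    print(f"estimated migration time {total:.2f}s, downtime {down:.3f}s")

The project's models would be fitted to the measured transition
behaviour of real business applications rather than relying on such
closed-form assumptions.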
9. Designing a QoS-Aware Controller for Dynamic Scheduling in Cloud-Based Data Stream Processing Platforms
More and more companies have to deal with huge amounts of streaming
data that need to be processed quickly, in real time, to extract
meaningful information. Stream data processing differs from
well-studied batch data processing, which does not necessarily need to
be done in real time (the issue of velocity). In such environments,
data must be analyzed/transformed continuously in main memory before it
is stored on disk; this is especially true in environments where the
value of the analysis decreases with time. Normally, this is done by a
cluster of server (worker) nodes that continuously work on processing
the incoming stream of data. One of the major issues posed by streaming
data processing is maintaining the QoS level under fluctuating request
rates. Past research has shown that high arrival rates of streaming
data within short periods cause serious degradation of the overall
performance of the underlying system. In this project, we look to
create advanced controller techniques that allocate the available
resources effectively to handle big data streams with complex arrival
patterns, in order to preserve the QoS required by end users.
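One candidate starting point (not the project's final design) is a
simple feedback controller: each control interval, compare the observed
processing latency against the QoS target and scale the worker pool in
proportion to the relative error. The gain, bounds and the synthetic
load trace below are made-up tuning values.

    def control_step(workers, observed_ms, target_ms,
                     gain=0.5, min_workers=1, max_workers=64):
        """Return the worker count for the next interval."""
        # Relative error: positive => too slow, negative => over-provisioned.
        error = (observed_ms - target_ms) / target_ms
        delta = round(gain * error * workers)
        return max(min_workers, min(max_workers, workers + delta))

    # Crude simulation: latency roughly inversely proportional to workers.
    workers, target = 4, 100.0                          # target latency (ms)
    base_latency = [400, 400, 900, 900, 900, 300, 300]  # at 4 workers
    for base in base_latency:
        observed = base * 4 / workers
        workers = control_step(workers, observed, target)
        print(f"observed={observed:6.1f} ms -> workers={workers}")

The research question is what replaces this naive proportional rule
when arrival patterns are bursty and scaling actions themselves have
latency and cost.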

Internet of Things, Edge and Fog Computing
1. Localization Problems in Internet of Things Networks
The Internet of Things (IoT) is a novel design paradigm, intended as a
network of thousands (or millions) of tiny sensors communicating with
each other to offer innovative solutions to real-time problems. These
sensors form a network (a Wireless Sensor Network, WSN) to monitor the
physical environment and disseminate collected data back to the base
station through multiple hops. A WSN has the capability to collect and
report data for a specific application. Location information plays an
important role in various wireless sensor network applications; a
majority of these applications are related to location-based services
(LBS). The development of sensor technology, processing techniques, and
communication systems has given rise to smart sensors for adaptive and
innovative applications (health, transport, smart cities, etc.), so a
single localization technique is not adequate for all applications. In
this work, we are interested in understanding the different problems
associated with the variety of localization techniques that arise in
IoT applications. The richness of localization techniques stems from a
variety of conditions, such as information processing (distributed or
centralized), transmission range (range-based vs. range-free),
operating environment (indoor, outdoor), node density, mobility, and
others.
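As one concrete example of a range-based technique, the sketch below
performs least-squares trilateration: given anchors at known positions
and (possibly noisy) distance estimates, it linearizes the range
equations by subtracting the last anchor's equation and solves for the
node position. The anchor layout and node position are made-up values.

    import numpy as np

    def trilaterate(anchors, dists):
        """anchors: (n, 2) known positions; dists: (n,) ranges; n >= 3."""
        anchors = np.asarray(anchors, float)
        dists = np.asarray(dists, float)
        xn, yn = anchors[-1]
        dn = dists[-1]
        # (x-xi)^2 + (y-yi)^2 = di^2; subtracting the last equation
        # removes the quadratic terms and leaves a linear system.
        A = 2 * (anchors[:-1] - anchors[-1])
        b = (dn ** 2 - dists[:-1] ** 2
             + np.sum(anchors[:-1] ** 2, axis=1) - xn ** 2 - yn ** 2)
        pos, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pos

    anchors = [(0, 0), (10, 0), (0, 10), (10, 10)]
    true_pos = np.array([3.0, 4.0])
    dists = [np.hypot(*(true_pos - a)) for a in anchors]  # noise-free here
    print("estimated position:", trilaterate(anchors, dists))

Range-free schemes, indoor multipath, node density and mobility all
break the clean assumptions of this sketch, which is why no single
technique covers every IoT application.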
2. Energy Minimization in Cloud of Things Systems
Cloud Computing
has become the de facto computing platform for application processing in
the era of the Internet of Things (IoT). However, limitations of the
cloud model, such as high transmission latency and high monetary and
energy costs, are giving birth to a new computing paradigm called Edge
Computing (a.k.a. Fog Computing). Fog computing aims to move data
processing to the network edge so as to reduce Internet traffic,
improve response time, and provide a better model for dealing with data
safety and privacy concerns. However, since the servers at the Fog
layer are not as powerful as those in the cloud, there is a need to
balance the data processing between the Fog and the Cloud. A naive
solution where all applications
run at the fog layer may lead to high computation costs and even increase
the response time. Therefore, the trade-off in terms of average response
time and average monetary and energy costs needs to be addressed within
the fog computing model. This project deals with a range of problems
associated with resource management in Internet/Cloud of Things systems
with several trade-offs such as average response time and average cost.
3. Managing Delay-Sensitive Applications in the Internet of Things
The steep rise of Internet of Things (IoT) applications, along with the
limitations of Cloud Computing in addressing all IoT requirements, has
promoted a new distributed computing paradigm called Fog Computing,
which aims to process data at the edge of the network. With the help of
Fog Computing, the transmission latency and monetary spending caused by
Cloud Computing can be effectively reduced. However, executing all
applications in fog nodes will increase the average response time,
since the processing capabilities of fog nodes are not as powerful as
those of the cloud. A tradeoff therefore needs to be addressed within
such systems in terms of average response time and average cost. In
this project, we develop an online algorithm, unit-slot optimization,
based on the technique of Lyapunov optimization. It is a quantified
near-optimal solution that can adjust the tradeoff between average
response time and average cost online. We will evaluate the performance
of the proposed algorithm through a number of experiments, verifying
that the results match the theoretical analyses and that the algorithm
provides cost-effective processing while guaranteeing average response
time.
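To make the idea concrete, here is a minimal sketch of the
drift-plus-penalty pattern behind such Lyapunov-based controllers: a
virtual queue Q accumulates violations of the response-time target, and
each time slot the request is routed wherever V*cost + Q*delay is
smallest, with V trading cost against delay. The per-option delay/cost
figures, the target and V are illustrative assumptions, not the
project's parameters.

    import random

    OPTIONS = {
        # option: (mean response time in ms, cost per request) -- made up
        "fog":   (20.0, 1.0),
        "cloud": (120.0, 0.2),
    }

    def run(slots=10_000, delay_target=60.0, V=50.0):
        Q = 0.0
        tot_delay = tot_cost = 0.0
        for _ in range(slots):
            # Unit-slot decision: minimize the drift-plus-penalty score.
            choice = min(OPTIONS,
                         key=lambda o: V * OPTIONS[o][1] + Q * OPTIONS[o][0])
            delay = OPTIONS[choice][0] * random.uniform(0.8, 1.2)
            tot_delay += delay
            tot_cost += OPTIONS[choice][1]
            # Virtual queue grows whenever the delay target is violated.
            Q = max(Q + delay - delay_target, 0.0)
        print(f"avg delay {tot_delay/slots:.1f} ms, "
              f"avg cost {tot_cost/slots:.3f}")

    run()   # raising V favours low cost; lowering V favours low delay

The controller keeps the average delay near its target by occasionally
paying for the faster fog option, which is the response-time/cost
tradeoff that unit-slot optimization formalizes.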
4. Real-Time IoT Data Stream Analysis on Edge Computing Systems
Data from Internet of Things (IoT) applications have distinct life
stages: initial (1-5 minutes), medium-term (5-30 minutes), long-term
(30 minutes-1 day) and historical (1 day+). Some IoT applications
require the use of 100% freshly generated sensor data to optimize local
performance and flag potential problems, and some IoT applications
require combining device data with neighbourhood data to drive
medium-term optimization and performance decisions. These unique
requirements drive the recent push for Edge-based analytics. In this
work, you are expected to complete the following two tasks: 1)
determine what kinds of IoT applications require edge analytics, and 2)
conduct edge analytics for a real IoT application on a Raspberry Pi or
another platform.
5. Sustainable Edge Computing for Internet of Things Systems
There are
multiple factors that can impact energy consumption of IoT applications running over an Edge computing
platform, which include access network technologies, idle power
consumption of Edge servers, the locations of IoT
applications processing (the proportion of applications processed at edge
servers and/or cloud servers), and virtualization and network management.
All these factors can be adjusted to reduce energy consumption in
current Cloud of Things (CoT) computing architectures. We aim to address
energy efficiency issues in IoT with an
emerging interdisciplinary research trend, called sustainable edge
computing. The
overall idea of our approach is to localise IoT
traffic and power resources. We will focus on the technologies and
systems that can be used in conjunction with Edge computing in order to
control energy consumption of IoT applications
and services. This project aims to propose a new approach to exploiting
the capabilities of microgrids in the context
of Edge computing in order to reduce the energy consumption of IoT applications and services. There are several open
issues to be addressed in this project, including but not limited to:
Energy-harvesting sensors and IoT device use of
local energy generation; where exactly to locate Edge computation and
storage considering microgrids can provide
local energy sources for each individual house; the impact of local
energy storage (batteries and supercapacitors) and renewable energy
sources in a neighbourhood; integrating Edge computing and microgrids
at different network levels and different locations; and designing
scheduling schemes to ensure system performance as well as its
sustainability.

Parallel & Distributed Computing
1. Load Balancing and Task Scheduling in Parallel, Distributed, and Cluster Computing Environments
Scheduling and load balancing are two
important problems in the area of parallel computing. Efficient solutions
to these problems will have profound theoretical and practical implications
that will affect other parallel computing problems of similar nature.
Little research has attempted a generalized approach to the above
problems. The major difficulties encountered are due to interprocessor
communication and delays caused by inter-dependencies between the
different subtasks of a given application. The mapping problem arises when the
dependency structure of a parallel algorithm differs from the processor
interconnection of the parallel computer, or when the number of processes
generated by the algorithm exceeds the number of processors available. This
problem can be further complicated when the parallel computer system
contains heterogeneous components (e.g. different processors and link speeds,
such as in Cluster and Grid Architectures). This project intends to
investigate the development of new classes of algorithms for solving a
variety of scheduling and load-balancing problems for static and dynamic
scenarios.
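As a flavour of the problem, the sketch below implements the core of
list scheduling with communication delays: tasks in a small DAG are
visited in topological order and each is assigned to the processor that
yields the earliest finish time, paying a fixed transfer delay whenever
a predecessor ran on a different processor. The DAG, costs and the
uniform communication delay are toy assumptions.

    TASKS = {          # task: (compute_cost, predecessors); listed in
        "a": (3, []),  # topological order, so plain iteration is safe
        "b": (4, ["a"]),
        "c": (5, ["a"]),
        "d": (2, ["b", "c"]),
    }
    COMM = 2           # delay when a dependency crosses processors
    PROCS = ["p0", "p1"]

    proc_free = {p: 0.0 for p in PROCS}
    finish, placed = {}, {}

    for t, (cost, preds) in TASKS.items():
        best = None
        for p in PROCS:
            ready = max([finish[d] + (0 if placed[d] == p else COMM)
                         for d in preds] or [0.0])
            start = max(ready, proc_free[p])
            if best is None or start + cost < best[2]:
                best = (p, start, start + cost)
        p, start, end = best
        placed[t], finish[t] = p, end
        proc_free[p] = end
        print(f"{t}: {p} [{start:g}, {end:g}]")

Heterogeneous processor speeds, non-uniform link delays, and dynamic
task arrival each break a different assumption of this sketch and
motivate the new classes of algorithms the project targets.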
2. Scheduling Communications in Cluster Computing Systems
Clusters of commodity computer systems
have become the fastest growing choice for building cost-effective
high-performance parallel computing platforms. The rapid advancement of
computer architectures and high-speed interconnects has facilitated
many successful deployments of this type of cluster. Researchers in
previous studies have reported that the cluster interconnect significantly impacts the
performance of parallel applications. High-speed interconnects not only
unveil the potential performance of the cluster, but also allow clusters to
achieve better performance/cost ratio than clusters with traditional local
area networks. Towards this end, this project aims to study how
computations and communications influence the performance of such systems.
Applications tend to range from the compute-intensive to the
communication-intensive and an understanding of such applications and how
they map efficiently onto clusters is important.
3. Parallel Machine Learning and Stochastic Optimization Algorithms
Optimization algorithms can be used to
solve a wide range of problems that arise in the design and operation of
parallel computing environments (e.g., data mining, scheduling,
routing). However, many classical optimization techniques (e.g., linear
programming) are not suited to solving parallel processing problems due
to their restricted nature. This project is investigating the
application of new and unorthodox optimization techniques such as fuzzy
logic, genetic algorithms, neural networks, simulated annealing, ant
colonies, Tabu search, and others. However, these techniques are
computationally intensive and require enormous computing time. Parallel
processing has the potential of reducing the computational load and
enabling the efficient use of these techniques to solve a wide variety of
problems.
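One simple way to parallelize such a metaheuristic, sketched below, is
to run independent annealing chains in separate processes and keep the
best result; the Rastrigin function stands in for whatever scheduling
or routing objective the project actually studies, and all tuning
constants are assumed.

    import math, random
    from multiprocessing import Pool

    def rastrigin(x):
        return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v)
                                 for v in x)

    def anneal(seed, dim=4, steps=20_000, t0=5.0):
        rng = random.Random(seed)
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        fx = rastrigin(x)
        for k in range(steps):
            t = t0 * (1 - k / steps) + 1e-9           # linear cooling
            y = [v + rng.gauss(0, 0.3) for v in x]
            fy = rastrigin(y)
            # Always accept improvements; accept uphill moves with
            # Boltzmann probability so the chain can escape local minima.
            if fy < fx or rng.random() < math.exp((fx - fy) / t):
                x, fx = y, fy
        return fx, x

    if __name__ == "__main__":
        with Pool(4) as pool:
            results = pool.map(anneal, range(4))      # one chain per core
        best_f, best_x = min(results)
        print(f"best value {best_f:.3f} at",
              [round(v, 2) for v in best_x])

More sophisticated parallelizations exchange partial solutions between
chains, which is where the interesting speedup-versus-quality questions
arise.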
4. Autonomic Communications in Parallel and Distributed Computing Systems
The rapid advancement of computer architectures and high-speed
interconnects has facilitated successful deployments of many types of
parallel and distributed systems. Researchers in previous studies have
reported that the design of interconnects significantly impacts the
performance of parallel applications. High-speed interconnects not only
unveil the potential performance of the computing system, but also
allow such systems to achieve a better performance/cost ratio. Towards
this end, this project aims to study how
and communications influence the performance of such parallel and
distributed computing systems.
5. Quality of Service in Distributed Computing Systems
There is a need to develop a
comprehensive framework to determine what QoS means in the context of
distributed systems and the services that will be provided through such
infrastructure. What complicates the scenario is the fact that
distributed systems will provide a whole range of services, not only
high-performance computing. There is a great need for
the development of different QoS metrics for
distributed systems that could capture all the complexity and provide
meaningful measures for a wide range of applications. This will possibly
mean that new classes of algorithms and simulation models need to be
developed. These should be able to characterize the variety of workloads
and applications that can be used to better understand the behaviour of
distributed computing systems under different operating conditions.
6. Healing and Self-Repair in Large-Scale Distributed Computing Systems
As the complexity of distributed systems increases over time, there
will be a need to endow such systems with capabilities that allow them
to operate in disaster scenarios. What makes this problem very complex
is the heterogeneous nature of today's distributed computing
environments, which could be made up of hundreds or thousands of
components (computers, databases, etc.). In addition, a user in one
location might not be able to have control over other parts of the
system. So it is rather logical that there is a need for 'smart'
algorithms (protocols) that can achieve an acceptable level of fault
tolerance and account for a variety of disaster recovery scenarios.

Networking
1. Detection of Anomalous Variations in Dynamic Networks
The intranet is fast becoming the
preferred enterprise solution for delivering interoperable communications
for internal information exchange. The term intranet implies a private data
network that makes use of communication protocols and services of the
Internet, such as the TCP/IP protocol suite. Over recent years these data
networks have experienced significant growth in size and complexity
resulting in an increase in frequency, type and severity of network
problems. To ensure early detection and identification of these problems
better network management techniques must be employed. In the management of
large enterprise intranets (data networks), it becomes difficult to detect
and identify causes of abnormal change in traffic distributions when the
underlying logical topology is dynamic. Network management techniques use
statistical trending methods and visualization tools to monitor network
performance. These techniques are good for managing traffic but can be
inadequate when networks are very dynamic (physical and logical structures
of time-varying nature added to traffic variations). This project aims to
complement these existing techniques with suitable metrics that allow the
automatic detection of significant change within a network and alert
operators to when and where the change occurred. Applications are manifold:
discovery and prediction of network faults and abnormalities, overload,
congestion, hotspots, etc. Possible topics: reconstructing the network
out of routing tables; where to put (a given number of) probes in order
to get maximal coverage of network abnormalities; how network
monitoring depends on network protocols; given a time series of network
transaction files, whether (and when) one can skip monitoring the
network without losing too much information; and what to do if there
are 'holes' in the time series or in the network(s). In other words:
can a network be monitored without full knowledge of the entire network
(network inference)?
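A minimal sketch of the statistical-trending side, for intuition: track
an exponentially weighted mean and variance per traffic counter and
flag samples more than k standard deviations away. The smoothing
factor, threshold and the synthetic traffic series are illustrative
assumptions.

    import math

    def ewma_detector(series, alpha=0.1, k=3.0, warmup=10):
        mean, var = float(series[0]), 0.0
        alerts = []
        for i, x in enumerate(series[1:], start=1):
            std = math.sqrt(var)
            if i > warmup and std > 0 and abs(x - mean) > k * std:
                alerts.append((i, x))   # anomaly: do not absorb it
                continue
            diff = x - mean
            mean += alpha * diff        # EWMA mean update
            var = (1 - alpha) * (var + alpha * diff * diff)
        return alerts

    # Steady traffic around 100 Mb/s with one sudden surge at sample 30.
    traffic = [100 + (i % 5) for i in range(30)] + [400] \
              + [100 + (i % 5) for i in range(10)]
    print("anomalous samples:", ewma_detector(traffic))

Such per-metric detectors say nothing about where in a dynamic topology
the change originated, which is the harder part this project addresses.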
2. Modelling the Energy Use of Data Centre Networks
This project will investigate how energy
is consumed in data centres and model the consumption with different
workloads running in data centres. Specifically, the project will examine
the link between the data centre energy use and the performance of the
underlying network technology. The approach of this project is to conduct
experiments in a prototype data centre and extract workload patterns, and
to develop energy-efficient algorithms for the movement of data within the
data centre. The second phase of the project will also investigate the
interplay between data centre energy use and the wider electricity network
(Smart Grid).
3. Resource Management for Network Function Virtualization
With the growing demand for Cloud services, Network Function
Virtualization (NFV) is gaining popularity among application service
providers, Internet service providers and Cloud service providers. NFV
is proving to be an effective and flexible alternative for service
deployments across multiple clouds. NFV is an emerging network
architecture that increases flexibility and agility within operators'
networks by placing virtualized services on demand in Cloud Data
Centers (CDCs). One of the
main challenges for the NFV environment is how to efficiently allocate
Virtual Network Functions (VNFs) to Virtual Machines (VMs) and how to
minimize network latency in the rapidly changing network environments.
Although a significant amount of research has already been conducted on
the generic VNF placement problem and on VM migration for efficient
resource management in CDCs, to the best of our knowledge network
latency among the various network components and the VNF migration
problem have not yet been comprehensively considered. Firstly, to
address the VNF placement problem, we need to design more comprehensive
models based on real
measurements to capture network latency among VNFs with more granularity to
optimize placement of VNFs in CDCs. We also need to consider resource
demand of VNFs, resource capacity of VMs and network latency among various
network components. Our objectives are to minimize both network latency and
lead time (the time to find a VM to host a VNF).
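To fix ideas, here is a greedy sketch for placing one service function
chain: walk the chain and put each VNF on the lowest-latency VM
(relative to the previous VNF's VM) that still has capacity. The
latency matrix, capacities and demands are illustrative assumptions;
the project's models would come from real measurements.

    VM_CAP = {"vm0": 4, "vm1": 4, "vm2": 8}   # remaining capacity units
    LAT = {                                    # pairwise latency (ms)
        ("vm0", "vm0"): 0, ("vm0", "vm1"): 2, ("vm0", "vm2"): 9,
        ("vm1", "vm1"): 0, ("vm1", "vm2"): 5, ("vm2", "vm2"): 0,
    }

    def lat(a, b):
        return LAT.get((a, b), LAT.get((b, a)))

    def place_chain(chain):
        """chain: list of (vnf_name, demand). Returns (placement, latency)."""
        placement, total, prev = {}, 0, None
        for vnf, demand in chain:
            candidates = [v for v, c in VM_CAP.items() if c >= demand]
            if not candidates:
                raise RuntimeError(f"no VM can host {vnf}")
            # First VNF: roomiest VM; later VNFs: minimize the hop latency.
            vm = (max(candidates, key=VM_CAP.get) if prev is None
                  else min(candidates, key=lambda v: lat(prev, v)))
            VM_CAP[vm] -= demand
            if prev is not None:
                total += lat(prev, vm)
            placement[vnf], prev = vm, vm
        return placement, total

    chain = [("firewall", 2), ("nat", 2), ("ids", 4)]
    placement, latency = place_chain(chain)
    print(placement, f"-- added end-to-end latency: {latency} ms")

Minimizing lead time as well, and re-placing (migrating) VNFs as
latencies change, turns this greedy pass into the joint optimization
problem described above.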