
Cloud Computing, Data Centres, Green Computing
1. Fine-grain Resource Allocation Exploiting Application Diversity in Clouds
Cloud computing is best characterized by its inherently elastic and
scalable resource allocation for various applications with a
pay-as-you-go pricing model. These characteristics are realized
primarily by virtualization technologies running on (massively)
multi-core processors. As the adoption of clouds becomes widespread,
their use is becoming increasingly diverse, ranging from hosting web
services and performing data analytics to running high-performance
scientific and engineering applications. To this end, many cloud
service providers (e.g., Amazon EC2) provision a number of different
service offerings, such as High-CPU, High-Memory and Cluster GPU. This
project investigates virtual machine composition and placement that
exploit the heterogeneity in both resources and applications for
efficient resource utilization while respecting the QoS requirements of
applications. The student is expected to have an understanding of
operating systems and good algorithmic skills; prior experience with
cloud computing is not required.
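For illustration, a minimal sketch of one possible placement heuristic:
a heterogeneity-aware first-fit-decreasing pass that places VMs with
multi-dimensional (CPU, memory) demands onto hosts of different shapes.
The host types, capacities and VM demands below are made-up examples,
not data from the project.

    from dataclasses import dataclass, field

    @dataclass
    class Host:
        name: str
        cpu: float   # remaining CPU capacity (cores)
        mem: float   # remaining memory capacity (GB)
        vms: list = field(default_factory=list)

        def fits(self, cpu, mem):
            return self.cpu >= cpu and self.mem >= mem

        def place(self, vm, cpu, mem):
            self.cpu -= cpu
            self.mem -= mem
            self.vms.append(vm)

    def place_vms(vms, hosts):
        """Place (name, cpu, mem) VMs in decreasing dominant-demand order."""
        cpu_cap = max(h.cpu for h in hosts)
        mem_cap = max(h.mem for h in hosts)

        def dominant(vm):
            _, cpu, mem = vm
            return max(cpu / cpu_cap, mem / mem_cap)

        unplaced = []
        for name, cpu, mem in sorted(vms, key=dominant, reverse=True):
            for h in hosts:
                if h.fits(cpu, mem):
                    h.place(name, cpu, mem)
                    break
            else:
                unplaced.append(name)
        return unplaced

    hosts = [Host("high-cpu", cpu=32, mem=64), Host("high-mem", cpu=16, mem=256)]
    vms = [("web", 2, 4), ("analytics", 4, 96), ("hpc", 16, 32)]
    print("unplaced:", place_vms(vms, hosts))
    for h in hosts:
        print(h.name, "->", h.vms)

A real study would replace first-fit with policies that also consider
application interference and QoS, which is exactly the design space the
project explores.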
2. Exploiting Heterogeneous Computing Systems for Energy Efficiency
Large-scale distributed computing systems like clouds are best
described by their heterogeneous and dynamic characteristics, in both
resources and applications. However, it is often the case that
resources in a particular system are homogeneous. Recently, a number of
efforts have been made to build such systems with heterogeneous
resources to better manage a diverse set of workloads (or
applications). For example, a computer system may be built with both
typical high-performance processors and low-power processors like the
Intel Atom. This project investigates energy-aware scheduling and
resource allocation strategies for different workloads that exploit the
power consumption characteristics of heterogeneous processors. Students
with a good programming background, particularly in a Linux
environment, are encouraged to apply.
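As a toy illustration of the idea, the sketch below greedily assigns
each task to whichever processor minimizes energy (active power times
runtime) while still meeting the task's deadline; the speed and power
figures for the "xeon" and "atom" entries are assumed values, not
measurements.

    processors = {
        # name: (speed in work-units/s, active power in W) -- assumed figures
        "xeon": (10.0, 95.0),
        "atom": (2.5, 8.0),
    }

    def assign(tasks):
        """tasks: iterable of (name, work_units, deadline_s)."""
        plan = {}
        for name, work, deadline in tasks:
            best = None
            for proc, (speed, power) in processors.items():
                runtime = work / speed
                if runtime > deadline:
                    continue                 # would miss the deadline here
                energy = power * runtime     # E = P * t
                if best is None or energy < best[1]:
                    best = (proc, energy)
            plan[name] = best                # None => infeasible everywhere
        return plan

    tasks = [("batch-job", 50, 60), ("latency-job", 50, 6)]
    for name, choice in assign(tasks).items():
        print(name, "->", choice)

With these numbers the slack-rich batch job lands on the low-power
processor while the tight-deadline job stays on the fast one, which is
the basic trade-off the project studies at much larger scale.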
3. Fairness and Efficiency Analysis of Cloud Resource Management Algorithms
Resource allocation in cloud computing plays an important role in
improving data centre efficiency and end-user satisfaction. In this
project, we consider fairness as an important factor of user
satisfaction. Achieving efficiency and fairness simultaneously is often
difficult. Recent research shows that the Hadoop Fair Scheduler is not
efficient for applications with heterogeneous resource demands. The
project will review the performance of existing resource allocation
algorithms on these two aspects. It will further work towards a fair
and efficient resource allocation algorithm in the context of the
Pay-As-You-Go (PAYG) cloud pricing model. The student is expected to be
familiar with distributed computing concepts and comfortable with Java
programming.
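One well-known point of reference from the literature is Dominant
Resource Fairness (DRF), which generalizes max-min fairness to multiple
resource types. Below is a minimal sketch of progressive-filling DRF;
the capacities and per-task demand vectors are the illustrative numbers
commonly used to explain it, not project data.

    TOTAL = {"cpu": 9.0, "mem": 18.0}

    users = {
        # user: per-task demand vector
        "A": {"cpu": 1.0, "mem": 4.0},
        "B": {"cpu": 3.0, "mem": 1.0},
    }

    allocated = {u: {r: 0.0 for r in TOTAL} for u in users}
    free = dict(TOTAL)

    def dominant_share(user):
        return max(allocated[user][r] / TOTAL[r] for r in TOTAL)

    while True:
        # Offer the next task slot to the user with the lowest dominant share.
        u = min(users, key=dominant_share)
        demand = users[u]
        if any(free[r] < demand[r] for r in TOTAL):
            break                            # cannot fit another task
        for r in TOTAL:
            allocated[u][r] += demand[r]
            free[r] -= demand[r]

    for u in users:
        print(u, allocated[u], f"dominant share = {dominant_share(u):.2f}")

Both users end up with equal dominant shares (memory for A, CPU for B),
illustrating the kind of fairness/efficiency behaviour the project
would compare against the Hadoop Fair Scheduler under a PAYG pricing
model.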
4. Problem Diagnostic Mechanisms for Resource Provisioning in Cloud Computing
There are many ways things may go wrong when running applications in
the cloud. Software bugs and misconfigurations are common causes of
system failures. When problems like these happen, the resources these
applications run on are not used effectively, and in the worst case the
faulty applications may jeopardize the resource use of other,
well-behaved applications. This project intends to analyze the patterns
of system behaviour when these problems occur and investigate the
impact of these problems on resource provisioning mechanisms. It also
intends to develop automatic methods to detect these problems and
enhance cloud resource provisioning algorithms to deal with them. The
student is expected to have a good understanding of distributed
computing principles and be comfortable with Java programming.
5. Modelling Consolidation Behaviour of Collocated Virtual Machines in Private Clouds
Clouds are changing the way we think, program, and solve our problems.
Besides the many advantages public clouds provide to users, they also
introduce serious concerns (e.g., security) that in particular stop
many large organisations from migrating their computation to public
clouds; private clouds seem to be their solution. In private clouds,
each organization is in charge of handling its Virtual Machines (VMs)
to better provision its available resources. One of the most effective
ways to provision resources is to collocate several VMs on a Physical
Machine (PM); a process that may also degrade the performance of the
services hosted by the VMs. The project aims to investigate how the
performance degradation of VMs relates to their own loads as well as to
the loads of neighbouring VMs competing for the shared resources of the
PM. Students who are interested need moderate knowledge of programming.
Tools to monitor, benchmarks to test, and a private cloud to experiment
on are all already developed and provided; students only need to
collect raw data and model the not-so-easy ecology of such complicated
systems.
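Since the monitoring tools and benchmarks are provided, the modelling
step could start as simply as fitting a linear interference model to
the collected samples. The sketch below does this with ordinary least
squares; the column meanings and the synthetic numbers are assumptions
for illustration only.

    import numpy as np

    # rows: own CPU load, summed neighbour CPU load, observed slowdown (%)
    data = np.array([
        [0.2, 0.1,  3.0],
        [0.5, 0.4, 11.0],
        [0.7, 0.9, 26.0],
        [0.9, 1.5, 44.0],
        [0.4, 1.2, 24.0],
    ])

    X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
    y = data[:, 2]

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, b_own, b_neigh = coef
    print(f"slowdown ~= {b0:.1f} + {b_own:.1f}*own + {b_neigh:.1f}*neighbours")

    # Predict the slowdown of a candidate collocation before committing to it.
    own, neigh = 0.6, 1.0
    print("predicted slowdown:", b0 + b_own * own + b_neigh * neigh)

Real consolidation behaviour is rarely this linear (cache and I/O
contention interact), which is precisely why the project asks for
models of the "not-so-easy ecology" rather than stopping at a
first-order fit.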
6. Application Isolation Techniques in Cloud Computing Platforms
The cloud computing model allows people to use CPU,
storage and even network bandwidth from remote resource providers. These
resource providers often host large numbers of third-party applications
on tens of thousands of machines in their data centres. As many third-party
applications share physical CPUs, storage and networks, how to isolate
these applications becomes an issue. The project will review the
technologies, such as virtual machines, used by existing cloud computing
infrastructure providers for achieving application level isolation and
examine the effectiveness of these technologies. It will also investigate
how to make a cloud computing platform trustworthy.
7. Application-Specific Service Level Agreement and Energy-Efficiency Improvement in Cloud Computing Platforms
Cloud computing environments are gaining popularity
as the de facto platforms for many applications. These systems bring a
range of heterogeneous resources that should be able to function
continuously and autonomously. However, these systems consume a great
deal of energy. Thus, this project aims to develop new algorithms and
tools for energy-aware resource management and allocation for
large-scale distributed systems, enabling them to become
environmentally friendly. The
proposed framework will be 'holistic' in nature seamlessly integrating a
set of both site-level and system-level/service-level energy-aware resource
allocation schemes addressing a range of complex scenarios and different
operating conditions.
8. Interruption Prediction Models for Business Applications in Clouds
Although performance interruption of cloud-based applications during
virtual machine (VM) migration is a well-known phenomenon, hardly any
model has been proposed to predict the magnitude of such interruptions
prior to reorganizing VMs in virtualized environments. This project
starts by collecting data for modelling the transition behaviour of VMs
(cloud applications) during their live-migration process. The collected
data is then analyzed to design mathematical, empirical, and/or
analytical models to predict application transient behaviour during a
given service. The last step is to evaluate the proposed model on
business applications already running in the cloud.
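For intuition, a common first-order model of pre-copy live migration
from the literature can serve as a baseline: each round retransmits the
pages dirtied during the previous round, so with dirty rate below link
bandwidth the data volume shrinks geometrically until a stop-and-copy
pause. All parameter values below are assumptions for illustration.

    def precopy_estimate(mem_bytes, dirty_rate, bandwidth,
                         stop_threshold=64 * 2**20, max_rounds=30):
        """Return (total_migration_time_s, downtime_s)."""
        to_send = mem_bytes
        total_time = 0.0
        for _ in range(max_rounds):
            round_time = to_send / bandwidth
            total_time += round_time
            dirtied = dirty_rate * round_time   # pages dirtied meanwhile
            if dirtied <= stop_threshold or dirty_rate >= bandwidth:
                downtime = dirtied / bandwidth  # VM paused for the remainder
                return total_time + downtime, downtime
            to_send = dirtied
        return total_time, to_send / bandwidth  # forced stop after max_rounds

    GB = 2**30
    total, down = precopy_estimate(mem_bytes=4 * GB,
                                   dirty_rate=100 * 2**20,  # ~100 MB/s
                                   bandwidth=GB)            # ~1 GB/s link
    print(f"estimated migration time {total:.2f}s, downtime {down:.3f}s")

The project's models would be fitted to the measured transition
behaviour of real business applications rather than relying on such
closed-form assumptions.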
9. Designing a QoS-Aware Controller for Dynamic Scheduling in Cloud-Based Data Stream Processing Platforms
More and more companies have to deal with huge amounts of streaming
data that need to be processed quickly, in real time, to extract
meaningful information. Stream data processing differs from
well-studied batch data processing, which does not necessarily need to
be done in real time (the issue of velocity). In such environments,
data must be analyzed/transformed continuously in main memory before it
is stored on disk; this is especially true in environments where the
value of the analysis decreases with time. Normally, this is done by a
cluster of server (worker) nodes that continuously work on processing
the incoming stream of data. One of the major issues posed by streaming
data processing is maintaining the QoS level under fluctuating request
rates. Past research has shown that high arrival rates of streaming
data within short periods cause serious degradation of the overall
performance of the underlying system. In this project, we look to
create advanced controller techniques that allocate the available
resources effectively to handle big data streams with complex arrival
patterns, in order to preserve the QoS required by end users.
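One candidate starting point (not the project's final design) is a
simple feedback controller: each control interval, compare the observed
processing latency against the QoS target and scale the worker pool in
proportion to the relative error. The gain, bounds and the synthetic
load trace below are made-up tuning values.

    def control_step(workers, observed_ms, target_ms,
                     gain=0.5, min_workers=1, max_workers=64):
        """Return the worker count for the next interval."""
        # Relative error: positive => too slow, negative => over-provisioned.
        error = (observed_ms - target_ms) / target_ms
        delta = round(gain * error * workers)
        return max(min_workers, min(max_workers, workers + delta))

    # Crude simulation: latency roughly inversely proportional to workers.
    workers, target = 4, 100.0                          # target latency (ms)
    base_latency = [400, 400, 900, 900, 900, 300, 300]  # at 4 workers
    for base in base_latency:
        observed = base * 4 / workers
        workers = control_step(workers, observed, target)
        print(f"observed={observed:6.1f} ms -> workers={workers}")

The research question is what replaces this naive proportional rule
when arrival patterns are bursty and scaling actions themselves have
latency and cost.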

Internet of Things, Edge and Fog Computing
1. Localization Problems in Internet of Things Networks
The Internet of Things (IoT) is a novel design paradigm, intended as a
network of thousands (or millions) of tiny sensors communicating with
each other to offer innovative solutions to real-time problems. These
sensors form a network (a Wireless Sensor Network, WSN) to monitor the
physical environment and disseminate collected data back to the base
station through multiple hops. A WSN has the capability to collect and
report data for a specific application. Location information plays an
important role in various wireless sensor network applications; a
majority of these applications are related to location-based services
(LBS). The development of sensor technology, processing techniques, and
communication systems has given rise to smart sensors for adaptive and
innovative applications (health, transport, smart cities, etc.), so a
single localization technique is not adequate for all applications. In
this work, we are interested in understanding the different problems
associated with the variety of localization techniques that arise in
IoT applications. The richness of localization techniques stems from a
variety of conditions, such as information processing (distributed or
centralized), transmission range (range-based vs. range-free),
operating environment (indoor, outdoor), node density, mobility, and
others.
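As one concrete example of a range-based technique, the sketch below
performs least-squares trilateration: given anchors at known positions
and (possibly noisy) distance estimates, it linearizes the range
equations by subtracting the last anchor's equation and solves for the
node position. The anchor layout and node position are made-up values.

    import numpy as np

    def trilaterate(anchors, dists):
        """anchors: (n, 2) known positions; dists: (n,) ranges; n >= 3."""
        anchors = np.asarray(anchors, float)
        dists = np.asarray(dists, float)
        xn, yn = anchors[-1]
        dn = dists[-1]
        # (x-xi)^2 + (y-yi)^2 = di^2; subtracting the last equation
        # removes the quadratic terms and leaves a linear system.
        A = 2 * (anchors[:-1] - anchors[-1])
        b = (dn ** 2 - dists[:-1] ** 2
             + np.sum(anchors[:-1] ** 2, axis=1) - xn ** 2 - yn ** 2)
        pos, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pos

    anchors = [(0, 0), (10, 0), (0, 10), (10, 10)]
    true_pos = np.array([3.0, 4.0])
    dists = [np.hypot(*(true_pos - a)) for a in anchors]  # noise-free here
    print("estimated position:", trilaterate(anchors, dists))

Range-free schemes, indoor multipath, node density and mobility all
break the clean assumptions of this sketch, which is why no single
technique covers every IoT application.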
2. Energy Minimization in Cloud of Things Systems
Cloud Computing
has become the de facto computing platform for application processing in
the era of the Internet of Things (IoT). However, limitations of the
cloud model, such as high transmission latency and high monetary and
energy costs, are giving birth to a new computing paradigm called Edge
Computing (a.k.a. Fog Computing). Fog computing aims to move data
processing to the network edge so as to reduce Internet traffic,
improve response time, and provide a better model for dealing with data
safety and privacy concerns. However, since the servers at the Fog
layer are not as powerful as those in the cloud, there is a need to
balance the data processing between the Fog and the Cloud. A naive
solution where all applications
run at the fog layer may lead to high computation costs and even increase
the response time. Therefore, the trade-off in terms of average response
time and average monetary and energy costs needs to be addressed within
the fog computing model. This project deals with a range of problems
associated with resource management in Internet/Cloud of Things systems
with several trade-offs such as average response time and average cost.
3. Managing Delay-Sensitive Applications in the Internet of Things
The steep rise of Internet of Things (IoT) applications, along with the
limitations of Cloud Computing in addressing all IoT requirements, has
promoted a new distributed computing paradigm called Fog Computing,
which aims to process data at the edge of the network. With the help of
Fog Computing, the transmission latency and monetary spending caused by
Cloud Computing can be effectively reduced. However, executing all
applications in fog nodes will increase the average response time,
since the processing capabilities of fog nodes are not as powerful as
those of the cloud. A tradeoff therefore needs to be addressed within
such systems in terms of average response time and average cost. In
this project, we develop an online algorithm, unit-slot optimization,
based on the technique of Lyapunov optimization. It is a quantified
near-optimal solution that can adjust the tradeoff between average
response time and average cost online. We will evaluate the performance
of the proposed algorithm through a number of experiments, verifying
that the results match the theoretical analyses and that the algorithm
provides cost-effective processing while guaranteeing average response
time.
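To make the idea concrete, here is a minimal sketch of the
drift-plus-penalty pattern behind such Lyapunov-based controllers: a
virtual queue Q accumulates violations of the response-time target, and
each time slot the request is routed wherever V*cost + Q*delay is
smallest, with V trading cost against delay. The per-option delay/cost
figures, the target and V are illustrative assumptions, not the
project's parameters.

    import random

    OPTIONS = {
        # option: (mean response time in ms, cost per request) -- made up
        "fog":   (20.0, 1.0),
        "cloud": (120.0, 0.2),
    }

    def run(slots=10_000, delay_target=60.0, V=50.0):
        Q = 0.0
        tot_delay = tot_cost = 0.0
        for _ in range(slots):
            # Unit-slot decision: minimize the drift-plus-penalty score.
            choice = min(OPTIONS,
                         key=lambda o: V * OPTIONS[o][1] + Q * OPTIONS[o][0])
            delay = OPTIONS[choice][0] * random.uniform(0.8, 1.2)
            tot_delay += delay
            tot_cost += OPTIONS[choice][1]
            # Virtual queue grows whenever the delay target is violated.
            Q = max(Q + delay - delay_target, 0.0)
        print(f"avg delay {tot_delay/slots:.1f} ms, "
              f"avg cost {tot_cost/slots:.3f}")

    run()   # raising V favours low cost; lowering V favours low delay

The controller keeps the average delay near its target by occasionally
paying for the faster fog option, which is the response-time/cost
tradeoff that unit-slot optimization formalizes.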
4. Real-Time IoT Data Stream Analysis on Edge Computing Systems
Data from Internet of Things (IoT) applications have distinct life
stages: initial (1-5 minutes), medium-term (5-30 minutes), long-term
(30 minutes-1 day) and historical (1 day+). Some IoT applications
require the use of 100% freshly generated sensor data to optimize local
performance and flag potential problems, and some IoT applications
require combining device data with neighbourhood data to drive
medium-term optimization and performance decisions. These unique
requirements drive the recent push for Edge-based analytics. In this
work, you are expected to complete the following two tasks: 1)
determine what kinds of IoT applications require edge analytics, and 2)
conduct edge analytics for a real IoT application on a Raspberry Pi or
another platform.
5. Sustainable Edge Computing for Internet of Things Systems
There are
multiple factors that can impact energy consumption of IoT applications running over an Edge computing
platform, which include access network technologies, idle power
consumption of Edge servers, the locations of IoT
applications processing (the proportion of applications processed at edge
servers and/or cloud servers), and virtualization and network management.
All these factors can be adjusted to reduce energy consumption in
current Cloud of Things (CoT) computing architectures. We aim to address
energy efficiency issues in IoT with an
emerging interdisciplinary research trend, called sustainable edge
computing. The
overall idea of our approach is to localise IoT
traffic and power resources. We will focus on the technologies and
systems that can be used in conjunction with Edge computing in order to
control energy consumption of IoT applications
and services. This project aims to propose a new approach to exploiting
the capabilities of microgrids in the context
of Edge computing in order to reduce the energy consumption of IoT applications and services. There are several open
issues to be addressed in this project, including but not limited to:
Energy-harvesting sensors and IoT device use of
local energy generation; where exactly to locate Edge computation and
storage considering microgrids can provide
local energy sources for each individual house; the impact of local
energy storage (batteries and supercapacitors) and renewable energy
sources in a neighbourhood; integrating Edge computing and microgrids
at different network levels and different locations; and designing
scheduling schemes to ensure system performance as well as its
sustainability.

Parallel & Distributed Computing
1. Load Balancing and Task Scheduling in Parallel, Distributed, and Cluster Computing Environments
Scheduling and load balancing are two
important problems in the area of parallel computing. Efficient solutions
to these problems will have profound theoretical and practical implications
that will affect other parallel computing problems of similar nature.
Little research has attempted a generalized approach to the above
problems. The major difficulties encountered are due to interprocessor
communication and delays caused by inter-dependencies between the
different subtasks of a given application. The mapping problem arises when the
dependency structure of a parallel algorithm differs from the processor
interconnection of the parallel computer, or when the number of processes
generated by the algorithm exceeds the number of processors available. This
problem can be further complicated when the parallel computer system
contains heterogeneous components (e.g. different processors and link speeds,
such as in Cluster and Grid Architectures). This project intends to
investigate the development of new classes of algorithms for solving a
variety of scheduling and load-balancing problems for static and dynamic
scenarios.
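As a flavour of the problem, the sketch below implements the core of
list scheduling with communication delays: tasks in a small DAG are
visited in topological order and each is assigned to the processor that
yields the earliest finish time, paying a fixed transfer delay whenever
a predecessor ran on a different processor. The DAG, costs and the
uniform communication delay are toy assumptions.

    TASKS = {          # task: (compute_cost, predecessors); listed in
        "a": (3, []),  # topological order, so plain iteration is safe
        "b": (4, ["a"]),
        "c": (5, ["a"]),
        "d": (2, ["b", "c"]),
    }
    COMM = 2           # delay when a dependency crosses processors
    PROCS = ["p0", "p1"]

    proc_free = {p: 0.0 for p in PROCS}
    finish, placed = {}, {}

    for t, (cost, preds) in TASKS.items():
        best = None
        for p in PROCS:
            ready = max([finish[d] + (0 if placed[d] == p else COMM)
                         for d in preds] or [0.0])
            start = max(ready, proc_free[p])
            if best is None or start + cost < best[2]:
                best = (p, start, start + cost)
        p, start, end = best
        placed[t], finish[t] = p, end
        proc_free[p] = end
        print(f"{t}: {p} [{start:g}, {end:g}]")

Heterogeneous processor speeds, non-uniform link delays, and dynamic
task arrival each break a different assumption of this sketch and
motivate the new classes of algorithms the project targets.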
2. Scheduling Communications in Cluster Computing Systems
Clusters of commodity computer systems
have become the fastest growing choice for building cost-effective
high-performance parallel computing platforms. The rapid advancement of
computer architectures and high-speed interconnects has facilitated
many successful deployments of this type of cluster. Researchers in
previous studies have reported that the cluster interconnect significantly impacts the
performance of parallel applications. High-speed interconnects not only
unveil the potential performance of the cluster, but also allow clusters to
achieve better performance/cost ratio than clusters with traditional local
area networks. Towards this end, this project aims to study how
computations and communications influence the performance of such systems.
Applications tend to range from the compute-intensive to the
communication-intensive and an understanding of such applications and how
they map efficiently onto clusters is important.
3. Parallel Machine Learning and Stochastic Optimization Algorithms
Optimization algorithms can be used to
solve a wide range of problems that arise in the design and operation of
parallel computing environments (e.g., data mining, scheduling,
routing). However, many classical optimization techniques (e.g., linear
programming) are not suited to solving parallel processing problems due
to their restricted nature. This project is investigating the
application of new and unorthodox optimization techniques such as fuzzy
logic, genetic algorithms, neural networks, simulated annealing, ant
colonies, Tabu search, and others. However, these techniques are
computationally intensive and require enormous computing time. Parallel
processing has the potential of reducing the computational load and
enabling the efficient use of these techniques to solve a wide variety of
problems.
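One simple way to parallelize such a metaheuristic, sketched below, is
to run independent annealing chains in separate processes and keep the
best result; the Rastrigin function stands in for whatever scheduling
or routing objective the project actually studies, and all tuning
constants are assumed.

    import math, random
    from multiprocessing import Pool

    def rastrigin(x):
        return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v)
                                 for v in x)

    def anneal(seed, dim=4, steps=20_000, t0=5.0):
        rng = random.Random(seed)
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        fx = rastrigin(x)
        for k in range(steps):
            t = t0 * (1 - k / steps) + 1e-9           # linear cooling
            y = [v + rng.gauss(0, 0.3) for v in x]
            fy = rastrigin(y)
            # Always accept improvements; accept uphill moves with
            # Boltzmann probability so the chain can escape local minima.
            if fy < fx or rng.random() < math.exp((fx - fy) / t):
                x, fx = y, fy
        return fx, x

    if __name__ == "__main__":
        with Pool(4) as pool:
            results = pool.map(anneal, range(4))      # one chain per core
        best_f, best_x = min(results)
        print(f"best value {best_f:.3f} at",
              [round(v, 2) for v in best_x])

More sophisticated parallelizations exchange partial solutions between
chains, which is where the interesting speedup-versus-quality questions
arise.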
4. Autonomic Communications in Parallel and Distributed Computing Systems
The rapid advancement of computer architectures and high-speed
interconnects has facilitated successful deployments of many types of
parallel and distributed systems. Researchers in previous studies have
reported that the design of interconnects significantly impacts the
performance of parallel applications. High-speed interconnects not only
unveil the potential performance of the computing system, but also
allow such systems to achieve a better performance/cost ratio. Towards
this end, this project aims to study how
and communications influence the performance of such parallel and
distributed computing systems.
5. Quality of Service in Distributed Computing Systems
There is a need to develop a
comprehensive framework to determine what QoS means in the context of
distributed systems and the services that will be provided through such
infrastructure. What complicates the scenario is the fact that
distributed systems will provide a whole range of services, not only
high-performance computing. There is a great need for
the development of different QoS metrics for
distributed systems that could capture all the complexity and provide
meaningful measures for a wide range of applications. This will possibly
mean that new classes of algorithms and simulation models need to be
developed. These should be able to characterize the variety of workloads
and applications that can be used to better understand the behaviour of
distributed computing systems under different operating conditions.
6. Healing and Self-Repair in Large-Scale Distributed Computing Systems
As the complexity of distributed systems increases over time, there
will be a need to endow such systems with capabilities that allow them
to operate in disaster scenarios. What makes this problem very complex
is the heterogeneous nature of today's distributed computing
environments, which could be made up of hundreds or thousands of
components (computers, databases, etc.). In addition, a user in one
location might not be able to have control over other parts of the
system. So it is rather logical that there is a need for 'smart'
algorithms (protocols) that can achieve an acceptable level of fault
tolerance and account for a variety of disaster recovery scenarios.

Networking
1. Detection of Anomalous Variations in Dynamic Networks
The intranet is fast becoming the
preferred enterprise solution for delivering interoperable communications
for internal information exchange. The term intranet implies a private data
network that makes use of communication protocols and services of the
Internet, such as the TCP/IP protocol suite. Over recent years these data
networks have experienced significant growth in size and complexity
resulting in an increase in frequency, type and severity of network
problems. To ensure early detection and identification of these problems
better network management techniques must be employed. In the management of
large enterprise intranets (data networks), it becomes difficult to detect
and identify causes of abnormal change in traffic distributions when the
underlying logical topology is dynamic. Network management techniques use
statistical trending methods and visualization tools to monitor network
performance. These techniques are good for managing traffic but can be
inadequate when networks are very dynamic (physical and logical structures
of time-varying nature added to traffic variations). This project aims to
complement these existing techniques with suitable metrics that allow the
automatic detection of significant change within a network and alert
operators to when and where the change occurred. Applications are manifold:
discovery and prediction of network faults and abnormalities, overload,
congestion, hotspots, etc. Possible topics: reconstructing the network
out of routing tables; where to put (a given number of) probes in order
to get maximal coverage of network abnormalities; how network
monitoring depends on network protocols; given a time series of network
transaction files, whether (and when) one can skip monitoring the
network without losing too much information; and what to do if there
are 'holes' in the time series or in the network(s). In other words:
can a network be monitored without full knowledge of the entire network
(network inference)?
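A minimal sketch of the statistical-trending side, for intuition: track
an exponentially weighted mean and variance per traffic counter and
flag samples more than k standard deviations away. The smoothing
factor, threshold and the synthetic traffic series are illustrative
assumptions.

    import math

    def ewma_detector(series, alpha=0.1, k=3.0, warmup=10):
        mean, var = float(series[0]), 0.0
        alerts = []
        for i, x in enumerate(series[1:], start=1):
            std = math.sqrt(var)
            if i > warmup and std > 0 and abs(x - mean) > k * std:
                alerts.append((i, x))   # anomaly: do not absorb it
                continue
            diff = x - mean
            mean += alpha * diff        # EWMA mean update
            var = (1 - alpha) * (var + alpha * diff * diff)
        return alerts

    # Steady traffic around 100 Mb/s with one sudden surge at sample 30.
    traffic = [100 + (i % 5) for i in range(30)] + [400] \
              + [100 + (i % 5) for i in range(10)]
    print("anomalous samples:", ewma_detector(traffic))

Such per-metric detectors say nothing about where in a dynamic topology
the change originated, which is the harder part this project addresses.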
2. Modelling the Energy Use of Data Centre Networks
This project will investigate how energy
is consumed in data centres and model the consumption with different
workloads running in data centres. Specifically, the project will examine
the link between the data centre energy use and the performance of the
underlying network technology. The approach of this project is to conduct
experiments in a prototype data centre and extract workload patterns, and
to develop energy-efficient algorithms for the movement of data within the
data centre. The second phase of the project will also investigate the
interplay between data centre energy use and the wider electricity network
(Smart Grid).
3. Resource Management for Network Function Virtualization
With the growing demand for Cloud services, Network Function
Virtualization (NFV) is gaining popularity among application service
providers, Internet service providers and Cloud service providers. NFV
is proving to be an effective and flexible alternative for service
deployments across multiple clouds. NFV is an emerging network
architecture that increases flexibility and agility within operators'
networks by placing virtualized services on demand in Cloud Data
Centers (CDCs). One of the
main challenges for the NFV environment is how to efficiently allocate
Virtual Network Functions (VNFs) to Virtual Machines (VMs) and how to
minimize network latency in the rapidly changing network environments.
Although a significant amount of research has already been conducted on
the generic VNF placement problem and on VM migration for efficient
resource management in CDCs, to the best of our knowledge network
latency among the various network components and the VNF migration
problem have not yet been comprehensively considered. Firstly, to
address the VNF placement problem, we need to design more comprehensive
models based on real
measurements to capture network latency among VNFs with more granularity to
optimize placement of VNFs in CDCs. We also need to consider resource
demand of VNFs, resource capacity of VMs and network latency among various
network components. Our objectives are to minimize both network latency and
lead time (the time to find a VM to host a VNF).
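To fix ideas, here is a greedy sketch for placing one service function
chain: walk the chain and put each VNF on the lowest-latency VM
(relative to the previous VNF's VM) that still has capacity. The
latency matrix, capacities and demands are illustrative assumptions;
the project's models would come from real measurements.

    VM_CAP = {"vm0": 4, "vm1": 4, "vm2": 8}   # remaining capacity units
    LAT = {                                    # pairwise latency (ms)
        ("vm0", "vm0"): 0, ("vm0", "vm1"): 2, ("vm0", "vm2"): 9,
        ("vm1", "vm1"): 0, ("vm1", "vm2"): 5, ("vm2", "vm2"): 0,
    }

    def lat(a, b):
        return LAT.get((a, b), LAT.get((b, a)))

    def place_chain(chain):
        """chain: list of (vnf_name, demand). Returns (placement, latency)."""
        placement, total, prev = {}, 0, None
        for vnf, demand in chain:
            candidates = [v for v, c in VM_CAP.items() if c >= demand]
            if not candidates:
                raise RuntimeError(f"no VM can host {vnf}")
            # First VNF: roomiest VM; later VNFs: minimize the hop latency.
            vm = (max(candidates, key=VM_CAP.get) if prev is None
                  else min(candidates, key=lambda v: lat(prev, v)))
            VM_CAP[vm] -= demand
            if prev is not None:
                total += lat(prev, vm)
            placement[vnf], prev = vm, vm
        return placement, total

    chain = [("firewall", 2), ("nat", 2), ("ids", 4)]
    placement, latency = place_chain(chain)
    print(placement, f"-- added end-to-end latency: {latency} ms")

Minimizing lead time as well, and re-placing (migrating) VNFs as
latencies change, turns this greedy pass into the joint optimization
problem described above.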