SerFer: Serverless Inference of Machine Learning Models

With the advent of serverless functions, more and more applications are moving into the serverless domain. This can be attributed to three key benefits offered by serverless platforms: (1) functions can be triggered in response to events, i.e., the platform adopts an event-based programming model; (2) provisioning and scalability are not the application's concern; and (3) serverless platforms adopt a pay-per-use model, which prevents the resource waste that comes from either over-provisioning (hurting the budget) or under-provisioning (hurting throughput and/or latency). Machine learning inference is a throughput- and latency-sensitive domain, since it must be done in real time, and it usually involves expensive computation. We study the viability of performing such inference on serverless platforms. In doing so, we also characterize when it is suitable to move from expensive GPU machines to serverless platforms.

Project Report: here

Code: here

HotSpots: Reducing Contention on Hot Leaves in B-trees

B-tree indices are heavily used in modern transactional database systems for efficient search and range queries. Concurrency in these systems is maximized by locking at fine granularity, but lock contention among writers can be a severe performance bottleneck. In particular, when insertions use a sequential key, all threads contend for the rightmost leaf of the B-tree. We propose an Auxiliary Structure to augment concurrent B-trees and reduce contention. We evaluate our design against a concurrent B-tree using Optimistic Lock Coupling (OLC) [5] and an implementation we call byte-reordering, which solves contention but does not support range queries. We find that in the OLC implementation, lookups scale well with the number of threads but insertions do not. We find that byte-reordering is very effective at reducing contention and is a viable solution when the use case does not require range scans. Due to implementation difficulties, we were not able to optimize our Auxiliary Structure's performance, so it lags significantly behind the other implementations; however, we believe more engineering effort is needed to evaluate the full potential of this idea. Finally, we summarize some lessons learned about the design and implementation of concurrent data structures.
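The core of the byte-reordering idea can be sketched in a few lines. This is not the project's implementation (the function name and key width are illustrative assumptions), just a minimal demonstration of why reversing a key's byte order scatters sequential inserts: consecutive keys differ in their low-order bytes, and reversing moves those bytes to the most significant positions, so the reordered keys land on different B-tree leaves.

```python
def reorder_key(key: int, width: int = 8) -> int:
    """Reverse the byte order of a fixed-width integer key.

    Illustrative sketch only: serialize the key big-endian, then
    reinterpret the same bytes little-endian, which reverses them.
    """
    return int.from_bytes(key.to_bytes(width, "big"), "little")

# Sequential keys share a long common prefix, so they all target the
# rightmost leaf; after reordering they are spread far apart.
keys = [256, 257, 258]
reordered = [reorder_key(k) for k in keys]
```

The transform is its own inverse (`reorder_key(reorder_key(k)) == k`), so lookups simply reorder the probe key before descending the tree; what is lost is the ordering of the original keys, which is why range scans no longer work.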

Project Report: here

Code: here

Breaking KVM with MalOS

Virtual machine monitors (VMMs) like KVM provide hardware resources to many different guest operating systems, often with fairness and security in mind. For these VMMs to be useful, however, they must distribute server resources equitably to all guests regardless of what other guests do. In this paper, we test several mechanisms for abusing shared resource allocation in order to adversely affect co-located guest virtual machines in KVM. None of the methods used by our malicious VMs was able to affect the performance of the host or of other VMs.

Project Report: here

Code: here

Evaluation of Interprocess Communication Mechanisms

The operating system provides general mechanisms for flexible interprocess communication (IPC). In this paper, we study and evaluate three popular IPC techniques in Linux: pipes, sockets, and shared memory. We carefully construct experiments to measure the latency and throughput of each IPC mechanism. Since an accurate timer is the most basic requirement for this evaluation, we compare three timer APIs and pick the one with the most reliable results for our experiments. Our observations show that shared memory gives the best performance, followed by TCP/IP sockets, with pipes the slowest. However, performance depends heavily on two factors: message size and the hardware cache. We study the effects of these factors to draw some insightful conclusions, and we discuss the usefulness of the three IPC mechanisms for different applications.
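The latency side of such an evaluation is typically a ping-pong microbenchmark: send a one-byte message, wait for the echo, and divide the round-trip time by two. The sketch below shows this pattern for pipes on Linux; it is not the project's harness (the function name and iteration count are illustrative), and the report's C-based measurements would avoid Python's interpreter overhead, but the structure is the same.

```python
import os
import time

def pipe_roundtrip_latency(iters: int = 1000) -> float:
    """Estimate average one-way pipe latency (seconds) via ping-pong.

    Illustrative sketch: a forked child echoes each byte back to the
    parent over a second pipe; the parent times the whole exchange.
    """
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte it receives, then exit.
        for _ in range(iters):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    start = time.perf_counter()
    for _ in range(iters):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    # Each iteration is a round trip, i.e. two one-way messages.
    return elapsed / (2 * iters)
```

The same ping-pong loop works for sockets (swap the pipes for a connected socket pair), while shared memory needs a different design, since the two processes must synchronize on a flag or semaphore rather than block on a read.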

Project Report: here