Avoiding Consistency Exceptions Under Strong Memory Models

Shared-memory languages and systems generally provide weak or undefined semantics for executions with data races. Prior work has proposed memory consistency models that ensure well-defined, easy-to-understand semantics based on region serializability (RS), but the resulting system may throw a consistency exception in the presence of a data race. Consistency exceptions can occur unexpectedly even in well-tested programs, hurting availability and thus limiting the practicality of RS-based memory models.

To our knowledge, this work is the first to consider the problem of availability for memory consistency models that throw consistency exceptions. We first extend existing approaches that enforce RSx, a memory model based on serializability of synchronization-free regions (SFRs), to avoid region conflicts and thus consistency exceptions. These new approaches demonstrate both the potential for and limitations of avoiding consistency exceptions under RSx. To improve availability further, we introduce (1) a new memory model called RIx based on isolation of SFRs and (2) a new approach called Avalon that provides RIx. We demonstrate two variants of Avalon that offer different performance–availability tradeoffs for RIx.
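To make the notion of a region conflict concrete, the following is a minimal single-threaded sketch, assuming the common definition that a region conflict occurs when an access conflicts with an access already made by another thread's ongoing synchronization-free region. The class and method names (`RegionConflictDetector`, `ConsistencyException`, `endRegion`) are hypothetical and chosen for illustration; this is not the implementation described in the paper, and it models only detection, not the approaches for avoiding conflicts.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical exception type standing in for a consistency exception.
class ConsistencyException extends RuntimeException {
    ConsistencyException(String message) { super(message); }
}

// Toy model: logical thread ids are plain ints and accesses are replayed
// sequentially, so the outcome is deterministic.
final class RegionConflictDetector {
    // For each variable: threads whose ongoing region has read it.
    private final Map<String, Set<Integer>> readers = new HashMap<>();
    // For each variable: the thread whose ongoing region has written it.
    private final Map<String, Integer> writer = new HashMap<>();

    void read(int thread, String var) {
        Integer w = writer.get(var);
        if (w != null && w != thread) {
            throw new ConsistencyException("read of " + var + " conflicts with ongoing write");
        }
        readers.computeIfAbsent(var, k -> new HashSet<>()).add(thread);
    }

    void write(int thread, String var) {
        Integer w = writer.get(var);
        if (w != null && w != thread) {
            throw new ConsistencyException("write of " + var + " conflicts with ongoing write");
        }
        for (int r : readers.getOrDefault(var, Collections.<Integer>emptySet())) {
            if (r != thread) {
                throw new ConsistencyException("write of " + var + " conflicts with ongoing read");
            }
        }
        writer.put(var, thread);
    }

    // A synchronization operation ends the thread's current region,
    // so its accesses can no longer take part in a region conflict.
    void endRegion(int thread) {
        writer.values().removeIf(t -> t == thread);
        for (Set<Integer> s : readers.values()) {
            s.remove(thread);
        }
    }
}

final class RegionConflictDemo {
    public static void main(String[] args) {
        RegionConflictDetector d = new RegionConflictDetector();
        d.write(1, "x");               // thread 1 writes x inside its region
        try {
            d.read(2, "x");            // thread 2 reads x while that region is ongoing
        } catch (ConsistencyException e) {
            System.out.println("conflict: " + e.getMessage());
        }
        d.endRegion(1);                // thread 1 reaches a synchronization operation
        d.read(2, "x");                // the same read no longer conflicts
        System.out.println("no conflict after region end");
    }
}
```

In this model the same racy read either throws or succeeds depending only on whether the writer's region is still ongoing, which is why exceptions can surface unexpectedly in programs that appear well tested.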

An evaluation on real Java programs shows that this work’s novel approaches are able to reduce consistency exceptions, thereby improving the applicability of strong memory consistency models. Furthermore, the approaches provide compelling points in the performance–availability tradeoff space for memory consistency enforcement. RIx and Avalon thus represent a promising direction for tackling the challenge of availability under strong consistency models that throw consistency exceptions.

Paper: PDF
Talk: Slides shared in Google Slides
Source code: Available for download from the Jikes RVM Research Archive

Providing Global Snapshot Transactions In SQL Data Warehouse

This work supports global snapshots in SQL Data Warehouse (SQL DW), allowing users to perform consistent reads across distributed databases. The existing SQL DW supports only the read uncommitted isolation level. The goal of this work is to give SQL DW users, in a distributed environment, the illusion of snapshot isolation as provided by a standalone SQL Server instance. We introduce a design based on distributed multiversion concurrency control (MVCC) with a central time authority. The system has become a feature of the Azure SQL DW service.
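The general idea of snapshot reads under MVCC with a single timestamp source can be sketched as follows. This is an illustrative model only, not the SQL DW design: `TimeAuthority`, `VersionedStore`, and their methods are hypothetical names, each store stands in for one distributed database, and the sketch assumes every write is fully installed before any reader obtains a later snapshot time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical central time authority: the single source of timestamps.
final class TimeAuthority {
    private final AtomicLong clock = new AtomicLong(0);

    // Snapshot time for a new reader: the latest commit time issued so far.
    long currentTime() { return clock.get(); }

    // Commit time for a writer: strictly greater than all earlier ones.
    long nextCommitTime() { return clock.incrementAndGet(); }
}

// One node's multiversion key-value store (stand-in for one database).
final class VersionedStore {
    private static final class Version {
        final long commitTime;
        final String value;

        Version(long commitTime, String value) {
            this.commitTime = commitTime;
            this.value = value;
        }
    }

    private final Map<String, List<Version>> chains = new HashMap<>();

    // Install a new version instead of overwriting the old one.
    synchronized void write(String key, String value, long commitTime) {
        chains.computeIfAbsent(key, k -> new ArrayList<>())
              .add(new Version(commitTime, value));
    }

    // Return the newest version committed at or before snapshotTime, or null.
    synchronized String read(String key, long snapshotTime) {
        String visible = null;
        long newest = Long.MIN_VALUE;
        for (Version v : chains.getOrDefault(key, new ArrayList<>())) {
            if (v.commitTime <= snapshotTime && v.commitTime > newest) {
                newest = v.commitTime;
                visible = v.value;
            }
        }
        return visible;
    }
}

final class SnapshotDemo {
    public static void main(String[] args) {
        TimeAuthority authority = new TimeAuthority();
        VersionedStore nodeA = new VersionedStore();
        VersionedStore nodeB = new VersionedStore();

        long t1 = authority.nextCommitTime();      // first distributed commit
        nodeA.write("x", "a1", t1);
        nodeB.write("y", "b1", t1);

        long snapshot = authority.currentTime();   // reader starts here

        long t2 = authority.nextCommitTime();      // later distributed commit
        nodeA.write("x", "a2", t2);
        nodeB.write("y", "b2", t2);

        // The reader sees the state as of its snapshot on both nodes.
        System.out.println(nodeA.read("x", snapshot) + " " + nodeB.read("y", snapshot));
    }
}
```

Because both nodes stamp the versions of one distributed commit with the same time from the authority, a reader that fixes its snapshot time once observes either all or none of that commit's writes on every node. A production design would also need to handle commits that are still in flight when a snapshot time is taken, which this sketch leaves out.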

Relaxed Dependence Tracking for Parallel Runtime Support

Memory access dependence tracking is a fundamental component for building many runtime support systems. Existing forms of runtime support slow programs significantly in order to track (i.e., detect or control) an execution's cross-thread memory access dependences accurately. This paper investigates the potential for runtime support to hide latency introduced by dependence tracking, by tracking dependences in a relaxed way—meaning that not all dependences are tracked accurately. The key challenge in relaxing dependence tracking is to preserve both the program's semantics and the runtime support's guarantees. We present an approach called relaxed tracking (RT) and demonstrate its practicality by building two types of RT-based runtime support. Our evaluation shows that RT hides much of the latency incurred by dependence tracking, although RT-based runtime support incurs costs and complexity in order to handle relaxed dependence information. By demonstrating how to relax dependence tracking to hide latency while preserving correctness, this work shows the potential for addressing a key cost of dependence tracking, thus advancing knowledge in the design of parallel runtime support.

Paper: PDF
Talk: Slides (Google Slides)
Source code: Includes implementations of RT, an RT-based recorder, and an RT-based STM.

Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics

Transactional memory offers a high-level abstraction for writing concurrent programs. Existing software transactional memory systems are impractical because they add high overhead and often provide weak progress guarantees and/or weak semantics. LarkTM is a new software transactional memory system that is fast and provides strong atomicity/isolation. It is significantly faster than DeuceSTM, IntelSTM, and NOrec; scales well; and provides strong progress guarantees that are not usually found in other systems.

Source code: Includes implementations of LarkTM-O, LarkTM-S, NOrec (JVM), and IntelSTM (JVM).

Benchmark: STAMP (Java version)

“We believe it has a good potential to impact future Java TM systems.” -- PPoPP 2015 program committee.

Memcached Design on High Performance RDMA Capable Interconnects

In data retrieval and analysis, data lookups can be expensive, and Memcached, a distributed memory caching layer, is used to speed them up. Memcached is implemented using traditional BSD sockets, and although the socket interface provides portability, it entails additional processing and multiple message copies. Meanwhile, High-Performance Computing (HPC) has adopted advanced interconnects (e.g., InfiniBand, 10 Gigabit Ethernet/iWARP, RoCE) that provide low network latency, high bandwidth, and low CPU overhead. This project provides a novel design of Memcached for RDMA-capable networks. It extends the existing open-source Memcached software and makes it RDMA capable.


  • Integrated Memcached with the INCR communication library for InfiniBand
  • Benchmarked Memcached + InfiniBand against Memcached + sockets/10GigE, Memcached + sockets/SDP, and IPoIB
  • Reduced Memcached’s latency for retrieving 4 KB of data by a factor of four compared with 10GigE, and improved throughput by a factor of six