Skip to content

Nesting Locks For Finer Control

April 8, 2012

Posts on this blog has been non-technical lately primarily because I work on a secretive project, and haven’t been able to share what I’m up to, but it’s time to write something new even if much of the detail remains hidden. I’ve concealed what the system actually does, so please excuse the vagueness of the problem description.

Recently I encountered a synchronization problem that couldn’t be solved using a plain mutex, or even a read-write lock. But it turns out the nesting of these two locks gave a solution. Even though I am happy with the way it works, I briefly describe why this isn’t the best solution, so beware if you feel like copying this approach.

The problem, obscured

Imagine a storage system. There’s three main activities, and several minor ones. Items 1-3 are major operations, and I’ve lumped the minor ones into item 4:

  1. Put something into storage
  2. Retrieve something from storage
  3. Run periodic maintenance on the storage
  4. Get state from the storage

Now here are the synchronization contraints on the various actions, with average running time:

OperationRunning timeConstraint
Put50msExclusive
Get50msExclusive (GetState allowed)
Maintenance10sExclusive and Shared (GetState allowed)
GetState10msShared


The first question that arises from this table is why should a Get need exclusive access, if it only reads what was already written to storage? The reason is that internal data may be modified during a read. This is due to optimizing the Put path by deferring updates to some state until the item is fetched.

The second question about this table is what is meant by “Exclusive and Shared” access during a Maintenance task? This is because the task runs for many seconds, so exclusive access the entire time is unacceptable, as we could never expect anyone (or thing) to wait more than a fraction of a second for data. So the maintenance is broken into three phases:

Maintenance PhaseRunning timeConstraint
Phase 150msNo Put or Get
Phase 210sNo Put
Phase 310msNo Put or Get


These extra contraints show that read-access to data is blocked only during the short first and last phases, while write-access is blocked the entire lengthy duration of the maintenance. We cannot allow any Put to occur in-between the phases or else the maintenance will result in corrupt data. Even though new data cannot be stored during this period, existing data remains accessible. This was an acceptable trade-off for this system.

Another limitation (this is more design decision than constraint) is that locking occurs at the entry to these operations, i.e. the underlying system is a black-box of thread un-safe code, so the synchronization wraps the entire operation.

Given the above constraints, it should be obvious that a single mutex over the operations does not help. If not for the maintenance task, a mutex would be fine for everything else. But a mutex excludes Get commands during Maintenance.

At first glance, it seems like a read-write lock works, since it provides either exclusive or shared access to a resource, and could guard against Put during Maintenance, while still allowing Get. But because Get also requires exclusion, it’s not a perfect fit either.

A superior solution

The solution I arrived at uses a pair of locks to gain finer control. Before describing this approach, I must suggest an alternative, better technique: use a combination of condition variable and state. This kind of problem is essentially why the monitor construct exists. So this is clearly an example of “Do as I say, not as I do.”

You may wonder why even choose to write about this inferior solution. I am because I found it to be an interesting exercise. In retrospect this solution now seems obvious, but it wasn’t immediately (at least to me)… it took some trial-and-error to get the maintenance task safe while meeting the other constraints.

This solution

Even though the operations API is treated as monolithic, the approach comes from dividing the operations into two sections: first, the functions that Get stuff, and second, everything else. Now, we use two locks in tandem, one read-write lock, and one mutex. A Get function will always exclude another Get, so the mutex guards these type of operations. The read-write lock guards everything else, and here is the important insight, including the mutex.

Using the read-write lock to guard the mutex provides two things: safety against deadlock, and proper flow control for all the scenarios above. Protection comes from the fact that when using nested locks, always acquiring them in a fixed order prevents a kind of deadlock where each thread has one of the locks, but neither can proceed because they can’t obtain the other.

As for flow control, this table shows the operations:

OperationLocking sequence
Putwrite lock
Getread lock, then get mutex
Maintenance Phase 1read lock, then get mutex, test and set maintenance flag
Maintenance Phase 2keep read lock, but release mutex
Maintenance Phase 3keep read lock, but get mutex, clear maintenance flag when done
GetStateread lock


During a Put nothing else can happen, meeting the Exclusive criteria of Table 1. Every other operation requires the a shared lock, meaning anything but Put may happen. The second level of locking is used only to exclude Get operations.

Lastly we have Maintenance. The sequence of locking the phases was inspired by the upgradable lock concept, in which a lock may change its exclusivity while not fully releasing. In this case it must obtain and release Get protection for the first and last phase, all the while preventing Put.

There’s one last detail, which is that the current double-locking scheme can’t stop multiple maintenance sessions from running simultaneously. Since this event isn’t meant to be triggered often, it is unlikely to happen. But to be safe, a flag is set during the operation so that another one will bail early if set. Introducing this extra state only reinforces the idea that the condition variable approach mentioned earlier is better.

An aside on system design

One last comment is why this problem arises in the first place, and why it was the first time I encountered it despite working in multi-thread systems for a long time. Other projects were made thread-safe by distributing the use of synchronization mechanisms across the various objects and functions of the code base. In other words, typically any data structure that wasn’t naturally thread-safe was protected at the point nearest its use (for example, putting a lock around the get and set functions of a hash table).

In this project, all locking happens at the API level. Synchronization is a gate-keeper into the rest of the code, which is treated as a black box. There are trade-offs in this approach. The main benefit here is that none of the internals need be thread-safe, and this simplifies much of the code. The main downside is that such coarse granularity may lead to performance loss because locks must be held longer. It’s not exactly minimizing the critical region. For this particular project, this trade-off is OK, but it’s definitely something to be aware of.

Some code

To use this mechanism, the locking code is wrapped in classes. This is primarily to enforce the RAII principal: a lock is obtained by instantiating an object, and better yet, automatically released once the object goes out of scope. This allows exiting early from a critical region (either by return or exception) without forgetting to unlock something.

Here are the wrapper class implementations:

class DualLock {
 private:
  pthread_rwlock_t rwlock_;
  pthread_mutex_t mutex_;
 
 public:
  DualLock() {
    pthread_mutex_init(&mutex_, NULL);
    pthread_rwlock_init(&rwlock_, NULL);
  }
  ~DualLock() {
    pthread_mutex_destroy(&mutex_);
    pthread_rwlock_destroy(&rwlock_);
  }
 
  // Scoped lock for Put operations
  class ExclusivePut {
   public:
    explicit ExclusivePut(DualLock &l) : dual_lock_(l) {
      pthread_rwlock_wrlock(&dual_lock_.rwlock_);
    }
    ~ExclusivePut() {
      pthread_rwlock_unlock(&dual_lock_.rwlock_);
    }
   private:
    DualLock &dual_lock_;
  };
 
  // Scoped lock for Get operations
  class ExclusiveGet {
   public:
    explicit ExclusiveGet(DualLock &l) : dual_lock_(l) {
      pthread_rwlock_rdlock(&dual_lock_.rwlock_);
      pthread_mutex_lock(&dual_lock_.mutex_);
    }
    ~ExclusiveGet() {
      pthread_mutex_unlock(&dual_lock_.mutex_);
      pthread_rwlock_unlock(&dual_lock_.rwlock_);
    }
   private:
    DualLock &dual_lock_;
  };
 
  // Scoped lock for State and Maintenance
  class Shared {
   public:
    explicit Shared(DualLock &l) : dual_lock_(l) {
      pthread_rwlock_rdlock(&dual_lock_.rwlock_);
    }
    ~Shared() {
      pthread_rwlock_unlock(&dual_lock_.rwlock_);
    }
 
    // Upgrading from Shared to Exlusive requires the
    // Shared lock to be owned, this is used by Maintenance
    class Exclusive(Shared &shared) {
      explicit Exclusive(DualLock &l) : dual_lock_(shared_.dual_lock_) {
        pthread_mutex_lock(&dual_lock_.rwlock_);
      }
      ~Exclusive() {
        pthread_mutex_unlock(&dual_lock_.rwlock_);
      }
     private:
      DualLock &dual_lock_;
    };
 
   private:
    DualLock &dual_lock_;
  };
};

Lastly, Let’s finish with some pseudo-code to demonstrate the locks in use:

class ApiWrapper {
 private:
  RealImpl *impl_;
  DualLock lock_;
  bool maintain_flag_;
 
 public:
  bool Put(const string &key, const Thing &thing) {
    DualLock::ExclusivePut ep(lock_);
    return impl_->Put(key, thing);
  }
 
  Thing Get(string key) {
    DualLock::ExclusiveGet eg(lock_);
    return impl_->Get(key);
  }
 
  bool Maintain() {
    DualLock::Shared shared(lock_);
 
    // Phase 1 begin with exclusive
    {  // Begin block for lock scoping
      DualLock::Shared::Exclusive ex(shared_);
      if (maintain_flag_) return false;
      maintain_flag_ = true;
      impl_->MaintainPhase1();
    }
 
    // Phase 2 begins now, and in the meantime
    // Get operations are allowed
    impl_->MaintainPhase2();
 
    // Phase 3 begin with exclusive
    {  // Begin block for lock scoping
      DualLock::Shared::Exclusive ex(shared_);
      impl_->MaintainPhase3();
      maintain_flag_ = false;
    }
    return true;
  }
 
  State GetState() {
    DualLock::Shared shared(lock_);
    impl_->GetState();
  }
};
No comments yet

Leave a Reply

Your email address will not be published. Required fields are marked *