Idiomatic resource management

Posted: 2022-09-24

Managing expensive resources is a recurring problem in systems programming. Common examples include mutex critical sections and file descriptors, but reference counts, cryptographic key material, RPC contexts, and more require similar management. Most programming languages offer idiomatic support for cleaning up such resources.

I worked on large-scale distributed systems at Google. I’ve developed in C++, Python, and Go. As I start to learn OCaml, I am adapting to a very different set of idioms. OCaml is a functional garbage-collected language, so the lexical scopes which naturally structure imperative languages are not present. However, even the three imperative languages encourage different frames of mind.

C++ and RAII

In C++, the most common approach is to use the RAII pattern to acquire and release resources at predictable times. For example, a mutex lock[1] guarantees that when its destructor is called, whether due to function return, explicit destruction, exception stack unwinding, or any other reason,[2] the mutex is released and another thread may begin its critical section.

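#include <string>
#include <utility>

#include "absl/base/thread_annotations.h"
#include "absl/synchronization/mutex.h"

class ThreadSafeIdentifier {
 public:
  void set_val(std::string str) {
    // The lock is held until l is destroyed at the end of the function.
    absl::MutexLock l(&mu_);
    val_ = std::move(str);
  }

  std::string get_val() const {
    absl::MutexLock l(&mu_);
    return val_;
  }

 private:
  // mutable so that the const getter can still take the lock.
  mutable absl::Mutex mu_;
  std::string val_ ABSL_GUARDED_BY(mu_);
};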

Several resources work this way. A std::unique_ptr object is constructed with an allocated object, and deallocates that object as part of its destructor. A std::shared_ptr behaves similarly for a reference-counted object.[3] These tools allow automatic management of heap memory[4] without forcing all objects to the heap.[5]

#include <memory>
#include <utility>

// Frobber is assumed to be a complete type defined elsewhere.
void Frobnicate(std::unique_ptr<Frobber> obj) {
  // This does not compile, since unique_ptrs can't be copied...
  //    std::unique_ptr<Frobber> obj2 = obj;
  // ...but can be moved.
  std::unique_ptr<Frobber> obj2 = std::move(obj);
  // A shared_ptr can also "take over" a unique_ptr...
  std::shared_ptr<Frobber> shared_obj = std::move(obj2);
  // ...and shared_ptrs can be copied to add a reference count.
  std::shared_ptr<Frobber> shared_obj2 = shared_obj;
}

Finally, more complex resources can be managed on the stack with similar behavior. For example, a grpc::ClientContext is often stack-allocated in simple RPC clients. This makes it easy for the reader to understand the scope of a client override:

bool Frobber::RemoteFrobnicate(const std::string& metadata_value) {
  // Add metadata tagging only for this call.
  grpc::ClientContext ctx;
  ctx.AddMetadata("my_metadata", metadata_value);

  FrobnicateRequest req;
  FrobnicateResponse resp;
  grpc::Status status = remote_frobber_stub_->Frobnicate(&ctx, req, &resp);
  return status.ok() && resp.successful_frobnication();
}

Go and defer

Go initially aimed to be a systems language with similar performance to C++. Its designers have backed off from that, but it’s still an efficient garbage-collected language[6] with good tools for structured concurrency. It also has a flexible resource management idiom.

Idiomatic Go style tends to be verbose and explicit, and its resource management is no exception. A common resource is context.Context.[7] Like the C++ grpc::ClientContext, a context.Context is meant to be scoped to a logical group of operations. Unlike in C++, the structure of the language doesn’t provide that scoping automatically.

Instead, Go resources often provide a cleanup function. The programmer can defer the cleanup call to get a guarantee similar to stack unwinding in a C++ program.

func RunOrTimeout(ctx context.Context, t time.Duration, f func(context.Context) error) error {
  ctx, cancel := context.WithTimeout(ctx, t)
  // cancel() cleans up ctx's timer and other resources, so should be called as
  // soon as f(ctx) returns.
  defer cancel()
  return f(ctx)
}

Deferred functions are similar to destructors, but they can also update named return values before those are passed back to the caller. For example, the following function will always attempt to close file, and will return the first error encountered by WriteString or Close. A slight modification could combine the errors if both operations failed.

func WriteStringAndClose(file *os.File, contents string) (n int, err error) {
  defer func() {
    if closeErr := file.Close(); closeErr != nil && err == nil {
      err = closeErr
    }
  }()
  return file.WriteString(contents)
}

Python and with

Python is also garbage-collected, and uses a traditional try/except/finally mechanism for cleanup:

def frobnicate(frob_elems):
  frobber = unchecked_frobber(qux="qux_key")
  try:
    for elem in frob_elems:
      frobber.unchecked_frobnicate(elem)
  finally:
    frobber.release_resources()

This is verbose and easy to forget, since exceptions are separate from the return values and signatures of the relevant functions. A more idiomatic and reliable mechanism is to use a context manager.[8] For example, file objects have context managers which close the file regardless of how the context manager’s block is left:

def generator_to_file(filename, next_line_generator):
  with open(filename, 'w') as file_obj:
    for s in next_line_generator():
      file_obj.write(s)
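One way to check the "regardless of how the block is left" guarantee is to raise inside the block and inspect the file object afterwards. The sketch below is only an illustration: the helper name file_closed_after_failure and the use of a temporary file are assumptions made for the example.

```python
import os
import tempfile


def file_closed_after_failure():
    """Return (closed, contents) observed after a with block is left by an exception."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # Only the path is needed; open() below creates its own descriptor.
    captured = None
    try:
        try:
            with open(path, "w") as file_obj:
                captured = file_obj
                file_obj.write("partial\n")
                raise RuntimeError("simulated failure mid-write")
        except RuntimeError:
            pass  # The exception still propagates out of the with block.
        with open(path) as check:
            contents = check.read()
        return captured.closed, contents
    finally:
        os.remove(path)
```

The file is closed, and the buffered write is flushed, even though the block never finished normally.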

Python context managers use magic methods named __enter__ and __exit__ under the hood, which is pretty unfriendly to library writers. However, Python wants to encourage this idiom and so its standard library provides contextlib to turn try/finally blocks into context managers!

import contextlib

# A little more library setup...
@contextlib.contextmanager
def scoped_frobber(**resource_kwargs):
  frobber = unchecked_frobber(**resource_kwargs)
  try:
    yield frobber
  finally:
    frobber.release_resources()

# ... results in much nicer user code
def scoped_frobnicate(frob_elems):
  with scoped_frobber(qux="qux_key") as frobber:
    for elem in frob_elems:
      frobber.unchecked_frobnicate(elem)
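For comparison, here is a rough sketch of what the same manager might look like when written directly against __enter__ and __exit__. UncheckedFrobber is a hypothetical stand-in for the frobber resource used above, and ScopedFrobber is an illustrative name, not part of any library.

```python
class UncheckedFrobber:
    """Hypothetical stand-in for the resource returned by unchecked_frobber."""

    def __init__(self, **resource_kwargs):
        self.resource_kwargs = resource_kwargs
        self.released = False
        self.seen = []

    def unchecked_frobnicate(self, elem):
        if self.released:
            raise RuntimeError("frobber used after release")
        self.seen.append(elem)

    def release_resources(self):
        self.released = True


class ScopedFrobber:
    """Class-based counterpart of the generator-based scoped_frobber."""

    def __init__(self, **resource_kwargs):
        self._resource_kwargs = resource_kwargs
        self._frobber = None

    def __enter__(self):
        # Acquire here rather than in __init__, so the resource only
        # exists while the with block is active.
        self._frobber = UncheckedFrobber(**self._resource_kwargs)
        return self._frobber

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit, early return, or exception.
        self._frobber.release_resources()
        # A falsy return value lets any in-flight exception propagate.
        return False


def scoped_frobnicate(frob_elems):
    with ScopedFrobber(qux="qux_key") as frobber:
        for elem in frob_elems:
            frobber.unchecked_frobnicate(elem)
    return frobber
```

The class version makes the exception contract explicit: __exit__ receives the exception details and decides whether to suppress them, which the contextlib version expresses through try/finally around the yield.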

OCaml and ~f

OCaml is a strict functional language, meaning that a function’s arguments are evaluated before the function is applied. I find that easier to understand than e.g. Haskell’s lazy evaluation. In particular it makes resource acquisition predictable.

OCaml is also garbage-collected, so resource release is complex to track. It uses an idiom similar to Python’s with blocks: a function is wrapped and provided with the resource, which is guaranteed to be released after the function is done.[9] OCaml’s type system makes it straightforward to chain functions together, or use higher-level functions to manipulate the return value.

For example, Async_unix.Reader.with_close calls a function f, closes the reader, then propagates f’s result.

open Async

let frobnicate_and_close existing_reader frobber =
  Reader.with_close
    existing_reader
    ~f:(fun () -> Frobber.frobnicate frobber existing_reader)

The full resource acquisition and release idiom looks like you’d expect:

open Async

let frobnicate_with_file frobber filename =
  Reader.with_file filename ~f:(fun reader -> Frobber.frobnicate frobber reader)

upon

A related function is Async_unix.Reader.file_lines, which I used in my Advent of Code harness. In its implementation, the file’s contents are turned into an Async.Pipe.t, and close-on-completion is accomplished by an Async.upon call which waits for the pipe.

The upon idiom is specific to Async, and it’s extremely flexible: any deferred action can trigger another, including cleanup as in this case. However, it’s much more opaque than a context and much easier to forget.

Imperative secrets

To better understand OCaml, I decided to take a look at the implementation of Async_unix.Reader.with_close. It turns out that while the API exposed to the programmer is functional, the implementation of Async_unix.Reader.close, which interacts with a mutable file descriptor and file state, is written like imperative code!

OCaml supports an imperative style where references can be dereferenced and mutated, mutation returns unit,[10] and expressions returning unit can be chained together with semicolons. Because OCaml is strict, this yields predictable evaluation order.

I was relieved, to be honest. I find it much easier to understand complex code written as steps than as a chain of monadic binding or function application, followed by some monad-specific way to evaluate the end of the chain. Strict evaluation and the imperative escape hatch are pragmatic choices which make it easier for the larger programming world to engage with OCaml.

Thoughts

In every language, programmers can idiomatically acquire and release resources without distracting from the “business logic” of their application.[11]

As a library writer, I’m partial to the flexibility and explicitness of C++. Because constructor and destructor call times are explicit, it’s possible to write API contracts enforced by the basics of the language without too much boilerplate. The language encourages libraries to release resources as soon as possible.

By contrast, the garbage-collected languages all require the library writer to think hard about when non-memory resources should be released, since memory reclamation itself is deferred to the garbage collector. We have to depend on conventions specific to those languages.

That’s not a bad thing. For library code especially, it’s important that future readers can grasp contracts and subtleties in the implementation. Using the language idiomatically makes future maintenance easier and improves a library’s chances of long-term success.

--Chris


Appendix: bash and trap

I have written my fair share of “load-bearing” bash scripts, for which I assume an eternal punishment awaits. Even bash has an idiom of sorts for resource acquisition and release.

One approach is to trap cleanup.

FILE="$(mktemp)"
trap 'rm "${FILE?}"' EXIT

# Do the work using filename ${FILE}

Another involves deferring cleanup to the kernel’s reference count.

FILE="$(mktemp)"
exec {FD}<>"${FILE?}"
rm "${FILE?}"

# Do the work using fd ${FD}, which is closed when the shell exits.

Finally, a mutex of sorts between processes on a single machine with a local filesystem can be implemented via flock and a subshell.

(
    flock "${FD?}";
    # Do critical section work
) {FD}>/var/lock/lockname

Note that the last approach is unreliable if your filesystem depends on the network, and unreliable locks are worse than no locks.


  1. absl::MutexLock comes from Abseil, standard C++ tooling used at Google.

  2. You won’t be surprised to learn that C++ has sharp edges. It is possible to release stack space without calling destructors, e.g. if you use setjmp/longjmp.

  3. The reference semantics underlying the return, move, and copy rules are subtle enough to deserve their own post.

  4. C++ puts the programmer in full control, which makes it possible to do all sorts of dangerous things. std::unique_ptr and std::shared_ptr are library objects, and the programmer is not required to use them. Even if they are used, you can dereference a null pointer or access freed memory by using a moved-from reference. With std::unique_ptr and std::shared_ptr it is more difficult to write a double-free bug, at least.

  5. Garbage-collected languages, by contrast, offer a stronger memory guarantee. They disallow weak references (bare pointers or std::weak_ptr) which circumvent reference counting. Garbage collection can also detect cyclic references, whereas simple counting cannot. Nothing is free, though: as discussed below, for more complex resources, the unpredictable cleanup time inherent in garbage collection pushes work up to the programmer. It’s also comparatively difficult for the language runtime to avoid heap allocations.

  6. Go hides its garbage collection well through clever implementation and escape analysis.

  7. I told you Go was verbose.

  8. The term “context” is used a lot!

  9. Also as with contextlib, it’s up to the library writer to remember exceptions and handle them.

  10. unit is an OCaml type which represents “no value” or “void.” See Real World OCaml for more info.

  11. Go is verbose enough to stress this claim.

