Idiomatic resource management
Posted: 2022-09-24
Managing expensive resources is a problem in systems programming. Common examples include mutex critical sections and file descriptors, but reference counts, cryptographic key material, RPC contexts, and more require similar management. Programming languages want to offer idiomatic support for cleaning up such resources.
I worked on large-scale distributed systems at Google. I’ve developed in C++, Python, and Go. As I start to learn OCaml, I am adapting to a very different set of idioms. OCaml is a functional garbage-collected language, so cleanup tied to the lexical scopes that naturally structure imperative languages is not available. However, even the three imperative languages encourage different frames of mind.
C++ and RAII
In C++, the most common approach is to use the RAII pattern to acquire and release resources at predictable times. For example, a mutex lock[1] guarantees that when its destructor is called, whether due to function return, explicit destruction, exception stack unwinding, or any other reason,[2] the mutex is released and another thread may begin its critical section.
```cpp
class ThreadSafeIdentifier {
 public:
  void set_val(std::string str) {
    absl::MutexLock l(&mu_);
    val_ = std::move(str);
  }

  std::string get_val() const {
    absl::MutexLock l(&mu_);
    return val_;
  }

 private:
  mutable absl::Mutex mu_;
  std::string val_ ABSL_GUARDED_BY(mu_);
};
```
Several resources work this way. A `std::unique_ptr` object is constructed with an allocated object, and deallocates that object as part of its destructor. A `std::shared_ptr` behaves similarly for a reference-counted object.[3] These tools allow automatic management of heap memory[4] without forcing all objects to the heap.[5]
```cpp
void Frobnicate(std::unique_ptr<Frobber> obj) {
  // This does not compile, since unique_ptrs can't be copied...
  // std::unique_ptr<Frobber> obj2 = obj;
  // ...but can be moved.
  std::unique_ptr<Frobber> obj2 = std::move(obj);
  // A shared_ptr can also "take over" a unique_ptr...
  std::shared_ptr<Frobber> shared_obj = std::move(obj2);
  // ...and shared_ptrs can be copied to add a reference count.
  std::shared_ptr<Frobber> shared_obj2 = shared_obj;
}
```
Finally, more complex resources can be managed on the stack with similar behavior. For example, a `grpc::ClientContext` is often stack-allocated in simple RPC clients. This makes it easy for the reader to understand the scope of a client override:
```cpp
bool Frobber::RemoteFrobnicate(const std::string& metadata_value) {
  // Add metadata tagging only for this call.
  grpc::ClientContext ctx;
  ctx.AddMetadata("my_metadata", metadata_value);

  FrobnicateRequest req;
  FrobnicateResponse resp;
  grpc::Status status = remote_frobber_stub_->Frobnicate(&ctx, req, &resp);
  return status.ok() && resp.successful_frobnication();
}
```
Go and defer
Go initially aimed to be a systems language with performance comparable to C++. The Go team has backed off from that, but it’s still an efficient garbage-collected language[6] with good tools for structured concurrency. It also has a flexible resource management idiom.
Idiomatic Go style tends to be verbose and explicit, and its resource management is no exception. A common resource is `context.Context`.[7] Like the C++ `grpc::ClientContext`, a `context.Context` is meant to be scoped to a logical group of operations. Unlike in C++, the structure of the language doesn’t provide that scoping automatically.

Instead, Go resources often provide a cleanup function. The programmer can `defer` cleanup to provide a similar guarantee to the stack unwinding in a C++ program.
```go
func RunOrTimeout(ctx context.Context, t time.Duration, f func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, t)
	// cancel() cleans up ctx's timer and other resources, so should be called as
	// soon as f(ctx) returns.
	defer cancel()
	return f(ctx)
}
```
Deferred functions are similar to destructors, but they can also update named return values before those are passed back to the caller. For example, the following function will always attempt to close `file`, and will return the first error encountered by `WriteString` or `Close`. A slight modification could combine the errors if both operations failed.
```go
func WriteStringAndClose(file *os.File, contents string) (n int, err error) {
	defer func() {
		if closeErr := file.Close(); closeErr != nil && err == nil {
			err = closeErr
		}
	}()
	return file.WriteString(contents)
}
```
Python and with
Python is also garbage-collected, and uses a traditional `try`/`except`/`finally` mechanism for cleanup:
```python
def frobnicate(frob_elems):
    frobber = unchecked_frobber(qux="qux_key")
    try:
        for elem in frob_elems:
            frobber.unchecked_frobnicate(elem)
    finally:
        frobber.release_resources()
```
This is verbose and easy to forget, since exceptions are separate from the return values and signatures of the relevant functions. A more idiomatic and reliable mechanism is to use a context manager.[8] For example, file objects have context managers which close the file regardless of how the context manager’s block is left:
```python
def generator_to_file(filename, next_line_generator):
    with open(filename, 'w') as file_obj:
        for s in next_line_generator():
            file_obj.write(s)
```
Python context managers use magic functions named `__enter__` and `__exit__` under the hood, which is pretty unfriendly to library writers. However, Python wants to encourage this idiom, and so its standard library provides `contextlib` to turn `try`/`except`/`finally` blocks into context managers!
```python
import contextlib

# A little more library setup...
@contextlib.contextmanager
def scoped_frobber(**resource_kwargs):
    frobber = unchecked_frobber(**resource_kwargs)
    try:
        yield frobber
    finally:
        frobber.release_resources()

# ...results in much nicer user code.
def scoped_frobnicate(frob_elems):
    with scoped_frobber(qux="qux_key") as frobber:
        for elem in frob_elems:
            frobber.unchecked_frobnicate(elem)
```
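The `contextlib` decorator is sugar over the underlying protocol. For comparison, here is a hand-written sketch of the same idea; the `ScopedFrobber` class and its `released` flag are hypothetical, standing in for real resource acquisition and release:

```python
class ScopedFrobber:
    """Hand-written context manager: __enter__ acquires, __exit__ releases."""

    def __init__(self, **resource_kwargs):
        # Acquire the (hypothetical) resource here.
        self.resource_kwargs = resource_kwargs
        self.released = False

    def __enter__(self):
        # Whatever __enter__ returns is bound by "with ... as".
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs however the block exits: fall-through, return, or an
        # exception unwinding through the block.
        self.released = True
        return False  # False means exceptions propagate to the caller


with ScopedFrobber(qux="qux_key") as frobber:
    print(frobber.released)  # prints: False
print(frobber.released)      # prints: True
```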
OCaml and ~f
OCaml is a strict functional language, meaning that a function’s arguments are evaluated before the function itself. I find that easier to understand than e.g. Haskell’s lazy evaluation. In particular it makes resource acquisition predictable.
OCaml is also garbage-collected, so resource release is complex to track. It uses an idiom similar to Python’s `with` blocks: a function is wrapped and provided with the resource, which is guaranteed to be released after the function is done.[9] OCaml’s type system makes it straightforward to chain functions together, or use higher-level functions to manipulate the return value.
For example, `Async_unix.Reader.with_close` calls a function `f`, closes the reader, then propagates `f`’s result.
```ocaml
open Async

let frobnicate_and_close existing_reader frobber =
  Reader.with_close
    existing_reader
    ~f:(fun () -> Frobber.frobnicate frobber existing_reader)
```
The full resource acquisition and release idiom looks like you’d expect:
```ocaml
open Async

let frobnicate_with_file frobber filename =
  Reader.with_file filename ~f:(fun reader -> Frobber.frobnicate frobber reader)
```
upon
A related function is `Async_unix.Reader.file_lines`, which I used in my Advent of Code harness. In its implementation, the file’s contents are turned into an `Async.Pipe.t`, and close-on-completion is accomplished by an `Async.upon` call which waits for the pipe.
The `upon` idiom is specific to `Async`, and it’s extremely flexible: any deferred action can trigger another, including cleanup as in this case. However, it’s much more opaque than a context and much easier to forget.
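To make the shape concrete, here is a hypothetical sketch of close-on-completion in that style; `Reader.lines`, `Pipe.closed`, and `don't_wait_for` are real `Async` functions, but this is my simplification, not the library’s actual implementation:

```ocaml
open Async

(* Hand out a pipe of the file's lines.  Cleanup is not tied to any
   lexical scope: `upon` schedules the close for whenever the pipe is
   closed by its consumer. *)
let lines_with_cleanup reader =
  let pipe = Reader.lines reader in
  upon (Pipe.closed pipe) (fun () -> don't_wait_for (Reader.close reader));
  pipe
```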
Imperative secrets
To better understand OCaml, I decided to take a look at the implementation of `Async_unix.Reader.with_close`. It turns out that while the API exposed to the programmer is functional, the implementation of `Async_unix.Reader.close`, which interacts with a mutable file descriptor and file state, is written like imperative code!
OCaml supports an imperative style where references can be mutated or dereferenced, those actions return `unit`,[10] and functions returning `unit` can be chained together with semicolons. Because OCaml is strict, this yields predictable evaluation order.
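As a minimal sketch of that imperative style (plain OCaml, no `Async`; `bump_twice` is my own illustrative function):

```ocaml
(* Each mutation returns unit, so the steps chain with semicolons and
   run top to bottom thanks to strict evaluation. *)
let bump_twice () =
  let counter = ref 0 in
  incr counter;               (* counter = 1 *)
  counter := !counter + 1;    (* counter = 2 *)
  !counter                    (* dereference to return the value *)

let () = assert (bump_twice () = 2)
```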
I was relieved, to be honest. I find it much easier to understand complex code written as steps than as a chain of monadic binding or function application, followed by some monad-specific way to evaluate the end of the chain. Strict evaluation and the imperative escape hatch are pragmatic choices which make it easier for the larger programming world to engage with OCaml.
Thoughts
In every language, programmers can idiomatically acquire and release resources without distracting from the “business logic” of their application.[11]
As a library writer, I’m partial to the flexibility and explicitness of C++. Because constructor and destructor call times are explicit, it’s possible to write API contracts enforced by the basics of the language without too much boilerplate. The language encourages libraries to release resources as soon as possible.
By contrast, the garbage-collected languages all require the library writer to think hard about when non-memory resources should be released, since memory itself is deferred to the garbage collector. We have to depend on conventions specific to those languages.
That’s not a bad thing. For library code especially, it’s important that future readers can grasp contracts and subtleties in the implementation. Using the language idiomatically makes future maintenance easier and improves a library’s chances of long-term success.
--Chris
Appendix: bash and trap
I have written my fair share of “load-bearing” bash scripts, for which I assume an eternal punishment awaits. Even bash has an idiom of sorts for resource acquisition and release.
One approach is to `trap` cleanup.
```bash
FILE="$(mktemp)"
trap 'rm "${FILE?}"' EXIT

# Do the work using filename ${FILE}
```
Another involves deferring cleanup to the kernel’s reference count.
```bash
FILE="$(mktemp)"
exec {FD}<>"${FILE?}"
rm "${FILE?}"

# Do the work using fd ${FD}, which is closed when the shell exits.
```
Finally, a mutex of sorts between processes on a single machine with a local filesystem can be implemented via `flock` and a subshell.
```bash
(
  flock "${FD?}"
  # Do critical section work
) {FD}>/var/lock/lockname
```
Note that the last approach is unreliable if your filesystem depends on the network, and unreliable locks are worse than no locks.
Footnotes

1. `absl::MutexLock` comes from Abseil, standard C++ tooling used at Google.
2. You won’t be surprised to learn that C++ has sharp edges. It is possible to release stack space without calling destructors, e.g. if you use `setjmp`/`longjmp`.
3. The reference semantics underlying the return, move, and copy rules are subtle enough to deserve their own post.
4. C++ puts the programmer in full control, which makes it possible to do all sorts of dangerous things. `std::unique_ptr` and `std::shared_ptr` are library objects, and the programmer is not required to use them. Even if they are used, you can dereference a null pointer or access freed memory by using a moved-from reference. With `std::unique_ptr` and `std::shared_ptr` it is more difficult to write a double-free bug, at least.
5. Garbage-collected languages, by contrast, offer a stronger memory guarantee. They disallow weak references (bare pointers or `std::weak_ptr`) which circumvent reference counting. Garbage collection can also detect cyclic references, whereas simple counting cannot. Nothing is free, though: as discussed below, for more complex resources, the unpredictable cleanup time inherent in garbage collection pushes work up to the programmer. It’s also comparatively difficult for the language runtime to avoid heap allocations.
6. Go hides its garbage collection well through clever implementation and escape analysis.
7. I told you Go was verbose.
8. The term “context” is used a lot!
9. Also as with `contextlib`, it’s up to the library writer to remember exceptions and handle them.
10. `unit` is an OCaml type which represents “no value” or “void.” See Real World OCaml for more info.
11. Go is verbose enough to stress this claim.