The C4 Project: Roadmap

The Roadmap

This page contains a high-level analysis of future directions for the C4 project. We provide it as a reference for potential contributors, or as inspiration for similar projects; it isn't a promise or guarantee of future functionality.

This page complements the individual projects' GitHub issue pages, which contain more concrete improvement suggestions.

This page (or, at least, this paragraph) was last updated on 16 February 2021.

Overview

These are the main areas of further work we have picked out for C4:

Language targets
Beyond Litmus
Fuzzer actions
More backends
Throughput
Richer analysis

Language targets

Hypothesis: C is possibly not the best target for finding bugs in concurrency implementations.

Possible reasons why C might not be a good target include:

conservatism in how C compilers approach atomics (LLVM has a lot of hard gating adjacent to FIXME comments);
maturity of compiler implementation efforts (in general, and specifically for atomics which are now nearly 10 years old);
some compilers we might want to target don't support C (eg. MSVC).

Suggestion: consider expanding C4 to handle other languages.

The main thing that would need to change is c4f's output code, which targets a sort-of Litmus/C creole (that both Litmus7 and c4f-c know how to turn into C), and may need teaching to target new languages. See 'Beyond Litmus'.

Another thing to bear in mind here is that c4f internally uses a language called FIR, effectively an ad-hoc C subset with metadata tags and tree simplifications. This means that any support c4f has for output languages needs to fit in the confines of FIR.

Possible other languages include:

C++ c4f #181: Supporting a subset of C++ that aligns well to c4f's internal language (FIR) would be relatively straightforward, and would immediately pay off with support for compilers such as MSVC which at time of writing do not support C atomics. Both C and C++ likely trigger the same codepaths in compilers, though, so this won't greatly improve compiler coverage.
C++ with transactional extensions: Both c4f's internal language (FIR) and external Litmus/C output have some support for C++ transactions, as we considered supporting them at one point. The support didn't get finished (partly because it would imply the same changes needed for targeting C++), and at time of writing only GCC seems to support C++ transactions, but if this situation changes in the future then transactional extensions may be interesting to target.
OpenCL: OpenCL's concurrency model is fairly similar to C's, with additional workgroup partitioning we could introduce into FIR. OpenCL compilers are arguably more likely to exhibit bugs than C compilers. Running tests may require some engineering to replace Litmus with an OpenCL kernel runner.
Rust: Rust targets LLVM, and therefore a lot of LLVM's atomics set-up, but its use of ownership, move semantics, and 'fearless concurrency' make it an interesting possible target. Looking into Rust-on-Cranelift rather than Rust-on-LLVM may be interesting.

Beyond Litmus

Hypothesis: we can find more bugs if we target language that don't fit easily into the Litmus language.

Examples include:

compound data types (structs, unions, complexes);
non-thread functions;
thread spawn patterns other than a fixed set of threads (including compute kernels);
returns and other not-quite-Litmus-safe control flow types.

Many of these have appeared in concurrency-related bug reports, and we would like to investigate them (eg. by adding support for generating them to c4t), but they don't fit well into the Litmus/C format used in almost every stage of the pipeline.

Another area where we may need to break with Litmus compatibility is supporting other languages.

Suggestion 1: make C4 carry proprietary Litmus extensions (and delitmusify them). c4f #164

We could extend C4's Litmus/C language such that, while it remains compatible with input from Memalloy, it has extensions to capture some of the features above. This would break compatibility with Litmus7, and so we might need to add a shim layer that addresses the incompatibility or target a different backend.

c4f already has a rudimentary 'delitmus' command that converts Litmus/C to C with auxiliary information condensed into a separate machine-readabler file. This could be a good starting point for negotiating between an extended Litmus/C and backends.

Suggestion 2: migrate away from litmus entirely.

A more extreme approach might be to decouple FIR from Litmus completely, and have c4f take something else as its main test input and output. This might be a good approach, but would require conversion from Memalloy output (or Memalloy support).

Fuzzer actions

Hypothesis: c4f stresses compilers, but would be more useful if it stressed their concurrency support in a more targeted manner.

A cheap and easy way of improving C4 is to add more actions to c4f. While this requires some programming against c4f's deep module soup, it is an almost endless source of novelty and potential bug finding.

Suggestion 1: add concurrency patterns.

Almost all of c4f's actions at time of writing are either control-flow related (which, perversely, are not too bad at finding control-flow bugs!) or introduce atomic actions in redundant or sequentially-minded patterns. There is a lack of actions that actually graft in atomic action patterns that perform meaningful if semantically redundant communication between threads.

For instance, we could introduce a pattern whereby a thread is set to spin on a generated variable that is then changed in another thread. Failure to perform the change would cause the thread to spin forever. (This would, however, manifest as a run timeout rather than a flagged result, and so is probably not a useful pattern in the current C4 setup where run timeouts are not always indicative of errors).

Backends

Hypothesis: expanding c4t to target more backends will help free up more possibilities.

At time of writing, c4t supports testing via litmus7 only; while it has basic understanding of how to use simulators such as herd7 and RMEM, they aren't fully wired into its testing workflow (mainly because they require more set-up than just handing litmus tests to a program). Supporting other backends would have several advantages:

Adding simulators to c4t would allow for a trade-off between exhaustiveness and freedom from machine quirks on the one hand, and higher turnaround and possibly greater ISA coverage on the other.
Non-Litmus7 simulators may be more amenable to accepting test-cases that go beyond the usual Litmus test format.
Other backends may have useful features, such as partial results collection or running for a particular amount of time rather than iterations, that improve throughput or resilience.
Defend against bugs in backends, either by helping us positively detect them or providing a form of differential testing.

c4t does have some progress towards a multi-backend setup:

Backend jobs can specify input and output targets (eg, maps Litmus/C to C, or C to assembly Litmus, etc.), but most parts of c4t still hardcode in an expectation that we map from Litmus/C to executables.
A backend for running c4f's delitmusifier exists, which could be useful if c4t learned how to chain backends (so, it could chain the delitmusifier onto another backend that expects C).

Suggestion: support more backends.

Some backends that might be nice to support include:

herd7: Well-known simulator for Litmus tests; low coverage of instruction sets, but mature.
RMEM: Also a simulator for Litmus tests; seems more constrained in which ISAs are supported, but supports much more of each. We have experimented with using RMEM to test with c4f and Memalloy manually, but didn't get far; there will still need to be some sanitisation of the output of compilers, loop unrolling etc. Consider exploring RMEM's ELF input support.
phenolphthalein: Side-project of one of the c4 authors which is trying to be a replacement for litmus7 in c4-style tests. At time of writing, it's not as good at litmus7 for exploring weak behaviour, nor is it under any semblance of active or officially sanctioned behaviour. That said, it's mentioned on this roadmap in case it could be useful.

Throughput tuning

Hypothesis: Throughput is important because of the nondeterministic nature of both fuzzing and concurrency bugs, and C4's throughput can be improved.

Being a research project, C4 has not been designed with throughput in mind (except for the high degree of parallelism in c4t, and even then there is room for improvement).

Suggestion: re-engineer parts of c4f and c4t for performance.

Concrete areas of improvement include:

Profiling c4f and c4t.
Rewriting, or at least re-architecting, c4f for efficiency. Currently, c4f is written in a very pure/immutable-focused form of OCaml, and this likely impacts its performance especially where things construct, traverse, and tear down linked lists.
Bin-packing runs of concurrency tests such that, when a test doesn't exhaust all cores, it can run in parallel with another test.
Removing synchronisation between the test director and machine node in c4t, such that the director can perform set-up for the next test cycle while waiting for the node to run the current one. c4t #118
Investigating more efficient ways of transferring tests to, and results from, remote machines.
Offloading as much of each test's runtime to the remote machine as possible, saving on the amount of code that needs to be copied to remote machines. (This is one of the design decisions with phenolphthalein: it has one big runtime that loads small tests in dynamically.)

Richer analysis

Hypothesis: It will always be the case that, when we write C4 papers, we need analysis and statistics that C4 doesn't yet support.

c4t has a lot of support for pulling out aggregate metrics, statistics, saved tests, and other useful artefacts from test runs. This support has slowly accumulated over the process of trying to write papers on C4, where we have always found that we need more information about tests than we have on hand.

Consider connecting c4t to a SQL database. c4t #129

This is a heavyweight suggestion, especially since it likely introduces either an external dependency or (for SQLite) a C FFI dependency into c4t, but it is a highly future-proof and efficient way of handling the problem of needing rich analysis.

Effectively, doing this would replace c4t's current JSON file based statistics aggregation, and instead have such requests be served by SQL queries on a database full of test cycle information.