Abstract

Traditional imperative synchronous programming languages heavily rely on a strict separation between data memory and communication signals. Signals can be shared between computational units but cannot be overwritten within a synchronous reaction cycle. Memory can be destructively updated but cannot be shared between concurrent threads. This incoherence makes traditional imperative synchronous languages cumbersome for the programmer.

The recent definition of sequentially constructive synchronous languages offers an improvement. It removes the separation between data memory and communication signals and unifies both through the notion of clock synchronised shared memory. However, it still depends on global causality analyses which precludes procedural abstraction. This complicates reuse and composition of software components.

This report shows how procedural abstraction can be accommodated inside the sequentially constructive model of computation. We present the Sequentially Constructive Procedural Language (SCPL) and its semantic theory of policy-constructive synchronous processes. SCPL supports procedural abstractions using policy interfaces to ensure that procedure calls are memory safe, wait-free and their scheduling is determinate and causal. At the same time, a policy interface constrains the level of freedom for the implementation and subsequent refactoring of a procedure. As a result, policies enable separate compilation and composition of procedures.

We present our extensions abstractly as a formal semantics for SCPL and motivate it concretely in the context of the open-source, embedded, real-time language Blech.

I. Motivation & Contribution

Production cost and energy efficiency requirements drive embedded systems towards consolidated platforms utilising multicore processing units that share common resources. Software architectures for such systems are growing in complexity and integration effort. It is imperative to maintain a deterministic and predictable interplay between software components and prevent data races or deadlocks by design, especially for safety-critical embedded systems. One approach for synchronising concurrent systems are classical locking mechanisms such as mutexes or spin-locks. However they are inefficient for frequent, fine-grained communication and hard to debug.

Where the focus is on control-flow, imperative synchronous programming (ISP) languages such as Esterel [1], Quartz [2], PRETC [3], ForeC [4] or Céu [5] are a good choice to develop

An abridged version of this report has appeared in: Forum on Specification and Design Languages (FDL 2020), Kiel, Germany, 15-17 September 2020.
reactive, safety-critical embedded software. Based on precise mathematical foundations, ISP utilises mechanisms such as logical clocks and clock-aligned signals for inter-module synchronisation and communication. The compiler verifies determinacy by static program analysis and thus relieves the developer from the burden of manually solving synchronisation problems.

However, traditional ISP languages come with rather specialised syntax and rigidly synchronised signals as the only form of decoupling, which precludes sharing of complex data structures and multiple updates within a clock cycle. While current ISP languages enable the building of complex systems through white-box [6]–[8] or grey-box [9] modules they do not properly support black-box abstraction. Notably, “.. the Esterel language has no mechanism for separate compilation or pre-compiled component libraries..” [10, p. 34]. Procedural abstractions for ISPs exist but they are not an integrated part of the source language and its well-defined semantics. The modular structures generated by compilers [11]–[13] for ISP are based on shared memory and cannot be coded in ISP itself. This makes SP programming an archane experience for main-stream programmers familiar with procedural abstractions.

Recently, the sequentially constructive model of computation, which has given rise to ISP languages such as SCCharts [8], has reconciled shared memory and signals. In [14] this approach has been extended to policy-coherent shared objects that can encapsulate general abstract data structures, but without considering modularisation. In this report we show that using those techniques, ISP can achieve black-box procedural abstraction on general abstract data structures. We present the core language SCPL and its structural operational semantics in a generic memory model of structured data whose coherence is protected by policy interfaces. An SCPL process \( P \) which is constructive with respect to a given interface \( C \) guarantees memory safe, deterministic and deadlock-free execution in all memory stores coherent for this interface \( C \). Together with its interface, a pair \( (P, C) \) then constitutes a generic unit for procedural composition. A procedure can be mapped safely into another memory context \( C' \) through interface extensions \( f : C \subseteq C' \) that preserve policy-constructiveness. The scheduling of a procedure \( (P, C) \) in its calling context is wait-free and entirely determined by its interface \( C \) and the map \( f \). The procedure \( P \) is treated like a black-box and can be replaced by any step function functionally equivalent to \( (P, C) \). In this way, procedures operating on shared data can be assembled as first-class, black-box actions just like the reading or emission of a signal in conventional ISP. SCPL reconciles the high-level syntax of ISP with intermediate and low-level computational structures under a uniform and constructive synchronous semantics. It provides the same safety guarantees and predictability as conventional ISP but in the shared memory context familiar to main-stream programmers. It promotes synchronous programming less as a language but as a “mark-up” of conventional syntax via scheduling interfaces.

The Blech language developed at Bosch is among the first ISP languages with first-class procedures that does not depend on white-box inlining. However, there is no formal semantics for it, yet. In this report, we fill this gap. In doing so we adapt and extend the theory of [14] and provide the calculus SCPL. It can serve as an intermediate format not only for Blech but also other sequentially constructive languages, like SCCharts [8].

II. INTRODUCTORY EXAMPLE

Let us consider a simple reactive system and its implementation in Blech to illustrate the concepts of sequentially synchronous reaction, synchronized shared memory and black-box procedural abstraction. Figure 1 depicts a reactive system in Blech syntax. System consists of two decoupled procedures (called activities in Blech) implementing a controller component
Ctrl, a component Sample for external data input and imports module MAC. The latter implements a multiply–accumulate operation in software by the two decoupled procedures MacOp and Accumulate. The MAC operation in MacOp computes the product of two multipliers \( m_1 \), \( m_2 \) and adds its result to an accumulator \( \text{Accumulate} \), i.e., \( \text{acc} \leftarrow \text{acc} + m_1 \cdot m_2 \). Usually, MAC operations are implemented by highly-optimised circuits and are part of DSP and floating-point execution units [15], [16].

The procedures of System communicate via shared memory, passed in procedure calls by reference. Module System imports MAC and shares its data structures. Procedure Sample interacts with the external environment, providing a pair of values of type float64 fetched as inputs from the environment via external function get. Note the multiple writes to \( \text{arr} \) in one reaction. The controller Ctrl instantiates the shared data structures \( \text{arr} \) and \( \text{s:State} \) and concurrently combines calls of the activities Sample and Accumulate. Note, that Sample and Ctrl are declared as singleton, meaning that there may exist at most one instance (thread) of such a subprogram in a concurrent context. This is usually required by threads that interact directly with external or limited resources.

MAC declares the accumulator state by the structure State, i.e., by two sample values \( m_1, m_2 \) and the accumulator acc, typed as double precision floating-point numbers. Each procedure declares its in- and output parameters. In Blech this is expressed by two separate lists for input and output, e.g., the interface of procedure Accumulate declares \( \text{arr: [2]float64} \) as an input that can be read and \( \text{s:State} \) as an output that can be read and written during execution. The procedure Accumulate concurrently combines the mapping of the array \( \text{arr} \) to \( m_1 \) and \( m_2 \), a call of procedure MacOp(s)(s.acc) to perform the multiply-accumulate, and a third thread \( \text{await s.acc > CAP} \) to detect that acc exceeds the threshold, which triggers the forced termination of the first two threads. The preemption is indicated by indicated by the keywords weak in the cobegin parallel construct.

All activities are reaction procedures which operate in synchronous lock-step. At each tick
of a global logical clock, each activity performs one iteration of its behaviour until it either
pauses at an await statement waiting for another cycle, or terminates by reaching the end
of the activity. In the example, exactly two data samples are multiplied and accumulated
into the state \( s \) during each reaction cycle, called a macro-step. The physical timing which
relates the clock iteration and the data sampling determines how much computation can be
performed during each macro-step. This needs to be calibrated by timing analysis. Assuming
this is done, each reaction is considered to happen \emph{instantaneously} in zero time (Synchrony
Hypothesis) in the high-level semantics.

Our example implements a stream-processing function that could also be programmed in
a data flow language like Lustre. Such languages can provide a higher level of abstraction
when control-flow is determined by static functional relationships between signals. Yet, in
many practical applications, control-flow is not static or functional and more naturally de-
composed through interactions in shared memory. For instance, the Accumulate activity in
Fig. 1 must terminate when the accumulated signal on \( s\text{.acc} \) surpasses a threshold value \( \text{CAP} \). This is achieved by the watchdog thread await \( s\text{.acc} > \text{CAP} \) running in parallel with \( \text{MacOp}(s)(s\text{.acc}) \). The latter, which is labelled as a weak thread is preempted by the
watchdog when the threshold condition is detected. When \( \text{MAC}\text{.Accumulate} \) terminates, then
also the main activity \( \text{Ctrl} \) terminates and returns a final value. Preemption and terminating
stream processes are not naturally coded in pure Lustre.

Control-flow constructs, of course, can be simulated in data flow, see e.g. SCADE [17].
Yet, this introduces the problem that efficient control-flow code exploiting shared memory
and inplace destructive updates must be “rediscovered” by the compiler. Where the shared
memory procedural code such as in Fig. 1 can be provided by the programmer it is often
more efficient. The catch is that procedural imperative code does not give the same degree of
safety guarantees that come with domain-specific languages such as Lustre, Signal or Esterel.
Shared memory multi-threading introduces many tangled risks of data races and memory
incoherence that seem hard to avoid. In this report we aim to show that the sequentially
constructive model of synchronous computation can be applied to marshall procedural multi-
threaded code in much the same way that the static analyses of domain-specific ISP languages
safeguard interactions of synchronous modules.

To safeguard against memory races, Blech functions and procedures are implicitly wrapped
by \textit{policy interfaces} that specify the standard data types of their parameters, the memory access
methods applied to these and their causal ordering. A policy interface acts as an assumption-
guarantee contract to enable black-box abstraction and separate compilation. For instance,
the declaration \( \text{MacOp}(ms)(a \text{ shares } ms\text{.acc}) \) tells us that the parameter \( ms \) in the first list
is read-only whereas \( a \) in the second list is potentially written. This information is used to
resolve the scheduling order in Accumulate where the procedure call \( \text{MacOp}(s)(s\text{.acc}) \) is
concurrent (the \texttt{cobegin...with...end} construct) with the watchdog testing \( s\text{.acc} > \text{CAP} \).
Assuming the data type \texttt{float64} of the shared cell \( s\text{.acc} \) specifies a regular data flow
variable, the policy interface of the calling site Accumulate will impose the “write-before-
read” causal order \( s\text{.acc}\text{.wr} \rightarrow s\text{.acc}\text{.rd} \) to eliminate data races. As a consequence, the call
\( \text{MacOp}(s)(s\text{.acc}) \) will be scheduled before \( s\text{.acc} > \text{CAP} \). As scheduling precedences go,
the parameter lists in Blech procedure headers are analogous to the separation between \texttt{input}
and \texttt{inputoutput} signals in Esterel module interfaces. The essential difference is that Blech
parameters may be destructively updated during a macro-step, which is not possible in Esterel.
Because of that the MacOp interface also contains a \texttt{sharing} specification \( a \text{ shares } ms\text{.acc} \).
This tells us that the procedure implements all memory accesses to $a$ and to the substructure $m.s.\text{acc}$ of $m.s$ in a single thread. This makes it possible to alias the two parameters in the activity call $\text{MacOp}(s)(s.\text{acc})$ of $\text{Accumulate}$ and perform the operation instantaneously in the same memory cell. While Esterel would detect a causal cycle, in sequentially constructive Blech this is memory safe, because the write access $s.\text{acc}.wr$ is sequentially ordered after the read $s.\text{acc}.rd$.

The formal mechanism to control the safe instantiation of activities is an extension relation on interfaces. The compiler checks that the memory mapping applied at the procedure call establishes a formal extension of the interface of the callee to that of the caller. This is indicated by the blue arrows in Fig. 1. For instance, the call $\text{MacOp}(s)(s.\text{acc})$ implies a mapping $[s/m.s, s.\text{acc}/a]$ of the (formal) memory paths of the callee to the (actual) paths of the caller. This aliasing is a proper extension because (i) the data types match up and (ii) the interface of the callee $\text{MacOp}$ exports a sharing between $a$ and $m.s.\text{acc}$.

The use of policy interfaces for procedure instantiation and scheduling likewise applies to external procedures for which the code is not available. The external function $\text{get}$ imported by System returns a new, different sample value each time it is called. This would induce nondeterminism when called from concurrent threads. Therefore, it is flagged by the keyword $\text{singleton}$ which is part of the policy interface of $\text{get}$ and protects it from concurrent accesses.

Policy interfaces also actively help to resolve the scheduling order of procedure calls. For instance, in Ctrl the memory cell $arr$ is in the output list of procedure call $\text{Sample}() (arr)$ at the same time in the input list of $\text{MAC.Accumulate}(arr)(s)$. Assuming the data type $\text{arr}:[2] \text{float64}$ specifies a regular data flow variable, we apply the “write-before-read” causal order to eliminate data races. The “write-before-read” precedence is part of the policy interface of memory cell $arr$ and forces $\text{Sample}() (arr)$ to be scheduled before $\text{MAC.Accumulate}(arr)(s)$. In a similar way, the three threads in $\text{Accumulate}$ are scheduled with the help of policy interfaces.

### III. The Sequentially Constructive Procedural Language Kernel SCPL

The abstract syntax of SCPL is given in Fig. 2. It provides a set of core operators to represent synchronous semantics of shared memory concurrency and procedural abstractions. A program $M$ consists of a sequence of declarations and a pure control flow $P$. Declarations can be (hierarchically nested) data structures $D_1$, encapsulating scalar data variables and procedures $D_2$. All declarations are assumed to be statically resolved and thus do not nest with control flows. The grammar focuses on the pure control flow in which the sub-language of value expressions $e$ is abstract, assuming side-effect free, atomic computations of (typed and possibly structured) values. Since the imperative semantics of expression evaluation can be coded in terms of regular control flow we do not need to add imperative features to the expression sub-language. Since the focus of this report is to study semantics we may assume that type checking has been done and all memory structures are statically allocated and referenced via fully qualified names, which are called memory paths.

Control-flow is completed by the statements $\text{exit } k$ where the completion code $k \in \{0, 1, 2\}$ distinguishes three forms of continuation. The instance $\text{exit } 0$, also written $\text{nothing}$, instantaneously completes by termination and continues with any sequentially down-stream statements in the current thread. The instance $\text{exit } 2$, called $\text{exit}$, completes instantaneously but returns
\[ M ::= [D_1|D_2]^* P \]
\[ D_1 ::= \text{struct} D_1^+ | \text{var} x = e \]
\[ D_2 ::= \text{proc}(x_1, x_2, \ldots, x_n) = P \]
\[ P ::= \text{exit} k \]
\[ \quad | \text{trap} P \]
\[ \quad | P ; P \]
\[ \quad | P \mid P \]
\[ \quad | \text{if} e \text{ then } P \text{ else } P \]
\[ \quad | P \rangle P \]
\[ \quad | \text{let} x = o.m(e) \text{ in } P \]
\[ \quad | \text{loop} P \]
\[ \quad | \text{run} p(o_1, o_2, \ldots, o_n) \]

Fig. 2. Abstract syntax of SCPL main program \( M \), declarations \( D_1 \), \( D_2 \) and control-flow statements \( P \), where \( k \in \{0, 1, 2\} \) is a completion code, \( e \) is a pure value expression without side effects, \( x \) a value variable, \( o \), \( o \) i memory paths, \( m \) a method name and \( p \) a procedure name.

control to an immediately enclosing trap...end which introduces a trap scope. The third completion level exit 1, abbreviated pause, does not terminate but completes by pausing. It waits for the next macro-step in which it will then terminate instantaneously. These exit statements are well known from Esterel [6].

Sequential composition \( P_1; P_2 \) first executes \( P_1 \) until it terminates and then continues with \( P_2 \). When \( P_1 \) exits or pauses, then also \( P_1; P_2 \) exits or pauses, respectively. The prescriptive program order between \( P_1 \) and \( P_2 \) implements a sequentially constructive semantics differing from traditional synchronous languages such as Esterel. In Esterel, the semicolon acts like a parallel composition with an extra control wire that permits \( P_2 \) to start only when \( P_1 \) has terminated. Since the statement blocks \( P_1 \) and \( P_2 \) are parallel, there may be a causality cycle between them in Esterel. In SCPL, the program \( P_1; P_2 \) can never have a causality cycle unless this it already exists in \( P_1 \) or \( P_2 \).

In the parallel composition \( P_1 \mid P_2 \), both processes \( P_1 \) and \( P_2 \) are executed to completion, concurrently in separate threads. Their statements are interleaved according to the policy interfaces associated with method and procedure calls as described below. The completion level of the parallel \( P_1 \mid P_2 \) is the maximum of the completion levels of both threads. Specifically, the parallel terminates \((k = 0)\) when both threads \( P_1 \) terminate. It exits \((k = 2)\) if one of the threads exits, though letting the other thread run to completion. Otherwise, in an instant where one of the threads pauses \((k = 1)\) and the other does not exit \((k = 2)\), the parallel construct also pauses. Again, both threads run to completion. This behaviour coincides with the standard semantics of synchronous parallel AND from languages such as Esterel (for general completions \( k \in \mathbb{N} \)) or SCCharts (for completions \( k \leq 1 \)).

The construct if \( e \) then \( P_1 \) else \( P_2 \) expresses conditional branching of control flow. First, the condition \( e \) is evaluated and then, depending on its value, either \( P_1 \) or \( P_2 \) is instantaneously executed. In particular \( P_1 \) and \( P_2 \) may overwrite memory that was read by \( e \) because these writes take place sequentially after the condition test. Notice, we ensure that the conditional \( e \) is evaluated strictly before any of the branches \( P_1 \) and \( P_2 \) can start. Hence, the evaluation of \( e \) cannot depend on any statement inside \( P_1 \) or \( P_2 \). This sequentially constructive semantics
makes the conditional in Blech behave like in SCCharts rather than Esterel. Furthermore, the branches $P_1$ and $P_2$ are mutually exclusive, like in SCEst. In Esterel, for contrast, a conditional $\text{if } e \text{ then } P_1 \text{ else } P_2 \text{ end}$ can be equivalently coded as a parallel composition of one-sided conditionals $\text{if } e \text{ then } P_1 \text{ end } | \text{ if } \neg e \text{ else } P_2 \text{ end}$. This can create causality cycles between $P_1$ and $P_2$ in Esterel, which do not exist in SCPL or SCEst.

The operator $P_1 \parallel P_2$ is a parallel OR with built-in preemption sequentialised from left to right. We first run $P_1$ to completion, disregarding any policy constraints between $P_1$ and $P_2$. Once $P_1$ completes, there are three possibilities depending on its completion code. If $P_1$ terminates or exits, then $P_2$ is instantaneously aborted and the construct terminates or exits, respectively. This is the strong abort behaviour of the construct. If however $P_1$ pauses in state $P_1'$, then control passes instantaneously to $P_2$ which is permitted to complete the macro-step. When $P_2$ terminates (exits), the configuration $P_1' \parallel P_2$ terminates (exits) as well, thereby weakly aborting $P_1$. If $P_2$ pauses in state $P_2'$, then the construct pauses in state $P_1' \parallel P_2'$. This generates a cycle-wise, round-robin sequential left-to-right evaluation which cannot be expressed in Esterel. In SCCharts \[8\], which is sequentially constructive, the operator $\parallel$ does not exist but can be coded. Céu \[5\] does implement this operator as $\text{par/or}$ but it does not have any equivalent of $\parallel$ nor a notion of causality.

The statement $\text{let } x = o.m(e) \text{ in } P$ calls a method $m$ in a structure identified by memory path $o$. Methods are actions on the memory, defined outside SCPL and not to be confused with procedures. The value of expression $e$ is the (composite) input argument of the method and the return value is bound to the value variable $x$. Sequentially after the atomic execution of $o.m$, which may have a side-effect on memory, control is passed to $P$, which may depend on the return value $x$. The value of $x$ is not stored in memory. It is thread local and its scope is $P$. For example, an assignment $x := y + 13$ where $x$ and $y$ are memory references would be represented by the sequence $\text{let } x = y.\text{rd}() \text{ in } \text{let } x = x.\text{wr}(x + 13) \text{ in } \text{nothing}$. For convenience, we will abbreviate $\text{let } x = o.m(e) \text{ in } P$ as $x := o.m(e); P$ when $x$ is used in $P$ and as $o.m(e); P$, otherwise.

Control flow loops are expressed as $\text{loop} P$. The thread $P$ is run until it terminates, whereupon $P$ is repeated, instantaneously, from the beginning. We can build terminating rules using $\text{loop}$ and $\text{trap}$ as we will see in the next section.

A procedure call $\text{run } p(\bar{s})$ instantiated from a static declaration $\text{proc } p(\bar{x}) = P$ starts the precompiled procedure body $P$, passing a list $\bar{s}$ of memory paths by reference. The mapping $f = [\bar{s}/\bar{x}]$ is subject to static typing and policy constraints as described below. The callee $p$ may access the memory through method calls of the given arguments. The procedure is a generic control-flow process that is statically associated with the name $p$ as an external host code or a precompiled SCPL process. The procedure is run atomically until it completes by either pausing or terminating. Procedures provide a general form of behaviour abstraction in which arbitrary control flow (subject to interface restrictions) can be encapsulated. Concurrent instantiations $\text{run } p(\bar{s}_1) | \text{run } p(\bar{s}_2)$ will execute the same program code in concurrent threads. This is permitted provided the procedures do no create data races on the memory structures shared between $\bar{s}_1$ and $\bar{s}_2$. To this end, each procedure $p$ has an associated causality interface $\pi_p$, called a $p$-policy, which specifies its synchronisation with the memory. The interface $\pi_p$ is instrumental not only to avoid races but also in order to guarantee memory safe, wait-free execution of each individual call. This will be explained in detail later.
Assignment \[ x = y \text{ op } z \]

let \( v_1 = y \text{ rd}() \) in

let \( v_2 = z \text{ rd}() \) in

let \( \_ = x \text{ wr}(v_1 \text{ op } v_2) \) in

nothing

Conditional statement if \( c \) then \( P \) else \( Q \) end

if \( c \) then \( P \) else \( Q \)

Loop repeat \( P \) until \( c \) end

trap(\( \text{loop}(\ P \ ; \ \text{if } c \ \text{then} \ \text{exit}) \))

Sequential composition \( P \ ; Q \)

\( P \ ; Q \)

Reaction step await \( c \)

repeat pause until \( c \) end

Preemption when \( c \) abort \( P \) end

await \( c \ \} \ P \)

Activity call run \( A(\vec{\sigma}_1,\vec{\sigma}_2) \)

run \( A(\vec{\sigma}_1,\vec{\sigma}_2) \)

Co-begin cobegin \( P \) with \( Q \) end

cobegin \( P \) with \( Q \) end

\( P \ \} \ \text{await} \ t \ | \ (Q \ ; t = \text{true}) \)

Co-begin with weak cobegin weak \( P \) with \( Q \) end

cobegin weak \( P \) with \( Q \) end

Fig. 3. Rewrite rules that transform a Blech program into an SCPL program. The result is obtained by applying these rules recursively. A few remarks: In SCPL expressions are purely built on values. Therefore, assignment is decomposed into reading memory cells, combining the obtained valued and then writing the result back into memory. Analogously, we assume every Blech condition expression \( c \) is mapped to a boolean value \( c \) in SCPL by prepending a corresponding assignment. If the else branch is missing, nothing is used in the else-case of the SCPL program. SCPL only has an infinite loop. Therefore it is wrapped in a trap and a guarded \text{exit} is added as the last statement to the loop's body. Conceptually, this construction separates control flow of the loop and the evaluation of the condition inside the loop. This makes the formal semantics simpler. In BLECH a whitespace (or semicolon) token is used to separate statements whereas a semicolon must be used in SCPL. An SCPL procedure has only one parameter list which comprises all input and input-output parameters. By means of a policy one can ensure that input parameters can only be read. Co-begin with strong branches is directly mapped onto \{. Using a fresh termination flag \( t \) we can encode a weak branch in SCPL. Obviously, the same construction works if the second branch is weak or both branches are weak.

IV. MAPPING BLECH TO SCPL

SCPL is designed to be a compact formalism for studying semantics. It is not a practical programming language in any specific domain. In this section we argue that semantics of languages used in practice may be formally explained by mapping them to SCPL. This can be shown for Esterel, SCEst and in particular we consider BLECH[18] here.

Blech is an imperative, synchronous language that offers high-level abstractions and safety guarantees for reactive, real-time embedded programming. BLECH compiles to C code, which may be integrated into existing projects or simulation frameworks. What makes BLECH interesting for our purposes is that it is among the first ISP languages with first-class procedures that does not depend on white-box inlining. However, there is no formal semantics for it, yet. As indicated in Sec. II a simple form of policy interfaces is already built into the language by design to support common programming patterns and guarantee an easy-to-understand behaviour.

The rules in Fig. 3 map BLECH syntax to SCPL. We take the liberty to also map to a mix of SCPL with BLECH when it is clear how to further transform the intermediate term to pure SCPL. Figure 3 only shows the essential control flow statements of BLECH. More constructs are found in our example in Fig. 1: the \text{return} being the last statement in an activity can be understood as a one-time assignment to a special (anonymous) output parameter. Furthermore, a \text{co-begin} with more than two branches can be rewritten as a nested \text{co-begin} with only two branches at each level by grouping together strong and weak branches. Other constructs in BLECH that

\[ \text{blech-lang.org} \]
are not shown in our example, too, may be rewritten inside the language itself: A while-loop can be rewritten as a repeat-loop. The Blech statement \texttt{when \ c\ reset\ P\ end} is a shorthand notation for an abort inside a loop that restarts P \emph{if it was preemted}. Blech functions can be mapped to SCPL procedures in the same way as activities. Finally note, that Blech allows arbitrary expressions on the right hand side of assignments or as input arguments. They can be rewritten in static single assignment form and therefore our mapping assumes only variable names in procedure arguments and only a binary right hand side of an assignment.

The focus of this work is to explain the operational semantics of a program. Thus we do not consider Blech constructs that are only used for structuring a program such as do-blocks, variable declarations, namespaces and modules.

V. THE SCPL MEMORY MODEL

A SCPL program is executed in a store structured into a system of memory contexts. These constitute a name space to access and manipulate the data stored in the memory cells of the store. A context is a set of typed memory paths \(O\). For each \(o:t \in O\) a set \(M(t)\) of context methods is determined by the path type \(t\), which is statically fixed. The memory paths are the entry points into the memory and provide a primitive notion of locality and separation. The set of qualified methods (\(q\)methods) is \(Mtd(O) = \{o.m \mid o:t \in O, m \in M(t)\}\).

\textbf{Example 1} The memory paths \(O = \{n: \text{int}, b: \text{bool}, p: \text{point3D}, \ldots\}\) may be typed references to scalar memory cells such as integers, booleans or composite data structures such as point objects. The access methods on scalar memory will typically be

\[ M[\text{int}] = \{\text{rd:unit} \rightarrow \text{int}, \text{wr:int} \rightarrow \text{unit}\} \]

for reading and for writing and

\[ \text{translate}_x: \text{float} \rightarrow \text{unit} \in M[\text{point3D}] \]

for translating the point object in \(x\)-direction. Substructures of composite data can be modelled either through memory paths or via methods. For instance, take a method call \(p.x: \text{wr}(v)\) for writing the \(x\) coordinate of \(p\). We may consider \(p.x: \text{float32} \in O\) as a path and \(\text{wr:float32} \rightarrow \text{unit} \in M[\text{float32}]\) as the method. Alternatively, we can take \(p: \text{point3D} \in O\) as the path and \(x: \text{wr:float32} \rightarrow \text{unit} \in M[\text{point3D}]\) as the method.

\textbf{Example 2} The memory paths used by \texttt{MacOp} (see Fig. [7]) at its declaration-site, are

\[ O_{\text{MacOp}} = \{ms:\text{State}, a: \text{float64}\} \]

where \(a\) is a scalar memory cell of type \text{float64} and \(ms\) a composite data structure of type \text{State}. The access methods for the scalar \(a\) are

\[ M(\text{float64}) = \{\text{rd:unit} \rightarrow \text{float64}, \text{wr:float64} \rightarrow \text{unit}\} \]

for reading and for writing a double precision floating point number. The members of the composite structure \(ms:\text{State}\) can be modelled through methods or via memory paths. In context \(O_{\text{MacOp}}\) a method call \(ms.m1.\text{rd}()\) is composed of path \(ms:\text{State} \in O_{\text{MacOp}}\) and method \(m1.\text{rd:unit} \rightarrow \text{float64} \in M(\text{State})\). Alternatively, in context

\[ O'_{\text{MacOp}} = \{ms.m1, ms.m2, ms.\text{acc}, a\} \]
the path is \( ms.m1 \) and \( rd: \text{unit}\rightarrow \text{float64} \in M(\text{float64}) \) is the method. In both cases, we get the same set of \( q \text{methods}, i.e., Mtd[O_{\text{macOp}}] = Mtd[O_{\text{macOp}}'] = \{ ms.m1, m2 \mid m1 \in \{ m1, m2, \text{acc} \} \land m2 \in \{ \text{rd, wr} \} \}. \) Furthermore, we may also have a special method

\[
\text{mac: float64} \rightarrow \text{float64} \rightarrow \text{float64} \rightarrow \text{unit} \in M[\text{State}]
\]

for computing the MAC operation, taking as inputs the substructures of \( \text{State} \) determined by their fixed order.

A memory path can be standard simple data structure like signals, variables, buffers or composite structures like vectors or arrays. It may also encapsulate complex behaviours such as external displays, or precompiled reactive modules.

**Example 3** In the Cronos modular compiler [9] an Esterel module \( m: \text{module} \in O \) is compiled as an ADA package where \( M(\text{module}) \) contains accessor methods \( t.\text{set:unit}\rightarrow \text{unit} \) to set each input signal \( x \), methods \( y.\text{rd:unit}\rightarrow \text{bool} \) to read each output signal \( y \), partial step evaluation functions \( i.\text{eval:unit}\rightarrow \text{unit} \) for modular scheduling and methods \( \text{init, reset, run, clear all of type unit}\rightarrow \text{unit} \) for executing the module as a whole.

In general, a memory context will be organised as a tree-like name space. For the purposes of this report, it suffices to consider a factorisation of memory actions through “triple qualifications” \( o.m(v) \) consisting of memory paths \( x: t \in O \), context methods \( m \in M(t) \) and run-time parameters \( v \). We do not consider the static structure introduced through the typing of data structures and assume this has been resolved by the compiler in the upstream static program analysis. The methods \( M \) of a store are provided by the compiler and implemented externally.

Each \( q \text{method} \ o.m \in Mtd[O] \) denotes a method to update the store and return a value as a result. For simplicity, we assume each \( q \text{method} \ o.m \) takes a single input argument and returns a single output value, both taken from a fixed global domain \( \mathbb{D} \) of values.

**Definition 1 (Execution Structure)** An (execution) structure \( \Sigma = (\mathbb{S}, \bullet, \odot) \) is a domain \( \mathbb{S} \) of memory configurations, called (typed) stores together with memory actions \( \bullet \) and \( \odot \). Each \( \Sigma: O \in \mathbb{S} \) has an associated path context \( O \) defining the methods applicable to it: For each \( o.m \in Mtd[O] \) the operation \( \Sigma \bullet o.m(v) \in \mathbb{D} \) yields the return value of \( o.m \) when called with parameter value \( v \in \mathbb{D} \) and \( \Sigma \odot o.m(v): O \in \mathbb{S} \) is the updated store in the same context.

The domain of stores needs to be uniform in the sense that the semantics of a method \( o.m \) for \( o: t \in O \) only depends on the type \( t \) not on the path name \( o \). It must be is invariant under type-preserving renamings of memory paths.

**Definition 2 (Uniformity)** A function \( f: O_1 \rightarrow O_2 \) is a path map if \( f(o_1: t_1) = o_2: t_2 \) implies \( t_1 = t_2 \). We extend \( f \) to map \( q \text{methods} \) and method calls by putting \( f(o.m) = f(o).m \) and \( f(o.m(v)) = f(o).m(v) \). The structure \( \Sigma \) is (path) uniform if for every \( \Sigma_2: O_2 \in \mathbb{S} \) there exists a store \( \Sigma_2[f]: O_1 \in \mathbb{S} \) with \( \Sigma_2[f] \bullet a(v) = \Sigma_2 \bullet f(a)(v) \) and \( \Sigma_2[f] \odot a(v) = (\Sigma_2 \odot f(a)(v))[f] \) for \( a \in Mtd(O_1) \).

A uniform structure permits us to retract a store \( \Sigma_2 \) over \( O_2 \) along a path map \( f: O_1 \rightarrow O_2 \) to a store \( \Sigma_2[f] \) which emulates \( o.m(v) \) in \( O_1 \) by indirection via \( f \), executing the renamed call \( f(o).m(v) \) in \( O_2 \).

**Example 4** Consider the path context \( O = \{ x, y, z \} \) and process

\[
P = y.\text{wr}(8): x.\text{wr}(5); u:= y.\text{rd}(); (A, B)
\]
where $A = \text{if } u = 5 \text{ then } y.\text{wr}(0) \text{ and } B = z.\text{wr}(2)$. Let $\Sigma$ be a store which implements $x, y, z$ in disjoint memory cells. Then, the value that $P$ reads from $y$ is $u = 8$, whence the write to $y$ in $A$ is by-passed and $P$ only writes to $z$ in $B$. Now consider the instantiated process

$$P[f] = x.\text{wr}(8):x.\text{wr}(5):w := x.\text{rd}();((\text{if } u = 5 \text{ then } x.\text{wr}(0))|B)$$

with $f = [x/x,y/z] : O \rightarrow O$ where $y$ is substituted by $x$. Now the value read into variable $u$ is 5 and thus $x$ is overwritten by 0. The execution of $P[f]$ in a store $\Sigma$ can be simulated with $P$ in an “aliasing” store $\Sigma[f]$ in which actions on path $y$ are redirected to take place in $x$. For the path map $f' = [x/x,y,y/z] : O \rightarrow O$ the instantiation $P[f']$ creates a write-write conflict because now both thread $A$ and $B$ concurrently write to $x$. While process $P$ and $P[f]$ are schedulable and deterministic, the process $P[f']$ is not.

**Example 5** In the memory context $O'_{\text{MacOp}}$ for MacOp (Ex. 2) consider a generic process $A$ operating on state $m$s,

$$A = ms.m1.\text{wr}(1.1);ms.m2.\text{wr}(19.0);v := ms.\text{acc}.\text{rd}();\text{if } v > \text{CAP} \text{ then exit}$$

which fills fields $m1, m2$ with values 1.1, 19.0; then reads the $\text{acc}$ field, and if this value $v$ is above CAP it exits, otherwise it terminates. Let $f = [s.\text{acc}/ms.m1,s.\text{acc}/ms.m2,s.\text{acc}/ms.\text{acc}]$ be an instantiation $f : O'_{\text{MacOp}} \rightarrow O'_{\text{Accumulate}}$ that maps the formal parameter $ms$ into actual state $s$ of the $\text{accumulate}$ context, such that all cells of $ms$ are identified with $s.\text{acc}$. If $\Sigma : \Sigma_{\text{Accumulate}}$ is a store that defines some start value for state $s$, then the behaviour of the instantiated process $A[f]$ can be simulated in a store $\Sigma[f] : O'_{\text{MacOp}}$ in which all actions on paths $ms.m1, ms.m2$ and $ms.\text{acc}$ are redirected to take place in memory cell $s.\text{acc}$, respectively. Specifically, the call $ms.m1(1.1)$ writes to the same cell as $ms.m2(19.0)$, namely $s.\text{acc}$, which is the same from which the read $ms.\text{acc}.\text{rd}$ obtains the test value $v$.

The locality of methods $m1 \in M(t_1)$ and $m2 \in M(t_2)$ for non-overlapping memory paths $o_1 : t_1 \in O$ and $o_2 : t_2 \in O$ in the same path context $O$ may be captured by the requirement on the execution structure $S$ that a qmethod $o_1.m1$ cannot depend on and does not influence the behaviour of a qmethod $o_2.m2$. Where name spaces overlap, however, we must protect the memory from data races and impose some synchronisation on method calls.

**Example 6** For instance, the writing $n.\text{wr}(1)$ and reading $b.\text{rd}()$ on distinct scalar memory cells $n:\text{int}$ and $b:\text{bool}$ are independent and can be executed without synchronisation. The same applies to the writing $p.x.\text{wr}(20)$ and $p.y.\text{wr}(30)$ of distinct coordinates of a point $p$. However, the reading $n.\text{rd}()$ and writing $n.\text{wr}(1)$ of the same path $n$ conflict in stores in which cell $n$ has a value different from 1. In synchronous programming such data races are eliminated by prescribing a write-before-read precedence of method $o1.\text{wr}$ over $o2.\text{rd}$ if $o1$ and $o2$ are overlapping memory paths.

**Example 7** In the context $O'_{\text{MacOp}}$ consider the process

$$B = u := ms.m1.\text{rd}();ms.\text{acc}.\text{wr}(2.0u);\text{run MacOp}(ms,ms.\text{acc})$$

which overwrites the accumulator cell $ms.\text{acc}$ with the doubled value of $ms.m1$ and then calls $\text{MacOp}$ to perform a single multiply-accumulate on state $ms$. In the parallel composition $A|B$ with $A$ from Ex. 5 we now have shared accesses on $ms$. The data races are avoided by prescribing a write-before-read precedence for overlapping memory paths $o1, o2 \in O'_{\text{MacOp}}$.
This means that A is first permitted to write \( ms.m1.wr(1.1) \) before B can read \( ms.m1.rd() \). The subsequent write \( ms.m2.wr(19.0) \) in A and the write-back \( ms.acc.wr(20.0u) \) of B can go unsynchronised, assuming that \( ms.m1 \) and \( ms.acc \) are distinct cells. Finally, the reading of \( ms.acc.rd \) by A must wait until MacOp has finished writing its output \( ms.acc \).

Every store must be protected by a scheduling policy which implements a set of causal precedences on the memory accesses. The semantics of SCPL (Sec. VI) will execute the actions of a program in line with these precedences. We use a simple algebra of \textit{causal precedence graphs} which express state-less synchronisation protocols. These eliminate the data dependence from the notion of \textit{policies} as defined in [14]. This simplification makes policy-conformant schedulability a property of the static program syntax. Though this restricts the class of constructive programs it suffices for practical purposes.

**Definition 3 (P-policy)** A precedence policy (p-policy) over O is a subset \( \pi \subseteq Mtd(O) \) of admissible methods equipped with a binary precedence relation \( \ldots \subseteq \pi \times \pi \). If \( a, b \in \pi \) and \( a \ni b \), we write \( a \prec b \) if \( a \notin b \). If \( a \equiv b \) or \( b \equiv a \), we say \( a \) and \( b \) are in conflict, written \( a \ni b \). If \( a, b \in \pi \) are admissible, yet \( a \nmid b \) and \( b \nmid a \), then \( a \) and \( b \) are called concurrently independent, written \( a \ni b \).

A precedence \( a \ni b \) enforces the scheduling constraint that if in some execution action \( a \) and \( b \) happen together, then \( b \) is scheduled strictly after \( a \). If \( a \nmid b \) then \( a \) and \( b \) may be executed concurrently in any order. On the other hand, if \( a \nmid b \) then \( a \) and \( b \) must run sequentially in the same thread. More generally, the semantics of \( a \nmid b \) is that the run-time ordering between \( a \) and \( b \) is computed by the program itself, possibly causally depending on input data. In contrast, \( a \ni b \) permits the order of execution to be decided by the compiler or the run-time scheduler.

**Example 8** Consider a context with two distinct scalar dataflow path \( O = \{x, y : t\} \) with read and write methods \( Mtd(t) = \{rd, wr\} \). The memory is specified by a p-policy \( var \) with full admissibility \( \pi = Mtd(O) \) but restricted scheduling order. To capture the “write-before-read” dataflow causality, \( var \) must contain the precedences \( var \ni \omega.wr \ldots \omega.rd \) for each \( \sigma \in \{x, y\} \).

For instance, in the parallel composition \( x := 5 \parallel y := x \) the assignment \( x := 5 \) with its implicit \( x.wr \) access is executed before the assignment \( y := x \) which contains a read access \( x.rd \). Obviously, the causal precedence \( var \ni x.wr \ldots x.rd \) eliminates any write-read data races. Write-write races are eliminated by the (symmetric) causal precedence \( var \ni \omega.wr \ldots \omega.wr \) which forbids concurrent threads to write to the same path. Thus, a program like \( x := 5 \parallel x := 7 \) becomes unschedulable, and thus is rejected as non-causal. Note, in line with sequential constructiveness [13] the precedence constraint only applies to concurrent accesses. If the accesses are already resolved in sequential program order, as in \( y := x ; x := 5 \) or \( y := x \parallel x := 5 \), then the read \( x.rd \) takes place before the write \( x.wr \). Similarly, sequentially ordered writes as in \( x := 5 ; x := 7 \) or \( x := 5 \parallel x := 7 \) are ok.

Finally, note that \( var \) does not contain precedences between accesses to \( x \) and \( y \), because these are separated in memory and thus \( var \ni x.m1 \ni y.m2 \) for all methods \( m \in \{rd, wr\} \).

**Example 9** If composite memory structures \( \sigma : t \in O \) (e.g. Quartz [2] signals) behave like synchronous registers, they support the four methods \( M(t) = \{\text{pre}, wr, rd, nxt\} \) where \( \sigma.pre \) extracts the value from the previous instant, \( \sigma.nxt \) writes the value for the next instant, while \( \sigma.rd \) and \( \sigma.wr \) are the read and write access to the value computed in the current macro-step. In the high-level syntax of Blech, composite memory structures are instantiated by
declarations \([\text{var}] \; \text{let} \; o : t = e\), where \(t\) a structure type and \(e\) an expression determining a composite (deep) initialisation value. A memory path \(o\) tagged with \text{var} is mutable and if tagged by \text{let} is immutable. In the latter case, the structure is protected so the write methods \(o.\text{wr}\) and \(o.\text{nxt}\) are not permitted. Hence \(M(\text{let} \; t) = \{\text{pre, rd}\}\) if \(o\) is immutable and \(M(\text{var} \; t) = \{\text{pre, rd, wr, nxt}\}\) if \(o\) is mutable. The associated p-policy \(\text{reg}\) has all \(\text{Mtd}[O]\) admissible, implements the “write-before-read” causality on the current value of the structure and eliminates write-write races. Formally, for all \(o_1, o_2 \in O\), if \(o_1\) and \(o_2\) overlap, \(\text{reg} \mid\mid o_1.\text{wr} \dashv o_2.\text{wr}, \text{reg} \mid\mid o_1.\text{wr} \dashv o_2.\text{rd} \text{reg} \mid\mid o_1.\text{nxt} \dashv o_2.\text{nxt}\). The reading of the previous value is independent from any other access, i.e., \(\text{reg} \mid\mid o_1.\text{pre} \diamond o_2.m_2\) and also \(\text{reg} \mid\mid o_1.m_1 \circ o_2.m_2\) for all non-overlapping paths \(o_1\) and \(o_2\).

Example 10 For a Cronos module \(m\) as described in Ex. \(\S\) we build a p-policy \(\pi_m\) to express the internal coupling of methods as side-effects on the state, in order to protect the module from being used incoherently.

\[
\begin{align*}
\pi_m & \quad \text{reset} \\
\text{init} & \quad \text{clear} \\
& \quad \text{x.set} \quad \text{run} \quad \text{y.rd}
\end{align*}
\]

The initialisation \text{init} must be called before anything else. Then, \text{clear, reset to reset inputs and registers can be made. In the third phase, the signal emissions \text{x.set} on inputs are scheduled, and only then the step function \text{run} can go ahead. When the step function is done, the output signals can be read with methods \text{y.rd}. The precedence graph constrains concurrent accesses from producing memory glitches. E.g., the \text{run} method cannot be called concurrently, because it modifies the internal state in arbitrary ways. This is prohibited by the self-loop in the precedence relation. All other method, without self-loops, can be called several times concurrently, because their effect is idempotent. This is specifically important for the setting of a signal \text{x.set} which may be called by several concurrent threads, in general.

The policy in Ex. \([\S]\) may be considered an extended from of the \text{init-update-read} protocol of sequentially constructive (SC) variables introduced in \([\S]\) and used in SCCharts \([\S]\). Specifically, an SC variable \(s\) (of any data type) supports the method \(s.\text{ini}\) of initialisation (called “absolute” writes), update \(s.\text{upd}\) (called “relative” writes) and reading \(s.\text{rd}\). The associated init-update-read p-policy \(\text{iur}\) has the precedences such that \(\text{iur} \mid\mid \text{ini} \dashv \text{ini}, \text{iur} \mid\mid \text{ini} \dashv \text{upd}, \text{iur} \mid\mid \text{ini} \dashv \text{rd}\). Hence, initialisations can only be performed by a single thread, updates \text{upd} must wait for initialisation but can be concurrent, and finally reads \text{rd}, too, can be concurrent but must wait for both \text{ini} and \text{upd}. The \text{iur} scheduling regime is a special case of a scheduling directive \([\S]\) which in turn is a special stateless case of a general policy \([\S]\).

The purpose of a p-policy is to ensure that all conformant schedules generate a deterministic response on the store in all instants. For this to be true, the store must be coherent for the p-policy.

Definition 4 (Coherence) A pair \(C = (O, \pi)\) consisting of a context \(O\) and a p-policy \(\pi\) is called a (memory) interface. A store \(\Sigma : O \in S\) is \(C\)-coherent if all method calls concurrently independent under \(\pi\) commute with each other. Formally, let \(a, b \in \text{Mtd}[O]\) with \(\pi \mid\mid a \circ b\). Then, for all \(u, v \in D\):

1) \(\Sigma \circ a(u) = (\Sigma \circ b(v)) \circ a(u)\)
2) \(\Sigma \circ a(u) \circ b(v) = \Sigma \circ b(v) \circ a(u)\).
Moreover, for all \( a \in \text{Mtd}[O] \) and \( v \in \mathbb{D} \) the updated store \( \Sigma \cup a(u) : O \in \mathbb{S} \) must remain \( C \)-coherent.

Def. [4] is an extension of [14]. Condition (1) guarantees that the value returned by a method call \( a(u) \) does not depend on whether it is executed before or after a concurrently independent call \( b(v) \). Condition (2) expresses that final store does not depend on the order in which both calls are conducted.

Example 11 The standard implementation of atomic read and write on a memory cell is coherent for the \( p \)-policy \( \mathit{var} \). To implement \( \mathit{reg} \) coherently we need three memory cells, viz. to store the previous, current and the next value. This is expensive yet permits many causal programs. A tighter \( p \)-policy could add extra precedences \( \mathit{reg} \vdash \mathit{pre} \leadsto \mathit{wr} \), \( \mathit{reg} \vdash \mathit{pre} \leadsto \mathit{nxt} \) and \( \mathit{reg} \vdash \mathit{rd} \leadsto \mathit{nxt} \) for overlapping \( \mathit{rd} \)s. This interface can be implemented coherently with a single memory cell. However, the extended policy \( \mathit{reg} \) is less concurrent and may leave some programs non-constructive. If a program, written under the \( \mathit{reg} \) memory interface, turns out to be constructive for \( \mathit{reg} \), then the registers can be optimised and implemented under \( \mathit{var} \), by identifying the method \( \mathit{nxt} \) with \( \mathit{wr} \) and \( \mathit{pre} \) with \( \mathit{rd} \).

Definition 5 (Interface Extension) An extension \( f : C_1 \subseteq C_2 \) of interfaces \( C_1 = (O_1, \pi_1) \) is a path map \( f : O_1 \rightarrow O_2 \) such that if \( a, b \in \pi_1 \) then \( f(a), f(b) \in \pi_2 \) and if \( \pi_2 \vdash f(a) \leadsto f(b) \) then \( \pi_1 \vdash a \leadsto b \).

If the extension is an inclusion \( f : O_1 \subseteq O_2 \) of paths, then we drop \( f \) and write \( C_1 \subseteq C_2 \), or simply \( \pi_1 \subseteq \pi_2 \). In this case \( \pi_1 \) enforces a more restrictive scheduling on a subset of the qmethods from \( \pi_2 \), without any renaming of paths.

Example 12 The discrete \( p \)-policy \( \mathit{top} \) contains all method calls \( \text{Mtd}[O] \) without any \( \leadsto \) edges. It is the maximum element, i.e., \( \pi \subseteq \mathit{top} \) for every \( \pi \). The \( p \)-policy \( \mathit{top} \) permits unconstrained concurrent access to the store. The empty \( p \)-policy \( \mathit{bot} = \emptyset \) admits no actions and represents a locked store, disallowing any memory access. It is the minimum element satisfying \( \mathit{bot} \subseteq \pi \) for all \( \pi \). The indiscrete \( p \)-policy \( \mathit{seq} \) is the fully connected \( \leadsto \) relation on \( \text{Mtd}[O] \). It specifies a store that can only be accessed by a single thread. The set of \( p \)-policies \( \pi \subseteq \mathit{seq} \) is isomorphic to the set of (fully connected) subsets of \( \text{Mtd}(\mathcal{C}) \). We have \( \mathit{bot} \subseteq \mathit{var} \subseteq \mathit{seq} \) and \( \mathit{bot} \subseteq \mathit{seq} \subseteq \mathit{reg} \subseteq \mathit{top} \).

Procedures are abstractions of control flow to manipulate the store. They encapsulate program modules generic in their memory context which is abstracted through context parameters. An procedure is accessed through its name and instantiated to a concrete memory context like a module in Esterel or a node in Lustre. In contrast to Esterel or Lustre, however, the memory that can be shared through concurrent procedures can be of general type and is not restricted to signals, buffers or data flow variables. In order to achieve modularisation, we do not depend on the source code of an procedure to be available. For our evaluation semantics, an procedure is a black-box that is precompiled like a host procedure in Esterel. However, in contrast to Esterel, an procedure has an associated causality interface.

Each generic procedure \( p \) comes with a memory interface \( C_p = (O_p, \pi_p) \) in which \( \pi_p \) exports the admissible qmethods through which the memory is potentially accessed and how these are causally dependent on each other. Interface extensions (Def. [5]) is the key mechanism to instantiate procedures (see below). As an interface, the \( p \)-policy \( \pi_p \) embodies an assumption-guarantee contract between the external use context and the internal behaviour.
of the procedure. Outside at the call site, the causal type implies a coherence constraint on the store for the procedure to be safely instantiated. Inside of the procedure, the causal type constrains the scheduling of memory accesses by the encapsulated implementation code. For instance, if \( \pi_p \vdash b \ldots c \), then the implementation must guarantee that any concurrent call to \( c \in Mtd[O_p] \) happens after any concurrent call to \( b \in Mtd[O_p] \). The store then does not need to be protected against any data races arising from the \( b \) and \( c \) types of actions. On the other hand, if \( \pi_p \vdash b \diamond c \), then there is no precedence information about how the procedure will execute the method calls. Because of the potential race situation, this amounts to the coherence assumption on the store that the order of execution of \( b \) and \( c \) does not matter. The store must protect the accesses in order to stay coherent, in which case the implementation is free to schedule the accesses in any way it wants.

**Example 13** Every activity in Blech is a SCPL procedure. Consider the activity `add` as seen in Fig. 2. It consists of two threads. The first thread waits on variable \( b \) and terminates as soon as it has a value greater than 10. The second thread concurrently reads \( b \) and variable \( a \) and writes the sum of their values into an output variable \( s \). The parameters

```markdown
activity add(a: int32, b: int32)(s: int32)
cobegin
    await b > 10
    with weak
    repeat
        s = a + b
        await true
    end
end
```

in the Blech interface by default are var type memory paths, so the available methods are

\[
Mtd[O_{add}] = \{a.rd, a.wr, b.rd, b.wr, s.rd, s.wr\}. 
\]

The split parameter lists of the Blech interface however indicate that \( a, b \) are read-only, while \( s \) is a mutable output. Hence the p-policy is

\[
\pi_{add} = \{a.rd, b.rd, s.wr \ldots s.wr, s.wr \ldots s.rd\} \subseteq \text{var} \text{ which guarantees that the add is not reading } a \text{ or } b \text{ at all, possibly writing } s, \text{ but never concurrently, and also writes } s \text{ before any concurrent reading of } a. \text{ A tighter p-policy, exposing the fact that add is not reading } s \text{ at all, and only writing it in the same thread in which it reads } a \text{ is } \pi_{add}^s = \{b.rd, a.rd \ldots s.wr, s.wr \ldots s.wr, s.wr \ldots a.rd\} \subseteq \pi_{add}. \text{ Also, note that } \pi_{add} \text{ and thus } \pi_{add}^s \text{ are extensions of each of the } \text{var p-policies for variables } a, b, \text{ and } s \text{ individually.}

**Example 14** Reconsider the activity `Accumulate` from Fig. 4. It consists of three threads. The first thread adapts the inputs from arr to the factors \( m1 \) and \( m2 \) of ms. The third thread waits on path ms.acc and terminates its value is greater than CAP. The second thread concurrently reads \( b \) and variable \( a \) and writes the sum of their values into an output variable \( s \). The parameters in the Blech interface by default are var type memory paths, so the available methods are

\[
Mtd[O_{add}] = \{a.rd, a.wr, b.rd, b.wr, s.rd, s.wr\}. 
\]

The split parameter lists of the Blech interface however indicate that \( a, b \) are read-only, while \( s \) is a mutable output. Hence the p-policy is

\[
\pi_{add} = \{a.rd, b.rd, s.wr \ldots s.wr, s.wr \ldots s.rd\} \subseteq \text{var} \text{ which guarantees that the add is not writing } a \text{ or } b \text{ at all, possibly writing } s, \text{ but never concurrently, and also writes } s \text{ before any concurrent reading of } a. \text{ A tighter p-policy, exposing the fact that add is not reading } b \text{ at all, and only writing it in the same thread in which it reads } a \text{ is }
\[ \pi_{add} = \{ b.\text{rd}, a.\text{rd} \rightarrow s.\text{wr}, s.\text{wr} \rightarrow s.\text{wr}, s.\text{wr} \rightarrow a.\text{rd} \} \subseteq \pi_{add}. \] Also, note that \( \pi_{add} \) and thus \( \pi_{add}^* \) are extensions of each of the \( \text{var} \) \( \pi \)-policies for variables \( a, b, \) and \( s \) individually.

A procedure call \( \text{run } p[f] \) takes a procedure \( p \) with interface \( C_p \) and an interface extension \( f : C_p \subseteq C \) to evaluate its (precompiled) behaviour in store \( \Sigma[f] \) for declaration context \( C_p \) for a store \( \Sigma \) of the call context \( C \). The conditions of Def. 5 make sure that the procedure call is memory safe, even if the renaming \( f \) aliases memory paths.

**Example 15** Consider contexts \( O_{\text{Accumulate}} \) and \( O_{\text{MacOp}} \) from Fig. 7 such that

\[
O_{\text{Accumulate}} = \{ \text{arr}: [2]\text{float64}, s:\text{State} \}
\]
\[
O_{\text{MacOp}} = \{ ms:\text{State}, a:\text{float64} \}
\]

with admissible methods

\[
M([2]\text{float64}) = \{ [0].\text{rd}, [0].\text{wr}, [0].\text{rd}, [1].\text{wr} \}
\]
\[
M(\text{State}) = \{ m1.\text{rd}, m1.\text{wr}, m2.\text{rd}, m2.\text{wr}, \text{acc.rd}, \text{acc.wr} \}
\]
\[
M(\text{float64}) = \{ r.d, w.r \}.
\]

The split parameter lists of \( \text{Accumulate} \text{(arr: [2]float64)(s:State)} \) indicate that \( \text{arr} \) is read-only, while \( s \) is mutable. The \( \pi \)-policies implied by this procedure header are given by

\[
\pi_{\text{Accumulate}} = \{ \text{arr}[0].\text{rd}, \text{arr}[1].\text{rd}, \text{s.m.wr} \rightarrow \text{s.m.wr}, \text{s.m.wr} \rightarrow \text{s.m.rd} \mid m \in \{ m1, m2, \text{acc} \} \}
\]
\[
\pi_{\text{MacOp}} = \{ ms.m.\text{rd}, a.\text{wr} \rightarrow a.\text{wr}, a.\text{wr} \rightarrow a.\text{rd} \mid m \in \{ m1, m2, \text{acc} \} \}
\]

The definition of \( \pi_{\text{MacOp}} \) implements the interface for a procedure header \( \text{MacOp} \text{(ms)(a)} \) and ignores the shares annotation in the actual procedure declaration \( \text{MacOp} \text{(ms)(a shares ms.acc) of Fig. 7} \) for the moment. Hence \( \pi_{\text{MacOp}} \) expresses the concurrent independence

\[ a.\text{wr} \circ \text{ms.acc.rd}. \]

This turns invalid through the instantiation \( \text{run MacOp(s)(s.acc)} \) in \( \text{Accumulate} \) with the path map \( f = [s/\text{ms}, \text{s.acc}/a] \) which aliases the formal procedure parameters \( \text{ms} \) and \( a \) to point to overlapping memory cells \( f(\text{ms}) = s \) and \( f(a) = s.\text{acc} \). Because of this, the path map \( f \) is not an extension from the interface \( C_{\text{MacOp}} = (O_{\text{MacOp}}, \pi_{\text{MacOp}}) \) under which \( \text{MacOp} \) is declared and implemented into the interface \( C_{\text{Accumulate}} = (O_{\text{Accumulate}}, \pi_{\text{Accumulate}}) \) under which the call is executed. In fact, \( f : C_{\text{MacOp}} \nsubseteq C_{\text{Accumulate}} \) because \( \pi_{\text{Accumulate}} \vdash f(\text{a.wr}) \rightarrow f(\text{ms.acc.rd}) \) but \( \pi_{\text{MacOp}} \nvdash a.\text{wr} \rightarrow \text{ms.acc.rd} \), considering that \( f(a.\text{wr}) = s.\text{acc.wr} \) and \( f(\text{ms.acc.rd}) = s.\text{acc.rd} \). The situation is resolved if we now take into account the sharing ‘\( \text{a shares ms.acc} \)’. This enforces the missing precedence \( \pi_{\text{MacOp}} \vdash a.\text{wr} \rightarrow \text{ms.acc.rd} \) in the procedure’s interface. Thus, \( f \) becomes an interface extension, at the cost that \( \text{MacOp} \) must internally synchronise accesses to \( a \) and \( \text{ms.acc} \).

**Example 16** The generic interface \( \pi_{\text{add}} \) of the procedure \( \text{add} \) of Ex. 4 captures the coherence assumptions on the store for generic formal parameters \( a, b \) and \( s \). We do not know how they are related, since these will be determined only at the call site of \( \text{add} \). In particular, the concurrent independence \( \pi_{\text{add}} \vdash b.\text{rd} \circ s.\text{wr} \) turns invalid if both parameters get aliased through the instantiation. For instance, suppose we instantiate add in a call \( i\text{add} = \text{run add(a,c,c)}(c) \) where \( a \) and \( b \) are the same memory cell \( c \). The corresponding renaming \( f = [a/a, c/b, c/s] \) is not a extension \( f : (O, \pi_{\text{add}}) \subseteq (O, \text{var}) \) from the interface under which \( \text{add} \) is declared.
and implemented (p-policy \( \pi_{\text{add}} \)) into the interface under which the call is executed (p-policy \( \varpi \)). This is because \( \varpi \models c_{\text{wr}} \leadsto c_{\text{rd}} \) but \( \pi_{\text{add}} \not\models s_{\text{wr}} \leadsto b_{\text{rd}} \), although \( f(s_{\text{wr}}) = c_{\text{wr}} \) and \( f(b_{\text{rd}}) = c_{\text{rd}} \). This has practical consequences for modular compilation: If the code for \( \text{add} \) is pre-compiled, rather than inline, the method call \( \text{add}(a, c)(c) \) is unsafe and must be rejected. The generic interface \( \pi_{\text{add}} \) of \( \text{add}(a, b)(s) \) does not constrain the order in the reading of \( b \) and the writing of \( s \). This will not matter, provided the memory referred to by \( b \) and \( s \) are separated. However, when \( b \) and \( s \) are instantiated with memory that are overlapping, then the ordering matters. Then the call \( \text{add}(a, c)(c) \) may produce unpredictable results with externally precompiled scheduling. Suppose on the other hand, the concurrent body of \( \text{add}(a, c)(c) \) is inline and we are scheduling the code as white-box in the use context. Then, the (in this case unique) schedule will be chosen that satisfies the “write-before-read” policy \( \varpi \) on cells \( a \) and \( c \).

There are three ways for an unsafe procedure call \( \text{run } p[f] \) to become memory safe. First, as in standard synchronous languages, such as Esterel, the procedure is inline as white-box. Second, the calling context changes to another store, which is coherent for a more relaxed p-policy. Third, the implementation of \( p \) is forced to avoid potential memory glitches with a more restrictive p-policy like adding a sharing constraint.

**Example 17** For the instantiation \( \text{run } \text{add}(a, c)(c) \) in Ex. [6] to remain conformant with \( \varpi \) we would need an inter-path precedence \( s_{\text{wr}} \leadsto b_{\text{rd}} \) for the body of \( \text{add} \), i.e., make sure that the read of cell \( b \) is guaranteed to see the effect of writing \( s \). The code above in Fig. 4 does not achieve that, obviously. For contrast, consider the modified code \( \text{add1} \) in Fig. 5 which implements a similar (but not the exactly same) behaviour as \( \text{add} \) in a single thread.

```bleech
activity add1(a: int32, b:

repeat
   s = a + b
   await true
end
```

Fig. 5. Single-threaded Bleech activity add1. There are no concurrent accesses so that the procedure interface \( \pi_{\text{add1}} \) can expose the constraint \( \pi_{\text{add1}} \models s_{\text{wr}} \leadsto s_{\text{rd}} \). In the Bleech code of Fig. 5 this is expressed via the sharing constraint ‘\( s \text{ shares } a, b \)’ in the interface of \( \text{add1} \). Now the instantiated activity \( \text{add1} = \text{run } \text{add1}(a, c)(c) \) has \( \pi_{\text{add1}} \models s_{\text{wr}} \leadsto s_{\text{rd}} \). Hence, we have conformance of \( \pi_{\text{add1}} \) with the policy of cell \( s \), despite the aliasing. And of course, the instantiation \( \text{add1} \) is thread safe, for trivial reasons. The activity \( \text{add1} \) exports sharing information in its formal causality interface. The bi-directional precedences in the generic interface expose the essential single-threaded nature of the implementation code. After the aliasing instantiation as a method call all read accesses are protected by the causality order.

**VI. Operational Semantics for SCPL**

A program expression is called closed, or a process, if it does not contain free value variables. Otherwise, it is open. For instance, \( x_{\text{wr}}(x+5) \) is open whereas \( \text{let } x = y . \text{rd}() \text{ in } x_{\text{wr}}(x+5) \) is closed. Memory paths, like \( x, y \) in the previous example, are always free since we do
not consider local memory here. These free path names point to the global store. A process \( P \) is \emph{well-formed} for \( C \) if for all method calls \( o.m(e) \) in \( P \) we have \( o.m \in \text{Mtd}[O] \) and for all procedure calls \( \text{run} \ p[f] \) in \( P \) the renaming is an interface extension \( f : C_p \subseteq C \). Our semantics defines the evaluation of well-formed, closed programs for a fixed memory interface \( C = (O, \pi) \) in \( C \)-coherent stores of the execution structure \( S = (S, \bullet, \mathcal{O}) \).

A procedure \( \text{proc} \ p(x) = P \) is compiled as a stateful partial action on stores that may complete by \text{terminating} or \text{pausing}. In the former case, the action \( p \) is finished and relinquishes control. In the latter case, \( p \) synchronises with the clock and waits to execute another macro-step. Like for Blech activities, we do not permit procedures to raise trap exceptions (exit 2 completion). Semantically, the procedure body \( P \) is compiled as a set of \emph{control states} \( S_p \), \emph{initial state} \( p.\text{init} \in S_p \) and a generic family of \emph{step functions} \( p.\text{step}[f] \)\( : S \times S_p \rightarrow S \times S_p \) parameterised in interface extensions \( f : C_p \subseteq C \) mapping the behaviour of \( p \) into the calling context \( S \) and \( C \). In addition, there is a \emph{completion predicate} \( s \downarrow_p k \) for \( k \in \{0, 1\} \), indicating termination with \( k = 0 \) and pausing \( k = 1 \), as well as an operation \( p.\text{tick} : S_p \rightarrow S_p \) that advances a paused state \( s \) to the initial state \( p.\text{tick}(s) \) of the next macro-step.

Our semantics models the execution of a process as the transitive closure of sequential thread evaluations. Each evaluation is a sequence of micro-steps, reminiscent of the “big-step” semantics of the \( \lambda \)-calculus. Formally, an instantaneous \emph{sequential reduction step}, called an \emph{sstep}

\[
\Sigma; \Pi \vdash P \Rightarrow \Sigma'; \Pi' \vdash P'
\]

is an inductive relation specified by the set of structural operational rules in Fig. ???. The relation \( (1) \), which preserves closedness and well-formedness, reduces \( P \) in the environment of a store \( \Sigma \), resulting in an updated store \( \Sigma' \) and a residual process \( P' \). Like in other synchronous constructive semantics \([7], [14], [21]\), the evaluation environment contains a \emph{potential} \( \Pi = \text{can}_s(E) \subseteq \text{Mtd}(C) \) which over-approximates the set of memory accesses pending in the concurrent environment \( E \) in which the evaluation of thread \( P \) is taking place. This context is used to block \( P \) from accessing memory cells that might be changed by the environment. The function \( \text{can}_s(E) \) is explained in the Appendix A. It is defined along very similar lines as in \([14], [21], [22]\).

All the memory accesses of an sstep stem from the main thread \( P \) and its descendants in strict program order. It runs the main thread along an arbitrary number of memory accesses. Since this does not include any context switches we call \( (1) \) a “sequential” reduction step. For communication between any concurrent child threads active in \( P \), several sssteps must be chained up. The definition of an sstep, as given below, is somewhat more complex in comparison to more standard small-step or big-step semantics. It is however technically expedient, because it nicely separates the sequential steps of a thread from the concurrency control of the scheduler. The reduction \( (1) \) may be thought of as a multi-step evaluation of expression \( P \), similar to a sequence of \( \beta \)-reductions in the \( \lambda \)-calculus. Like \( \beta \)-reduction, our evaluation reduction \( (1) \) is confluent and thus defines a unique outcome.

When \( P \) is a purely sequential process, i.e., it does not contain any concurrency operators (\( |, | \)) and it executes on its own, then \( \Pi = \emptyset \) and the sstep reduction \( (1) \) will run \( P \) to completion, i.e., until it terminates, pauses or exits. Thus, for sequential processes without other concurrent threads competing for memory, the reduction \( (1) \) behaves like a big-step evaluation for a single macro-step. In general, where \( P \) is concurrent or runs in a non-empty
environment $\Pi \neq \emptyset$, the continuation $P'$ in (1) either has completed or is waiting at a method or procedure call. In the latter case, if the reduction is maximal, then each active thread in $P'$ is blocked by the potential $\Pi$. For instance, a read $x.rd$ in $P$ must wait for all concurrent writes to the same cell $x$ to have taken place, before it can go ahead. Thus, if the potential indicates a pending write, $x.wr \in \Pi$, the evaluation step (1) will block at the read.

**Definition 6 (Completed Process)** A process $P$ is $k$-complete for $k \in \{0, 2\}$ if $P = \text{exit } k$. A process $P$ is 1-complete if one of the following holds:

- $P = \text{exit } 1$
- $P = Q ; R$ and $Q$ is 1-complete
- $P = Q \star R$ with $\star \in \{\}, |$ and both $Q$ and $R$ are 1-complete.

We write $P \downarrow k$ if $P$ is $k$-complete and $P \downarrow \bot$, otherwise.

The completed processes are the abstract values or normal forms of our evaluation system. The iteration of step (1) performs an evaluation of the surface behaviour of a process, i.e., until it becomes completed. If the residual is 1-completed, the process has paused, and we can apply a tick operator to activate the next macro-step, called its depth behaviour.

**Definition 7 (Tick Function)** For 1-completed processes the (syntactic) tick function steps the process to the next tick:

$$
\text{tick}(\text{exit } 1) = \text{exit } 0 \\
\text{tick}(Q ; R) = \text{tick}(Q) ; R \\
\text{tick}(Q \mid R) = \text{tick}(Q) \mid \text{tick}(R) \\
\text{tick}(Q \uparrow R) = \text{tick}(Q) \uparrow \text{tick}(R).
$$

We now discuss the SCPL reduction rules for the inductive relation (1) in two charges as seen in Fig. 6 and Fig. 7. All the rules are defined for an implicit interface $C = (O, \pi)$ and execution structure $S = (\mathbb{S}, \bullet, \odot)$. The statement $P \downarrow K$ abbreviates the condition $\exists k \in K. P \downarrow k$, extending somewhat the notation of Def. 6 for conciseness of presentation.

**Expressions:** We assume the evaluation of expressions is side-effect free and does not involve any memory accesses. Therefore, the evaluation of sub-expressions can be done in any order without causing potential non-determinism. Other than that, we do not restrict the language of expressions, merely assume the existence of an evaluation relation $\text{eval}(e) \in \mathbb{D}$ for closed expressions defined in some external fashion.

**Completion & Branching:** The statements $\text{exit } k$ immediately complete. The evaluation of a conditional statement $\text{if } e \text{ then } P \text{ else } Q$ is standard. By construction, the expression $e$ is closed, so that its value $\text{eval}(e)$ can be determined instantaneously without memory synchronisation.

**Sequencing:** The sequential composition $P ; Q$ first executes the micro-steps of $P$ until it completes. If $P$ pauses then control resumes in $P$ for the next instant. If $P$ terminates then control passes instantaneously to $Q$ within the current instant. If $P$ exits with completion $k = 2$, then the sequential statement also exits at the same level. The rule Seq$_1$ handles the cases where we wait or pause in $P$, while Seq$_2$ is used when $P$ terminates and Seq$_3$ takes effect when $P$ exits at $k = 2$. 

19
\[ eval(e) = T \quad \Sigma; \Pi \vdash P \Rightarrow \Sigma' \vdash P' \quad \text{Cond}_1 \]
\[ eval(e) = F \quad \Sigma; \Pi \vdash P \Rightarrow \Sigma' \vdash P' \quad \text{Cond}_2 \]
\[ \Sigma; \Pi \vdash \text{if} \; e \; \text{then} \; P \; \text{else} \; Q \Rightarrow \Sigma' \vdash P' \quad \text{Cmp} \]
\[ \Sigma; \Pi \vdash \text{else} \; P \Rightarrow \Sigma' \vdash P' \quad \text{Cmp} \]
\[ \Sigma; \Pi \vdash \text{exit} \; k \quad \text{Cmp} \]
\[ \Sigma; \Pi \vdash P \Rightarrow \Sigma' \vdash \text{exit} \; 0 \quad \Sigma; \Pi \vdash Q \Rightarrow \Sigma'' \vdash Q' \quad \text{Seq}_1 \]
\[ \Sigma; \Pi \vdash P; Q \Rightarrow \Sigma'' \vdash Q \quad \text{Seq}_2 \]
\[ \Sigma; \Pi \vdash P \Rightarrow \Sigma' \vdash \text{exit} \; 2 \quad \text{Seq}_3 \]
\[ \Sigma; \Pi \vdash \text{loop} \; P \Rightarrow \Sigma' \vdash \text{exit} \; 2 \quad \text{Rep}_1 \]
\[ \Sigma; \Pi \vdash \text{loop} \; P \Rightarrow \Sigma' \vdash \text{exit} \; 0 \quad \Sigma; \Pi \vdash \text{loop} \; P \Rightarrow \Sigma'' \vdash P'' \quad \text{Rep}_2 \]
\[ \Sigma; \Pi \vdash \text{trap} \; P \Rightarrow \Sigma' \vdash \text{trap} \; P' \quad \text{Trp}_1 \]
\[ \Sigma; \Pi \vdash \text{trap} \; P \Rightarrow \Sigma' \vdash \text{exit} \; 0 \quad \text{Trp}_2 \]

Fig. 6. SCPL reduction rules for conditionals if then else, completion statements exit, sequential composition \( P ; Q \), iteration loop and trap exceptions.

**Loops:** The reductions of the loop construct \( \text{loop} \; P \) is governed by the rules \( \text{Rep}_1 \)–\( \text{Rep}_2 \). They implement the idea that \( \text{loop} \; P \) should behave like an infinite repetition of the loop body \( P \), i.e., be equivalent to \( P ; \text{loop} \; P \; \text{end} \). This is seen in rule \( \text{Rep}_1 \) which runs \( P \) until it blocks or pauses with \( P'' \Downarrow \{\bot, 1\} \) where it yields as \( P'' ; \text{loop} \; P \; \text{end} \) to pass control to the environment. In case \( P \) terminates as \( P'' = \text{nothing} \) we immediately continue to repeat \( \text{loop} \; P \) with rule \( \text{Rep}_3 \). This is necessary because our sstep reduction runs each thread to completion. Analogously, when \( P \) completes as \( P'' = \text{exit} \) we use rule \( \text{Rep}_2 \) to pass the exit up to an enclosing trap, without interruption of the sstep.

**Traps:** The combination of statements \( \text{exit} \; 2 \) and \( \text{trap} \; P \) provide a simple form of one-level break point, corresponding to the trap completion code 2 in Esterel. When a process executes \( \text{exit} \; 2 \) it leaves the current program scope and directly passes control to an enclosing trap handler, where the exit is transformed into a normal termination to continue the sequential control flow. Similar to the \( \text{loop} \) we have a rule \( \text{Trp}_1 \) which reduces the process \( P \) in a trap \( \text{trap} \; P \) until is blocks or pauses, preserving the trap handler. When \( P \) terminates or exits, the rule \( \text{Trp}_2 \) makes the trap handler terminate.

**Sequential Parallel OR:** The operator \( P \Downarrow Q \) is a parallel with built-in abortion sequentialised from left to right. We first run \( P \) to completion, disregarding any dataflow constraints between \( P \) and \( Q \). Once \( P \) completes there are two possibilities depending on the completion code. If \( P \) terminates, then \( Q \) is strongly aborted and the construct terminates and passes control sequentially to the downstream process. This is the strong abort behaviour of the construct. If however \( P \) pauses in state \( P' \), then control passes instantaneously to \( Q \) which is permitted to complete the tick. When \( Q \) terminates, the preemption construct terminates as well, weakly aborting \( P \). If \( Q \) pauses in state \( Q' \) then the construct pauses in state \( P' \Downarrow Q' \).
The rule SPA₁ runs \( P \) by itself as long as it does not complete, i.e., blocks with completion \( \bot \). This permits the process \( P \) in construct \( P \parallel Q \) to interleave and communicate with its environment. Note that \( P \) is blocked by the potential \( \Pi \) which describes the environment of \( P \parallel Q \) and does not include the potential accesses performed by \( Q \). The rule SPA₂ deals with the situation where \( P \) completes without pausing. In this case, the preemption construct \( \parallel \) complete likewise, while the waiting process \( Q \), is preempted. If \( P \) pauses, by the following rules SPA₃ and SPA₄, control passes to \( Q \): The process \( Q \) is now allowed to run and interact with the environment, possibly blocked by potential \( \Pi \). Again, this potential is only accounting for the environment concurrent to \( P \parallel Q \) and does not include the potential contribution from \( P \). In this case, \( P \) is already paused, so it does not contribute anything. Observe that in \( \parallel Q' \) the evaluation of pause will stutter. So, the rule SPA₃ can be repeated until \( Q' \) finally blocks or pauses in process \( Q' \). Finally, if \( Q' \) pauses as with rule SPA₃, then both processes pause and wait for the next tick. If \( Q \) completes without pausing (\( \text{exit } k \) with \( k \neq 1 \)) then the rule SPA₄ preempts the pausing \( P' \) and completes as \( Q \) does.
Method Call: For closed programs, at the point where a method call $\text{let } x = o.m(e) \text{ in } P$ is reduced, the expression $e$ is closed. Its value is determined as $\text{eval}(e) = v$. We can then perform the method call in the memory structure, obtaining an updated store $\Sigma \circ o.m(v)$ and a return value $\Sigma \# o.m(v)$. The return value $\Sigma \# o.m(v)$ is substituted for value variable $x$ into the continuation program $P$, and the instantiated process $P[\Sigma \# o.m(v)/x]$ is reduced in the new memory $\Sigma \circ o.m(v)$ using rule $\text{Mtd}_2$ in Fig. ?? . The reduction of a method call can only go ahead under the condition $\pi \not\models \Pi \Rightarrow \{o.m\}$ which expresses that the concurrent potential $\Pi$ does not contain a memory access $\sigma'.m'$ with precedence over $o.m$. When this condition is not satisfied, the method call blocks and the stuttering rule $\text{Mtd}_1$ is the only sstep possible.

Procedure Call: Every procedure $p$ has an associated generic method $p.$step$[f]$ that abstracts a step function to manipulate the store. The procedure is statically defined and generic in a list of memory access paths $\bar{x}$. A procedure call $\text{run } p[f]$ instantiates the step function as $p.$step$[f](\Sigma, p.$init$)$ in the initial state $p.$init$. The interface extension $f$ captures the mapping of formal parameters in the step function to the given arguments. The p-policy $\pi_p$ in the interface of $p$ provides causality information about the accesses that procedure is maximally willing to conduct on the memory through the parameters $\bar{x}$. The instantiated p-policy $\pi_p[\sigma/\bar{x}]$ is used to ensure thread-safe atomic execution of the procedure. The rules to evaluate a procedure call are Act$_2$ and Act$_3$. These run the procedure as a single atomic action on the memory until it completes. If the step function pauses, i.e., the next state satisfies $s' \not\downarrow_p 1$, as in Act$_2$ then the procedure call also pauses and installs itself as $\text{exit } 1\text{; run}_p.\text{tick}(s') p[f]$ to be resumed in the next macro-step. If the step function terminates, i.e., $s' \downarrow_p 0$, as in rule Act$_3$, then the procedure call also terminates.

The side-condition $\pi \not\models \Pi \Rightarrow \pi_p[f]$ in Act$_2$ and Act$_3$ ensures that the procedure call is wait-free, i.e., fully evaluates to completion without any intermittent interaction or synchronisation with the environment. It verifies that no memory access performed by $p.$step$[f]$ and thus appearing in $\pi_p[f]$ is blocked by any of the accesses predicted for the concurrent environment which are aggregated in $\Pi$. Technically, this is the case if there are no $o_1.m_1 \in \Pi$ and $o_2.m_2 \in \pi_p[f]$ such that $\pi \vdash o_1.m_1 \leadsto o_2.m_2$. When the side condition fails, the procedure call blocks and yields to the scheduler with stuttering rule Act$_1$. Note, in rules Act$_2$ and Act$_3$ the precedence relation is existentially generalised to sets of qmethods. Specifically, $\pi \vdash \Pi_1 \leadsto \Pi_2$ if there exist $a_i \in \Pi_i$ such that $\pi \vdash a_1 \leadsto a_2$.

Concurrent Parallel AND: The rules $\text{APA}_1$–$\text{APA}_3$ exercise a parallel $P \mid Q$ by performing an sstep in $P$. This sstep is taken in the extended context $\Sigma; \Pi \cup \text{can}_s(Q)$. The potential of the active sibling thread $Q$ is added to the potential $\Pi$ that characterises the outer environment in which the parent $P \mid Q$ is running. In this way, $Q$ can block the memory accesses of $P$. When $P$ finally yields as $P'$ with a completion in $\{\perp, 1\}$ (waiting or pausing), the parallel completes as $P' \mid Q$ under rule $\text{APA}_1$. When $P$ terminates its sstep with exit 0 then rule $\text{APA}_2$ removes child $P$ from the parallel composition. Finally, if $P$ exits with exit 2 then with rule $\text{APA}_3$ the sibling $Q$ is permitted to complete the current macro-step in configuration $(Q; \text{exit } 2) \Rightarrow \text{exit } 2$. The two occurrences of exit 2 force an exit at level 2 as soon as $Q$ completes. This is the behaviour of parallel in Esterel for the completion code 2. The rules $\text{APA}_1$–$\text{APA}_3$ are symmetrical to $\text{APA}_1$–$\text{APA}_3$. They perform an sstep in the right child $Q$ of a parallel $P \mid Q$.

Toplevel processes are evaluated in the empty environment. We write

$$\mathcal{C} \vdash (\Sigma, P) \Rightarrow (\Sigma', P')$$
if $\Sigma_i \not\vdash P \Rightarrow \Sigma \vdash P'$ according to Fig. ?? with global interface $C$. Because of the stuttering rules $\text{Mtd}_1$ and $\text{Act}_1$, a process can yield at each sequentially reachable method or procedure call.

**Definition 8 (Stable Configuration)** A configuration $\Sigma$, $P$ is called stable if for all $\Sigma'$, $P'$ we have $C \vdash (\Sigma, P) \Rightarrow (\Sigma', P')$ iff $\Sigma = \Sigma$ and $P' = P$.

**Proposition 1**
1) If $P$ is completed then $\Sigma \vdash P$ is stable for all stores $\Sigma$.
2) Let $\Sigma \vdash P \Rightarrow \Sigma' \vdash P'$. If $P$ is closed and well-formed then $P'$ is closed and well-formed, and $\text{can}(P') \subseteq \text{can}(P)$.

Bigsteps are maximal evaluation sequences with thread interleaving, running the process to stability which may be blocked or completed.

**Definition 9 (Big- & Macrostep)** The transitive closure $C \vdash (\Sigma, P) \Rightarrow (\Sigma', P')$ of the sstep relation $\Rightarrow$ is a bigstep if $(\Sigma', P')$ is stable and, more specifically, a macro-step if $P'$ is also completed.

Coherence of a store ensures that the transitive closure of $\Rightarrow$ is confluent and thus $\Rightarrow$ leads to a unique final configuration.

**Theorem 1 (Bigstep Determinacy)** For $C$-coherent $\Sigma$, if $C \vdash (\Sigma, P) \Rightarrow (\Sigma_i, P_i)$ for $i \in \{1, 2\}$, then $\Sigma_1 = \Sigma_2$ and $P_1 = P_2$.

**Definition 10 (Constructiveness)** A process $P$ is called $C$-constructive in a store $\Sigma$, if there exists a macro-step $C \vdash (\Sigma, P) \Rightarrow (\Sigma', P')$ such that if $P'$ is 1-completed then $\text{tick}(P')$ is $C$-constructive in $\Sigma$. A process is $C$-constructive, if it is $C$-constructive in all $C$-coherent stores $\Sigma$.

A $C$-constructive process $P$ generates an infinite stream of macro-step reactions on a global store shared by all concurrent threads in $P$. The memory is protected by the p-policy in $C$ which synchronises concurrent accesses in $P$, thus preventing data races in $C$-coherent structures $S$.

Constructiveness and coherence trade off against each other. The restrictive p-policy $\text{seq}$ blocks all concurrent accesses, whence a process is $\text{seq}$-constructive iff it is sequential. Yet, all stores regardless of their implementation of method calls are $\text{seq}$-coherent. The most liberal policy $\text{top}$ does not block any access, whence all processes are trivially $\text{top}$-constructive. However, a store is $\text{top}$-coherent only if all method calls commute with each other. This is very restrictive and prevents any communication between threads. Useful instances of shared memory lie somewhere between these extremes. The policy $\text{var}$ supports single-writer, multi-reader data-flow variables and $\text{var}$-constructive $P$ correspond to causal dataflow process as in Lustre. With policy $\text{reg}$ we can model data-flow with shared memory as in [23]. For single-writer and single-reader unbounded buffers under policy $\text{buf}$ providing asynchronous writes $\text{b.\text{send}}$ and blocking reads $\text{b.\text{wait}}$, the $\text{buf}$-constructive processes are progressive synchronous Kahn dataflow networks. A memory of $\text{SC}$ variables with methods $\text{s.in}$, $\text{s.upd}$, $\text{s.rd}$ and the init-update-read policy $\text{iur}$ obtains sequentially constructive SCCharts [8] and sequentially constructive Esterel, dubbed $\text{SCEst}$ [24].

How does a practical compiler check if a given process $P$ is $C$-constructive? If $P$ is non-constructive, then for some $C$-coherent initial store $\Sigma_0$, the reduction of $P$ in some macro-step
A configuration \((\Sigma_n', P_n')\) which is either (1) divergent, i.e., there is no \((\Sigma''_n, P''_n)\) with
\[
S : C \vdash (\Sigma'_n, P'_n) \Rightarrow (\Sigma''_n, P''_n)
\]
or (2) blocked, i.e., \((\Sigma'_n, P'_n)\) is stable but not completed. In the crash case (1) the process \(P''_n\) is spinning in an infinite sequential loop. This is the only way in which our reduction semantics in Figs. 6 and 7 can fail to generate a step. The search for a derivation must run through an infinite number of applications of rule \text{Rep}3. Since a single step only involves sequential steps, this means that \(P''_n\) must contain an instantaneous control flow cycle which must be a cycle in the static program structure of \(P\). The crash situation (2) in which \(P''_n\) is stable but not completed occurs if the method or procedure calls inside \(P'_n\) are blocked because of the side conditions in rules Mtd2, Act2, Act3 being false. In this case the stability of \(P''_n\) is obtained by rules Mtd1 and Act1 which permit \(P''_n\) to yield back to the scheduler. The blocking inside \(P''_n\) must involve a set of (at least two) threads \(Q_i\) waiting at a method or procedure call with a precedence constraints of form \(\pi \vdash \Pi_i \rightarrow A_i\), where \(A_i = \{o.m\}\) for a method call \(o.m(v)\) and \(A_i = \pi_p[f]\) for a procedure call. The potentials \(\Pi_i\) are obtained from sequentially reachable method calls in the sibling threads \(Q_j\) \((j \neq i)\) concurrent to the thread \(Q_i\) trying to execute the actions \(A_i\). Hence each entry in \(\Pi_i\) must be sequentially reachable from some of the blocked threads \(Q_j\) \((j \neq i)\). But this implies that \(P''_n\) contains an instantaneous cycle of memory accesses involving precedence dependencies from \(C\). This cycle must already be a static cycle in the original process \(P\) from which \(P''_n\) has been unrolled. In sum, the absence of (1) and (2) is implied by the criterion of acyclic sequential schedulability (ASC) which can be statically checked in polynomial time.

**Theorem 2** Let \(C = (O, \pi)\) be a interface and \(P\) be a closed process well-formed for \(C\). Suppose the static program graph of \(P\) does not contain an instantaneous cycle through sequential program order or concurrent precedence dependencies. Then, \(P\) is \(C\)-constructive.

For general policies the problem of determining whether a process is constructive can be arbitrarily complex. Since constructiveness involves loop termination at every tick it is undecidable for infinite data types. But even if loops are clock-guarded and statically bounded, and all memory is boolean, like in Esterel, the decision problem can be intractable (co-NP). Even the simpler question of whether two statements can be executed in the same tick (called the “tick alignment problem”) without considering memory at all, seems to be exponential, see e.g. [25], yet the exact complexity is still unknown. In practice, the static cycle check according to Thm. 2 suffices. What makes our approach important is that this (well-known) efficient ASC criterion can now be applied uniformly to arbitrary shared memory structures controlled by policies. This idea can be fruitfully integrated as a simple extension to current compilers in languages like Lustre, Esterel or SCCharts.

Observe that Thm. 1 suggests an interesting trade off between the memory model and the program making the simple ASC criterion of Thm. 2 even more powerful: A constructive process \(P\) may have been instantiated with SC variables subject to the p-policy \(iur\). Suppose that a static analysis reveals that \(P\) remains constructive even if the policy is strengthened to the policy \(\text{var} \subseteq iur\) for data flow variables. Then the store remains coherent for \(\text{var}\) and thus \(P\) has the same behaviour if it is compiled for shared memory consisting of data flow memory rather than SC variables.
Example 18 Consider the process $P = ms.\text{acc}.\text{wr}(4.2); (A | B)$ in context context $O'_{\text{facop}}$ where $A$ is as in Ex. 5 and $B$ from Ex. 7. $P$ initialises the acc field of $ms$ with value 4.2 and then spawns children $A$ and $B$. If either cell $m1$ or $m2$ gets aliased with acc we run into a write-write conflict, because both $A$ and $B$ then write into $ms.\text{acc}$. This means that $P$ is constructive for an interface $C_P = (O'_{\text{facop}}, \pi_P)$ if $\pi_P$ forces cells $m1$ and $m2$ to be separated from acc, i.e., $\pi_P \models o.\text{wr} \diamond ms.\text{acc}.\text{wr}$ for $o \in \{ms.m1, ms.m2\}$. This is achieved, e.g., by the fully unrestricted p-policy top (see Ex. 12). For $\pi_P = \text{top}$, a store $\Sigma; O'_{\text{facop}} \in SS$ is $C_P$-coherent if in $\Sigma$ all write accesses $o.\text{wr}$ to cells $o \in \{ms.m1, ms.m2\}$ are commuting with write access $ms.\text{acc}.\text{wr}$. This eliminates all stores in which these cells are aliased. Note that in every well-formed instantiation $P[f]$ the interface extension $f : C_P \subseteq C$ must preserve the dependencies expressed in $\pi_P$ (see Def. 5). This makes sure that $P[f]$ remains $C$-constructive. In contrast, if $\pi_P = \text{seq}$, the extension $f$ can do any aliasing it likes. Even, if $f$ collapses all $O'_{\text{facop}}$ into a single cell $x$, $f : (O_P, \text{seq}) \subseteq \{(x), \text{var}\}$ is an extension. Yet, $P[f]$ is obviously not $\{(x), \text{var}\}$-constructive. For the same reason, $P$ is not $C_P$-constructive because of the existence of aliasing stores $\Sigma[f] : O'_{\text{facop}} \in SS$.

A black-box process remains constructive if it is inline expanded along an interface extension and behaves identical.

Theorem 3 (Memory Safety) Let $f : C_1 \subseteq C_2$ be a interface extension. If $P$ is a $C_1$-constructive process then the inline expansion $P[f]$ is $C_2$-constructive.

We wish to export a process $P$ that is constructive in a context $C_P = (O_P, \pi_P)$ as a precompiled procedure. Thm. 3 guarantees that for all interface extensions $f : C_P \subseteq C$ the inline expansion $P[f]$ remains constructive in the calling context $C = (O, \pi)$. We can publish $P$ under a procedure name $p$ and behaviour equivalent to the generic step function $p.\text{step}[f]$ initialised in state $p.\text{init} = P[f]$. The step function pre-compiles $P[f]$ for the empty environment with potential $\Pi = \emptyset$. More precisely, $p.\text{step}[f](\Sigma, P[f]) = (\Sigma', P')$ iff $\Sigma; \emptyset \vdash P[f] \Rightarrow \Sigma' \vdash P'$. Since $P[f]$ is $C$-constructive, $P'$ is the completed next macro-state $P'$. However, in the calling context, the environment may have a potential $\Pi \neq \emptyset$. For the black-box step function to generate the same behaviour as the white-box expansion, $P[f]$ must be wait-free and not be blocked by $\Pi$, so that $\Sigma; \Pi \vdash P[f] \Rightarrow \Sigma' \vdash P'$. This is the case if $\pi \not\vdash \Pi \Rightarrow \text{can}(P[f])$. Since $\text{can}(Q) \subseteq \pi_P[f]$ for $Q = P[f]$ and all successor states $Q$ reachable from $P[f]$ in all macro-steps, we can test wait-free execution by the condition $\pi \not\vdash \Pi \Rightarrow \pi_P[f]$.

Theorem 4 (Black-box Abstraction) Let $P$ be a $C_P$-constructive process and $f : C_P \subseteq C$ an interface extension and $Q$ a $C$-constructive process containing a procedure call $\text{run}_p[f]$ with declaration $\text{proc}_f(x) = P$. Then, the inline expansion $Q\{P[f]/\text{run}_p[f]\}$ is $C$-constructive and generates the same sequence of macro-states as $Q$, from all $C$-coherent stores.

Note, the expansion $Q\{P[f]/\text{run}_p[f]\}$, which schedules the inline expanded process $P[f]$, may be constructive while the black-box execution of $P$ in $Q$ is not constructive.

VII. CONCLUSION & RELATED WORK

We have presented the formal syntax and semantics of SCPL – a core calculus for imperative synchronous programming languages with procedural abstraction and shared memory communication, based on the sequentially constructive model of computation. In SCPL, the composition of black-box processes is controlled through policy interfaces which ensure
memory-safe, wait-free and determinate execution. By design such guarantees can be statically verified at compile time. All memory references and thus all aliasing effects are statically resolved. To achieve that, policies in SCPL act like static data types. They protect the memory from illegitimate concurrent accesses by the program. As procedure interfaces, they constrain the instantiation at the call site relative to the memory accesses implemented at the declaration point. SCPL can accommodate recent languages such as SCCharts or Blech and is the first calculus of this form.

Existing work on modularisation of synchronous control flow is limited to shared signal communication and either follows white-box or grey-box methods. In [6], [8], [10] the module’s source code is statically expanded in-line and globally scheduled into its calling context. Vecchié et. al [13] describe dynamically scheduled white-box compilation of Esterel in concurrent imperative assembly code, based on cooperative scheduling of sequential coroutines. The Cronos compiler [9] provides grey-box modular compilation into ADA. A module is precompiled as a causality graph of partial reaction functions which is instantiated into a module call. The modularisation of [9] is based on the circuit semantics of Esterel. Zheng and Edwards [26] construct grey-box modules for Esterel from a three-valued simulation of Esterel’s constructive semantics directly in program’s control-flow graph. Grey-box modularisation for synchronous data-flow has been investigated by Lublinerman et al. [27] and Pouzet and Raymond [28]. However, although grey-box obtains optimal modular scheduling, it has been shown to be NP-complete and thus may not always be feasible in practice, in particular because causality errors are difficult to diagnose.

In contrast, SCPL procedures are black-box abstractions that can be executed atomically and implemented independently of their use context. They support genuine program factorisation across general concurrent data structures rather than just signals, where most existing work on modularisation is limited to the primitive signal interface. Clock-synchronised data structures with general access methods have been proposed by Aguado et al. [14] but that work, though imperative, does not handle procedural abstraction. Caspi et al. [23] define shared synchronous reactive objects for Lustre which bundle instance variables and partial stream processing modes similar to the objects arising in the modular compilation of data flow [12]. However, this does not deal with destructive memory updates through sequential control flow like SCPL.

The concurrent access policies (CAP) proposed by [14] are more powerful. These can capture precedence relations that dynamically change with memory state. This is useful to model bounded buffers, for example. It is not clear to us however how CAPs, which [14] have studied only in theory, can be implemented efficiently. Our p-policies are a simplification of CAPs for static scheduling in practical compilers. An extension of our theory to CAPs would combine state-dependent scheduling of [13] with shared data structures of [14] in a black-box setting. We leave this to future work.

The scheduling policies of [23] for data flow objects represent finite safety automata (prefix closed, star-free regular languages) that specify the legal sequence of mode calls that make up a memory-safe and full step reaction in the object. In this respect, they play a similar role as our p-policies, yet can express stateful behaviour which p-policies cannot. On the other hand, [23] does not make a difference between sequential and concurrent accesses, which is the key feature of p-policies. For instance, concurrent writes $x = 4$ and $x = 2$ are treated like a choice of sequential writes $\text{if } e \text{ then } x = 4 \text{ in } x = 2 \text{ else } x = 2 \text{ in } x = 4$ and both rejected. In SCPL only the former is non-constructive while the latter, being single-threaded, is accepted.
Specifically, the singleton constraint of Blech that forces memory accesses to be executed sequentially is not expressible by [23].

Benveniste et al. [29] introduce the notion of convex acceptance interfaces (CAI) for the composition of synchronous modules. CAI specify the synchronisation behaviour of a synchronous block regarding read and write accesses to its interface signals and thus can be used for modular causality analysis. CAI interfaces, like CAPs, are more expressive than our p-policies since they are data-dependent. On the other hand, CAI do not consider destructive (sequentially constructive) read and write actions and are limited to signals as shared memory structures. More importantly, as CAI are obtained from data flow specifications (Signal), they do not express negative trigger conditions like the precedences of our p-policies. Hence, they cannot capture Esterel-style reaction to absence. The reason is the CAI express program-order, while p-policies (and also CAPs) express causality between concurrent actions.

Causality analysis for optimal grey-box modularisation [9], [26], [27], CAP [14], CAI [29], as well as for scheduling policies of [23], have worst-case exponential complexity. There have been several other proposals for interface models in synchronous programming [30], [31] which are computationally tractable and close to p-policies in expressing purely static dependency relations. Cuq and Pouzet [30] use polymorphic record (“row”) types as a static type system for the data flow language Lucid Synchrone. It captures instantaneous dependence of functional expressions on argument variables. Lee et al. [31] propose to model the causality interfaces as dioid algebras. These are very general and applicable for input-output blocks of data and control flow alike. Our p-policies can be seen as an instantaneous dependency relation in the sense of [30] for memory accesses in the imperative setting, or as causality interfaces in the sense of [31] with boolean weights. Then, the static type checking of first-order recursion in [30] or the fixed point analysis of [31] corresponds to static ASC cycle detection in the program graph.

All of the standard causality interfaces as far as we are aware, merely describe the properties of data and computations at run-time and thus can be removed after compilation. In contrast to this, SCPL policies actively enforce scheduling priorities and thus control program execution. Since they prescribe a specific synchronisation model, they are an integral part of the program, like the init-update-read specification of SC-variables in SCCharts or the procedural interfaces in Blech. SCPL policies cannot, in general, be globally synthesised as principal type schemes, like the causality types of [30], or the “scheduling policies” of [23]. SCPL p-policies can, however, be locally optimised by shifting along the Δ relation in the spectrum between seq (purely sequential) and top (purely concurrent) without changing program behaviour. We envisage such optimisations to be useful in compilation schemes by source-level transformation.

For future work, we aim to extend the theory of SCPL policies to model dynamic state and develop a contract theory in the spirit of [32]. For this, a policy specification language and its semantics will be developed. Further, we plan to complete SCPL by a general trap mechanism as in Esterel and by local object declarations. For practical experimentation we wish to integrate our interface theory in a concrete open source compiler, such as the Blech or the SCCharts compilers.

ACKNOWLEDGMENT

The authors would like to thank Joaquín Aguado for many inspiring discussions on the use of policies to describe procedural interfaces. The third author thanks Gerald Lüttgen for valuable comments on a preliminary version of this work.
APPENDIX A

**POTENTIALS**

Potentials synchronise a thread with its concurrent environment. Informally, a potential is a summary of all memory accesses potentially performed by the procedures running concurrently with the thread during the current macro-step. In its simplest form, a potential is a subset \( \Pi \subseteq M \) of \( q \)methods where \( M = Mtd(C) \). We obtain \( \Pi \) from the process \( P \) that represents the concurrent environment of the given thread. To compute \( \Pi \) structurally by recursion on the structure of \( P \) we need to predict also a completion code in \( K_t \) for \( P \) and add it to \( \Pi \). Technically, we define a function \( \text{can}_s(P) = (\Pi, K) \) by recursion on the structure of \( P \). The base cases nothing, pause and exit do not perform any memory accesses, yet have different completion potentials. Formally,

\[
\text{can}_s(\text{nothing}) = \bot_0 \\
\text{can}_s(\text{pause}) = \bot_1 \\
\text{can}_s(\text{exit}) = \bot_2.
\]

where \( \bot_k = (\emptyset, \{k\}) \) for \( k \in K \).

For the conditional branching we need to over-approximate using the collective semantics, joining the memory capabilities of both branches as well as the completion codes. If \( \text{can}_s(P_i) = (\Pi_i, K_i) \) then

\[
\text{can}_s(\text{if } e \text{ then } P_1 \text{ else } P_2) = (\Pi_1 \cup \Pi_2, K_1 \cup K_2).
\]

Note that expressions \( e \) do not perform any memory accesses, they are pure values\(^2\) at runtime.

The potential of \( \text{can}_s(P_1; P_2) \) is a sequential composition of memory accesses and completion codes which depends on whether \( P_1 \) can terminate. If \( \text{can}_s(P_i) = (\Pi_i, K_i) \) then

\[
\text{can}_s(P_1; P_2) = \begin{cases} 
(\Pi_1, K_1) & \text{if } 0 \in K_1 \\
(\Pi_1 \cup \Pi_2, K_1 \setminus \{0\} \cup K_2) & \text{otherwise}
\end{cases}
\]

If \( P_1 \) cannot terminate \( (0 \notin K_1) \) then the downstream process \( P_2 \) in a sequential composition \( P_1; P_2 \) cannot contribute anything to the memory accesses of the current macro-step. As a consequence, \( \text{can}_s(\text{nothing}; P) = \text{can}_s(P) \), \( \text{can}_s(\text{pause}; P) = \bot_1 \) and \( \text{can}_s(\text{exit}; P) = \bot_2 \).

The parallel composition \( P_1 | P_2 \) is joining the memory access capabilities of \( P_1 \) and \( P_2 \), while the completion code models the synchronisation of control-flow. A parallel \( P_1 | P_2 \) terminates iff both threads terminate, returns if one thread returns and pauses if one thread pauses and the other thread does not return. Formally, if \( \text{can}_s(P_i) = (\Pi_i, K_i) \) then \( \text{can}_s(P_1 | P_2) = (\Pi_1 \cup \Pi_2, K) \) where

\[
K = \{ \max(k_1, k_2) \mid k_1 \in K_1, k_2 \in K_2 \}.
\]

\(^2\)We assume expressions are computed on the thread-local stack.
The preemptive parallel-or $P \parallel Q$ is like normal parallel as regards the memory access but has a different synchronisation behaviour for control flow. Thread $Q$ can only contribute memory accesses to the current tick, if $P$ may pause. Otherwise, $Q$ gets preempted. Suppose $can_s(P) = (\Pi_1, K_1)$ and $can_s(Q) = (\Pi_2, K_2)$. Then,

$$can_s(P_1 \parallel P_2) = \begin{cases} (\Pi_1, K_1) & \text{if } 1 \not\in K_1 \\ (\Pi_1 \cup \Pi_2, K) & \text{otherwise} \end{cases}$$

where

$$0 \in K \iff 0 \in K_1 \lor 0 \in K_2$$
$$1 \in K \iff 1 \in K_1 \land 1 \in K_2$$
$$2 \in K \iff 2 \in K_1 \lor 2 \in K_2.$$ 

The preemptive parallel-or terminates or returns as soon as one of the threads terminates or returns, and it pauses only if both threads pause.

The purpose of a trap is to catch a return of the enclosed process and turn it into a normal termination. If $can_s(\text{trap } P) = (\Pi, K)$ then $can_s(\text{trap } P) = (\Pi, K')$ where

$$0 \in K' \iff \{0, 2\} \cap K \neq \emptyset$$
$$1 \in K' \iff 1 \in K$$
$$2 \in K' \iff \text{never.}$$

A critical construct is the statically unbounded loop $\text{loop } P$. A loop cannot terminate but if $P$ pauses or returns, then $\text{loop } P$ also pauses or returns, respectively. Since every well-formed loop body $P$ must be non-instantaneous or loop-safe, which is checked by the compiler, each execution through $P$ must necessarily run into a pause or exit eventually. The former stops the loop for the tick and the latter breaks out of the loop altogether. For the potential suppose $can_s(P) = (\Pi, K)$. Then

$$can_s(\text{loop } P) = (\Pi, K \setminus \{0\}).$$

So far all constructs merely manipulate the completion potentials. The potentials for memory accesses arise from memory access statements:

$$can_s(\text{let } x = o.m(e) \text{ in } P) = (\{o.m\}, \{0\}).$$

We assume that each memory access is executed atomically and must complete by termination.

Let $p(\pi)$ be an procedure with interface $S_p = (O_p, M_p, \pi_p)$ in which $O_p = \{x_1, x_2, \ldots, x_n\}$. An procedure call $\text{run } p(\bar{o})$ with $\bar{o} = o_1, o_2, \ldots, o_n$ is approximated by its instantiated p-policy $\pi_p[\bar{o}/\bar{x}]$. When the procedure call $\text{run } p(\bar{o})$ is executed, the instantiated p-policy $\pi_p[\bar{o}/\bar{x}]$ is tested for conformance $\pi_p[\bar{o}/\bar{x}] \subseteq \pi(\Sigma)$ with the memory p-policy. This verifies that the method calls inside the instantiated procedure are both admissible and race-free, i.e., preserve coherence of the store. For potential, we only need the set of memory accesses potentially

3The completion potential $K'$ is written $K'_p$ in Esterel.

4In Blech there are only two access types let and var and the formal parameters are separated into two lists. A mutating activity call $o.p(\bar{a}_1)(\bar{a}_2)$ with let parameters $\bar{a}_1$ and var parameters $\bar{a}_2$ is treated like a procedure call $p(\bar{a}_1)(\bar{a}_2, o)$ and a non-mutating call is handled like a procedure call $p(\bar{a}_1, o)(\bar{a}_2)$. In this way, the information about separation lies entirely in the position of the parameters. Here, we code the same information in the procedure’s interface type $\pi_p$. 

30
executed by \( p \). This is the underlying set \( \pi_p[\overline{\sigma}/\overline{x}] \) of the p-policy. Since all procedures may pause or terminate in each tick, their completion can be 0 or 1. Overall, thus

\[
\text{can}_s(\text{run } p(\overline{\sigma})) = (\pi_p[\overline{\sigma}/\overline{x}], \{0, 1, 2\}).
\] (2)

It is important to observe that the potential \( \text{dom}(\pi_p[\overline{\sigma}/\overline{x}]) \) is determined from the static interface of \( p \) without looking at its actual code. The advantage of this “black-box” approach is that it supports pre-compilation of possibly recursive methods. The disadvantage is that the static interface \( \pi_p[\overline{\sigma}/\overline{x}] \) will typically be a rather conservative over-approximation of the memory accesses actually performed by \( p(\overline{\sigma}) \). If the procedure definitions are available we could use a “white-box” approach and obtain the potential from the procedure code. Specifically, if the procedure \( p \) is a module defined by the declaration \( \text{proc } p(\overline{x}) = Q \) we could safely define

\[
\text{can}_s(\text{run } p(\overline{\sigma})) = \text{can}_s(Q[\overline{\sigma}/\overline{x}]).
\]

Soundness of the method body \( Q \) for interface \( S_p \)

\[
\text{can}_s(Q[\overline{\sigma}/\overline{x}]) \subseteq \pi_p[\overline{\sigma}/\overline{x}].
\]

Using the white-box potential (3) is more expensive at run-time but less conservative than the static proxy (2). For statically verified constructive programs, both versions produce the same result. However, if a program is not statically constructive, the more precise potential (3) may resolve scheduling deadlocks produced by the statical potential (2).