Practically-Self-Stabilizing Vector Clocks in the Absence of Execution Fairness


System Settings

The system includes a set of processors $`P = \{p_1, \ldots, p_N\}`$, which are computing and communicating entities that we model as finite state-machines. Processor $`p_i`$ has an identifier, $`i`$, that is unique in $`P`$. Any pair of active processors can communicate directly with each other via their bidirectional communication channels (of bounded capacity per direction, $`\capacity \in \N`$, which, for example, allows the storage of at most one message). That is, the network’s topology is a fully-connected graph and each $`p_i \in P`$ has a buffer of finite capacity $`\capacity`$ that stores incoming messages from $`p_j`$, where $`p_j \in P\setminus \{p_i\}`$. Once a buffer is full, the sending processor overwrites the buffer of the receiving processor. We assume that any $`p_i, p_j \in P`$ have access to $`channel_{i,j}`$, a self-stabilizing end-to-end message delivery protocol (that is, reliable and FIFO) that transfers packets from $`p_i`$ to $`p_j`$. Note that the literature includes self-stabilizing reliable FIFO message delivery protocols that tolerate packet omissions, reordering, and duplication over non-FIFO channels.
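To make the buffer semantics concrete, the following is a minimal sketch of one direction of $`channel_{i,j}`$ (the class and method names are hypothetical), assuming overwrite-on-full semantics: with capacity 1, a send into a full buffer replaces the stored message.

```python
from collections import deque

class BoundedChannel:
    """One direction of channel_{i,j}: at most `capacity` messages in transit."""

    def __init__(self, capacity=1):
        # A deque with maxlen silently discards the oldest entry when full,
        # modeling the sender overwriting the receiver's buffer.
        self.buf = deque(maxlen=capacity)

    def send(self, msg):
        self.buf.append(msg)

    def receive(self):
        # Returns None when no message is in transit.
        return self.buf.popleft() if self.buf else None
```

With `capacity=1`, two consecutive sends leave only the second message in transit, which is exactly the overwrite behavior described above.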

The processor’s program is a sequence of (atomic) steps. Each step starts with an internal computation and finishes with a single communication operation, i.e., packet $`send`$ or $`receive`$. We assume the interleaving model, where steps are executed atomically; one step at a time. Input events refer to packet receptions or a periodic timer that can, for example, trigger the processor to broadcast a message. Note that the system is asynchronous and the algorithm that each processor is running is oblivious to the timer rate. Even though the scheduler can be adversarial, we assume that each processor’s local scheduler is fair, i.e., the processor alternates between completing send and receive operations (unless the processor’s communication channels are empty). Note that a message that a processor $`p_i`$ needs to send to its neighbors takes $`N-1`$ consecutive steps of $`p_i`$ (the execution might include steps of other processors in between), since each step can include at most one send (or receive) operation.
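The cost of a broadcast under this step model can be sketched as follows (a hypothetical helper, where each channel is modeled as a plain list): since every atomic step ends with at most one send operation, reaching all $`N-1`$ neighbors takes $`N-1`$ of $`p_i`$'s own steps.

```python
def broadcast(i, N, channels, msg):
    # Broadcasting costs p_i exactly N-1 of its own atomic steps:
    # each step ends with a single send to one neighbor. Steps of
    # other processors may interleave between these sends.
    steps = 0
    for j in range(N):
        if j != i:
            channels[(i, j)].append(msg)  # one atomic step: one send
            steps += 1
    return steps  # equals N - 1
```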

The state, $`s_i`$, of $`p_i \in P`$ includes all of $`p_i`$’s variables as well as the set of all messages in $`p_i`$’s incoming communication channels. Note that $`p_i`$’s step can change $`s_i`$ as well as remove a message from $`channel_{j,i}`$ (upon message arrival) or queue a message in $`channel_{i,j}`$ (when a message is sent). We assume that if $`p_i`$ sends a message infinitely often to $`p_j`$, processor $`p_j`$ receives that message infinitely often, i.e., the communication channels are fair. The term system state refers to a tuple of the form $`c = (s_1, s_2, \cdots, s_N)`$, where each $`s_i`$ is $`p_i`$’s state (including messages in transit to $`p_i`$). We define an execution (or run) $`R={c_0,a_0,c_1,a_1,\ldots}`$ as an alternating sequence of system states $`c_x`$ and steps $`a_x`$, such that each system state $`c_{x+1}`$, except for the initial system state $`c_0`$, is obtained from the preceding system state $`c_x`$ by the execution of step $`a_x`$.

At any point and without warning, $`p_i`$ is prone to a crash failure, which causes $`p_i`$ to either forever stop taking steps (without the possibility of failure detection by any other processor in the system) or to perform an undetectable restart in a subsequent step. In case processor $`p_i`$ performs an undetectable restart, it continues to take steps by having the same state as immediately before crashing, but possibly having lost the messages that other processors sent to $`p_i`$ between crashing and restarting. Processors know the set $`P`$, but have no knowledge about the number or the identities of the processors that never crash.

We assume that transient faults occur only before the starting system state $`c_0`$, and thus $`c_0`$ is arbitrary. Since processors can crash after $`c_0`$, the executions that we consider are not fair. We illustrate the failures that we consider in this paper in Figure 1.

Figure 1: Illustration of the failure model and of transient faults.

We say that a processor is active during a finite execution $`R'`$ if it takes at least one step in $`R'`$. We say that a processor is active throughout an infinite execution $`R`$, if it takes an infinite number of steps during $`R`$. Note that the fact that a processor is active during an infinite execution does not give any guarantee on when or how often it takes steps. Thus, there might be an arbitrarily long (yet finite) subexecution $`R'`$ of $`R`$, such that a processor is active in $`R`$ but not in $`R'`$. Therefore, processors that crash and never restart during an infinite execution $`R`$ are not active throughout $`R`$.

Suppose that $`R'`$ is a prefix of an execution $`R`$, and $`R''`$ is the remaining suffix of $`R`$. We use the concatenation operator $`\circ`$ to write that $`R=R' \circ R''`$, such that $`R'`$ is a finite execution that starts with the initial system state of $`R`$ and ends with a step that is immediately followed by the initial state of $`R''`$. We denote by $`R' \seg R`$ the fact that $`R'`$ is a subexecution (or segment) of $`R`$.

In order to define the stabilization criteria, we need to compare the number of steps that violate safety in a finite execution $`R`$ with the length of $`R`$. In the following, we define how to compare finite executions and sets of states according to their size.

We say that the length of a finite execution $`R=c_0, a_0, c_1, a_1, \ldots, c_{x-1}, a_{x-1}`$ is equal to $`x`$, which we denote by $`|R|=x`$. Let $`\MI`$ be an integer that is considered a practically-infinite quantity for a system $`\cS`$ (e.g., the system’s lifetime). For example, $`\MI`$ can refer to $`2^b`$ (where $`b=64`$ or larger) sequential system steps (e.g., single send or receive events). In this paper, we use $`\ll`$ as a formal way of referring to the comparison of, say, $`N^c`$, for a small integer $`c`$, and $`\MI`$, such that $`N^c`$ is an insignificant number when compared to $`\MI`$. Since this comparison of quantities is system-dependent, we give a modular definition of $`\ll`$ below.

Let $`\pinf`$ denote a system-dependent quantity that is practically-infinite for a system $`\cS`$, such that for an integer $`z\ll \MI`$, we have that $`\pinf := z\cdot \MI`$. For a system $`\cS`$ and $`x\in \N`$, we denote by $`x\ll \pinf`$ the fact that $`x`$ is significantly less than (or insignificant with respect to) $`\pinf`$. We say that an execution $`R`$ is of $`\pinf`$-scale, if there exists an integer $`y\ll \MI`$, such that $`|R| = y\cdot \MI`$ holds.
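The following numeric illustration (with assumed values $`\MI = 2^{64}`$, $`N = 10^3`$, and $`c = 3`$) shows why a polynomial-in-$`N`$ quantity is insignificant next to $`\MI`$:

```python
MI = 2 ** 64   # practically-infinite step count (e.g., the system's lifetime)
N = 1_000      # number of processors (illustrative value)
c = 3

# N^c steps are insignificant next to MI: an MI-scale execution dwarfs
# any polynomial-in-N prefix, which is what N^c << MI captures.
ratio = (N ** c) / MI
assert ratio < 1e-9
```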

We define the system’s abstract task $`\cT`$ by a set of variables (of the processor states) and constraints, which we call the system requirements, in a way that defines a desired system behavior, but does not necessarily consider all the implementation details. We say that an execution $`R`$ is a legal execution if the requirements of task $`\cT`$ hold for all the processors that take steps during $`R`$ (which might be a proper subset of $`P`$). We denote the set of legal executions with $`\LE`$. We denote with $`f_R`$ the number of deviations from the abstract task in an execution $`R`$, i.e., the number of states in $`R`$ in which the task requirements do not hold (hence $`R\in\LE \iff f_R = 0`$). Note that the definition of $`\LE`$ allows executions of very small length, but our focus will be on finding maximal subexecutions $`R^*\seg R`$ for a given $`\pinf`$-scale execution $`R`$, such that $`R^*\in\LE`$.
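The deviation count $`f_R`$ can be sketched as follows (a hypothetical helper, where an execution is modeled as its sequence of states and the task requirements as a predicate over a state):

```python
def deviations(states, requirement_holds):
    # f_R: the number of states in execution R where the task
    # requirements fail; R is legal (R in LE) iff f_R == 0.
    return sum(1 for s in states if not requirement_holds(s))
```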

For every infinite execution $`R`$, there exists a partition $`R = R' \circ R''`$, such that $`|R'| = z(N) \in \N`$ and $`f_{R''} = 0`$, where $`z(N)`$ is the complexity measure.

For every infinite execution $`R`$, $`f_R = f(R,N) \in \N`$, where $`f_R`$ is the complexity measure.

For every infinite execution $`R`$, and for every $`\pinf`$-scale subexecution $`R'`$ of $`R`$, $`f_{R'} = f(R',N) \ll |R'|`$, where $`f_{R'}`$ is the complexity measure.

We present a requirement (Requirement [req:2act]) which defines the abstract task of vector clocks. This requirement trivially holds for a fault-free system that can store unbounded values (and thus does not need to deal with integer overflow events). The presence of transient faults can violate these assumptions and cause the system to deviate from the abstract task, which $`\LE`$ specifies (through Requirement [req:2act]). In the following, we present Requirement [req:2act] and its relation to causal ordering (Property [req:3causality]).

We assume that each processor $`p_i`$ records the occurrence of a new local event by incrementing the $`i`$-th entry of its vector clock. During a legal execution, we require that the processors count all the events occurring in the system, despite the (possibly concurrent) wrap-around events. Hence, we require that the vector clock element of each (active) processor records all the increments done by that processor (Requirement [req:2act]). As a basic functionality, we assume that each processor can always query the value of its local vector clock. We say that an execution $`R^*`$ is a legal execution, i.e., $`R^*\in \LE`$, if Requirement [req:2act] holds for the states of all processors that take steps during $`R^*`$.

Let $`R`$ be an execution, $`p_i`$ be an active processor, and $`V^k_i[i]`$ be $`p_i`$’s value in $`c_k \in R`$. For every active processor $`p_i\in P`$, the number of $`p_i`$’s counter increments between the states $`c_k`$ and $`c_\ell \in R`$ is $`V^\ell_i[i] - V^k_i[i]`$, where $`c_k`$ precedes $`c_\ell`$ in $`R`$.
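With unbounded counters, the increment count is plain subtraction. A bounded-counter sketch must subtract modulo $`\MI`$, which is only correct under the assumption that fewer than $`\MI`$ increments occurred between the two states (handling the general case is exactly what the paper's solution addresses):

```python
MI = 2 ** 64  # bound on each vector clock entry

def increments(v_k, v_l, mi=MI):
    # Number of p_i's increments between states c_k and c_l, i.e.,
    # V^l_i[i] - V^k_i[i] computed modulo mi. This undercounts if mi or
    # more increments (i.e., a full wrap-around cycle) occurred between
    # the two states.
    return (v_l - v_k) % mi
```

For example, `increments(5, 9)` yields 4, and `increments(MI - 1, 0)` correctly yields 1 across a single wrap-around.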

We explain how faults and bounded counter values affect Requirement [req:2act] and causal ordering. Let $`V`$ and $`V'`$ be two vector clocks, and $`\causalPrecedence(V, V')`$ be a query that is true if and only if $`V`$ causally precedes $`V'`$, i.e., $`V'`$ records all the events that appear in $`V`$. Then, $`V`$ and $`V'`$ are concurrent when $`\neg \causalPrecedence(V, V')\land \neg \causalPrecedence(V', V)`$ holds. We formulate the causal precedence property in Property [req:3causality]. In a fault-free system with unbounded values, Requirement [req:2act] trivially holds, since no wrap-around events occur. That is, $`V`$ causally precedes $`V'`$ if $`V[i] \leq V'[i]`$ for every $`i\in \{1,\ldots,N\}`$ and $`\exists_{j\in \{1,\ldots,N\}} V[j] < V'[j]`$ hold. However, this is not the case in an asynchronous, crash-prone, and bounded-counter setting, where counter overflow events can occur. We present cases where Requirement [req:2act] and Property [req:3causality] do not hold due to a counter overflow event in Example [eg:overflow].
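The fault-free, unbounded-counter definition of causal precedence translates directly into code (a sketch, with vector clocks modeled as equal-length lists):

```python
def causally_precedes(V, Vp):
    # V causally precedes Vp iff every entry of V is <= the corresponding
    # entry of Vp and at least one entry is strictly smaller.
    return all(a <= b for a, b in zip(V, Vp)) and \
           any(a < b for a, b in zip(V, Vp))

def concurrent(V, Vp):
    # V and Vp are concurrent when neither causally precedes the other.
    return not causally_precedes(V, Vp) and not causally_precedes(Vp, V)
```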

For any two vector clocks $`V_i`$ and $`V_j`$ of two processors $`p_i, p_j\in P`$, $`\causalPrecedence(V_i, V_j)`$ is true if and only if $`V_i`$ causally precedes $`V_j`$.

Consider two bounded vector clocks $`V_i = \la v_{i_1},\ldots, v_{i_N} \ra`$ and $`V_j = \la v_{j_1},\ldots, v_{j_N} \ra`$ of $`p_i, p_j\in P`$, such that upon a new event $`p_k\in P`$ increments $`V_k[k]`$ by 1 modulo $`\MI`$. Assume that $`V_i = V_j`$ and $`V_i[i] = V_j[i] = \MI - 1`$ hold (e.g., as an effect of a transient fault). In the following step, $`p_i`$ increments $`V_i[i]`$ by 1, thus $`V_i[i]`$ wraps around to $`V_i[i]=0`$, while $`V_j[i] = \MI-1`$ remains. Then, $`V_i[i]`$ mistakenly indicates zero events for $`p_i`$ ($`V_i[i] = 0`$) instead of $`\MI`$, i.e., Requirement [req:2act] does not hold. Also, under the definition of causal precedence for fault-free systems with unbounded counters, $`V_i`$ appears to causally precede $`V_j`$, which is wrong, since $`V_j`$ causally precedes $`V_i`$ ($`p_i`$ had one more event than what $`p_j`$ records). That is, $`V_i[k] = V_j[k]`$ for $`k\neq i`$ and $`V_i[i] = 0 < \MI-1 = V_j[i]`$, which mistakenly indicates that $`p_j`$ records $`\MI-1`$ more events than $`p_i`$. ◻
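The anomaly of this example can be reproduced in a few lines (a toy bound of 16 stands in for the practically-infinite $`2^{64}`$; the entry values besides $`V_i[i]`$ are arbitrary):

```python
MI = 16  # toy bound standing in for a practically-infinite 2**64
i = 0

V_j = [MI - 1, 2, 5]         # V_j[i] == MI - 1
V_i = list(V_j)              # initially V_i == V_j (e.g., transient fault)

V_i[i] = (V_i[i] + 1) % MI   # p_i's next increment wraps around to 0

# Under the unbounded-counter comparison, V_i now (wrongly) appears to
# causally precede V_j, although p_i recorded one more event than p_j.
appears_to_precede = all(a <= b for a, b in zip(V_i, V_j)) and \
                     any(a < b for a, b in zip(V_i, V_j))
assert V_i[i] == 0 and appears_to_precede
```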

We remark that Requirement [req:2act] is a necessary and sufficient condition for Property [req:3causality] to hold. Suppose that Requirement [req:2act] does not hold, which means that it is not possible to count the events of a single processor between two states (e.g., as we showed in the previous example). This implies that it is not possible to compare two vector clocks, hence Property [req:3causality] cannot hold. Conversely, if Requirement [req:2act] holds, then it is possible to count how many events occurred in a single processor between two states, and by extension it is possible to compare all vector clock entries of two vector clocks. The latter is a sufficient condition for defining causal precedence (as in the fault-free unbounded-counter setting).

In Section 15 we present our solution for computing $`V^\ell_i[i] - V^k_i[i]`$ for Requirement [req:2act] ($`c_\ell`$ and $`c_k`$ are states in an execution $`R`$, and $`p_i\in P`$) and $`\causalPrecedence(V_i, V_j)`$ for Property [req:3causality] in a legal execution. In Section 11 we present an algorithm for replicating vector clocks in the presence of faults and bounded counters, which we prove to be practically-self-stabilizing in Section 13.