DISCMAT Project

Mathematical Methods in Distributed Computing

About

Practically all computing systems, from fire alarms to Internet-scale services, are nowadays distributed: they consist of a number of computing units performing independent computations and communicating with each other to synchronize their activities. Our dependence on performance and reliability of the distributed computing becomes more and more imminent. Therefore, understanding fundamentals of distributed computing is of crucial importance. The main complication here is the existing immense diversity of distributed applications, models of distributed computations, and performance metrics, combined with the lack of mathematical tools to handle this complexity.

Recently, an impressive attempt to address this challenge was made: a number of long-standing open questions in distributed computability were resolved using some of the most advanced branches of modern mathematics, including the elements of combinatorial and algebraic topology. These encompass proving impossibility of solving the fundamental problems of set agreement and renaming in the wait-free manner. However, most of the existing applications of topology in distributed computing concern theoretical positive or negative results, i.e., proving that no solution to a given problem in a given model exists or proving the existence fact in a non-constructive way. With a few exceptions, there are no convincing examples of using advanced mathematical tools to design new efficient algorithms.

At a higher level, this project aims at better understanding of what can and what cannot be implemented in specific distributed environments. In particular, we intend to apply the power of modern mathematics in deriving new algorithms and tight lower bounds for distributed computing problems.

News

Members

Prof. Petr KUZNETSOV, Télécom ParisTech

Email: petr [dot] kuznetsov [at] telecom-paristech.fr
Phone: +33 (0)1 45 81 81 91
Website: http://perso.telecom-paristech.fr/~kuznetso/
Address: Département Informatique et Réseaux (INFRES)
Télécom ParisTech
46 Rue Barrault, 75013 Paris, France

Prof. Michel RAYNAL, Senior Member, Institut Universitaire de France

Email: raynal [at] irisa.fr
Phone: +33 (0)6 82 56 38 84
Website: http://www.irisa.fr/prive/raynal/
Address: IRISA, ISTIC Université de Rennes (ASAP)
Avenue du Général Leclerc
35042 Rennes, France

Prof. Dmitry FEICHTNER-KOZLOV, University of Bremen

Email: dfk [at] math [dot] uni-bremen.de
Phone: +33 +49 (421) 218 63680
Website: http://www.informatik.uni-bremen.de/~dfk/
Address: Mathematik (FB3)
Bremen University
D-28334 Bremen, Germany

Prof. Achour Mostefaoui, University of Nantes

Email: achour [dot] mostefaoui [at] univ-nantes.fr
Phone: +33 (0)2 76 64 50 31
Website: http://www.univ-nantes.fr/mostefaoui-a
Address: UFR Sciences et Techniques
2, rue de la Houssiniere
44322 Nantes, France

Publications

  • Distributed Universal Constructions: a Guided Tour,
    Michel Raynal
    Bulletin of the EATCS, Vol. 1, No. 121, 2017.
    [ bibtex | paper ]

    The notion of a universal construction is central in computing science: the wheel has not to be reinvented for each new problem. In the context of n-process asynchronous distributed systems, a universal construction is an algorithm that is able to build any object defined by a sequential specification despite the occurrence of up to (n−1) process crash failures. The aim of this paper is to present a guided tour of such universal constructions. Its spirit is not to be a catalog of the numerous constructions proposed so far, but a (as simple as possible) presentation of the basic concepts and mechanisms that constitute the basis these constructions rest on.

  • From Wait-free to Arbitrary Concurrent Solo Executions in Colorless Distributed Computing,
    Maurice Herlihy, Sergio Rajsbaum, Michel Raynal and Julien Stainer
    Theoretical Computer Science, Vol. 683, Pages 1-21, 2017. DOI:10.1016/j.tcs.2017.04.007
    [ bibtex | paper ]

    In an asynchronous distributed system where any number of processes may crash, a process may have to run solo, computing its local output without receiving any information from other processes. In the basic shared memory system where the processes communicate through atomic read/write registers, at most one process may run solo.

    This paper introduces the family of d-solo models, where d-processes may concurrently run solo, 1dn (the 1-solo model is the basic read/write model). The paper then studies distributed colorless computations in the d-solo models, where process ids are not used, either in task specifications or during computation. It presents a characterization of the colorless tasks that can be solved in each d-solo model. Colorless tasks include consensus, set agreement and many other previously studied tasks. It shows that colorless algorithms have limited computational power for solving tasks, only when d>1. When d=1, colorless algorithms can solve the same tasks as algorithms that may use ids. It is well-known that, while consensus is not wait-free solvable in a model where at most one process may run solo, ε-approximate agreement is solvable. In a d-solo model, the fundamental solvable task is (d,ε)-solo approximate agreement, a generalization of ε-approximate agreement. Indeed, (d,ε)-solo approximate agreement can be solved in the d-solo model, but not in the (d+1)-solo model.

    Finally, the paper studies a link between the solvability of d-set agreement and (d,ε)-solo approximate agreement in asynchronous wait-free message-passing systems, which provides an insight on the “maximal partitioning” allowed to solve an approximate agreement task.

    Keywords: Approximate agreement, Asynchronous system, Colorless task, Read–write shared memory, Topology, Wait-free computing.

  • Early Decision and Stopping in Synchronous Consensus: A Predicate-Based Guided Tour,
    Armando Castañeda, Yoram Moses, Michel Raynal and Matthieu Roy
    El Abbadi A., Garbinato B. (Ed.): NETYS 2017, LNCS 10299, Pages 206-221, Springer, 2017. DOI:10.1007/978-3-319-59647-1_16
    [ bibtex | paper ]

    Consensus is the most basic agreement problem encountered in fault-tolerant distributed computing: each process proposes a value and non-faulty processes must agree on the same value, which has to be one of the proposed values. While this problem is impossible to solve in asynchronous systems prone to process crash failures, it can be solved in synchronous (round-based) systems where all but one process might crash in any execution. It is well-known that (t+1) rounds are necessary and sufficient in the worst case execution scenario for the processes to decide and stop executing, where t<n is a system parameter denoting the maximum number of allowed process crashes and n denotes the number of processes in the system.

    Early decision and stopping considers the case where f<t processes actually crash, f not being known by processes. It has been shown that the number of rounds that have to be executed in the worst case is then min(f+2,t+1). Following Castañeda, Gonczarowski and Moses (DISC 2014), the paper shows that this value is an upper bound attained only in worst execution scenarios. To this end, it investigates a sequence of three early deciding/stopping predicates P1=Pcount, P2=Pdif and P3=Ppref0, of increasing power, which differ in the information obtained by the processes from the actual failure, communication and data pattern. It is shown that each predicate Pi is better than the previous one Pi−1, i∈{2,3}, in the sense that there are executions where Pi allows processes to reach a decision earlier than Pi−1, while Pi−1 never allows a process to decide earlier than Pi. Moreover, P3=Ppref0 is an unbeatable predicate in the sense that it cannot be strictly improved: if there is an early deciding/stopping predicate P′ that improves the decision time of a process with respect to Ppref0 in a given execution, then there is at least one execution in which a process decides with P′ strictly later than with Ppref0.

    Keywords: Agreement, Consensus, Early decision, Early stopping, Process crash, Round-based algorithm, Synchronous message-passing system, t-Resilience.

  • Long-Lived Tasks,
    Armando Castañeda, Sergio Rajsbaum and Michel Raynal
    El Abbadi A., Garbinato B. (Ed.): NETYS 2017, LNCS 10299, Pages 439-454, Springer, 2017. DOI:10.1007/978-3-319-59647-1_32
    [ bibtex | paper ]

    The predominant notion for specifying problems to study distributed computability are tasks. Notable examples of tasks are consensus, set agreement, renaming and commit-adopt. The theory of task solvability is well-developed using topology techniques and distributed simulations. However, concurrent computing problems are usually specified by objects. Tasks and objects differ in at least two ways. While a task is a one-shot problem, an object, such as a queue or a stack, typically can be invoked multiple times by each process. Also, a task, defined in terms of sets, specifies its responses when invoked by each set of processes concurrently, while an object, defined in terms of sequences, specifies the outputs the object may produce when it is accessed sequentially.

    In a previous paper we showed how tasks can be used to specify one-shot objects (where each process can invoke only one operation, only once). In this paper we show how the notion of tasks can be extended to model any object. A potential benefit of this result is the use of topology, and other distributed computability techniques to study long-lived objects.

    Keywords: Distributed problems, Formal specifications, Tasks, Sequential specifications, Linearizability, Long-lived objects.

  • Implementing Set Objects in Dynamic Distributed Systems,
    Roberto Baldoni, Silvia Bonomi and Michel Raynal
    Journal of Computer and System Sciences, Vol. 82, No. 5, Pages 654-689, 2016. DOI:10.1016/j.jcss.2015.11.002
    [ bibtex | paper ]

    This paper considers a set object, i.e., a shared object allowing users (processes) to add and remove elements to the set, as well as taking consistent snapshots of its content. Specifically, we show that there not exists any protocol implementing a set object, using finite memory, when the underlying distributed system is eventually synchronous and affected by continuous arrivals and departures of processes (phenomenon also known as churn). Then, we analyze the relationship between system model assumptions and object specification in order to design protocols implementing the set object using finite memory. Along one direction (strengthening the system model), we propose a protocol implementing the set object in synchronous distributed systems and, along the other direction (weakening the object specification), we introduce the notion of a k-bounded set object proposing a protocol working on an eventually synchronous system.

    Keywords: Shared objects, Churn, Asynchronous system, Synchronous system, Per-element sequential consistency.

  • A Fast Contention-Friendly Binary Search Tree,
    Tyler Crain, Vincent Gramoli and Michel Raynal
    Parallel Processing Letters, Vol. 26, No. 3, 2016. DOI:10.1142/S0129626416500158
    [ bibtex | paper ]

    This paper presents a fast concurrent binary search tree algorithm. To achieve high performance under contention, the algorithm divides update operations within an eager abstract access that returns rapidly for efficiency reason and a lazy structural adaptation that may be postponed to diminish contention. To achieve high performance under read-only workloads, it features a rebalancing mechanism and guarantees that read-only operations searching for an element execute lock-free.

    We evaluate the contention-friendly binary search tree using Synchrobench, a benchmark suite to compare synchronization techniques. More specifically, we compare its performance against five state-of-the-art binary search trees that use locks, transactions or compare-and-swap for synchronization on Intel Xeon, AMD Opteron and Oracle SPARC. Our results show that our tree is more efficient than other trees and double the throughput of existing lock-based trees under high contention.

    Keywords: Binary tree, Concurrent data structures, Efficient implementation.

  • Probabilistic Causal Message Ordering,
    Achour Mostéfaoui and Stéphane Weiss
    International Conference on Parallel Computing Technologies, Pages 315-326, Springer, 2017. DOI:10.1007/978-3-319-62932-2_31
    [ bibtex | paper ]

    Causal broadcast is a classical communication primitive that has been studied for more then three decades and several implementations have been proposed. The implementation of such a primitive has a non negligible cost either in terms of extra information messages have to carry or in time delays needed for the delivery of messages. It has been proved that messages need to carry a control information the size of which is linear with the size of the system. This problem has gained more interest due to new application domains such that collaborative applications are widely used and are becoming massive and social semantic web and linked-data the implementation of which needs causal ordering of messages. This paper proposes a probabilistic but efficient causal broadcast mechanism for large systems with changing membership that uses few integer timestamps.

    Keywords: Asynchronous message-passing system, Happened before relation, Logical clock, Message causal ordering, Vector clock.

  • Non-interference and Local Correctness in Transactional Memory,
    Petr Kuznetsov and Sathya Peri
    Theoretical Computer Science, Vol. 688, Pages 103-116, 2017. DOI:10.1016/j.tcs.2016.06.021
    [ bibtex | paper ]

    Transactional memory promises to make concurrent programming tractable and efficient by allowing the user to assemble sequences of actions in atomic transactions with all-or-nothing semantics. It is believed that, by its very virtue, transactional memory must ensure that all committed transactions constitute a serial execution respecting the real-time order. In contrast, aborted or incomplete transactions should not “take effect.” But what does “not taking effect” mean exactly?

    It seems natural to expect that aborted or incomplete transactions do not appear in the global serial execution, and, thus, no committed transaction can be affected by them. We investigate another, less obvious, feature of “not taking effect” called non-interference aborted or incomplete transactions should not force any other transaction to abort. In the strongest form of non-interference that we explore in this paper, by removing a subset of aborted or incomplete transactions from the history, we should not be able to turn an aborted transaction into a committed one without violating the correctness criterion.

    We show that non-interference is, in a strict sense, not implementable with respect to the popular criterion of opacity that requires all transactions (be they committed, aborted or incomplete) to witness the same global serial execution. In contrast, when we only require local correctness, non-interference is implementable. Informally, a correctness criterion is local if it only requires that every transaction can be serialized along with (a subset of) the transactions committed before its last event (aborted or incomplete transactions ignored). We give a few examples of local correctness properties, including the recently proposed criterion of virtual world consistency, and present a simple though efficient implementation that satisfies non-interference and local opacity.

    Keywords: Software transactional memory, Correctness criterion, Opacity, Non-interference, Permissiveness.

  • Progress-Space Tradeoffs in Single-Writer Memory Implementations,
    Damien Imbs, Petr Kuznetsov and Thibault Rieutord
    arXiv preprint arXiv:1709.01879
    [ bibtex | paper ]

    Most algorithms designed for shared-memory distributed systems assume the single-writer multi-reader (SWMR) setting where each process is provided with a unique register readable by all. In a system where computation is performed by a bounded number n of processes coming from a very large (possibly unbounded) set of potential participants, the assumption of a SWMR memory is no longer reasonable. If only a bounded number of multi-writer multi-reader (MWMR) registers are provided, we cannot rely on an a priori assignment of processes to registers. In this setting, simulating SWMR memory, or equivalently, ensuring stable writing (i.e., every written value persists in the memory), is desirable.

    In this paper, we propose a SWMR simulation that adapts the number of MWMR registers used to the desired progress condition. For any given k from 1 to n, we present an algorithm that uses only n+k&iminus;1 registers to simulate a k-lock-free SWMR memory. We also give a matching lower bound of n+1 registers required for the case of 2-lock-freedom, which supports our conjectures that the algorithm is space-optimal. Our lower bound holds for the strictly weaker progress condition of 2-obstruction-freedom, which suggests that the space complexity for k-obstruction-free and k-lock-free SWMR simulations might coincide.

    Keywords: Single-writer memory implementation, Comparison-based algorithms, Space complexity, Progress conditions.

  • Are Byzantine Failures Really Different from Crash Failures?,
    Damien Imbs, Michel Raynal and Julien Stainer
    30th International Symposium on Distributed Computing, Springer LNCS 9888, Pages 215-229, 2016. DOI:10.1007/978-3-662-53426-7_16
    [ bibtex | paper ]

    When considering n-process asynchronous systems, where up to t processes can fail, and communication is by read/write registers or reliable message-passing, are (from a computability point of view) Byzantine failures “different” from crash failures? This is the question addressed in this paper, which shows that the answer is “no” for systems where t<n/3.

    To this end, the paper presents a new distributed simulation whose core is an extended BG simulation suited to asynchronous message-passing systems. More precisely, assuming t<min(n′,n/3), it describes a signature-free algorithm that simulates a system of n′ processes where up to t may crash, on top of a basic system of n processes where up to t may be Byzantine. In addition to extending (in a modular and direct way) the basic BG simulation to Byzantine message-passing systems this simulation also allows crash-tolerant algorithms, designed for asynchronous read/write systems, to be executed on top of asynchronous message-passing systems prone to Byzantine failures.

  • Atomic Read/Write Memory in Signature-Free Byzantine Asynchronous Message-Passing Systems,
    Achour Mostéfaoui, Matoula Petrolia, Michel Raynal and Claude Jard
    Theory of Computing Systems, Vol. 60, No. 4, Pages 677-694, 2017. DOI:10.1007/s00224-016-9699-8
    [ bibtex | paper ]

    This article presents a signature-free distributed algorithm which builds an atomic read/write shared memory on top of a fully connected peer-to-peer n-process asynchronous message-passing system in which up to t<n/3 processes may commit Byzantine failures. From a conceptual point of view, this algorithm is designed to be as close as possible to the algorithm proposed by Attiya, Bar-Noy and Dolev (J. ACM 42(1), 121–132 (1995)), which builds an atomic register in an n-process asynchronous message-passing system where up to t<n/2 processes may crash. The proposed algorithm is particularly simple. It does not use cryptography to cope with Byzantine processes, and is optimal from a t-resilience point of view (t<n/3). A read operation requires O(n) messages, and a write operation requires O(n²) messages.

    Keywords: Asynchronous message-passing system, Atomic read/write register, Byzantine process, Linearizability, Reliable broadcast abstraction.

  • Grasping the Gap Between Blocking and Non-Blocking Transactional Memories,
    Petr Kuznetsov and Srivatsan Ravi
    Journal of Parallel and Distributed Computing, Vol. 101, Pages 1-16, 2017. DOI:10.1016/j.jpdc.2016.10.008
    [ bibtex | paper ]

    Transactional memory (TM) is an inherently optimistic abstraction: it allows concurrent processes to execute sequences of shared-data accesses (transactions) speculatively, with an option of aborting them in the future. Early TM designs avoided using locks and relied on non-blocking synchronization to ensure obstruction-freedom: a transaction that encounters no step contention is not allowed to abort. However, it was later observed that obstruction-free TMs perform poorly and, as a result, state-of-the-art TM implementations are nowadays blocking, allowing aborts because of data conflicts rather than step contention.

    In this paper, we explain this shift in the TM practice theoretically, via complexity bounds. We prove a few important lower bounds on obstruction-free TMs. Then we present a lock-based TM implementation that beats all of these lower bounds. In sum, our results exhibit a considerable complexity gap between non-blocking and blocking TM implementations.

    Keywords: Transactional memory, Obstruction-freedom, Memory stalls, Expensive synchronization, Lower bounds, Invisible reads, Disjoint-access parallelism, Perturbability, Blocking, Non-blocking.

  • Time-Efficient Read/Write Register in Crash-Prone Asynchronous Message-Passing Systems,
    Achour Mostéfaoui and Michel Raynal
    Abdulla P., Delporte-Gallet C. (Ed.): NETYS 2016, LNCS 9944, Pages 250-265, 2016. DOI:10.1007/978-3-319-46140-3_21
    [ bibtex | paper ]

    The atomic register is certainly the most basic object of computing science. Its implementation on top of an n-process asynchronous message-passing system has received a lot of attention. It has been shown that t<n/2 (where t is the maximal number of processes that may crash) is a necessary and sufficient requirement to build an atomic register on top of a crash-prone asynchronous message-passing system. Considering such a context, this paper visits the notion of a fast implementation of an atomic register, and presents a new time-efficient asynchronous algorithm. Its time-efficiency is measured according to two different underlying synchrony assumptions. Whatever this assumption, a write operation always costs a round-trip delay, while a read operation costs always a round-trip delay in favorable circumstances (intuitively, when it is not concurrent with a write). When designing this algorithm, the design spirit was to be as close as possible to the one of the famous ABD algorithm (proposed by Attiya, Bar-Noy, and Dolev).

    Keywords: Asynchronous message-passing system, Atomic read/write register, Concurrency, Fast operation, Process crash failure, Synchronous behavior, Time-efficient operation.

  • Causal Consistency: Beyond Memory,
    Matthieu Perrin, Achour Mostéfaoui and Claude Jard
    ACM PPOPP’16, Pages 26:1-26:12, Barcelona, Spain, March 12-16, 2016. DOI:10.1145/3016078.2851170
    [ bibtex | paper ]

    In distributed systems where strong consistency is costly when not impossible, causal consistency provides a valuable abstraction to represent program executions as partial orders. In addition to the sequential program order of each computing entity, causal order also contains the semantic links between the events that affect the shared objects -- messages emission and reception in a communication channel, reads and writes on a shared register. Usual approaches based on semantic links are very difficult to adapt to other data types such as queues or counters because they require a specific analysis of causal dependencies for each data type. This paper presents a new approach to define causal consistency for any abstract data type based on sequential specifications. It explores, formalizes and studies the differences between three variations of causal consistency and highlights them in the light of PRAM, eventual consistency and sequential consistency: weak causal consistency, that captures the notion of causality preservation when focusing on convergence; causal convergence that mixes weak causal consistency and convergence; and causal consistency, that coincides with causal memory when applied to shared memory.

    Keywords: Causal consistency, Consistency criteria, Pipelined consistency, Sequential consistency, Shared objects, Weak causal consistency.

  • Agreement Functions for Distributed Computing Models,
    Petr Kuznetsov and Thibault Rieutord
    El Abbadi A., Garbinato B. (Ed.): NETYS 2017, LNCS 10299, Pages 175-190, Springer, 2017. DOI:10.1007/978-3-319-59647-1_14
    [ bibtex | paper ]

    The paper proposes a surprisingly simple characterization of a large class of models of distributed computing, via an agreement function: for each set of processes, the function determines the best level of set consensus these processes can reach. We show that the task computability of a large class of fair adversaries that includes, in particular superset-closed and symmetric one, is precisely captured by agreement functions.

  • Perfect Failure Detection with Very Few Bits,
    Pierre Fraigniaud, Sergio Rajsbaum, Corentin Travers, Petr Kuznetsov and Thibault Rieutord
    Bonakdarpour B., Petit F. (Ed.): SSS 2016, LNCS 10083, Pages 154–169, 2016. DOI:10.1007/978-3-319-49259-9_13
    [ bibtex | paper ]

    A failure detector is a distributed oracle that provides each process with a module that continuously outputs an estimate of which processes in the system have failed. The perfect failure detector provides accurate and eventually complete information about process failures. We show that, in asynchronous failure-prone message-passing systems, perfect failure detection can be achieved by an oracle that outputs at most ⌈logα(n)⌉+1 bits per process in n-process systems, where α denotes the inverse-Ackermann function. This result is essentially optimal, as we also show that, in the same environment, no failure detector outputting a constant number of bits per process can achieve perfect failure detection.

    Keywords: Failure detectors, Well-quasi-order, Higman’s lemma.

  • Anonymous Obstruction-Free (n,k)-Set Agreement with n−k+1 Atomic Read/Write Registers,
    Zohir Bouzid, Michel Raynal and Pierre Sutra
    Anceaume E., Cachin C., Potop-Butucaru M. (Ed.): OPODIS 2015, LIPIcs 46, Pages 1-17, 2016. DOI:10.4230/LIPIcs.OPODIS.2015.18
    [ bibtex | paper ]

    The k-set agreement problem is a generalization of the consensus problem. Namely, assuming each process proposes a value, each non-faulty process has to decide a value such that each decided value was proposed, and no more than k different values are decided. This is a hard problem in the sense that it cannot be solved in asynchronous systems as soon as k or more processes may crash. One way to circumvent this impossibility consists in weakening its termination property, requiring that a process terminates (decides) only if it executes alone during a long enough period. This is the well-known obstruction-freedom progress condition.

    Considering a system of n anonymous asynchronous processes, which communicate through atomic read/write registers only, and where any number of processes may crash, this paper addresses and solves the challenging open problem of designing an obstruction-free k-set agreement algorithm with (n−k+1) atomic registers only. From a shared memory cost point of view, this algorithm is the best algorithm known so far, thereby establishing a new upper bound on the number of registers needed to solve the problem (its gain is (n−k) with respect to the previous upper bound). The algorithm is then extended to address the repeated version of (n,k)-set agreement. As it is optimal in the number of atomic read/write registers, this algorithm closes the gap on previously established lower/upper bounds for both the anonymous and non-anonymous versions of the repeated (n,k)-set agreement problem. Finally, for 1 ≤ x ≤ k < n, a generalization suited to x-obstruction-freedom is also described, which requires (n−k+x) atomic registers only.

    Keywords: Anonymous processes, Asynchronous system, Atomic read/write register, Bounded number of registers, Consensus, Distributed algorithm, Distributed computability, Fault-tolerance, k-Set agreement, Obstruction-freedom, Process crash, Repeated k-set agreement, Upper bound.

  • Read-Write Memory and k-Set Consensus as an Affine Task,
    Eli Gafni, Yuan He, Petr Kuznetsov and Thibault Rieutord
    Fatourou P., Jiménez E., Pedone F. (Ed.): OPODIS 2016, LIPIcs 70, Pages 6:1-6:17, 2017. DOI:10.4230/LIPIcs.OPODIS.2016.6
    [ bibtex | paper ]

    The wait-free read-write memory model has been characterized as an iterated Immediate Snapshot (IS) task. The IS task is affine—it can be defined as a (sub)set of simplices of the standard chromatic subdivision. It is known that the task of Weak Symmetry Breaking (WSB) cannot be represented as an affine task. In this paper, we highlight the phenomenon of a “natural" model that can be captured by an iterated affine task and, thus, by a subset of runs of the iterated immediate snapshot model. We show that the read-write memory model in which, additionally, k-set-consensus objects can be used is, unlike WSB, “natural” by presenting the corresponding simple affine task captured by a subset of 2-round IS runs. Our results imply the first combinatorial characterization of models equipped with abstractions other than read-write memory that applies to generic tasks.

  • Set-Consensus Collections are Decidable,
    Carole Delporte-Gallet, Hugues Fauconnier, Eli Gafni and Petr Kuznetsov
    Fatourou P., Jiménez E., Pedone F. (Ed.): OPODIS 2016, LIPIcs 70, Pages 7:1-7:15, 2017. DOI:10.4230/LIPIcs.OPODIS.2016.7
    [ bibtex | paper ]

    A natural way to measure the power of a distributed-computing model is to characterize the set of tasks that can be solved in it. In general, however, the question of whether a given task can be solved in a given model is undecidable, even if we only consider the wait-free shared-memory model.

    In this paper, we address this question for restricted classes of models and tasks. We show that the question of whether a collection C of (l, j)-set consensus objects, for various l (the number of processes that can invoke the object) and j (the number of distinct outputs the object returns), can be used by n processes to solve wait-free k-set consensus is decidable. Moreover, we provide a simple O(n²) decision algorithm, based on a dynamic programming solution to the Knapsack optimization problem.

    We then present an adaptive wait-free set-consensus algorithm that, for each set of participating processes, achieves the best level of agreement that is possible to achieve using C. Overall, this gives us a complete characterization of a read-write model defined by a collection of set-consensus objects through its set-consensus power .

    We conjecture that any “reasonable” shared-memory can be represented by a collection of set-consensus tasks and, thus, characterized by the set-consensus power.

  • On Composition and Implementation of Sequential Consistency,
    Matthieu Perrin, Matoula Petrolia, Achour Mostéfaoui and Claude Jard
    Gavoille C., Ilcinkas D. (Ed.): DISC 2016, LNCS 9888, Pages 284-297, 2016. DOI:10.1007/978-3-662-53426-7_21
    [ bibtex | paper ]

    To implement a linearizable shared memory in synchronous message-passing systems it is necessary to wait for a time linear to the uncertainty in the latency of the network for both read and write operations. Waiting only for one of them suffices for sequential consistency. This paper extends this result to crash-prone asynchronous systems, proposing a distributed algorithm that builds a sequentially consistent shared snapshot memory on top of an asynchronous message-passing system where less than half of the processes may crash. We prove that waiting is needed only when a process invokes a read/snapshot right after a write.

    We also show that sequential consistency is composable in some cases commonly encountered: (1) objects that would be linearizable if they were implemented on top of a linearizable memory become sequentially consistent when implemented on top of a sequential memory while remaining composable and (2) in round-based algorithms, where each object is only accessed within one round.

    Keywords: Asynchronous message-passing system, Crash-failures, Sequential consistency, Composability, Shared memory, Snapshot.

  • In the Search of Optimal Concurrency,
    Vincent Gramoli, Petr Kuznetsov and Srivatsan Ravi
    Suomela J. (Ed.): SIROCCO 2016, LNCS 9988, Pages 143-158, 2016. DOI:10.1007/978-3-319-48314-6_10
    [ bibtex | paper ]

    It is common practice to use the epithet “highly concurrent” referring to data structures that are supposed to perform well in concurrent environments. But how do we measure the concurrency of a data structure in the first place? In this paper, we propose a way to do this, which allowed us to formalize the notion of a concurrency-optimal implementation.

    The concurrency of a program is defined here as the program’s ability to accept concurrent schedules, i.e., interleavings of steps of its sequential implementation. To make the definition sound, we introduce a novel correctness criterion, LS-linearizability, that, in addition to classical linearizability, requires the interleavings of memory accesses to be locally indistinguishable from sequential executions. An implementation is then concurrency-optimal if it accepts all LS-linearizable schedules. We explore the concurrency properties of search data structures which can be represented in the form of directed acyclic graphs exporting insert, delete and search operations. We prove, for the first time, that pessimistic (e.g., based on conservative locking) and optimistic serializable (e.g., based on serializable transactional memory) implementations of search data-structures are incomparable in terms of concurrency. Thus, neither of these two implementation classes is concurrency-optimal, hence raising the question of the existence of concurrency-optimal programs.

    Keywords: Concurrency, Search data structures, Lower bounds.

  • On the Uncontended Complexity of Anonymous Consensus,
    Claire Capdevielle, Colette Johnen, Petr Kuznetsov and Alessia Milani
    Anceaume E., Cachin C., Potop-Butucaru M. (Ed.): OPODIS 2015, LIPIcs 46, Pages 1-16, 2016. DOI:10.4230/LIPIcs.OPODIS.2015.12
    [ bibtex | paper ]

    Consensus is one of the central distributed abstractions. By enabling a collection of processes to agree on one of the values they propose, consensus can be used to implement any generic replicated service in a consistent and fault-tolerant way. In this paper, we study uncontended complexity of anonymous consensus algorithms, counting the number of memory locations used and the number of memory updates performed in operations that encounter no contention. We assume that contention-free operations on a consensus object perform "fast" reads and writes, and resort to more expensive synchronization primitives, such as CAS, only when contention is detected. We call such concurrent implementations interval-solo-fast and derive one of the first nontrivial tight bounds on space complexity of anonymous interval-solo-fast consensus.

    Keywords: Space and time complexity, Lower bounds, Consensus, Interval contention, Solo-fast.

  • Two-Bit Messages are Sufficient to Implement Atomic Read/Write Registers in Crash-prone Systems,
    Achour Mostéfaoui and Michel Raynal
    ACM PODC’16, Pages 381-470, Chicago, Illinois, USA, July 25-29, 2016. DOI:10.1145/2933057.2933095
    [ bibtex | paper ]

    Atomic registers are certainly the most basic objects of computing science. Their implementation on top of an n-process asynchronous message-passing system has received a lot of attention. It has been shown that t < n/2 (where t is the maximal number of processes that may crash) is a necessary and sufficient requirement to build an atomic register on top of a crash-prone asynchronous message-passing system. Considering such a context, this paper presents an algorithm which implements a single-writer multi-reader atomic register with four message types only, and where no message needs to carry control information in addition to its type. Hence, two bits are sufficient to capture all the control information carried by all the implementation messages. Moreover, the messages of two types need to carry a data value while the messages of the two other types carry no value at all. As far as we know, this algorithm is the first with such a sufficiency property on the size of control information carried by messages. It is also particularly efficient from a time complexity point of view.

    Keywords: Algorithm, Asynchronous message-passing system, Atomic read-write register, Message type, Process crash failure, Reliability, Sequence number, Upper bound.

  • A Look at Basics of Distributed Computing (Invited Talk),
    Michel Raynal
    IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Pages 1-11, Nara, Japan, 2016. DOI:10.1109/ICDCS.2016.109
    [ bibtex | paper ]

    This tutorial presents concepts and basics of distributed computing which are important (at least from the author's point of view!), and should be known and mastered by Master students, researchers, and engineers. Those include: (a) a characterization of distributed computing (which is too much often confused with parallel computing); (b) the notion of a synchronous system and its associated notions of a local algorithm and message adversaries; (c) the notion of an asynchronous shared memory system and its associated notions of universality and progress conditions; and (d) the notion of an asynchronous messagepassing system with its associated broadcast and agreement abstractions, its impossibility results, and approaches to circumvent them. Hence, the tutorial can be seen as a guided tour to key elements that constitute basics of distributed computing.

    Keywords: Distributed computing, Parallel processing, Reliability, Computational modeling, Education, Uncertainty, Program processors.

  • Communication Patterns and Input Patterns in Distributed Computing (Invited Talk),
    Michel Raynal
    C. Scheideler (Ed.): SIROCCO 2015, LNCS 9439, Pages 1–15, 2015. DOI:10.1007/978-3-319-25258-2_1
    [ bibtex | paper ]

    A communication pattern is a pattern on messages exchanged in a distributed computation. An input pattern is a vector made up of the input parameters of the processes involved in a distributed computation. This paper investigates three such patterns. The first two, which are related to the causality relation associated with a distributed execution, are on causal message delivery and the capture of consistent global states, respectively. The last one, which concerns the consensus problem, is on vectors defined by the input values proposed by processes (this is also called the “condition-based” approach).

    An aim of the paper is to promote the concept of pattern in distributed computing, both as a way to provide higher abstraction levels (as it is the case in communication patterns), or a tool to investigate computability or optimality issues (as it is the case with input patterns).

    Keywords: Agreement problem, Byzantine failure, Causality, Causal message order, Checkpointing, Consensus, Crash failure, Error-correcting code, Input vector, Message pattern, Zigzag path.

  • Read/write shared memory abstraction on top of asynchronous Byzantine message-passing systems,
    Damien Imbs, Sergio Rajsbaum, Michel Raynal and Julien Stainer
    Journal of Parallel and Distributed Computing, Vol. 93, Pages 1-9, Elsevier, 2016. DOI:10.1016/j.jpdc.2016.03.012
    [ bibtex | paper ]

    This paper is on the construction and use of a shared memory abstraction on top of an asynchronous message-passing system in which up to t processes may commit Byzantine failures. This abstraction consists of arrays of nn single-writer/multi-reader atomic registers, where n is the number of processes. These registers enable Byzantine tolerance by recording the whole history of values written to each one of them. A distributed algorithm building such a shared memory abstraction is first presented. This algorithm assumes t<n/3, which is shown to be a necessary and sufficient condition for such a construction. Hence, the algorithm is resilient-optimal.

    Then the paper presents distributed objects built on top of this read/write shared memory abstraction, which cope with Byzantine processes. As illustrated by these objects, the proposed shared memory abstraction is motivated by the fact that, for a lot of problems, algorithms are simpler to design and prove correct in a shared memory system than in a message-passing system.

    Keywords: Approximate agreement, Asynchronous message-passing system, Atomic read/write register, Broadcast abstraction, Byzantine process, Distributed computing, Message-passing system, Quorum, Reliable broadcast, Reliable shared memory, Single-writer/multi-reader register, t-Resilience.

  • A necessary condition for Byzantine k-set agreement,
    Zohir Bouzid, Damien Imbs and Michel Raynal
    Information Processing Letters, Vol. 116, No. 12, Pages 757-759, Elsevier, 2016. DOI:10.1016/j.ipl.2016.06.009
    [ bibtex | paper ]

    This short paper presents a necessary condition for Byzantine k-set agreement in (synchronous or asynchronous) message-passing systems and asynchronous shared memory systems where the processes communicate through atomic single-writer multi-reader registers. It gives a proof, which is particularly simple, that k-set agreement cannot be solved t-resiliently in an n-process system when View the MathML source. This bound is tight for the case k=1 (Byzantine consensus) in synchronous message-passing systems.

    Keywords: Algorithms, Byzantine process, k-Set agreement, Message-passing system, Atomic read/write register.

  • Generalized Symmetry Breaking Tasks and Non-Determinism in Concurrent Objects,
    Armando Castaneda, Damien Imbs, Sergio Rajsbaum and Michel Raynal
    SIAM Journal of Computing, Vol. 45, No. 2, Pages 379-414, 2016.
    [ bibtex | paper ]

    Processes in a concurrent system need to coordinate using an underlying shared memory or a message-passing system in order to solve agreement tasks such as, for example, consensus or set agreement. However, coordination is often needed to break the symmetry of processes that are initially in the same state -- for example, to get exclusive access to a shared resource, to get distinct names, or to elect a leader. This paper introduces and studies the family of generalized symmetry breaking (GSB) tasks, which includes election, renaming, and many other symmetry breaking tasks, and studies how nondeterminism properties of objects solving tasks affects the computability power of GSB tasks. The aim is to develop the understanding of symmetry breaking tasks and their relation with agreement tasks and to study nondeterminism properties of objects solving tasks and how these properties affect the computability power of symmetry breaking tasks. Among various results characterizing the family of GSB tasks, it is shown that perfect renaming, i.e., (n,n)-renaming, is universal for all GSB tasks. The paper also shows that there is a large family of GSB tasks, which includes perfect renaming, that is strictly more powerful than (n,n-1)-set agreement. Some of these tasks are equivalent to perfect renaming, while others lie strictly between perfect renaming and (n,n+1)-renaming. Results comparing renaming and set agreement are proved, and the results in this paper complement known results. This paper sheds new light on the relations linking set agreement and symmetry breaking. The proofs are based on combinatorial topology techniques and new ideas about different notions of nondeterminism that can be associated with shared objects.

    Keywords: Agreement, Asynchronous read/write model, Coordination, Concurrent object, Crash failure, Decision task, Distributed computability, Non-determinism, Problem hierarchy, Renaming, Set agreement, Symmetry breaking, Wait-freedom.

  • Signature-Free Asynchronous Binary Byzantine Consensus with t<n/3, O(n²) Messages, and O(1) Expected Time,
    Achour Mostéfaoui, Hamouma Moumen and Michel Raynal
    Journal of the ACM, Vol. 62, No. 4, Article 31, 21 pages, Aug. 2015.
    [ bibtex | paper ]

    This paper is on broadcast and agreement in asynchronous message-passing systems made up of n processes, and where up to t processes may have a Byzantine Behavior. Its first contribution is a powerful, yet simple, all-to-all broadcast communication abstraction suited to binary values. This abstraction, which copes with up to t<n/3 Byzantine processes, allows each process to broadcast a binary value, and obtain a set of values such that (1) no value broadcast only by Byzantine processes can belong to the set of a correct process, and (2) if the set obtained by a correct process contains a single value v, then the set obtained by any correct process contains v.

    The second contribution of the paper is a new round-based asynchronous consensus algorithm that copes with up to t<n/3 Byzantine processes. This algorithm is based on the previous binary broadcast abstraction and a weak common coin. In addition of being signature-free and optimal with respect to the value of t, this consensus algorithm has several noteworthy properties: the expected number of rounds to decide is constant; each round is composed of a constant number of communication steps and involves O(n²) messages; each message is composed of a round number plus a constant number of bits. Moreover, the algorithm tolerates message re-ordering by the adversary (i.e., the Byzantine processes).

    Keywords: Asynchronous message-passing system, Broadcast abstraction, Byzantine process, Common coin, Consensus, Distributed algorithm, Optimal resilience, Randomized algorithm, Signature-free algorithm, Simplicity.

  • Distributed Universality,
    Michel Raynal, Julien Stainer and Gadi Taubenfeld
    Algorithmica, Vol. 76, No. 2, Pages 502-535, Springer, Aug. 2015. DOI:10.1007/s00453-015-0053-3
    [ bibtex | paper ]

    A notion of a universal construction suited to distributed computing has been introduced by Herlihy in his celebrated paper “Wait-free synchronization ”(ACMTrans Program Lang Syst 13(1):124–149,1991). A universal construction is an algorithm that can be used to wait-free implement any object defined by a sequential specification. Herlihy’s paper shows that the basic system model, which supports only atomic read/write registers, has to be enriched with consensus objects to allow the design of universal constructions. The generalized notion of a k-universal construction has been recently introduced by Gafni and Guerraoui (Proceedings of 22nd international conference on concurrency theory (CONCUR’11), Springer LNCS 6901,pp 17–27,2011).

    A k-universal construction is an algorithm that can be used to simultaneously implement k objects (instead of just one object), with the guarantee that at least one of the k constructed objects progresses forever. While Herlihy’s universal construction relies on atomic registers and consensus objects, a k-universal construction relies on atomic registers and k-simultaneous consensus objects (which are wait-free equivalent to k-set agreement objects in the read/write system model). This paper significantly extends the universality results introduced by Herlihy and Gafni–Guerraoui.

    In particular, we present a k-universal construction which satisfies the following five desired properties, which are not satisfied by the previous k-universal construction: (1) among the k objects that are constructed, at least l objects (and not just one) are guaranteed to progress forever; (2) the progress condition for processes is wait-freedom, which means that each correct process executes an infinite number of operations on each object that progresses forever; (3) if any of the k constructed objects stops progressing, all its copies (one at each process) stop in the same state; (4) the proposed construction is contention-aware, in the sense that it uses only read/write registers in the absence of contention; and (5) it is generous with respect to the obstruction-freedom progress condition, which means that each process is able to complete any one of its pending operations on the k objects if all the other processes hold still long enough. The proposed construction, which is based on new design principles, is called a (k,l)-universal construction. It uses a natural extension of k-simultaneous consensus objects, called (k,l)-simultaneous consensus objects ((k,l)-SC). Together with atomic registers, (k,l)-SC objects are shown to be necessary and sufficient for building a (k,l)-universal construction, and, in that sense, (k,l)-SC objects are (k,l)-universal.

    Keywords: Keywords Asynchronous read/write system, Universal construction, Consensus, Distributed computability, k-Set agreement, k-Simultaneous consensus, Wait-freedom, Obstruction-freedom, Contention-awareness, Crash failures, State machine replication.

  • Signature-Free Asynchronous Byzantine Systems: From Multivalued to Binary Consensus with t<n/3, O(n²) Messages, and Constant Time and no Signature,
    Achour Mostéfaoui and Michel Raynal
    Acta Informatica, Vol. 54, No. 5, Pages 501-520, Springer, 2017. DOI:10.1007/s00236-016-0269-y
    [ bibtex | paper ]

    This paper presents a new algorithm that reduces multivalued consensus to binary consensus in an asynchronous message-passing system made up of n processes where up to t may commit Byzantine failures. This algorithm has the following noteworthy properties: it assumes t<n/3 (and is consequently optimal from a resilience point of view), uses O(n^2) messages, has a constant time complexity, and uses neither signatures nor additional computational power (such as random numbers, failure detectors, additional scheduling assumption, or additional synchrony assumption). The design of this reduction algorithm relies on two new all-to-all communication abstractions. The first one allows the non-faulty processes to reduce the number of proposed values to c, where c is a small constant. The second communication abstraction allows each non-faulty process to compute a set of (proposed) values satisfying the following property: if the set of a non-faulty process is a singleton containing value v, the set of any non-faulty process contains v. Both communication abstractions have an O(n^2) message complexity and a constant time complexity. The reduction of multivalued Byzantine consensus to binary Byzantine consensus is then a simple sequential use of these communication abstractions. To the best of our knowledge, this is the first asynchronous message-passing algorithm that reduces multivalued consensus to binary consensus with O(n^2) messages and constant time complexity (measured with the longest causal chain of messages) in the presence of up to t<n/3 Byzantine processes, and without using cryptography techniques. Moreover, this reduction algorithm uses a single instance of the underlying binary consensus, and tolerates message re-ordering by Byzantine processes.

    An extended abstract of this paper appeared in SIROCCO 2015.

    Keywords: Asynchronous message-passing system, Broadcast abstraction, Byzantine process, Consensus, Distributed algorithm, Intrusion tolerance, Multivalued consensus, Optimal resilience, Randomized binary consensus, Signature-free algorithm.

  • Intrusion-tolerant broadcast and agreement abstractions in the presence of Byzantine processes,
    Achour Mostéfaoui and Michel Raynal
    IEEE Transactions on Parallel and Distributed Systems, Vol. 27, No. 4, Pages 1085-1098, 2016. DOI:10.1109/TPDS.2015.2427797
    [ bibtex | paper ]

    A process commits a Byzantine failure when its behavior does not comply with the algorithm it is assumed to execute. Considering asynchronous message-passing systems, this paper presents distributed abstractions, and associated algorithms, that allow non-faulty processes to correctly cooperate, despite the uncertainty created by the net effect of asynchrony and Byzantine failures. These abstractions are broadcast abstractions (namely, no-duplicity broadcast, reliable broadcast, and validated broadcast), and agreement abstraction (namely, consensus). While no-duplicity broadcast and reliable broadcast are well-known one-to-all communication abstractions, validated broadcast is a new all-to-all communication abstraction designed to address agreement problems. After having introduced these abstractions, the paper presents an algorithm implementing validated broadcast on top of reliable broadcast. Then the paper presents two consensus algorithms, which are reductions of multivalued consensus to binary consensus. The first one is a generic algorithm, that can be instantiated with unreliable broadcast or no-duplicity broadcast, while the second is a consensus algorithm based on validated broadcast. Finally, a third algorithm is presented that solves the binary consensus problem. This algorithm is a randomized algorithm based on validated broadcast and a common coin. The presentation of all the abstractions and their algorithms is done incrementally.

    Keywords: Signature-free algorithm, Abstraction level, Agreement, Asynchronous message-passing system, Broadcast abstraction, Byzantine process, Common coin, Consensus, Distributed algorithm, Faulttolerance, Intrusion-tolerance, Message validation, Reliable broadcast.

  • Revisiting Immediate Snapshot,
    Carole Delporte-Gallet, Hugues Fauconnier, Sergio Rajsbaum and Michel Raynal
    Suomela, Jukka (Ed.) : 23rd International Colloquium, SIROCCO 2016, LNCS 9988, Helsinki, Finland, July 19-21, 2016.
    [ bibtex | paper ]

    Immediate snapshot is the basic communication object on which relies the read/write distributed computing model made up of n crash-prone asynchronous processes, called iterated distributed model. Each iteration step (usually called a round) uses a new immediate snapshot object, which allows the processes to communicate and cooperate. More precisely, the x-th immediate snapshot object can be used by a process only when it executes the x-th round. An immediate snapshot object can be implemented by an (n−1)-resilient algorithm, i.e. an algorithm that tolerates up to (n−1) process crashes (also called wait-free algorithm). Considering a t-crash system model (i.e. a model in which up to t processes are allowed to crash), this paper is on the construction of an extension of immediate snapshot objects to t-resiliency. In the t-crash system model, at each round each process may be ensured to get values from at least n−t processes, and t-immediate snapshot has the properties of classical immediate snapshot (1-immediate snapshot) but ensures that each process will get values form at least n−t processes. Its main result is the following. While there is a (deterministic) t-resilient read/write-based algorithm implementing t-immediate snapshot in a t-crash system when t=n−1, there is no t-resilient algorithm in a t-crash model when t∈[1..(n−2)]. This means that the notion of t-resilience is inoperative when one has to implement immediate snapshot for these values of t: the model assumption “at most t<n−1 processes may crash” does not provide us with additional computational power allowing for the design of genuine t-resilient algorithms (genuine meaning that such a t-resilient algorithm would work in the t-crash model, but not in the (t+1)-crash model). To show these results, the paper relies on well-known distributed computing agreement problems such as consensus and k-set agreement.

    Keywords: Asynchronous system, Atomic read/write register, Consensus, Distributed computability, Immediate snapshot, Impossibility, Iterated model, k-Set Agreement, Linearizability, Process crash failure, Snapshot object, t-Resilience, Wait-freedom.

  • Signature-Free Asynchronous Byzantine Systems: From Multivalued to Binary Consensus with t<n/3, O(n²) Messages, and Constant Time and no Signature,
    Achour Mostéfaoui and Michel Raynal
    C. Scheideler (Ed.): SIROCCO 2015, LNCS 9439, Pages 194–208, 2015.
    [ bibtex | paper ]

    This paper presents a new algorithm that reduces multivalued consensus to binary consensus in an asynchronous message-passing system made up of n processes where up to t may commit Byzantine failures. This algorithm has the following noteworthy properties: it assumes t<n/3 (and is consequently optimal from a resilience point of view), uses O(n²) messages, has a constant time complexity, and does not use signatures. The design of this reduction algorithm relies on two new all-to-all communication abstractions. The first one allows the non-faulty processes to reduce the number of proposed values to c, where c is a small constant. The second communication abstraction allows each non-faulty process to compute a set of (proposed) values such that, if the set of a non-faulty process contains a single value, then this value belongs to the set of any non-faulty process.

    Both communication abstractions have an O(n²) message complexity and a constant time complexity.The reduction of multivalued Byzantine consensus to binary Byzantine consensus is then a simple sequential use of these communication abstractions. To the best of our knowledge, this is the first asynchronous message-passing algorithm that reduces multivalued consensus to binary consensus with O(n²) messages and constant time complexity (measured with the longest causal chain of messages) in the presence of up to t<n/3 Byzantine processes, and without using cryptography techniques. Moreover, this reduction algorithm uses a single instance of the underlying binary consensus, and tolerates message re-ordering by Byzantine processes.

    Keywords: Asynchronous message-passing system, Broadcast abstraction, Byzantine process, Consensus, Distributed algorithm, Intrusion tolerance, Multivalued consensus, Optimal resilience, Randomized binary consensus, Signature-free algorithm.

  • Minimal Synchrony for Byzantine Consensus,
    Zohir Bouzid, Achour Mostéfaoui and Michel Raynal
    ACM PODC’15, Pages 461-470, July 21–23, 2015, Donostia-San Sebastián, Spain.
    [ bibtex | paper ]

    Solving the consensus problem requires in one way or another that the underlying system satisfies some synchrony assumption. Considering an asynchronous message-passing system of n processes where (a) up to t <n/3 may commit Byzantine failures, and (b) each pair of processes is connected by two uni-directional channels (with possibly different timing properties), this paper investigates the synchrony assumption required to solve consensus, and presents a signature-free consensus algorithm whose synchrony requirement is the existence of a process that is an eventual (t+1)-bisource.

    Such a process p is a correct process that eventually has (a) timely input channels from t correct processes and (b) timely output channels to t correct processes (these input and output channels can connect p to different subsets of processes). As this syn-chrony condition was shown to be necessary and sufficient in the stronger asynchronous system model (a) enriched with message au-thentication, and (b) where the channels are bidirectional and have the same timing properties in both directions, it follows that it is also necessary and sufficient in the weaker system model considered in the paper. In addition to the fact that it closes a long-lasting problem related to Byzantine agreement, a noteworthy feature of the proposed algorithm lies in its design simplicity, which is a first-class property.

    Keywords: Adopt-commit, Asynchronous message-passing, Byzantine process, Consensus, Distributed algorithm, Eventual timely channel, Feasibility condition, Lower bound, Optimal resilience, Reliable broadcast, Signature-free algorithm, Synchrony assumption.

  • Stabilizing Server-Based Storage in Byzantine Asynchronous Message-Passing Systems,
    Silvia Bonomi, Shlomi Dolev, Maria Potop-Butucaru and Michel Raynal
    ACM PODC’15, Pages 472-479, July 21–23, 2015, Donostia-San Sebastián, Spain.
    [ bibtex | paper ]

    A stabilizing Byzantine single-writer single-reader (SWSR) regular register, which stabilizes after the first invoked write operation, is first presented. Then, new/old ordering inversions are eliminated by the use of a (bounded) sequence number for writes, obtaining a practically stabilizing SWSR atomic register. A practically stabilizing Byzantine single-writer multi-reader (SWMR) atomic register is then obtained by using several copies of SWSR atomic registers. Finally, bounded time-stamps, with a time-stamp per writer, together with SWMR atomic registers, are used to construct a practically stabilizing Byzantine multi-writer multi-reader (MWMR) atomic register.

    In a system of n servers implementing an atomic register, and in addition to transient failures, the constructions tolerate t<n/8 Byzantine servers if communication is asynchronous, and t<n/3 Byzantine servers if it is synchronous. The noteworthy feature of the proposed algorithms is that (to our knowledge) these are the first that build an atomic read/write storage on top of asynchronous servers prone to transient failures, and where up to t of them can be Byzantine.

    Keywords: Asynchronous message-passing system, Atomic read/write register, Byzantine server, Clients/servers architecture, Distributed algorithm, Fault-tolerance, Read/write register, Regular register, Self-stabilization, Transient failures.

  • Specifying Concurrent Problems: Beyond Linearizability and up to Tasks,
    Armando Castaneda, Sergio Rajsbaum and Michel Raynal
    29th International Symposium on Distributed Computing, Springer LNCS 9363, Pages 420-435, 2015.
    [ bibtex | paper ]

    Tasks and objects are two predominant ways of specifying distributed problems. A task specifies for each set of processes (which may run concurrently) the valid outputs of the processes. An object specifies the outputs the object may produce when it is accessed sequentially. Each one requires its own implementation notion, to tell when an execution satisfies the specification. For objects linearizability is commonly used, while for tasks implementation notions are less explored.

    Sequential specifications are very convenient, especially important is the locality property of linearizability, which states that linearizable objects compose for free into a linearizable object. However, most well-known tasks have no sequential specification. Also, tasks have no clear locality property.

    The paper introduces the notion of interval-sequential object. The corresponding implementation notion of interval-linearizability generalizes linearizability. Interval-linearizability allows to specify any task. However, there are sequential one-shot objects that cannot be expressed as tasks, under the simplest interpretation of a task. The paper also shows that a natural extension of the notion of a task is expressive enough to specify any interval-sequential object.

    Keywords: Concurrent object, Task, Linearizability, Sequential specification.

  • Modular Randomized Byzantine k-Set Agreement in Asynchronous Message-passing Systems,
    Achour Mostéfaoui, Hamouma Moumen and Michel Raynal
    ACM ICDCN'16, Singapore, January 2016.
    [ bibtex | paper ]

    k-Set agreement is a central problem of fault-tolerant distributed computing. Considering a set of n processes, where up to t may commit failures, let us assume that each process proposes a value. The problem consists in defining an algorithm such that each non-faulty process decides a value, at most k different values are decided, and the decided values satisfy some context-depending validity condition. Synchronous message-passing algorithms solving k-set agreement have been proposed for different failure models (mainly process crashes, and process Byzantine failures). Differently, k-set agreement cannot be solved in failure-prone asynchronous message-passing systems when t≥k. To circumvent this impossibility an asynchronous system must be enriched with additional computational power.

    Assuming t≥k, this paper presents a distributed algorithm that solves k-set agreement in an asynchronous message-passing system where up to t processes may commit Byzantine failures. To that end, each process is enriched with randomization power. While randomized k-set agreement algorithms exist for the asynchronous process crash failure model where t≥k, to our knowledge the proposed algorithm is the first that solves k-set agreement in the presence of up to t≥k Byzantine processes. Interestingly, this algorithm is signature-free, and ensures that no value proposed only by Byzantine processes can be decided by a non-faulty process. Its design is based on a modular construction which rests on a “no-duplicity” one-to-all broadcast abstraction, and two all-to-all communication abstractions.

    Keywords: Asynchronous system, Broadcast abstraction, Byzantine process, Coin, Distributed algorithm, k-Set agreement, Message-passing system, Randomized algorithm, Signature-free algorithm.

  • Grasping the Gap between Blocking and Non-Blocking Transactional Memories,
    Petr Kuznetsov and Srivatsan Ravi
    29th International Symposium on Distributed Computing, Springer LNCS 9363, Pages 232-247, 2015.
    [ bibtex | paper ]

    Transactional memory (TM) is an inherently optimistic synchronization abstraction: it allows concurrent processes to execute sequences of shared-data accesses (transactions) speculatively, with an option of aborting them in the future. Early TM designs avoided using locks and relied on non-blocking synchronization to ensure obstruction-freedom: a transaction that encounters no step contention is not allowed to abort. However, it was later observed that obstruction-free TMs perform poorly and, as a result, state-of-the-art TM implementations are nowadays blocking, allowing aborts because of data conicts rather than step contention.

    In this paper, we explain this shift in the TM practice theoretically, via complexity bounds. We prove a few important lower bounds on obstruction-free TMs. Then we present a lock-based TM implementation that beats all of these lower bounds. In sum, our results exhibit a considerable complexity gap between non-blocking and blocking TM implementations.

  • Inherent Limitations of Hybrid Transactional Memory,
    Dan Alistarh, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi and Nir Shavit
    29th International Symposium on Distributed Computing, Springer LNCS 9363, Pages 185-199, 2015.
    [ bibtex | paper ]

    Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the costs of providing concurrency between hardware and software transactions in HyTM are still not well understood.

    In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses. The model allows us to formally quantify and analyze the amount of overhead (instrumentation) caused by the potential presence of software transactions. We prove that (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for very weak progress guarantees, and (2) the instrumentation cost incurred by a hardware transaction in any progressive opaque HyTM may get linear in the transaction's data set. We further describe two implementations that, for two different progress conditions, exhibit optimal instrumentation costs. In sum, this paper captures for the first time an inherent trade-off between the degree of hardware-software TM concurrency and the amount of incurred instrumentation overhead.

  • Brief Announcement: On the Uncontended Complexity of Anonymous Consensus,
    Claire Capdevielle, Colette Johnen, Petr Kuznetsov and Alessia Milani
    29th International Symposium on Distributed Computing, Springer LNCS 9363, Pages 667-668, 2015.
    [ bibtex | paper ]
  • Brief Announcement: A Concurrency-Optimal List-Based Set,
    Vincent Gramoli, Petr Kuznetsov, Srivatsan Ravi and Di Shang
    29th International Symposium on Distributed Computing, Springer LNCS 9363, Pages 659-660, 2015.
    [ bibtex | paper ]

    Multicore applications require highly concurrent data structures. Yet, the very notion of concurrency is vaguely defined, to say the least. What is meant by a “highly concurrent” data structure implementing a given high-level object type? Generally speaking, one could compare the concurrency of algorithms by running a game where an adversary decides on the schedules of shared memory accesses from different processes. At the end of the game, the more schedules the algorithm would accept without hampering high-level correctness, the more concurrent it would be. The algorithm that accepts all correct schedules would then be considered concurrency-optimal.

  • The Weakest Failure Detector for Eventual Consistency,
    Swan Dubois, Rachid Guerraoui, Petr Kuznetsov, Franck Petit and Pierre Sens
    ACM PODC’15, Pages 375-384, July 21–23, 2015, Donostia-San Sebastián, Spain.
    [ bibtex | paper ]

    In its classical form, a consistent replicated service requires all replicas to witness the same evolution of the service state. Assuming a message-passing environment with a majority of correct processes, the necessary and sufficient information about failures for implementing a general state machine replication scheme ensuring consistency is captured by the Ω failure detector.

    This paper shows that in such a message-passing environment, Ω is also the weakest failure detector to implement an eventually consistent replicated service, where replicas are expected to agree on the evolution of the service state only after some (a priori unknown) time.

    In fact, we show that Ω is the weakest to implement eventual consistency in any message-passing environment, i.e., under any assumption on when and where failures might occur. Ensuring (strong) consistency in any environment requires, in addition to Ω, the quorum failure detector ∑. Our paper thus captures, for the first time, an exact computational difference between building a replicated state machine that ensures consistency and one that only ensures eventual consistency.

  • On the Space Complexity of Set Agreement,
    Carole Delporte-Gallet, Hugues Fauconnier and Petr Kuznetsov and Eric Ruppert
    ACM PODC’15, Pages 271-280, July 21–23, 2015, Donostia-San Sebastián, Spain.
    [ bibtex | paper ]

    The k-set agreement problem is a generalization of the classical consensus problem in which processes are permitted to output up to k different input values. In a system of n processes, an m-obstruction-free solution to the problem requires termination only in executions where the number of processes taking steps is eventually bounded by m. This family of progress conditions generalizes wait-freedom (m = n) and obstruction-freedom (m = 1). In this paper, we prove upper and lower bounds on the number of registers required to solve m-obstruction-free k-set agreement, considering both one-shot and repeated formulations. In particular, we show that repeated k set agreement can be solved using n + 2 m-k registers and establish a nearly matching lower bound of n + 2 m-k.

  • Progressive Transactional Memory in Time and Space,
    Petr Kuznetsov and Srivatsan Ravi
    13th International Conference on Parallel Computing Technologies, Pages 410-425, August 31 - September 4, 2015, Petrozavodsk, Russia.
    [ bibtex | paper ]

    Transactional memory (TM) allows concurrent processes to organize sequences of operations on shared data items into atomic transactions. A transaction may commit, in which case it appears to have executed sequentially or it may abort, in which case no data item is updated.

    The TM programming paradigm emerged as an alternative to conventional fine-grained locking techniques, offering ease of programming and compositionality. Though typically themselves implemented using locks, TMs hide the inherent issues of lock-based synchronization behind a nice transactional programming interface.

    In this paper, we explore inherent time and space complexity of lock-based TMs, with a focus of the most popular class of progressive lock-based TMs. We derive that a progressive TM might enforce a read-only transaction to perform a quadratic (in the number of the data items it reads) number of steps and access a linear number of distinct memory locations, closing the question of inherent cost of read validation in TMs. We then show that the total number of remote memory references (RMRs) that take place in an execution of a progressive TM in which n concurrent processes perform transactions on a single data item might reach Ω(nlog(n)), which appears to be the first RMR complexity lower bound for transactional memory.

    Keywords: Transactional memory, Mutual exclusion, Step complexity.

  • Topology of scrambled simplices,
    Dmitry N. Kozlov
    arXiv preprint arXiv:1609.00505, 2016.
    [ bibtex | paper ]

    In this paper we define a family of topological spaces, which vastly generalizes the higher-dimensional Dunce hats. Our definition is purely combinatorial, and is phrased in terms of identifications of boundary simplices of an n-simplex. By virtue of construction, the obtained spaces may be indexed by words, and they automatically carry the structure of a Δ-complex.

    As our main result, we completely determine the homotopy type of these spaces. In fact, somewhat surprisingly, we are able to prove that each of them is either contractible or homotopy equivalent to an odd-dimensional sphere. We develop the language to determine the homotopy type directly from the combinatorics of the indexing word.

    As added benefit of our investigation, we are able to emulate the Dunce hat phenomenon, and to obtain a large family of simplicial complexes, which are contractible, but not collapsible.

  • All binomial identities are orderable,
    Dmitry N. Kozlov
    European Journal of Combinatorics, Vol. 61, Pages 276-281, 2017. DOI:10.1016/j.ejc.2016.11.008
    [ bibtex | paper ]

    The main result of this paper is to show that all binomial identities are orderable. This is a natural statement in the combinatorial theory of finite sets, which can also be applied in distributed computing to derive new strong bounds on the round complexity of the weak symmetry breaking task.

    Furthermore, we introduce the notion of a fundamental binomial identity and find an infinite family of values, other than the prime powers, for which no fundamental binomial identity can exist.

  • Structure theory of flip graphs with applications to Weak Symmetry Breaking,
    Dmitry N. Kozlov
    arXiv preprint arXiv:1511.00457, 2015.
    [ bibtex | paper ]

    This paper is devoted to advancing the theoretical understanding of the iterated immediate snapshot (IIS) complexity of the Weak Symmetry Breaking task (WSB). Our rather unexpected main theorem states that there exist infinitely many values of n, such that WSB for n~processes is solvable by a~certain explicitly constructed 3-round IIS protocol. In particular, the minimal number of rounds, which an~IIS protocol needs in order to solve the WSB task, does not go to infinity, when the number of processes goes to infinity. Our methods can also be used to generate such values of~n. We phrase our proofs in combinatorial language, while avoiding using topology. To this end, we study a~certain class of graphs, which we call flip graphs. These graphs encode adjacency structure in certain subcomplexes of iterated standard chromatic subdivisions of a~simplex. While keeping the geometric background in mind for an additional intuition, we develop the structure theory of matchings in flip graphs in a~purely combinatorial way. Our bound for the IIS complexity is then a~corollary of this general theory.

    As an afterthought of our result, we suggest to change the overall paradigm. Specifically, we think, that the bounds on the IIS complexity of solving WSB for n processes should be formulated in terms of the size of the solutions of the associated Diophantine equation, rather than in terms of the value n itself.

  • Combinatorial topology of the standard chromatic subdivision and Weak Symmetry Breaking for 6 processes,
    Dmitry N. Kozlov
    arXiv preprint arXiv:1506.03944, 2015.
    [ bibtex | paper ]

    In this paper we study a family of discrete configuration spaces, the so-called protocol complexes, which are of utmost importance in theoretical distributed computing. Specifically, we consider questions of the existance of compliant binary labelings on the vertices of iterated standard chromatic subdivisions of an n-simplex. The existance of such labelings is equivalent to the existance of distributed protocols solving Weak Symmetry Breaking task in the standard computational model.

    As a part of our formal model, we introduce function sb(n), defined for natural numbers n, called the symmetry breaking function. From the geometric point of view sb(n) denotes the minimal number of iterations of the standard chromatic subdivision of an (n-1)-simplex, which is needed for the compliant binary labeling to exist. From the point of distributed computing, the function sb(n) measures the minimal number of rounds in a protocol solving the Weak Symmetry Breaking task.

    In addition to the development of combinatorial topology, which is applicable in a broader context, our main contribution is the proof of new bounds for the function sb(n). Accordingly, the bulk of the paper is taken up by in-depth analysis of the structure of adjacency graph on the set of n-simplices in iterated standard chromatic subdivision of an n-simplex. On the algorithmic side, we provide the first distributed protocol solving Weak Symmetry Breaking task in the layered immediate snapshot computational model for some number of processes.

    It is well known, that the smallest number of processes for which Weak Symmetry Breaking task is solvable is 6. Based on our analysis, we are able to find a very fast explicit protocol, solving the Weak Symmetry Breaking for 6 processes using only 3 rounds. Furthermore, we show that no protocol can solve Weak Symmetry Breaking in fewer than 2 rounds.

  • Topology of the immediate snapshot complexes,
    Dmitry N. Kozlov
    Topology and its Applications, Vol. 178, Pages 160-184, 2014. DOI:10.1016/j.topol.2014.08.032
    [ bibtex | paper ]

    The immediate snapshot complexes were introduced as combinatorial models for the protocol complexes in the context of theoretical distributed computing. In the previous work we have developed a formal language of witness structures in order to define and to analyze these complexes.

    In this paper, we study topology of immediate snapshot complexes. It was known that these complexes are always pure and that they are pseudomanifolds. Here we prove two further independent topological properties. First, we show that immediate snapshot complexes are collapsible. Second, we show that these complexes are homeomorphic to closed balls. Specifically, given any immediate snapshot complex $P(\tr)$, we show that there exists a homeomorphism $\varphi:\da^{|\supp\tr|-1}\ra P(\tr)$, such that φ(σ) is a subcomplex of $P(\tr)$, whenever σ is a simplex in the simplicial complex $\da^{|\supp\tr|-1}$.

    Keywords: Collapses, Distributed computing, Combinatorial algebraic topology, Immediate snapshot, Protocol complexes.

  • Witness structures and immediate snapshot complexes,
    Dmitry N. Kozlov
    arXiv preprint arXiv:1402.4707, 2014.
    [ bibtex | paper ]

    In this paper we consider a family of abstract simplicial complexes which we call immediate snapshot complexes. Their definition is motivated by theoretical distributed computing. Specifically, these complexes appear as protocol complexes in the general immediate snapshot execution model.

    In order to define and to analyze the immediate snapshot complexes we use the novel language of witness structures. We develop the rigorous mathematical theory of witness structures and use it to prove several combinatorial as well as topological properties of the immediate snapshot complexes. In particular, we prove that these complexes are simplicially homeomorphic to simplices.

  • Weak symmetry breaking and abstract simplex paths,
    Dmitry N. Kozlov
    Mathematical Structures in Computer Science, vol. 25, no 06, Pages 1432-1462, 2015.
    [ bibtex | paper ]

    Motivated by questions in theoretical distributed computing, we develop the combinatorial theory of abstract simplex path subdivisions. Our main application is a short and structural proof of the theorem of Castaneda and Rajsbaum. This theorem in turn implies the solvability of the weak symmetry breaking task in the immediate snapshot wait-free model in the case when the number of processes is not a power of a prime number.

  • Topology of the view complex,
    Dmitry N. Kozlov
    Homology, Homotopy and Applications, vol. 17, no 01, 2015.
    [ bibtex | paper ]

    In this paper we consider a family of simplicial complexes, which we call the view complexes. Our choice of objects of study is motivated by theoretical distributed computing, since the view complex is a key simplicial construction used for protocol complexes in the snapshot computational model. We show that the view complex $\view$ can be collapsed to the well-known complex χ(Δ^n), called standard chromatic subdivision of a simplex, and that χ(Δ^n) is itself collapsible. Furthermore, we show that the collapses can be performed simultaneously in entire orbits of the natural symmetric group action. Our results yield a purely combinatorial and constructive understanding of the topology of view complexes, at the same time as they enhance our knowledge about the standard chromatic subdivision of a simplex.

  • Solving k-Set Agreement Using Failure Detectors in Unknown Dynamic Networks,
    Denis Jeanneau, Thibault Rieutord, Luciana Arantes, and Pierre Sens
    IEEE Transactions on Parallel and Distributed Systems, 2016.
    [ bibtex | paper ]

    The failure detector abstraction has been used to solve agreement problems in asynchronous systems prone to crash failures, but so far it has mostly been used in static and complete networks. This paper aims to adapt existing failure detectors in order to solve agreement problems in unknown, dynamic systems. We are specifically interested in the k-set agreement problem. The problem of k-set agreement is a generalization of consensus where processes can decide up to k different values. Although some solutions to this problem have been proposed in dynamic networks, they rely on communication synchrony or make strong assumptions on the number of process failures.

    In this paper we consider unknown dynamic systems modeled using the formalism of Time-Varying Graphs, and extend the definition of the existing ΠΣx,y failure detector to obtain the ΠΣ⊥,x,y failure detector, which is sufficient to solve k-set agreement in our model. We then provide an implementation of this new failure detector using connectivity and message pattern assumptions. Finally, we present an algorithm using ΠΣ⊥⊥,x,y to solve k-set agreement.

    Keywords: Distributed systems, Dynamic networks, Failure detectors, k-Set agreement.