tags : Distributed Systems, Task Queue, Message Queue, Data Replication

Guarantees

See The Dodgy State of Delivery Guarantees - YouTube

at-most-once

Send request, don’t retry, update may not happen

at-least-once

Retry request until acknowledged, may repeat update
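
A minimal sketch of the retry loop, assuming a hypothetical `FlakyReceiver` that applies every request but loses the first few acknowledgements. The class, the function name, and `max_attempts` are made up for illustration.

```python
class FlakyReceiver:
    """Hypothetical receiver: applies every request, but drops the first N acks."""

    def __init__(self, drop_acks):
        self.applied = 0              # how many times the update was applied
        self._drop_acks = drop_acks   # acks still to be "lost"

    def handle(self, msg):
        self.applied += 1             # the update happens on every delivery
        if self._drop_acks > 0:
            self._drop_acks -= 1
            return False              # ack lost on the way back
        return True                   # ack received by the sender


def send_at_least_once(receiver, msg, max_attempts=10):
    """Retry until acknowledged; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if receiver.handle(msg):
            return attempt
    raise RuntimeError("no acknowledgement after %d attempts" % max_attempts)


receiver = FlakyReceiver(drop_acks=2)
attempts = send_at_least_once(receiver, "increment")
```

Here the sender only sees an ack on the third attempt, so the receiver has applied the same update three times. That is the "may repeat update" part.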

exactly-once

  • exactly-once = at-least-once + Idempotence
  • exactly-once = (retry + idempotence) or deduplication
  • It’s similar to how TCP gives reliable delivery on top of IP: you implement a control protocol on top of an unreliable underlying protocol.
  • “You are talking about distributed systems, nobody expects to read “exactly once” delivery.”
  • “In my experience, the purist version of “exactly-once” exists as a vague, wishy-washy mental model in the brains of developers who have never thought hard about this stuff”
  • “Within the context of a distributed system, you cannot have exactly-once message delivery.”
  • In practice it’s achieved by sort of faking it:
    • idempotency: the messages themselves are idempotent, meaning they can be applied more than once without adverse effects.
    • de-duplication: or we remove the need for idempotency by deduplicating messages in the queue.
    • transactional processing: if processing the message and updating its status back in the queue are done in the same transaction (which can be rolled back), that can be a way too, I think. The transaction can be distributed or not.
  • Even if something offers this guarantee, it’s better to write code in an idempotent manner.
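
A rough sketch of "at-least-once + idempotence", assuming every message carries a unique ID that the consumer remembers. The `Account` class and its in-memory `_seen` set are hypothetical; a real consumer would need to persist the seen IDs together with the state they protect.

```python
class Account:
    """Hypothetical consumer whose update is made idempotent via message IDs."""

    def __init__(self):
        self.balance = 0
        self._seen = set()   # IDs of messages already applied

    def apply(self, msg_id, amount):
        """Apply a credit once per msg_id; redeliveries are no-ops."""
        if msg_id in self._seen:
            return False     # duplicate delivery, state untouched
        self._seen.add(msg_id)
        self.balance += amount
        return True


account = Account()
# The same message delivered three times by an at-least-once sender.
results = [account.apply("m1", 10) for _ in range(3)]
```

The message is still delivered three times; only its effect happens once, which is the sense in which "exactly-once" is usually meant.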

De-Duplication

  • SQS FIFO queues have a 5-minute deduplication interval. Each message has a deduplication ID: either the producer supplies a MessageDeduplicationId, or, with content-based deduplication enabled, SQS uses a SHA-256 hash of the message body. A message whose ID was already seen within the last 5 minutes is treated as a duplicate and not delivered again. This is probably how AWS SQS is able to offer its exactly-once processing guarantee.
    • For such a de-duplication service to be actually viable, it needs some kind of availability guarantee. With some kind of quorum this is possible. (See Consensus Protocols)
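
To make the hash-and-lookup idea concrete, here is a single-process sketch of a time-windowed de-duplicator. It is not how SQS is implemented (that is not public, and a real service would replicate this state); `DedupWindow` and its parameters are assumptions for illustration, and the clock is passed in explicitly to keep the example deterministic.

```python
import hashlib


class DedupWindow:
    """Hypothetical in-memory de-duplicator keyed by a hash of the message body."""

    def __init__(self, window_seconds=300.0):
        self.window = window_seconds
        self._seen = {}   # digest -> time the message was first accepted

    def accept(self, body: bytes, now: float) -> bool:
        """Return True if the message is new within the window, else False."""
        # Forget hashes that have aged out of the window.
        self._seen = {d: t for d, t in self._seen.items() if now - t < self.window}
        digest = hashlib.sha256(body).hexdigest()
        if digest in self._seen:
            return False          # duplicate within the window
        self._seen[digest] = now
        return True


dedup = DedupWindow(window_seconds=300.0)
first = dedup.accept(b"order-42", now=0.0)       # new message
repeat = dedup.accept(b"order-42", now=100.0)    # same body, inside the window
other = dedup.accept(b"order-43", now=100.0)     # different body
late = dedup.accept(b"order-42", now=301.0)      # same body, window has expired
```

Note the consequence of a fixed window: a retry that arrives after the window has expired is accepted again, so consumers still benefit from being idempotent.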

Challenges

Ordering

Idempotency

Data Delivery for Message/Task queues

Some comments from one of my queue implementations in Postgres (pg) that I did for a client.

This was inspired by ideas from River (the Postgres-backed job queue), but we did not have the requirement of transactional enqueueing, as this was part of a data pipeline and not user-facing.

Queuing behavior

  • Queueing is not idempotent: you’re allowed to queue the same item again and again, and each call creates a duplicate, which mutates the state of the queue.
  • We could add de-duplication via some identity hash etc., but it’s probably not worth the effort at the moment.
  • The function that pushes items into the queue is responsible for checking whether the items need to be queued or not. Once an item is in the queue, it will be processed.
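
For reference, the identity-hash idea mentioned above could look roughly like this. This is a sketch and not the client implementation: it uses SQLite from the Python standard library as a stand-in for Postgres, and the table layout, column names, and `enqueue` helper are all assumptions. In Postgres the equivalent of `INSERT OR IGNORE` would be `INSERT ... ON CONFLICT DO NOTHING`.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE queue (
        id       INTEGER PRIMARY KEY,
        identity TEXT NOT NULL UNIQUE,   -- hash of the payload
        payload  TEXT NOT NULL,
        status   TEXT NOT NULL DEFAULT 'pending'
    )
    """
)


def enqueue(conn, payload: dict) -> bool:
    """Insert the item unless an identical payload is already queued."""
    body = json.dumps(payload, sort_keys=True)   # stable serialization
    identity = hashlib.sha256(body.encode("utf-8")).hexdigest()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO queue (identity, payload) VALUES (?, ?)",
            (identity, body),
        )
    return cur.rowcount == 1   # 0 means the unique constraint ignored it


queued_first = enqueue(conn, {"file": "a.csv"})
queued_again = enqueue(conn, {"file": "a.csv"})
queue_size = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

With the unique constraint in place, enqueueing becomes idempotent for identical payloads, at the cost of never being able to queue the same payload twice on purpose.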

Processing guarantees

  • Processing of items from the queue is guaranteed at-least-once execution.
  • There is a possibility of re-runs, so processing of a message should be idempotent.
  • Updating the state/status of the message can be done in either:
    • A: a different db transaction from the processing (sometimes this is unavoidable)
    • B: the same db transaction as the processing (ensures that re-runs always start from a fresh state)
    • Either way, this should not be a problem if the entire processing of a message is idempotent.
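
Option B can be sketched as follows, again with SQLite standing in for Postgres and with hypothetical table names and a hypothetical `process_one` helper. The work (a row in `results`) and the status update share one transaction, so a failure in between rolls both back. In Postgres, concurrent workers would typically claim the row with `SELECT ... FOR UPDATE SKIP LOCKED`, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE queue (
        id      INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending'
    );
    CREATE TABLE results (item_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    INSERT INTO queue (payload) VALUES ('hello');
    """
)


def process_one(conn, fail=False):
    """Process the oldest pending item; work and status update commit together."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            row = conn.execute(
                "SELECT id, payload FROM queue "
                "WHERE status = 'pending' ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return False
            item_id, payload = row
            conn.execute(
                "INSERT INTO results (item_id, value) VALUES (?, ?)",
                (item_id, payload.upper()),
            )
            if fail:
                raise RuntimeError("simulated crash mid-processing")
            conn.execute("UPDATE queue SET status = 'done' WHERE id = ?", (item_id,))
        return True
    except RuntimeError:
        return False   # nothing was committed; the item stays pending


crashed = process_one(conn, fail=True)
results_after_crash = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
rerun = process_one(conn)
final_status = conn.execute("SELECT status FROM queue WHERE id = 1").fetchone()[0]
```

After the simulated crash the partial result is gone and the item is still pending, so the re-run starts from a fresh state. This only covers side effects inside the database; anything done outside it (API calls, file writes) still needs to be idempotent.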