tags : Human Computer Interaction (HCI), peer-to-peer, Alternative Internet, WebAssembly, Synchronization
I like how Kyle Mathews describes local-first software; most of this doc is extracted from his blog post:
“local-first” as shifting reads and writes to an embedded database in each client via “sync engines” that facilitate data exchange between clients and servers.
Useful for applications that need to be real-time, collaborative (multiplayer), or offline-capable, among other things.
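To make the definition concrete, here is a minimal sketch of "reads and writes go to an embedded database, a sync engine moves data later". Everything in it is an assumption for illustration: `LocalFirstClient`, the `todos`/`outbox` tables, and the plain dict standing in for a server are hypothetical and not taken from the blog post or any library.

```python
import json
import sqlite3


class LocalFirstClient:
    """Hypothetical client: all reads and writes hit an embedded SQLite database."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE todos (id TEXT PRIMARY KEY, title TEXT)")
        # Outbox of local changes that have not reached the server yet.
        self.db.execute(
            "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
        )

    def add_todo(self, todo_id, title):
        # The write commits locally first, so it works with no network at all.
        with self.db:
            self.db.execute("INSERT INTO todos VALUES (?, ?)", (todo_id, title))
            self.db.execute(
                "INSERT INTO outbox (payload) VALUES (?)",
                (json.dumps({"id": todo_id, "title": title}),),
            )

    def list_todos(self):
        # Reads never wait on the network either.
        return self.db.execute("SELECT id, title FROM todos ORDER BY id").fetchall()

    def sync(self, server_rows):
        """Stand-in for a sync engine: push queued changes, then clear the outbox."""
        rows = self.db.execute("SELECT seq, payload FROM outbox ORDER BY seq").fetchall()
        for _seq, payload in rows:
            record = json.loads(payload)
            server_rows[record["id"]] = record["title"]
        with self.db:
            self.db.execute("DELETE FROM outbox")
        return len(rows)
```

The point of the design is that `add_todo` and `list_todos` never touch the network; only `sync` does, and it can run whenever a connection happens to be available.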
Did people have to be online to collaborate online? Or could they work offline and collaborate peer-to-peer?
- nice ecosystem review: GitHub - arn4v/offline-first: A list of projects in the offline-first storage, sync & realtime collaboration/multiplayer space.
Related/Main Ideas
Sync Engines
See Synchronization
- Robust database-grade syncing technology to ensure that data is consistent and up-to-date.
- To fully replace client-server APIs, sync engines need
- Robust support for fine-grained access control
- Complex write validation
Issues
- Clients tend to have unrestricted write access and updates are immediately synced to other clients. While this is generally fine for text collaboration or multiplayer drawing, this wouldn’t work for a typical ecommerce or SaaS application.
- Sync engines can drive consistency within a system but real-world systems also need an authoritative server which can enforce consistency within external constraints and systems.
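A rough sketch of why an authoritative server matters for something like ecommerce: two clients optimistically buy the last item, and only the server can decide which write stands. `AuthoritativeServer`, `Client`, and the mutation shape are hypothetical names made up for this note.

```python
class AuthoritativeServer:
    """Hypothetical server that owns the stock count and validates every write."""

    def __init__(self, stock):
        self.stock = stock

    def apply(self, mutation):
        # Write validation: never let stock go negative, whatever clients believe.
        if mutation["op"] == "buy" and 0 < mutation["qty"] <= self.stock:
            self.stock -= mutation["qty"]
            return True
        return False


class Client:
    def __init__(self, confirmed_stock):
        self.confirmed_stock = confirmed_stock  # last state confirmed by the server
        self.pending = []  # optimistic writes awaiting a verdict

    @property
    def optimistic_stock(self):
        # What the UI shows: confirmed state with pending writes applied on top.
        return self.confirmed_stock - sum(m["qty"] for m in self.pending)

    def buy(self, qty):
        self.pending.append({"op": "buy", "qty": qty})

    def sync(self, server):
        # Send pending writes; the server's verdict replaces the optimistic guess.
        rejected = [m for m in self.pending if not server.apply(m)]
        self.pending = []
        self.confirmed_stock = server.stock
        return rejected
```

With one item in stock and two clients each calling `buy(1)`, the first `sync` is accepted and the second gets its mutation back as rejected, so the second client has to roll back its optimistic state.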
CRDT based
- See crdt
Distributed state machine / Replicated state machine (RSM) / State machine replication
- State machine replication - Wikipedia
- See Signals and Threads | State Machine Replication, and Why You Should Care
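- Related to Paxos (?): consensus protocols such as Paxos and Raft are a common way to build the ordered log that state machine replication relies on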
- Towards “Handle writes that need an authoritative server”
- By emulating API request/response patterns through: A distributed state machine running on a replicated object.
- i.e. we write interactions with external services so that requests/responses have the same multiplayer, offline, real-time sync properties as the rest of the app.
- This synchronization can be at the application, network or other levels of the stack
Two primary ideas that make synchronization easy
- A reliable, ordered message stream: every machine in the system sees the messages in the same order.
- A fully deterministic compute environment: the same inputs always result in the same outputs.
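The two ideas above can be sketched in a few lines: if every replica replays the same ordered log through a deterministic transition function, they all end up in the same state. The message format and the function names (`apply_message`, `replay`, `fingerprint`) are assumptions for illustration only.

```python
import hashlib
import json


def apply_message(state, message):
    """Deterministic transition: no clocks, randomness, or I/O."""
    new_state = dict(state)
    if message["type"] == "set":
        new_state[message["key"]] = message["value"]
    elif message["type"] == "delete":
        new_state.pop(message["key"], None)
    return new_state


def replay(log):
    """Fold the ordered message stream into a state, starting from empty."""
    state = {}
    for message in log:
        state = apply_message(state, message)
    return state


def fingerprint(state):
    # Canonical JSON so that equal states hash equally on every replica.
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


log = [
    {"type": "set", "key": "title", "value": "draft"},
    {"type": "set", "key": "owner", "value": "ana"},
    {"type": "set", "key": "title", "value": "final"},
    {"type": "delete", "key": "owner"},
]

replica_a = replay(log)
replica_b = replay(log)
```

Both replicas end with `{"title": "final"}` and matching fingerprints. Replaying the same messages in a different order can give a different result, which is why the ordered stream is the other half of the requirement.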
Partial replication
- Query-based sync to partially replicate
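A small sketch of what query-based partial replication could look like: the client subscribes to a query and only matching rows are copied into its local database. The `issues` table and the `sync_subset` helper are hypothetical and do not reflect how any particular sync engine implements it.

```python
import sqlite3

server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, project TEXT, title TEXT)")
server.executemany(
    "INSERT INTO issues VALUES (?, ?, ?)",
    [(1, "web", "Fix login"), (2, "ios", "Crash on launch"), (3, "web", "Dark mode")],
)


def sync_subset(server_db, client_db, project):
    """Copy only the rows matching the client's subscription query."""
    rows = server_db.execute(
        "SELECT id, project, title FROM issues WHERE project = ?", (project,)
    ).fetchall()
    with client_db:
        client_db.execute(
            "CREATE TABLE IF NOT EXISTS issues "
            "(id INTEGER PRIMARY KEY, project TEXT, title TEXT)"
        )
        # INSERT OR REPLACE makes repeated syncs idempotent for unchanged rows.
        client_db.executemany("INSERT OR REPLACE INTO issues VALUES (?, ?, ?)", rows)
    return len(rows)


client = sqlite3.connect(":memory:")
synced = sync_subset(server, client, "web")
```

The client ends up holding only the two "web" issues, which also bounds how much data it has to store locally.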
Basics
- Instead of always assuming that the server is the authoritative source, we assume that the user’s local device is the authoritative source of information
- The default consistency mode is eventual consistency
- This means that state and compute can naturally exist at the edge
- Only brought to the “center” when there is a need for strong consistency
Challenges
From Why SQLite? Why Now? 🐇 - Tantamanlands
- How much data can you store locally?
- How do you signal to the user that their local set of data could be incomplete from the perspective of other peers?
- How do we bless certain peers (or servers) as authoritative sources of certain sets of information?
- What CRDTs are right for which use cases?
Approaches
Replicated protocols
- This is what Replicache currently does: a client JS library along with a replication protocol.
Projects
- Services
- Replicache (Replicated protocol): sync engine is “some assembly required”
Replicated Data Structures
- Building block Data Structures
- Provide APIs similar to native Javascript maps and arrays
- Guarantee state updates are replicated to other clients and to the server.
- Most replicated data structures rely on crdt algorithms to merge concurrent and offline edits from multiple clients.
- Without a replicated data structure, we’d have to pass that info through websockets/requests/messaging services etc.
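As an illustration of a replicated data structure with a map-like API, here is a minimal last-writer-wins map, one of the simplest CRDT-style designs. `LWWMap` is a made-up class for this note, not the API of any listed project, and it leaves out deletes (which would need tombstones).

```python
class LWWMap:
    """Last-writer-wins map: each key keeps the value with the highest (clock, replica_id)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.clock = 0
        self.entries = {}  # key -> ((clock, replica_id), value)

    def __setitem__(self, key, value):
        # Every local write gets a fresh, strictly increasing stamp.
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica_id), value)

    def __getitem__(self, key):
        return self.entries[key][1]

    def merge(self, other):
        # Keep whichever entry carries the larger stamp; ties break on replica_id.
        for key, (stamp, value) in other.entries.items():
            if key not in self.entries or stamp > self.entries[key][0]:
                self.entries[key] = (stamp, value)
        # Lamport rule: move our clock past everything we have seen.
        self.clock = max(self.clock, other.clock)

    def to_dict(self):
        return {key: value for key, (_stamp, value) in self.entries.items()}
```

Two replicas can write to the same key while offline and then call `merge` in either direction; because the winner is chosen by comparing stamps rather than by arrival order, both converge on the same contents.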
Projects
- OSS
- Services: Liveblocks, Partykit, Triplit, Ditto etc.
Replicated Database
Write to your database while offline. I can write to mine while offline. We can then both come online and merge our databases together, without conflict. See Data Replication. Also see Riffle Systems
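One way to picture "merge our databases together, without conflict" is a row-level merge where each row carries a version and the site that wrote it, and the higher pair wins. This is only a sketch under that assumption; the `notes` schema and helper names are hypothetical, it is not how any of the projects below work internally, and it relies on SQLite's upsert syntax (SQLite 3.24 or newer).

```python
import sqlite3

SCHEMA = """CREATE TABLE notes (
    id TEXT PRIMARY KEY, body TEXT, version INTEGER, site TEXT)"""

# Upsert that only overwrites when the incoming row is newer (ties break on site).
MERGE = """INSERT INTO notes (id, body, version, site) VALUES (?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    body = excluded.body, version = excluded.version, site = excluded.site
WHERE excluded.version > notes.version
   OR (excluded.version = notes.version AND excluded.site > notes.site)"""


def open_db(rows):
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO notes VALUES (?, ?, ?, ?)", rows)
    return db


def merge_into(target, source):
    """Apply every row of `source` to `target` using the newer-wins upsert."""
    rows = source.execute("SELECT id, body, version, site FROM notes").fetchall()
    with target:
        target.executemany(MERGE, rows)


def dump(db):
    return db.execute(
        "SELECT id, body, version, site FROM notes ORDER BY id"
    ).fetchall()


mine = open_db([("n1", "groceries: milk", 2, "me"), ("n2", "call mom", 1, "me")])
yours = open_db([("n1", "groceries: eggs", 3, "you"), ("n3", "book flights", 1, "you")])

merge_into(mine, yours)
merge_into(yours, mine)
```

After merging in both directions the two databases hold the same three rows, with the version 3 edit of `n1` winning. Whole-row last-writer-wins silently drops the losing edit, so a real system would likely track versions per column or use richer CRDTs.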
Example databases
- Postgres-SQLite
Write to PostgreSQL and replicate to a client-side db such as SQLite
- ElectricSQL (write back, partial replication)
- PowerSync (write back, partial replication)
- PowerSync supports syncing from multiple databases.
- sqledge (read-only? from the creators of Ably)
- SQLite-SQLite
- cr-sqlite
- https://github.com/orbitinghail/sqlsync
- Only supports full db sync (no partial replication)
- Sync engine is simpler
- Provides a custom storage layer to SQLite that keeps everything in sync.
- Mycelial
Synchronization with PostgreSQL
- Postgres sequences can commit out-of-order
- Don’t these (transaction IDs) wrap around after 2 billion transactions? How do you handle that?
- xmin does (it is a 32-bit transaction ID); the snapshot one is u64, so you are good.
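A toy simulation of the two pitfalls noted above, with no real Postgres involved: a cursor-based poller misses a row whose sequence value was drawn early but committed late, and a 32-bit counter stops being monotonic once it wraps. The `poll` helper and the simplified wraparound arithmetic are assumptions for illustration.

```python
def poll(committed_ids, cursor):
    """Naive incremental sync: return ids above the cursor and the new cursor."""
    new_ids = sorted(i for i in committed_ids if i > cursor)
    return new_ids, (new_ids[-1] if new_ids else cursor)


committed = set()

# Transactions A and B draw ids 1 and 2 from a sequence, but B commits first.
committed.add(2)
first_batch, cursor = poll(committed, 0)

# A commits afterwards with the smaller id; the cursor has already moved past it.
committed.add(1)
second_batch, cursor = poll(committed, cursor)

# Simplified 32-bit counter: the value after the maximum wraps to a small number,
# so "newer means larger" stops holding. A 64-bit counter keeps growing.
MAX_32 = 2**32 - 1
after_wrap_32 = (MAX_32 + 1) % 2**32
after_max_64 = MAX_32 + 1
```

Here `first_batch` is `[2]` and `second_batch` is empty, so row 1 is never delivered to the client. That is the reason a sync cursor needs something that respects commit visibility (such as the 64-bit snapshot values mentioned above) and not just a raw sequence value.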
Schema Evolution
Examples / Uncategorized
These are basically approaches that I’ve yet to go through and categorize further
- ServerFree Architecture: run the “backend code” and the DB (SQLite) in the browser | Lobsters
- What Happens When You Put a Database in Your Browser?
- Resilient Sync for Local First | Hacker News 🌟
- Similar to Delta Lake’s consistency model
War stories
HN Comment 1
- Initially we tried to use IndexedDB to give us more of the local-first features by caching between loads, but it was more hassle than it was worth.
- Instead we settled on live queries using Hasura (we were a very early user / paying customer). We preload all the data that the user is going to need on app boot and then selectively load large chunks as they open their projects. These are then keeping mobx models up to date.
- For mutating data we have a simple transaction system that you wrap around models to update them. It records and sends the mutations and makes sure that outstanding mutations are replayed over model changes locally.
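The transaction system in that comment could look roughly like this: keep the last server snapshot, keep a list of outstanding mutations, and rebuild the visible state by replaying the mutations over whatever the server sends. `OptimisticStore`, `rename`, and `add_tag` are hypothetical names; the commenter's real implementation is not shown in the source.

```python
class OptimisticStore:
    """Hypothetical store that replays unacknowledged mutations over server state."""

    def __init__(self, snapshot):
        self.server_snapshot = dict(snapshot)
        self.outstanding = []  # (mutation_id, fn) pairs not yet acknowledged

    def mutate(self, mutation_id, fn):
        # Record the mutation; in a real app it would also be sent to the server.
        self.outstanding.append((mutation_id, fn))

    @property
    def view(self):
        # Visible state = latest server snapshot + outstanding mutations replayed.
        state = dict(self.server_snapshot)
        for _mutation_id, fn in self.outstanding:
            fn(state)
        return state

    def on_server_update(self, snapshot, acked_ids):
        # Drop acknowledged mutations; the rest get replayed over the new snapshot.
        self.server_snapshot = dict(snapshot)
        self.outstanding = [
            (mid, fn) for mid, fn in self.outstanding if mid not in acked_ids
        ]


def rename(title):
    def apply(state):
        state["title"] = title
    return apply


def add_tag(tag):
    def apply(state):
        # Build a new list so the stored server snapshot is never mutated.
        state["tags"] = state.get("tags", []) + [tag]
    return apply
```

If the server pushes a snapshot containing someone else's tag while the local `add_tag` is still outstanding, the view shows both tags, because the local mutation is re-applied on top of the new server state.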
Others
- Offline support just kinda happened for free. Once I added a service worker to serve the app code offline, Automerge can just persist writes to local IndexedDB and then sync when network is back again, not a big deal. Classic local-first win