How to implement a Sync Engine for the Web

I implemented a sync engine for the web: reactive client local storage, web workers for background sync, web sockets, CRDTs, and server storage. This is how it all works.

Sandro Maglione

Users want apps that are fast, privacy-friendly, multi-device, and able to work offline. Developers want to reduce complexity while still providing a delightful user experience.

A sync engine is the key that unlocks both requirements ⚡️

I implemented a sync engine for the web, end-to-end from client(s) to server. This article is an overview of how a sync engine for the web works and how you can implement one for yourself.

By the end of the article, you will understand why and how all the components of the diagram below work together to create a superior user and developer experience.

At the end of the article you will understand how all the pieces fit together, and how you can implement your own sync engine for the web.

Store data on the client

Most of the complexity of client code is caused by network requests:

  • Handling all possible errors (missing connection, encoding/decoding requests, error responses)
  • Managing loading states
  • Building HTTP requests with the correct token, headers, and parameters
  • Handling asynchronous requests inside the UI

It's also where the user experience starts to degrade:

  • Long waiting times
  • Unclear error messages when something bad happens
  • No offline support
When the client reads and writes to local storage, both the user and developer experience become better and faster.

All these issues disappear when the client writes and reads data locally:

  • Fast (even synchronous)
  • Privacy-friendly
  • Offline by default
  • Persistent

Let's start from this simple idea: the client (UI) always writes and reads locally.

Local-only: the best developer experience

When the data is stored locally, it's possible to implement an "observable" that automatically re-renders the UI when data changes.

A live query provides data to a component and re-renders it when the requested data changes.

This reduces the responsibility of the UI to mutating data, just like a simple useState in React.

Local-only apps store data on some form of local storage and use live queries to automatically re-render the UI when data changes.

Some local storage options currently available on the web are IndexedDB, the Origin Private File System (OPFS), and SQLite compiled to WebAssembly.

With live queries there is no need for any store-based state management library (Redux, Jotai, Zustand).

You also don't need TanStack Query, since all the data is stored locally. With Dexie (a wrapper around IndexedDB), for example, a live query looks like this:

import { useLiveQuery } from "dexie-react-hooks";
import { db } from "./db";

export function FriendList() {
  // Automatically re-renders when data changes ⚡️
  const friends = useLiveQuery(() =>
    db.friends.where("age").between(50, 75).toArray()
  );

  return (
    <>
      <h2>Friends</h2>
      <ul>
        {friends?.map((friend) => (
          <li key={friend.id}>
            {friend.name}, {friend.age}
          </li>
        ))}
      </ul>
    </>
  );
}

Syncing data between clients

Storing data locally has many advantages, but one fundamental drawback: the data is trapped inside the user's device.

  • No long-term persistence
  • No collaboration with other users
  • No way to share data between multiple devices

The core requirement therefore becomes:

How to keep all the advantages of local data storage, while also allowing collaboration and multi-device support?

This is where you introduce a Sync Engine:

A sync engine synchronizes the data between clients while allowing each client to read and write locally.

Nothing changes from the perspective of the UI: data is still stored locally. A sync engine acts in the background to make sure the local data is in sync between clients.

The aim is to keep all the advantages of client-only, while also sharing data between clients:

  • Each client only cares about its own local data
  • The sync engine makes sure the local data is in sync between all clients

From the perspective of the UI code nothing changes when working client-only or with a sync engine.

The UI keeps mutating data and using live queries for listening to local data changes: fast, persistent, offline.

A sync engine works "in the background" to update the local data to include changes from other clients.

The server (remote storage) is connected with multiple clients. It collects all changes, resolves them, and syncs them with all clients.

Web Worker for syncing

Web Workers are ideal for keeping syncing independent of the UI:

A Web Worker allows running a script in a background thread on the web.

A syncing web worker performs 2 roles:

  1. Push: Listen for changes in local storage (using live queries) and push those changes to the server
  2. Pull: Listen for updates from the server and commit them to local storage

Pulling updates from the server is achieved by opening a WebSocket connection, so that the server can send changes as soon as they become available.

WebSocket makes it possible to open a two-way interactive communication session between the user's browser and a server.
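As a rough sketch of both roles (assuming Dexie for local storage, hypothetical `updates` and `snapshots` tables holding exported CRDT bytes, and a hypothetical `wss://example.com/sync` endpoint), the syncing worker could look like this:

// sync-worker.ts — minimal sketch of the syncing Web Worker
import { liveQuery } from "dexie";
import { db } from "./db"; // hypothetical Dexie database with `updates` and `snapshots` tables

const socket = new WebSocket("wss://example.com/sync"); // hypothetical sync endpoint

// Push: observe local changes (live query) and send them to the server
liveQuery(() => db.updates.toArray()).subscribe((updates) => {
  if (socket.readyState === WebSocket.OPEN && updates.length > 0) {
    socket.send(JSON.stringify(updates.map((u) => u.bytes))); // bytes as number[]
  }
});

// Pull: commit updates from the server into local storage
socket.addEventListener("message", async (event) => {
  const bytes: number[] = JSON.parse(event.data);
  await db.snapshots.put({ id: "workspace", bytes });
});

On the UI side the worker is created once, for example with new Worker(new URL("./sync-worker.ts", import.meta.url), { type: "module" }) when using Vite.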

A Web Worker works independently of the UI and in the background. Web Worker and UI share the same local storage and live query API. The Web Worker creates a socket connection with the server to exchange updates (sync engine).

Syncing updates on the server

The server is responsible for merging changes from multiple independent clients such that the final state is consistent between all clients (eventual consistency).

This objective is achieved using a CRDT (Conflict-free Replicated Data Type).

A CRDT has the following properties:

  1. Clients can update their local data independently, concurrently and without coordinating with other clients
  2. An algorithm (part of the data type) automatically resolves any inconsistencies
  3. Although local states may differ at any particular point in time, they are guaranteed to eventually converge

Libraries like Loro or Yjs implement CRDTs in TypeScript. Both can export a CRDT in binary format (Uint8Array) that can be sent over the network for syncing.

import { LoroDoc } from "loro-crdt";

/** Client 1️⃣ */
const doc1 = new LoroDoc();
doc1.getText("text").insert(0, "Hello world!");

const bytes: Uint8Array = doc1.export({ mode: "update" });


/**
 * 👆
 * Send `bytes` to another device (`Uint8Array`)
 * 👇
 */


/** Client 2️⃣ */
const doc2 = new LoroDoc();
doc2.getText("text").insert(0, "Hi!");

// CRDT algorithm to merge the changes
doc2.import(bytes);

I suggest using a library for resolving changes between clients. Implementing your own CRDT, or using another strategy like event sourcing, is where I found the most complexity.

In my implementation I used Loro. The rest of the article is based on the Loro API (LoroDoc).

Handle multiple formats

LoroDoc is the central data structure we use when applying mutations (insert/update/delete).

For persistence, LoroDoc can be converted to Uint8Array (by calling export). Since Uint8Array is serializable (as number[]), it can be stored in IndexedDB and also sent to the server.

On the UI we provide plain JSON values. This can be done by converting a LoroDoc to JSON using toJSON().
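A possible set of conversion helpers (a sketch, assuming the loro-crdt package) could look like this:

import { LoroDoc } from "loro-crdt";

// LoroDoc -> bytes as number[] (serializable: IndexedDB and network)
export function toBytes(doc: LoroDoc): number[] {
  return Array.from(doc.export({ mode: "snapshot" }));
}

// bytes -> LoroDoc (to apply mutations or merge changes)
export function fromBytes(bytes: number[]): LoroDoc {
  const doc = new LoroDoc();
  doc.import(new Uint8Array(bytes));
  return doc;
}

// LoroDoc -> plain JSON (what the UI reads through live queries)
export function toJson(doc: LoroDoc): unknown {
  return doc.toJSON();
}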

The sync engine converts between multiple formats based on their usage: JSON in the UI, LoroDoc for mutations, Uint8Array/number[] for storage and persistence.

The server stores the Loro CRDT in bytes (number[]). When it receives updates from a client, it can merge them by converting number[] to Uint8Array and Uint8Array to LoroDoc, and then importing the new changes into the current value from storage.

By merging changes with import from LoroDoc, we use the CRDT algorithm built into LoroDoc to make sure changes are consistent.
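For example, the server-side merge could be a small function like this (a sketch; loadSnapshot and saveSnapshot are hypothetical storage helpers working with number[]):

import { LoroDoc } from "loro-crdt";

declare function loadSnapshot(): Promise<number[] | null>; // hypothetical storage read
declare function saveSnapshot(bytes: number[]): Promise<void>; // hypothetical storage write

export async function mergeClientUpdate(clientBytes: number[]): Promise<number[]> {
  const doc = new LoroDoc();

  // Import the current value from storage (if any)
  const stored = await loadSnapshot();
  if (stored) doc.import(new Uint8Array(stored));

  // Import the client changes: the CRDT algorithm resolves conflicts
  doc.import(new Uint8Array(clientBytes));

  // Export the merged state: back to storage and out to other clients
  const merged = Array.from(doc.export({ mode: "snapshot" }));
  await saveSnapshot(merged);
  return merged;
}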

The client sends number[] bytes over the network. The server imports both the client changes and the current storage as LoroDoc, and then exports the merged document again as number[] when sending updates to other clients.

Live updates on the server

The server needs a similar push/pull mechanism as the client:

  • Pull: An API receives changes from a client, resolves the state using a CRDT (import), and writes it into storage (export)
  • Push: The socket connection listens for storage changes and wires them to each connected client

Just like in the client, the server also needs a "live query" mechanism that listens for changes in storage, to then send them live to other clients through the socket connection.

Server and clients are connected through a socket. A client pushes changes that are stored on the server. Another socket listens for changes in storage and wires them live to other clients.

Listening for live updates on a database like Postgres requires setting up Data Replication, which uses a Write-ahead Log (WAL):

WAL is a file that stores all changes made to the database, such as inserts, updates and deletes in sequence.

This requires creating a Publication (CREATE PUBLICATION), a Subscription (CREATE SUBSCRIPTION), and a Replication Slot.

A simpler solution is to store everything inside a file and use the Node.js watch API (see the sketch below).

"Storage" can be anything as long as it supports "live" updates.

Stream changes to the client

As we saw previously, the socket connection interacts with a Web Worker on the client.

The Web Worker keeps running in the background, waiting for updates.

The Web Worker receives a Stream of updates from the server. When it receives a new update from the server, it stores it in the client's local storage.

Since the changes come from a trusted server, the client can assume that the data is valid and therefore replace everything with the new value.

The Web Worker keeps running in the background, waiting for changes streaming from the socket. Each new change is stored inside local storage.

Offline mode

When a connection with the server cannot be established, the client can still update its local data.

We store local changes in a separate location inside storage to mark them as "waiting for sync" (e.g. another table inside IndexedDB).

Local changes are applied to the current client, but still unverified. The client can keep making changes even when offline.

When the device comes back online, all the local changes are synced to the server, which then responds with a validated snapshot.

At this point, the client can remove all local changes and instead rely on the data from the server.
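A sketch of this "pending changes" table with Dexie (table and field names are illustrative):

import Dexie, { type Table } from "dexie";

interface PendingUpdate {
  id?: number; // auto-incremented
  bytes: number[]; // Loro update exported as bytes
}

class SyncDatabase extends Dexie {
  pendingUpdates!: Table<PendingUpdate, number>;

  constructor() {
    super("sync");
    this.version(1).stores({ pendingUpdates: "++id" });
  }
}

const db = new SyncDatabase();

// Offline: queue every local change in the separate table
export async function queueUpdate(bytes: number[]) {
  await db.pendingUpdates.add({ bytes });
}

// Back online: push the queue, then drop it in favor of the
// validated snapshot returned by the server
export async function flushPendingUpdates(push: (all: number[][]) => Promise<void>) {
  const pending = await db.pendingUpdates.toArray();
  await push(pending.map((p) => p.bytes));
  await db.pendingUpdates.clear();
}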

Local changes are unverified and stored in a separate local storage location. Once the client syncs local changes successfully, it can remove them and instead rely on the data from the server.

Bootstrap

On initial load, the client can make a single API request to pull the latest changes from the server.

This initial loading process is called bootstrapping.

Bootstrapping is necessary to bring the client up to date with changes that happened since the last time the user opened the app.

Bootstrapping is necessary only on the initial load. After that, the socket connection will make sure to stream live changes.

For bootstrapping, a single GET request to an HTTP endpoint is enough.
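A sketch of the bootstrap request (the /bootstrap endpoint and its response shape are assumptions):

// Pull the latest snapshot from the server on initial load
export async function bootstrap(workspaceId: string): Promise<number[]> {
  const response = await fetch(`/bootstrap?workspaceId=${workspaceId}`);
  if (!response.ok) {
    throw new Error(`Bootstrap failed with status ${response.status}`);
  }
  // Snapshot encoded as number[] (CRDT bytes)
  return (await response.json()) as number[];
}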

Manual sync requests

The same bootstrap endpoint can be used to implement manual syncing requests (triggered by the user).

In my implementation, I create a new Web Worker on the initial load, which performs the initial bootstrap.

This new Web Worker is different from the Web Worker for the socket connection.

The user can then click on a button to trigger a manual sync. Clicking the button sends a message to the worker to perform another sync (HTTP request).
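A sketch of that message exchange (the worker file name and message shape are assumptions):

// ui.ts — create the bootstrap worker once and expose a manual sync trigger
const bootstrapWorker = new Worker(
  new URL("./bootstrap-worker.ts", import.meta.url),
  { type: "module" }
);

export function requestManualSync() {
  bootstrapWorker.postMessage({ type: "sync" });
}

// bootstrap-worker.ts — run the same GET request on every "sync" message
declare const workspaceId: string; // assumed to be known by the worker
self.addEventListener("message", async (event: MessageEvent<{ type: string }>) => {
  if (event.data.type === "sync") {
    await bootstrap(workspaceId); // same GET request used for the initial bootstrap
  }
});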

A Web Worker is responsible for the initial bootstrap and for triggering manual syncs. This consists of a GET request to the server.

These are all the components for a sync engine on the web. It all starts from a simple requirement:

The client (UI) always writes and reads locally.

The UI reads and writes with Local Storage, and doesn't care about other parts of the architecture.

A Web Worker acts in the background by creating a socket connection with the server. It sends and receives updates that are stored inside Local Storage.

On the server, state is resolved using a CRDT and stored inside Remote Storage.

Overview of the full architecture of the Sync Engine from client(s) to server. The UI interacts with local storage, which is also connected to a web worker. The web worker creates a socket connection with the server. The server resolves the state between multiple clients using a CRDT and stores it in its storage.

Sync engine implementation

The details of the sync engine implementation depend on your tech stack and requirements. Some general guidelines:

  • If you are using a frontend framework, it works best for it to be client-only. Frameworks like Next.js are not ideal because they include both client and server, while local-first apps are generally client-only
  • You can use any CRDT library you prefer, or even no CRDT at all (you may consider event sourcing instead)
  • You are not required to use any specific local or remote storage option

That's why the article focuses on the architecture of a sync engine, and not any specific implementation. You can choose libraries and technologies that best fit your project.

Here are the technologies I used for my own implementation: TanStack Router running on Vite (which supports Web Workers), Dexie for local storage (IndexedDB), Loro for the CRDT, and Effect Schema for the server-side table definition.

Other requirements

  • Migrations: by using Loro and storing data as bytes, in practice no data migration should be required. Loro will merge changes regardless of their schema
  • Authentication: In my implementation I organized the app into workspaces. You can join any workspace as long as you have access to its unique UUID. Below is the schema of the table stored on the server:
import { Schema } from "effect"; // or "@effect/schema" in older Effect versions

export class ServerTableMetadata extends Schema.Class<ServerTableMetadata>(
  "ServerTableMetadata"
)({
  serverId: Schema.UUID, // Unique id generated on the server
  clientId: Schema.UUID, // Used to deduplicate snapshots on the client

  workspaceId: Schema.UUID, // Auth with workspace unique id (generated by each client)
  ownerId: Schema.UUID, // Identify the user who made each change
  table: Table, // Different syncing for each table, to avoid huge payloads

  snapshot: Schema.Uint8Array, // CRDT encoded in bytes
}) {}
  • Encryption: Encryption requires adding a layer inside the syncing web worker that encrypts the data before wiring it to the server. The same should happen when the server sends data back to the client (end-to-end encryption, see the sketch after this list)
  • Querying: Since data is stored as bytes, making actual SQL queries on the server is not possible. However, this is by design. The server is meant for syncing, not as an API. Querying should instead be performed locally on the client by converting bytes to JSON
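As an illustration of the encryption layer, here is a sketch using the Web Crypto API (AES-GCM); key management is out of scope and assumed:

// Encrypt CRDT bytes inside the syncing worker before sending them to the server
export async function encryptBytes(key: CryptoKey, bytes: Uint8Array): Promise<number[]> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, bytes);
  // Prepend the IV so the receiver can decrypt
  return [...iv, ...new Uint8Array(ciphertext)];
}

// Decrypt the payload received from the server before storing it locally
export async function decryptBytes(key: CryptoKey, payload: number[]): Promise<Uint8Array> {
  const data = new Uint8Array(payload);
  const iv = data.slice(0, 12);
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv },
    key,
    data.slice(12)
  );
  return new Uint8Array(plaintext);
}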

Not all apps benefit from this architecture. Local-first works best for apps that deal with user private data, with no centralized security logic (e.g. not ideal for banks) and no shared global feeds (e.g. not ideal for social networks).