File synchronization

Mutagen’s file synchronization uses a novel algorithm that combines the performance of the rsync algorithm with bidirectionality and low-latency filesystem watching. It can be used (for example) to synchronize code between your laptop and a remote container in effectively real time, allowing you to edit code with your editor of choice and have it pushed to the remote environment almost instantly. Because it uses differential transfers, Mutagen’s synchronization also works effectively for transferring large binary files such as images or build artifacts.

Design and architecture

Mutagen’s synchronization sessions each operate between an arbitrary pair of endpoints, termed alpha and beta. The reason that these endpoints aren’t termed “source” and “destination” is that Mutagen has multiple synchronization modes, including bidirectional modes, where these terms don’t necessarily apply. Thus, the order of endpoints provided to the mutagen sync create command simply establishes these identities (with the first endpoint being alpha and the second being beta), whereas the roles of these endpoints are determined by the synchronization mode.

Synchronization sessions are extremely flexible, allowing both endpoints to be local, one to be local and one to be remote, or both to be remote (in which case the local system is simply used as a proxy and controller for synchronization). This flexible topology, combined with the myriad synchronization configuration options, allows users to create arbitrarily complex synchronization topologies.

Internally, the synchronization algorithm uses a three-way merge to reconcile changes from both endpoints in a safe and controlled fashion. This means that synchronization sessions track the most-recently agreed-upon content and use that information to detect the changes that each endpoint has made, as well as any conflicts. This algorithm operates in short cycles, with every filesystem change triggering a cycle. Each cycle consists of a scan of both endpoints, a reconciliation of endpoint contents, staging of updated contents from one endpoint to another, and application of changes. These cycles are designed to be extremely efficient, to the point of being imperceptibly fast for most content and changes.
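The reconciliation step of such a cycle can be illustrated with a minimal three-way merge sketch. This is not Mutagen’s actual implementation: it assumes that each endpoint scan has been flattened into a dictionary mapping paths to content digests (a missing key meaning "no content"), and the names `Snapshot` and `reconcile` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: path -> content digest. A missing key means "no content".
Snapshot = Dict[str, str]
Changes = Dict[str, Optional[str]]  # path -> new content (None means delete)


def reconcile(
    ancestor: Snapshot, alpha: Snapshot, beta: Snapshot
) -> Tuple[Changes, Changes, List[str]]:
    """Compare both endpoints against the most recently agreed-upon state.

    Returns (to_alpha, to_beta, conflicts): the changes each endpoint should
    receive, plus the paths that both endpoints changed in different ways.
    """
    to_alpha: Changes = {}
    to_beta: Changes = {}
    conflicts: List[str] = []
    for path in sorted(set(ancestor) | set(alpha) | set(beta)):
        base, a, b = ancestor.get(path), alpha.get(path), beta.get(path)
        if a == b:
            continue  # Endpoints already agree; nothing to transfer.
        if a == base:
            to_alpha[path] = b  # Only beta changed, so alpha receives it.
        elif b == base:
            to_beta[path] = a  # Only alpha changed, so beta receives it.
        else:
            conflicts.append(path)  # Both changed, and differently.
    return to_alpha, to_beta, conflicts


# Example inputs: alpha edited a.txt and c.txt; beta deleted b.txt,
# edited c.txt differently, and created d.txt.
ancestor = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
alpha = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v2"}
beta = {"a.txt": "v1", "c.txt": "v3", "d.txt": "v1"}
to_alpha, to_beta, conflicts = reconcile(ancestor, alpha, beta)
```

With these inputs, the edit to a.txt is staged for beta, the deletion of b.txt and the creation of d.txt are staged for alpha, and c.txt is reported as a conflict. How such a conflict is handled depends on the synchronization mode.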

Modes

Mutagen provides four different synchronization modes:

two-way-safe (Default): In this bidirectional synchronization mode, both endpoints are treated with equal precedence, and conflicts are only automatically resolved if they don’t result in data loss (for example, modifications on one endpoint can overwrite deletions of the corresponding content on the other endpoint). If conflicts can’t be automatically resolved, they are stored in the session state (and can be enumerated with the mutagen sync list command).
two-way-resolved: This is the same as two-way-safe, except that the alpha endpoint automatically wins all conflicts, including cases where alpha’s deletions would overwrite beta’s modifications or creations. No conflicts can occur in this synchronization mode.
one-way-safe: In this unidirectional synchronization mode, changes are only allowed to propagate from alpha to beta. Deletions on beta are overwritten by content from alpha (i.e. the content comes right back), but modifications and creations on beta can’t be overwritten by alpha (unless both endpoints have made the same modifications or created the same content). Conflicting contents on beta that can’t be overwritten will be recorded to the session state (and can be enumerated with the mutagen sync list command). Extra content on beta that doesn’t conflict with contents on alpha is simply ignored.
one-way-replica: In this unidirectional synchronization mode, beta becomes an exact replica of alpha. Any modifications or additional content on beta are instantly overwritten or removed, respectively. No conflicts can occur in this synchronization mode.

You can think of these modes as existing in a table:

	Safe	Auto-resolved
Bidirectional	`two-way-safe`	`two-way-resolved`
Unidirectional	`one-way-safe`	`one-way-replica`

“Safe” in this case means that conflicts are only automatically resolved if they don’t result in the loss of unsynchronized data. “Auto-resolved” means that the alpha endpoint wins any conflict, even if it involves deleting additional unsynchronized content from beta.
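The difference between the four modes can be sketched as a per-path decision function. This is an illustrative model, not Mutagen’s code: the function name `decide`, its string results, and the representation of content as an optional digest (with `None` meaning "no content") are all assumptions. In particular, treating modified content on beta as ignorable "extra content" whenever alpha has nothing at that path in one-way-safe mode is a reading of the description above, not a confirmed detail.

```python
from typing import Optional


def decide(mode: str, base: Optional[str], a: Optional[str], b: Optional[str]) -> str:
    """Decide what to do for one path.

    base is the last agreed-upon content, a is alpha's content, b is beta's.
    Returns "none", "alpha->beta", "beta->alpha", or "conflict".
    """
    if a == b:
        return "none"  # Endpoints agree already.
    alpha_changed = a != base
    beta_changed = b != base

    if mode in ("two-way-safe", "two-way-resolved"):
        if not beta_changed:
            return "alpha->beta"
        if not alpha_changed:
            return "beta->alpha"
        # Both endpoints changed the path in different ways.
        if mode == "two-way-resolved":
            return "alpha->beta"  # Alpha wins every conflict.
        # two-way-safe: a modification may overwrite a deletion.
        if a is None:
            return "beta->alpha"
        if b is None:
            return "alpha->beta"
        return "conflict"

    if mode == "one-way-replica":
        return "alpha->beta"  # Beta always becomes a copy of alpha.

    if mode == "one-way-safe":
        if not beta_changed or b is None:
            return "alpha->beta"  # Unmodified or deleted content is replaced.
        if a is None:
            return "none"  # Assumption: extra content on beta is ignored.
        return "conflict"  # Beta's modification or creation is protected.

    raise ValueError(f"unknown synchronization mode: {mode}")
```

For example, when alpha deletes a file that beta has modified, this sketch returns "beta->alpha" in two-way-safe mode (the modification survives) and "alpha->beta" in two-way-resolved mode (alpha’s deletion wins).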

These modes can be specified on a per-session basis by passing the -m/--sync-mode=<mode> flag to the mutagen sync create command. A default mode for all new sessions can be specified by including the following configuration in ~/.mutagen.yml:

sync:
  defaults:
    mode: "<mode>"

Conflict resolution

Conflicts (which can occur in two-way-safe and one-way-safe modes) can be resolved manually by deleting the content on the endpoint that you want to lose the conflict. Once that content is deleted, the conflict no longer exists, since deletions can be overwritten.
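For a local endpoint, the manual deletion described above can be scripted. The helper below is a hypothetical convenience, not part of Mutagen: `discard_losing_copy` and its parameters are assumed names, and it only handles an endpoint whose root is on the local filesystem. For a remote endpoint, the equivalent deletion would have to be performed on that system.

```python
import os
import shutil
from pathlib import Path


def discard_losing_copy(endpoint_root: str, relative_path: str) -> bool:
    """Delete conflicting content under a local synchronization root.

    Returns True if something was deleted and False if nothing existed at
    the path. Raises ValueError if the path is not strictly inside the root.
    """
    root = Path(os.path.abspath(endpoint_root))
    # Normalize lexically so that ".." components cannot escape the root.
    target = Path(os.path.normpath(root / relative_path))
    if root not in target.parents:
        raise ValueError(f"{relative_path!r} is not inside {endpoint_root!r}")

    if target.is_symlink() or target.is_file():
        target.unlink()  # Remove the file or the link itself.
    elif target.is_dir():
        shutil.rmtree(target)  # Remove a conflicting directory tree.
    else:
        return False
    return True
```

After the losing copy is removed, a later synchronization cycle should see a deletion on that endpoint, which the other endpoint’s content is allowed to overwrite.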

Endpoint URLs

Synchronization endpoint URLs are nothing more than pointers to local or remote filesystem locations. The exact format for synchronization endpoint URLs is transport-dependent, but each contains a path component identifying the filesystem location to be synchronized. Filesystem locations can be directory hierarchies or individual files; Mutagen can synchronize either.

Mutagen only applies a few restrictions to synchronization endpoints:

Symbolic links aren’t allowed as synchronization roots, though they can exist as part of the parent path of a synchronization root, and of course are allowed to exist inside of synchronization roots.
Synchronization of directory hierarchies that span filesystem boundaries is not allowed.

In both cases, Mutagen will detect and warn about the condition.
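To check a local directory against these two restrictions before creating a session, something like the following sketch could be used. It is an approximation and not how Mutagen itself performs detection: `check_sync_root` is a hypothetical helper, and comparing `st_dev` values to find filesystem boundaries is a POSIX convention that may not behave the same way on Windows.

```python
import os
from typing import List


def check_sync_root(root: str) -> List[str]:
    """Return a list of human-readable problems found for a local root."""
    problems: List[str] = []
    if os.path.islink(root):
        problems.append(f"{root}: synchronization root is a symbolic link")
        return problems

    # Device ID of the filesystem that holds the root itself.
    root_dev = os.lstat(root).st_dev
    if not os.path.isdir(root):
        return problems  # A single file cannot span filesystems.

    for dirpath, dirnames, _ in os.walk(root, followlinks=False):
        for name in list(dirnames):
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself and never follows symlinks.
            if os.lstat(path).st_dev != root_dev:
                problems.append(f"{path}: crosses a filesystem boundary")
                dirnames.remove(name)  # Do not descend into the other mount.
    return problems
```

An empty result means neither condition was found. Symbolic links in the parent path of the root or inside the root are deliberately not reported, since both are permitted.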

Configuration

Synchronization configuration is extensive, with configuration options controlling many aspects of session behavior.

Session management

Synchronization sessions are managed using the mutagen sync commands, namely create, list, monitor, flush, pause, resume, and terminate. Example usage for these commands can be found in the Getting started guide. The create command comes with a number of flags that control the configuration of the sessions that it creates, and the other synchronization session management commands all include flags that control their behavior. For more information about a particular command, use:

# Show help about a particular synchronization session management command.
mutagen sync <command> --help