File synchronization
Mutagen’s file synchronization uses a novel algorithm that combines the performance of the rsync algorithm with bidirectionality and low-latency filesystem watching. It can be used (for example) to synchronize code between your laptop and a remote container in effectively real time, allowing you to edit code with your editor of choice and have it pushed to the remote environment almost instantly. Because it uses differential transfers, Mutagen’s synchronization also works well for transferring large binary files such as images or build artifacts.
Design and architecture
Mutagen’s synchronization sessions each operate between an arbitrary pair of
endpoints, termed alpha and beta. The reason that these endpoints aren’t
termed “source” and “destination” is that Mutagen has multiple synchronization
modes, including bidirectional modes, where these terms don’t necessarily apply.
Thus, the order of endpoints provided to the mutagen sync create command
simply establishes these identities (with the first endpoint being alpha and
the second being beta), whereas the roles of these endpoints are determined by
the synchronization mode.
Synchronization sessions are extremely flexible, allowing both endpoints to be local, one to be local and one to be remote, or both to be remote (in which case the local system is simply used as a proxy and controller for synchronization). This flexible topology, combined with the myriad synchronization configuration options, allows users to create arbitrarily complex synchronization topologies.
Internally, the synchronization algorithm uses a three-way merge to reconcile changes from both endpoints in a safe and controlled fashion. This means that synchronization sessions track the most-recently agreed-upon content and use that information to detect the changes that each endpoint has made, as well as any conflicts. This algorithm operates in short cycles, with every filesystem change triggering a cycle. Each cycle consists of a scan of both endpoints, a reconciliation of endpoint contents, staging of updated contents from one endpoint to another, and application of changes. These cycles are designed to be extremely efficient, to the point of being imperceptibly fast for most content and changes.
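The reconciliation step of such a cycle can be illustrated with a small sketch. This is not Mutagen's implementation: it assumes each endpoint is reduced to a flat mapping from path to a content digest, and the names `reconcile`, `Snapshot`, and `Changes` are hypothetical. It only shows how comparing both endpoints against the most recently agreed-upon content (the ancestor) separates one-sided changes from conflicts.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified model: path -> content digest.
# A path that is absent from a snapshot has no content on that endpoint.
Snapshot = Dict[str, str]
# path -> new digest to apply, or None to delete the path.
Changes = Dict[str, Optional[str]]


def reconcile(
    ancestor: Snapshot, alpha: Snapshot, beta: Snapshot
) -> Tuple[Changes, Changes, List[str], Snapshot]:
    """Three-way merge of two snapshots against their common ancestor.

    Returns the changes to apply to alpha, the changes to apply to beta,
    the conflicting paths, and the updated ancestor.
    """
    to_alpha: Changes = {}
    to_beta: Changes = {}
    conflicts: List[str] = []
    new_ancestor: Snapshot = {}

    for path in sorted(set(ancestor) | set(alpha) | set(beta)):
        old, a, b = ancestor.get(path), alpha.get(path), beta.get(path)
        if a == b:
            # Both endpoints already agree (possibly on absence).
            agreed = a
        elif a == old:
            # Only beta changed: propagate its content to alpha.
            to_alpha[path] = b
            agreed = b
        elif b == old:
            # Only alpha changed: propagate its content to beta.
            to_beta[path] = a
            agreed = a
        else:
            # Both changed in different ways: record a conflict and keep
            # the previously agreed-upon content as the ancestor.
            conflicts.append(path)
            agreed = old
        if agreed is not None:
            new_ancestor[path] = agreed

    return to_alpha, to_beta, conflicts, new_ancestor
```

In this sketch a deletion is just a change to "no content", so it propagates like any other one-sided change. How conflicts are then handled depends on the synchronization mode.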
Modes
Mutagen provides four different synchronization modes:
- two-way-safe (default): In this bidirectional synchronization mode, both endpoints are treated with equal precedence, and conflicts are only automatically resolved if they don’t result in data loss (for example, modifications on one endpoint can overwrite deletions of the corresponding content on the other endpoint). If conflicts can’t be automatically resolved, they are stored in the session state (and can be enumerated with the mutagen sync list command).
- two-way-resolved: This is the same as two-way-safe, except that the alpha endpoint automatically wins all conflicts, including cases where alpha’s deletions would overwrite beta’s modifications or creations. No conflicts can occur in this synchronization mode.
- one-way-safe: In this unidirectional synchronization mode, changes are only allowed to propagate from alpha to beta. Deletions on beta are overwritten by content from alpha (i.e. the content comes right back), but modifications and creations on beta can’t be overwritten by alpha (unless both endpoints have made the same modifications or created the same content). Conflicting contents on beta that can’t be overwritten will be recorded to the session state (and can be enumerated with the mutagen sync list command). Extra content on beta that doesn’t conflict with contents on alpha is simply ignored.
- one-way-replica: In this unidirectional synchronization mode, beta becomes an exact replica of alpha. Any modifications or additional content on beta are instantly overwritten or removed, respectively. No conflicts can occur in this synchronization mode.
You can think of these modes as existing in a table:
| | Safe | Auto-resolved |
|---|---|---|
| Bidirectional | two-way-safe | two-way-resolved |
| Unidirectional | one-way-safe | one-way-replica |
“Safe” in this case means that conflicts are only automatically resolved if they don’t result in the loss of unsynchronized data. “Auto-resolved” means that the alpha endpoint wins any conflict, even if it involves deleting additional unsynchronized content from beta.
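The difference between the four modes can be made concrete with a per-path decision function. This is a sketch based only on the descriptions above, not on Mutagen's source: the function `decide`, its return labels, and the digest-or-`None` model of content are all hypothetical. One point is an assumption on my part: in one-way-safe mode, content that beta changed is treated as a conflict when alpha has content at the same path and as ignored extra content when alpha has none.

```python
from typing import Optional

MODES = ("two-way-safe", "two-way-resolved", "one-way-safe", "one-way-replica")


def decide(mode: str, old: Optional[str], a: Optional[str], b: Optional[str]) -> str:
    """Decide what happens to one path.

    old, a, b are the content digests on the ancestor, alpha, and beta
    (None means the path does not exist there). Returns one of "in-sync",
    "alpha-to-beta", "beta-to-alpha", "conflict", or "ignored".
    """
    if mode not in MODES:
        raise ValueError(f"unknown synchronization mode: {mode}")
    if a == b:
        return "in-sync"
    if mode == "one-way-replica":
        # Beta always becomes an exact copy of alpha.
        return "alpha-to-beta"

    alpha_changed = a != old
    beta_changed = b != old

    if mode in ("two-way-safe", "two-way-resolved"):
        if not alpha_changed:
            return "beta-to-alpha"
        if not beta_changed:
            return "alpha-to-beta"
        # Both endpoints changed the path in different ways.
        if mode == "two-way-resolved":
            return "alpha-to-beta"  # alpha wins every conflict
        # two-way-safe: content may overwrite a deletion, since nothing is lost.
        if a is None:
            return "beta-to-alpha"
        if b is None:
            return "alpha-to-beta"
        return "conflict"

    # one-way-safe: changes only flow from alpha to beta.
    if not beta_changed or b is None:
        # Beta is untouched, or beta deleted content that alpha still has.
        return "alpha-to-beta"
    if a is None:
        return "ignored"  # extra content on beta (assumption, see lead-in)
    return "conflict"
```

For example, `decide("two-way-safe", "1", "2", "3")` reports a conflict, while the same inputs under `two-way-resolved` let alpha's content overwrite beta's.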
These modes can be specified on a per-session basis by passing the
-m/--sync-mode=<mode> flag to the mutagen sync create command. A default mode
for new sessions can be set by including the following configuration in
~/.mutagen.yml:
    sync:
      defaults:
        mode: "<mode>"
Conflict resolution
Conflicts (which can occur in two-way-safe and one-way-safe modes) can be
resolved manually by deleting the content on the endpoint that you want to
lose the conflict. Once that content is deleted, the conflict no longer exists,
since deletions can be overwritten.
Endpoint URLs
Synchronization endpoint URLs are nothing more than pointers to local or remote filesystem locations. The exact format for synchronization endpoint URLs is transport-dependent, but each contains a path component identifying the filesystem location to be synchronized. Filesystem locations can be directory hierarchies or individual files; Mutagen can synchronize either.
Mutagen only applies a few restrictions to synchronization endpoints:
- Symbolic links aren’t allowed as synchronization roots, though they can exist as part of the parent path of a synchronization root, and of course are allowed to exist inside of synchronization roots.
- Synchronization of directory hierarchies that span filesystem boundaries is not allowed.
In both cases, Mutagen will detect and warn about the condition.
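To make the two restrictions concrete, the following sketch checks a local path for both conditions before it is used as a synchronization root. It is an illustration only and not how Mutagen performs its own detection: the function name `check_sync_root` and its messages are hypothetical, and comparing `st_dev` values is a POSIX-oriented way of spotting a filesystem boundary that may behave differently on Windows.

```python
import os
from typing import List


def check_sync_root(root: str) -> List[str]:
    """Return a list of problems that would make root an invalid sync root."""
    problems: List[str] = []

    # Restriction 1: the root itself must not be a symbolic link.
    if os.path.islink(root):
        problems.append("synchronization root is a symbolic link")
        return problems

    # A single file cannot span filesystems, so only directories are walked.
    if not os.path.isdir(root):
        return problems

    # Restriction 2: every directory below the root must be on the same device.
    root_device = os.lstat(root).st_dev
    for parent, dirnames, _ in os.walk(root, followlinks=False):
        for name in dirnames:
            path = os.path.join(parent, name)
            if os.path.islink(path):
                continue  # symbolic links inside the root are allowed
            if os.lstat(path).st_dev != root_device:
                problems.append(f"{path} is on a different filesystem")

    return problems
```

An empty result means neither condition was found. Symbolic links in the parent path of the root are deliberately not checked, since those are permitted.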
Configuration
Synchronization configuration is extensive, with configuration options controlling:
- Synchronization modes
- Ignores
- Permissions
- Symbolic links
- Filesystem watching
- Filesystem probing and scanning
- File staging
- Size limits
Session management
Synchronization sessions are managed using the mutagen sync commands, namely
create, list, monitor, flush, pause, resume, and terminate.
Example usage for these commands can be found in the Getting started guide.
The create command comes with a number of flags that control the configuration
of the sessions that it creates, and the other synchronization session
management commands all include flags that control their behavior. For more
information about a particular command, use:
    # Show help about a particular synchronization session management command.
    mutagen sync <command> --help