Lsyncd: How to sync files across multiple servers
What is lsyncd? #
Lsyncd watches local directory trees through an event monitor interface (inotify or fsevents). It aggregates and combines events for a few seconds and then spawns one or more processes to synchronize the changes. By default this is rsync. Lsyncd is thus a lightweight live mirror solution that is comparatively easy to install, does not require new filesystems or block devices, and does not hamper local filesystem performance.
Lsyncd is designed to synchronize a local directory tree with a low profile of expected changes to a remote mirror. Lsyncd is especially useful to sync data from a secure area to a not-so-secure area.
Features #
Lsyncd is configured and scripted in Lua, a lightweight programming language that is commonly used for scripting and embedding in other applications. It is designed to be easy to learn, use, and embed, with a small footprint, efficient memory management, and a simple, consistent syntax. The reference Lua implementation compiles scripts to bytecode and runs them on a small virtual machine; just-in-time compilation comes from the separate LuaJIT project rather than from standard Lua. Lua is often used in game engines, embedded systems, and automation systems.
Through this framework, lsyncd provides:
- Near-real-time file monitoring: As stated earlier, Lsyncd uses inotify to detect file changes in a local directory as they happen, then replays those changes on a target directory. How close to real time the sync runs is configurable: the delay value in the configuration file sets how many seconds Lsyncd collects events before it starts a transfer. A low value allows near-instantaneous file synchronization, although network latency, file size, number of files, and other factors also play a role.
- Incremental file transfers: Lsyncd can call rsync to transfer files incrementally, so that only the delta, the changed portion of a file, is transferred. File and directory changes can therefore be sent efficiently between two systems without resending entire files. That said, there may be problems sending incremental deltas of larger files.
- Include/exclude: Lsyncd supports rsync-style patterns that exclude files or directories from sync jobs. Most contemporary sync solutions support this, and it is useful in environments where only a subset of files within a directory tree should be synchronized.
- Hub-and-spoke distribution: In a hub-and-spoke topology, Lsyncd can be configured to synchronize files to multiple locations, making it useful for creating backups or keeping multiple servers in sync. There are caveats, though. Each spoke pairs only with the hub, so changes are propagated between the hub and each individual spoke, never between spokes. This is neither truly real-time nor highly scalable, but may suffice for basic file backup use cases.
- Scripting: Lsyncd's configuration file is itself Lua code, which allows custom scripts and actions to be executed before and after synchronizing files. When multiple, interdependent programs need to be run, Lua provides an efficient framework for that.
- Multiple operation modes: Depending on your use case, Lsyncd can operate in different modes like mirroring, propagating (basic file copy), or one-way sync. It can also use different methods of offloading the synchronization process like rsync or rsync-over-ssh.
- Logging: Lsyncd provides logging functionality and can output the logs to a file, syslog or both. In this guide it writes to /var/log/lsyncd/lsyncd.log and /var/log/lsyncd/lsyncd.status. It also provides debugging options that can be used to troubleshoot any errors that may occur.
Install lsyncd #
# Ubuntu/Debian
apt -y install lsyncd
# or CentOS
yum -y install lsyncd
Create log paths #
mkdir /var/log/lsyncd
touch /var/log/lsyncd/lsyncd.{log,status}
Settings directive #
Example: #
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30
}
- logfile: instructs lsyncd to log into /var/log/lsyncd/lsyncd.log
- statusFile: lsyncd periodically updates the file /var/log/lsyncd/lsyncd.status with its status
- statusInterval: the status file is rewritten no sooner than 30 seconds after the previous write (default: 10)
Valid keys for settings are #
key | type | description |
---|---|---|
logfile | FILENAME | logs into this file |
pidfile | FILENAME | logs PID into this file |
nodaemon | BOOL | does not detach |
statusFile | FILENAME | periodically writes a status report to this file |
statusInterval | NUMBER | writes the status file at shortest after this number of seconds has passed (default: 10) |
logfacility | STRING | syslog facility, default “user” |
logident | STRING | syslog identification (tag), default “lsyncd” |
insist | BOOL | keep running at startup although one or more targets failed due to not being reachable. |
inotifyMode | STRING | Specifies on inotify systems what kind of changes to listen to. Can be “Modify”, “CloseWrite” (default) or “CloseWrite or Modify”. |
maxProcesses | NUMBER | Lsyncd will not spawn more than this number of processes. The limit applies across all syncs combined. |
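As a sketch of how several keys from the table combine, the settings block below also keeps lsyncd in the foreground, tolerates unreachable targets at startup, and caps the number of spawned processes. The pidfile path and the chosen values are illustrative assumptions, not recommendations taken from the lsyncd documentation.

```lua
-- Illustrative settings block; all values are assumptions chosen for the example.
settings {
    logfile        = "/var/log/lsyncd/lsyncd.log",
    statusFile     = "/var/log/lsyncd/lsyncd.status",
    statusInterval = 30,
    pidfile        = "/var/run/lsyncd.pid",   -- hypothetical path
    nodaemon       = true,                    -- do not detach (handy while testing a config)
    insist         = true,                    -- keep running if a target is unreachable at startup
    inotifyMode    = "CloseWrite or Modify",  -- listen to both kinds of change events
    maxProcesses   = 4                        -- upper bound across all syncs
}
```

Running with nodaemon = true makes configuration errors show up directly in the terminal, which can be convenient before switching to a detached service.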
Sync directive #
You can choose from a set of three default implementations: default.rsync, default.rsyncssh and default.direct.
Using default.direct #
default.direct can be used to keep two local directories in sync with better performance than default.rsync. Just like default.rsync, it uses rsync on startup to initially synchronize the target directory with the source directory. During normal operation, however, default.direct uses /bin/cp, /bin/rm and /bin/mv to sync directories.
An example default.direct configuration file, located at /etc/lsyncd.conf.lua:
-- User configuration file for lsyncd.
-- Simple example for default.direct: keeps two local directories in sync.
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30
}
sync {
default.direct,
source = "/home/e.orlov/source_dir",
target = "/home/e.orlov/target_dir"
}
Ensure that you specify the full path for both the source and destination directories when synchronizing directories.
Using default.rsync #
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30
}
sync {
default.rsync,
source = "/home/e.orlov/source_dir",
target = "eorlov.org:~/target_dir",
delay = 15,
exclude = {'.git/', 'vendor/', 'web/node_modules', 'bundled/'},
rsync = {
archive = true,
compress = true
}
}
The rsync settings used above (refer to the rsync documentation for further options):
- archive: a quick way of saying you want recursion and want to preserve almost everything
- compress: rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data transmitted and is useful over a slow connection
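rsync options can also be passed through `_extra`, the same mechanism the single-file example in this guide uses. A minimal sketch, assuming you want to cap bandwidth with rsync's `--bwlimit` flag; the limit value is a placeholder:

```lua
-- Sketch only: the bandwidth limit is an assumed value for illustration.
sync {
    default.rsync,
    source = "/home/e.orlov/source_dir",
    target = "eorlov.org:~/target_dir",
    rsync  = {
        archive  = true,
        compress = true,
        -- _extra entries are handed to rsync as additional command-line arguments
        _extra   = { "--bwlimit=2000" }  -- rsync reads this as units of 1024 bytes per second
    }
}
```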
Using default.rsyncssh #
Unlike default.rsync, default.rsyncssh does not take a single target parameter; it needs host and targetdir specified separately. Here is a sample lsyncd config file that uses default.rsyncssh in the sync section instead of default.rsync:
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30
}
sync {
default.rsyncssh,
source = "/home/e.orlov/source_dir",
host="eorlov.org",
excludeFrom="/etc/lsyncd.exclude",
targetdir = "~/target_dir",
rsync = {
archive = true,
compress = true,
whole_file = false
},
ssh = {
port = 22
}
}
- whole_file: when true, rsync transfers entire files instead of deltas; it is set to false here (see issues/256)
Tips #
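Syncing in both directions #
For syncing in both directions, the required parameter is temp_dir, set inside the rsync block. It makes rsync create its temporary files in that directory instead of next to the destination files: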
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30
}
sync {
default.rsyncssh,
source="/path/to/dir",
host="eorlov.org",
targetdir="/path/to/dir/new_server",
excludeFrom="/etc/lsyncd.exclude",
delay=10,
delete = true,
rsync = {
archive = true,
compress = false,
whole_file = false,
sparse = true,
update = true,
temp_dir="/tmp/",
links = true,
times = true,
protect_args = false,
acls = true,
verbose = true
},
ssh = {
port = 22,
_extra = {"-l", "user", "-i", "/home/user/.ssh/id_rsa", "-o", "StrictHostKeyChecking=no"}
}
}
Single file sync #
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30
}
sync {
default.rsyncssh,
source="/etc/security",
host="eorlov.org",
targetdir="/etc/security",
delay=10,
delete=false,
rsync = {
archive = true,
compress = false,
whole_file = false,
sparse = true,
update = true,
temp_dir="/tmp/",
links = true,
times = true,
protect_args = false,
acls = true,
verbose = true,
_extra = {
"--include=limits.conf",
"--exclude=*"
}
},
ssh = {
port = 22
}
}
Sync across multiple servers #
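settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status",
statusInterval = 30,
maxProcesses = 3
}
-- One "host:path" entry per destination server
targetlist = {
"1.0.0.2:/var/www/html",
"1.0.0.3:/var/www/html"
}
-- Create one sync per entry in the list
for _, server in ipairs(targetlist) do
sync {
default.rsync,
source = "/var/www/html",
target = server,
-- the old rsyncOps string is replaced by the rsync table; flags go in _extra
rsync = {
_extra = {"-rltvupgo"}
}
}
end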
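The same loop pattern should carry over to default.rsyncssh, which takes host and targetdir instead of a single target. A sketch under that assumption, reusing the addresses from the example above; the delay value and rsync options are illustrative:

```lua
-- Hypothetical host list; replace with your own servers.
local hosts = { "1.0.0.2", "1.0.0.3" }

for _, host in ipairs(hosts) do
    sync {
        default.rsyncssh,
        source    = "/var/www/html",
        host      = host,              -- one sync per remote host
        targetdir = "/var/www/html",
        delay     = 10,                -- assumed aggregation delay in seconds
        rsync     = { archive = true, compress = true },
        ssh       = { port = 22 }
    }
end
```

Keeping the host list in a local table means adding a server is a one-line change.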
Deleting files while syncing directories #
By default Lsyncd will delete files on the target that are not present at the source since this is a fundamental part of the idea of keeping the target in sync with the source. However, many users requested exceptions for this, for various reasons, so all default implementations take delete as an additional parameter.
Valid values for delete are
- delete = true: Default. Lsyncd will delete on the target whatever is not in the source, both at startup and during normal operation.
- delete = false: Lsyncd will not delete any files on the target, neither at startup nor during normal operation. (Overwrites are still possible.)
- delete = 'startup': Lsyncd will delete files on the target when it starts up but not during normal operation.
- delete = 'running': Lsyncd will not delete files on the target when it starts up but will delete those that are removed during normal operation.
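As an illustration, a sync that leaves pre-existing files on the target alone but still mirrors deletions made while lsyncd runs could look like the sketch below. The paths are reused from the earlier default.rsync example.

```lua
sync {
    default.rsync,
    source = "/home/e.orlov/source_dir",
    target = "eorlov.org:~/target_dir",
    -- Do not purge extra files on the target at startup,
    -- but mirror deletions that happen during normal operation.
    delete = 'running'
}
```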
Optimizations #
Make sure that you have enough inotify watches for lsyncd to work. There is no hard and fast rule: if the service exits shortly after you start it, increase the number of inotify watches, keeping your available RAM in mind. Each inotify watch consumes kernel memory, on the order of 1 KB on 64-bit systems.
sudo sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 500000
The above is a sample response. To change the number of watches, for example to 50000, you can use:
sudo sysctl -w fs.inotify.max_user_watches=50000
Errors #
ssh_exchange_identification: read: Connection reset by peer #
When synchronizing a large number of files, this error can appear on the sending side. To avoid it, we modify the /etc/ssh/sshd_config file on the receiving server:
MaxSessions 100
MaxStartups 100:30:1000
MaxSessions sets the maximum number of open sessions allowed per network connection. The default value is 10.
MaxStartups takes the form “start:rate:full”. In our case, sshd starts refusing new connections with a probability of 30% once there are 100 unauthenticated connections, and that probability increases linearly to 100% when the count reaches 1000.
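For a setting start:rate:full, the linear increase can be written out as a formula. This is a sketch of the arithmetic implied by the description above, with n (the current number of unauthenticated connections) introduced here as an illustrative symbol:

```latex
p(n) = \frac{\text{rate}}{100}
     + \left(1 - \frac{\text{rate}}{100}\right)
       \frac{n - \text{start}}{\text{full} - \text{start}},
\qquad \text{start} \le n \le \text{full}
```

With 100:30:1000 and n = 550, this gives p = 0.30 + 0.70 × 450/900 = 0.65, so roughly 65% of new unauthenticated connections would be refused at that point.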