66 Comments
- geronimo, on 10/12/2007, -1/+23Why don't you start leading by example by never posting again on digg since digg runs on linux.
- sancho, on 10/12/2007, -0/+16scp is fine, but it will transfer every file, every time, regardless of whether or not it changed. For large directory structures with many files that don't change often, rsync will reduce the amount of data you have to send over the network tremendously.
- MattElmore, on 10/12/2007, -1/+13I didn't realize using rsync over ssh with public key authentication was a novel concept?
- VyPR, on 10/12/2007, -2/+10There is no spoon.
- ricksite, on 10/12/2007, -0/+7If you install cygwin in your windows environment, it will give you rsync and a bunch of other *nix tools.
- sancho, on 10/12/2007, -0/+5Here are a few additions/tips:
1) Using sed in such a simple way, you can keep yourself from having to escape the / and make the whole line more readable by using something like: "s#/##" The character right after the 's' is always the delimiter for the regular expression, though you want to avoid using special characters if possible.
2) Why bother pre-building the directory list? Put the build process in the script so that if you add users later, you don't have to go back and repeat that step.
3) Some error checking in the script would be nice. If rsync errors, save it to a file in /tmp and mail that output to the user at the end. The bash variable $? will give you the return code of the last command run. Most "success" returns will be 0, so if it is anything other than 0, something probably went wrong.
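A minimal sketch of the three tips above, in plain shell. The function names list_dirs and run_logged, the log path, and the host name in the usage comment are made up for illustration; they are not taken from the article's script.

```shell
# Tips 1 and 2: build the directory list on the fly, stripping the trailing
# slash with '#' as the sed delimiter so the '/' needs no escaping.
list_dirs() {
    # $1 = parent directory (e.g. /home); prints one entry per line
    (cd "$1" && ls -1d */ | sed 's#/##')
}

# Tip 3: run a command, append its stderr to a log file, and record a
# FAILED line when the exit status ($?) is anything other than 0.
run_logged() {
    log=$1
    shift
    "$@" 2>>"$log"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "FAILED (exit $rc): $*" >>"$log"
    fi
    return "$rc"
}

# Hypothetical usage (host name and paths are placeholders):
#   for u in $(list_dirs /home); do
#       run_logged /tmp/rsync-errors.log rsync -a "/home/$u/" "newhost:/home/$u/"
#   done
```

If the log file is non-empty once the loop finishes, it could be mailed to whoever runs the script.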
Also, I don't see anything particularly wrong with letting anyone and his brother attempt SSH connections to your server (particularly if you want to be able to access the server from dynamic IP ranges while you're on-the-go). You can use only SSH keys if you want to avoid brute force password attacks, or use dynamic firewall rules to block addresses which make too many SSH attempts within a certain time period (iptables and pf both support this).
- anastrophe, on 10/12/2007, -3/+7Basic UNIX Practices Make Front Page of Digg: Next up: 1,000 Uses For A Spoon.
- splinter, on 10/12/2007, -0/+4I use rsync in windows all the time using cwRsync.
Link: http://www.itefix.no/phpws/index.php?module=pagemaster&PAGE_user_op=view_page&PAGE_id=6&MMN_position=23:23
It works fantastic.
- inactive, on 10/12/2007, -0/+3We do nightly syncs between multiple sites over T1s from each site and do not see these kinds of delays with rsync. BTW, we're running on all NetApp filers. What kind of crappy server do you have? SATA raid arrays really don't come close to these, even with 15k RPM Western Digital RAID Edition drives in RAID-6. The system doing the sync is a Sun Ultra 60 with 2GB RAM. We're syncing all night long. How could you be exhausting memory?
And if it takes you that long to sync, why not be smart and split the job up to parallelize the sync, directory by directory? You could have smaller throw-away systems with less memory backing up portions of the filesystem. Obviously more critical parts can be prioritized ahead of other areas.
- inactive, on 10/12/2007, -0/+3If you're copying files frequently between two locations you should probably have a VPN between the sites anyway and just rsync over that. Otherwise this is the way to do it.
If you're doing a server to server migration, you DO want to use rsync, because the copy is recoverable. If the copy dies in the middle, just rerun the command and rsync will only copy the differences. If there is a file that was only partially copied, it will still be replaced by a good copy.
- stuartcw, on 10/12/2007, -0/+3Google recruiters take note. There's a couple of interview questions lurking in this post. :-)
- tigro, on 10/12/2007, -1/+4cygwin: POSIX environment for windows
http://www.cygwin.com/
make sure you select the rsync package on install
- NTolerance, on 10/12/2007, -1/+4bury
- heffae, on 10/12/2007, -0/+3FWIW Lifehacker had a decent write up on using rsync. I believe it was between OSX and Windows. http://lifehacker.com/software/rsync/geek-to-live-mirror-files-across-systems-with-rsync-196122.php
- zoom1928, on 10/12/2007, -0/+3shokk wrote:
> How could you be exhausting memory?
Just do the math. I gave you the numbers in my post. From the rsync FAQ at:
http://samba.anu.edu.au/rsync/FAQ.html
"Yes, rsync uses a lot of memory. The majority of the memory is used to hold the list of files being transferred. This takes about 100 bytes per file, so if you are transferring 800,000 files then rsync will consume about 80M of memory."
In my experience it is closer to 200 than 100. With just 100 bytes per file, rsync uses 2.24 GBytes of RAM before it even begins to start the transfer with the server I mentioned. Even the creators of the program admit it uses huge amounts of RAM. If you're not seeing problems with RAM usage then you're using it on a trivial system. Stop disagreeing with the people that know better. We have "been there, done that." We know rsync is completely unsuitable for anything but a trivial system.
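The arithmetic behind those figures, as a quick sketch. The function name rsync_mem_mb is made up here. The 100 bytes per file comes from the FAQ quote and the 200 from the comment above; the 24 million file count is the one mentioned elsewhere in this thread and is used as an assumption.

```shell
# Rough size of rsync's in-memory file list: files * bytes_per_file,
# printed in megabytes (10^6 bytes). An estimate only, not a measurement.
rsync_mem_mb() {
    awk -v files="$1" -v bytes_per_file="$2" \
        'BEGIN { printf "%.0f\n", files * bytes_per_file / 1000000 }'
}

rsync_mem_mb 800000 100      # the FAQ's example: prints 80
rsync_mem_mb 24000000 100    # prints 2400 (MB), i.e. about 2.24 GiB
rsync_mem_mb 24000000 200    # prints 4800 (MB) at 200 bytes per file
```

On a machine with 2GB of RAM, either of the last two estimates exceeds physical memory before any file is transferred.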
Because you asked, the system has six Seagate 15k SCSI drives connected to a Mylex RAID card on a Pentium D 2.66GHz w/ 2GB of RAM. An ls -lR takes almost a day to complete. Even w/o the network communication, rsync would still take at least 24 hours of work before it could start copying files with its current architecture. It is broken by design.
> split the job up to parallelize the sync
I tried that but the disk seeks make it very slow. Our current solution is to write a list of files that are changed then use that list with a PHP program to feed to scp. It sucks, but unlike rsync, it mostly works.
- krux, on 10/12/2007, -0/+3rsnapshot is another good utility. It lets you specify which directories to copy in a config and keeps multiple snapshots with hard links, so you only have to store the changes between snapshots. Good tool. One thing: the way he called ssh was needlessly complex.
rsync -a sourcedir user@desthost:destdir
will use ssh by default.
- inactive, on 10/12/2007, -0/+2No. For the one reason that NFS sucks over a WAN.
Tell me you're not a sysadmin.
- lolwtfhaha, on 10/12/2007, -0/+2@anastrophe
We use noatime and it does not make a helluva difference. Certainly not make-or-break. Besides he states ls -lR takes all day, and ls does not update access times anyway. I'm not even sure if rsync updates the atimes of files that are not transferred since it relies on file size and date by default (checksum is optional). In any case, raid5 is not the problem, having 24 million files IS, and that's where rsync is simply not appropriate-- and the developers agree!
Now, I have to disagree with zoom, the article IS accurate in a certain instance; large amounts of large files. If you had a few thousand files that were a gigabyte each (movies?) then rsync would work well and qualify for the use of "massive" :-)
- 022A, on 10/12/2007, -0/+2This article misses...
There isn't enough handholding and too many complicated lines for someone brand new to the concept. On the other hand, anyone comfortable with all those lines probably already knows how to do this, faster and easier.
Also, there's no mention of one of the main reasons for using rsync in the first place, delta copies.
- bevans, on 10/12/2007, -1/+3The script he wrote is in bash.
- lolwtfhaha, on 10/12/2007, -0/+2zoom knows his ***** listen to him. We used rsync daily for ~5 million files it literally took all night and created a huge spike in memory and disk io. It ended up being more efficient to just use tar since it didn't create the huge memory spike. The disk IO still sucked though. We then switched to using tar over nfs instead of ssh since #1 the encryption was unnecessary and #2 the bottleneck of slow small file access w/ nfs became an advantage-- takes 1/2 the day now though but no noticeable performance impact. We're about to switch to a system where the web applications actually maintain a filelist of changed items that we can feed to cpio.
backups suck
- corneliusroot, on 10/12/2007, -0/+2The intermediate file was done because I often go back and use it for other things as well. You could just as easily put something like
users=`cd /home/; ls -1d */ | sed 's/\///'`
In my case I am migrating from an old server to a new, and there are no new users being put on the old system and so I do not have to worry about generating the variable dynamically.
I'm no 'sed' genius so I defer to the comment above about using sed 's#/##'. Good idea!
Lastly this script is for *syncing* data that has already been copied, in my case. I should have made that clear in the article.
Otherwise I hope it helps somebody out.
- geronimo, on 10/12/2007, -0/+2If you use LVM you can get a snapshot very quickly and it's consistent with the filesystem. The only downside is it's not a remote backup solution so you still need something to do a remote backup.
- inactive, on 10/12/2007, -0/+2Delta copies is THE reason to use rsync. This makes the copy recoverable. If your process is interrupted and you're using another method, you're going to end up starting the whole thing over again. This saves bandwidth.
rsync -avzt --stats
Easy.
- greyfade, on 10/12/2007, -0/+2@sancho: fuse+sshfs
:D
- jellyroll713, on 10/12/2007, -8/+9You're a dick.
- pkulak, on 10/12/2007, -0/+1This is how I backup my Mac to my Linux server. It's great. I can add a CD to my iTunes library, run the backup app (just a small AppleScript) and it finds the new songs, plus some iTunes config files that have changed, and sends just them over the network. Takes no time at all. I love rsync.
- Nocturnal, on 10/12/2007, -0/+1For those who use Windows, Microsoft's SyncToy is a Godsend. I use it to do data backup for clients and I'm talking transferring pretty much their entire drive (just so I don't miss a file or two that they may need after I've completed the back up). I used to use the command line and something from the Windows 2003 toolkit.
- inactive, on 10/12/2007, -0/+1Unison is also good for bidirectional syncs, which is its biggest selling point in my opinion.
- heffae, on 10/12/2007, -0/+1Robocopy is very good if you are in a windows world but I think rsync is better. If it is just something you are going to do once and you are in a windows world use robocopy. If this is something you are going to do a lot I find setting up cygwin and rsync is worth the hassle.
This is purely subjective but I've found that rsync handles copying large amounts of data over unreliable or slow connections (such as a VPN tunnel between two sites on different continents) better than robocopy.
- AngryBoy, on 10/12/2007, -1/+2Not only is it less than novel, it's also less than the most efficient way to do it. The extra encryption layer SSH adds gives a slight performance hit. If you're moving "mass amounts of data", it can become a non-trivial amount.
If you're just doing a server to server migration, there's far more efficient methods.
- RealHyperX, on 10/12/2007, -0/+1Can someone tell me how to use rsync or something similar in windows? I have an 80 gig file that gets generated daily (exchange backup) and I keep writing it to a nas daily. I would love to send it over to some off site server, and the only thing that will do it is double take for windows. Anyone used rsync in windows before?
- NTolerance, on 10/12/2007, -0/+1That makes sense since I was copying to a Linux server.
- rvprasad, on 10/12/2007, -0/+1Unison (http://www.cis.upenn.edu/~bcpierce/unison/) can be a useful tool to setup synchronization configurations and use them repetitively. It comes with a useful GUI. Currently, it is available for Unix, Windows, and Mac.
- krux, on 10/12/2007, -0/+1well you're using windows... robocopy is good for windows to windows. rsync can handle high-ascii filenames, but you can run into some translation problems between the way windows handles them and the way linux handles them.
- alexvalentine, on 10/12/2007, -1/+2Umm how is this news and why make it so complicated? rsync -avz source destination
- NTolerance, on 10/12/2007, -0/+1I tried using rsync under Cygwin and then I found out that rsync can't handle high-ASCII filenames and mangled a bunch of my mp3s. I gave up and started using Robocopy.
- anastrophe, on 10/12/2007, -0/+1"No digg, because rsync is not suitable for mass amounts of data. Not at all. "
yet what you describe is not a shortcoming having to do with moving mass amounts of data. it's a shortcoming having to do with moving mass numbers of files. that's not a trivial difference in meaning.
furthermore, if you're using RAID-5, then that's a potentially big part of the bottleneck, if you don't have your filesystem mounted -noatime. you're forcing a write of the attributes on each and every one of those files when you run rsync on the filesystem if atimes are being updated - and in raid five, that's an enormous penalty.
- rileyschuit, on 10/12/2007, -0/+1word
- lpmusix, on 10/12/2007, -0/+1@anastrophe
Trolling? I asked a question. Is there something _wrong_ with that?
- anastrophe, on 10/12/2007, -0/+1don't be a dork. or should i use your method: are you trying to be a trolling, provocative, dimwitted, out-of-touch jerkoff? if so, you sir, are a bloody idiot, who thinks that presenting an asked-and-answered troll question isn't trolling.
get it? i didn't *actually* call you any of those things, i merely implied them as fact. see how lame your method is?
- inactive, on 10/12/2007, -0/+1GoodSync is free and it's better.
- Khabi, on 10/12/2007, -0/+1For the record, when sending files over ssh/scp/rsync using blowfish will help speed things up. If they're older machines turning off compression may help as well.
- inactive, on 10/12/2007, -0/+1It seems to me that people who cry that all the time might be the ones having trouble getting the good stuff.
I, frankly, have no problem in that department.
So I guess the next time someone needs a solution for copying data over a WAN, you're the one *NOT* to go to.
- bofu, on 10/12/2007, -0/+1dmscp2 anyone?? anyone?
- lpmusix, on 10/12/2007, -1/+2@anastrophe
Are you suggesting not keeping backups if you have raid? If so, you my friend, are a bloody idiot.
- inactive, on 10/12/2007, -0/+1Deltacopy http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp is an rsync implementation for windows.
- lolwtfhaha, on 10/12/2007, -0/+1god help you if you have meta data
- nailer, on 10/12/2007, -0/+1Angryboy:
bash 3, when called as sh, will act as Bourne shell. No functions, no arrays, no cool file descriptor stuff.
As well as not working with bash 3, bash scripts that are called as Bourne shell will make old Unix wizards angry.
- anastrophe, on 10/12/2007, -1/+1it's worth noting also that the title of the article is wrong. bash has nothing to do with it at all. it's like saying "copying mass amounts of data over a network with a keyboard, rsync, and ssh".
unless you're suggesting that the steps you outlined won't work if someone is using tcsh.
sheesh.