Multi-file, multi-line find/replace with Perl

November 27, 2012 under Main

A customer recently contacted me for assistance. Their PC had contracted a virus which had installed a keylogger, which in turn had been used to steal their FTP password. The attacker logged into their FTP account and infected almost every file in the site with a couple of lines of JavaScript which attempted to install a trojan on the PC of anyone visiting their site.

Unfortunately, they didn’t have a clean copy of the site files, and since the infection had happened some time ago, our own backups held only infected copies of their files. This left no option but to clean up their site by removing the infection.

Thankfully the attacker had thoughtfully surrounded every line of inserted code with HTML comments. This made the cleanup easier as we could attempt a find/replace across all files replacing everything between the start and end comments with an empty string, thus removing the code from the site. An example:

<!--7e5a0c-->
<script type="text/javascript" language="javascript" >nefarious code here</script>
<!--/7e5a0c-->

My first thought was to use a GUI editor like BBEdit on Mac which has a great find/replace function, but the site had many thousands of files totaling 2.6GB. Downloading, cleaning and re-uploading would take hours.

Instead I turned to my trusty friend Perl, in collaboration with find and xargs. The solution turned out to be very simple, just one line on the Linux terminal:

find /path/to/webroot -type f -print0 | xargs -0 perl -0777 -i -pe 'BEGIN{undef $/;} s/7e5a0c.*?7e5a0c//smg'

Breaking this down, we get:

find: find everything of type file in the given directory and print the list, using the null character as a separator instead of newline so xargs doesn’t choke on files with spaces in their names.

xargs: feed the results of the find command as arguments to the following command. -0 says expect null as the separator instead of whitespace or newline.

perl: -0777 sets slurp mode, i.e. read each input file in one go. -i says edit in place (with no backup extension given, the original is not kept), -p says iterate over the given files in a sed-like manner, printing the result, and -e says execute the Perl code given as the following argument. The code has 2 parts: 1. undef the record separator, which defaults to newline, so we can match across multiple lines (strictly redundant, since -0777 already does this). 2. A find/replace regular expression (s = substitute). Modifiers (after the regex): m = treat string as multi-line (^ and $ match at line boundaries; it has no effect on this pattern), s = treat string as a single line (. also matches newline), g = global matching. See perlre for more details.

The whole thing took just a few seconds to complete, searching 11028 files and removing all instances of the infection. Success!
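One way to double-check a result like this is to search the tree again for the marker string. The sketch below is an assumption about how such a check could look, not the check that was actually run. It uses a scratch directory with one clean file and one deliberately missed file (both invented for the example) in place of /path/to/webroot.

```shell
# Scratch directory standing in for the real webroot (hypothetical).
webroot=$(mktemp -d)
printf '<p>clean</p>\n' > "$webroot/clean.html"
printf '<!--7e5a0c-->\n<script>nefarious code here</script>\n<!--/7e5a0c-->\n' > "$webroot/missed.html"

# -r recurses into the directory, -l prints only the names of matching files.
remaining=$(grep -rl '7e5a0c' "$webroot")

if [ -z "$remaining" ]; then
  echo "no marker found"
else
  echo "marker still present in:"
  echo "$remaining"
fi

# Remember the path for inspection, then remove the scratch directory.
missed_path="$webroot/missed.html"
rm -rf "$webroot"
```

On a real site, an empty result from the grep would suggest that every marked block was removed. It says nothing about code the attacker may have inserted without the marker.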
