=head1 NAME

File::FilterFuncs - specify filter functions for files

=head1 SYNOPSIS

    use File::FilterFuncs qw(filters);

    filters('source.txt',
        sub { $_ = uc $_; 1 },
        'dest.txt'
    );

=head1 INTRODUCTION

C<File::FilterFuncs> makes it easy to perform transformations on files.
When you use this module, you specify a group of filter functions that
transform the lines of a source file. The transformed lines are written
to the destination file that you specify.

For example, this code converts an entire file to upper case, line by
line:

    use File::FilterFuncs qw(filters);

    filters('source.txt',
        sub { $_ = uc $_; 1 },
        'dest.txt'
    );

The "1" at the end of the filter subroutine tells C<filters> to keep
every line. A filter subroutine should return 1 for each line that
should be kept and 0 for each line that should be ignored. This program
copies only the lines that contain something besides whitespace:

    use File::FilterFuncs qw(filters);

    filters('source.txt',
        sub { /\S/ },
        'dest.txt'
    );

The source file is never read into memory as a whole. It is read one
line at a time, and the destination file is written one line at a time.

Just as Perl's concept of a line can be changed by setting C<$/>, the
C<filters> function's idea of a line can be changed by specifying a
value for C<$/> in the call to C<filters>. This example reads records
of 1022 bytes and appends two null bytes to each one:

    my $pad = "\0" x 2;
    filters('source.dat',
        '$/' => \1022,
        sub { $_ .= $pad; 1 },
        'dest.dat'
    );

Filter functions are invoked in the order in which they are listed.
This code upper-cases every line in 'source.txt', then wraps it in
parentheses, and copies the output to 'dest.txt':

    filters('source.txt',
        sub { $_ = uc $_; 1 },
        sub { chomp $_; $_ = "($_)\n"; 1 },
        'dest.txt'
    );

Within each filter, the line currently being worked on is in C<$_>.

The C<filters> subroutine expects its first argument to be the name of
the source file and its last argument to be the name of the destination
file. C<filters> will C<die> if either file name is missing or if
either file is inaccessible.

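
Conceptually, the filter chain described above behaves like the loop
below. This is only a sketch in core Perl under stated assumptions, not
the module's actual implementation: C<apply_filters> is a hypothetical
helper, it works on file handles that are already open, and it assumes
that a line rejected by one filter is not passed to the filters after
it.

    use strict;
    use warnings;

    # Hypothetical helper, NOT part of File::FilterFuncs.
    sub apply_filters {
        my ($in, $out, @subs) = @_;
        local $_;    # filters read and modify the line through $_
        LINE: while (defined($_ = readline $in)) {
            for my $sub (@subs) {
                # Assumption: a false return drops the line at once.
                $sub->() or next LINE;
            }
            print {$out} $_;
        }
        return;
    }

    # In-memory file handles keep the example self-contained.
    my $source = "one\n \ntwo\n";
    my $result = '';
    open my $in,  '<', \$source or die "cannot read: $!";
    open my $out, '>', \$result or die "cannot write: $!";

    # Drop whitespace-only lines, then upper-case what remains.
    apply_filters($in, $out, sub { /\S/ }, sub { $_ = uc $_; 1 });
    close $out;

After the call, C<$result> holds the two surviving lines in upper case;
the whitespace-only line never reaches the second filter.
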
=head1 OPTIONS

A few options determine how the C<filters> subroutine works.

=over

=item C<binmode>

C<binmode> lets you specify a layer to be used for the input data. For
example, this reads a UTF-8 file and writes the data using the default
output layer:

    filters(
        'source.txt',
        binmode => ':utf8',
        'dest.txt',
    );

=item C<boutmode>

C<boutmode> lets you specify a layer to be used for writing the output
data. For example, this code on a Linux platform should read text data
using the Linux end-of-line format and write it using the DOS (CRLF)
end-of-line format:

    filters(
        'source.txt',
        boutmode => ':crlf',
        'dest.txt',
    );

=item C<$/>

Setting C<$/> lets you determine how an end-of-line is recognized. Set
this option to the same value that you would assign to the C<$/>
variable in a program. For example, suppose a file contains this:

    ABCDEFGHIJKL

The following program should write three letters per line to the
output file:

    filters(
        'source.txt',
        '$/' => \3,
        sub { $_ = "$_\n"; 1 },
        'dest.txt',
    );

=back

=head1 NOTES

=over

=item Alternate function name

If you consider the function name C<filters> to be too generic, the
module also lets you import the function under an alternate name.

=item Convenience return values

For the programmer's convenience and to facilitate self-documenting
code, the values C<$KEEP_LINE> and C<$IGNORE_LINE> can be exported. As
an example, this is another program that filters out lines containing
only whitespace:

    use File::FilterFuncs qw(filters $IGNORE_LINE);

    filters('source.txt',
        sub { return $IGNORE_LINE unless /\S/ },
        'dest.txt'
    );

=back

=head1 BUGS

The source and destination files cannot be the same. If the source and
destination files have the same name and path, C<filters> dies with an
appropriate error message. If symbolic or hard links are used to give
the same file two different names, the results are undefined.

E-mail bug reports to mumia.w.18.spam+nospam [at] earthlink.net.

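
Because the linked-file case above is undefined, a caller may want to
guard against it before calling C<filters>. The following is a minimal
sketch, not something the module provides: C<same_file> is a
hypothetical helper, and it assumes a POSIX-like filesystem on which
C<stat> reports meaningful device and inode numbers.

    use strict;
    use warnings;

    # Hypothetical helper, NOT part of File::FilterFuncs.
    # True when both names resolve to the same underlying file.
    sub same_file {
        my ($src, $dest) = @_;
        my @s = stat $src  or return 0;
        my @d = stat $dest or return 0;  # no destination yet: no clash
        # stat follows symbolic links; fields 0 and 1 are the device
        # and inode numbers.
        return $s[0] == $d[0] && $s[1] == $d[1];
    }

    die "source and destination are the same file\n"
        if same_file('source.txt', 'dest.txt');

The check is only as reliable as the platform's C<stat>; where inode
numbers are not meaningful, it should not be trusted.
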
=head1 CREDITS

Thanks go to Uri Guttman for several helpful suggestions, including
enabling the slurp and paragraph modes and dealing with filtering a
file onto itself. Andy also commented on the need to explain or
simplify the use of the callback filter functions.

=head1 TODO

=over

=item *

Allow file handles to be used for input and output.

=back

=head1 AUTHOR

Copyright 2007 Mumia Wotse

This program is under the General Public License (GPL).

=cut