Cygwin Commands for Text Manipulation
DOS has a few commands, such as find, findstr and sort, that can be used to manipulate text. ($ help <cmd> shows the usage help for each of these commands.) However, their usage is very different from what a Unix user may be familiar with.
One of the best things about Cygwin is that it brings the Unix text commands to Windows. These commands are handy for “quick and dirty” work on large chunks of text data: they are easy to learn, behave largely the same across Unix-like systems, and offer a wide range of options. When used in combination, they can save a lot of programming.
($ <command> <options> <inputs> is the general format for almost all of these commands. $ <command> --help shows the usage information.)
Note that all the commands below work on plain text files and process every byte in them, not just the text you would see in a viewer. For example, a .DOC or .HTML file contains headers and markup in addition to the visible text, and these are also taken into account when the file is processed by these commands.
wc: wc is short for “word count”, but it counts more than just words. It can count characters (-m) and lines (-l) as well as words (-w).
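As a small illustration of these options, the sketch below builds a two-line sample file and counts its lines, words and characters. The file name sample.txt and the variable names are made up for the example; the input is redirected with < so that wc prints only the number and not the file name.

```shell
# Create a small sample file (hypothetical name) in a temporary directory.
tmpdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$tmpdir/sample.txt"

# Redirect the file into wc so that only the count is printed.
lines=$(wc -l < "$tmpdir/sample.txt")   # number of lines
words=$(wc -w < "$tmpdir/sample.txt")   # number of words
chars=$(wc -m < "$tmpdir/sample.txt")   # number of characters

echo "lines=$lines words=$words chars=$chars"
```

The sample contains 2 lines, 5 words and 24 characters (newlines included), so those are the counts to expect.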
uniq: When a file has a large number of lines with many identical ones, uniq prints each run of adjacent identical lines only once. Because it compares only neighbouring lines, the input is usually sorted first (for example, $ sort <file> | uniq). -c also gives the number of times each line was repeated, -u returns only the lines that are not repeated at all, and -i ignores case when comparing lines, so two lines that differ only in case count as duplicates.
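One detail worth seeing in practice is that uniq compares only neighbouring lines, so the sketch below sorts the input before piping it to uniq. The file fruits.txt and the variable names are invented for the example, and LC_ALL=C is set only to make the sort order predictable.

```shell
# Use the C locale so that sort orders lines by plain byte value.
export LC_ALL=C

# Sample data (hypothetical file name): "pear" appears twice, and
# "apple"/"Apple" differ only in case.
tmpdir=$(mktemp -d)
printf 'pear\napple\npear\nApple\nfig\n' > "$tmpdir/fruits.txt"

# uniq only collapses adjacent duplicates, so sort first.
distinct=$(sort "$tmpdir/fruits.txt" | uniq)       # every distinct line once
counted=$(sort "$tmpdir/fruits.txt" | uniq -c)     # each line prefixed by its count
singles=$(sort "$tmpdir/fruits.txt" | uniq -u)     # only lines that occur exactly once
folded=$(sort -f "$tmpdir/fruits.txt" | uniq -i)   # treat "apple" and "Apple" as equal

printf '%s\n' "$counted"
```

Running uniq on the unsorted file would leave both copies of "pear" in place, because another line sits between them.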
sort: Say you have several files with numbers or words that need to be sorted. sort can sort the concatenation of several files at once, treating each line as a separate value to be sorted. The sorting order is set with options: -d means dictionary order, -f means ignore case, -n means numeric sort, and -u means output only one copy of each set of equal lines.
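The difference between the default order and -n is easiest to see with numbers of different lengths. The following sketch sorts two small files together; the file names a.txt and b.txt and the variable names are assumptions made for the example, and LC_ALL=C is set only to make the default order predictable.

```shell
# Use the C locale so that the default sort compares plain byte values.
export LC_ALL=C

# Two sample files (hypothetical names) holding one number per line.
tmpdir=$(mktemp -d)
printf '10\n9\n100\n' > "$tmpdir/a.txt"
printf '9\n25\n' > "$tmpdir/b.txt"

# Default sort compares lines as text, so "100" comes before "25".
lexical=$(sort "$tmpdir/a.txt" "$tmpdir/b.txt")

# -n compares lines by numeric value.
numeric=$(sort -n "$tmpdir/a.txt" "$tmpdir/b.txt")

# -n -u additionally keeps only one copy of each equal value.
unique_numeric=$(sort -n -u "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$numeric"
```

Here the text order is 10, 100, 25, 9, 9, while the numeric order is 9, 9, 10, 25, 100; with -u the second 9 is dropped.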
grep: The name comes from the ed editor command g/re/p (“globally search for a regular expression and print”). grep searches the given text for a specific string or string pattern (called a regular expression). $ grep "grep" <this-post> shows all the lines with the string “grep” in them. -c counts the matching lines (not the individual occurrences), -i ignores case, and -v inverts the result, showing the lines that do not match.
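A short sketch of these options on a three-line file follows. The file name post.txt, its contents and the variable names are made up for the example.

```shell
# Sample text (hypothetical file name) with one lowercase and one
# uppercase occurrence of the search string.
tmpdir=$(mktemp -d)
printf 'grep finds text\nsed edits text\nGREP is case-sensitive by default\n' > "$tmpdir/post.txt"

matching=$(grep 'grep' "$tmpdir/post.txt")           # lines containing "grep"
count=$(grep -c 'grep' "$tmpdir/post.txt")           # number of matching lines
count_nocase=$(grep -i -c 'grep' "$tmpdir/post.txt") # same, ignoring case
nonmatching=$(grep -v 'grep' "$tmpdir/post.txt")     # lines without "grep"

# A simple regular expression: "$" anchors the match to the end of the line.
ends_with_text=$(grep -c 'text$' "$tmpdir/post.txt")

echo "count=$count count_nocase=$count_nocase ends_with_text=$ends_with_text"
```

Only the first line matches the case-sensitive search, so count is 1, while -i also picks up the line with “GREP” and gives 2. Two lines end in “text”.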
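sed: sed (short for “stream editor”) is used primarily to find and replace a string or string pattern in files. The most common usage is $ sed 's/<find-string>/<replace-with-string>/g' <file>, which replaces all occurrences of <find-string> with <replace-with-string> and prints the result, leaving the original file unchanged. Without the trailing g, only the first occurrence on each line is replaced.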
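To make the effect of the trailing g concrete, the sketch below runs the substitution with and without it. The file name story.txt, its contents and the variable names are invented for the example.

```shell
# Sample text (hypothetical file name); the first line contains "the" twice.
tmpdir=$(mktemp -d)
printf 'the cat sat on the mat\nthe end\n' > "$tmpdir/story.txt"

# With the g flag, every occurrence on each line is replaced.
all=$(sed 's/the/a/g' "$tmpdir/story.txt")

# Without g, only the first occurrence on each line is replaced.
first=$(sed 's/the/a/' "$tmpdir/story.txt")

# The find-string is a regular expression: [cm]at matches "cat" and "mat".
pattern=$(sed 's/[cm]at/dog/g' "$tmpdir/story.txt")

printf '%s\n' "$all"
```

sed writes the edited text to standard output, so story.txt itself still holds the original lines after these commands run.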














