Lexicographically sorting large files in Linux

When I hear the word “sort” my first thought is usually “Hadoop”! Yes, sorting is one thing that Hadoop does well, but if you’re working with large files in Linux the built-in sort command is often all you need.

Let’s say you have a large file on a host with 2GB or more of main memory free. The following sort command is a efficient way to lexicographically-order large files.

LC_COLLATE=C sort --buffer-size=1G --temporary-directory=./tmp --unique bigfile.txt

Let’s break this command down and examine each part in detail.


Warning: count(): Parameter must be an array or an object that implements Countable in /var/www/vhosts/shan.info/httpdocs/templates/gk_publisher/html/com_k2/templates/default/item.php on line 169

Notice: Only variables should be assigned by reference in /var/www/vhosts/shan.info/httpdocs/templates/gk_publisher/html/com_k2/templates/default/item.php on line 478
back to top