SORT(1) User Commands SORT(1)

NAME


sort - sort, merge, or sequence check text files

SYNOPSIS


/usr/bin/sort [-bcdfimMnru] [-k keydef] [-o output]
[-S kmem] [-t char] [-T directory] [-y [kmem]]
[-z recsz] [+pos1 [-pos2]] [file]...


/usr/xpg4/bin/sort [-bcdfimMnru] [-k keydef] [-o output]
[-S kmem] [-t char] [-T directory] [-y [kmem]]
[-z recsz] [+pos1 [-pos2]] [file]...


DESCRIPTION


The sort command sorts lines of all the named files together and writes
the result on the standard output.


Comparisons are based on one or more sort keys extracted from each line
of input. By default, there is one sort key, the entire input line. Lines
are ordered according to the collating sequence of the current locale.

OPTIONS


The following options alter the default behavior:

/usr/bin/sort
-c
Checks that the single input file is ordered as specified by the
arguments and the collating sequence of the current locale. The
exit code is set and no output is produced unless the file is out
of sort.


/usr/xpg4/bin/sort
-c
Same as /usr/bin/sort except no output is produced under
any circumstances.


-m
Merges only. The input files are assumed to be already
sorted.


-o output
Specifies the name of an output file to be used instead
of the standard output. This file can be the same as one
of the input files.


-S kmem
Specifies the maximum amount of swap-based memory used
for sorting, in kilobytes (the default unit). kmem can
also be specified directly as a number of bytes (b),
kilobytes (k), megabytes (m), gigabytes (g), or terabytes
(t); or as a percentage (%) of the installed physical
memory.


-T directory
Specifies the directory in which to place temporary
files.


-u
Unique: suppresses all but one in each set of lines
having equal keys. If used with the -c option, checks
that there are no lines with duplicate keys in addition
to checking that the input file is sorted.


-y kmem
(obsolete). This option was used to specify the amount of
main memory initially used by sort. Its functionality is
not appropriate for a virtual memory system; memory usage
for sort is now specified using the -S option.


-z recsz
(obsolete). This option was used to prevent abnormal
termination when lines longer than the system-dependent
default buffer size are encountered. Because sort
automatically allocates buffers large enough to hold the
longest line, this option has no effect.


Ordering Options


The default sort order depends on the value of LC_COLLATE. If LC_COLLATE
is set to C, sorting is in ASCII order. If LC_COLLATE is set to en_US,
sorting is case insensitive except when the two strings are otherwise
equal and one has an uppercase letter earlier than the other. Other
locales have other sort orders.


The following options override the default ordering rules. When ordering
options appear independent of any key field specifications, the requested
field ordering rules are applied globally to all sort keys. When attached
to a specific key (see Sort Key Options), the specified ordering options
override all global ordering options for that key. In the obsolescent
forms, if one or more of these options follows a +pos1 option, it affects
only the key field specified by that preceding option.

-d
Dictionary order: only letters, digits, and blanks (spaces and
tabs) are significant in comparisons.


-f
Folds lower-case letters into upper case.


-i
Ignores non-printable characters.


-M
Compares as months. The first three non-blank characters of the
field are folded to upper case and compared. For example, in
English the sorting order is "JAN" < "FEB" < ... < "DEC". Invalid
fields compare low to "JAN". The -M option implies the -b option
(see below).


-n
Restricts the sort key to an initial numeric string, consisting of
optional blank characters, optional minus sign, and zero or more
digits with an optional radix character and thousands separators
(as defined in the current locale), which is sorted by arithmetic
value. An empty digit string is treated as zero. Leading zeros
and signs on zeros do not affect ordering.


-r
Reverses the sense of comparisons.


Field Separator Options


The treatment of field separators can be altered using the following
options:

-b
Ignores leading blank characters when determining the starting
and ending positions of a restricted sort key. If the -b
option is specified before the first sort key option, it is
applied to all sort key options. Otherwise, the -b option can
be attached independently to each -k field_start, field_end,
or +pos1 or -pos2 option-argument (see below).


-t char
Use char as the field separator character. char is not
considered to be part of a field (although it can be included
in a sort key). Each occurrence of char is significant (for
example, <char><char> delimits an empty field). If -t is not
specified, blank characters are used as default field
separators; each maximal non-empty sequence of blank
characters that follows a non-blank character is a field
separator.


Sort Key Options


Sort keys can be specified using the options:

-k keydef
The keydef argument is a restricted sort key field
definition. The format of this definition is:

-k field_start [type] [,field_end [type] ]


where:

field_start and field_end

define a key field restricted to a portion of the
line.


type

is a modifier from the list of characters bdfiMnr.
The b modifier behaves like the -b option, but
applies only to the field_start or field_end to
which it is attached and characters within a field
are counted from the first non-blank character in
the field. (This applies separately to
first_character and last_character.) The other
modifiers behave like the corresponding options,
but apply only to the key field to which they are
attached. They have this effect if specified with
field_start, field_end or both. If any modifier
is attached to a field_start or to a field_end, no
option applies to either.

When there are multiple key fields, later keys are
compared only after all earlier keys compare equal.
Except when the -u option is specified, lines that
otherwise compare equal are ordered as if none of the
options -d, -f, -i, -n or -k were present (but with -r
still in effect, if it was specified) and with all
bytes in the lines significant to the comparison.

The notation:

-k field_start[type][,field_end[type]]


defines a key field that begins at field_start and
ends at field_end inclusive, unless field_start falls
beyond the end of the line or after field_end, in
which case the key field is empty. A missing field_end
means the last character of the line.

A field comprises a maximal sequence of non-separating
characters and, in the absence of option -t, any
preceding field separator.

The field_start portion of the keydef option-argument
has the form:

field_number[.first_character]


Fields and characters within fields are numbered
starting with 1. field_number and first_character,
interpreted as positive decimal integers, specify the
first character to be used as part of a sort key. If
.first_character is omitted, it refers to the first
character of the field.

The field_end portion of the keydef option-argument
has the form:

field_number[.last_character]


The field_number is as described above for
field_start. last_character, interpreted as a non-
negative decimal integer, specifies the last character
to be used as part of the sort key. If last_character
evaluates to zero or .last_character is omitted, it
refers to the last character of the field specified by
field_number.

If the -b option or b type modifier is in effect,
characters within a field are counted from the first
non-blank character in the field. (This applies
separately to first_character and last_character.)


[+pos1 [-pos2]]
(obsolete). Provide functionality equivalent to the
-kkeydef option.

pos1 and pos2 each have the form m.n optionally
followed by one or more of the flags bdfiMnr. A
starting position specified by +m.n is interpreted to
mean the n+1st character in the m+1st field. A missing
.n means .0, indicating the first character of the
m+1st field. If the b flag is in effect n is counted
from the first non-blank in the m+1st field; +m.0b
refers to the first non-blank character in the m+1st
field.

A last position specified by -m.n is interpreted to
mean the nth character (including separators) after
the last character of the mth field. A missing .n
means .0, indicating the last character of the mth
field. If the b flag is in effect n is counted from
the last leading blank in the m+1st field; -m.1b
refers to the first non-blank in the m+1st field.

The fully specified +pos1 -pos2 form with type
modifiers T and U:

+w.xT -y.zU


is equivalent to:

undefined (z==0 & U contains b & -t is present)
-k w+1.x+1T,y.0U (z==0 otherwise)
-k w+1.x+1T,y+1.zU (z > 0)


Implementations support at least nine occurrences of
the sort keys (the -k option and obsolescent +pos1 and
-pos2) which are significant in command line order. If
no sort key is specified, a default sort key of the
entire line is used.


OPERANDS


The following operand is supported:

file
A path name of a file to be sorted, merged or checked. If no file
operands are specified, or if a file operand is -, the standard
input is used.


USAGE


See largefile(7) for the description of the behavior of sort when
encountering files greater than or equal to 2 Gbyte ( 2^31 bytes).

EXAMPLES


In the following examples, first the preferred and then the obsolete way
of specifying sort keys are given as an aid to understanding the
relationship between the two forms.

Example 1: Sorting with the Second Field as a sort Key




Either of the following commands sorts the contents of infile with the
second field as the sort key:


example% sort -k 2,2 infile
example% sort +1 -2 infile


Example 2: Sorting in Reverse Order




Either of the following commands sorts, in reverse order, the contents of
infile1 and infile2, placing the output in outfile and using the second
character of the second field as the sort key (assuming that the first
character of the second field is the field separator):


example% sort -r -o outfile -k 2.2,2.2 infile1 infile2
example% sort -r -o outfile +1.1 -1.2 infile1 infile2


Example 3: Sorting Using a Specified Character in One of the Files




Either of the following commands sorts the contents of infile1 and
infile2 using the second non-blank character of the second field as the
sort key:


example% sort -k 2.2b,2.2b infile1 infile2
example% sort +1.1b -1.2b infile1 infile2


Example 4: Sorting by Numeric User ID




Either of the following commands prints the passwd(5) file (user
database) sorted by the numeric user ID (the third colon-separated
field):


example% sort -t : -k 3,3n /etc/passwd
example% sort -t : +2 -3n /etc/passwd


Example 5: Printing Sorted Lines Excluding Lines that Duplicate a Field




Either of the following commands prints the lines of the already sorted
file infile, suppressing all but one occurrence of lines having the same
third field:


example% sort -um -k 3.1,3.0 infile
example% sort -um +2.0 -3.0 infile


Example 6: Sorting by Host IP Address




Either of the following commands prints the hosts(5) file (IPv4 hosts
database), sorted by the numeric IP address (the first four numeric
fields):


example$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n /etc/hosts
example$ sort -t . +0 -1n +1 -2n +2 -3n +3 -4n /etc/hosts


Since '.' is both the field delimiter and, in many locales, the decimal
separator, failure to specify both ends of the field leads to results
where the second field is interpreted as a fractional portion of the
first, and so forth.


ENVIRONMENT VARIABLES


See environ(7) for descriptions of the following environment variables
that affect the execution of sort: LANG, LC_ALL, LC_COLLATE, LC_MESSAGES,
and NLSPATH.

LC_CTYPE
Determine the locale for the interpretation of sequences of
bytes of text data as characters (for example, single-
versus multi-byte characters in arguments and input files)
and the behavior of character classification for the -b,
-d, -f, -i and -n options.


LC_NUMERIC
Determine the locale for the definition of the radix
character and thousands separator for the -n option.


EXIT STATUS


The following exit values are returned:

0
All input files were output successfully, or -c was specified and
the input file was correctly sorted.


1
Under the -c option, the file was not ordered as specified, or if
the -c and -u options were both specified, two input lines were
found with equal keys.


>1
An error occurred.


FILES


/var/tmp/stm???
Temporary files


ATTRIBUTES


See attributes(7) for descriptions of the following attributes:

/usr/bin/sort


+---------------+-----------------+
|ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+---------------+-----------------+
|CSI | Enabled |
+---------------+-----------------+

/usr/xpg4/bin/sort


+--------------------+-----------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+--------------------+-----------------+
|CSI | Enabled |
+--------------------+-----------------+
|Interface Stability | Standard |
+--------------------+-----------------+

SEE ALSO


comm(1), join(1), uniq(1), nl_langinfo(3C), strftime(3C), hosts(5),
passwd(5), attributes(7), environ(7), largefile(7), standards(7)

DIAGNOSTICS


Comments and exits with non-zero status for various trouble conditions
(for example, when input lines are too long), and for disorders
discovered under the -c option.

NOTES


When the last line of an input file is missing a new-line character, sort
appends one, prints a warning message, and continues.


sort does not guarantee preservation of relative line ordering on equal
keys.


One can tune sort performance for a specific scenario using the -S
option. However, one should note in particular that sort has greater
knowledge of how to use a finite amount of memory for sorting than the
virtual memory system. Thus, a sort invoked to request an extremely large
amount of memory via the -S option could perform extremely poorly.


As noted, certain of the field modifiers (such as -M and -d) cause the
interpretation of input data to be done with reference to locale-specific
settings. The results of this interpretation can be unexpected if one's
expectations are not aligned with the conventions established by the
locale. In the case of the month keys, sort does not attempt to
compensate for approximate month abbreviations. The precise month
abbreviations from nl_langinfo(3C) or strftime(3C) are the only ones
recognized. For printable or dictionary order, if these concepts are not
well-defined by the locale, an empty sort key might be the result,
leading to the next key being the significant one for determining the
appropriate ordering.

November 19, 2001 SORT(1)