UHTML(1) UHTML(1)
NAME
uhtml - convert foreign character set HTML file to unicode
SYNOPSIS
uhtml [ -p ] [ -c charset ] [ file ]
DESCRIPTION
HTML comes in various character set encodings and has special
forms for encoding characters. To make HTML easier to process,
uhtml normalizes it to a Unicode-only form.
Uhtml detects the character set of the HTML input file and
calls tcs(1) to convert it to UTF, replacing HTML entity forms
with their Unicode character representations, except for lt,
gt, amp, quot and apos. The converted HTML is written to
standard output. If no file is given, input is read from
standard input. If the -p option is given, the detected
character set is printed and the program exits without
converting. If character set detection fails, the default
(utf) is assumed; the -c option changes this default.
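The entity replacement described above can be illustrated with a
short Python sketch. It is not the code in uhtml.c; it only models
the stated rule that entity forms become Unicode characters while
lt, gt, amp, quot and apos stay encoded. The names KEEP, ENTITY and
replace_entities are hypothetical, the entity table is Python's
HTML 4 list, and leaving numeric references to the five protected
characters untouched is an assumption made here, not documented
behavior.

```python
import re
from html.entities import name2codepoint

# Entity names that stay encoded, per the description above.
KEEP = {"lt", "gt", "amp", "quot", "apos"}
# Characters those names stand for (used for numeric references).
KEEP_CHARS = set("<>&\"'")

# Matches &name;, &#123; and &#x7B; forms.
ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def replace_entities(text):
    """Replace entity forms in already-decoded text with characters."""
    def sub(m):
        body = m.group(1)
        if body.startswith("#"):
            # Numeric character reference, decimal or hexadecimal.
            try:
                if body[1] in "xX":
                    ch = chr(int(body[2:], 16))
                else:
                    ch = chr(int(body[1:]))
            except (ValueError, OverflowError):
                return m.group(0)  # out of range: leave as written
            # Assumption: keep references to markup characters encoded.
            return m.group(0) if ch in KEEP_CHARS else ch
        if body in KEEP or body not in name2codepoint:
            return m.group(0)  # protected or unknown name: leave as is
        return chr(name2codepoint[body])
    return ENTITY.sub(sub, text)

if __name__ == "__main__":
    print(replace_entities("caf&eacute; &lt;b&gt; &amp; &#x41;&#66;"))
```

Keeping the five names encoded means the output is still
well-formed markup: a literal < or & produced by the replacement
would otherwise be read as the start of a tag or another entity.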
SOURCE
/sys/src/cmd/uhtml.c
SEE ALSO
tcs(1)
Page 1 Plan 9 (printed 12/16/25)