UHTML(1)                                                  UHTML(1)

NAME
     uhtml - convert foreign character set HTML file to unicode

SYNOPSIS
     uhtml [ -p ] [ -c charset ] [ file ]

DESCRIPTION
     HTML comes in various character set encodings and has special
     forms to encode characters. To make it easier to process HTML,
     uhtml normalizes it to a Unicode-only form.

     Uhtml detects the character set of the HTML input file and
     calls tcs(1) to convert it to UTF, replacing HTML entity forms
     by their Unicode character representations, except for lt, gt,
     amp, quot and apos. The converted HTML is written to standard
     output. If no file is given, input is read from standard input.

     If the -p option is given, the detected character set is
     printed and the program exits without conversion.

     If character set detection fails, the default (utf) is assumed.
     This default can be changed with the -c option.

SOURCE
     /sys/src/cmd/uhtml.c

SEE ALSO
     tcs(1)
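
EXAMPLE
     The entity replacement described above can be illustrated with
     a minimal Python sketch. It is not the code in
     /sys/src/cmd/uhtml.c, only an approximation of the documented
     behaviour: entity forms become Unicode characters, while lt, gt,
     amp, quot and apos are left encoded. The function name
     replace_entities is hypothetical, the entity table comes from
     Python's html.entities module rather than from uhtml, and
     leaving numeric forms of the five reserved characters (such as
     &#60;) untouched is an assumption of this sketch.

```python
import re
from html.entities import name2codepoint

# Named entities that stay encoded, as listed in the manual page.
KEEP = {"lt", "gt", "amp", "quot", "apos"}
# Assumption of this sketch: numeric forms of the same characters
# (for example &#60;) are also left alone so the markup stays valid.
KEEP_CODEPOINTS = {ord(c) for c in "<>&\"'"}

# Matches &name;, &#123; and &#x7B; style references.
ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def replace_entities(text):
    """Replace HTML entity forms by Unicode characters (hypothetical helper)."""

    def sub(match):
        body = match.group(1)
        if body.startswith("#"):
            digits = body[1:]
            cp = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            # Leave reserved characters and invalid code points as written.
            if (cp in KEEP_CODEPOINTS or cp == 0 or cp > 0x10FFFF
                    or 0xD800 <= cp <= 0xDFFF):
                return match.group(0)
            return chr(cp)
        # Leave reserved and unknown named entities as written.
        if body in KEEP or body not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[body])

    return ENTITY.sub(sub, text)
```

     For instance, replace_entities("caf&eacute; &lt;b&gt;") turns
     the &eacute; form into the character é and leaves &lt; and &gt;
     as they are, so the result can still be parsed as HTML.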