Tcl_GetEncoding(3)          Tcl Library Procedures          Tcl_GetEncoding(3)



NAME
       Tcl_GetEncoding, Tcl_FreeEncoding, Tcl_ExternalToUtfDString,
       Tcl_ExternalToUtf, Tcl_UtfToExternalDString, Tcl_UtfToExternal,
       Tcl_WinTCharToUtf, Tcl_WinUtfToTChar, Tcl_GetEncodingName,
       Tcl_SetSystemEncoding, Tcl_GetEncodingNames, Tcl_CreateEncoding,
       Tcl_GetDefaultEncodingDir, Tcl_SetDefaultEncodingDir - procedures for
       creating and using encodings.

SYNOPSIS
       #include <tcl.h>

       Tcl_Encoding
       Tcl_GetEncoding(interp, name)

       void
       Tcl_FreeEncoding(encoding)

       char *
       Tcl_ExternalToUtfDString(encoding, src, srcLen, dstPtr)

       int
       Tcl_ExternalToUtf(interp, encoding, src, srcLen, flags, statePtr, dst,
            dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr)

       char *
       Tcl_UtfToExternalDString(encoding, src, srcLen, dstPtr)

       int
       Tcl_UtfToExternal(interp, encoding, src, srcLen, flags, statePtr, dst,
            dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr)

       char *
       Tcl_WinTCharToUtf(tsrc, srcLen, dstPtr)

       TCHAR *
       Tcl_WinUtfToTChar(src, srcLen, dstPtr)

       CONST char *
       Tcl_GetEncodingName(encoding)

       int
       Tcl_SetSystemEncoding(interp, name)

       void
       Tcl_GetEncodingNames(interp)

       Tcl_Encoding
       Tcl_CreateEncoding(typePtr)

       CONST char *
       Tcl_GetDefaultEncodingDir(void)

       void
       Tcl_SetDefaultEncodingDir(path)



ARGUMENTS
       Tcl_Interp *interp (in)
              Interpreter to use for error reporting, or NULL if no error
              reporting is desired.

       CONST char *name (in)
              Name of encoding to load.

       Tcl_Encoding encoding (in)
              The encoding to query, free, or use for converting text.  If
              encoding is NULL, the current system encoding is used.

       CONST char *src (in)
              For the Tcl_ExternalToUtf functions, an array of bytes in the
              specified encoding that are to be converted to UTF-8.  For the
              Tcl_UtfToExternal and Tcl_WinUtfToTChar functions, an array of
              UTF-8 characters to be converted to the specified encoding.

       CONST TCHAR *tsrc (in)
              An array of Windows TCHAR characters to convert to UTF-8.

       int srcLen (in)
              Length of src or tsrc in bytes.  If the length is negative, the
              encoding-specific length of the string is used.

       Tcl_DString *dstPtr (out)
              Pointer to an uninitialized or free Tcl_DString in which the
              converted result will be stored.

       int flags (in)
              Various flag bits OR-ed together.  TCL_ENCODING_START signifies
              that the source buffer is the first block in a (potentially
              multi-block) input stream, telling the conversion routine to
              reset to an initial state and perform any initialization that
              needs to occur before the first byte is converted.
              TCL_ENCODING_END signifies that the source buffer is the last
              block in a (potentially multi-block) input stream, telling the
              conversion routine to perform any finalization that needs to
              occur after the last byte is converted and then to reset to an
              initial state.  TCL_ENCODING_STOPONERROR signifies that the
              conversion routine should return immediately upon reading a
              source character that doesn't exist in the target encoding;
              otherwise a default fallback character will automatically be
              substituted.

       Tcl_EncodingState *statePtr (in/out)
              Used when converting a (generally long or indefinite length)
              byte stream in a piece by piece fashion.  The conversion
              routine stores its current state in *statePtr after src (the
              buffer containing the current piece) has been converted; that
              state information must be passed back when converting the next
              piece of the stream so the conversion routine knows what state
              it was in when it left off at the end of the last piece.  May
              be NULL, in which case the value specified for flags is ignored
              and the source buffer is assumed to contain the complete string
              to convert.

       char *dst (out)
              Buffer in which the converted result will be stored.  No more
              than dstLen bytes will be stored in dst.

       int dstLen (in)
              The maximum length of the output buffer dst in bytes.

       int *srcReadPtr (out)
              Filled with the number of bytes from src that were actually
              converted.  This may be less than the original source length if
              there was a problem converting some source characters.  May be
              NULL.

       int *dstWrotePtr (out)
              Filled with the number of bytes that were actually stored in
              the output buffer as a result of the conversion.  May be NULL.

       int *dstCharsPtr (out)
              Filled with the number of characters that correspond to the
              number of bytes stored in the output buffer.  May be NULL.

       Tcl_EncodingType *typePtr (in)
              Structure that defines a new type of encoding.

       CONST char *path (in)
              A path to the location of the encoding file.

INTRODUCTION
       These routines convert between Tcl's internal character representation,
       UTF-8, and character representations used by various operating systems
       or file systems, such as Unicode, ASCII, or Shift-JIS.  When operating
       on strings, such as obtaining the names of files or displaying
       characters using international fonts, the strings must be translated
       into one or possibly multiple formats that the various system calls can
       use.  For instance, on a Japanese Unix workstation, a user might obtain
       a filename represented in the EUC-JP file encoding and then translate
       the characters to the jisx0208 font encoding in order to display the
       filename in a Tk widget.  The purpose of the encoding package is to
       help bridge the translation gap.  UTF-8 provides an intermediate
       staging ground for all the various encodings.  In the example above,
       text would be translated into UTF-8 from whatever file encoding the
       operating system is using.  Then it would be translated from UTF-8 into
       whatever font encoding the display routines require.

       Some basic encodings are compiled into Tcl.  Others can be defined by
       the user or dynamically loaded from encoding files in a
       platform-independent manner.

DESCRIPTION
       Tcl_GetEncoding finds an encoding given its name.  The name may refer
       to a builtin Tcl encoding, a user-defined encoding registered by
       calling Tcl_CreateEncoding, or a dynamically-loadable encoding file.
       The return value is a token that represents the encoding and can be
       used in subsequent calls to procedures such as Tcl_GetEncodingName,
       Tcl_FreeEncoding, and Tcl_UtfToExternal.  If the name did not refer to
       any known or loadable encoding, NULL is returned and an error message
       is left in interp.

       The encoding package maintains a database of all encodings currently in
       use.  The first time name is seen, Tcl_GetEncoding returns an encoding
       with a reference count of 1.  If the same name is requested again, the
       reference count for that encoding is incremented without the overhead
       of allocating a new encoding and all its associated data structures.

       When an encoding is no longer needed, Tcl_FreeEncoding should be called
       to release it.  When an encoding is no longer in use anywhere (i.e., it
       has been freed as many times as it has been gotten), Tcl_FreeEncoding
       will release all storage the encoding was using and delete it from the
       database.
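
       The get/free pairing described above can be pictured with a small
       reference-counted registry.  This is only a sketch of the idea, not
       Tcl's implementation: ToyEncoding, toyDatabase, toy_get_encoding and
       toy_free_encoding are hypothetical names invented for illustration,
       and names longer than 31 bytes are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an encoding record; not Tcl's real structure. */
typedef struct ToyEncoding {
    char name[32];
    int refCount;
    struct ToyEncoding *next;
} ToyEncoding;

ToyEncoding *toyDatabase = NULL;   /* head of the list of live encodings */

/* Return the named encoding.  The first request creates it with a
 * reference count of 1; later requests only increment the count. */
ToyEncoding *toy_get_encoding(const char *name)
{
    ToyEncoding *e;

    for (e = toyDatabase; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            e->refCount++;
            return e;
        }
    }
    e = calloc(1, sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    strncpy(e->name, name, sizeof e->name - 1);  /* calloc left the NUL */
    e->refCount = 1;
    e->next = toyDatabase;
    toyDatabase = e;
    return e;
}

/* Drop one reference; unlink and free the record when none remain. */
void toy_free_encoding(ToyEncoding *enc)
{
    ToyEncoding **link;

    if (enc == NULL || --enc->refCount > 0) {
        return;
    }
    for (link = &toyDatabase; *link != NULL; link = &(*link)->next) {
        if (*link == enc) {
            *link = enc->next;
            break;
        }
    }
    free(enc);
}
```

       Two calls to toy_get_encoding with the same name yield the same
       pointer, and the record disappears only after the second matching
       toy_free_encoding.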

       Tcl_ExternalToUtfDString converts a source buffer src from the
       specified encoding into UTF-8.  The converted bytes are stored in
       dstPtr, which is then null-terminated.  The caller should eventually
       call Tcl_DStringFree to free any information stored in dstPtr.  When
       converting, if any of the characters in the source buffer cannot be
       represented in the target encoding, a default fallback character will
       be used.  The return value is a pointer to the value stored in the
       DString.

       Tcl_ExternalToUtf converts a source buffer src from the specified
       encoding into UTF-8.  Up to srcLen bytes are converted from the source
       buffer and up to dstLen converted bytes are stored in dst.  In all
       cases, *srcReadPtr is filled with the number of bytes that were
       successfully converted from src and *dstWrotePtr is filled with the
       corresponding number of bytes that were stored in dst.  The return
       value is one of the following:

              TCL_OK                       All bytes of src were converted.

              TCL_CONVERT_NOSPACE          The destination buffer was not
                                           large enough for all of the
                                           converted data; as many characters
                                           as could fit were converted though.

              TCL_CONVERT_MULTIBYTE        The last few bytes in the source
                                           buffer were the beginning of a
                                           multibyte sequence, but more bytes
                                           were needed to complete this
                                           sequence.  A subsequent call to the
                                           conversion routine should pass a
                                           buffer containing the unconverted
                                           bytes that remained in src plus
                                           some further bytes from the source
                                           stream to properly convert the
                                           formerly split-up multibyte
                                           sequence.

              TCL_CONVERT_SYNTAX           The source buffer contained an
                                           invalid character sequence.  This
                                           may occur if the input stream has
                                           been damaged or if the input
                                           encoding method was misidentified.

              TCL_CONVERT_UNKNOWN          The source buffer contained a
                                           character that could not be
                                           represented in the target encoding
                                           and TCL_ENCODING_STOPONERROR was
                                           specified.
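
       The progress-reporting contract above (bytes consumed, bytes
       produced, characters produced, plus a status code) can be illustrated
       with a toy converter.  This is a sketch under stated assumptions, not
       Tcl code: toy_latin1_to_utf8 and the TOY_ constants are hypothetical
       stand-ins, the constants do not carry Tcl's real values, and only the
       "out of space" case is modelled.

```c
#include <stddef.h>

/* Stand-in status codes for illustration; not the values from tcl.h. */
enum { TOY_OK = 0, TOY_CONVERT_NOSPACE = 1 };

/* Convert ISO 8859-1 bytes to UTF-8, stopping before any character that
 * would not fit completely in dst.  Progress is reported through the
 * three output pointers whether or not the whole source was converted. */
int toy_latin1_to_utf8(const char *src, int srcLen, char *dst, int dstLen,
                       int *srcReadPtr, int *dstWrotePtr, int *dstCharsPtr)
{
    int in = 0, out = 0, result = TOY_OK;

    while (in < srcLen) {
        unsigned char c = (unsigned char) src[in];
        int need = (c < 0x80) ? 1 : 2;   /* UTF-8 bytes for this character */

        if (out + need > dstLen) {
            result = TOY_CONVERT_NOSPACE;
            break;
        }
        if (need == 1) {
            dst[out++] = (char) c;
        } else {
            dst[out++] = (char) (0xC0 | (c >> 6));
            dst[out++] = (char) (0x80 | (c & 0x3F));
        }
        in++;
    }
    *srcReadPtr = in;
    *dstWrotePtr = out;
    *dstCharsPtr = in;   /* one output character per input byte here */
    return result;
}
```

       A caller that receives the "no space" status would flush dst, advance
       src by *srcReadPtr, and call again with the remaining bytes.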

       Tcl_UtfToExternalDString converts a source buffer src from UTF-8 into
       the specified encoding.  The converted bytes are stored in dstPtr,
       which is then terminated with the appropriate encoding-specific null.
       The caller should eventually call Tcl_DStringFree to free any
       information stored in dstPtr.  When converting, if any of the
       characters in the source buffer cannot be represented in the target
       encoding, a default fallback character will be used.  The return value
       is a pointer to the value stored in the DString.

       Tcl_UtfToExternal converts a source buffer src from UTF-8 into the
       specified encoding.  Up to srcLen bytes are converted from the source
       buffer and up to dstLen converted bytes are stored in dst.  In all
       cases, *srcReadPtr is filled with the number of bytes that were
       successfully converted from src and *dstWrotePtr is filled with the
       corresponding number of bytes that were stored in dst.  The return
       values are the same as the return values for Tcl_ExternalToUtf.

       Tcl_WinUtfToTChar and Tcl_WinTCharToUtf are Windows-only convenience
       functions for converting between UTF-8 and Windows strings.  On Windows
       95 (as with the Macintosh and Unix operating systems), all strings
       exchanged between Tcl and the operating system are "char" based.  On
       Windows NT, some strings exchanged between Tcl and the operating system
       are "char" oriented while others are in Unicode.  By convention, in
       Windows a TCHAR is a character in the ANSI code page on Windows 95 and
       a Unicode character on Windows NT.

       If you planned to use the same "char" based interfaces on both Windows
       95 and Windows NT, you could use Tcl_UtfToExternal and
       Tcl_ExternalToUtf (or their Tcl_DString equivalents) with an encoding
       of NULL (the current system encoding).  On the other hand, if you
       planned to use the Unicode interface when running on Windows NT and the
       "char" interfaces when running on Windows 95, you would have to perform
       the following type of test over and over in your program (as
       represented in pseudo-code):

              if (running NT) {
                  encoding <- Tcl_GetEncoding("unicode");
                  nativeBuffer <- Tcl_UtfToExternal(encoding, utfBuffer);
                  Tcl_FreeEncoding(encoding);
              } else {
                  nativeBuffer <- Tcl_UtfToExternal(NULL, utfBuffer);
              }

       Tcl_WinUtfToTChar and Tcl_WinTCharToUtf automatically handle this test
       and use the proper encoding based on the current operating system.
       Tcl_WinUtfToTChar returns a pointer to a TCHAR string, and
       Tcl_WinTCharToUtf expects a TCHAR string pointer as the src string.
       Otherwise, these functions behave identically to
       Tcl_UtfToExternalDString and Tcl_ExternalToUtfDString.

       Tcl_GetEncodingName is roughly the inverse of Tcl_GetEncoding.  Given
       an encoding, the return value is the name argument that was used to
       create the encoding.  The string returned by Tcl_GetEncodingName is
       only guaranteed to persist until the encoding is deleted.  The caller
       must not modify this string.

       Tcl_SetSystemEncoding sets the default encoding that should be used
       whenever the user passes a NULL value for the encoding argument to any
       of the other encoding functions.  If name is NULL, the system encoding
       is reset to the default system encoding, binary.  If the name did not
       refer to any known or loadable encoding, TCL_ERROR is returned and an
       error message is left in interp.  Otherwise, this procedure increments
       the reference count of the new system encoding, decrements the
       reference count of the old system encoding, and returns TCL_OK.

       Tcl_GetEncodingNames sets the interp result to a list consisting of the
       names of all the encodings that are currently defined or can be
       dynamically loaded, searching the encoding path specified by
       Tcl_SetDefaultEncodingDir.  This procedure does not ensure that the
       dynamically-loadable encoding files contain valid data, but merely that
       they exist.

       Tcl_CreateEncoding defines a new encoding and registers the C
       procedures that are called back to convert between the encoding and
       UTF-8.  Encodings created by Tcl_CreateEncoding are thereafter visible
       in the database used by Tcl_GetEncoding.  Just as with the
       Tcl_GetEncoding procedure, the return value is a token that represents
       the encoding and can be used in subsequent calls to other encoding
       functions.  Tcl_CreateEncoding returns an encoding with a reference
       count of 1.  If an encoding with the specified name already exists,
       then its entry in the database is replaced with the new encoding; the
       token for the old encoding will remain valid and continue to behave as
       before, but users of the new token will now call the new encoding
       procedures.

       The typePtr argument to Tcl_CreateEncoding contains information about
       the name of the encoding and the procedures that will be called to
       convert between this encoding and UTF-8.  It is defined as follows:

       typedef struct Tcl_EncodingType {
            CONST char *encodingName;
            Tcl_EncodingConvertProc *toUtfProc;
            Tcl_EncodingConvertProc *fromUtfProc;
            Tcl_EncodingFreeProc *freeProc;
            ClientData clientData;
            int nullSize;
       } Tcl_EncodingType;

       The encodingName provides a string name for the encoding, by which it
       can be referred to in other procedures such as Tcl_GetEncoding.  The
       toUtfProc refers to a callback procedure to invoke to convert text from
       this encoding into UTF-8.  The fromUtfProc refers to a callback
       procedure to invoke to convert text from UTF-8 into this encoding.  The
       freeProc refers to a callback procedure to invoke when this encoding is
       deleted.  The freeProc field may be NULL.  The clientData contains an
       arbitrary one-word value passed to toUtfProc, fromUtfProc, and freeProc
       whenever they are called.  Typically, this is a pointer to a data
       structure containing encoding-specific information that can be used by
       the callback procedures.  For instance, two very similar encodings such
       as ascii and macRoman may use the same callback procedure, but use
       different values of clientData to control its behavior.  The nullSize
       specifies the number of zero bytes that signify end-of-string in this
       encoding.  It must be 1 (for single-byte or multi-byte encodings like
       ASCII or Shift-JIS) or 2 (for double-byte encodings like Unicode).
       Constant-sized encodings with 3 or more bytes per character (such as
       CNS11643) are not accepted.
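
       To see why nullSize matters, consider finding the end of a string
       when srcLen is negative.  The following is a minimal sketch, assuming
       a hypothetical helper toy_encoded_length that is not part of the Tcl
       API; it only shows how the terminator width changes the scan.

```c
#include <stddef.h>

/* Length in bytes of an encoded string, not counting its terminator.
 * nullSize is 1 or 2, the two values the text above allows. */
size_t toy_encoded_length(const char *src, int nullSize)
{
    size_t n = 0;

    if (nullSize == 2) {
        /* Step one 2-byte unit at a time, so that a zero byte that is
         * half of a real character is not taken for the terminator. */
        while (src[n] != 0 || src[n + 1] != 0) {
            n += 2;
        }
    } else {
        while (src[n] != 0) {
            n++;
        }
    }
    return n;
}
```

       With a 2-byte terminator, the byte pair 0x00 0x30 counts as data even
       though it begins with a zero byte; a 1-byte scan would stop there.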

       The callback procedures toUtfProc and fromUtfProc should match the type
       Tcl_EncodingConvertProc:

       typedef int Tcl_EncodingConvertProc(
            ClientData clientData,
            CONST char *src,
            int srcLen,
            int flags,
            Tcl_EncodingState *statePtr,
            char *dst,
            int dstLen,
            int *srcReadPtr,
            int *dstWrotePtr,
            int *dstCharsPtr);

       The toUtfProc and fromUtfProc procedures are called by the
       Tcl_ExternalToUtf or Tcl_UtfToExternal family of functions to perform
       the actual conversion.  The clientData parameter to these procedures is
       the same as the clientData field specified to Tcl_CreateEncoding when
       the encoding was created.  The remaining arguments to the callback
       procedures are the same as the arguments, documented at the top, to
       Tcl_ExternalToUtf or Tcl_UtfToExternal, with the following exceptions.
       If the srcLen argument to one of those high-level functions is
       negative, the value passed to the callback procedure will be the
       appropriate encoding-specific string length of src.  If any of the
       srcReadPtr, dstWrotePtr, or dstCharsPtr arguments to one of the
       high-level functions is NULL, the corresponding value passed to the
       callback procedure will be a non-NULL location.

       The callback procedure freeProc, if non-NULL, should match the type
       Tcl_EncodingFreeProc:

       typedef void Tcl_EncodingFreeProc(
            ClientData clientData);

       This freeProc function is called when the encoding is deleted.  The
       clientData parameter is the same as the clientData field specified to
       Tcl_CreateEncoding when the encoding was created.


       Tcl_GetDefaultEncodingDir and Tcl_SetDefaultEncodingDir access and set
       the directory to use when locating the default encoding files.  If this
       value is not NULL, the TclpInitLibraryPath routine places the path at
       the head of the search path, so that it is the first place searched
       when trying to locate the encoding file.


ENCODING FILES
       Space would prohibit precompiling into Tcl every possible encoding
       algorithm, so many encodings are stored on disk as dynamically-loadable
       encoding files.  This behavior also allows the user to create
       additional encoding files that can be loaded using the same mechanism.
       These encoding files contain information about the tables and/or escape
       sequences used to map between an external encoding and Unicode.  The
       external encoding may consist of single-byte, multi-byte, or
       double-byte characters.

       Each dynamically-loadable encoding is represented as a text file.  The
       initial line of the file, beginning with a ``#'' symbol, is a comment
       that provides a human-readable description of the file.  The next line
       identifies the type of encoding file.  It can be one of the following
       letters:

       [1]   S
              A single-byte encoding, where one character is always one byte
              long in the encoding.  An example is iso8859-1, used by many
              European languages.

       [2]   D
              A double-byte encoding, where one character is always two bytes
              long in the encoding.  An example is big5, used for Chinese
              text.

       [3]   M
              A multi-byte encoding, where one character may be either one or
              two bytes long.  Certain bytes are lead bytes, indicating that
              another byte must follow and that together the two bytes
              represent one character.  Other bytes are not lead bytes and
              represent themselves.  An example is shiftjis, used by many
              Japanese computers.

       [4]   E
              An escape-sequence encoding, specifying that certain sequences
              of bytes do not represent characters, but commands that describe
              how following bytes should be interpreted.

       The rest of the lines in the file depend on the type.

       Cases [1], [2], and [3] are collectively referred to as table-based
       encoding files.  The lines in a table-based encoding file are in the
       same format as this example taken from the shiftjis encoding (this is
       not the complete file):

       # Encoding file: shiftjis, multi-byte
       M
       003F 0 40
       00
       0000000100020003000400050006000700080009000A000B000C000D000E000F
       0010001100120013001400150016001700180019001A001B001C001D001E001F
       0020002100220023002400250026002700280029002A002B002C002D002E002F
       0030003100320033003400350036003700380039003A003B003C003D003E003F
       0040004100420043004400450046004700480049004A004B004C004D004E004F
       0050005100520053005400550056005700580059005A005B005C005D005E005F
       0060006100620063006400650066006700680069006A006B006C006D006E006F
       0070007100720073007400750076007700780079007A007B007C007D203E007F
       0080000000000000000000000000000000000000000000000000000000000000
       0000000000000000000000000000000000000000000000000000000000000000
       0000FF61FF62FF63FF64FF65FF66FF67FF68FF69FF6AFF6BFF6CFF6DFF6EFF6F
       FF70FF71FF72FF73FF74FF75FF76FF77FF78FF79FF7AFF7BFF7CFF7DFF7EFF7F
       FF80FF81FF82FF83FF84FF85FF86FF87FF88FF89FF8AFF8BFF8CFF8DFF8EFF8F
       FF90FF91FF92FF93FF94FF95FF96FF97FF98FF99FF9AFF9BFF9CFF9DFF9EFF9F
       0000000000000000000000000000000000000000000000000000000000000000
       0000000000000000000000000000000000000000000000000000000000000000
       81
       0000000000000000000000000000000000000000000000000000000000000000
       0000000000000000000000000000000000000000000000000000000000000000
       0000000000000000000000000000000000000000000000000000000000000000
       0000000000000000000000000000000000000000000000000000000000000000
       300030013002FF0CFF0E30FBFF1AFF1BFF1FFF01309B309C00B4FF4000A8FF3E
       FFE3FF3F30FD30FE309D309E30034EDD30053006300730FC20152010FF0F005C
       301C2016FF5C2026202520182019201C201DFF08FF0930143015FF3BFF3DFF5B
       FF5D30083009300A300B300C300D300E300F30103011FF0B221200B100D70000
       00F7FF1D2260FF1CFF1E22662267221E22342642264000B0203220332103FFE5
       FF0400A200A3FF05FF03FF06FF0AFF2000A72606260525CB25CF25CE25C725C6
       25A125A025B325B225BD25BC203B301221922190219121933013000000000000
       000000000000000000000000000000002208220B2286228722822283222A2229
       000000000000000000000000000000002227222800AC21D221D4220022030000
       0000000000000000000000000000000000000000222022A52312220222072261
       2252226A226B221A223D221D2235222B222C0000000000000000000000000000
       212B2030266F266D266A2020202100B6000000000000000025EF000000000000

       The third line of the file consists of three numbers.  The first
       number is the fallback character (in base 16) to use when converting
       from UTF-8 to this encoding.  The second number is a 1 if this file
       represents the encoding for a symbol font, or 0 otherwise.  The last
       number (in base 10) is how many pages of data follow.
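
       As an illustration of the mixed bases on that line, the sketch below
       reads the three fields with sscanf.  ToyTableHeader and
       toy_parse_header are hypothetical names used only for this example;
       Tcl's own loader is not shown in this manual and may work
       differently.

```c
#include <stdio.h>

typedef struct ToyTableHeader {
    unsigned int fallback;   /* fallback character, written in base 16 */
    int symbol;              /* 1 for a symbol-font encoding, else 0 */
    int numPages;            /* number of pages that follow, in base 10 */
} ToyTableHeader;

/* Returns 1 on success, 0 if the line does not hold three numbers. */
int toy_parse_header(const char *line, ToyTableHeader *hdr)
{
    return sscanf(line, "%x %d %d",
                  &hdr->fallback, &hdr->symbol, &hdr->numPages) == 3;
}
```

       Applied to the line "003F 0 40" from the shiftjis example, this
       yields a fallback of 0x3F (the question mark), a symbol flag of 0,
       and 40 pages.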

       Subsequent lines in the example above are pages that describe how to
       map from the encoding into 2-byte Unicode.  The first line in a page
       identifies the page number.  Following it are 256 double-byte numbers,
       arranged as 16 rows of 16 numbers.  Given a character in the encoding,
       the high byte of that character is used to select which page, and the
       low byte of that character is used as an index to select one of the
       double-byte numbers in that page; the value obtained is the
       corresponding Unicode character.  By examination of the example above,
       one can see that the characters 0x7E and 0x8163 in shiftjis map to 203E
       and 2026 in Unicode, respectively.
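
       The lookup just described can be traced by hand or with a few lines
       of code.  The sketch below decodes a single row of a page and splits
       a character code into page, row, and column; all of the toy_ names
       are hypothetical helpers written for this example and are not Tcl
       functions.

```c
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
int toy_hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode one row of a page: 64 hex digits holding 16 four-digit numbers.
 * Returns 1 on success, 0 if the row is short or has a non-hex digit. */
int toy_decode_row(const char *row, unsigned int out[16])
{
    int i, j;

    if (strlen(row) < 64) {
        return 0;
    }
    for (i = 0; i < 16; i++) {
        unsigned int value = 0;
        for (j = 0; j < 4; j++) {
            int digit = toy_hex_value(row[4 * i + j]);
            if (digit < 0) {
                return 0;
            }
            value = value * 16 + (unsigned int) digit;
        }
        out[i] = value;
    }
    return 1;
}

/* The high byte selects the page.  Within the page, the low byte is the
 * index: its high nibble is the row and its low nibble the column. */
unsigned int toy_page_of(unsigned int ch)   { return (ch >> 8) & 0xFF; }
unsigned int toy_row_of(unsigned int ch)    { return (ch >> 4) & 0xF; }
unsigned int toy_column_of(unsigned int ch) { return ch & 0xF; }
```

       For 0x8163 this gives page 0x81, row 6, column 3; the fourth number
       in the seventh row of page 81 in the example is 2026.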

       Following the first page will be all the other pages, each in the same
       format as the first: one number identifying the page followed by 256
       double-byte Unicode characters.  If a character in the encoding maps to
       the Unicode character 0000, it means that the character doesn't
       actually exist.  If all characters on a page would map to 0000, that
       page can be omitted.

       Case [4] is the escape-sequence encoding file.  The lines in this type
       of file are in the same format as this example taken from the
       iso2022-jp encoding:

       # Encoding file: iso2022-jp, escape-driven
       E
       init           {}
       final          {}
       iso8859-1      \x1b(B
       jis0201        \x1b(J
       jis0208        \x1b$@
       jis0208        \x1b$B
       jis0212        \x1b$(D
       gb2312         \x1b$A
       ksc5601        \x1b$(C
       In the file, the first column represents an option and the second
       column is the associated value.  init is a string to emit or expect
       before the first character is converted, while final is a string to
       emit or expect after the last character.  All other options are names
       of table-based encodings; the associated value is the escape-sequence
       that marks that encoding.  Tcl syntax is used for the values; in the
       above example, for instance, ``{}'' represents the empty string and
       ``\x1b'' represents character 27.
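
       One way to picture how such a table is consulted while decoding is a
       prefix match against the current input position.  This is a sketch
       only: ToyEscape, toyIso2022jp and toy_match_escape are hypothetical
       names, the table is copied by hand from the example above, and how
       Tcl itself performs the match is not described in this manual.

```c
#include <stddef.h>
#include <string.h>

typedef struct ToyEscape {
    const char *name;      /* table-based encoding selected by the escape */
    const char *sequence;  /* the bytes that select it */
} ToyEscape;

/* Entries copied from the iso2022-jp example; "\x1b" is character 27. */
const ToyEscape toyIso2022jp[] = {
    { "iso8859-1", "\x1b(B" },
    { "jis0201",   "\x1b(J" },
    { "jis0208",   "\x1b$@" },
    { "jis0208",   "\x1b$B" },
    { "jis0212",   "\x1b$(D" },
    { "gb2312",    "\x1b$A" },
    { "ksc5601",   "\x1b$(C" },
};

/* If src begins with a known escape sequence, return the name of the
 * encoding it selects and store the sequence length in *lenPtr.
 * Returns NULL (leaving *lenPtr untouched) when nothing matches. */
const char *toy_match_escape(const char *src, size_t srcLen, size_t *lenPtr)
{
    size_t i;

    for (i = 0; i < sizeof toyIso2022jp / sizeof toyIso2022jp[0]; i++) {
        size_t n = strlen(toyIso2022jp[i].sequence);
        if (n <= srcLen && memcmp(src, toyIso2022jp[i].sequence, n) == 0) {
            *lenPtr = n;
            return toyIso2022jp[i].name;
        }
    }
    return NULL;
}
```

       Note that jis0208 appears twice in the example, so two different
       escape sequences select the same table.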

       When Tcl_GetEncoding encounters an encoding name that has not been
       loaded, it attempts to load an encoding file called name.enc from the
       encoding subdirectory of each directory specified in the library path
       $tcl_libPath.  If the encoding file exists, but is malformed, an error
       message will be left in interp.
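
       The file name probed in each library directory follows directly from
       the rule above.  The sketch below builds one candidate path; the
       helper toy_encoding_path is hypothetical, the "/" separator assumes a
       Unix-style path, and any directory passed in is illustrative rather
       than a real default.

```c
#include <stdio.h>

/* Build "<dir>/encoding/<name>.enc" in buf.  Returns 1 if the whole path
 * fit, 0 if buf was too small (buf then holds a truncated path). */
int toy_encoding_path(char *buf, size_t bufLen,
                      const char *dir, const char *name)
{
    int n = snprintf(buf, bufLen, "%s/encoding/%s.enc", dir, name);

    return n >= 0 && (size_t) n < bufLen;
}
```

       A loader would try this path for each directory in the library path
       in turn and stop at the first file that exists.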

KEYWORDS
       utf, encoding, convert






Tcl                                   8.1                   Tcl_GetEncoding(3)
