Multibyte support
From Swishewiki
Multibyte support is a sorely missing feature in Swish-e. Adding it (UTF-8 is preferred) is understood to be a non-trivial task, but perhaps the Swish-e code can gradually be reworked to support UTF-8.
This page documents ideas for a rational approach to adding UTF-8 support to Swish-e. Conceptually, the lowest levels could be rewritten first, with work then proceeding progressively up through the system, removing non-UTF-8 assumptions along the way.
Perhaps a step towards that goal is some documentation explaining the internal architecture of Swish-e: which files do what, and roughly how.
- Q: Is there any such documentation at this point?
- A: Nothing formal. Only some commenting in the source code.
To get an idea of what programming in a UTF-8-aware mode entails, have a look at this short Unicode C Example.
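As a further rough illustration (a minimal sketch, not code taken from Swish-e or from the linked example), the C function below shows the most basic assumption that breaks under UTF-8: one byte is no longer one character. The function name utf8_codepoints is hypothetical, and the sketch assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the code points in a NUL-terminated UTF-8 string.
 *
 * Hypothetical helper for illustration only. It assumes the input is
 * valid UTF-8 and does no validation. In UTF-8, every byte of the form
 * 10xxxxxx is a continuation byte, so counting the bytes that are NOT
 * continuation bytes gives the number of code points.
 */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: plain char may be signed. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café" encoded as UTF-8), strlen() reports 5 bytes while utf8_codepoints() reports 4 code points. Any code that indexes, truncates, or case-folds text one byte at a time carries this single-byte assumption and would need the same kind of review.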
Multibyte Links and Reference Material
"man utf8" output from Fedora Core 1.
- "It can be hoped that in the foreseeable future, UTF-8 will replace ASCII and ISO 8859 at all levels" in POSIX. --Markus Kuhn, 2001
"man charsets" output from Fedora Core 1.
- "Linux is an international operating system."
UTF-8 History by "Rob 'Commander' Pike", 2003
Linux Unicode programming, by Thomas W. Burger, 2001
UTF-8 and Unicode FAQ for Unix/Linux, Markus Kuhn, 1999-2005.
Website for the book The Unicode Standard, Version 4.0, by the Unicode Consortium et al., 2003
The GNU C Library manual's "Introduction to Extended Characters" may prove useful for this discussion.
Perhaps it will also be useful to examine case studies of other software packages that have moved from single-byte assumptions to being UTF-8 aware. Ones that come to mind as having successfully made the transition include MySQL and Perl.