[Repost] Question about detecting the correct code page
Hello,
My application handles text in UTF-8 encoding; we use Japanese, Chinese, German, Cyrillic and other Unicode text.
My question is: is it possible to detect, from text encoded in UTF-8, which code page is the correct one for converting it to a multibyte encoding?
When converting from UTF-8 to multibyte, I need the correct code page so that the conversion completes correctly. If I use the wrong code page, DWGdirect converts all these characters to \U+XXXX notation, which is not good for me; I need the correct ANSI representation!
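The behavior described above can be illustrated outside DWGdirect. Below is a minimal Python sketch, assuming Python's built-in Windows code page codecs (`cp1252`, `cp932`, ...) as a stand-in for the real conversion; `to_mbcs` is a hypothetical helper, not a DWGdirect function, that mimics the observed result by writing every character the target code page lacks as a \U+XXXX escape.

```python
def to_mbcs(text: str, codepage: str) -> bytes:
    """Encode text in one Windows code page, escaping what does not fit.

    Hypothetical helper: characters missing from `codepage` are written as
    ASCII \\U+XXXX sequences instead of raising an error.
    """
    out = bytearray()
    for ch in text:
        try:
            # Windows ANSI code pages are stateless, so encoding one
            # character at a time yields the same bytes as the whole string.
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # Fallback: the code point as (at least) 4 uppercase hex digits.
            out += f"\\U+{ord(ch):04X}".encode("ascii")
    return bytes(out)


print(to_mbcs("Größe 日本", "cp1252"))  # German code page: the kanji are escaped
print(to_mbcs("Größe 日本", "cp932"))   # Japanese code page: whatever cp932 lacks is escaped
```

With the right code page nothing is escaped; with the wrong one, every foreign character turns into an escape sequence.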
If somebody has some advice for this problem, please reply to this post!
Regards,
Ervin
The latest AutoCAD versions, starting with AutoCAD 2007, store text as UTF-8 in DXF and as UTF-16 in DWG.
The DWGdirect API works with Unicode (UTF-16) strings.
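Converting between the two Unicode encodings mentioned here never needs a code page; only the later step to MBCS does. A short Python sketch using standard codecs only (`utf8_to_utf16le` is a hypothetical name, and little-endian byte order is assumed for illustration):

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (DXF-style) as UTF-16 little-endian code units."""
    return data.decode("utf-8").encode("utf-16-le")


utf8 = "Größe 日本".encode("utf-8")
utf16 = utf8_to_utf16le(utf8)

# The round trip is lossless: no code page is involved at any step.
assert utf16.decode("utf-16-le").encode("utf-8") == utf8
print(len(utf8), len(utf16))  # 14 16
```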
Do you need conversion to MBCS for exporting to a pre-2007 format? If so, then:
if you have German text, you need to use the German code page; if Chinese, the Chinese code page.
If you have mixed text, it is impossible to represent characters from different code pages without \U+XXXX sequences.
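The one-code-page-per-language rule can be written as a lookup table. A minimal Python sketch: the table holds commonly used Windows ANSI code pages but is an illustrative assumption, not a DWGdirect structure, and the hypothetical `encode_for_language` fails loudly on mixed text instead of escaping it:

```python
# Commonly used Windows ANSI code pages, keyed by language (illustrative).
LANGUAGE_CODEPAGES = {
    "german": "cp1252",              # Western European
    "russian": "cp1251",             # Cyrillic
    "japanese": "cp932",
    "chinese_simplified": "cp936",
    "chinese_traditional": "cp950",
    "korean": "cp949",
}


def encode_for_language(text: str, language: str) -> bytes:
    """Encode text with the code page of the given language.

    Raises UnicodeEncodeError if some character is not in that code page,
    e.g. for mixed text such as German plus Chinese.
    """
    return text.encode(LANGUAGE_CODEPAGES[language], errors="strict")


print(encode_for_language("Größe", "german"))
try:
    encode_for_language("Größe 大小", "german")
except UnicodeEncodeError as err:
    print("not in cp1252:", err.object[err.start:err.end])
```

For mixed text the strict encode raises, which is exactly the case where only \U+XXXX sequences remain.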
Sergey Slezkin
Hello,
I know everything you've told me. I need advice, if you can give me any, on determining the code page from UTF-8-encoded text.
Example:
if I have text encoded in UTF-8 and all characters in it are, for example, within the range of code page 932, is it possible to get this information using the DWGdirect APIs?
I understand that if the text mixes characters from different code pages, it is not possible to get reliable information about the correct code page.
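Independently of DWGdirect, one way to answer this is trial encoding: a code page is a candidate if every character of the decoded UTF-8 text encodes into it without error. A minimal Python sketch; `fits` and `candidate_codepages` are hypothetical helpers, and the list of code pages to try is an assumption to adapt:

```python
from typing import Sequence

# Code pages to try, in order (an assumption; extend as needed).
CANDIDATES = ("cp1252", "cp1251", "cp1250", "cp1253", "cp932", "cp936", "cp949", "cp950")


def fits(text: str, codepage: str) -> bool:
    """True if every character of text exists in the given code page."""
    try:
        text.encode(codepage, errors="strict")
        return True
    except UnicodeEncodeError:
        return False


def candidate_codepages(utf8_bytes: bytes, codepages: Sequence[str] = CANDIDATES) -> list:
    """Return every code page that can represent the whole UTF-8 text."""
    text = utf8_bytes.decode("utf-8")
    return [cp for cp in codepages if fits(text, cp)]


print(candidate_codepages("こんにちは".encode("utf-8")))
print(candidate_codepages("Größe Привет".encode("utf-8"), ["cp1252", "cp1251"]))
```

An empty result means the text is mixed in the sense above. More than one result is also possible, since code pages overlap (for example, Japanese kana also exist in the Simplified Chinese code page 936).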
I need tips, if you can give me any!
Thanks for your time,
Ervin
DWGdirect contains no such functionality (detecting the code page for a given Unicode character).
But each code page covers a certain set of Unicode values, roughly grouped into ranges:
0-127 - ASCII (the same characters in every code page)
...
xxxx-yyyy - code page 1
....
uuuu-vvvv - code page 2
....
wwww-zzzz - ansi_932
....
These ranges are documented (I don't remember where).
Also note that it is not only the first 128 characters that are present in multiple code pages. For example, code page 932 (Japanese) contains Cyrillic characters, so Cyrillic text can be converted to both code page 1251 and code page 932.
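Because of such overlaps, a trial-encoding check can match several code pages, so the application has to decide which one wins. A minimal Python sketch that resolves this with an explicit preference order; `pick_codepage` and the order in `PREFERENCE` are hypothetical choices, not anything DWGdirect provides:

```python
from typing import Optional, Sequence

# Preferred code pages, most specific first (an illustrative assumption).
PREFERENCE = ("cp1252", "cp1251", "cp932")


def pick_codepage(text: str, preference: Sequence[str] = PREFERENCE) -> Optional[str]:
    """Return the first code page in `preference` that represents all of text.

    Returns None for text that needs characters from several code pages.
    """
    for cp in preference:
        try:
            text.encode(cp, errors="strict")
        except UnicodeEncodeError:
            continue
        return cp
    return None


print(pick_codepage("Привет"))                       # cp1251 (tried before cp932)
print(pick_codepage("Привет", ["cp932", "cp1251"]))  # cp932: same text, other order
```

Putting the dedicated page (1251 for Cyrillic) ahead of broader East Asian pages keeps Cyrillic text from being exported with code page 932 just because it happens to fit.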
Sergey Slezkin
Last edited by mmuratov; 6th February 2008 at 05:00 AM.
Hi,
Thank you!
Ervin