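Geometric Dimensioning and Tolerancing Forum ------ dedicated to product geometric tolerance standards: GD&T (GDT: ASME) | New GPS (ISO) research / CAD design / CAM machining / CMM measurement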

2009-05-04, 05:53 PM	#1
yang686526 (Senior Member; joined: 06-11; posts: 14579; featured posts: 1; cash: 224494 standard coins; assets: 234494 standard coins)

[Repost] codepage query

Hi,
I found a lot of posts regarding this on the forum, but I'd still like to have the following things clarified:
1. Am I correct in assuming that systemCodePage() is only used for writing $DWGCODEPAGE during export and for text conversion (only if bAllowCPConversion is set to true) during import? Are there any other uses of the value returned by this function?
2. Is there no default implementation of systemCodePage()?

1. systemCodePage() is used to set DWGCODEPAGE in newly created drawings (and save it to the file). It is also used to convert a drawing being loaded if the DWGCODEPAGE in the file is different.
2. systemCodePage() is implemented in ExSystemServices() under #ifdef _WIN32.
3. You can't be sure that text contains no \M+xxxxx. If bAllowCPConversion is false, you get what is saved to the file, and both \U+ and \M+ may be present in the file. If bAllowCPConversion is true and the system code page is ANSI, imagine you are loading a Japanese file (CP 932): the only way to represent Japanese characters in an ANSI code page is \M+xxxx (and Cyrillic by \U+xxxx).
4. If OdCharMapper is not initialized on the Windows platform, Windows conversion is used. It may fail if some language support is not installed, for example when you are loading a Japanese file on a computer without Japanese support installed. adinit.dat has not changed during the last several years.

Sergey Slezkin

Hi Sergey,
Regarding point 3, I fail to understand why the only way to map some Japanese characters to ASCII is via \M+xxxxx. Wouldn't there also be a corresponding \U+xxxx representation? Or is such a representation not unique?
Also, for the text content of OdDbText, I am able to get the code page from the text style. But for other strings passed to my callbacks (file names etc.), how do I get the code page? Is it the code page specified by the DWGCODEPAGE system variable (for no conversion) and my systemCodePage() (for conversions)?
Thanks,
Varun

Theoretically, any \M+xxxxx can be represented as \U+xxxx. The question is who is supposed to perform the conversion. Imagine AutoCAD on an ANSI_1252 computer loading a Japanese file, and the computer has no Japanese support installed. If a drawing in a foreign code page is being loaded and the required support is not installed, AutoCAD gives a warning about missing .nls files and attempts to convert the text using its internal tables. These internal tables store Unicode values for the supported single-byte code pages, and such text comes as \U+xxxx. Multibyte code pages have too many characters, so AutoCAD stores only lists of leading bytes for them (to be able to distinguish single-byte and 2-byte characters). So Asian text comes as \M+xxxxx sequences.
If bAllowCPConversion is false, you get strings in DWGCODEPAGE; if true, in systemCodePage().

Sergey Slezkin

Here is my final understanding:
If I specify bAllowCPConversion == true and, say, I get a sequence \M+xxxxx in one of the strings, then that \M+xxxxx sequence is from the DWGCODEPAGE, while the string itself is encoded in systemCodePage(). And if I specify bAllowCPConversion == false, then both the string and the \M+xxxxx sequence are in DWGCODEPAGE.
Am I correct?

Thanks,
Varun

\M+xxxxx and \U+xxxx are special sequences of ANSI characters which have the same encoding in all code pages. They represent symbols which are absent from the drawing code page.
If bAllowCPConversion is false, you get what was saved to the file (strings and DWGCODEPAGE).
If bAllowCPConversion is true, strings are converted to systemCodePage and DWGCODEPAGE is set to systemCodePage.
You can get \U+ or \M+ from a file of any code page. For example, a Japanese file was loaded by an American AutoCAD and saved back: the new file has the ANSI_1252 code page, and the Japanese symbols in it are represented as \M+1xxxx.
Sergey Slezkin

Correct me if I am wrong here, but to convert a multibyte sequence to Unicode code points you need the code page with respect to which the sequence was written. Say I have a drawing in Japanese. I open it (requesting conversion) with the system code page set to CP_ASCII. I get a sequence \M+xxxxx. Now, to interpret it correctly, I need the encoding of the original format (that is, which Japanese encoding scheme was used). How can I get that if, when conversion is allowed, DWGCODEPAGE is set to systemCodePage() as mentioned in your last post?

Thanks,
Varun

The first digit after + indicates one of the MBCS code pages supported by AutoCAD. Japanese characters (932) look like \M+1xxxx.

From the AutoCAD help:
The Multibyte Interchange Format (MIF) converts Asian-language character strings. The following string represents an Asian character displayed on a system other than the native one:
\M+nxxxx
Multibyte shape number. The n is a digit identifying the originating multibyte code page ID. The xxxx is the hexadecimal value of the multibyte character. The code page identifications that AutoCAD supports are listed in the following table.
1 (932) Japanese (Shift-JIS)
2 (950) Traditional Chinese (Big 5)
3 (949) Wansung (KS C-5601-1987)
4 (1361) Johab (KS C-5601-1992)
5 (936) Simplified Chinese (GB 2312-80)

Sergey Slezkin

Got it. Thanks a lot, Sergey. Just one final small problem remains: the comment above OdDbTextIterator::nextChar() says

Quote:
The returned character will be a Unicode character except when the bInBigFont flag is set in the currProperties() value. In this case, the returned character will be MBCS, corresponding to a \M+nxxxx character in the original string.

How do I get the code page that this multibyte character comes from? Is it the same as the code page I passed to OdDbTextIterator::createObject()? If so, does that mean this will happen only when the code page is one of the five you mentioned before?
On a related note, can't we have multibyte characters from two different code pages (for instance, \M+1aaaa\M+2bbbb) in one text object? If so, how would they be handled?

Thanks,
Varun

OdDbTextIteratorPtr::nextChar() may return a multibyte, Unicode or single-byte character (there are some SHX fonts without Unicode support), depending on the font of the text style (pTextStyle). In other words, it is the font's encoding. OdDbTextIterator was designed for this purpose.
As a workaround: OdGiTextStyle has a method getFont() which returns OdFont*, and OdFont has a method getFlags(). If (getFlags() & kUniFont10) || (getFlags() & kTrueType), you get Unicode. If currProperties().bInBigFont, you get multibyte. Unfortunately, you can't get the code page of this character directly.

Quote: Originally posted by varunsnair
On a related note, can't we have multibytes from two different codepages (for instance, \M+1aaaa\M+2bbbb) in one text object?

It is possible. But if the text style uses a big font, it makes no sense, because the font supports only one code page, so the text can't be rendered correctly. If the text style uses a TrueType font, you get Unicode.

Best regards,
Sergey Z.

Can I get the code page using this:

Code:
    uChar = textIter->nextChar();
    if (textIter->currProperties().bInBigFont)
    {
        // The digit n of \M+nxxxx sits 5 characters before the current position.
        // getCodePageIdFromNum maps 1 to CP_ANSI_932, 2 to CP_ANSI_950, etc.
        OdCodePageId codePageId = getCodePageIdFromNum(*(textIter->currPos() - 5) - '0');
        OdCharMapper::codepageToUnicode(uChar, codePageId, uChar);
    }
    // Now uChar is always in Unicode

The assumption I am making is that the underlying source string is ASCII-encoded (we set bAllowCPConversion to true and systemCodePage() is CP_ASCII) and each multibyte character is represented as \M+nxxxx in that string.

Thanks,
Varun

Yes, you can get the code page that way.

Best regards,
Sergey Z.

New problems (on Mac)

We're using the vectorization framework with DD libs 1.14.02. We've initialized the OdCharMapper with adinit.dat. The attached file has one text entity with the text "\M+18fb0".
- On Windows with VC8 libs, OdDbTextIterator::nextChar() returns 0x5e8a, which is the correct character, consistent with what AutoCAD renders.
- On Mac with Xcode libs on both MacTel and PPC, OdDbTextIterator::nextChar() returns 0x8fb0, which is incorrect and is rendered as some other character.
bInBigFont is false in both cases (Win and Mac).
DD guys, can you help figure out what the problem is?

Regards,
Varun

Quote: Originally posted by varunsnair
- On Mac with Xcode libs on both MacTel and PPC, OdDbTextIterator::nextChar() returns 0x8fb0, which is incorrect and is rendered as some other character.
bInBigFont is false in both cases (Win and Mac).

I suppose your application has not found the font (arial.ttf) and the substituted font doesn't support Unicode, so the character isn't decoded to Unicode. You can test OdFont::getFlags() to check this (see the post above). See the "Developer's Guide\Font Handling" topic in the DWGdirect reference about supporting fonts.

Best regards,
Sergey Z.
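
The \M+nxxxx format discussed in this thread can be illustrated with a small standalone sketch. It does not use the DWGdirect API at all: the names MifChar, mifDigitToCodePage and findMifChars are hypothetical helpers written only for this illustration, and the digit-to-code-page mapping is taken from the AutoCAD help table quoted above. The sketch only locates the sequences and reports the originating code page and the multibyte value; turning that value into Unicode still needs a code page table (for example via OdCharMapper, as discussed in the thread).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One decoded \M+nxxxx sequence (hypothetical helper type, not a DWGdirect class).
struct MifChar {
    int codePage;        // originating code page number, e.g. 932
    unsigned value;      // multibyte character value, e.g. 0x8FB0
    std::size_t offset;  // index of the backslash in the source string
};

// Maps the MIF digit n to a code page number, following the table quoted
// in the thread. Returns 0 for digits that are not in the table.
int mifDigitToCodePage(char n) {
    switch (n) {
        case '1': return 932;   // Japanese (Shift-JIS)
        case '2': return 950;   // Traditional Chinese (Big 5)
        case '3': return 949;   // Wansung (KS C-5601-1987)
        case '4': return 1361;  // Johab (KS C-5601-1992)
        case '5': return 936;   // Simplified Chinese (GB 2312-80)
        default:  return 0;
    }
}

// Converts one hexadecimal digit to its numeric value.
static unsigned hexDigit(char c) {
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    return static_cast<unsigned>(std::tolower(static_cast<unsigned char>(c)) - 'a' + 10);
}

// Scans text for \M+nxxxx sequences (8 characters each; \m+ is accepted too)
// and returns them in order of appearance. Malformed sequences are skipped.
std::vector<MifChar> findMifChars(const std::string& text) {
    std::vector<MifChar> out;
    for (std::size_t i = 0; i + 8 <= text.size(); ++i) {
        if (text[i] != '\\' || (text[i + 1] != 'M' && text[i + 1] != 'm') || text[i + 2] != '+')
            continue;
        const int codePage = mifDigitToCodePage(text[i + 3]);
        if (codePage == 0) continue;
        bool allHex = true;
        for (std::size_t j = 4; j < 8; ++j)
            if (!std::isxdigit(static_cast<unsigned char>(text[i + j]))) allHex = false;
        if (!allHex) continue;
        unsigned value = 0;
        for (std::size_t j = 4; j < 8; ++j)
            value = value * 16 + hexDigit(text[i + j]);
        out.push_back({codePage, value, i});
        i += 7;  // skip the rest of this sequence; the loop's ++i moves past it
    }
    return out;
}
```

For the string from the Mac report, findMifChars("\\M+18fb0") yields a single entry with code page 932 and value 0x8FB0, which is the raw Shift-JIS value that nextChar() returned on Mac before any conversion to Unicode. A string such as "\\M+1aaaa\\M+2bbbb" yields two entries with different code pages, which is the mixed case Varun asked about.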

