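Geometric Dimensioning and Tolerancing Forum: dedicated to the product geometric tolerance standards GD&T (GDT: ASME) | New GPS (ISO) research / CAD design / CAM machining / CMM measurement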


2009-05-04, 05:53 PM   #1
yang686526
[Repost] codepage query

Hi,
I found a lot of posts about this on the forum, but I'd still like to have the following clarified:
1. Am I correct in assuming that systemcodepage() is only used for writing $dwgcodepage during export, and for text conversion during import (only if ballowcpconversion is set to true)? Are there any other uses of the value returned by this function?
2. Is there no default implementation of systemcodepage()?
1. systemcodepage() is used to set dwgcodepage in newly created drawings (and to save it to the file).
It is also used to convert a drawing being loaded if the dwgcodepage in the file is different.
2. systemcodepage() is implemented in exsystemservices() under #ifdef _win32.
3. You can't be sure that the text contains no \m+xxxxx.
If allowcpconversion is false, you get what was saved to the file, and both \u+ and \m+ may be present in the file.
If allowcpconversion is true and the system code page is ANSI, imagine you are loading a Japanese file (cp 932). The only way to represent Japanese characters in an ANSI code page is \m+xxxx (and Cyrillic by \u+xxxx).
4. If odcharmapper is not initialized on the Windows platform, the Windows conversion is used. It may fail if support for some language is not installed, for example when you load a Japanese file on a computer without Japanese support installed.
adinit.dat has not changed during the last several years.
Sergey Slezkin
Hi Sergey,
Regarding point 3, I fail to understand why the only way to map some Japanese characters to ASCII is via \m+xxxxx. Wouldn't there also be a corresponding \u+xxxx representation? Or is such a representation not unique?
Also, for the text content of oddbtext, I am able to get the codepage from the text style. But for other strings passed to my callbacks (filenames etc.), how do I get the codepage?
Is it the codepage specified by the dwgcodepage system variable (for no conversion) and my systemcodepage() (for conversions)?
Thanks,
Varun
Theoretically, any \m+xxxxx can be represented as \u+xxxx. The question is who is supposed to perform the conversion.
Imagine AutoCAD on an ansi_1252 computer loading a Japanese file, where the computer has no Japanese support installed.
If a drawing with a foreign codepage is being loaded and the required support is not installed, AutoCAD warns about missing .nls files and attempts to convert the text using its internal tables. These internal tables store Unicode values for the supported single-byte code pages, and such text comes as \u+xxxx.
Multibyte code pages have too many characters, so AutoCAD stores only lists of leading bytes for them (to be able to distinguish single-byte from 2-byte characters).
So Asian text comes as \m+xxxxx sequences.
If ballowcpconversion is false, you get strings in dwgcodepage; if true, in systemcodepage().
Sergey Slezkin
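Since both kinds of escape can turn up in a string after loading, it may help to locate them before deciding how to convert each one. The following is only a minimal sketch, not part of the library: `Escape`, `isHexRun` and `findEscapes` are hypothetical names, and it assumes the forms \U+XXXX (four hex digits) and \M+NXXXX (a digit 1 to 5 followed by four hex digits). The marker letter is matched case-insensitively because the thread shows the sequences in lowercase.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One escape found in a text string (hypothetical type for this sketch).
struct Escape {
    char kind;           // 'U' for \U+XXXX, 'M' for \M+NXXXX
    int pageDigit;       // N for 'M' escapes, 0 for 'U' escapes
    unsigned value;      // XXXX parsed as hexadecimal
    std::size_t offset;  // index of the backslash in the input
};

// True if s[pos, pos + count) lies inside s and consists of hex digits only.
static bool isHexRun(const std::string& s, std::size_t pos, std::size_t count) {
    assert(count > 0);
    if (pos + count > s.size()) return false;
    for (std::size_t i = 0; i < count; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(s[pos + i]))) return false;
    return true;
}

// Collects every \U+XXXX and \M+NXXXX escape in s, in order of appearance.
std::vector<Escape> findEscapes(const std::string& s) {
    std::vector<Escape> out;
    for (std::size_t i = 0; i + 2 < s.size(); ++i) {
        if (s[i] != '\\' || s[i + 2] != '+') continue;
        const char kind =
            static_cast<char>(std::toupper(static_cast<unsigned char>(s[i + 1])));
        if (kind == 'U' && isHexRun(s, i + 3, 4)) {
            const unsigned v =
                static_cast<unsigned>(std::stoul(s.substr(i + 3, 4), nullptr, 16));
            out.push_back({'U', 0, v, i});
            i += 6;  // the escape is 7 characters long; the loop adds the last step
        } else if (kind == 'M' && i + 3 < s.size() &&
                   s[i + 3] >= '1' && s[i + 3] <= '5' && isHexRun(s, i + 4, 4)) {
            const unsigned v =
                static_cast<unsigned>(std::stoul(s.substr(i + 4, 4), nullptr, 16));
            out.push_back({'M', s[i + 3] - '0', v, i});
            i += 7;  // the escape is 8 characters long
        }
    }
    return out;
}
```

A \U+ entry already carries a Unicode value, while an \M+ entry still needs a conversion from the multibyte code page that its digit selects.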
Here is my final understanding:
If I specify ballowcpconversion == true and, say, I get a sequence \m+xxxxx in one of the strings, then that \m+xxxxx sequence is from the dwgcodepage, while the string itself is encoded in systemcodepage().
And if I specify ballowcpconversion == false, then both the string and the \m+xxxxx sequence are in dwgcodepage.
Am I correct?
Thanks,
Varun
\m+xxxxx and \u+xxxx are special sequences of ANSI characters which have the same encoding in all code pages.
They represent symbols which are absent from the drawing code page.
If ballowcpconversion is false, you get what was saved to the file (strings and dwgcodepage).
If ballowcpconversion is true, strings are converted to systemcodepage and dwgcodepage is set to systemcodepage.
You can get \u+ or \m+ from a file of any code page.
For example, a Japanese file was loaded by American AutoCAD and saved back. The new file has the ansi_1252 code page, and the Japanese symbols in it are represented as \m+1xxxx.
Sergey Slezkin
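The rule stated in this reply can be written down as a small function. This is only a sketch of the rule as described in the thread, not library code: `LoadedEncoding` and `encodingAfterLoad` are hypothetical names, and code pages are represented as plain integers such as 932 or 1252.

```cpp
#include <cassert>

// What the caller sees after a drawing is loaded (hypothetical type).
struct LoadedEncoding {
    int stringCodePage;  // code page the returned strings are encoded in
    int dwgCodePage;     // value of dwgcodepage after loading
};

// fileCodePage is the dwgcodepage stored in the file, systemCodePage is the
// value returned by the application's systemcodepage().
LoadedEncoding encodingAfterLoad(int fileCodePage, int systemCodePage,
                                 bool allowCpConversion) {
    assert(fileCodePage > 0 && systemCodePage > 0);
    if (allowCpConversion) {
        // Strings are converted, and dwgcodepage is set to the system code page.
        return {systemCodePage, systemCodePage};
    }
    // No conversion: strings and dwgcodepage are exactly what the file holds.
    return {fileCodePage, fileCodePage};
}
```

In either case the \u+ and \m+ escapes themselves are plain ANSI characters, so they pass through unchanged and can appear in the result.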

Correct me if I am wrong here, but to convert a multibyte sequence to Unicode code points you need the codepage with respect to which this sequence was written.
Say I have a drawing in Japanese. I open it (requesting conversion) with the system code page set to cp_ascii, and I get a sequence \m+xxxxx. To interpret it correctly, I need the encoding of the original format (that is, which Japanese encoding scheme was used). How can I get that if, when conversion is allowed, the dwgcodepage is set to systemcodepage() as mentioned in your last post?
Thanks,
Varun
The first digit after + indicates one of the MBCS code pages supported by AutoCAD.
Japanese characters (932) look like:
\m+1xxxx
From the AutoCAD help:
The multibyte interchange format (MIF) converts Asian language character strings. The following string represents an Asian character displayed on a system other than the native one:
\m+nxxxx
Multibyte shape number. The n is a digit identifying the originating multibyte code page ID. The xxxx is the hexadecimal value of the multibyte character.
The code page identifications that AutoCAD supports are listed in the following table.
1  (932)   Japanese (Shift-JIS)
2  (950)   Traditional Chinese (Big 5)
3  (949)   Wansung (KS C-5601-1987)
4  (1361)  Johab (KS C-5601-1992)
5  (936)   Simplified Chinese (GB 2312-80)
Sergey Slezkin
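The table above translates directly into a lookup. The sketch below is illustrative only: `mifCodePage`, `MifChar` and `splitMif` are hypothetical names, code pages are plain integers, and it assumes that the four hex digits hold a two-byte character with the lead byte in the high half.

```cpp
#include <cassert>

// Maps the digit n of a \M+nXXXX escape to the code page number from the
// table above. Returns 0 for a digit outside 1..5.
int mifCodePage(char digit) {
    switch (digit) {
        case '1': return 932;   // Japanese (Shift-JIS)
        case '2': return 950;   // Traditional Chinese (Big 5)
        case '3': return 949;   // Wansung
        case '4': return 1361;  // Johab
        case '5': return 936;   // Simplified Chinese
        default:  return 0;
    }
}

// A multibyte character split into the pieces a converter would need.
struct MifChar {
    int codePage;
    unsigned char lead;   // high byte of XXXX (assumed to be the lead byte)
    unsigned char trail;  // low byte of XXXX
};

MifChar splitMif(char digit, unsigned value) {
    assert(value <= 0xFFFFu);  // XXXX is four hex digits
    return {mifCodePage(digit),
            static_cast<unsigned char>(value >> 8),
            static_cast<unsigned char>(value & 0xFFu)};
}
```

For \m+18fb0 this gives code page 932 with bytes 0x8F and 0xB0, which can then be handed to whichever multibyte-to-Unicode converter is available.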
Got it. Thanks a lot, Sergey.
Just one final small problem remains.
The comment above oddbtextiterator::nextchar() says:
Quote:
The returned character will be a Unicode character except when the binbigfont flag is set in the currproperties() value. In this case, the returned character will be MBCS, corresponding to a \m+nxxxx character in the original string.
How do I get the codepage that this multibyte char comes from? Is it the same as the codepage I passed to oddbtextiterator::createobject()? If so, does that mean this will happen only when the codepage is one of the five you mentioned before?
On a related note, can't we have multibytes from two different codepages (for instance, \m+1aaaa\m+2bbbb) in one text object? If so, how would they be handled?
Thanks,
Varun
oddbtextiteratorptr::nextchar() may return a multibyte, Unicode or single-byte character (there are some shx fonts without Unicode support), depending on the font used by the text style (ptextstyle). In other words, it is the font coding.
oddbtextiterator was designed for this purpose.
As a workaround:
odgitextstyle has a method getfont() which returns odfont*, and odfont has a method getflags().
If (getflags() & kunifont10) || (getflags() & ktruetype), you get Unicode.
If currproperties().binbigfont is set, you get multibyte. Unfortunately, you can't get the codepage of this char directly.
Quote:
Originally posted by varunsnair
On a related note, can't we have multibytes from two different codepages (for instance, \m+1aaaa\m+2bbbb) in one text object?
It is possible. But if the text style uses a big font, it makes no sense, because the font supports only one codepage, so the text can't be rendered correctly.
If the text style uses a TrueType font, you get Unicode.
Best regards,
Sergey Z.
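The workaround described in this reply amounts to a three-way decision, which could be modelled roughly as follows. This is a standalone sketch, not library code: the flag bit values are hypothetical placeholders (the real ones are defined in the library headers and are not given in the thread), and `CharCoding`, `classifyChar` and `codingName` are made-up names. Checking the big-font case first is an assumption based on the nextchar() comment quoted earlier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the kunifont10 and ktruetype font flag bits.
enum FontFlag : std::uint32_t { kUniFont10Bit = 0x1, kTrueTypeBit = 0x2 };

enum class CharCoding { Unicode, MultiByte, FontSpecific };

// fontFlags models the result of getflags(); inBigFont models
// currproperties().binbigfont for the character just returned.
CharCoding classifyChar(std::uint32_t fontFlags, bool inBigFont) {
    if (inBigFont) return CharCoding::MultiByte;  // came from a \m+nxxxx escape
    if (fontFlags & (kUniFont10Bit | kTrueTypeBit)) return CharCoding::Unicode;
    return CharCoding::FontSpecific;  // e.g. an shx font without Unicode support
}

const char* codingName(CharCoding c) {
    switch (c) {
        case CharCoding::Unicode:      return "unicode";
        case CharCoding::MultiByte:    return "multibyte";
        case CharCoding::FontSpecific: return "font-specific";
    }
    assert(false && "unknown coding");
    return "";
}
```

Only the multibyte case needs a code page, which is why the question of recovering it from the escape digit comes up next.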
Can I get the codepage using this:
Code:
uchar = textiter->nextchar();
if (textiter->currproperties().binbigfont)
{
    // The escape has the form \m+nxxxx, so the digit n sits 5 characters
    // before the current position.
    // getcodepageidfromnum maps 1 to cp_ansi_932, 2 to cp_ansi_950, etc.
    odcodepageid codepageid = getcodepageidfromnum(*(textiter->currpos() - 5) - '0');
    // Convert the multibyte value to Unicode in place.
    odcharmapper::codepagetounicode(uchar, codepageid, uchar);
}
// Now uchar is always in Unicode.
The assumption I am making is that the underlying source string is ASCII encoded (we set ballowcpconversion to true and systemcodepage() is cp_ascii) and each multibyte character is represented as \m+nxxxx in that string.
Thanks,
Varun
Yes, you can get the codepage that way.
Best regards,
Sergey Z.
New problems (on Mac)
We're using the vectorization framework with DD libs 1.14.02. We've initialized the odcharmapper with adinit.dat.
The attached file has one text entity with the text "\m+18fb0".
- On Windows with VC8 libs, oddbtextiterator::nextchar() returns 0x5e8a, which is the correct character, consistent with what AutoCAD renders.
- On Mac with Xcode libs, on both Mactel and PPC, oddbtextiterator::nextchar() returns 0x8fb0, which is incorrect and is rendered as some other character.
binbigfont is false in both cases (Windows and Mac).
DD guys, can you help figure out what the problem is?
Attached files (22.4 KB, 11 views)

Regards,
Varun
Quote:
Originally posted by varunsnair
- On Mac with Xcode libs, on both Mactel and PPC, oddbtextiterator::nextchar() returns 0x8fb0, which is incorrect and is rendered as some other character.
binbigfont is false in both cases (Windows and Mac).
I suppose your application has not found the font (arial.ttf) and the substituted font doesn't support Unicode, so the character isn't decoded to Unicode. You can test odfont::getflags() to verify this (see the post above).
See the "Developer's Guide\Font Handling" topic in the DWGdirect reference about supporting fonts.
Best regards,
Sergey Z.