Encoding issue
- This topic has 6 replies, 1 voice, and was last updated 9 years, 2 months ago by
community-content.
-
June 2, 2008 at 11:46 PM #19837
We have a flat file schema with the code page set to UTF-8. The schema contains a string-type field. When an input file has text containing a British pound sign (£), the BizTalk schema fails to validate it. We thought UTF-8 should accept the character. Validation works fine the moment we remove the symbol. (Looking at the schema XML file, the processing instruction at the top says the encoding is UTF-16?) Would anyone please advise what is going wrong here? Regards,
-
June 3, 2008 at 1:29 AM #19838
The processing instruction at the top of the schema file defines the encoding of the schema file itself. It has no effect on any instance document.
When disassembling a flat file instance message, the FF Disassembler uses the following algorithm to determine and preserve encoding information:
- If the “Charset” in the body part is set, use it.
- Otherwise, if the envelope (or document) schema specifies a code page, use it.
- Otherwise, if a byte order mark is present, use the encoding it indicates.
- Otherwise, assume UTF-8.
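The fallback order above can be sketched in a few lines of Python. This is only an illustration of the precedence rules, not BizTalk's actual implementation; the function name `pick_encoding` and its parameters are hypothetical, and the byte order mark table is an assumption about which marks are checked.

```python
import codecs

# Byte order marks, checked longest-first so a UTF-32 LE mark
# (FF FE 00 00) is not mistaken for a UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def pick_encoding(data, charset=None, schema_code_page=None):
    """Hypothetical helper mirroring the four rules listed above."""
    if charset:                      # 1. Charset set on the body part
        return charset
    if schema_code_page:             # 2. code page from the schema
        return schema_code_page
    for bom, name in _BOMS:          # 3. byte order mark in the data
        if data.startswith(bom):
            return name
    return "utf-8"                   # 4. default


# The schema code page wins even when the bytes are not valid UTF-8.
print(pick_encoding(b"\xa3100", schema_code_page="utf-8"))
```

Note that in this ordering a schema code page of UTF-8 is applied before the data is ever inspected, which matches the situation in the question: the file's real encoding is never detected.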
I assume the British pound sign is stored in the file as a single byte and not as the multi-byte sequence required by UTF-8.
UTF-8 matches ASCII for the first 128 characters (0x00 to 0x7F), then uses multi-byte sequences for all other characters. I would suggest using the Western European (1252) code page to preserve the British pound sign.
-
June 4, 2008 at 1:15 AM #19844
Hello Greg,
Just to understand this a bit more clearly: the source document is in ASCII and has the pound sign in it.
This character does not fall in the standard 128-character ASCII range.
The FF schema is set to UTF-8 in the schema properties.
So when BizTalk processes the file, it tries to convert the single-byte character into the equivalent
multi-byte UTF-8 sequence, which turns out to be an invalid string character?
Kindly assist.
Regards,
-
June 4, 2008 at 4:22 AM #19846
BizTalk does not try to convert the single byte character to its UTF-8 equivalent. It simply reads the characters as they exist. The British pound sign ASCII character is invalid according to the UTF-8 spec.
The code page setting in the FF schema should match the actual encoding of the incoming document.
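A quick way to see the mismatch outside BizTalk is to decode the same bytes with both encodings. The sketch below uses Python's standard codecs as a stand-in; the sample bytes and the helper name `try_decode` are made up for illustration.

```python
# Sample line as it might appear in the flat file: the pound sign
# is stored as the single byte 0xA3 (assumed Windows-1252 input).
raw = b"Total: \xa3100"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw, "utf-8")      # fails: 0xA3 cannot start a sequence
as_cp1252 = try_decode(raw, "cp1252")   # succeeds: 0xA3 maps to the pound sign

print(as_utf8)
print(as_cp1252)
```

The bytes are never converted; the decoder just applies whichever encoding it was told to use, so the declared code page has to match the file.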
-
June 4, 2008 at 8:12 AM #19848
Hello Greg,
Thanks a lot for your input.
However, I still have one point of confusion.
When you say “The British pound sign ASCII character is invalid according to the UTF-8 spec”, would you please elaborate? I am sure the pound symbol can’t be left out of UTF-8.
I am a newbie, so I would appreciate an explanation. Thanks once again.
-
June 4, 2008 at 1:31 PM #19851
In the Windows-1252 code page (often loosely called ASCII) the British pound sign is the single byte 0xA3. Standard ASCII is 7-bit and has no pound sign.
In UTF-16 it is 0x00A3
In UTF-8 it is 0xC2 0xA3
UTF-8 is an encoding that can represent every Unicode character. The first 128 characters (0x00 to 0x7F) are the same as ASCII; all other characters use 2-, 3- or 4-byte sequences. A lone 0xA3 byte is therefore not a valid UTF-8 sequence.
Check out http://www.unicode.org for more information.
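The byte values above can be checked with a short Python snippet. Python's `cp1252` codec stands in here for the single-byte encoding of the file, which is an assumption; big-endian UTF-16 is used so the bytes print in the same order as written above.

```python
pound = "\u00a3"  # the British pound sign, U+00A3

cp1252_bytes = pound.encode("cp1252")     # single-byte Western European
utf16_bytes = pound.encode("utf-16-be")   # big-endian, no byte order mark
utf8_bytes = pound.encode("utf-8")        # two-byte sequence

print(cp1252_bytes.hex(), utf16_bytes.hex(), utf8_bytes.hex())
# prints: a3 00a3 c2a3

# Strict 7-bit ASCII has no pound sign at all.
try:
    pound.encode("ascii")
    in_ascii = True
except UnicodeEncodeError:
    in_ascii = False
print(in_ascii)
# prints: False
```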
-
June 4, 2008 at 10:26 PM #19854
Thank You Very Much, Greg 🙂