libplist utf-8 conversion bugginess
Reported by Eric Monti | March 19th, 2013 @ 09:18 PM | in 1.2.0 Release
Environment:
Tested on OS X Snow Leopard using libplist-1.10; also tested on
libplist-0.8.
Problem:
Certain 'apparently legal' utf-8 sequences are not properly converted by libplist. These sequences convert correctly using Apple's plutil -convert xml1, but not using plistutil (or the C library directly, tested via the python and ruby bindings).
The resulting converted plist cannot be parsed by other Apple tools.
To reproduce:
See the attached files: sample.bin.plist (the original binary source plist) and sample.xml.plist (converted using Apple's plutil).
Here's a plist dump of the sample:
{
ProblematicValue = "\Ud83c\Udf1fCute utf-8 art\Ud83c\Udf1f";
}
Try converting sample.bin.plist to xml with plistutil and examining the output with Apple's plutil:
$ ./plistutil-1.10 -i sample.bin.plist -o sample.lpl.plist
$ plutil -p sample.bin.plist
{
"ProblematicValue" => "🌟Cute utf-8 art🌟"
}
$ plutil -p sample.xml.plist
{
"ProblematicValue" => "🌟Cute utf-8 art🌟"
}
$ plutil -p sample.lpl.plist
sample.lpl.plist: Unable to convert string to correct encoding
$ diff sample.xml.plist sample.lpl.plist
6c6
< <string>🌟Cute utf-8 art🌟</string>
---
> <string>??????Cute utf-8 art??????</string>
Note: this forum may not render it, but the string "Cute utf-8 art" is enclosed on either side by a star character (U+1F31F), which the binary plist stores as the UTF-16 surrogate pair "\Ud83c\Udf1f".
Comments and changes to this ticket
-
Eric Monti March 19th, 2013 @ 09:47 PM
More context: found the offending code point range. Also note that surrounding values exhibit the same problem.
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=127775
-
Martin S. July 1st, 2013 @ 04:21 PM
- Tag set to conversion, encoding, libplist, utf-8
- Assigned user set to Nikias Bassen
I was able to reproduce the problem; however, this is related to libxml2 freaking out on the characters:
parser error : Char 0xD83C out of allowed range
parser error : PCDATA invalid Char value 55356
parser error : Char 0xDF1F out of allowed range
parser error : PCDATA invalid Char value 57119
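These errors are consistent with the XML 1.0 "Char" production, which excludes the surrogate range 0xD800-0xDFFF, so each half of the pair is rejected on its own. A minimal sketch of that range check follows; is_xml10_char is a hypothetical helper written for illustration, not a libxml2 function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of libxml2): returns true if the code
 * point is permitted by the XML 1.0 "Char" production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
static bool is_xml10_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

With this check, 0xD83C and 0xDF1F are both rejected, while the combined code point 0x1F31F is accepted.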
-
Shane Garrett October 2nd, 2013 @ 11:22 PM
Ran across the same issue. It looks like the problem is with the plist_utf16_to_utf8() function: it doesn't correctly handle UTF-8 encoding of UTF-16 surrogate pairs, which the sample plist has.
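For reference, here is a rough sketch of a UTF-16 to UTF-8 conversion that combines surrogate pairs before encoding. This is not libplist's actual code or its eventual fix: utf16_to_utf8_sketch is a hypothetical name, the input is assumed to be in host byte order, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, NOT libplist's plist_utf16_to_utf8().
 * Converts `len` UTF-16 code units (assumed host byte order) into a
 * newly allocated, NUL-terminated UTF-8 string. Returns NULL if the
 * allocation fails. Unpaired surrogates are replaced with U+FFFD. */
static char *utf16_to_utf8_sketch(const uint16_t *in, size_t len)
{
    /* Worst case is 3 UTF-8 bytes per UTF-16 code unit; a surrogate
     * pair uses 2 units and yields only 4 bytes. */
    unsigned char *out = malloc(len * 3 + 1);
    size_t i = 0, o = 0;

    if (!out)
        return NULL;

    while (i < len) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the following low surrogate. */
            if (i < len && in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i] - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD; /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* stray low surrogate */
        }

        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }

    out[o] = '\0';
    return (char *)out;
}
```

For the sample's pair 0xD83C 0xDF1F, the combined code point is 0x1F31F, which should encode as the four bytes F0 9F 8C 9F rather than two separate three-byte sequences.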
-
Martin S. October 19th, 2013 @ 10:10 AM
- State changed from new to resolved
- Milestone set to 1.2.0 Release
The issue is fixed in git HEAD master of libplist.
Downstream ticket:
https://github.com/libimobiledevice/libplist/pull/2
A project around supporting the iPhone in Linux.
See http://libimobiledevice.org