And thanks to GetSatisfaction, there's an answer for why this happens. Turns out special characters "count" as more than one character. I used the > character, which Twitter stores as more than one character. Updated the blog post with this information. - Hutch Carpenter
Ahhhh! Now that answers a question I've had for quite some time. Nice catch! - Brian Daniel Eisenberg
Thanks Brian. Yeah, this was really bugging me. - Hutch Carpenter
So what determines when a service such as Twitter stores the actual character, and when it stores an alternative such as ampersand-g-t instead? Or should I just assume that only numeric digits and English-language letters are stored as single characters? - Ontario Emperor
Ooooh...good question. Any HTML/ASCII/database experts in the house? - Hutch Carpenter
Ontario: it shouldn't be this complicated, but a good reference is: http://www.comp.lancs.ac.uk/co... I'd put money on quotation marks, ampersands, and greater than/less than signs counting, and I'd hedge on the other special characters listed on this page. - Mark Trapp
Ontario - saw the sallyfield URL as a traffic referral to the blog. Love your List. - Hutch Carpenter
This bug is bad. The HTML encoding shouldn't count against the user's 140 characters. Thanks for the heads up. - Alan Le
Mark's right. Any characters that are reserved chars in HTML or scripting languages are likely escaped. - Brian Daniel Eisenberg
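To make the point about reserved characters concrete, here is a minimal sketch using Python's standard `html.escape`. It shows how many characters a single symbol becomes once escaped. This is Python's escaping table, not necessarily Twitter's; the helper name `stored_length` is hypothetical and only illustrates the idea.

```python
import html

def stored_length(text: str) -> int:
    """Length of `text` after HTML escaping.

    Hypothetical stand-in for what a service might store if it
    escapes on input; real services may escape a different set.
    """
    return len(html.escape(text, quote=True))

# Compare reserved characters with ordinary letters and digits.
for ch in [">", "<", "&", '"', "a", "7"]:
    escaped = html.escape(ch, quote=True)
    print(repr(ch), "->", escaped, "length", len(escaped))
```

Under this table, `>` and `<` each become four characters, `&` becomes five, and a double quote becomes six, while letters and digits stay at one.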
Honestly, and this is the backseat programmer in me, the fact that it counts towards your total is pretty amazing. Most developers take any input and only escape during output, not the other way around, precisely because of situations like this; their database should accept UTF-8. I wonder why they thought this way was better. - Mark Trapp
Isn't the 140 char limit driven by SMS? Whatever SMS allows or dictates should be what Twitter uses. - Brian Sullivan
Brian: characters like < count as one character in SMS messages. The more-than-one-character thing is specifically a function of HTML escaping. - Mark Trapp
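The "escape only during output" approach Mark describes can be sketched roughly as follows. This is an illustration under assumptions, not how Twitter actually works: `LIMIT`, `fits_limit`, and `render` are hypothetical names, and `len()` here counts Unicode code points rather than SMS encoding units.

```python
import html

LIMIT = 140  # the character limit discussed in this thread

def fits_limit(raw: str) -> bool:
    """Check the limit against what the user typed, before any escaping."""
    return len(raw) <= LIMIT

def render(raw: str) -> str:
    """Escape only when producing HTML output; the raw text is what gets stored."""
    return html.escape(raw)

# 139 letters plus one '>' is exactly 140 characters as typed.
message = "x" * 139 + ">"
print(fits_limit(message))   # the raw text is within the limit
print(len(render(message)))  # the escaped form is longer, but is never counted
```

With this ordering the user's `>` costs one character against the limit, and the longer `&gt;` form only exists in the rendered page.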






