1726: "Unicode"


Ziggid
Posts: 5
Joined: Fri Jan 17, 2014 8:50 am UTC

1726: "Unicode"

Postby Ziggid » Mon Aug 29, 2016 12:22 pm UTC

Image

Alt-text: I'm excited about the proposal to add a "brontosaurus" emoji codepoint because it has the potential to bring together a half-dozen different groups of pedantic people into a single glorious internet argument.

If only IKEA would get into the business of flood control...

HES
Posts: 4836
Joined: Fri May 10, 2013 7:13 pm UTC
Location: England

Re: 1726: "Unicode"

Postby HES » Mon Aug 29, 2016 1:18 pm UTC

Oh right, it is Monday today. Hurrah for public holidays!

Anyway, he's doing it wrong. If you want to steer a river using traffic signs, you use the signs to divert unsuspecting drivers into the river and dam it with their cars.
He/Him/His Image

Flumble
Yes Man
Posts: 1990
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 1726: "Unicode"

Postby Flumble » Mon Aug 29, 2016 1:34 pm UTC

HES wrote:Anyway, he's doing it wrong. If you want to steer a river using traffic signs, you use the signs to divert unsuspecting drivers into the river and dam it with their cars.

That sounds reasonable, so how does that translate to the unicode situation? Hack all chatbots to only use inappropriate emoji?

Wheeljack
Posts: 7
Joined: Mon Oct 27, 2014 8:48 am UTC

Re: 1726: "Unicode"

Postby Wheeljack » Mon Aug 29, 2016 1:35 pm UTC

I can think of someone who wants a raptor emoji...

Mercurywoodrose
Posts: 11
Joined: Fri May 14, 2010 4:34 am UTC

Re: 1726: "Unicode"

Postby Mercurywoodrose » Mon Aug 29, 2016 2:10 pm UTC

forget the Brontosaurus Emoji. how about a tiny picture of the "British Isles", aka "Britain and Ireland", "Atlantic Archipelago", "Anglo-Celtic Isles", the "British-Irish Isles", the Islands of the North Atlantic, or "these islands"(https://en.wikipedia.org/wiki/British_I ... ng_dispute), and my choice, "Airstrip One"(George Orwell's 1984)

jules.LT
Posts: 1539
Joined: Sun Jul 19, 2009 8:20 pm UTC
Location: Paris, France, Europe

Re: 1726: "Unicode"

Postby jules.LT » Mon Aug 29, 2016 2:14 pm UTC

Is Unicode that unsuccessful?

From the wikipedia article, it seems that it is only being criticised for simplifying the vast variety of Asian characters by concentrating on the most common dialects.

That sounds more like they effectively mostly redirected the river, it's just that there's a significant stream left in the previous riverbed.
(and some flooding in the new one, to represent the insufficiency of 70 000 Han characters?)
Bertrand Russell wrote:Not to be absolutely certain is, I think, one of the essential things in rationality.
Richard Feynman & many others wrote:Keep an open mind – but not so open that your brain falls out

cellocgw
Posts: 1845
Joined: Sat Jun 21, 2008 7:40 pm UTC

Re: 1726: "Unicode"

Postby cellocgw » Mon Aug 29, 2016 3:41 pm UTC

Wheeljack wrote:I can think of someone who wants a raptor emoji...


And while we're at it, a Sudoku emoji -- which, naturally, generates a new valid puzzle every time it's invoked.
Oh, all right, you can have a matching Sudoku-Solution emoji, too.
https://app.box.com/witthoftresume
Former OTTer
Vote cellocgw for President 2020. #ScienceintheWhiteHouse http://cellocgw.wordpress.com
"The Planck length is 3.81779e-33 picas." -- keithl
" Earth weighs almost exactly π milliJupiters" -- what-if #146, note 7

BrianX
Posts: 43
Joined: Sun Jul 22, 2007 6:03 am UTC
Location: Cape Cod, MA
Contact:

Re: 1726: "Unicode"

Postby BrianX » Mon Aug 29, 2016 5:22 pm UTC

jules.LT wrote:Is Unicode that unsuccessful?

From the wikipedia article, it seems that it is only being criticised for simplifying the vast variety of Asian characters by concentrating on the most common dialects.


What's actually going on is a bit more abstract than that. The unified-Hanzi thing is one aspect of it; it also affects Cyrillic Italics and a number of the Brahmic scripts. Basically, the Unicode Consortium's position is that a character is a character, not a glyph, and it's up to the font designers and app/library developers to make it look the way it's supposed to in any given context. So, basically, their response to someone saying that corresponding Chinese and Japanese characters should be treated as separate character codes is "Not our problem. Don't use a Japanese font on a Chinese document and this won't be an issue."

Celti
Posts: 1
Joined: Mon Aug 29, 2016 5:44 pm UTC

Re: 1726: "Unicode"

Postby Celti » Mon Aug 29, 2016 5:57 pm UTC

Alt-text: I'm excited about the proposal to add a "brontosaurus" emoji codepoint because it has the potential to bring together a half-dozen different groups of pedantic people into a single glorious internet argument.


Alright, which mailing list do I need to subscribe to in order to keep up with this argument? I've seen some glorious ones on various IETF mailing lists, but this promises to be exquisite.

Flumble
Yes Man
Posts: 1990
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 1726: "Unicode"

Postby Flumble » Mon Aug 29, 2016 6:22 pm UTC

BrianX wrote:So, basically, their response to someone saying that corresponding Chinese and Japanese characters should be treated as separate character codes is "Not our problem. Don't use a Japanese font on a Chinese document and this won't be an issue."

...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.
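For anyone who can't tell those three apart on screen, here is a minimal sketch using Python's standard `unicodedata` module. The letters are written as escapes so the code points are unambiguous; the variable names are only illustrative.

```python
import unicodedata

# Latin, Greek and Cyrillic small "o", written as escapes so they cannot be confused.
lookalikes = ["\u006f", "\u03bf", "\u043e"]

# Look up the official character name of each code point.
names = [unicodedata.name(ch) for ch in lookalikes]

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")

# The three strings compare as different even if a font draws them identically.
all_distinct = len(set(lookalikes)) == 3
```

The comparison at the end is the practical consequence: a search for the Latin letter will not match the Greek or Cyrillic one unless the software does extra work.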

[edit]added quote, since quasi-ninjad by Celti's first post
Last edited by Flumble on Mon Aug 29, 2016 10:13 pm UTC, edited 1 time in total.

somitomi
Posts: 526
Joined: Fri Nov 06, 2015 11:21 pm UTC
Location: can be found in Hungary
Contact:

Re: 1726: "Unicode"

Postby somitomi » Mon Aug 29, 2016 8:18 pm UTC

Flumble wrote:...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.

Maybe they should get some typewriter designers into the Unicode team. I have a typewriter with no 0 or 1 key, the uppercase O and the lowercase l can be used instead of those. Took me a while to figure that out...

I fixed a typo
Last edited by somitomi on Mon Aug 29, 2016 11:01 pm UTC, edited 1 time in total.
—◯-◯

niauropsaka
Posts: 82
Joined: Mon Apr 22, 2013 8:50 pm UTC

Re: 1726: "Unicode"

Postby niauropsaka » Mon Aug 29, 2016 9:07 pm UTC

Flumble wrote:...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.


I don't even know what you're doing there. I use Roman Oo & Greek Οο--is the other Cyrillic Оо? The first two actually kind of are different characters in Western usage, as weird as that is. And putting all the Greek letters together and all the Cyrillic letters together is convenient enough, and two codepoints spent on it cheap enough, that it's not that surprising.

Chinese ideograms are far, far, far more numerous.

§

Anyway, I came into this thread because I find Randall's metaphor in the comic puzzling.

jules.LT
Posts: 1539
Joined: Sun Jul 19, 2009 8:20 pm UTC
Location: Paris, France, Europe

Re: 1726: "Unicode"

Postby jules.LT » Mon Aug 29, 2016 9:13 pm UTC

BrianX wrote:What's actually going on is a bit more abstract than that. The unified-Hanzi thing is one aspect of it; it also affects Cyrillic Italics and a number of the Brahmic scripts. Basically, the Unicode Consortium's position is that a character is a character, not a glyph, and it's up to the font designers and app/library developers to make it look the way it's supposed to in any given context. So, basically, their response to someone saying that corresponding Chinese and Japanese characters should be treated as separate character codes is "Not our problem. Don't use a Japanese font on a Chinese document and this won't be an issue."

That was pretty interesting.
Thanks, BrianX!
Bertrand Russell wrote:Not to be absolutely certain is, I think, one of the essential things in rationality.
Richard Feynman & many others wrote:Keep an open mind – but not so open that your brain falls out

doomvox
Posts: 10
Joined: Fri Oct 29, 2010 10:21 pm UTC

Re: 1726: "Unicode"

Postby doomvox » Mon Aug 29, 2016 9:17 pm UTC

So, basically, their response to someone saying that corresponding Chinese and Japanese characters should be treated as separate character codes is "Not our problem. Don't use a Japanese font on a Chinese document and this won't be an issue."


What I'd really like to know though is why there's no system of hints you can embed in the text to let someone know if you're supposed to use, say, a Japanese font or a Chinese font with it... there used to be a way to embed a locale, but that was deprecated with Unicode 5.0.

Eutychus
Posts: 437
Joined: Mon Jan 25, 2010 6:01 am UTC
Location: France

Re: 1726: "Unicode"

Postby Eutychus » Mon Aug 29, 2016 9:34 pm UTC

All this talk of unicode is beyond me; what I noticed was how much Randall seems to be struggling here to depict perspective in comparison to his usual 2D landscapes.
Be very careful about rectilinear assumptions. Raptors could be hiding there - ucim

texttheater
Posts: 4
Joined: Wed Jul 03, 2013 8:58 am UTC

Re: 1726: "Unicode"

Postby texttheater » Mon Aug 29, 2016 9:46 pm UTC

Seems to me the Unicode people do a pretty good job at what they do (which is not, by the way, "governing" human "language"). Any idea what Randall actually meant, concretely, by the traffic sign simile?

Flumble
Yes Man
Posts: 1990
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 1726: "Unicode"

Postby Flumble » Mon Aug 29, 2016 10:11 pm UTC

niauropsaka wrote:
Flumble wrote:...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.


I don't even know what you're doing there. I use Roman Oo & Greek Οο--is the other Cyrillic Оо?

Correct. So you do know what I'm doing there. :wink: (if you select and "search for ..." those letters, the result page will also tell you about which o it is)

(also my response was supposed to come directly after BrianX's, but apparently Celti posted in between though got their post approved later :oops: )

niauropsaka wrote:Chinese ideograms are far, far, far more numerous.

Sure, in absolute numbers they're more numerous, but relatively there are a lot of overlapping Latin, Greek and Cyrillic characters.
Also, unicode has enough unassigned codepoints to fit all those 'kind of' overlapping characters 10 times over, so at its core the discussion ought to be a matter of principles.

svenman
Posts: 511
Joined: Fri Jun 14, 2013 2:09 pm UTC
Location: 680 km NNE of the Château d'If

Re: 1726: "Unicode"

Postby svenman » Mon Aug 29, 2016 10:51 pm UTC

Flumble wrote:
niauropsaka wrote:
Flumble wrote:...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.

[...] Chinese ideograms are far, far, far more numerous.

Sure, in absolute numbers they're more numerous, but relatively there are a lot of overlapping Latin, Greek and Cyrillic characters.
Also, unicode has enough unassigned codepoints to fit all those 'kind of' overlapping characters 10 times over, so at its core the discussion ought to be a matter of principles.

One important difference, though, is that Roman, Greek and Cyrillic characters are all elements of an alphabet with a canonical ordering, which Chinese ideograms aren't. If you used the same codepoint for Greek and Cyrillic o as for the Roman one (I'll assume the latter at least would remain untouched), the respective alphabets wouldn't be represented by a set of Unicode characters in direct sequence. That would take away the possibility of alphabetically sorting strings consisting of Greek or Cyrillic characters by simply using their Unicode codepoints, which very probably many text-processing algorithms rely on.
Mostly active on the One True Thread.
If you need help understanding what's going on there, the xkcd Time Wiki may help.

Addams didn't die! But will Addams have a place to live? You can help!

Randallspeed to all blitzers on the One True Thread!

doomvox
Posts: 10
Joined: Fri Oct 29, 2010 10:21 pm UTC

Re: 1726: "Unicode"

Postby doomvox » Tue Aug 30, 2016 3:33 am UTC

niauropsaka wrote:Chinese ideograms are far, far, far more numerous.


Actually, that used to matter a lot back when they were trying to squeeze everything into 64k characters, but now that they've upped the ceiling to 21bits, making the Han-derived languages all share codepoints isn't really necessary... it's just a bit of legacy left over from the 16bit days.
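The arithmetic behind the "64k" and "21 bits" figures can be checked in a few lines. This is only a sketch: the helper `utf16_units` and the two sample characters are illustrative choices, not anything from the thread.

```python
# Highest code point Unicode allows today.
MAX_CODE_POINT = 0x10FFFF

bits_needed = MAX_CODE_POINT.bit_length()   # bits required to hold any code point
code_space = MAX_CODE_POINT + 1             # number of code point values
planes = code_space // 0x10000              # planes of 65,536 code points each

bmp_char = "\u4e2d"         # a Han character inside the original 16-bit range
astral_char = "\U0001F600"  # an emoji outside it

def utf16_units(ch):
    """Hypothetical helper: count the 16-bit units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2
```

With these definitions `bits_needed` is 21, `planes` is 17, and the emoji needs two UTF-16 units (a surrogate pair) where the Han character needs one.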

My theory is the real reason the Japanese got upset with unicode is the silly joke of calling this the "Han Unification". Somehow, the idea of submitting to the Chinese empire doesn't go over so well over there.

Anyway, is this really what Randall was talking about? It's kind-of old news...

More recent events include a push to make the available characters gender-neutral... since there's a WOMAN WITH BUNNY EARS there has to be a MAN WITH BUNNY EARS, and so on.

niauropsaka
Posts: 82
Joined: Mon Apr 22, 2013 8:50 pm UTC

Re: 1726: "Unicode"

Postby niauropsaka » Tue Aug 30, 2016 4:04 am UTC

texttheater wrote:Seems to me the Unicode people do a pretty good job at what they do (which is not, by the way, "governing" human "language"). Any idea what Randall actually meant, concretely, by the traffic sign simile?

You put into words what bewilders me about this cartoon.

ucim
Posts: 5999
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 1726: "Unicode"

Postby ucim » Tue Aug 30, 2016 4:07 am UTC

My solution would be to make all (human) unicode characters be encoded with multiple bytes... er... "data-things" (I don't know what they are calling them nowadays).

So, "woman with bunny ears" would be {skin-color} {age} {sex} {with feature} for a total of four data-things. Then people wouldn't even have to invent emoji - they'd be built into the formula. It could be generalized further; "Irish Setter on leash" would be {breed of} {animal} {with feature}, and some standard for indicating whether there are more features could be added (high byte of 1 means it's the last feature).

Similarly, "e with an accent grave" shouldn't have been its own codepoint. It should have been a modification of the "e" codepoint, to add an accent grave, and in such a way that the accent could be disregarded (i.e. for searches and grammar shifts). And Chinese glyphs would be built up from their (morphemous) components.
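For what it's worth, Unicode does offer the "base letter plus modifier" spelling alongside the precomposed one, and normalization converts between the two. A minimal sketch with Python's standard `unicodedata` module follows; `strip_marks` is a hypothetical helper name, and dropping accents this way is a rough approach to accent-insensitive search, not a complete one.

```python
import unicodedata

precomposed = "\u00e8"  # "e with grave" as a single code point

# Canonical decomposition splits it into a base letter and a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)

# Canonical composition puts it back together.
recomposed = unicodedata.normalize("NFC", decomposed)

def strip_marks(text):
    """Hypothetical helper: decompose, then drop every combining mark."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )

plain = strip_marks("cr\u00e8me br\u00fbl\u00e9e")
```

Here `decomposed` is the two code points U+0065 and U+0300, and `plain` comes out as "creme brulee", which is the "disregard the accent for searches" behaviour described above.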

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

Brian-M
Posts: 85
Joined: Tue Jan 18, 2011 6:31 am UTC

Re: 1726: "Unicode"

Postby Brian-M » Tue Aug 30, 2016 4:35 am UTC

ucim wrote:So, "woman with bunny ears" would be {skin-color} {age} {sex} {with feature} for a total of four data-things. Then people wouldn't even have to invent emoji - they'd be built into the formula. It could be generalized further; "Irish Setter on leash" would be {breed of} {animal} {with feature}, and some standard for indicating whether there are more features could be added (high byte of 1 means it's the last feature).

They're actually starting to do that. Last year they added "skin tone" codepoints which can be used to select the skin tone of the preceding emoji.
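A rough sketch of how such a sequence is put together, assuming the waving-hand emoji as the base; `is_skin_tone_modifier` is a hypothetical helper, and the range it checks is the block of five modifier code points U+1F3FB to U+1F3FF.

```python
base = "\U0001F44B"      # waving hand
modifier = "\U0001F3FD"  # medium skin tone modifier

# The modifier is placed directly after the emoji it applies to.
sequence = base + modifier

def is_skin_tone_modifier(ch):
    """Hypothetical helper: True for the five skin tone modifier code points."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF

# Software that knows nothing about modifiers still sees two separate code points.
code_points = [f"U+{ord(ch):04X}" for ch in sequence]
```

That last line is why older software tends to draw two glyphs: to it, the sequence is simply an emoji followed by a character it does not recognise.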

This was immediately followed by accusations of racism.

One reason being that the default emoji (where no skin tone was selected) on products that supported this feature was bright yellow. People were thinking that it was intended to be making fun of Asian people. But it was just supposed to be the same color as a smiley face :) and not representative of any real-life skin tone.

Another reason it was called racist is that when people tried sending emojis of black people to Apple phones made before they started including this feature, the receiving phone would show a picture of an alien and then a white-skinned emoji. People took this to mean that Apple was calling dark-skinned people aliens. But the reality was that Apple phones represent all codepoints in the emoji region that they don't recognize with an alien symbol. The point of this is to let the user (or programmer debugging the programs) know that it's trying to show an emoji that isn't in the database. Since the older phones didn't have the skin-tone emoji in their database, the alien symbol popped up instead.

Wee Red Bird
Posts: 176
Joined: Wed Apr 24, 2013 11:50 am UTC
Location: In a tree

Re: 1726: "Unicode"

Postby Wee Red Bird » Tue Aug 30, 2016 7:07 am UTC

ucim wrote:So, "woman with bunny ears" would be {skin-color} {age} {sex} {with feature} for a total of four data-things. Then people wouldn't even have to invent emoji - they'd be built into the formula. It could be generalized further; "Irish Setter on leash" would be {breed of} {animal} {with feature}, and some standard for indicating whether there are more features could be added (high byte of 1 means it's the last feature).

I've been waiting for a colour modifier for the bird emoji as it appears a different colour on different platforms. It can be blue, red, green or even look like a pigeon. Do I look like a pigeon?

Eternal Density
Posts: 5532
Joined: Thu Oct 02, 2008 12:37 am UTC
Contact:

Re: 1726: "Unicode"

Postby Eternal Density » Tue Aug 30, 2016 11:48 am UTC

Play the game of Time! castle.chirpingmustard.com Hotdog Vending Supplier But what is this?
In the Marvel vs. DC film-making war, we're all winners.

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 2963
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: 1726: "Unicode"

Postby Soupspoon » Tue Aug 30, 2016 12:15 pm UTC

Brian-M wrote:One reason being that the default emoji (where no skin tone was selected) on products that supported this feature was bright yellow. People were thinking that it was intended to be making fun of Asian people. But it was just supposed to be the same color as a smiley face :) and not representative of any real-life skin tone.

cf. Lego (minifigs1, original 'maxifigs', 'Technifigs', and Duplo 'brick animals', to name just what I remember). I don't think I ever thought of yellow as anything other than 'skin' colour, until we actually started to get ones that weren't that colour (examples like the old minifig statue in this set, excepted), and they weren't gendered by facial features (until given gender-assumptions hair/hat/helmet) until fairly recently, too...

Improvement, or otherwise? As a white male, I never felt disenfranchised enough to care, but maybe it would have been yet another (lego)brick in the wall if I hadn't been perfectly capable of accepting that my astronauts were as male and 'caucasian' as I doubtless expected them to be and that long hair did not mean a hippy but a girl.

Now that there are racial and gendered (and emotional!) variations, what happens if you don't get a set with enough self-identifying features? Perhaps it was at least excusably 'unrepresentative' with the most basic ones...

Anyway, this is by way of comparison with emojis. Too much choice, but also too little? Shoulda stuck with ascii emoticons, eh? :)
^_^
8^/


    \\\\\
   \\\\\\\__o
___\\\\\\\'/___


1 Traditionally, but film tie-in ranges really started the trend to ethnicity/species-'appropriate' skin-tones, even whilst only expressions were being modified, and now I think they're going further. But as Technic is more my thing, at the moment (as it has been for the last three or four decades), I only really take note of the movie-themed minifigs.

Wee Red Bird
Posts: 176
Joined: Wed Apr 24, 2013 11:50 am UTC
Location: In a tree

Re: 1726: "Unicode"

Postby Wee Red Bird » Tue Aug 30, 2016 1:43 pm UTC

Soupspoon wrote:cf. Lego (minifigs1, original 'maxifigs', 'Technifigs', and Duplo 'brick animals', to name just what I remember). I don't think I ever thought of yellow as anything other than 'skin' colour, until we actually started to get ones that weren't that colour (examples like the old minifig statue in this set, excepted), and they weren't gendered by facial features (until given gender-assumptions hair/hat/helmet) until fairly recently, too...

Not forgetting the immortal line: "The Simpsons did it."

PinkShinyRose
Posts: 831
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1726: "Unicode"

Postby PinkShinyRose » Tue Aug 30, 2016 5:21 pm UTC

svenman wrote:
Flumble wrote:
niauropsaka wrote:
Flumble wrote:...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.

[...] Chinese ideograms are far, far, far more numerous.

Sure, in absolute numbers they're more numerous, but relatively there are a lot of overlapping Latin, Greek and Cyrillic characters.
Also, unicode has enough unassigned codepoints to fit all those 'kind of' overlapping characters 10 times over, so at its core the discussion ought to be a matter of principles.

One important difference, though, is that Roman, Greek and Cyrillic characters are all elements of an alphabet with a canonical ordering, which Chinese ideograms aren't. If you used the same codepoint for Greek and Cyrillic o as for the Roman one (I'll assume the latter at least would remain untouched), the respective alphabets wouldn't be represented by a set of Unicode characters in direct sequence. That would take away the possibility of alphabetically sorting strings consisting of Greek or Cyrillic characters by simply using their Unicode codepoints, which very probably many text-processing algorithms rely on.

This is only if you don't go as far as they did with cjk-languages. The analogous situation would be if there would be a single codepoint for the same character (etymologically) in each script as long as they still look somewhat similar. So a, α and а would share a codepoint. Of course they could sort cjk-characters by radical like they do western words by letter. Also, there are at least 2 codepoints for μ (or µ). And several codepoints for each Latin character with differences between codepoints in the layout (i.e. c, C, ©, C, c and ¢).
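The two codepoints for μ make a handy test case. A small sketch with Python's standard `unicodedata` module, with the characters written as escapes because they look the same in most fonts:

```python
import unicodedata

micro = "\u00b5"  # the micro sign inherited from Latin-1
mu = "\u03bc"     # the letter from the Greek block

micro_name = unicodedata.name(micro)
mu_name = unicodedata.name(mu)

# Canonical normalization keeps the two apart...
still_different = unicodedata.normalize("NFC", micro) != mu

# ...while compatibility normalization folds the micro sign into the Greek letter.
folded_together = unicodedata.normalize("NFKC", micro) == mu
```

So the duplication is real, and the standard treats it as a compatibility distinction: the two only become equal if you explicitly ask for the NFKC (or NFKD) form.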

Pfhorrest
Posts: 4412
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1726: "Unicode"

Postby Pfhorrest » Tue Aug 30, 2016 6:15 pm UTC

Wee Red Bird wrote:Not forgetting the immortal line: "The Simpsons did it."

"Remember: I'm the sweet, perfect minister's daughter... and you're just yellow trash."
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

Solra Bizna
Posts: 52
Joined: Fri Dec 04, 2015 6:44 pm UTC

Re: 1726: "Unicode"

Postby Solra Bizna » Tue Aug 30, 2016 9:41 pm UTC

Flumble wrote:
BrianX wrote:So, basically, their response to someone saying that corresponding Chinese and Japanese characters should be treated as separate character codes is "Not our problem. Don't use a Japanese font on a Chinese document and this won't be an issue."

...which is a bit weird, considering that closer to home we do get separate codepoints for o, ο and о. Even worse, o in particular is the same glyph in at least 3 alphabets.

Those are separate characters in Unicode because they were separate characters in one or more character sets to which Unicode wished to preserve Round-Trip Compatibility. (This was a much bigger concern back when Unicode was new.)

svenman wrote:One important difference, though, is that Roman, Greek and Cyrillic characters are all elements of an alphabet with a canonical ordering, which Chinese ideograms aren't. If you used the same codepoint for Greek and Cyrillic o as for the Roman one (I'll assume the latter at least would remain untouched), the respective alphabets wouldn't be represented by a set of Unicode characters in direct sequence. That would take away the possibility of alphabetically sorting strings consisting of Greek or Cyrillic characters by simply using their Unicode codepoints, which very probably many text-processing algorithms rely on.

Chinese ideograms have a canonical ordering too. In fact, they can have several, depending on which language you're writing.

Sorting using codepoint order is pretty much only correct for English text, and even then is incorrect in obscure cases. It can also result in two semantically identical pieces of text sorting differently because they're in different Normal Forms. Any application that does non-trivial Unicode text processing ought to use language / OS libraries to do it in a user locale friendly way.
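Both failure modes are easy to reproduce. A minimal sketch in Python, whose default string sort compares code points; the word list is made up for illustration.

```python
import unicodedata

# The last word starts with a precomposed e-acute (U+00E9).
words = ["zebra", "Zulu", "apple", "\u00e9clair"]

# Plain sorted() compares code points: capitals first, accented letters after "z".
by_code_point = sorted(words)

nfc = "\u00e9clair"   # e-acute as one code point
nfd = "e\u0301clair"  # "e" followed by a combining acute accent

# The two spellings are the same text once normalized...
same_text = unicodedata.normalize("NFC", nfd) == nfc

# ...yet they land on opposite sides of "f" when sorted by code point.
sorted_with_nfc = sorted(["f", nfc])
sorted_with_nfd = sorted(["f", nfd])
```

`by_code_point` puts "Zulu" first and the accented word last, and the two final lists disagree about whether the accented word comes before "f". Python's standard `locale.strxfrm` is one hook for locale-aware ordering, though what it returns depends on the locales installed on the system.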

niauropsaka wrote:
texttheater wrote:Seems to me the Unicode people do a pretty good job at what they do (which is not, by the way, "governing" human "language"). Any idea what Randall actually meant, concretely, by the traffic sign simile?

You put into words what bewilders me about this cartoon.

As someone who was on the losing side of the Emoji Inclusion War*, it resonated with my belief that emoji should've gone in the Private Use Area and been standardized by an outside body, since they are specifically outside Unicode's scope and that is what the Private Use Area is for! Now that they're included, the standard is ballooning in a way that makes us Anti-Emojists very upset.

(*Not that I had, or have, any actual influence on Unicode's direction.)
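For reference, the Private Use Area in the Basic Multilingual Plane runs from U+E000 to U+F8FF. A quick sketch of how Python's `unicodedata` module reports such a code point; the constant names are illustrative.

```python
import unicodedata

PUA_START = 0xE000
PUA_END = 0xF8FF

# Number of private-use code points in the Basic Multilingual Plane.
pua_size = PUA_END - PUA_START + 1

sample = chr(PUA_START)

# General category "Co" means "Other, private use".
category = unicodedata.category(sample)

# Private-use code points have no standard name; ask for a default instead of an error.
name = unicodedata.name(sample, None)
```

The category is "Co" and there is no name to look up, which is the point: the standard assigns these code points no meaning, so whoever uses them has to agree on one privately.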

doomvox wrote:What I'd really like to know though is why there's no system of hints you can embed in the text to let someone know if you're supposed to use, say, a Japanese font or a Chinese font with it... there used to be a way to embed a locale, but that was deprecated with Unicode 5.0.

Unicode does not "like" having such metadata in plain text for a few reasons.

For one thing, there is then no obvious place to draw the line between "plain text" and "rich text". For example, in some sense, the various ways of typesetting Roman text are no more or less different than the various language-specific ways of typesetting Chinese characters. Should there be an Italic metacharacter? A Fraktur metacharacter? A small-caps metacharacter? And how do these metacharacters interact with rich metadata provided by the container?

For another, the standard deliberately avoids being "modal" wherever possible. Many, many issues begin to come into play when you have to search arbitrarily far backward in a text stream in order to correctly interpret a given character. Witness the horror and destruction that has been unleashed upon many a web-forum by U+202E RIGHT-TO-LEFT OVERRIDE.
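A sketch of what that character looks like from code, and one blunt way a forum might defend itself. The set `BIDI_CONTROLS` and the helper `strip_bidi_controls` are hypothetical names, and removing these characters outright is an assumption made for illustration; it would also damage text that uses them legitimately.

```python
import unicodedata

RLO = "\u202e"

rlo_name = unicodedata.name(RLO)
rlo_category = unicodedata.category(RLO)     # "Cf" is an invisible format character
rlo_class = unicodedata.bidirectional(RLO)   # its bidirectional class

# Bidirectional classes of the explicit embedding, override and isolate controls.
BIDI_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def strip_bidi_controls(text):
    """Hypothetical helper: drop explicit directional formatting characters."""
    return "".join(
        ch for ch in text
        if unicodedata.bidirectional(ch) not in BIDI_CONTROLS
    )

cleaned = strip_bidi_controls("abc" + RLO + "def")
```

Because the override is invisible and affects the text that follows it, a renderer has to look back for it, which is the kind of modal behaviour described above.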

doomvox wrote:My theory is the real reason the Japanese got upset with unicode is the silly joke of calling this the "Han Unification". Somehow, the idea of submitting to the Chinese empire doesn't go over so well over there.

The whole issue of Han Unification was incredibly politically charged. None of the countries involved were happy with it. Delegates sometimes acted like they were willing to go to war over what seemed, to a Western observer, to be trivial points. (Not that their respective countries would likely have gone along with them...)

sakeniwefu
Posts: 170
Joined: Sun May 11, 2008 8:36 pm UTC

Re: 1726: "Unicode"

Postby sakeniwefu » Tue Aug 30, 2016 10:59 pm UTC

The Unicode Consortium doesn't try to do anything.
Attempting requires rational thought.
Anyone who has any respect for that organization or their "work", just doesn't understand the issues involved.
Unicode should be discarded and replaced with an encoding that actually can encode all human languages minus the emojis because screw that. Unify them with Latin 'G','F' and 'Y' and change your font as appropriate or encode them in UTF-0.

commodorejohn
Posts: 1033
Joined: Thu Dec 10, 2009 6:21 pm UTC
Location: Placerville, CA
Contact:

Re: 1726: "Unicode"

Postby commodorejohn » Tue Aug 30, 2016 11:28 pm UTC

Au contraire, everything should be expressed in plain ASCII!

(Code page 437 may be used for emoticons as an emergency measure.)
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
- Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 2963
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: 1726: "Unicode"

Postby Soupspoon » Wed Aug 31, 2016 1:56 am UTC

commodorejohn wrote:(Code page 437 may be used for emoticons as an emergency measure.)
Or to represent "Cheesemaker with guts hanging out running from a giant badger", of course.

ps.02
Posts: 378
Joined: Fri Apr 05, 2013 8:02 pm UTC

Re: 1726: "Unicode"

Postby ps.02 » Wed Aug 31, 2016 7:07 am UTC

Wee Red Bird wrote:Not forgetting the immortal line: "The Simpsons did it."

Can we now nest this joke one deep, by referring to any act of quoting that line as a moment of "South Park did it"?

texttheater
Posts: 4
Joined: Wed Jul 03, 2013 8:58 am UTC

Re: 1726: "Unicode"

Postby texttheater » Wed Aug 31, 2016 9:20 am UTC

sakeniwefu wrote:The Unicode Consortium doesn't try to do anything.
Attempting requires rational thought.
Anyone who has any respect for that organization or their "work", just doesn't understand the issues involved.
Unicode should be discarded and replaced with an encoding that actually can encode all human languages minus the emojis because screw that. Unify them with Latin 'G','F' and 'Y' and change your font as appropriate or encode them in UTF-0.


Let me answer that with a quote from a great essay from The Grumpy Programmer:

I do, you know, find Unicode to be truly frustrating. For a long time I've wanted to write a long loud flame about everything I dislike about Unicode. But, I can't. You see, every time I run across something that frustrates me about Unicode I make the mistake of researching it. If you want to really hate something you have to completely lack the intellectual ability, and honesty, to research that hateful thing and try to understand it. If you do the research, you just might find out that there are good reasons for what you hate and that you could not have done better. Jeez, that is frustrating.

Solra Bizna
Posts: 52
Joined: Fri Dec 04, 2015 6:44 pm UTC

Re: 1726: "Unicode"

Postby Solra Bizna » Wed Aug 31, 2016 4:25 pm UTC

texttheater wrote:Let me answer that with a quote from a great essay from The Grumpy Programmer:

I do, you know, find Unicode to be truly frustrating. For a long time I've wanted to write a long loud flame about everything I dislike about Unicode. But, I can't. You see, every time I run across something that frustrates me about Unicode I make the mistake of researching it. If you want to really hate something you have to completely lack the intellectual ability, and honesty, to research that hateful thing and try to understand it. If you do the research, you just might find out that there are good reasons for what you hate and that you could not have done better. Jeez, that is frustrating.

I like that essay. Though his explanation for the presence of combining characters is incorrect. Encoding every possible combined character is a lot more feasible in the kinds of languages he learned in school than it is in some more "exotic" languages. Let's not forget how rapidly factorials increase in magnitude...</pedantmode>

As a recovering cycle hoarder myself I can definitely sympathize with his perspective.

somitomi
Posts: 526
Joined: Fri Nov 06, 2015 11:21 pm UTC
Location: can be found in Hungary
Contact:

Re: 1726: "Unicode"

Postby somitomi » Thu Sep 01, 2016 9:52 am UTC

Solra Bizna wrote:As someone who was on the losing side of the Emoji Inclusion War*, it resonated with my belief that emoji should've gone in the Private Use Area and been standardized by an outside body, since they are specifically outside Unicode's scope and that is what the Private Use Area is for! Now that they're included, the standard is ballooning in a way that makes us Anti-Emojists very upset.

(*Not that I had, or have, any actual influence on Unicode's direction.)

Even though I already knew emojis are part of Unicode, it didn't occur to me how bewilderingly insanely stupid I find it, and how much I wish it weren't. I don't even get why "emoji" replaced the old "smiley" as a word, considering the most basic emojis are smileys (smilies?). For Bob's sake, they aren't used to represent text, they're just a wacky thing used exclusively in messengers and chatrooms.
—◯-◯

Keyman
Posts: 261
Joined: Thu Jun 19, 2014 1:56 pm UTC

Re: 1726: "Unicode"

Postby Keyman » Thu Sep 01, 2016 1:20 pm UTC

texttheater wrote:Let me answer that with a quote from a great essay from The Grumpy Programmer:

I do, you know, find Unicode to be truly frustrating. For a long time I've wanted to write a long loud flame about everything I dislike about Unicode. But, I can't. You see, every time I run across something that frustrates me about Unicode I make the mistake of researching it. If you want to really hate something you have to completely lack the intellectual ability, and honesty, to research that hateful thing and try to understand it. If you do the research, you just might find out that there are good reasons for what you hate and that you could not have done better. Jeez, that is frustrating.

Ender?? Is that you??
A childhood spent walking while reading books has prepared me unexpectedly well for today's world.

orthogon
Posts: 2813
Joined: Thu May 17, 2012 7:52 am UTC
Location: The Airy 1830 ellipsoid

Re: 1726: "Unicode"

Postby orthogon » Thu Sep 01, 2016 5:03 pm UTC

texttheater wrote:Let me answer that with a quote from a great essay from The Grumpy Programmer:

[...] If you do the research, you just might find out that there are good reasons for what you hate and that you could not have done better. [...]


The older I get, the more I think that's good advice for life in general. It's not always the case, but quite often you'll find that things are the way they are because it turned out to be the least worst way of doing it.
xtifr wrote:... and orthogon merely sounds undecided.

