Author Topic: Extended characters?  (Read 11395 times)

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Extended characters?
« on: August 26, 2016, 04:16:13 pm »
Did... I just see a mu?  :scared: :scared:

µ ?

( ?° ?? ?°)

Unicode?  Don't push my luck? ;D


Well... degrees and mu, it's a start I guess  :horse:


Tim
« Last Edit: August 26, 2016, 04:24:54 pm by T3sl4co1l »
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline Simon

  • Global Moderator
  • *****
  • Posts: 18065
  • Country: gb
  • Did that just blow up? No? might work after all !!
    • Simon's Electronics
Re: Extended characters?
« Reply #1 on: August 26, 2016, 04:22:43 pm »
That's odd: in the notification email some of those characters looked like a face, but not in the post.
 

Offline Simon

Re: Extended characters?
« Reply #2 on: August 26, 2016, 04:23:35 pm »
And the ohm is a question mark. Maybe because I am forcing a particular font on the browser.
 

Offline Galaxyrise

  • Frequent Contributor
  • **
  • Posts: 531
  • Country: us
Re: Extended characters?
« Reply #3 on: August 26, 2016, 04:26:08 pm »
µ has worked forever, I think.  I first noticed it with the µCurrent posts (like this one.) I don't know why µ is special... it's up in the same character range as ? (capital omega).
I am but an egg
 

Offline Simon

Re: Extended characters?
« Reply #4 on: August 26, 2016, 04:27:18 pm »
Could it be something to do with what is available in the ASCII set? Or simply that fonts only go so far and don't cover everything.
 

Offline Aodhan145

  • Frequent Contributor
  • **
  • Posts: 403
  • Country: 00
Re: Extended characters?
« Reply #5 on: August 26, 2016, 04:29:45 pm »
? ohm seems to work for me.
[ahaha never mind it turned into a question mark]
 

Offline Galaxyrise

Re: Extended characters?
« Reply #6 on: August 26, 2016, 04:30:58 pm »
? ohm seems to work for me.
[ahaha never mind it turned into a question mark]
Yep! It works when you're typing up the post, but gets elided by the forum software. Many people have fallen into that trap :)
 

Offline Simon

Re: Extended characters?
« Reply #7 on: August 26, 2016, 04:47:29 pm »
Yet it shows in the notification email.
 

Offline bitseeker

  • Super Contributor
  • ***
  • Posts: 9057
  • Country: us
  • Lots of engineer-tweakable parts inside!
Re: Extended characters?
« Reply #8 on: August 27, 2016, 12:04:51 am »
The page announces that it is outputting UTF-8

Quote
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

However, the data itself may be getting munged by the forum (e.g., converted to ASCII, Windows-1252, etc.).
TEA is the way. | TEA Time channel
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3680
  • Country: us
Re: Extended characters?
« Reply #9 on: August 27, 2016, 01:58:58 am »
The page announces that it is outputting UTF-8

Quote
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

However, the data itself may be getting munged by the forum (e.g., converted to ASCII, Windows-1252, etc.).

The forum loads a script that tries to convert anything, Lord to anything, Lord:

Code:
String.prototype.php_to8bit = function()
{
    if (smf_charset == 'UTF-8')
    {
        var n, sReturn = '';
        for (var i = 0, iTextLen = this.length; i < iTextLen; i++)
        {
            n = this.charCodeAt(i);
            if (n < 128)
                sReturn += String.fromCharCode(n)
            else if (n < 2048)
                sReturn += String.fromCharCode(192 | n>>6) + String.fromCharCode(128 | n & 63);
            else if (n < 65536)
                sReturn += String.fromCharCode(224 | n>>12) + String.fromCharCode(128 | n>>6 & 63) + String.fromCharCode(128 | n & 63);
            else
                sReturn += String.fromCharCode(240 | n>>18) + String.fromCharCode(128 | n>>12 & 63) + String.fromCharCode(128 | n>>6 & 63) + String.fromCharCode(128 | n & 63);
        }
        return sReturn;
    } else if (this.oCharsetConversion.from.length == 0)
    {
        switch (smf_charset)
        {
        case'ISO-8859-1':
            this.oCharsetConversion = {
                from: '\xa0-\xff',
                to: '\xa0-\xff'
            };
            break;
        case'ISO-8859-2':
            this.oCharsetConversion = {
                from: '\xa0\u0104\u02d8\u0141\xa4\u013d\u015a\xa7\xa8\u0160\u015e\u0164\u0179\xad\u017d\u017b\xb0\u0105\u02db\u0142\xb4\u013e\u015b\u02c7\xb8\u0161\u015f\u0165\u017a\u02dd\u017e\u017c\u0154\xc1\xc2\u0102\xc4\u0139\u0106\xc7\u010c\xc9\u0118\xcb\u011a\xcd\xce\u010e\u0110\u0143\u0147\xd3\xd4\u0150\xd6\xd7\u0158\u016e\xda\u0170\xdc\xdd\u0162\xdf\u0155\xe1\xe2\u0103\xe4\u013a\u0107\xe7\u010d\xe9\u0119\xeb\u011b\xed\xee\u010f\u0111\u0144\u0148\xf3\xf4\u0151\xf6\xf7\u0159\u016f\xfa\u0171\xfc\xfd\u0163\u02d9',
                to: '\xa0-\xff'
            };
            break;
        case'ISO-8859-5':
            this.oCharsetConversion = {
                from: '\xa0\u0401-\u040c\xad\u040e-\u044f\u2116\u0451-\u045c\xa7\u045e\u045f',
                to: '\xa0-\xff'
            };
            break;
        case'ISO-8859-9':
            this.oCharsetConversion = {
                from: '\xa0-\xcf\u011e\xd1-\xdc\u0130\u015e\xdf-\xef\u011f\xf1-\xfc\u0131\u015f\xff',
                to: '\xa0-\xff'
            };
            break;
        case'ISO-8859-15':
            this.oCharsetConversion = {
                from: '\xa0-\xa3\u20ac\xa5\u0160\xa7\u0161\xa9-\xb3\u017d\xb5-\xb7\u017e\xb9-\xbb\u0152\u0153\u0178\xbf-\xff',
                to: '\xa0-\xff'
            };
            break;
        case'tis-620':
            this.oCharsetConversion = {
                from: '\u20ac\u2026\u2018\u2019\u201c\u201d\u2022\u2013\u2014\xa0\u0e01-\u0e3a\u0e3f-\u0e5b',
                to: '\x80\x85\x91-\x97\xa0-\xda\xdf-\xfb'
            };
            break;
        case'windows-1251':
            this.oCharsetConversion = {
                from: '\u0402\u0403\u201a\u0453\u201e\u2026\u2020\u2021\u20ac\u2030\u0409\u2039\u040a\u040c\u040b\u040f\u0452\u2018\u2019\u201c\u201d\u2022\u2013\u2014\u2122\u0459\u203a\u045a\u045c\u045b\u045f\xa0\u040e\u045e\u0408\xa4\u0490\xa6\xa7\u0401\xa9\u0404\xab-\xae\u0407\xb0\xb1\u0406\u0456\u0491\xb5-\xb7\u0451\u2116\u0454\xbb\u0458\u0405\u0455\u0457\u0410-\u044f',
                to: '\x80-\x97\x99-\xff'
            };
            break;
        case'windows-1253':
            this.oCharsetConversion = {
                from: '\u20ac\u201a\u0192\u201e\u2026\u2020\u2021\u2030\u2039\u2018\u2019\u201c\u201d\u2022\u2013\u2014\u2122\u203a\xa0\u0385\u0386\xa3-\xa9\xab-\xae\u2015\xb0-\xb3\u0384\xb5-\xb7\u0388-\u038a\xbb\u038c\xbd\u038e-\u03a1\u03a3-\u03ce',
                to: '\x80\x82-\x87\x89\x8b\x91-\x97\x99\x9b\xa0-\xa9\xab-\xd1\xd3-\xfe'
            };
            break;
        case'windows-1255':
            this.oCharsetConversion = {
                from: '\u20ac\u201a\u0192\u201e\u2026\u2020\u2021\u02c6\u2030\u2039\u2018\u2019\u201c\u201d\u2022\u2013\u2014\u02dc\u2122\u203a\xa0-\xa3\u20aa\xa5-\xa9\xd7\xab-\xb9\xf7\xbb-\xbf\u05b0-\u05b9\u05bb-\u05c3\u05f0-\u05f4\u05d0-\u05ea\u200e\u200f',
                to: '\x80\x82-\x89\x8b\x91-\x99\x9b\xa0-\xc9\xcb-\xd8\xe0-\xfa\xfd\xfe'
            };
            break;
        case'windows-1256':
            this.oCharsetConversion = {
                from: '\u20ac\u067e\u201a\u0192\u201e\u2026\u2020\u2021\u02c6\u2030\u0679\u2039\u0152\u0686\u0698\u0688\u06af\u2018\u2019\u201c\u201d\u2022\u2013\u2014\u06a9\u2122\u0691\u203a\u0153\u200c\u200d\u06ba\xa0\u060c\xa2-\xa9\u06be\xab-\xb9\u061b\xbb-\xbe\u061f\u06c1\u0621-\u0636\xd7\u0637-\u063a\u0640-\u0643\xe0\u0644\xe2\u0645-\u0648\xe7-\xeb\u0649\u064a\xee\xef\u064b-\u064e\xf4\u064f\u0650\xf7\u0651\xf9\u0652\xfb\xfc\u200e\u200f\u06d2',
                to: '\x80-\xff'
            };
            break;
        default:
            this.oCharsetConversion = {
                from: '',
                to: ''
            };
            break;
        }
        var funcExpandString = function(sSearch) {
            var sInsert = '';
            for (var i = sSearch.charCodeAt(0), n = sSearch.charCodeAt(2); i <= n; i++)
                sInsert += String.fromCharCode(i);
            return sInsert;
        };
        this.oCharsetConversion.from = this.oCharsetConversion.from.replace(/.\-./g, funcExpandString);
        this.oCharsetConversion.to = this.oCharsetConversion.to.replace(/.\-./g, funcExpandString);
    }
    var sReturn = '', iOffsetFrom = 0;
    for (var i = 0, n = this.length; i < n; i++)
    {
        iOffsetFrom = this.oCharsetConversion.from.indexOf(this.charAt(i));
        sReturn += iOffsetFrom>-1 ? this.oCharsetConversion.to.charAt(iOffsetFrom) : (this.charCodeAt(i) > 127 ? '&#' + this.charCodeAt(i) + ';' : this.charAt(i));
    }
    return sReturn
}

Lord knows what they were thinking.
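
For reference, the `smf_charset == 'UTF-8'` branch of that script is a hand-rolled UTF-8 encoder. Here is a minimal sketch of what it computes for the two characters discussed in this thread, cross-checked against the standard `TextEncoder`. The helper names `utf8Bytes` and `hex` are made up for illustration and are not part of SMF:

```javascript
// Hypothetical helper (not SMF code): UTF-8 bytes for each UTF-16 code unit,
// using the same bit arithmetic as the first three branches of php_to8bit.
function utf8Bytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const n = text.charCodeAt(i);
    if (n < 128) {
      bytes.push(n);                                // 1 byte: plain ASCII
    } else if (n < 2048) {
      bytes.push(192 | (n >> 6), 128 | (n & 63));   // 2 bytes
    } else {
      bytes.push(224 | (n >> 12), 128 | ((n >> 6) & 63), 128 | (n & 63)); // 3 bytes
    }
  }
  return bytes;
}

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(hex(utf8Bytes('\u00b5'))); // micro sign
console.log(hex(utf8Bytes('\u03a9'))); // Greek capital omega

// Cross-check against the standard encoder (BMP characters only).
const expected = hex(Array.from(new TextEncoder().encode('\u00b5\u03a9')));
const same = hex(utf8Bytes('\u00b5\u03a9')) === expected;
console.log(same);
```

Both characters encode without trouble at this step, so this branch on its own does not look like the place where omega turns into a question mark.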
 

Offline bitseeker

Re: Extended characters?
« Reply #10 on: August 27, 2016, 03:42:29 am »
Bizarre. It's UTF-8! It's made to handle lots of characters. :palm:
 

Offline Galaxyrise

Re: Extended characters?
« Reply #11 on: August 27, 2016, 03:51:40 am »
Lord knows what they were thinking.
The SMF documentation on converting a database to UTF-8 talks about the database being in a code page if it's not in UTF-8 (apparently old enough SMF wasn't utf-8?)  So presumably that's related.

The only character sets in that script which I see support ? (\u03bc) are UTF-8 and windows-1253, both of which support upper-case Omega (\u03a9).  Since ? makes it through but capital omega gets lost, the data loss is somewhere else...
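
One detail that may explain the difference: the character that survives is probably the micro sign (U+00B5), not the Greek letter mu (U+03BC). They look alike but are separate code points, and only the micro sign fits in a single-byte Latin-1 style code page. A small sketch of the check, assuming the database holds one byte per character; the `fitsInOneByte` helper is hypothetical:

```javascript
// Hypothetical check: does every character fit in one byte (code point <= 0xFF)?
const fitsInOneByte = (text) =>
  Array.from(text).every((ch) => ch.codePointAt(0) <= 0xff);

const samples = {
  'micro sign': '\u00b5',
  'Greek small mu': '\u03bc',
  'Greek capital omega': '\u03a9',
  'ohm sign': '\u2126',
};

for (const [name, ch] of Object.entries(samples)) {
  const codePoint = 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(name, codePoint, fitsInOneByte(ch) ? 'fits' : 'does not fit');
}
```

Only the micro sign passes, which would be consistent with µ surviving while both the Greek mu and omega are lost.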
 

Offline apelly

  • Supporter
  • ****
  • Posts: 1061
  • Country: nz
  • Probe
Re: Extended characters?
« Reply #12 on: August 27, 2016, 03:57:38 am »
LaTeX is installed on the board. You can \$\Omega\$ all you like, and it will display correctly.
 
The following users thanked this post: Galaxyrise

Offline tautech

  • Super Contributor
  • ***
  • Posts: 29485
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: Extended characters?
« Reply #13 on: August 27, 2016, 04:33:21 am »
? ohm seems to work for me.
[ahaha never mind it turned into a question mark]
Yep! It works when you're typing up the post, but gets elided by the forum software. Many people have fallen into that trap :)
What's the trick with ohms then?
Dave's trying to sort it here:
https://www.eevblog.com/forum/chat/can-we-have-an-omega-smiley/
Avid Rabid Hobbyist.
Some stuff seen @ Siglent HQ cannot be shared.
 

Offline helius

Re: Extended characters?
« Reply #14 on: August 27, 2016, 04:49:28 am »
What's the trick with ohms then?
The trick is to use two dollar signs (each preceded by a backslash), and put \Omega between them. This makes it render using MathJax, which has all of the Greek and other math symbols (and is processed in the browser, so SMF scripts aren't involved).
You can see more commands here: http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference

\$ \Omega \$
\$ \mathfrak {Judas \; Priest} \$

Unfortunately, it causes the page to resize after rendering in an annoying way.
« Last Edit: August 27, 2016, 05:02:32 am by helius »
 
The following users thanked this post: DiligentMinds.com, tautech

Offline IanB

  • Super Contributor
  • ***
  • Posts: 12406
  • Country: us
Re: Extended characters?
« Reply #15 on: August 27, 2016, 06:43:34 am »
Here is a list of special characters which survive in posts without being converted to question marks:

£ ¥ § © ® « » ¬ ° ± ² ³ µ ¼ ½ ¾ ÷
 

Offline IanB

Re: Extended characters?
« Reply #16 on: August 27, 2016, 06:48:40 am »
These, I think, will get turned into "?", but let me try anyway:

™ ? ? ? ? ? ? ? ? ?
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 38720
  • Country: au
    • EEVblog
Re: Extended characters?
« Reply #17 on: August 27, 2016, 07:04:02 am »
The SMF documentation on converting a database to UTF-8 talks about the database being in a code page if it's not in UTF-8 (apparently old enough SMF wasn't utf-8?)  So presumably that's related.

BTW, I have asked gnif (resident forum penguin master) about converting to UTF-8 and he does not recommend it: it never goes smoothly, and we have a massive back database of posts that it could mess up.
 

Offline Simon

Re: Extended characters?
« Reply #18 on: August 27, 2016, 07:12:06 am »
Here is a list of special characters which survive in posts without being converted to question marks:

£ ¥ § © ® « » ¬ ° ± ² ³ µ ¼ ½ ¾ ÷


I'm pretty sure those are all in the ASCII set.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
Re: Extended characters?
« Reply #19 on: August 27, 2016, 07:25:16 am »
Here is a list of special characters which survive in posts without being converted to question marks:

£ ¥ § © ® « » ¬ ° ± ² ³ µ ¼ ½ ¾ ÷


I'm pretty sure those are all in the ASCII set.

No, they are not. ASCII is 7-bit only, so it has just 128 characters, including some "invisible" control characters.

I guess you are thinking of "extended" ASCII, a family of unofficial 8-bit extensions.
Because there are many different extended ASCII tables, there's no guarantee your special character will look the same
on another computer.

Quote
The term extended ASCII (EASCII or high ASCII) refers to eight-bit or larger character encodings that include
the standard seven-bit ASCII characters, plus additional characters. The use of the term is sometimes criticized,
because it can be mistakenly interpreted to mean that the ASCII standard has been updated to include more
than 128 characters or that the term unambiguously identifies a single encoding, both of which are not the case.

https://en.wikipedia.org/wiki/Extended_ASCII
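
This is easy to check directly. The sketch below takes the surviving characters from IanB's list and tests each against the 7-bit ASCII range (0 to 127) and the 8-bit range (up to 255). The `classify` function is made up for this example:

```javascript
// The characters IanB reported as surviving, written as escapes so the
// example does not depend on how this file itself is encoded.
const survivors =
  '\u00a3\u00a5\u00a7\u00a9\u00ae\u00ab\u00bb\u00ac\u00b0' +
  '\u00b1\u00b2\u00b3\u00b5\u00bc\u00bd\u00be\u00f7';

// Hypothetical classifier by code point range.
function classify(ch) {
  const n = ch.codePointAt(0);
  if (n <= 0x7f) return 'ascii';
  if (n <= 0xff) return '8-bit';
  return 'beyond 8-bit';
}

const counts = {};
for (const ch of survivors) {
  const kind = classify(ch);
  counts[kind] = (counts[kind] || 0) + 1;
}
console.log(counts);
```

None of the survivors is ASCII, but every one of them fits in eight bits, which is what you would expect if the server stores posts in a single-byte code page.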



 

Offline Simon

Re: Extended characters?
« Reply #20 on: August 27, 2016, 07:34:18 am »
oh I see
 

Offline Galaxyrise

Re: Extended characters?
« Reply #21 on: August 27, 2016, 10:26:47 pm »
Here is a list of special characters which survive in posts without being converted to question marks:

£ ¥ § © ® « » ¬ ° ± ² ³ µ ¼ ½ ¾ ÷


I'm pretty sure those are all in the ASCII set.

Presuming the database is in ISO8859-1, then the extended characters should be:

  ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª «   ¬ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ


ISO8859-1 leaves 33 undefined from 127-159, which must be where ™ ends up. So I sent myself a message with some other similar characters, and also found:

€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜  ™ š › œ ž Ÿ

 I wonder if these are part of why converting to UTF-8 is a pain. 
« Last Edit: August 27, 2016, 10:49:22 pm by Galaxyrise »
 

Offline helius

Re: Extended characters?
« Reply #22 on: August 28, 2016, 12:05:01 am »
Quote
ISO8859-1 leaves 33 undefined from 127-159, which must be where ™ ends up. So I sent myself a message with some other similar characters, and also found:

Those are Windows-1252 characters.
Oddly, I can't use "Insert Quote" to quote your message; some script gets wedged trying to translate the characters in it.
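
To make that concrete: ISO-8859-1 and Windows-1252 agree on bytes 0xA0 to 0xFF, but Windows-1252 assigns printable characters to most of 0x80 to 0x9F. A short sketch with a few of those assignments, written out by hand as a lookup table (the names `CP1252_HIGH` and `cp1252Byte` are invented here, and the table is deliberately partial):

```javascript
// A few Windows-1252 assignments in the 0x80-0x9F range: byte -> Unicode code point.
const CP1252_HIGH = new Map([
  [0x80, 0x20ac], // euro sign
  [0x85, 0x2026], // horizontal ellipsis
  [0x91, 0x2018], // left single quotation mark
  [0x92, 0x2019], // right single quotation mark
  [0x93, 0x201c], // left double quotation mark
  [0x94, 0x201d], // right double quotation mark
  [0x96, 0x2013], // en dash
  [0x97, 0x2014], // em dash
  [0x99, 0x2122], // trade mark sign
]);

// Hypothetical helper: the Windows-1252 byte for a character, if it is in the table above.
function cp1252Byte(ch) {
  const target = ch.codePointAt(0);
  for (const [byte, codePoint] of CP1252_HIGH) {
    if (codePoint === target) return byte;
  }
  return undefined;
}

console.log(cp1252Byte('\u2122').toString(16)); // trade mark sign
console.log(cp1252Byte('\u03a9'));              // omega: not in this table, nor in Windows-1252
```

So a trade mark sign has a one-byte home in Windows-1252, while omega has none, which matches what people are seeing in posts.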
 

Offline bson

  • Supporter
  • ****
  • Posts: 2465
  • Country: us
Re: Extended characters?
« Reply #23 on: August 28, 2016, 12:27:14 am »
The page announces that it is outputting UTF-8

Yeah, but it gets stored in a database, and mysql and a lot of forum software have very poor multibyte support with unpredictable problems with things like strings that end mid-sequence or use invalid/undefined codes.  Apart from crashing or producing unpredictable results it's also an SQL injection attack vector.  So it's customary to scrub text before storing it.  That's why it can look correct in a forum preview, but then when you post it gets scrubbed when committed to the db, and when output again you get the scrubbed version.
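
A rough illustration of that preview-versus-stored difference, assuming a scrubber that keeps only characters representable in one byte and replaces everything else with a question mark. This is a guess at the general shape, not SMF's actual code, and `scrubForStorage` is a made-up name. The real mapping on this forum appears to be closer to Windows-1252, so treat the cutoff at U+00FF as a simplification:

```javascript
// Hypothetical scrubber: any character above U+00FF becomes '?'.
function scrubForStorage(text) {
  return Array.from(text, (ch) => (ch.codePointAt(0) <= 0xff ? ch : '?')).join('');
}

const typed = '10 k\u03a9, 5 \u00b5A';   // what the user types and sees in the preview
const stored = scrubForStorage(typed);    // what comes back once the post is saved
console.log(typed);
console.log(stored);
```

The preview shows the text as typed, because the browser never sends it through the scrubber; only the saved copy has lost the omega.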
 
The following users thanked this post: Brumby, newbrain

Offline Len

  • Frequent Contributor
  • **
  • Posts: 552
  • Country: ca
Re: Extended characters?
« Reply #24 on: August 28, 2016, 03:52:57 pm »
Does that mean forum software typically does not cater for multibyte code pages? What about SMF in particular? How long will it take to get mysql to properly support multi-byte languages?

Of course MySQL (recent versions) supports Unicode characters for all languages. But, of course, it also allows you to not use Unicode if you choose. The same is true of most other software (operating systems, web browsers, forums, etc.) It's easy to design a system that fails to properly support multi-lingual text from end to end.

I don't know about the SMF forum software in particular. It might just be a configuration problem in SMF or MySQL, or it might not be possible for an omega character to survive all the way from your keyboard to my screen.
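
One common way the chain breaks is a mismatch between the encoding used to write the bytes and the one used to read them back. A small sketch, not tied to SMF or MySQL specifically, that encodes an omega as UTF-8 and then misreads the bytes as Latin-1 (variable names are illustrative):

```javascript
// Encode as UTF-8, the way a browser would submit the text.
const utf8 = new TextEncoder().encode('\u03a9'); // Greek capital omega

// Misread each byte as one Latin-1 character (what a mislabeled layer would do).
const misread = Array.from(utf8, (b) => String.fromCharCode(b)).join('');

// Read the same bytes back correctly as UTF-8.
const roundTrip = new TextDecoder('utf-8').decode(utf8);

console.log(Array.from(utf8, (b) => b.toString(16)).join(' '));
console.log(misread);
console.log(roundTrip === '\u03a9');
```

The bytes are identical in both cases; only the label attached to them differs, and every layer between keyboard and screen has to agree on that label.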
DIY Eurorack Synth: https://lenp.net/synth/
 

