
Cannot save Japanese characters with ISO-8859-1

Viewing 9 posts - 1 through 9 (of 9 total)
  • #218831 Reply

    Hello,

    I’ve just loaded a document with the Content-type “text/html; charset=ISO-8859-1”. It contained the following code:

    “Japanese: &#23470;”

    During loading, the encoded kanji were replaced with the symbol 宮.

    Then I tried to save. MyEclipse warned me that the symbols couldn’t be converted to ISO-8859-1 (which is not true; you just have to escape them). So I changed the encoding in the Content-type to UTF-8. Now MyEclipse would save, but in the resulting file the kanji symbols were replaced by “?”.

    Suggestions:

    1. When a symbol can’t be expressed in a certain encoding, convert it to the escaped form (&#…;)

    2. Leave escaped symbols in the input alone
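The two suggestions above can be sketched in a few lines of Java. This is only an illustration of the idea, not how MyEclipse works internally; the class name `NcrEscaper` and the method `escapeUnencodable` are hypothetical. The sketch asks the target charset whether it can encode each code point and falls back to a decimal numeric character reference when it cannot. References that are already escaped consist of plain ASCII, so they pass through unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class NcrEscaper {

    /** Replaces every code point the target charset cannot encode with "&#N;". */
    public static String escapeUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String chunk = new String(Character.toChars(codePoint));
            if (encoder.canEncode(chunk)) {
                out.append(chunk);                                  // representable: keep as-is
            } else {
                out.append("&#").append(codePoint).append(';');    // not representable: escape
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One raw kanji plus one reference that is already escaped.
        String input = "<p>\u96E8&#26379;</p>";
        System.out.println(escapeUnencodable(input, StandardCharsets.ISO_8859_1));
    }
}
```

Note that the sketch does not touch a literal `&`, so it assumes the input is already valid HTML.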

    Thanks,


    Aaron Digulla
    http://www.philmann-dark.de/

    #218835 Reply

    Here is an example HTML. To reproduce the problem:

    1. Open Notepad, paste the text into the file, save it into the workspace
    2. Refresh the project
    3. Open the file in the MyEclipse HTML editor
    4. Switch to design mode. The first kanji will show as 雨, the next two will be OK.
    5. Click anywhere and type something
    6. Save the file
    7. You will get a warning: The encoding (ISO-8859-1) cannot … (such as the one in position 376). Press OK.
    8. Switch back to Source view: The Editor will have changed the escaped kanji to UTF8.

    If you load the file from disk with Notepad, you’ll see that it will have written “?” to the file. Now, the editor and the file on disk contain different data.

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head><title>Usagi Yojimbo Dojo - Ame Tomoe</title>
    <meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
    <meta name="distribution" content="global,local" />
    </head><body>
    
    <h1>Character Information</h1>
    
    <p>雨朋絵</p>
    
    </body>
    </html>
    
    #218836 Reply

    Sorry for the double post but for some reason I cannot preview this nor can I edit the post…

    Here is an example HTML. To reproduce the problem:

    1. Open Notepad, paste the text into the file, save it into the workspace
    2. Refresh the project
    3. Open the file in the MyEclipse HTML editor
    4. Switch to design mode. The first kanji will show as &#38632;, the next two will be OK.
    5. Click anywhere and type something
    6. Save the file
    7. You will get a warning: The encoding (ISO-8859-1) cannot … (such as the one in position 376). Press OK.
    8. Switch back to Source view: The Editor will have changed the escaped kanji to UTF8.

    If you load the file from disk with Notepad, you’ll see that it will have written “?” to the file. Now, the editor and the file on disk contain different data.

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head><title>Usagi Yojimbo Dojo - Ame Tomoe</title>
    <meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
    <meta name="distribution" content="global,local" />
    </head><body>
    
    <h1>Character Information</h1>
    
    <p>&#38632;&#26379;&#32117;</p>
    
    </body>
    </html>
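The “?” in the saved file is what one would expect if the editor first turns the escaped kanji into real characters and then encodes them as ISO-8859-1 with replacement. Whether MyEclipse does exactly this is an assumption; the sketch below (with the hypothetical class `LossySaveDemo`) only shows that plain Java string encoding behaves this way, while the escaped form survives the same round trip.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class LossySaveDemo {

    /** Encodes text as ISO-8859-1 and decodes the bytes again, like a save followed by a reload. */
    public static String roundTripLatin1(String text) {
        // String.getBytes(Charset) substitutes the charset's replacement byte
        // for every character it cannot map; for ISO-8859-1 that byte is '?'.
        byte[] onDisk = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(onDisk, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Raw kanji: lost on save.
        System.out.println(roundTripLatin1("<p>\u96E8\u670B\u7D75</p>"));
        // Escaped kanji: pure ASCII, so nothing is lost.
        System.out.println(roundTripLatin1("<p>&#38632;&#26379;&#32117;</p>"));
    }
}
```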
    
    #218837 Reply

    Additional note: To make the example work, remove the three “amp;” in the code after pasting it into Notepad.

    #218846 Reply

    Riyad Kalla
    Member

    Aaron,
    Thank you for the very detailed report, I am looking into this now and will file it if I can reproduce it.

    #218850 Reply

    Riyad Kalla
    Member

    Aaron,
    I followed your steps as posted and was unable to reproduce this problem. The content I ended up with in the editor and in Notepad (and double-checked in TextPad) was this:

    
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
    
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Usagi Yojimbo Dojo - Ame Tomoe</title>
    <meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
    <meta name="distribution" content="global,local" />
    </head>
    
    <body>
    <h1>dCharacter Information</h1>
    
    <p>#38632;#26379;#32117;</p>
    </body>
    </html>
    

    I did switch to the design view, did add the “d” at the beginning of “Character” and saved it… I wasn’t prompted about any encoding issues, however. When I checked the file encoding in Eclipse, it is in fact ISO-8859-1, and I am on Windows XP Pro SP2 with a US English locale.

    Did I miss any of the steps you outlined?

    #218917 Reply

    @support-rkalla wrote:

    
    <p>#38632;#26379;#32117;</p>
    

    Did I miss any of the steps you outlined?

    Yes, it’s just a slight but important typo: the “&” before each “#” is missing. In my instructions, I said to remove only “amp;”, not the “&” 🙂
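Why the ampersand matters can be shown with a small sketch. It is a simplified decoder for decimal references only, not a full HTML parser, and the class name `NcrDecoder` is hypothetical. A reference is recognized only in the full form `&#N;`; without the `&` the text is ordinary ASCII, so there is nothing for an editor to convert and no encoding warning to trigger.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class NcrDecoder {

    // Decimal numeric character references such as &#38632;
    private static final Pattern DECIMAL_NCR = Pattern.compile("&#(\\d{1,7});");

    /** Replaces each decimal reference with the character it names. */
    public static String decode(String html) {
        Matcher m = DECIMAL_NCR.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group());   // out of range: leave the text untouched
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("<p>&#38632;&#26379;&#32117;</p>")); // three kanji
        System.out.println(decode("<p>#38632;#26379;#32117;</p>"));    // unchanged
    }
}
```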

    #218956 Reply

    Riyad Kalla
    Member

    Hmmm, yes, I see what you mean. I was able to reproduce this problem. I will file it ASAP; thank you for taking the time to walk me through this.

    #235391 Reply

    alexiz
    Member

    <%@page contentType="text/html; charset=ISO-2022"%>

    Please put the above line on the first line of your JSP file. It must be the first line.
