charset in iframe is not utf8 #184

Closed
davidshen84 opened this issue Sep 23, 2012 · 29 comments

@davidshen84

I created an EpicEditor with the default settings and entered the text below:

empty

   content

Then I got the content from the editor using:

var data = editor.exportFile('myfile');

When I posted the data to my server, I saw:

"#empty\n\n  content"

The spaces are incorrectly encoded.

@OscarGodson
Owner

Hmm, I can't reproduce this on my end, and I use it every day with spaces. Do you think adding the UTF-8 meta tag will fix this? Could you send a pull request with a fix? If not, I'll try adding the UTF-8 meta tag in another branch and let you try it out.

@mxswd

mxswd commented Sep 23, 2012

Yeah I get this too. It's really annoying. I had to filter unicode out to fix it.

When I use 4 spaces and export, I get 2 unicode chars at the start of the line. So when I pass it to be formatted, it doesn't see the code block.

I can write up a test tomorrow...

@OscarGodson
Owner

Are you on a western keyboard? And are you seeing this in your console when you do exportFile or after it goes into a DB or something else?

@OscarGodson
Owner

Yeah, tried even changing languages and no luck. What browser, OS, and keyboard setup are you guys on?

@mxswd

mxswd commented Sep 23, 2012

Ok here is the bug.

https://github.com/maxpow4h/EpicEditor/commit/c4969d2d6654c83d56abbad693aa1a42a88a940a

Get the gh-pages branch and apply that patch (or make those 2 changes by hand).
Run the Ruby app.rb or set up a localhost server of your own.

In the editor write up something like

asdf

    asdf

asdf

Then hit the try button. It does an AJAX POST to /api so you can inspect the chars properly (I think WebKit filters the console.log?...)

You will see something like this:

But if you inspect the chars, you should see

This is on Mac, Chrome, Japanese Keyboard in Romaji.

@OscarGodson
Owner

Until I get a second to play with it, does adding the UTF8 meta tag to the editor fix the issue?

@OscarGodson
Owner

Oh, and big thanks for this test case! I've heard of a couple of other people doing Ruby stuff who have this issue, so hopefully your test case will help me figure this out!

@OscarGodson
Owner

Trying to test it but when I try to install Sinatra:

⨀_⨀ gh-pages EpicEditor $ gem install sinatra
Fetching: rack-protection-1.2.0.gem (100%)
Fetching: sinatra-1.3.3.gem (100%)
Successfully installed rack-protection-1.2.0
Successfully installed sinatra-1.3.3
2 gems installed
/usr/local/rvm/gems/ruby-1.9.3-p194/gems/json-1.7.4/lib/json/ext/parser.bundle: [BUG] Segmentation fault
ruby 1.8.7 (2012-02-08 patchlevel 358) [universal-darwin12.0]

Not much of a Ruby guy. Looking into it, though.

@OscarGodson
Owner

So, I did a test with a node framework (Geddy). I checked my DB and see:

No special characters. This is with the following code:

I then tried looking at what gets output into the browser if I empty the contents raw, but I just see actual spaces:

I'm thinking this is either a Ruby or a Japanese keyboard specific thing. Here's a Stack Overflow thread on something similar, if not the same issue:

http://stackoverflow.com/questions/11512592/ruby-markdown-string-processing-issue-with-encoding

The guy there said it was not an EpicEditor issue, but if there's a fix I can make on the front-end side to fix Ruby I'd love to do it. Unfortunately I'm not well versed in Ruby and Sinatra doesn't seem to want to install on my machine. Any help debugging that error would be great, as would a repro in another framework or just raw Ruby :)

UPDATE:
Oh good, the new Skitch Evernote links don't work at all, creating broken images. Fantastic.

@mxswd

mxswd commented Sep 23, 2012

Ok, can you try using this

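var express = require('express');
var app = express();

// express.json() replaces the old express.bodyParser() for JSON bodies (Express 4.16+).
app.use(express.json());
app.post('/api', function (req, res) {
  var r = req.body.text;
  // Deliberately squeeze the text into single bytes, then decode it as UTF-8,
  // so any non-ASCII character shows up as garbage in the terminal.
  var b = Buffer.from(r, 'ascii'); // Buffer.from replaces the deprecated new Buffer()
  console.log(b.toString());
  res.send('ok');
});

// Serve the docs page (and the editor) from the current directory.
app.use('/', express.static(__dirname));

app.listen(3000);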

And using a POST like this

$.ajax({
    url: '/api',
    type: 'POST',
    contentType: 'application/json',
    data: JSON.stringify({"text": editor.exportFile()}),
});

That will treat it as ASCII, so if you see this in your terminal:

then we both have the bug.

I don't think it's a Ruby bug, but it could be something with my keyboard. I'm still testing that out.

EDIT: The post on Stack Overflow has a server-side fix, by doing:

bad = "#{194.chr}#{160.chr}".force_encoding('utf-8')
good = 32.chr
self.body = body.gsub(bad, good)

So I think these are the chars we should be looking at debugging.
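
For reference, the two bytes in that Ruby snippet (194 and 160) are the UTF-8 encoding of U+00A0, the non-breaking space. A minimal sketch to check that, assuming an environment where `TextEncoder` is available as a global (modern browsers and recent Node); `utf8Bytes` is a hypothetical helper name, not part of EpicEditor:

```javascript
// Hypothetical helper: return the UTF-8 bytes of a string as a plain array.
function utf8Bytes(str) {
  return Array.from(new TextEncoder().encode(str));
}

console.log(utf8Bytes('\u00a0')); // non-breaking space
console.log(utf8Bytes(' '));      // regular space
```

If the first call logs the pair 194, 160 and the second logs 32, the server-side replacement above is swapping non-breaking spaces for plain ones.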

@davidshen84
Author

@OscarGodson my system is configured in English, with Chinese language support installed, and I was typing English with the English keyboard layout.

@mxswd

mxswd commented Sep 24, 2012

I just tried the U.S. keyboard on Safari and got the same bug.

@OscarGodson
Owner

Well, in the example you have above you're converting the encoding to ASCII, right? So no matter what the encoding was before, it'll be messed up, because you're converting a rich encoding down to ASCII. This is why, by default, this works out of the box in Node (Express, Geddy, etc). You're manually trying to convert to ASCII.
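
A small sketch of what forcing ASCII does, assuming Node's `Buffer` (where the 'ascii' encoding keeps one byte per character when writing a string); the `exported` string is a made-up stand-in for `editor.exportFile()` output. It suggests that plain ASCII characters survive the round trip, while a U+00A0 in the text does not:

```javascript
// Stand-in for exported editor text: "a", a non-breaking space, "b".
const exported = 'a\u00a0b';

// One byte per character: U+00A0 becomes the single byte 160.
const bytes = Buffer.from(exported, 'ascii');

// toString() decodes as UTF-8 by default; a lone byte 160 is not valid UTF-8.
const roundTripped = bytes.toString();

console.log(Array.from(bytes));
console.log(roundTripped === exported); // false when the text was mangled
```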

The original ticket says "charset in iframe is not utf8". There's a simple way to test this:

editor.getElement('editor').characterSet

Put that in your console on http://epiceditor.com or locally or wherever. For me it says UTF-8. Does it for you two?

I believe your Rails(?) app or other kind of Ruby app is not setting the character set header correctly or at all... I think.

When you load the app that's giving you this issue, is Content-Type: text/html; charset=UTF-8 set in the headers? In Node, if you don't force ASCII, UTF-8 is the default, which is why I'm not seeing this in my apps.

Can you put that JS in your console and also see what headers your app is sending out?
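
If it helps when comparing headers, here is a rough sketch of pulling the charset out of a Content-Type value. `charsetFromContentType` is a hypothetical helper written for this thread, not something EpicEditor or any framework provides, and it only handles the simple `charset=...` form:

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
// Returns the charset in lowercase, or null when none is declared.
function charsetFromContentType(headerValue) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue || '');
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType('text/html; charset=UTF-8'));
console.log(charsetFromContentType('text/html'));
```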

@davidshen84
Author

Hi Oscar,

My HTML has the header "Content-Type: text/html; charset=utf-8", and my web app is developed in Python 2.7 with the encoding set in the head. I am observing this issue on the client, when it posts the data. I am NOT trying to convert UTF-8 to ASCII on the web app side. My browser is the latest Chromium on Linux.

@OscarGodson
Owner

Hmm. What does that line of JS return? characterSet is a native DOM property, so it won't be wrong.

Running out of ideas if that JS call says UTF8... Can I see your JS? Are you setting the encoding type on your AJAX POST request?

@davidshen84
Author

I tried editor.getElement('editor').characterSet on http://epiceditor.com/ in the console with Chrome 21 on Win7 x64, and it returns "ISO-8859-1". My system language is English with "Language for non-Unicode programs" set to P.R. China.

@OscarGodson
Owner

Ah ha, that's probably it. I'll need to research how to override that. I think that UTF8 meta tag will fix it, but I'll need to figure out how to change the default encoding.

@OscarGodson
Owner

@davidshen84 what do you see when you do the characterSet call in your app in Chrome's console? Is it the same or different?

@davidshen84
Author

@osca, sorry, I cannot get a reference to the editor instance, and I cannot modify the code right now. :)

@OscarGodson
Owner

That's fine whenever you can. This bug is looking to be tricky especially since I'm having front end issues trying to dynamically set the character encoding. :)

@mxswd

mxswd commented Sep 24, 2012

I think it is a problem with the editor. Try this on the default docs page.

tryItBtn.onclick = function () {
  var str = editor.exportFile();
  var i = 0;
  for (i = 0; i < str.length; i++) {
    console.log(str.charCodeAt(i));
  }
}

Where it should have 4 "32"'s for the 4 spaces, it has "32" "160" "32" "160". The "160"'s are the broken chars...

@OscarGodson
Owner

Ah, 160 is &nbsp; for the editor pane. I think I figured it out... WebKit converts non-breaking spaces to Unicode characters when you ask for innerText. Firefox gives the actual HTML entity in our code. I need to replace \u00a0 with ' '.
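
A minimal sketch of that replacement, assuming it runs on the exported text. `normalizeSpaces` is a hypothetical name used for illustration, not the actual EpicEditor code:

```javascript
// Hypothetical helper: swap every non-breaking space (U+00A0, char code 160)
// for a regular space (char code 32).
function normalizeSpaces(text) {
  return text.replace(/\u00a0/g, ' ');
}

// The "32 160 32 160" pattern reported above, followed by some text.
const exported = ' \u00a0 \u00a0asdf';
const cleaned = normalizeSpaces(exported);

// Log the char codes of the four leading characters after cleaning.
console.log(cleaned.slice(0, 4).split('').map(function (c) {
  return c.charCodeAt(0);
}));
```

With the replacement in place, the four leading characters should all be plain spaces, so a Markdown parser downstream can see the code block indentation.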

OscarGodson added a commit that referenced this issue Sep 24, 2012
…ld appear in some cases on some platforms because they didnt understand no-break spaces (such as ascii)
@OscarGodson
Owner

The fix is in develop. Can you guys check it out and see if that actually does fix it? Now the tryItBtn handler only logs four 32 character codes.

Thanks a lot @maxpow4h !!!

@mxswd

mxswd commented Sep 24, 2012

Awesome! Thanks for the fix!

@davidshen84
Author

I think the issue has not been fixed completely. When I edit the content,

  • then preview, the preview is correct;
  • then I export the file and post the data to my service, the data is correct;

But when I try to edit the content again, the 4 spaces are gone. This issue does not reproduce on http://epiceditor.com/, which does not have the latest developer version.

I think the editor needs to retain char(160) for HTML to render the content in the editor view correctly.

@OscarGodson
Owner

I think it has something to do with how you're saving it or giving it back. The develop branch is working as expected:
http://screencast.com/t/6oHLdyi5

Tested on Firefox too. You can't test against epiceditor.com because you're passing it through a backend and DB. Could you show me with jing maybe? Maybe let me see your logs or give me the exact steps to repro it?

@maxpow4h Did you ever test this? Did the fix work for you or not?

In the meantime I'll set up a quick Geddy + Mongo "app" and see if I can reproduce this.

@OscarGodson
Owner

Awh, yep. I was able to repro this by doing some stuff with geddy. Working on it! Something with the importFile method

OscarGodson added a commit that referenced this issue Sep 30, 2012
…-break space in chrome which is a unicode character \u00a0. Firefox was fine because we already were replacing spaces with &nbsp;s
@OscarGodson
Owner

OK @davidshen84 could you pull and try again :) fingers crossed - it worked for me in my quick little blogging app I just did.

@johnmdonahue we need some tests to test that spaces are preserved correctly. Not sure how to test that tho off the top of my head. If you ever get time let me know if you have any good ideas on how to test all these spacing cases.
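
One possible starting point for such a test, as a sketch only: a helper that lists where non-breaking spaces appear in a string, so a test could assert that the exported text contains none. `findNonBreakingSpaces` is a hypothetical name, and how it would be wired into the existing test suite is left open:

```javascript
// Hypothetical test helper: return the indexes of every U+00A0
// (char code 160) found in the given text.
function findNonBreakingSpaces(text) {
  const positions = [];
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 160) {
      positions.push(i);
    }
  }
  return positions;
}

console.log(findNonBreakingSpaces(' \u00a0 \u00a0asdf')); // text with the bug
console.log(findNonBreakingSpaces('    asdf'));           // text without it
```

A test could then type indented text into the editor and expect `findNonBreakingSpaces(editor.exportFile())` to come back empty.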

@mxswd

mxswd commented Sep 30, 2012

@OscarGodson ah, I only export file, I don't import, so I never noticed.

edit: pulled latest version and added an importFile, works for me fine too.
