My revised analysis is below:
1.) Where is the content originally from?
One person sends me an MS Word .doc file. Why she can't produce .docx files, I don't know.
I open the file on macOS with the latest version of Pages. If I cut and paste from Pages into JCE, I lose all the italics.
So I export the file from Pages to an MS Word .docx file. I then open it in a virtual machine running Windows 10 with Word 2007.
Then I save it as "Single File Web Page", which gives it the ".mht" file extension. I open this file in the latest version of the Opera browser on the Mac.
2.) How are the spaces saved in the MHT file?
Running "hexdump -C filename" on this file, I see that where the non-breaking space should be there are two bytes, "0d 0a". I don't know what these are. Other spaces are encoded as regular spaces, the single byte "20" in hex. (Sample file attached.)
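For reference, the bytes 0d and 0a are the ASCII control characters carriage return (CR) and line feed (LF); the pair is the standard Windows line ending. A minimal sketch for checking this from a POSIX shell with the standard "od" and "grep" tools (the helper name "count_crlf" is hypothetical):

```shell
#!/bin/sh
# 0d = carriage return (\r), 0a = line feed (\n).
# Together, CR LF is the Windows end-of-line marker.

# Hypothetical helper: count the lines on stdin that end in CR LF.
# grep splits its input at LF, so a CR LF line ending leaves a CR at the
# end of each line; the pattern matches that trailing CR.
count_crlf() {
    cr=$(printf '\r')
    LC_ALL=C grep -c "${cr}\$"
}

# od -c prints control characters as escapes, so CR LF shows up as \r \n.
printf 'one\r\ntwo\r\n' | od -c
printf 'one\r\ntwo\r\n' | count_crlf    # prints 2
```

Running "count_crlf < filename" on the exported file would report how many of its lines end in CR LF.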
3.) What does JCE do with these 0d 0a bytes?
In the JCE Global Configuration, if I have "Keep non-breaking spaces" set to Yes and I cut and paste text from this web page, JCE converts these "0d 0a" bytes to the six characters "&nbsp;".
When I save the article, what ends up in the database are these six characters, which I can see by dumping the article from the command line with:
echo "select introtext from #_content where id = 48;" | mysql database-name | hexdump -C
BTW, JCE does a great job with MS Word cleanup.
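To check whether a saved article still contains non-breaking spaces without reading the whole hex dump, the matches can be counted instead. A sketch, assuming a POSIX shell whose "grep" supports "-o" (GNU and macOS grep both do); "count_nbsp" is a hypothetical helper that counts both forms discussed in this post, the "&nbsp;" entity and the raw bytes c2 a0:

```shell
#!/bin/sh
# Hypothetical helper: count non-breaking spaces on stdin, in both forms:
#   entity = the six-character HTML entity "&nbsp;"
#   raw    = the UTF-8 bytes c2 a0 (the U+00A0 character itself)
count_nbsp() {
    nbsp=$(printf '\302\240')      # octal escapes for the bytes c2 a0
    input=$(cat)                   # read stdin once so it can be searched twice
    # grep -o prints each match on its own line; wc -l counts those lines.
    # tr removes the padding that macOS wc puts in front of the number.
    # LC_ALL=C makes grep compare raw bytes instead of locale characters.
    ent=$(printf '%s\n' "$input" | LC_ALL=C grep -o -F '&nbsp;' | wc -l | tr -d ' ')
    raw=$(printf '%s\n' "$input" | LC_ALL=C grep -o -F "$nbsp" | wc -l | tr -d ' ')
    echo "entity=$ent raw=$raw"
}
```

It could take the place of "hexdump -C" at the end of the mysql pipeline above, printing one line of the form "entity=N raw=M".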
4.) What does JCE do with the 0d 0a bytes when "Keep non-breaking spaces" is set to No?
When I cut and paste text from this web page, JCE converts these "0d 0a" bytes to "c2 a0", which is the UTF-8 encoding of the non-breaking space character itself (U+00A0), the character that "&nbsp;" represents.
While in JCE, when I toggle off the editor to see the raw HTML, I see only a space, not the "&nbsp;" I saw before.
However, when I save the article and dump the database, I see that the non-breaking space is still there, just in another form: "c2 a0".
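Comparing the bytes directly shows how the entity and the character differ. A small sketch, assuming a POSIX shell; "hexbytes" is a hypothetical helper that prints its input as one run of hex digits using the standard "od" tool:

```shell
#!/bin/sh
# Hypothetical helper: print the bytes on stdin as a single run of hex digits.
# od -An suppresses the offset column, -tx1 prints one hex value per byte;
# tr removes od's column spacing and line breaks.
hexbytes() {
    od -An -tx1 | tr -d ' \t\n'
    echo
}

printf ' '        | hexbytes   # ordinary space:               20
printf '\302\240' | hexbytes   # U+00A0 encoded as UTF-8:      c2a0
printf '&nbsp;'   | hexbytes   # HTML entity, six ASCII bytes: 266e6273703b
```

The entity is six ordinary ASCII characters that the browser interprets, while c2 a0 is the non-breaking space character itself. A text view renders the raw character much like an ordinary space, which would explain why the raw HTML view appears to show only a space.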
5.) Is there any way to convert the non-breaking spaces to regular spaces?
I can do this manually by leaving "Keep non-breaking spaces" set to Yes, then toggling off the editor and replacing each "&nbsp;" by hand, but this is really time-consuming.
I could also paste the HTML from JCE into vim, do a search and replace, and paste it back, but not all users can do that.
I wonder if it would be possible for JCE to have one button that converts all "c2 a0" sequences to "20" and another that converts all "&nbsp;" entities to regular spaces.
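Outside JCE, one workaround for users comfortable with a command line is a filter that handles both forms in one pass. A minimal sketch, assuming a POSIX shell and either GNU or macOS "sed"; "nbsp_to_space" is a hypothetical name, not a JCE feature:

```shell
#!/bin/sh
# Hypothetical filter: replace every non-breaking space on stdin with an
# ordinary space (byte 20), covering both forms seen in JCE:
#   "&nbsp;"  the HTML entity kept when "Keep non-breaking spaces" is Yes
#   c2 a0     the raw UTF-8 character stored when the option is No
nbsp_to_space() {
    # printf octal escapes are portable, whereas \x escapes in sed
    # patterns are a GNU extension.
    nbsp=$(printf '\302\240')
    # LC_ALL=C makes sed match bytes, so the two-byte sequence is found
    # whatever the user's locale is.
    LC_ALL=C sed -e 's/&nbsp;/ /g' -e "s/$nbsp/ /g"
}
```

On macOS, for example, "pbpaste | nbsp_to_space | pbcopy" would clean HTML copied from JCE with the editor toggled off, so it can be pasted back. Some non-breaking spaces may be intentional (between a number and its unit, for instance), so the result is worth a quick review.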