If you find you have strange characters in your story, it means there are non-ASCII characters in the file that you supplied.
I create my HTML by saving a document out of LibreOffice as an HTML file. However, LibreOffice likes to insert typographic opening and closing quote characters rather than the generic ASCII ones. The same goes for the apostrophe and a few others.
But how do you know there are strange characters in your file? I've written this tiny Python program to list all non-ASCII characters in a file. Unfortunately, you need to be happy running Python from source. Only an idiot would run an EXE from a site such as this, so I'm not going to supply one. Do not run even source code without understanding exactly what it does.
import sys

# License: CC_0
# Please do NOT attribute this code

if len(sys.argv) < 2:
    print("usage:", sys.argv[0], "<filename>")
    sys.exit(1)

# Collect every distinct non-ASCII character (code point above 127).
chars = set()
with open(sys.argv[1], encoding="utf-8") as f:
    c = f.read(1)
    while c:
        if ord(c) > 127:
            chars.add(c)
        c = f.read(1)

# Report each character once, with its code point.
for c in chars:
    print("We've got char", ord(c), "(", c, ")")
>python charcheck.py teddy.html
We've got char 8221 ( ” )
We've got char 8217 ( ’ )
We've got char 8230 ( … )
We've got char 8220 ( “ )
You can then cut and paste the characters in that output into a find-and-replace dialog in your text editor.
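If you would rather not do the find-and-replace by hand, the same step can be sketched in Python. This is only a rough sketch: the REPLACEMENTS table and the asciify and asciify_file names are my own inventions for illustration, the table only covers the characters shown in the output above plus the opening single quote, and it assumes the file really is UTF-8. Add whatever else the checker reports for your file.

```python
# Hypothetical mapping from typographic characters to plain ASCII.
# Extend it with any other characters the checker lists for your file.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark (8216)
    "\u2019": "'",    # right single quotation mark / apostrophe (8217)
    "\u201c": '"',    # left double quotation mark (8220)
    "\u201d": '"',    # right double quotation mark (8221)
    "\u2026": "...",  # horizontal ellipsis (8230)
}


def asciify(text):
    """Return text with every character in REPLACEMENTS swapped for ASCII."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text


def asciify_file(src_path, dst_path):
    """Read a UTF-8 file, replace the mapped characters, write a new file."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = asciify(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

Calling asciify_file("teddy.html", "teddy_ascii.html") would write a cleaned copy and leave the original alone. The output name is just an example.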
It should be pointed out that the site will not have any problem with these characters if the file is actually encoded as UTF-8. The problem is that, if it was saved on Windows, it probably won't be. It will likely be encoded as "ANSI". But ANSI is not an actual encoding; it is an alias for the system's default code page. On most Western-language Windows installs, that is Windows-1252. The server, however, is running Linux, which does not default to Windows-1252 (modern Linux systems generally default to UTF-8). Since the two don't agree, the server can't decode the file properly, and that's why we get garbled text with non-ASCII characters. The real solution is just to make sure you're sending a UTF-8 encoded file.
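To see the mismatch for yourself, here is a small sketch in Python. It does not reproduce what the site's server actually does (I have no way of knowing that); it only shows what happens to a pair of curly quotes when the bytes are written in one encoding and read back in another. The decode_as helper is a name I made up for the demonstration.

```python
text = "\u201cHello\u201d"  # Hello wrapped in curly double quotes

# The same text as bytes in the two encodings discussed above.
raw_cp1252 = text.encode("cp1252")  # one byte per curly quote
raw_utf8 = text.encode("utf-8")     # three bytes per curly quote


def decode_as(raw, encoding):
    """Decode bytes, substituting U+FFFD for anything that is invalid."""
    return raw.decode(encoding, errors="replace")


# ascii() keeps the output readable on any console.
print(ascii(raw_cp1252))
print(ascii(raw_utf8))
print(ascii(decode_as(raw_cp1252, "utf-8")))    # quotes become U+FFFD
print(ascii(decode_as(raw_cp1252, "latin-1")))  # quotes become control codes
print(ascii(decode_as(raw_utf8, "utf-8")))      # round-trips cleanly
```

The Windows-1252 bytes for the quotes are not valid UTF-8, so a UTF-8 reader has nothing sensible to turn them into. Only the case where writer and reader agree gives the original text back.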
If you want to make damned sure that your file is in UTF-8 form before uploading it, load it into Notepad, then go to File > Save As... At the bottom of the Save As... dialog, there is an encoding drop-down list. Choose UTF-8 from that drop-down list and save the file. This will guarantee a UTF-8 encoded file, which the site should be able to translate, even with the non-ASCII characters in it.
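For anyone who prefers a script to the Notepad route, the same conversion can be sketched in a few lines of Python. Treat this as an illustration only: the reencode_to_utf8 name is made up, and the default source encoding of Windows-1252 is an assumption. If your file was saved in some other encoding, pass that instead, or the result will be wrong.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Read src_path in source_encoding and write it to dst_path as UTF-8.

    source_encoding defaults to Windows-1252, which is an assumption about
    how the file was saved. newline="" leaves the line endings untouched.
    """
    with open(src_path, encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Calling reencode_to_utf8("teddy.html", "teddy_utf8.html") writes a new file rather than overwriting the original, so nothing is lost if the guess about the source encoding turns out to be wrong.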
Eric Storm