encoding errors
Nowadays, every text file should be encoded as UTF-8 without a BOM (byte order mark) unless you have a very good reason not to.
That said, let's explore what goes wrong when it is not.
Here are a few examples:
Encoding | Interpreted as | Encoded text | Looks like | Python 3 |
---|---|---|---|---|
ASCII | utf-8 | A (0x41) | A | "A".encode("ascii").decode("utf-8") |
iso_8859-1 | utf-8 | é (0xE9) | error / � | "é".encode("iso_8859-1").decode("utf-8") |
utf-8 | ascii | é | error / � | "é".encode("utf-8").decode("ascii") |
utf-8 | iso_8859-1 | é (0xC3 0xA9) | Ã© | "é".encode("utf-8").decode("iso_8859-1") |
binary | iso_8859-1 | 1111111...1111111 | ÿÿÿ....ÿÿÿ | bytes([0b11111111]).decode("iso_8859-1") |
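The rows above can be reproduced with a short sketch. The helper name `show_decoding` is hypothetical, and the fallback to replacement characters is only an assumption about how a typical viewer behaves: a strict decode in Python raises `UnicodeDecodeError` instead.

```python
def show_decoding(data: bytes, encoding: str) -> str:
    """Decode strictly; if that fails, fall back to U+FFFD replacement characters."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Assumption: a viewer shows the replacement character instead of aborting.
        return data.decode(encoding, errors="replace")


# (bytes as stored in the file, encoding the reader assumes)
rows = [
    ("A".encode("ascii"), "utf-8"),
    ("é".encode("iso-8859-1"), "utf-8"),
    ("é".encode("utf-8"), "ascii"),
    ("é".encode("utf-8"), "iso-8859-1"),
    (bytes([0b11111111] * 3), "iso-8859-1"),
]

for data, encoding in rows:
    print(data, "read as", encoding, "->", repr(show_decoding(data, encoding)))
```

Only the ASCII row survives unchanged, because ASCII is a strict subset of UTF-8. ISO 8859-1 never raises an error, since every byte value maps to some character, which is why that kind of misreading goes unnoticed so easily.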
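If you see Â, Ã, Ä or Å followed by another character, you are most likely viewing a UTF-8 encoded file in a viewer that interprets it as ISO 8859-1 (ISO Latin-1). These four characters are how ISO 8859-1 displays the bytes 0xC2 to 0xC5, which UTF-8 uses as the first byte of every character in the C1 Controls and Latin-1 Supplement block and the Latin Extended-A block.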
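As a minimal sketch, this kind of damage can often be undone by reversing the wrong step: encode the garbled text back to ISO 8859-1 to recover the original bytes, then decode them as UTF-8. The function name `repair_mojibake` is hypothetical, and the sketch assumes the text was misread exactly once as ISO 8859-1. Viewers that use Windows-1252 instead are not handled.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as ISO 8859-1."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or not damaged at all): leave the text alone.
        return text


# The first UTF-8 byte of one sample character per lead byte 0xC2..0xC5.
for ch in "¡éĀŁ":
    lead = ch.encode("utf-8")[0]
    print(ch, hex(lead), bytes([lead]).decode("iso-8859-1"))

print(repair_mojibake("cafÃ©"))
```

Text that is already correct, such as "café", fails the UTF-8 decode step and is returned unchanged, so the function is reasonably safe to apply to mixed input.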
If you see □, your font does not contain a glyph for the character. The character may also be unprintable, like the vertical space that Microsoft Word incorrectly adds when you press Ctrl+Enter.
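A missing glyph cannot be detected from the text alone, but unprintable characters can. The sketch below lists every character whose Unicode general category starts with "C" (control, format, unassigned and similar), apart from ordinary line breaks and tabs. The function name `find_unprintable` and the sample string are assumptions made for illustration.

```python
import unicodedata


def find_unprintable(text: str) -> list:
    """Return (index, code point, category) for each control-like character."""
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Categories Cc, Cf, Cs, Co and Cn have no visible glyph of their own.
        if category.startswith("C") and ch not in "\n\r\t":
            hits.append((index, f"U+{ord(ch):04X}", category))
    return hits


# Hypothetical sample: a vertical tab (U+000B) hidden between two words.
print(find_unprintable("line one\x0bline two"))
```

Reporting the code point instead of the character itself makes the result readable even when the terminal cannot display the offending character.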