INTRODUCTION
You can't spend very long surfing the web nowadays without encountering non-English words, names or accent marks (known as "diacritics"). Resumés, fiancés, clichés... from the first Noël to the last jalapeño, English has become a truly global language. Why then is it so hard to read or create webpages containing foreign characters? It doesn't have to be. This tutorial will show you how to be a true netizen of the world in no time flat.
READING WEBPAGES
1. If possible, update your browser and computer operating system to the newest versions available. This will enable you to view most languages without taking any further steps.
2. Check your browser's default character set. Go to the 'View' menu, then to the 'Character Set', 'Character Encoding' or 'Text Encoding' menu. The default setting should be "UTF-8".
3. Some browsers, like Internet Explorer for Mac, have relatively limited language support. You can check your browser's support for a specific language or character here.
4. No luck? Try a different browser. Opera (Windows users should download the international version) and Safari (which comes with Mac OS X) seem to have the most complete support, followed by Firefox, then Netscape.
5. If you still can't view a language, download a font specifically designed for it.
WRITING WEBPAGES
First, some background. "Unicode" is the universal character set. It assigns machine-readable codes to every letter, number and symbol character used by humankind. (Not an easy task!) Unicode provides unique placeholders for these characters, but not the fonts needed to view them. Fortunately, pretty much everyone with a computer has a font covering the major European languages.
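It's a lot easier to type or paste "é" into a webpage than to figure out its special code. To be able to type or paste foreign characters directly, just declare the page's character set as UTF-8. UTF-8 is not a separate character set; it is the most practical way of encoding Unicode for the web. It can represent every Unicode character, and it stores plain English text exactly as ASCII does.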
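If the difference between Unicode and UTF-8 feels abstract, a few lines of Python can make it concrete. This is only a rough sketch, assuming Python 3; the helper names describe and fits_in_latin1 are made up for this example. It shows the Unicode number assigned to a character, the bytes UTF-8 uses to store it, and whether a piece of text would fit in ISO-8859-1 at all.

```python
# Unicode gives each character a number (a "code point");
# an encoding such as UTF-8 turns that number into bytes.

def describe(char):
    """Return the code point and the UTF-8 bytes of a single character."""
    code_point = f"U+{ord(char):04X}"
    utf8_bytes = " ".join(f"{b:02X}" for b in char.encode("utf-8"))
    return code_point, utf8_bytes


def fits_in_latin1(text):
    """Return True if every character can be stored in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(describe("é"))               # ('U+00E9', 'C3 A9')
print(fits_in_latin1("jalapeño"))  # True
print(fits_in_latin1("Dvořák"))    # False
```

The last line is the reason to prefer UTF-8: Western European accents fit in ISO-8859-1, but a character like "ř" does not.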
Most webpages declare their character set as ISO-8859-1, the traditional default for English and other Western European languages. Here's an ISO-8859-1 tag, which would be in the <head> section of your code:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Replace the above tag with this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
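Why does the declared character set matter so much? If a page is saved as UTF-8 but read as ISO-8859-1, every accented letter turns into two junk characters. The Python 3 sketch below simulates that mismatch; the function name misread_as_latin1 is invented for illustration, and a real browser may differ in small details.

```python
def misread_as_latin1(text):
    """Simulate UTF-8 bytes being read as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


print(misread_as_latin1("jalapeño"))  # jalapeÃ±o
print(misread_as_latin1("Noël"))      # NoÃ«l
```

If you ever see "Ã" scattered through a page, a mismatch like this is the likely cause.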
Next, you'll need to tell the user's machine to view your page using the most universal font it has available. Do this by using your cascading style sheet. If you don't already have one, copy this into a text file and save it as "your_file_name.css":
body { font-family: Code2000, "Arial Unicode MS", "GNU Unifont", "Bitstream Cyberbit", Cyberbit, "TITUS Cyberbit Basic", "Microsoft Sans Serif", "MS Sans Serif", "Everson Mono Unicode", "Everson Mono", "DejaVu Sans", Tahoma, sans-serif; }
GNU Unifont, which is a Unicode font for Linux users, isn't very attractive, so sticklers for clean design may want to avoid listing it. You'll notice that the popular Verdana font isn't listed above. Verdana has a bug that causes diacritics (accent marks) to display in the space next to a character rather than above it.
Lastly, place this code within the <head> of your HTML document, with the correct path to your css file:
<link rel="stylesheet" href="your_file_name.css" type="text/css">
Wherever possible, avoid setting fonts within the HTML of your page, since these will override the Unicode fonts in your style sheet.
NOTE:
If you just have one or two non-English characters, and prefer to stick with the ISO-8859-1 content type, here's an easy way to convert them into Unicode character codes. Paste the character(s) into the 'Unicode Characters' field, click 'Convert' and use the encoding in the "decimal, partial" or "decimal, full" field. In HTML, a decimal code of this kind is written like &#233; (the code for "é").
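If you'd rather do the same conversion yourself, here is a minimal sketch in Python 3. It is a stand-in for a converter, not the tool described above, and the function name to_numeric_references is made up for this example. It swaps every non-ASCII character for its decimal HTML code and leaves plain English text alone.

```python
import html


def to_numeric_references(text):
    """Replace each non-ASCII character with a decimal code such as &#233;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_numeric_references("jalapeño")
print(encoded)                 # jalape&#241;o
print(html.unescape(encoded))  # jalapeño
```

The second print shows the round trip: a browser turns the code back into the original character when it displays the page.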
Certain languages don't display correctly in most HTML editors. Try viewing the page in your browser. Everything should look just fine.
While we're on the subject of HTML editors, some will actually encode your page's text for you. For example, if you change your encoding from UTF-8 to ISO-8859-1, the editor will convert all of the non-English characters into their Unicode HTML codes. Adobe GoLive is one product with this feature. Of course, this makes your source code more difficult to read, so stick with UTF-8 if you have a multi-language page.
Now go help the world communicate! ;)