Export Chinese Excel to CSV
When you export a Chinese Excel worksheet to CSV (comma-separated values), some Chinese characters may turn into question marks (“?”). The same can happen to symbols and to text in other languages such as Japanese. Why is that?
To understand what is happening, we need to know what an encoding is. A computer does not store a character directly; it stores a numeric code that represents the character, and an encoding defines which code stands for which character. To convert a worksheet to CSV, you need to know the encoding systems below.
ASCII (American Standard Code for Information Interchange)
ASCII is an old encoding system that uses 7-bit codes. It covers the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, and control codes. Many old systems still use it.
7 bits means each character is represented by a combination of seven binary digits (1s and 0s). In practice each ASCII character is stored in one byte (8 bits) with the leading bit set to 0. For example,
1000001 represents A
0111001 represents 9
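The two codes above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name ascii_bits is made up for this example and is not part of any library.

```python
def ascii_bits(ch):
    """Return the 7-digit binary ASCII code of a single character."""
    code = ord(ch)  # numeric code of the character
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "07b")  # zero-padded to 7 binary digits

for ch in "A9":
    print(ch, ord(ch), ascii_bits(ch))
```

Running it prints the decimal code next to the binary form, for example 65 and 1000001 for A.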
Big5 and GB
Building on ASCII, Taiwan developed the Big5 encoding for Traditional Chinese, and China developed the GB encoding (GB2312) for Simplified Chinese. Because a file can use only one encoding at a time, you cannot mix Traditional and Simplified Chinese in a file encoded with either of them.
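The limitation can be reproduced in Python, whose standard library ships big5 and gb2312 codecs. The sketch below assumes the sample pair 語 (Traditional) and 语 (Simplified), chosen for this example only; encode_lossy is a hypothetical helper name. Asking the codec to replace unsupported characters yields the same “?” that shows up in a badly exported CSV.

```python
traditional = "語"  # Traditional Chinese form (sample chosen for illustration)
simplified = "语"   # Simplified Chinese form of the same character

def encode_lossy(text, encoding):
    # errors="replace" substitutes "?" for any character the encoding cannot represent
    return text.encode(encoding, errors="replace")

for encoding in ("big5", "gb2312"):
    for text in (traditional, simplified):
        print(encoding, text, encode_lossy(text, encoding))
```

Each encoding produces two bytes for the character in its own script and a single “?” for the other one.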
Unicode
Unicode is the most widely used character standard today and is an international standard. At the time of writing, the latest version contains a repertoire of more than 110,000 characters covering 100 scripts (and therefore many languages) and multiple symbol sets. Unicode text can be stored in several encodings; the two most relevant here are UTF-8 and UTF-16.
UTF-8 is a variable-length encoding that uses 8-bit code units and is designed for backward compatibility with ASCII.
UTF-16 uses 16-bit code units and is not compatible with ASCII.
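A short Python sketch makes the difference concrete. It encodes one ASCII letter and one Chinese character (中, picked arbitrarily for this example) in both encodings; byte_lengths is a hypothetical helper name, and utf-16-le is used so that no byte order mark is added to the count.

```python
def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ("A", "中"):
    print(ch, byte_lengths(ch), ch.encode("utf-8").hex(), ch.encode("utf-16-le").hex())

# In UTF-8 an ASCII letter is the same single byte as in ASCII
print("A".encode("utf-8") == "A".encode("ascii"))
```

The letter A takes one byte in UTF-8 but two in UTF-16, which is why a program that expects ASCII can read the former but not the latter.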
Solution to export Chinese Excel to CSV
When you save a worksheet as CSV, the encoding Excel uses for the file depends on the language of your Windows:
– English Windows: ASCII (more precisely Windows-1252, a superset of ASCII)
– Traditional Chinese Windows: Big5
– Simplified Chinese Windows: GB2312
So exporting to CSV works fine for text in your Windows language; the problem only appears with text in other languages. For example, on Traditional Chinese Windows, Simplified Chinese characters that Big5 cannot represent are exported as question marks “?”. To export in Unicode instead, we can work around the limitation by going through a tab-delimited text (TXT) file.
Step 1) Export the worksheet as “Unicode Text (.txt)”
Step 2) Open the .txt file in Notepad
Step 3) Select one tab character between two values and copy it (a tab cannot be typed directly into the Replace dialog)
Step 4) Use Find and Replace (Ctrl+H) to replace every tab with a comma
Step 5) Save the file with a .csv extension and choose “Unicode” as the encoding.
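If you have many files, the Notepad steps can be scripted. The sketch below is one possible approach, not the only one, and it rests on some assumptions: the “Unicode Text” export is UTF-16 with a byte order mark, the function name unicode_text_to_csv and its parameters are invented for this example, and the output defaults to UTF-8 with a byte order mark instead of Notepad's “Unicode” (UTF-16). Unlike a plain find-and-replace, Python's csv module also quotes values that themselves contain commas.

```python
import csv

def unicode_text_to_csv(txt_path, csv_path, out_encoding="utf-8-sig"):
    """Convert a tab-delimited Unicode Text export into a comma-separated file."""
    # Assumption: the .txt file is UTF-16 with a BOM; the "utf-16" codec
    # reads the BOM to determine the byte order.
    with open(txt_path, newline="", encoding="utf-16") as src, \
         open(csv_path, "w", newline="", encoding=out_encoding) as dst:
        reader = csv.reader(src, dialect="excel-tab")  # tab-delimited input
        writer = csv.writer(dst)                       # comma-delimited output
        for row in reader:
            writer.writerow(row)
```

A call such as unicode_text_to_csv("book.txt", "book.csv") would then replace Steps 2 to 5; the file names here are placeholders. Pass out_encoding="utf-16" if you want the same encoding that Notepad's “Unicode” option produces.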