Export Chinese Excel to CSV

This Excel tutorial explains how to export an Excel worksheet that contains Chinese text to CSV without garbled characters (fixing garbled text when converting Excel to CSV). The method also applies to other languages.


Export Chinese Excel to CSV

When you export an Excel worksheet that contains Chinese text to CSV (comma-separated values), some Chinese characters turn into question marks “?”. The same happens to symbols and to other languages such as Japanese. Why is that?

To understand what is happening, we need to know what encoding is. A computer does not store a character directly; it stores a numeric code that represents the character you want to show. To convert a worksheet to CSV, you need to know the encoding systems below.

ASCII (American Standard Code for Information Interchange)

ASCII is an old encoding system that defines 7-bit codes, which in practice are stored one per 8-bit byte. It covers mainly the digits, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols and control codes. Many old systems still use this encoding system.

A bit is a binary digit, either 1 or 0. Each ASCII character is a combination of 7 bits; when it is stored in memory as 1 byte (8 bits), the extra leading bit is 0. For example,

1000001 represents A

0111001 represents 9
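The two bit patterns above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `to_bits` is made up for this example.

```python
def to_bits(ch: str, width: int = 7) -> str:
    """Return the code of a single character as a zero-padded binary string."""
    return format(ord(ch), f"0{width}b")

print(to_bits("A"))     # 1000001
print(to_bits("9"))     # 0111001
print(to_bits("A", 8))  # 01000001 (the same code stored in a full byte)
```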

Big5 and GB

Building on ASCII, Taiwan developed the Big5 encoding system for Traditional Chinese, and China developed the GB encoding system for Simplified Chinese. Because a file can use only one encoding system at a time, you cannot reliably mix Traditional and Simplified Chinese in one file with these encodings.
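As a rough illustration of why the two systems do not mix, the sketch below encodes the same character with Python's built-in `big5` and `gb2312` codecs. The helper name `encoded_hex` is hypothetical, and Python's codecs are assumed here to be a close stand-in for what Windows uses.

```python
def encoded_hex(text: str, encoding: str) -> str:
    """Encode text and return the resulting bytes as a hex string."""
    return text.encode(encoding).hex()

# The same Chinese character gets different bytes in each system.
print(encoded_hex("中", "big5"))    # a4a4
print(encoded_hex("中", "gb2312"))  # d6d0

# ASCII characters keep their ASCII code in both systems.
print(encoded_hex("A", "big5"), encoded_hex("A", "gb2312"))  # 41 41

# Reading Big5 bytes as if they were GB2312 does not give back the original.
misread = "中".encode("big5").decode("gb2312", errors="replace")
print(misread == "中")  # False
```

A program that opens the file has to be told, or has to guess, which of the two systems was used; a wrong guess produces garbled text.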

Unicode

Unicode is the most popular encoding system nowadays and has become an international standard. At the time of writing, the latest version of Unicode contains a repertoire of more than 110,000 characters covering 100 scripts (and so many different languages) and multiple symbol sets. Unicode text is usually stored in one of two encodings: UTF-8 and UTF-16.

UTF-8 encoding is variable-length and uses 8-bit code units, designed for backward compatibility with ASCII.

UTF-16 encoding uses 16-bit code units; it is not compatible with ASCII.
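The difference between the two is easy to see by counting bytes. The following is a minimal sketch; `byte_length` is a name invented for this example, and `utf-16-le` (little-endian, no byte order mark) is chosen only to keep the counts simple.

```python
def byte_length(text: str, encoding: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

for ch in ("A", "中"):
    print(ch, byte_length(ch, "utf-8"), byte_length(ch, "utf-16-le"))
# A 1 2
# 中 3 2

# UTF-8 stores ASCII characters exactly as ASCII does; UTF-16 does not.
print("A".encode("utf-8") == "A".encode("ascii"))      # True
print("A".encode("utf-16-le") == "A".encode("ascii"))  # False
```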

Solution to export Chinese Excel to CSV

When you export Excel as CSV, the encoding of the CSV depends on the language of your Windows:

– English Windows: Windows-1252 (ASCII plus Western European characters)
– Traditional Chinese Windows: Big5
– Simplified Chinese Windows: GB2312

So you should have no problem exporting CSV in your Windows language; problems arise only when you export a foreign language as CSV. For example, if you are using Traditional Chinese Windows, you cannot export a CSV that contains Simplified Chinese: the characters that Big5 cannot represent become question marks “?”. To export in Unicode, we use TXT (tab-delimited text) as a workaround.
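The question marks can be reproduced outside Excel. The sketch below assumes that Python's `errors="replace"` option behaves like Excel's CSV export, in that it substitutes “?” for every character the target encoding cannot represent; the function name `export_preview` is made up for this example.

```python
def export_preview(text: str, encoding: str) -> str:
    """Show how text would look after being saved in a legacy encoding."""
    return text.encode(encoding, errors="replace").decode(encoding)

# Simplified Chinese text on a Traditional Chinese (Big5) system
print(export_preview("我们", "big5"))    # 我?
# Traditional Chinese text on a Simplified Chinese (GB2312) system
print(export_preview("我們", "gb2312"))  # 我?
# Any Chinese text on an English (Windows-1252) system
print(export_preview("我们", "cp1252"))  # ??
# Text that matches the system encoding survives unchanged
print(export_preview("我們", "big5"))    # 我們
```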

Step 1) Export the worksheet as “Unicode Text (.txt)”

Step 2) Open the .txt file with Notepad

Step 3) Select one “tab” character between two values and copy it (Ctrl+C)

Step 4) Use Find and Replace (Ctrl+H) to replace every “tab” with a comma

Step 5) Save the file with a .csv extension and choose “Unicode” as the encoding.
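If you have many files, the Notepad steps above can also be scripted. The following is a minimal Python sketch, not part of the original method. It assumes that the “Unicode Text” file from Step 1 is UTF-16 with a byte order mark, and it writes UTF-8 with a byte order mark instead of Notepad's “Unicode” option; both encodings are parameters you can change. The function name `txt_to_csv` and the file names in the usage comment are hypothetical.

```python
import csv

def txt_to_csv(src_path: str, dst_path: str,
               src_encoding: str = "utf-16",
               dst_encoding: str = "utf-8-sig") -> None:
    """Convert a tab-delimited Unicode text file into a comma-separated file."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        reader = csv.reader(src, delimiter="\t")  # Steps 1-2: read tab-separated rows
        writer = csv.writer(dst)                  # Steps 3-5: write comma-separated rows
        for row in reader:
            writer.writerow(row)

# Example usage (file names are placeholders):
# txt_to_csv("Book1.txt", "Book1.csv")
```

Unlike a plain find-and-replace, the `csv` writer puts quotes around any value that itself contains a comma, so such values stay in a single column.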

 


 
