Import Chinese CSV to Excel
When you import Chinese CSV to Excel, some Chinese characters will become a question mark “?” or convert to unexpected Chinese characters, same will happen for symbols and other languages such as Japanese, why is that?
To understand what is happening, we need to know what is encoding. Computer stores a set of codes that represent a single character, instead of directly storing the character actually want to actually show. For conversion of worksheet to CSV, you need to know the below encoding systems.
ASCII (American Standard Code for Information Interchange)
ASII uses 8-bit code units, an old encoding system which stores mainly numbers, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes. Many old systems still use this encoding system.
8 bit means the computer memory uses “8” digits with 1 and 0 combination (binary) to represent a character, 8 bits memory is equal to 1 byte. For example,
1000001 represents A
0111001 represents 9
Big5 and GB
Based on ASCII system, Taiwan developed Big5 encoding system for Traditional Chinese, and China developed GB encoding system for Simplified Chinese. Because you can only choose one encoding system at one time, you cannot mix a file with both Traditional and Simplified Chinese.
Unicode is the most popular encoding system nowadays and has become an international standard, the latest version of Unicode contains a repertoire of more than 110,000 characters (support different languages) covering 100 scripts and multiple symbol sets. Unicode has two versions: UTF-8 and UTF-16.
UTF-8 encoding is variable-length and uses 8-bit code units, designed for backward compatibility with ASCII.
UTF-16 encoding uses 16-bit code units, it is not compatible with ASCII.
Solution to import Chinese CSV to Excel
Because there are so many encoding systems out there, we have to understand clearly what kind of encoding system was used to create the CSV, and how the application that you open the file with encodes the file.
When you open Chinese CSV with Excel, the encoding of CSV depends on the language of your Windows,
– English Windows: ASCII
– Traditional Chinese Windows: BIG5
– Simplified Chinese Windows: GB2312
Most problems arise from exporting web database to CSV, where the encoding is usually Unicode UTF-8 instead of Big5 or GB. To solve the problem, you can open the CSV with Notepad and change the encoding to ASCII in order to sync the encoding system with Excel.
However, if your file is a mix of Chinese and other languages, you will lost characters other than Chinese, because you are not encoding with Unicode.
An alternative solution would be using the Excel Import function instead of double clicking the CSV to open with Excel.
Navigate to Excel menu bar Data > From text
Now you can choose UTF-8 (if the exported CSV is UTF-8)