Professional translation of CSV files – What needs to be considered?

CSV stands for comma-separated values or character-separated values. As the name implies, the values are separated by a delimiter character, usually a comma, semicolon, or tab. There is no official standard for this file format, but its common conventions are documented in RFC 4180. CSV files are well suited to translation, although there are certain things that still need to be taken into consideration.
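To illustrate the conventions described in RFC 4180, here is a minimal sketch using Python's standard csv module. The sample rows are invented for illustration. It shows that a field containing the delimiter, a quotation mark, or a line break is enclosed in double quotes, and that a quotation mark inside such a field is doubled.

```python
import csv
import io

# Hypothetical sample rows: the text fields contain a comma, quotation
# marks, and a line break, all of which need special handling in CSV.
rows = [
    ["id", "text"],
    ["1", 'He said "hello", then left.'],
    ["2", "Line one\nLine two"],
]

# Write the rows with CRLF record endings, as RFC 4180 describes.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\r\n")
writer.writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back restores the original fields unchanged.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
```

In csv_text, the second record is written as 1,"He said ""hello"", then left." and the parsed rows are identical to the input. A translation tool that splits the file naively at every comma or line break would damage such fields, which is why a proper CSV parser should be used.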

Which systems use CSV?

As a rule, the CSV file format is used by proprietary systems and applications. Texts are frequently exported as CSV from databases, such as those behind content management systems. Databases can import and export a wide range of formats, and CSV is the simplest of them. Where better control of the structure is needed, an XML format is more appropriate, and it can be populated from a CSV data file.
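As a rough sketch of how an XML file might be populated from CSV data, the following example uses only the Python standard library. The column names (id, source) and the element names (entries, entry) are assumptions made for this illustration and do not come from any particular system.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Convert CSV text with 'id' and 'source' columns to a simple XML string."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    root = ET.Element("entries")  # hypothetical root element name
    for row in reader:
        # Each CSV record becomes one <entry> element with an id attribute.
        entry = ET.SubElement(root, "entry", id=row["id"])
        ET.SubElement(entry, "source").text = row["source"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical export with two translatable strings.
sample = 'id,source\r\nbtn_ok,OK\r\nmsg_1,"Save, then close"\r\n'
xml_text = csv_to_xml(sample)
```

The resulting XML keeps each string tied to its identifier, so the translated text can be written back to the right record later.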

Character sets and code pages

A character set, also called a code page, defines a mapping between numbers and characters. Unicode is the most widely used character set, and UTF-8 is the most widely used encoding for Unicode characters. UTF-8-encoded CSV files can be translated into any language. This also works with other character sets, but you need to be aware that a CSV file contains no indication of the encoding in which it was saved. This can lead to problems, because any program that opens the file has to be told the encoding or guess it.
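Because the file itself does not say how it is encoded, the reading program has to decide. The following is a minimal sketch of one possible approach: try strict UTF-8 first and fall back to a legacy code page only if that fails. The function name and the choice of cp1252 (Western European) as the fallback are assumptions for this example. The correct fallback depends on where the file came from.

```python
def decode_csv_bytes(raw: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Return the decoded text and the name of the encoding that was used."""
    try:
        # "utf-8-sig" also accepts plain UTF-8 and strips a leading
        # byte order mark if one is present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback
        # code page. This is a guess, not a reliable detection.
        return raw.decode(fallback), fallback


utf8_bytes = "id,text\r\n1,Grüße\r\n".encode("utf-8")
ansi_bytes = "id,text\r\n1,Grüße\r\n".encode("cp1252")

text_a, used_a = decode_csv_bytes(utf8_bytes)
text_b, used_b = decode_csv_bytes(ansi_bytes)
```

Both calls return the same text, but the second one only does so because the fallback happens to match. If the file had been saved in a different code page, the fallback would silently produce wrong characters.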

If the file is in ANSI format with the Western European character set, for instance, and is to be translated into an Eastern European language, you quickly realize that the Eastern European code page assigns different characters to the same codes. The possible ways of translating in ANSI format are thus limited. For that reason, it is advisable to use a Unicode encoding such as UTF-8 from the start.
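The effect can be shown in a few lines of Python. This sketch decodes the same single byte with three Windows code pages (Western European, Central European, and Cyrillic) and then checks whether a Czech word can be stored in the Western European code page at all. The helper name can_encode is made up for this example.

```python
# One byte, three different characters depending on the code page.
byte = b"\xe8"
decoded = {cp: byte.decode(cp) for cp in ("cp1252", "cp1250", "cp1251")}
# decoded == {"cp1252": "è", "cp1250": "č", "cp1251": "и"}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


western_ok = can_encode("Česky", "cp1252")  # False: "Č" is not in cp1252
utf8_ok = can_encode("Česky", "utf-8")      # True: UTF-8 covers all of Unicode
```

A Czech translation therefore cannot simply be typed into a file that stays in the Western European code page, whereas a UTF-8 file accepts it without any conversion.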

Which data can be put into a CSV file?

Basically, anything that is text-based can be put into a CSV file. Our project managers already have experience with projects involving HTML and XML in CSV files. Both work without a problem. However, we once had a case where a standard tool was used to export a file in the CSV format. At first, everything looked fine, but then we found out that the tool could not export any fields with more than 32,768 characters. Such difficulties are special cases, of course, and rarely occur. In this case, we were able to notify the client in time and thus prevent further problems.
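A check for overly long fields can be automated before a file goes through an export tool. The following is a minimal sketch, assuming the 32,768-character limit mentioned above. The function name is hypothetical, and the limit is a parameter because other tools may have different limits.

```python
import csv
import io


def find_long_fields(csv_text: str, max_chars: int = 32_768) -> list[tuple[int, int, int]]:
    """Return (record number, column number, length) for each field over the limit."""
    hits = []
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    # Note: Python's csv reader has its own default field size limit
    # (131072 characters), which can be raised with csv.field_size_limit().
    for record_number, row in enumerate(reader, start=1):
        for column_number, field in enumerate(row, start=1):
            if len(field) > max_chars:
                hits.append((record_number, column_number, len(field)))
    return hits


# Hypothetical data: the third record contains a 40,000-character field.
sample = "id,text\r\n1,short\r\n2," + "x" * 40_000 + "\r\n"
long_fields = find_long_fields(sample)
```

Here long_fields contains one entry for the third record, second column. Record and column numbers are counted from 1, and a record that contains line breaks inside a quoted field still counts as one record.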

