Dear Stefano,

When you export a code_list from Collect and open it in Excel, the special characters are not displayed correctly.

item_code   item_label_en   Original
1           Araruãna        Araruãna
2           Copaíba         Copaíba
3           Açaí            Açaí
4           Maçã            Maçã

"Original" is what I typed in Collect. Is there a way to avoid this?

Best, Marcelo

asked 22 Jun '15, 15:15

Marcelo

Hi Marcelo,

The problem is not in Collect but in Excel: the generated CSV file uses UTF-8 character encoding, but Excel tries to read it with a different encoding (probably ISO-8859-1) and does not offer a choice of encoding when you open the file directly. If you need to use the generated CSV file now, I suggest opening it with different software, such as LibreOffice or OpenOffice, which is free and open source and handles CSV files somewhat better.
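
To illustrate the mismatch described above, here is a minimal Python sketch. It is not part of Collect; it only assumes that the file is written as UTF-8 and read as ISO-8859-1, and it uses one of the labels from the question as sample text.

```python
# Sketch: what happens when UTF-8 bytes are read as ISO-8859-1.
label = "Açaí"                       # text as typed in Collect
utf8_bytes = label.encode("utf-8")   # bytes as they would be written to the CSV file

# Wrong decoding: each byte of a two-byte UTF-8 sequence becomes its own character.
garbled = utf8_bytes.decode("iso-8859-1")

# Right decoding: the original text comes back unchanged.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(restored)
```

The data in the file is intact in both cases; only the interpretation of the bytes differs, which is why reading the file with the correct encoding is enough to fix the display.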

By the way, since we introduced an Excel-format code list importer, we are also working on an Excel-format code list exporter, so that people like you who work on Windows won't have this problem.

Many thanks.

answered 22 Jun '15, 15:31

OF Collect ♦♦

Just to complete this answer:

To open a UTF-8 encoded CSV file in Microsoft Excel:

On the Data tab, click From Text.

Then choose the CSV file that was exported from Collect and select Unicode (UTF-8) in the character encoding selector.

answered 23 Jun '15, 17:07

Open Foris ♦♦

edited 23 Jun '15, 17:08

Dear Stefano,

the problem is not the encoding of the .csv file but a persistent problem with the file encoding in the Tomcat settings (I reported on that earlier). Try exporting, for example, a species list with foreign characters from Collect and you will see that the outcome is ?????. The same problem occurs, for example, when exporting a Collect-Mobile file!

The point is that the file encoding must be set in the Tomcat settings. You find this variable in Collect/Tomcat/bin/setenv.sh and setenv.bat. The original setting should be extended to export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF8" in setenv.sh, with the equivalent set JAVA_OPTS line in setenv.bat. Maybe you can consider this for the next versions.

Nils Nölke

answered 29 Jun '16, 11:13

Fehrmann

Dear Nils Nölke,
many thanks for your suggestion, but have you tried applying the solution you mention? I just tried it and it simply doesn't work.
The problem with CSV files is that the encoding used to generate the file is not written anywhere in the file (by its nature it is, as the extension says, just a list of "comma separated values"), so when you open it Excel tries to guess the encoding and fails. You have to specify in some other way that the encoding used was UTF-8.
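
One common way to signal UTF-8 to a spreadsheet program is a byte order mark (BOM) at the start of the file. The Python sketch below is only an illustration under that assumption; it is not something Collect does, the function name `add_utf8_bom` is hypothetical, and whether a given Excel version honours the BOM should be checked on the actual installation.

```python
import codecs


def add_utf8_bom(src_path, dst_path):
    """Copy a UTF-8 CSV file, prepending a byte order mark if it is missing."""
    with open(src_path, "rb") as src:
        data = src.read()
    # The BOM is the three bytes EF BB BF; add it only once.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

The rest of the file is copied byte for byte, so a program that already reads the CSV as UTF-8 sees the same content apart from the leading mark.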
We will solve this issue by simply allowing the code and species lists to be exported directly in Excel format (in that case the encoding is written in the file).
Thanks again, and check frequently for updates to Collect.
Open Foris Team

answered 29 Jun '16, 18:27

OF Collect ♦♦

Dear Stefano, the additional setting -Dfile.encoding=UTF8 in Tomcat solves a big problem we always had before. Maybe my explanation was not sufficient to reproduce it:

  1. We are talking about a Windows installation (the problem during export is not evident under Linux, as Tomcat uses the OS environment and the JVM encoding is UTF8).
  2. You are right, the setting has no effect on the .csv exported via the export button in the species list (as .csv carries no encoding information), provided you import it correctly.
  3. BUT: in our case, exporting a Collect survey (.collect and .collect-mobile) and importing it (on another installation or on the same one) always destroyed the species list. This is not about the "encoding of a .csv file" but about the encoding setting of Tomcat. To reproduce the error (under Windows), simply export a survey with special characters and import it again (with and without the setting).
  4. The encoding problem of Tomcat running under Windows is known; you can find several sources on the net. When the JVM starts, the encoding is set to "windows-1253" instead of "UTF8".
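
The kind of loss described in the list above can be reproduced in miniature. The Python sketch below is only an illustration, not Collect or Tomcat code; it assumes the default encoding is the windows-1253 code page named in point 4 (`cp1253` in Python) and that characters it cannot represent are replaced rather than rejected.

```python
# Sketch: writing text with a code page that lacks the needed letters.
text = "Açaí"

# cp1253 has no "ç" or "í", so each one is replaced with "?".
lossy = text.encode("cp1253", errors="replace")

# UTF-8 can represent every character, so the round trip is clean.
safe = text.encode("utf-8")

print(lossy)
print(safe.decode("utf-8"))
```

Once a character has been replaced with "?", no later choice of encoding on import can bring it back, which matches the observation that the species list was destroyed rather than merely displayed wrongly.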

Best regards, Lutz & Nils

answered 30 Jun '16, 08:07

Fehrmann

Dear Lutz and Nils,
many thanks for your detailed message; it has been very helpful. You were right: versions of Collect prior to 3.10.13 had a problem under Windows, where exporting a survey and importing it again could destroy the special characters in the species lists.
We solved this problem by specifying the character encoding used to write each file during the survey export (we were doing this already, but the setting was somehow ignored at a lower level, so it had no effect).
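
As a rough illustration of that approach (a Python sketch, not Collect's actual code; the function names `export_labels` and `import_labels` and the file name are hypothetical), naming the encoding on both the write and the read side makes the result independent of the platform default:

```python
import os
import tempfile


def export_labels(path, labels):
    """Write one label per line, always as UTF-8, whatever the platform default is."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for label in labels:
            f.write(label + "\n")


def import_labels(path):
    """Read the labels back, again naming the encoding explicitly."""
    with open(path, "r", encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


# Round trip through a temporary file.
labels = ["Araruãna", "Copaíba", "Açaí", "Maçã"]
path = os.path.join(tempfile.mkdtemp(), "species_list.txt")
export_labels(path, labels)
print(import_labels(path) == labels)
```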
Anyway, the problem has been solved in version 3.10.13 of Collect without modifying the setenv file (we have too many installations of Collect all around the world, standalone and on proper servers, and it would be complex to update each of them), so Collect is now independent of the environment in which it runs.
You can update your installation and test it.
Let us know, many thanks.
Open Foris Team

answered 04 Jul '16, 09:32

OF Collect ♦♦

last updated: 04 Jul '16, 09:32