Dear OF Team,

I would like to import a rather long species list (CSV, >700 MB) into Collect, using the provided R scripts (https://drive.google.com/open?id=1FKe3dlFRgw1IrUmUtF6q02j4qDr9gkki). As a reference, I am following the PDF "Species coding approach for Open Foris Collect and Calc". However, I can't get the script to work properly, as the following step keeps resulting in an error:

    # split scientific names into genus and species
    sp_dt[, c("genus", "species", "tspecies", "subspecies") := tstrsplit(scientific_name, " ", fixed=FALSE)]

    Error in `[.data.table`(sp_dt, , `:=`(c("genus", "species", "tspecies", :
      Supplied 4 columns to be assigned 2 items. Please see NEWS for v1.12.2.

Even though I am considering splitting the csv into separate species lists according to the taxonomical order, I'd still need the script to work.

Can anybody point me in the right direction to find the source of this error?

Thank you very much in advance,

Alex

asked 10 Dec '20, 12:00

wexxo


Dear Alex,

please try updating the data.table package.

This is probably caused by a bug in that package that should already be fixed; see e.g. https://github.com/Rdatatable/data.table/issues/3495

Does this help?

Regards, Lauri


answered 10 Dec '20, 16:47

LauriV ♦

Dear Lauri,

thank you for your quick answer. Unfortunately, updating data.table (and all other packages) and RStudio (v1.3.1093) did not solve the problem. The error output stays the same; print(sp_dt) at this point yields:

        family scientific_name
      1: Chactidae auyantepuia amapaensis
      2: Chactidae auyantepuia laurae
      ...
    108: Chactidae vachoniochactas lasallei
    109: Chactidae vachoniochactas roraima

The csv represents an extract of the final species list to be imported (from the GBIF DB) for testing purposes.

(11 Dec '20, 07:29) wexxo

Dear Alex, is a comma the separator in your CSV input file? I noticed that this script fails if the separator is a tab or a semicolon. If so, the script may indeed need fixing.
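For anyone unsure which separator a file uses, the check can be done in R before running the script. The following is only a sketch in base R: `guess_separator()` is a hypothetical helper (it is not part of the Open Foris scripts), and it assumes the header line contains the separator more often than any other candidate character.

```r
# Hypothetical helper: guess the field separator from the header line by
# counting how often each candidate character occurs in it.
guess_separator <- function(path, candidates = c(",", ";", "\t")) {
  header <- readLines(path, n = 1L, warn = FALSE)
  counts <- vapply(
    candidates,
    function(s) lengths(regmatches(header, gregexpr(s, header, fixed = TRUE))),
    integer(1)
  )
  candidates[which.max(counts)]
}

# Small demonstration with a semicolon-separated temporary file
tmp_sep <- tempfile(fileext = ".csv")
writeLines(c("family;scientific_name",
             "Chactidae;auyantepuia amapaensis",
             "Chactidae;auyantepuia laurae"), tmp_sep)

sep_found <- guess_separator(tmp_sep)
sp_list <- read.csv(tmp_sep, sep = sep_found, stringsAsFactors = FALSE)
print(names(sp_list))
```

Passing the detected separator to `read.csv()` explicitly avoids silently reading the whole line into a single column.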

(11 Dec '20, 08:33) LauriV ♦

Dear Lauri, Yes, the separator of the sp_list.csv is comma.

(11 Dec '20, 09:23) wexxo

Dear Alex, please send us a subset of your data so we can check this. Thanks! Lauri

(11 Dec '20, 09:57) LauriV ♦

Dear Alex, thanks for sharing your data. I fixed two issues in the R script:

1) Microsoft applications may add weird characters to the column names, and this was the case in your CSV (you do not see this in Excel or Notepad++); see e.g. https://stackoverflow.com/questions/22974765/weird-characters-added-to-first-column-name-after-reading-a-toad-exported-csv-fi?rq=1

so the name "family" was read as "ï..family". This is fixed in the read.csv() line.

2) The code only worked with a list containing at least one name with a subspecies or variant part! This obvious design mistake is now fixed, so the script also works when the input data contains only "pure" species names.
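The two fixes can be illustrated roughly as follows. This is a sketch, not the actual change made to the Open Foris script: `read_species_csv()` and `split_scientific_name()` are hypothetical helpers, and it assumes the stray characters are a UTF-8 byte-order mark, which `read.csv()` can strip via `fileEncoding = "UTF-8-BOM"`. The second helper pads the result of `tstrsplit()` with NA columns, because `tstrsplit()` returns only as many columns as the longest name has words (two for a list of plain "genus species" names).

```r
library(data.table)

# Hypothetical helper: read the species CSV and drop a UTF-8 byte-order mark,
# so the first column arrives as "family" rather than "ï..family".
read_species_csv <- function(path) {
  as.data.table(read.csv(path, fileEncoding = "UTF-8-BOM",
                         stringsAsFactors = FALSE))
}

# Hypothetical helper: split scientific_name on spaces and always assign
# exactly length(parts) columns, padding with NA when names are shorter.
split_scientific_name <- function(dt,
                                  parts = c("genus", "species",
                                            "tspecies", "subspecies")) {
  pieces <- tstrsplit(dt$scientific_name, " ", fixed = TRUE)
  while (length(pieces) < length(parts)) {
    pieces[[length(pieces) + 1L]] <- rep(NA_character_, nrow(dt))
  }
  # Any words beyond the fourth are dropped in this sketch
  dt[, (parts) := pieces[seq_along(parts)]]
  dt[]
}

# Demonstration: a temporary CSV that starts with a BOM and contains only
# two-word species names (taken from the sample shown earlier in the thread)
tmp_bom <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw(paste0("family,scientific_name\n",
                            "Chactidae,auyantepuia amapaensis\n",
                            "Chactidae,auyantepuia laurae\n"))),
         tmp_bom)

sp_dt <- split_scientific_name(read_species_csv(tmp_bom))
print(sp_dt)
```

With input like this, the `tspecies` and `subspecies` columns are simply filled with NA instead of triggering the "Supplied 4 columns to be assigned 2 items" error.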

Regards, Lauri


answered 11 Dec '20, 14:42

LauriV ♦


question asked: 10 Dec '20, 12:00

question was seen: 120 times

last updated: 11 Dec '20, 14:42