Talk:Merge and aggregate datasets: Difference between revisions

Content added Content deleted

Inline

Revision as of 21:48, 7 December 2020

Duplication of task goals if not task name

So... this task is pretty much an exact duplicate of CSV data manipulation which has been around for 7+ years and has some 85 entries. Admittedly this task has slightly better defined goals and is less trivial, but a large percentage of the code from there could be lifted and used unchanged here.

Some overlap of tasks is inevitable, and honestly I think this one is probably more useful to demonstrate working with real-world data than the other. I hesitate to make any unilateral decisions (unlike with the recent deluge of "Find words containing whatever" tasks that we've been hit with,) but I also don't want to needlessly proliferate trivial variations. Thoughts? --Thundergnat (talk) 19:16, 7 December 2020 (UTC)

Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --Paddy3118 (talk) 19:40, 7 December 2020 (UTC)

My motivation to submit this task was that I recently was working with R-script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.

The examples and tutorials on stackoverflow and other places are generally either too trivial, or too specific for one exact use-case. Merging, grouping and aggregating different datasets is a very common thing I encounter a lot for my work.

So that's why I submitted this task (after also asking here), and made sure to include the most common "hurdles", like missing records, missing values, multiple aggregator functions at once, working with date values and unorderd source files.

..."two datasets as provided in .csv files"...

Many examples don't read the csv from files. --Paddy3118 (talk) 19:42, 7 December 2020 (UTC)

@@ Line 5: / Line 5: @@
 :Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:40, 7 December 2020 (UTC)
+::My motivation to submit this task was that I recently was working with R-script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.
+::The examples and tutorials on stackoverflow and other places are generally either too trivial, or too specific for one exact use-case. Merging, grouping and aggregating different datasets is a very common thing I encounter a lot for my work.
+::So that's why I submitted this task (after also asking here), and made sure to include the most common "hurdles", like missing records, missing values, multiple aggregator functions at once, working with date values and unorderd source files.
 == ..."two datasets as provided in .csv files"... ==