Talk:Merge and aggregate datasets: Difference between revisions


Revision as of 21:58, 7 December 2020

Duplication of task goals if not task name

So... this task is pretty much an exact duplicate of CSV data manipulation which has been around for 7+ years and has some 85 entries. Admittedly this task has slightly better defined goals and is less trivial, but a large percentage of the code from there could be lifted and used unchanged here.

Some overlap of tasks is inevitable, and honestly I think this one is probably more useful to demonstrate working with real-world data than the other. I hesitate to make any unilateral decisions (unlike with the recent deluge of "Find words containing whatever" tasks that we've been hit with), but I also don't want to needlessly proliferate trivial variations. Thoughts? --Thundergnat (talk) 19:16, 7 December 2020 (UTC)

Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --Paddy3118 (talk) 19:40, 7 December 2020 (UTC)

My motivation for submitting this task was that I was recently working with R script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.

The examples and tutorials on Stack Overflow and other places are generally either too trivial or too specific to one exact use case. Merging, grouping and aggregating different datasets is something I encounter a lot in my work.

So that's why I submitted this task (after also asking here), and made sure to include the most common "hurdles", like missing records, missing values, multiple aggregator functions at once, working with date values and unordered source files. --BdR (talk) 22:49, 7 December 2020 (UTC)
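For readers skimming this thread, here is a minimal sketch of the hurdles listed above, using only the Python standard library. The data, the column names and the function `merge_and_aggregate` are all invented for illustration and are not the task's actual datasets or a reference solution. It assumes dates are in ISO `YYYY-MM-DD` form, so comparing them as strings gives the latest date.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, invented for this sketch (not the task's data).
# Rows are deliberately unordered and some fields are left empty.
PATIENTS_CSV = """id,name
2,Smith
1,Jones
3,
"""

VISITS_CSV = """id,date,score
1,2020-03-01,5.0
2,2020-01-15,
1,2020-02-10,7.0
"""


def merge_and_aggregate(patients_text, visits_text):
    """Left-join visits onto patients and compute several aggregates per patient."""
    patients = list(csv.DictReader(io.StringIO(patients_text)))
    visits = list(csv.DictReader(io.StringIO(visits_text)))

    # Group visit records by patient id.
    visits_by_id = defaultdict(list)
    for visit in visits:
        visits_by_id[visit["id"]].append(visit)

    rows = []
    # Sort by numeric id because the source rows are unordered.
    for patient in sorted(patients, key=lambda r: int(r["id"])):
        own = visits_by_id.get(patient["id"], [])  # missing records give []
        dates = [v["date"] for v in own if v["date"]]
        scores = [float(v["score"]) for v in own if v["score"]]  # skip missing values
        rows.append({
            "id": patient["id"],
            "name": patient["name"] or None,
            # ISO dates sort correctly as plain strings (assumption stated above).
            "last_visit": max(dates) if dates else None,
            "score_sum": sum(scores) if scores else None,
            "score_avg": sum(scores) / len(scores) if scores else None,
        })
    return rows


if __name__ == "__main__":
    for row in merge_and_aggregate(PATIENTS_CSV, VISITS_CSV):
        print(row)
```

A patient with no visits still appears in the result with empty aggregates, and an empty score is skipped rather than counted as zero. Those two choices are where solutions are most likely to differ.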

..."two datasets as provided in .csv files"...

Many examples don't read the csv from files. --Paddy3118 (talk) 19:42, 7 December 2020 (UTC)

Loading from a .csv file is the shortest code and more practical, but for quickly copying and testing the code examples, hard-coded data is easier. So, when possible, I try to include both and comment out the .csv load lines; see the Python example code. --BdR (talk) 21:58, 7 December 2020 (UTC)
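The "include both" pattern described here could look roughly like the sketch below. It is only an illustration: the helper `load_rows`, the sample data and the file name `patients.csv` are assumptions made for this example and are not taken from the Python entry on the task page.

```python
import csv
import io

# Hypothetical hard-coded sample, invented for illustration only.
SAMPLE_CSV = """id,name
1,Jones
2,Smith
"""


def load_rows(path=None, text=SAMPLE_CSV):
    """Return CSV rows as dicts, read from a file if path is given, else from text."""
    if path is not None:
        # newline="" is what the csv module documentation recommends for files.
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows()                    # hard-coded data, for quick copy-and-paste testing
# rows = load_rows("patients.csv")    # hypothetical file name; uncomment to read from disk
```

Because both paths go through `csv.DictReader`, the rest of an example works unchanged whichever line is active.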
