Merge and aggregate datasets: Difference between revisions

← Older edit

Merge and aggregate datasets (view source)

Revision as of 01:52, 31 March 2024

55,957 bytes added , 2 months ago

Added FreeBASIC

Jjuanhdez

2,136

edits

Revision as of 16:06, 25 April 2021 (view source) Mark Clements (talk \| contribs) (Add Datalog task) ← Older edit		Latest revision as of 01:52, 31 March 2024 (view source) Jjuanhdez (talk \| contribs) (Added FreeBASIC)
(62 intermediate revisions by 14 users not shown)
Line 1: {{task\|Data Structures}}~~Merge and aggregate datasets~~ Merge and aggregate datasets ;Task: Merge and aggregate two datasets as provided in   '''.csv'''   files into a new ~~resuling~~resulting dataset. ~~Use the appropriate methods and data structures depending on the programming language. Use the most common libraries only when built-in functionality is not sufficient.<br>~~ ~~'''Note:''' Either load the data from the .csv files or create the required data structures hard-coded.~~ Use the appropriate methods and data structures depending on the programming language. ~~patients.csv file contents:~~ ~~<nowiki>~~ Use the most common libraries only when built-in functionality is not sufficient. ;Note: Either load the data from the   '''.csv'''   files or create the required data structures hard-coded. '''patients.csv'''   file contents: <pre> PATIENT_ID,LASTNAME 1001,Hopper Line 14 ⟶ 24: 2002,Gosling 5005,Kurtz </~~nowiki~~pre> ~~visits.csv file contents:~~ '''visits.csv'''   file contents: ~~<nowiki>PATIENT_ID,VISIT_DATE,SCORE~~ <pre> PATIENT_ID,VISIT_DATE,SCORE 2002,2020-09-10,6.8 1001,2020-09-17,5.5 Line 26 ⟶ 38: 4004,2020-11-05,7.0 1001,2020-11-19,5.3 </~~nowiki~~pre> Create a resulting dataset in-memory or output it to screen or file, whichever is appropriate for the programming language at hand. ~~Merge and group per patient id and last name, get the maximum visit date and get the sum and average of the scores per patient to get the resulting datasset.~~ ~~Note that the visit date is purposefully provided as ISO format, so that it could also be processed as text and sorted alphabetically to determine the maximum date.~~ Merge and group per patient id and last name,   get the maximum visit date,   and get the sum and average of the scores per patient to get the resulting dataset. ~~<nowiki>\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|~~ Note that the visit date is purposefully provided as ISO format,   so that it could also be processed as text and sorted alphabetically to determine the maximum date. <pre> \| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| \| 1001 \| Hopper \| 2020-11-19 \| 17.4 \| 5.80 \| \| 2002 \| Gosling \| 2020-10-08 \| 6.8 \| 6.80 \| Line 38 ⟶ 54: \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.70 \| \| 5005 \| Kurtz \| \| \| \| </~~nowiki~~pre> ;Note: This task is aimed in particular at programming languages that are used in data science and data processing, such as F#, Python, R, SPSS, MATLAB etc. ;Related tasks: Line 48 ⟶ 66: * [[Read entire file]] * [[Read a file line by line]] <br><br> =={{header\|11l}}== {{trans\|Python: Stdlib csv only}} <syntaxhighlight lang="11l">V patients_csv = ‘PATIENT_ID,LASTNAME 1001,Hopper 4004,Wirth 3003,Kemeny 2002,Gosling 5005,Kurtz’ V visits_csv = ‘PATIENT_ID,VISIT_DATE,SCORE 2002,2020-09-10,6.8 1001,2020-09-17,5.5 4004,2020-09-24,8.4 2002,2020-10-08, 1001,,6.6 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3’ F csv2list(s) [[String]] rows L(row) s.split("\n") rows [+]= row.split(‘,’) R rows V patients = csv2list(patients_csv) V visits = csv2list(visits_csv) V result = copy(patients) result.sort_range(1..) result[0].append(‘LAST_VISIT’) V last = Dict(visits[1..], p_vis -> (p_vis[0], p_vis[1])) L(record) 1 .< result.len result[record].append(last.get(result[record][0], ‘’)) result[0] [+]= [‘SCORE_SUM’, ‘SCORE_AVG’] V n = Dict(patients[1..], p -> (p[0], 0)) V tot = Dict(patients[1..], p -> (p[0], 0.0)) L(record) visits[1..] V p = record[0] V score = record[2] I !score.empty n[p]++ tot[p] += Float(score) L(record) 1 .< result.len V p = result[record][0] I n[p] != 0 result[record] [+]= [‘#3.1’.format(tot[p]), ‘#2.2’.format(tot[p] / n[p])] E result[record] [+]= [‘’, ‘’] L(record) result print(‘\| ’record.map(r -> r.center(10)).join(‘ \| ’)‘ \|’)</syntaxhighlight> {{out}} <pre> \| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| \| 1001 \| Hopper \| 2020-11-19 \| 17.4 \| 5.80 \| \| 2002 \| Gosling \| 2020-10-08 \| 6.8 \| 6.80 \| \| 3003 \| Kemeny \| 2020-11-12 \| \| \| \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.70 \| \| 5005 \| Kurtz \| \| \| \| </pre> =={{header\|AutoHotkey}}== <~~lang~~syntaxhighlight ~~AutoHotkey~~lang="autohotkey">Merge_and_aggregate(patients, visits){ ID := [], LAST_VISIT := [], SCORE_SUM := [], VISIT := [] for i, line in StrSplit(patients, "`n", "`r"){ Line 74 ⟶ 161: output .= ID "`t" name "`t" LAST_VISIT[id] "`t" SCORE_SUM[id] "`t" SCORE_SUM[id]/VISIT[id] "`n" return output }</~~lang~~syntaxhighlight> Examples:<~~lang~~syntaxhighlight ~~AutoHotkey~~lang="autohotkey">patients = ( PATIENT_ID,LASTNAME Line 99 ⟶ 186: MsgBox % Merge_and_aggregate(patients, visits) return</~~lang~~syntaxhighlight> {{out}} <pre>PATIENT_ID LASTNAME LAST_VISIT SCORE_SUM SCORE_AVG Line 109 ⟶ 196: =={{header\|AWK}}== <syntaxhighlight lang="awk"> ~~<lang AWK>~~ # syntax: GAWK -f MERGE_AND_AGGREGATE_DATASETS.AWK RC-PATIENTS.CSV RC-VISITS.CSV # files may appear in any order Line 148 ⟶ 235: exit(0) } </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 160 ⟶ 247: =={{header\|C++}}== Uses C++20 <~~lang~~syntaxhighlight lang="cpp">#include <iostream> #include <optional> #include <ranges> Line 236 ⟶ 323: cout << " \|\n"; } }</~~lang~~syntaxhighlight> {{out}} Line 248 ⟶ 335: \| 5005 \| Kurtz \| \| \| \| </pre> =={{header\|C sharp}}== <syntaxhighlight lang="csharp">using System; using System.Collections.Generic; using System.Globalization; using System.Linq; using System.Runtime.Serialization; public static class MergeAndAggregateDatasets { public static void Main() { string patientsCsv = @" PATIENT_ID,LASTNAME 1001,Hopper 4004,Wirth 3003,Kemeny 2002,Gosling 5005,Kurtz"; string visitsCsv = @" PATIENT_ID,VISIT_DATE,SCORE 2002,2020-09-10,6.8 1001,2020-09-17,5.5 4004,2020-09-24,8.4 2002,2020-10-08, 1001,,6.6 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3"; string format = "yyyy-MM-dd"; var formatProvider = new DateTimeFormat(format).FormatProvider; var patients = ParseCsv( patientsCsv.Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries), line => (PatientId: int.Parse(line[0]), LastName: line[1])); var visits = ParseCsv( visitsCsv.Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries), line => ( PatientId: int.Parse(line[0]), VisitDate: DateTime.TryParse(line[1], formatProvider, DateTimeStyles.None, out var date) ? date : default(DateTime?), Score: double.TryParse(line[2], out double score) ? score : default(double?) ) ); var results = patients.GroupJoin(visits, p => p.PatientId, v => v.PatientId, (p, vs) => ( p.PatientId, p.LastName, LastVisit: vs.Max(v => v.VisitDate), ScoreSum: vs.Sum(v => v.Score), ScoreAvg: vs.Average(v => v.Score) ) ).OrderBy(r => r.PatientId); Console.WriteLine("\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|"); foreach (var r in results) { Console.WriteLine($"\| {r.PatientId,-10} \| {r.LastName,-8} \| {r.LastVisit?.ToString(format) ?? "",-10} \| {r.ScoreSum,9} \| {r.ScoreAvg,9} \|"); } } private static IEnumerable<T> ParseCsv<T>(string[] contents, Func<string[], T> constructor) { for (int i = 1; i < contents.Length; i++) { var line = contents[i].Split(','); yield return constructor(line); } } }</syntaxhighlight> {{out}} <pre> \| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| \| 1001 \| Hopper \| 2020-11-19 \| 17.4 \| 5.8 \| \| 2002 \| Gosling \| 2020-10-08 \| 6.8 \| 6.8 \| \| 3003 \| Kemeny \| 2020-11-12 \| 0 \| \| \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.7 \| \| 5005 \| Kurtz \| \| 0 \| \|</pre> =={{header\|Datalog}}== The following implementation is for Souffle. ~~Missing~~Souffle ~~scores~~does ~~are~~not ~~represented~~currently byinclude ~~0.0/0.0~~dates and ~~missing~~times ~~dates~~in ~~are~~the ~~represented~~base ~~by empty symbols. Rather than combine the summaries~~distribution, ~~the~~but ~~date and~~it ~~score~~is ~~summaries~~straightforward ~~have~~to ~~been~~use ~~presented~~C ~~separately.~~timestamps: <syntaxhighlight lang="c">// datetime.cpp ~~<lang datalog>.decl Patient(id:number, lastname:symbol)~~ #include <ctime> #include <cstdint> extern "C" { int64_t from date(const char* string) { struct tm tmInfo = {0}; strptime(string, "%Y-%m-%d", &tmInfo); return mktime(&tmInfo); // localtime } }</syntaxhighlight> Rather than combine the summaries, the date and score summaries have been presented separately. <syntaxhighlight lang="prolog">/* rosetta.dl / #define NaN to_float("NaN") .functor from_date(date:symbol) : number .decl Patient(id:number, lastname:symbol) .decl Visit(id:number, date:symbol, score:float) ~~.decl nans(x:float)~~ .decl SummaryDates(id:number, lastname:symbol, last_date:symbol) .decl SummaryScores(id:number, lastname:symbol, score_sum:float, score_mean:float) .decl MissingDates(x:number) Patient(1001,"Hopper"). Patient(4004,"Wirth"). Line 265 ⟶ 450: Visit(1001,"2020-09-17",5.5). Visit(4004,"2020-09-24",8.4). Visit(2002,"2020-10-08",~~0.0/0.0~~NaN). Visit(1001,"",6.6). Visit(3003,"2020-11-12",~~0.0/0.0~~NaN). Visit(4004,"2020-11-05",7.0). Visit(1001,"2020-11-19",5.3). MissingDates(@from_date("")) :- true. ~~nans(0.0 / 0.0).~~ SummaryDates(id, lastname, last_date) :- Patient(id,lastname), ~~last_date_ord~~last_timestamp = max ~~ord(date)~~ts: {Visit(id, date, _), ~~date~~ts != ""@from_date(date), !MissingDates(ts)}, Visit(id, last_date, _), ~~ord(last_date)~~last_timestamp = ~~last_date_ord~~@from_date(last_date). SummaryScores(id, lastname, score_sum, score_mean) :- Patient(id,lastname), score_sum = sum score: {Visit(id, _, score), ~~!nans(~~score) != NaN}, score_mean = mean score: {Visit(id, _, score), ~~!nans(~~score) != NaN}. .output SummaryDates .output SummaryScores</~~lang~~syntaxhighlight> Then this is called using: <syntaxhighlight lang="bash">g++ -shared -fPIC datetime.cpp -o libfunctors.so souffle -D- rosetta.dl</syntaxhighlight> {{output}} <pre>--------------- SummaryScores id lastname score_sum score_mean =============== 1001 Hopper 17.399999999999999 5.7999999999999998 2002 Gosling 6.7999999999999998 6.7999999999999998 4004 Wirth 15.4 7.7000000000000002 =============== --------------- SummaryDates id lastname last_date =============== 1001 Hopper 2020-11-19 2002 Gosling 2020-10-08 3003 Kemeny 2020-11-12 4004 Wirth 2020-11-05 ===============</pre> =={{header\|FreeBASIC}}== {{trans\|C++}} <syntaxhighlight lang="vbnet">Type Patient ID As String LastName As String End Type Type Visit PatientID As String Fecha As String Score As Single HasScore As Integer End Type Dim As Patient patients(5) Dim As Visit visits(8) patients(1).ID = "1001": patients(1).LastName = "Hopper" patients(2).ID = "4004": patients(2).LastName = "Wirth" patients(3).ID = "3003": patients(3).LastName = "Kemeny" patients(4).ID = "2002": patients(4).LastName = "Gosling" patients(5).ID = "5005": patients(5).LastName = "Kurtz" visits(1).PatientID = "2002": visits(1).Fecha = "2020-09-10": visits(1).Score = 6.8: visits(1).HasScore = -1 visits(2).PatientID = "1001": visits(2).Fecha = "2020-09-17": visits(2).Score = 5.5: visits(2).HasScore = -1 visits(3).PatientID = "4004": visits(3).Fecha = "2020-09-24": visits(3).Score = 8.4: visits(3).HasScore = -1 visits(4).PatientID = "2002": visits(4).Fecha = "2020-10-08": visits(4).HasScore = 0 visits(5).PatientID = "1001": visits(5).Fecha = "" : visits(5).Score = 6.6: visits(5).HasScore = -1 visits(6).PatientID = "3003": visits(6).Fecha = "2020-11-12": visits(6).HasScore = 0 visits(7).PatientID = "4004": visits(7).Fecha = "2020-11-05": visits(7).Score = 7.0: visits(7).HasScore = -1 visits(8).PatientID = "1001": visits(8).Fecha = "2020-11-19": visits(8).Score = 5.3: visits(8).HasScore = -1 Print "\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|" For i As Integer = 1 To 5 Dim As String lastVisit = "" Dim As Single sum = 0 Dim As Integer numScores = 0 For j As Integer = 1 To 8 If patients(i).ID = visits(j).PatientID Then If visits(j).HasScore Then sum += visits(j).Score numScores += 1 End If If visits(j).Fecha > lastVisit Then lastVisit = visits(j).Fecha End If End If Next j Print "\| "; patients(i).ID; " \| "; Print Using "\ \ \| \ \ \| "; patients(i).LastName; lastVisit; If numScores > 0 Then Print Using "#######.# \| #######.#"; sum; (sum / Csng(numScores)); Else Print " \| "; End If Print " \|" Next i Sleep</syntaxhighlight> {{out}} <pre>\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| \| 1001 \| Hopper \| 2020-11-19 \| 17.4 \| 5.8 \| \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.7 \| \| 3003 \| Kemeny \| 2020-11-12 \| \| \| \| 2002 \| Gosling \| 2020-10-08 \| 6.8 \| 6.8 \| \| 5005 \| Kurtz \| \| \| \|</pre> =={{header\|F_Sharp\|F#}}== Note that the scores are right justified to copy the task description. It would be more natural to leave them right justified. <~~lang~~syntaxhighlight lang="fsharp"> // Merge and aggregate datasets. Nigel Galloway: January 6th., 2021 let rFile(fName)=seq{use n=System.IO.File.OpenText(fName) Line 295 ⟶ 570: let fG n g=let z=G.[n]\|>Seq.sumBy(fun n->try float n.[2] with :? System.FormatException->0.0) fN n g (G.[n]\|>Seq.sort\|>Seq.last).[1] (if z=0.0 then "" else string z) (if z=0.0 then "" else string(z/(float(Seq.length G.[n])))) </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 310 ⟶ 585: =={{header\|Go}}== {{trans\|Wren}} <~~lang~~syntaxhighlight lang="go">package main import ( Line 447 ⟶ 722: } mergePrint(merges) }</~~lang~~syntaxhighlight> {{out}} Line 457 ⟶ 732: \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.70 \| \| 5005 \| Kurtz \| \| \| \| </pre> =={{header\|Haskell}}== ===Reading and merging=== Merging of fields and databases is defined as a monoid operation for corresponding types. <syntaxhighlight lang="haskell">import Data.List import Data.Maybe import System.IO (readFile) import Text.Read (readMaybe) import Control.Applicative ((<\|>)) ------------------------------------------------------------ newtype DB = DB { entries :: [Patient] } deriving Show instance Semigroup DB where DB a <> DB b = normalize $ a <> b instance Monoid DB where mempty = DB [] normalize :: [Patient] -> DB normalize = DB . map mconcat . groupBy (\x y -> pid x == pid y) . sortOn pid ------------------------------------------------------------ data Patient = Patient { pid :: String , name :: Maybe String , visits :: [String] , scores :: [Float] } deriving Show instance Semigroup Patient where Patient p1 n1 v1 s1 <> Patient p2 n2 v2 s2 = Patient (fromJust $ Just p1 <\|> Just p2) (n1 <\|> n2) (v1 <\|> v2) (s1 <\|> s2) instance Monoid Patient where mempty = Patient mempty mempty mempty mempty ------------------------------------------------------------ readDB :: String -> DB readDB = normalize . mapMaybe readPatient . readCSV readPatient r = do i <- lookup "PATIENT_ID" r let n = lookup "LASTNAME" r let d = lookup "VISIT_DATE" r >>= readDate let s = lookup "SCORE" r >>= readMaybe return $ Patient i n (maybeToList d) (maybeToList s) where readDate [] = Nothing readDate d = Just d readCSV :: String -> [(String, String)] readCSV txt = zip header <$> body where header:body = splitBy ',' <$> lines txt splitBy ch = unfoldr go where go [] = Nothing go s = Just $ drop 1 <$> span (/= ch) s</syntaxhighlight> <pre>let patients = readDB <$> readFile "patients.csv" Main> let visits = readDB <$> readFile "visits.csv" Main> mapM_ print . entries =<< patients Patient {pid = "1001", name = Just "Hopper", visits = [], scores = []} Patient {pid = "2002", name = Just "Gosling", visits = [], scores = []} Patient {pid = "3003", name = Just "Kemeny", visits = [], scores = []} Patient {pid = "4004", name = Just "Wirth", visits = [], scores = []} Patient {pid = "5005", name = Just "Kurtz", visits = [], scores = []} Main> mapM_ print . entries =<< visits Patient {pid = "1001", name = Nothing, visits = ["2020-09-17","2020-11-19"], scores = [5.3,6.6,5.5]} Patient {pid = "2002", name = Nothing, visits = ["2020-09-10","2020-10-08"], scores = [6.8]} Patient {pid = "3003", name = Nothing, visits = ["2020-11-12"], scores = []} Patient {pid = "4004", name = Nothing, visits = ["2020-09-24","2020-11-05"], scores = [7.0,8.4]} Main> mapM_ print . entries =<< patients <> visits Patient {pid = "1001", name = Just "Hopper", visits = ["2020-09-17","2020-11-19"], scores = [5.3,6.6,5.5]} Patient {pid = "2002", name = Just "Gosling", visits = ["2020-09-10","2020-10-08"], scores = [6.8]} Patient {pid = "3003", name = Just "Kemeny", visits = ["2020-11-12"], scores = []} Patient {pid = "4004", name = Just "Wirth", visits = ["2020-09-24","2020-11-05"], scores = [7.0,8.4]} Patient {pid = "5005", name = Just "Kurtz", visits = [], scores = []}</pre> ===Pretty tabulation=== <syntaxhighlight lang="haskell"> tabulateDB (DB ps) header cols = intercalate "\|" <$> body where body = transpose $ zipWith pad width table table = transpose $ header : map showPatient ps showPatient p = sequence cols p width = maximum . map length <$> table pad n col = (' ' :) . take (n+1) . (++ repeat ' ') <$> col main = do a <- readDB <$> readFile "patients.csv" b <- readDB <$> readFile "visits.csv" mapM_ putStrLn $ tabulateDB (a <> b) header fields where header = [ "PATIENT_ID", "LASTNAME", "VISIT_DATE" , "SCORES SUM","SCORES AVG"] fields = [ pid , fromMaybe [] . name , \p -> case visits p of {[] -> []; l -> last l} , \p -> case scores p of {[] -> []; s -> show (sum s)} , \p -> case scores p of {[] -> []; s -> show (mean s)} ] mean lst = sum lst / genericLength lst</syntaxhighlight> <pre>Main> main PATIENT_ID \| LASTNAME \| VISIT_DATE \| SCORES SUM \| SCORES AVG 1001 \| Hopper \| 2020-11-19 \| 17.4 \| 5.7999997 2002 \| Gosling \| 2020-10-08 \| 6.8 \| 6.8 3003 \| Kemeny \| 2020-11-12 \| \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.7 5005 \| Kurtz \| \| \| </pre> =={{header\|Harbour}}== Harbour does not have special values for NA or NaN, and missing numerical values are represented as zeros. In the following, we have used -999 for missing scores. <syntaxhighlight lang="xbase"> #xcommand INSERT INTO <table> ( <uField1>[, <uFieldN> ] ) VALUE ( <uVal1>[, <uValN> ] ) => ; <table>->(dbAppend()); <table>-><uField1> := <uVal1> [; <table>-><uFieldN> := <uValN>] #xcommand INSERT INTO <table> ( <uFieldList,...> ) VALUES ( <uValList1,...> ) [, ( <uValListN,...> )] => ; INSERT INTO <table> ( <uFieldList> ) VALUE ( <uValList1> ) ; [; INSERT INTO <table> ( <uFieldList> ) VALUE ( <uValListN> )] && Singular cases (so we can use VALUES for all instances) #xcommand INSERT INTO <table> (<uField>) VALUE (<uVal>) => ; <table>->(dbAppend()); <table>-><uField> := <uVal> #xcommand INSERT INTO <table> ( <uFieldList,...> ) VALUES ( <uValList,...> ) => ; INSERT INTO <table> ( <uFieldList> ) VALUE ( <uValList> ) PROCEDURE Main() LOCAL pStruct, vStruct, rStruct, xCurId, aAgg SET DATE FORMAT "yyyy-mm-dd" && create and populate the patient table pStruct := {{"patient_id", "n", 8, 0}, {"lastname", "c", 10, 0 }} dbCreate( "patient", pStruct,, .T., "patient" ) INSERT INTO patient (patient_id, lastname) VALUES (1001, "Hopper"), (4004, "Wirth"), ; (3003, "Kemeny"), (2002, "Gosling"), (5005, "Kurtz") INDEX ON patient_id TO pat_id && create and populate the visit table vStruct := {{"patient_id", "n", 8, 0}, {"visit_date", "d", 10, 0}, {"score", "n", 8, 1}} dbCreate( "visit", vStruct,, .T., "visit" ) INSERT INTO visit (patient_id, visit_date, score) VALUES (2002, ctod("2020-09-10"), 6.8), ; (1001, ctod("2020-09-17"), 5.5), (4004, ctod("2020-09-24"), 8.4), ; (2002, ctod("2020-10-08"), -999), (1001, ctod("1900-01-01"), 6.6), ; (3003, ctod("2020-11-12"), -999), (4004, ctod("2020-11-05"), 7.0), ; (1001, ctod("2020-11-19"), 5.3) INDEX ON patient_id TO visit_id && create the result table rStruct := { {"patient_id", "n", 8, 0}, ; {"n", "i", 8, 0}, {"sum_score", "n", 8, 1}, ; {"avg_score", "n", 8, 1}, {"max_date", "d", 10, 0}} dbCreate("report", rStruct,, .T., "report") SELECT visit DO WHILE ! Eof() xCurId := patient_id && grouping variable aAgg := {0, 0, 0.0, ctod("1900-01-01")} && initial values DO WHILE ! Eof() .AND. xCurId == patient_id aAgg := {1+aAgg[1], iif(score==-999,aAgg[2],1+aAgg[2]), ; iif(score==-999, aAgg[3], score+aAgg[3]), max(visit_date, aAgg[4])} && update SKIP ENDDO INSERT INTO report (patient_id, n, sum_score, avg_score, max_date) ; VALUES (xCurId, aAgg[1], aAgg[3], aAgg[3]/aAgg[2], aAgg[4]) ENDDO SELECT report INDEX ON patient_id TO report_id SELECT patient SET RELATION TO patient_id INTO report ? "NUM", "PATIENT_ID", "LASTNAME", "N", "SUM_SCORE", "AVG_SCORE", "MAX_DATE" LIST patient_id, lastname, report->n, report->sum_score, report->avg_score, report->max_date RETURN</syntaxhighlight> With output: <pre>NUM PATIENT_ID LASTNAME N SUM_SCORE AVG_SCORE MAX_DATE 1 1001 Hopper 3 17.4 5.8 2020-11-19 4 2002 Gosling 2 6.8 6.8 2020-10-08 3 3003 Kemeny 1 0.0 0.0 2020-11-12 2 4004 Wirth 2 15.4 7.7 2020-11-05 5 5005 Kurtz 0 0.0 0.0 - - </pre> =={{header\|J}}== One approach here would be to use [[j:Jd/Index\|Jd]] In other words, we can set things up like this: <syntaxhighlight lang="j">NB. setup: require'jd pacman' load JDP,'tools/csv_load.ijs' F=: jpath '~temp/rosettacode/example/CSV' jdcreatefolder_jd_ CSVFOLDER=: F assert 0<{{)n PATIENTID,LASTNAME 1001,Hopper 4004,Wirth 3003,Kemeny 2002,Gosling 5005,Kurtz }} fwrite F,'patients.csv' assert 0<{{)n PATIENTID,VISIT_DATE,SCORE 2002,2020-09-10,6.8 1001,2020-09-17,5.5 4004,2020-09-24,8.4 2002,2020-10-08, 1001,,6.6 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3 }} fwrite F,'visits.csv' csvprepare 'patients';F,'patients.csv' csvprepare 'visits';F,'visits.csv' csvload 'patients';1 csvload 'visits';1 jd'ref patients PATIENTID visits PATIENTID'</syntaxhighlight> And, then we can run our query: <syntaxhighlight lang="j">require'jd' echo jd {{)n reads PATIENT_ID: first p.PATIENTID, LASTNAME: first p.LASTNAME, LAST_VISIT: max v.VISIT_DATE, SCORE_SUM: sum v.SCORE, SCORE_AVG: avg v.SCORE by p.PATIENTID from p:patients, v:p.visits }} -.LF</syntaxhighlight> Which displays this result: <pre>┌───────────┬──────────┬────────┬──────────┬─────────┬─────────┐ │p.PATIENTID│PATIENT_ID│LASTNAME│LAST_VISIT│SCORE_SUM│SCORE_AVG│ ├───────────┼──────────┼────────┼──────────┼─────────┼─────────┤ │1001 │1001 │Hopper │2020-09-17│5.5 │5.5 │ │4004 │4004 │Wirth │2020-09-24│8.4 │8.4 │ │3003 │3003 │Kemeny │2020-11-12│ __ │ __ │ │2002 │2002 │Gosling │2020-09-10│6.8 │6.8 │ │5005 │5005 │Kurtz │? │ 0 │ 0 │ └───────────┴──────────┴────────┴──────────┴─────────┴─────────┘</pre> Another approach would be to use J's csv library: <syntaxhighlight lang="j">require'csv' patients=: fixcsv {{)n PATIENTID,LASTNAME 1001,Hopper 4004,Wirth 3003,Kemeny 2002,Gosling 5005,Kurtz }} visits=: fixcsv {{)n PATIENTID,VISIT_DATE,SCORE 2002,2020-09-10,6.8 1001,2020-09-17,5.5 4004,2020-09-24,8.4 2002,2020-10-08, 1001,,6.6 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3 }} task=: {{ P=. <@:>"1\|:/:~}.patients V=. <@:>"1\|:/:~}.visits id=. 0 {:: P nm=. 1 {:: P sel1=. (0 {:: P) e. 0 {:: V sel2=. (~.0 {:: V) e. 0 {:: P NB. unnecessary for this example exp=. sel1 #inv sel2 # ] agg=. /.(&.:".) vdt=. exp (0 {:: V) {:/. 1 {:: V sum=. exp ":,.(0 {:: V) +//. 0". 2 {:: V avg=. exp ":,.(0 {:: V) (+/%#)/. 0". 2 {:: V labels=. ;:'PATIENT_ID LASTNAME LAST_VISIT SCORE_SUM SCORE_AVG' labels,:id;nm;vdt;sum;avg }}</syntaxhighlight> Here: <syntaxhighlight lang="j"> task'' ┌──────────┬────────┬──────────┬─────────┬─────────┐ │PATIENT_ID│LASTNAME│LAST_VISIT│SCORE_SUM│SCORE_AVG│ ├──────────┼────────┼──────────┼─────────┼─────────┤ │1001 │Hopper │2020-11-19│17.4 │5.8 │ │2002 │Gosling │2020-10-08│ 6.8 │3.4 │ │3003 │Kemeny │2020-11-12│ 0 │ 0 │ │4004 │Wirth │2020-11-05│15.4 │7.7 │ │5005 │Kurtz │ │ │ │ └──────────┴────────┴──────────┴─────────┴─────────┘</syntaxhighlight> If the empty score in visits was a display concern, we might instead do it this way: <syntaxhighlight lang="j">task=: {{ P=. <@:>"1\|:/:~}.patients V=. <@:>"1\|:/:~}.visits id=. 0 {:: P nm=. 1 {:: P sel1=. (0 {:: P) e. 0 {:: V sel2=. (~.0 {:: V) e. 0 {:: P NB. unnecessary for this example exp=. sel1 #inv sel2 # ] agg=. /.(&.:".) vdt=. exp (0 {:: V) {:/. 1 {:: V sel3=. (0 {:: V) +.//. 2 @#@{::"1 }.visits exp2=: [:exp sel3 #inv sel3 #] sum=. exp2 ":,.(0 {:: V) +//. 0". 2 {:: V avg=. exp2 ":,.(0 {:: V) (+/%#)/. 0". 2 {:: V labels=. ;:'PATIENT_ID LASTNAME LAST_VISIT SCORE_SUM SCORE_AVG' labels,:id;nm;vdt;sum;avg }}</syntaxhighlight> Which gives us: <syntaxhighlight lang="j"> task'' ┌──────────┬────────┬──────────┬─────────┬─────────┐ │PATIENT_ID│LASTNAME│LAST_VISIT│SCORE_SUM│SCORE_AVG│ ├──────────┼────────┼──────────┼─────────┼─────────┤ │1001 │Hopper │2020-11-19│17.4 │5.8 │ │2002 │Gosling │2020-10-08│ 6.8 │3.4 │ │3003 │Kemeny │2020-11-12│ │ │ │4004 │Wirth │2020-11-05│15.4 │7.7 │ │5005 │Kurtz │ │ │ │ └──────────┴────────┴──────────┴─────────┴─────────┘</syntaxhighlight> =={{header\|Java}}== <syntaxhighlight lang="java"> import java.util.Arrays; import java.util.Collections; import java.util.Comparator; import java.util.DoubleSummaryStatistics; import java.util.List; public final class MergeAndAggregateDatasets { public static void main(String[] args) { List<Patient> patients = Arrays.asList( new Patient("1001", "Hopper"), new Patient("4004", "Wirth"), new Patient("3003", "Kemeny"), new Patient("2002", "Gosling"), new Patient("5005", "Kurtz") ); List<Visit> visits = Arrays.asList( new Visit("2002", "2020-09-10", 6.8), new Visit("1001", "2020-09-17", 5.5), new Visit("4004", "2020-09-24", 8.4), new Visit("2002", "2020-10-08", null), new Visit("1001", "" , 6.6), new Visit("3003", "2020-11-12", null), new Visit("4004", "2020-11-05", 7.0), new Visit("1001", "2020-11-19", 5.3) ); Collections.sort(patients, Comparator.comparing(Patient::patientID)); System.out.println("\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|"); for ( Patient patient : patients ) { List<Visit> patientVisits = visits.stream().filter( v -> v.visitID == patient.patientID() ).toList(); String lastVisit = patientVisits.stream() .map( v -> v.visitDate ).max(Comparator.naturalOrder()).orElseGet( () -> " None " ); DoubleSummaryStatistics statistics = patientVisits.stream() .filter( v -> v.score != null ).mapToDouble(Visit::score).summaryStatistics(); double scoreSum = statistics.getSum(); double scoreAverage = statistics.getAverage(); String patientDetails = String.format("%12s%11s%13s%12.2f%12.2f", patient.patientID, patient.lastName, lastVisit, scoreSum, scoreAverage); System.out.println(patientDetails); } private static record Patient(String patientID, String lastName) {}; private static record Visit(String visitID, String visitDate, Double score) {}; } } </syntaxhighlight> {{ out }} <pre> \| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| 1001 Hopper 2020-11-19 17.40 5.80 2002 Gosling 2020-10-08 6.80 6.80 3003 Kemeny 2020-11-12 0.00 0.00 4004 Wirth 2020-11-05 15.40 7.70 5005 Kurtz None 0.00 0.00 </pre> =={{header\|jq}}== {{works with\|jq}} '''Works with gojq, the Go implementation of jq''' In the context of jq, a relational dataset "table" is naturally represented as an array of JSON objects, each representing a row in the table. When displaying such a table, we will only show the constituent rows. Null values will be represented by JSON's `null`. '''Ingesting CSV data''' <syntaxhighlight lang="jq"> # objectify/1 takes an array of atomic values as inputs, and packages # these into an object with keys specified by the "headers" array and # values obtained by trimming string values, replacing empty strings # by null, and converting strings to numbers if possible. def objectify(headers): def tonumberq: tonumber? // .; def trimq: if type == "string" then sub("^ +";"") \| sub(" +$";"") else . end; def tonullq: if . == "" then null else . end; . as $in \| reduce range(0; headers\|length) as $i ({}; .[headers[$i]] = ($in[$i] \| trimq \| tonumberq \| tonullq) ); def csv2jsonHelper: .[0] as $headers \| reduce (.[1:][] \| select(length > 0) ) as $row ([]; . + [ $row\|objectify($headers) ]); </syntaxhighlight> '''Aggregation functions''' <syntaxhighlight lang="jq"> # output {LAST_VISIT} def LAST_VISIT($patient_id): {LAST_VISIT: (map(select( .PATIENT_ID == $patient_id).VISIT_DATE) \| max)}; # output {SCORE_SUM, SCORE_AVG} def SCORE_SUMMARY($patient_id): map(select( .PATIENT_ID == $patient_id).SCORE) \| {SCORE_SUM: add, count: length} \| {SCORE_SUM, SCORE_AVG: (if .SCORE_SUM and .count > 0 then .SCORE_SUM/.count else null end)}; </syntaxhighlight>'''The task'''<syntaxhighlight lang="jq"> # Read the two tables: INDEX($patients \| [splits("\n")] \| map(split(",")) \| csv2jsonHelper[]; .PATIENT_ID) as $patients \| ($visits \| [splits("\n")] \| map(split(",")) \| csv2jsonHelper) as $visits # Construct the new table: \| $visits \| map(.PATIENT_ID as $PATIENT_ID \| {$PATIENT_ID} + ($visits \| {LASTNAME: $patients[$PATIENT_ID\|tostring]} + LAST_VISIT($PATIENT_ID) + SCORE_SUMMARY($PATIENT_ID))) # ... but display it as a sequence of JSON objects \| .[] </syntaxhighlight> '''Invocation''' jq -Rnc --rawfile patients patients.csv --rawfile visits visits.csv -f program.jq {{out}} <pre> {"PATIENT_ID":2002,"LASTNAME":{"PATIENT_ID":2002,"LASTNAME":"Gosling"},"LAST_VISIT":"2020-10-08","SCORE_SUM":6.8,"SCORE_AVG":3.4} {"PATIENT_ID":1001,"LASTNAME":{"PATIENT_ID":1001,"LASTNAME":"Hopper"},"LAST_VISIT":"2020-11-19","SCORE_SUM":17.4,"SCORE_AVG":5.8} {"PATIENT_ID":4004,"LASTNAME":{"PATIENT_ID":4004,"LASTNAME":"Wirth"},"LAST_VISIT":"2020-11-05","SCORE_SUM":15.4,"SCORE_AVG":7.7} {"PATIENT_ID":2002,"LASTNAME":{"PATIENT_ID":2002,"LASTNAME":"Gosling"},"LAST_VISIT":"2020-10-08","SCORE_SUM":6.8,"SCORE_AVG":3.4} {"PATIENT_ID":1001,"LASTNAME":{"PATIENT_ID":1001,"LASTNAME":"Hopper"},"LAST_VISIT":"2020-11-19","SCORE_SUM":17.4,"SCORE_AVG":5.8} {"PATIENT_ID":3003,"LASTNAME":{"PATIENT_ID":3003,"LASTNAME":"Kemeny"},"LAST_VISIT":"2020-11-12","SCORE_SUM":null,"SCORE_AVG":null} {"PATIENT_ID":4004,"LASTNAME":{"PATIENT_ID":4004,"LASTNAME":"Wirth"},"LAST_VISIT":"2020-11-05","SCORE_SUM":15.4,"SCORE_AVG":7.7} {"PATIENT_ID":1001,"LASTNAME":{"PATIENT_ID":1001,"LASTNAME":"Hopper"},"LAST_VISIT":"2020-11-19","SCORE_SUM":17.4,"SCORE_AVG":5.8} </pre> =={{header\|Julia}}== {{trans\|Python}} <~~lang~~syntaxhighlight lang="julia">using CSV, DataFrames, Statistics # load data from csv files Line 500 ⟶ 1,261: end println(df_result) </~~lang~~syntaxhighlight>{{out}} <pre> 5×5 DataFrame Line 512 ⟶ 1,273: │ 5 │ 5005 │ Kurtz │ missing │ missing │ missing │ </pre> =={{header\|Mathematica}}/{{header\|Wolfram Language}}== <syntaxhighlight lang="mathematica">a = ImportString["PATIENT_ID,LASTNAME 1001,Hopper 4004,Wirth 3003,Kemeny 2002,Gosling 5005,Kurtz", "CSV"]; b = ImportString["PATIENT_ID,VISIT_DATE,SCORE 2002,2020-09-10,6.8 1001,2020-09-17,5.5 4004,2020-09-24,8.4 2002,2020-10-08, 1001,,6.6 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3", "CSV"]; a = <\|a[[1, 1]] -> #1, a[[1, 2]] -> #2\|> & @@@ Rest[a]; b = <\|b[[1, 1]] -> #1, b[[1, 2]] -> If[#2 != "", DateObject[#2], Missing[]], b[[1, 3]] -> If[#3 =!= "", #3, Missing[]]\|> & @@@ Rest[b]; j = JoinAcross[a, b, Key["PATIENT_ID"], "Outer"]; gr = GroupBy[j, #["PATIENT_ID"] &]; <\|"PATIENT_ID" -> #[[1, "PATIENT_ID"]], "LASTNAME" -> #[[1, "LASTNAME"]], "VISIT_DATE" -> If[DeleteMissing[#[[All, "VISIT_DATE"]]] =!= {}, Max@DeleteMissing[#[[All, "VISIT_DATE"]]], Missing[]], "SCORE_SUM" -> If[DeleteMissing@#[[All, "SCORE"]] =!= {}, Total@DeleteMissing@#[[All, "SCORE"]], Missing[]], "SCORE_AVG" -> If[DeleteMissing@#[[All, "SCORE"]] =!= {}, Mean@DeleteMissing@#[[All, "SCORE"]], Missing[]]\|> & /@ gr // Dataset</syntaxhighlight> {{out}} <pre> PATIENT_ID LASTNAME VISIT_DATE SCORE_SUM SCORE_AVG 2002 2002 Gosling Thu 8 Oct 2020 6.8 6.8 1001 1001 Hopper Thu 19 Nov 2020 17.4 5.8 4004 4004 Wirth Thu 5 Nov 2020 15.4 7.7 3003 3003 Kemeny Thu 12 Nov 2020 \[LongDash] \[LongDash] 5005 5005 Kurtz \[LongDash] \[LongDash] \[LongDash]</pre> =={{header\|Mercury}}== <syntaxhighlight lang="prolog">:- module rosetta. :- interface. :- import_module io. :- pred main(io::di, io::uo) is det. :- implementation. :- import_module list, int, float, string, maybe, solutions. :- pred patient(int::out, string::out) is multi. patient(1001, "Hopper"). patient(4004, "Wirth"). patient(3003, "Kemeny"). patient(2002, "Gosling"). patient(5005, "Kurtz"). :- func nan = float. nan = det_to_float("NaN"). :- type maybe_date ---> date(year::int, month::int, day::int); no. :- pred visit(int::out, maybe_date::out, float::out) is multi. visit(2002, date(2020,09,10), 6.8). visit(1001, date(2020,09,17), 5.5). visit(4004, date(2020,09,24), 8.4). visit(2002, date(2020,10,08), nan). visit(1001, no, 6.6). visit(3003, date(2020,11,12), nan). visit(4004, date(2020,11,05), 7.0). visit(1001, date(2020,11,19), 5.3). %% Utilities :- pred bag_aggr(pred(T)::(pred(out) is nondet), pred(T,U,U)::pred(in,in,out) is det, U::in, U::out) is det. :- pred bag_count(pred(T)::(pred(out) is nondet), int::out) is det. :- pred bag_sum(pred(float)::(pred(out) is nondet), float::out) is det. :- pred bag_avg(pred(float)::(pred(out) is nondet), float::out) is det. :- pred bag_max(pred(T)::(pred(out) is nondet), T::in, T::out) is det. :- pred bag_max_date(pred(maybe_date)::(pred(out) is nondet), maybe_date::out) is det. bag_aggr(Predicate, Aggregator, Initial, Result) :- promise_equivalent_solutions[Result] ( unsorted_aggregate(Predicate, Aggregator, Initial, Result)). bag_count(Predicate, Count) :- bag_aggr(Predicate, (pred(_X::in,Y::in,Z::out) is det :- Z = Y+1), 0, Count). bag_sum(Predicate, Sum) :- bag_aggr(Predicate, (pred(X::in,Y::in,Z::out) is det :- Z = X+Y), 0.0, Sum). bag_avg(Predicate, Avg) :- bag_count(Predicate, N), bag_sum(Predicate, Sum), (if N = 0 then Avg = nan else Avg = Sum/float(N)). bag_max(Predicate, Initial, Max) :- bag_aggr(Predicate, (pred(X::in,Y::in,Z::out) is det :- compare(R,X,Y), (if R = (>) then Z = X else Z = Y)), Initial, Max). bag_max_date(Predicate, MaxDate) :- bag_max(Predicate, date(0,0,0), MaxDate1), (if MaxDate1 = date(0,0,0) then MaxDate = no else MaxDate = MaxDate1). main(!IO) :- print_line("{Id, Lastname, SumScores, AvgScores, MaxDate}:", !IO), aggregate((pred({Id,Lastname,Sum,Avg,MaxDate}::out) is nondet :- patient(Id,Lastname), Scores = (pred(Score::out) is nondet :- visit(Id,_,Score), \+is_nan(Score)), bag_avg(Scores, Avg), bag_sum(Scores, Sum), Dates = (pred(Date::out) is nondet :- visit(Id,Date,_), Date\=no), bag_max_date(Dates, MaxDate)), print_line, !IO).</syntaxhighlight> {{out}} <pre>{Id, Lastname, SumScores, AvgScores, MaxDate}: {1001, "Hopper", 17.4, 5.8, date(2020, 11, 19)} {2002, "Gosling", 6.8, 6.8, date(2020, 10, 8)} {3003, "Kemeny", 0.0, nan, date(2020, 11, 12)} {4004, "Wirth", 15.4, 7.7, date(2020, 11, 5)} {5005, "Kurtz", 0.0, nan, no}</pre> =={{header\|Nim}}== ===CSV files and tables=== <~~lang~~syntaxhighlight ~~Nim~~lang="nim">import algorithm, parsecsv, strformat, strutils, tables const NoValue = -1.0 Line 567 ⟶ 1,441: let scoreSum = if count == 0: "" else: &"{sum:>4.1f}" let scoreAvg = if count == 0: "" else: &"{sum / count.toFloat: >4.2f}" echo &"\| {id:^10} \| {name:^10} \| {lastVisit:^10} \| {scoreSum:>7} \| {scoreAvg:>6} \|"</~~lang~~syntaxhighlight> {{out}} Line 580 ⟶ 1,454: {{trans\|Python}} We use the high level standard library "db_sqlite" rather than the low level one "sqlite3". <~~lang~~syntaxhighlight ~~Nim~~lang="nim">import parseCsv, db_sqlite, sequtils, strutils const FNames = ["patients1.csv", "patients2.csv"] Line 632 ⟶ 1,506: echo "\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|" for row in result: echo "\| " & row.mapit(it.center(10)).join(" \| ") & '\|'</~~lang~~syntaxhighlight> {{out}} Line 643 ⟶ 1,517: =={{header\|Phix}}== <!--<syntaxhighlight lang="phix">(phixonline)--> ~~<lang Phix>constant patients_txt = split("""~~ <span style="color: #008080;">constant</span> <span style="color: #000000;">patients_txt</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #008000;">""" ~~PATIENT_ID,LASTNAME~~ PATIENT_ID,LASTNAME ~~1001,Hopper~~ 1001,Hopper ~~4004,Wirth~~ 4004,Wirth ~~3003,Kemeny~~ 3003,Kemeny ~~2002,Gosling~~ 2002,Gosling ~~5005,Kurtz""",'\n'),~~ 5005,Kurtz"""</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\n'</span><span style="color: #0000FF;">),</span> ~~visits_txt = split("""~~ <span style="color: #000000;">visits_txt</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #008000;">""" ~~PATIENT_ID,VISIT_DATE,SCORE~~ PATIENT_ID,VISIT_DATE,SCORE ~~2002,2020-09-10,6.8~~ ~~1001~~ 2002,2020-09-1710,56.58 ~~4004~~ 1001,2020-09-2417,85.45 ~~2002~~ 4004,2020-1009-0824,8.4 2002,2020-10-08, ~~1001,,6.6~~ 1001,,6.6 ~~3003,2020-11-12,~~ ~~4004~~ 3003,2020-11-0512,~~7.0~~ ~~1001~~ 4004,2020-11-1905,57.~~3""",'\n')~~0 1001,2020-11-19,5.3"""</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\n'</span><span style="color: #0000FF;">)</span> ~~--or, assuming these files contain exactly the same actual raw text,~~ <span style="color: #000080;font-style:italic;">--or, assuming these files contain exactly the same actual raw text, ~~-- and the use of GT_LF_STRIPPED is instead of above split('\n'):~~ --~~constant~~ ~~patients_txt~~ = ~~get_text("patients.csv",~~ and the use of GT_LF_STRIPPED is instead of above split('\n'),: --constant ~~visits_txt~~ patients_txt = get_text("~~visits~~patients.csv",GT_LF_STRIPPED), -- visits_txt = get_text("visits.csv",GT_LF_STRIPPED)</span> ~~function sap(string s) return split(s,',')&{"",0,0} end function~~ <span style="color: #008080;">function</span> <span style="color: #000000;">sap</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">return</span> <span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #008000;">','</span><span style="color: #0000FF;">,</span><span style="color: #004600;">false</span><span style="color: #0000FF;">)&{</span><span style="color: #008000;">""</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">}</span> <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~function svp(string s) return split(s,',') end function~~ <span style="color: #008080;">function</span> <span style="color: #000000;">svp</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">return</span> <span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #008000;">','</span><span style="color: #0000FF;">,</span><span style="color: #004600;">false</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~sequence patient_data = sort(apply(patients_txt[2..$],sap)),~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">patient_data</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">apply</span><span style="color: #0000FF;">(</span><span style="color: #000000;">patients_txt</span><span style="color: #0000FF;">[</span><span style="color: #000000;">2</span><span style="color: #0000FF;">..$],</span><span style="color: #000000;">sap</span><span style="color: #0000FF;">)),</span> ~~visit_data = sort_columns(apply(visits_txt[2..$],svp),{1,-2})~~ <span style="color: #000000;">visit_data</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sort_columns</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">apply</span><span style="color: #0000FF;">(</span><span style="color: #000000;">visits_txt</span><span style="color: #0000FF;">[</span><span style="color: #000000;">2</span><span style="color: #0000FF;">..$],</span><span style="color: #000000;">svp</span><span style="color: #0000FF;">),{</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,-</span><span style="color: #000000;">2</span><span style="color: #0000FF;">})</span> ~~visit_data = append(visit_data,{"","","0"}) -- (add a sentinel)~~ <span style="color: #000000;">visit_data</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">visit_data</span><span style="color: #0000FF;">,{</span><span style="color: #008000;">""</span><span style="color: #0000FF;">,</span><span style="color: #008000;">""</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"0"</span><span style="color: #0000FF;">})</span> <span style="color: #000080;font-style:italic;">-- (add a sentinel)</span> ~~string last_id = "",id,name,dt,scstr,avstr~~ <span style="color: #004080;">string</span> <span style="color: #000000;">last_id</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">""</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">id</span><span style="color: #0000FF;">,</span><span style="color: #000000;">name</span><span style="color: #0000FF;">,</span><span style="color: #000000;">dt</span><span style="color: #0000FF;">,</span><span style="color: #000000;">scstr</span><span style="color: #0000FF;">,</span><span style="color: #000000;">avstr</span> ~~atom score,score_total,average~~ <span style="color: #004080;">atom</span> <span style="color: #000000;">score</span><span style="color: #0000FF;">,</span><span style="color: #000000;">score_total</span><span style="color: #0000FF;">,</span><span style="color: #000000;">average</span> ~~integer visit_count = 0, pdx = 1~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">visit_count</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">pdx</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span> ~~for i=1 to length(visit_data) do~~ <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">visit_data</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> ~~{id,dt,scstr} = visit_data[i]~~ <span style="color: #0000FF;">{</span><span style="color: #7060A8;">id</span><span style="color: #0000FF;">,</span><span style="color: #000000;">dt</span><span style="color: #0000FF;">,</span><span style="color: #000000;">scstr</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">visit_data</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> ~~score = iff(scstr=""?0:scanf(scstr,"%f")[1][1])~~ <span style="color: #000000;">score</span> <span style="color: #0000FF;">=</span> <span style="color: #008080;">iff</span><span style="color: #0000FF;">(</span><span style="color: #000000;">scstr</span><span style="color: #0000FF;">=</span><span style="color: #008000;">""</span><span style="color: #0000FF;">?</span><span style="color: #000000;">0</span><span style="color: #0000FF;">:</span><span style="color: #7060A8;">scanf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">scstr</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%f"</span><span style="color: #0000FF;">)[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">1</span><span style="color: #0000FF;">])</span> ~~if id!=last_id then~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">id</span><span style="color: #0000FF;">!=</span><span style="color: #000000;">last_id</span> <span style="color: #008080;">then</span> ~~if visit_count then~~ <span style="color: #008080;">if</span> <span style="color: #000000;">visit_count</span> <span style="color: #008080;">then</span> ~~average = score_total/visit_count~~ <span style="color: #000000;">average</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">score_total</span><span style="color: #0000FF;">/</span><span style="color: #000000;">visit_count</span> ~~patient_data[pdx][4..5] = {score_total,average}~~ <span style="color: #000000;">patient_data</span><span style="color: #0000FF;">[</span><span style="color: #000000;">pdx</span><span style="color: #0000FF;">][</span><span style="color: #000000;">4</span><span style="color: #0000FF;">..</span><span style="color: #000000;">5</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">score_total</span><span style="color: #0000FF;">,</span><span style="color: #000000;">average</span><span style="color: #0000FF;">}</span> ~~end if~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~if i=length(visit_data) then exit end if -- (sentinel)~~ <span style="color: #008080;">if</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">visit_data</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span> <span style="color: #008080;">exit</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> <span style="color: #000080;font-style:italic;">-- (sentinel)</span> ~~score_total = score~~ <span style="color: #000000;">score_total</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">score</span> ~~visit_count = (score!=0)~~ <span style="color: #000000;">visit_count</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">score</span><span style="color: #0000FF;">!=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">)</span> ~~while id!=patient_data[pdx][1] do pdx += 1 end while~~ <span style="color: #008080;">while</span> <span style="color: #7060A8;">id</span><span style="color: #0000FF;">!=</span><span style="color: #000000;">patient_data</span><span style="color: #0000FF;">[</span><span style="color: #000000;">pdx</span><span style="color: #0000FF;">][</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">do</span> <span style="color: #000000;">pdx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~patient_data[pdx][3] = dt~~ <span style="color: #000000;">patient_data</span><span style="color: #0000FF;">[</span><span style="color: #000000;">pdx</span><span style="color: #0000FF;">][</span><span style="color: #000000;">3</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">dt</span> ~~last_id = id~~ <span style="color: #000000;">last_id</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">id</span> ~~elsif score!=0 then~~ <span style="color: #008080;">elsif</span> <span style="color: #000000;">score</span><span style="color: #0000FF;">!=</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span> ~~score_total += score~~ <span style="color: #000000;">score_total</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">score</span> ~~visit_count += 1~~ <span style="color: #000000;">visit_count</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~end if~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end for~~ <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> ~~printf(1,"\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|\n")~~ <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|\n"</span><span style="color: #0000FF;">)</span> ~~for i=1 to length(patient_data) do~~ <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">patient_data</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> ~~{id,name,dt,score,average} = patient_data[i]~~ <span style="color: #0000FF;">{</span><span style="color: #7060A8;">id</span><span style="color: #0000FF;">,</span><span style="color: #000000;">name</span><span style="color: #0000FF;">,</span><span style="color: #000000;">dt</span><span style="color: #0000FF;">,</span><span style="color: #000000;">score</span><span style="color: #0000FF;">,</span><span style="color: #000000;">average</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">patient_data</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> ~~scstr = iff(score=0?"":sprintf("%4.1f",score))~~ <span style="color: #000000;">scstr</span> <span style="color: #0000FF;">=</span> <span style="color: #008080;">iff</span><span style="color: #0000FF;">(</span><span style="color: #000000;">score</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">?</span><span style="color: #008000;">""</span><span style="color: #0000FF;">:</span><span style="color: #7060A8;">sprintf</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"%4.1f"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">score</span><span style="color: #0000FF;">))</span> ~~avstr = iff(average=0?"":sprintf("%4.2f",average))~~ <span style="color: #000000;">avstr</span> <span style="color: #0000FF;">=</span> <span style="color: #008080;">iff</span><span style="color: #0000FF;">(</span><span style="color: #000000;">average</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">?</span><span style="color: #008000;">""</span><span style="color: #0000FF;">:</span><span style="color: #7060A8;">sprintf</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"%4.2f"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">average</span><span style="color: #0000FF;">))</span> ~~printf(1,"\| %-10s \| %-7s \| %10s \| %-9s \| %-9s \|\n",~~ <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\| %-10s \| %-7s \| %10s \| %-9s \| %-9s \|\n"</span><span style="color: #0000FF;">,</span> ~~{id,name,dt,scstr,avstr})~~ <span style="color: #0000FF;">{</span><span style="color: #7060A8;">id</span><span style="color: #0000FF;">,</span><span style="color: #000000;">name</span><span style="color: #0000FF;">,</span><span style="color: #000000;">dt</span><span style="color: #0000FF;">,</span><span style="color: #000000;">scstr</span><span style="color: #0000FF;">,</span><span style="color: #000000;">avstr</span><span style="color: #0000FF;">})</span> ~~end for</lang>~~ <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <!--</syntaxhighlight>--> {{out}} <pre> Line 713 ⟶ 1,589: Weird requirement: "Use the most common libraries only when built-in functionality is not sufficient." Not even a "use strict;" :) <~~lang~~syntaxhighlight lang="perl">#!/usr/bin/perl my $fmt = '\| %-11s' x 5 . "\|\n"; printf $fmt, qw( PATIENT_ID LASTNAME LAST_VISIT SCORE_SUM SCORE_AVG); Line 749 ⟶ 1,625: 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3</~~lang~~syntaxhighlight> {{out}} <pre> Line 759 ⟶ 1,635: \| 5005 \| Kurtz \| \| \| \| </pre> =={{header\|Prolog}}== Implemented using SWI Prolog: <syntaxhighlight lang="prolog">patient(1001,'Hopper'). patient(4004,'Wirth'). patient(3003,'Kemeny'). patient(2002,'Gosling'). patient(5005,'Kurtz'). visit(2002,'2020-09-10',6.8). visit(1001,'2020-09-17',5.5). visit(4004,'2020-09-24',8.4). visit(2002,'2020-10-08',nan). visit(1001,'',6.6). visit(3003,'2020-11-12',nan). visit(4004,'2020-11-05',7.0). visit(1001,'2020-11-19',5.3). summaryDates(Id, Lastname, LastDate) :- aggregate(max(Ts), Score^Date^(visit(Id, Date, Score), Date \= '', parse_time(Date, iso_8601, Ts)), MaxTs), format_time(atom(LastDate), '%Y-%m-%d', MaxTs), patient(Id,Lastname). summaryScores(Id, Lastname, Sum, Mean) :- aggregate(r(sum(Score),count), Date^(visit(Id, Date, Score), Score \= nan), r(Sum,Count)), patient(Id,Lastname), Mean is Sum/Count. test :- summaryDates(Id, Lastname, LastDate), writeln(summaryDates(Id, Lastname, LastDate)), fail. test :- summaryScores(Id, Lastname, ScoreSum, ScoreMean), writeln(summaryScores(Id, Lastname, ScoreSum, ScoreMean)), fail.</syntaxhighlight> {{output}} <pre>summaryDates(1001,Hopper,2020-11-19) summaryDates(2002,Gosling,2020-10-08) summaryDates(3003,Kemeny,2020-11-12) summaryDates(4004,Wirth,2020-11-05) summaryScores(1001,Hopper,17.4,5.8) summaryScores(2002,Gosling,6.8,6.8) summaryScores(4004,Wirth,15.4,7.7) false.</pre> Implemented using XSB Prolog (which allows for user-defined aggregates): <syntaxhighlight lang="prolog">:- import bagMax/2, bagCount/2, bagSum/2, bagReduce/4 from aggregs. :- import julian_date/7, date_string/3 from iso8601. :- import load_csv/2, add_cvt_type_hook/2 from proc_files. ?- add_cvt_type_hook(date,date_converter(_,_)). date_converter(Atom,Date) :- date_string('YYYY-MM-DD',Date,Atom). :- load_csv('visit.csv',visit(integer,date,float)). :- load_csv('patient.csv',patient(integer,atom)). is_nan(Number) :- X is Number, X =\= Number. summaryDates(Id, Lastname, LastDate) :- bagMax(date_number(Id), LastDateNumber), patient(Id,Lastname), julian_date(LastDateNumber, Y, M, D, _, _, _), date_converter(LastDate, date(Y,M,D)). summaryScores(Id, Lastname, Sum, Mean) :- bagSum(scores(Id), Sum), bagCount(scores(Id), Count), Mean is Sum/Count, patient(Id,Lastname). test :- summaryDates(Id,Lastname,LastDate), writeln(summaryDates(Id,Lastname,LastDate)), fail. test :- summaryScores(Id, Lastname, ScoreSum, ScoreMean), writeln(summaryScores(Id, Lastname, ScoreSum, ScoreMean)), fail. / Put hilog declarations together / date_number(Id)(Number) :- visit(Id, date(Y,M,D), _), julian_date(Number, Y, M, D, _, _, _). scores(Id)(Score) :- visit(Id, _, Score), \+is_nan(Score). :- hilog maximum. maximum(X,Y,Z) :- X @> Y -> Z=X ; Z=Y. :- hilog sum. sum(X,Y,Z) :- Z is X+Y. :- hilog successor. successor(X,_Y,Z) :- Z is X+1.</syntaxhighlight> =={{header\|PureBasic}}== <syntaxhighlight lang="purebasic">Structure Person Name$ EndStructure Structure Visits Datum$ Score$ EndStructure Structure Merge Patient.Person List PVisit.Visits() EndStructure NewMap P.Merge() NewList ID$() If ReadFile(1,"./Data/patients.csv")=0 : End 1 : EndIf header=1 While Not Eof(1) buf1$=ReadString(1) If header=1 : header=0 : Continue : EndIf bufId$=StringField(buf1$,1,",") P(bufId$)\Patient\Name$=StringField(buf1$,2,",") AddElement(ID$()) : ID$()=bufId$ Wend CloseFile(1) If ReadFile(2,"./Data/visits.csv")=0 : End 2 : EndIf header=1 While Not Eof(2) buf1$=ReadString(2) If header=1 : header=0 : Continue : EndIf bufId$=StringField(buf1$,1,",") AddElement(P(bufId$)\PVisit()) P(bufId$)\PVisit()\Datum$=StringField(buf1$,2,",") P(bufId$)\PVisit()\Score$=StringField(buf1$,3,",") Wend CloseFile(2) If OpenConsole()=0 : End 3 : EndIf SortList(ID$(),#PB_Sort_Ascending) PrintN("\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \|") ForEach ID$() Print("\| "+LSet(ID$(),11)) Print("\| "+LSet(P(ID$())\Patient\Name$,9)+"\|") SortStructuredList(P(ID$())\PVisit(),#PB_Sort_Ascending,OffsetOf(Visits\Datum$),TypeOf(Visits\Datum$)) ForEach P(ID$())\PVisit() scs.f+ValF(p(ID$())\PVisit()\Score$) : c+Bool(ValF(p(ID$())\PVisit()\Score$)) Next If LastElement(P(ID$())\PVisit()) sca.f=scs/c Print(" "+LSet(P(ID$())\PVisit()\Datum$,10)+" \|") Print(RSet(StrF(scs,1),10)+" \|") If Not IsNAN(sca) : Print(RSet(StrF(sca,2),10)+" \|") : Else : Print(Space(11)+"\|") : EndIf Else Print(Space(12)+"\|"+Space(11)+"\|"+Space(11)+"\|") EndIf PrintN("") : scs=0 : c=0 Next Input()</syntaxhighlight> {{out}} <pre>\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| \| 1001 \| Hopper \| 2020-11-19 \| 17.4 \| 5.80 \| \| 2002 \| Gosling \| 2020-10-08 \| 6.8 \| 6.80 \| \| 3003 \| Kemeny \| 2020-11-12 \| 0.0 \| \| \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.70 \| \| 5005 \| Kurtz \| \| \| \|</pre> =={{header\|Python}}== ===Python: Using pandas library=== <~~lang~~syntaxhighlight ~~Python~~lang="python"># to install pandas library go to cmd prompt and type: # cd %USERPROFILE%\AppData\Local\Programs\Python\Python38-32\Scripts\ # pip install pandas Line 807 ⟶ 1,850: df_result = df_group.agg({'VISIT_DATE': 'max', 'SCORE': [lambda x: x.sum(min_count=1),'mean']}) print(df_result)</~~lang~~syntaxhighlight> <pre> PATIENT_ID LASTNAME LAST_VISIT SCORE Line 821 ⟶ 1,864: Using only standard libraries and input from csv files. <~~lang~~syntaxhighlight lang="python">import csv fnames = 'patients.csv patients_visits.csv'.split() Line 856 ⟶ 1,899: #%% for record in result: print(f"\| {' \| '.join(f'{r:^10}' for r in record)} \|")</~~lang~~syntaxhighlight> {{out}} Line 871 ⟶ 1,914: (The SQL SELECT statement is modelled on that of the SQL entry elsewhere on this page). <~~lang~~syntaxhighlight lang="python">import sqlite3 import csv Line 935 ⟶ 1,978: result = join_tables_and_group(conn) for record in result: print(f"\| {' \| '.join(f'{str(r):^10}' for r in record)} \|")</~~lang~~syntaxhighlight> {{out}} Line 946 ⟶ 1,989: =={{header\|R}}== <~~lang~~syntaxhighlight Rlang="r"># load data from csv files # setwd("C:\Temp\csv\") # df_patient <- read.csv(file="patients.csv", header = TRUE, sep = ",") Line 987 ⟶ 2,030: df_result <- merge(df_patient, df_agg, by = 'PATIENT_ID', all.x = TRUE) print(df_result)</~~lang~~syntaxhighlight> {{out}} <pre> Line 999 ⟶ 2,042: =={{header\|Raku}}== <syntaxhighlight lang="raku" ~~perl6~~line>my @names = map { ( <PATIENT_ID LASTNAME> Z=> .list ).hash }, ( 1001, 'Hopper' ), ( 4004, 'Wirth' ), Line 1,035 ⟶ 2,078: my @out_field_names = <PATIENT_ID LASTNAME LAST_VISIT SCORE_SUM SCORE_AVG>; my @rows = @result.sort(.<PATIENT_ID>).map(.{@out_field_names}); say .map({$_ // ''}).fmt('%-10s', ' \| ') for @out_field_names, \|@rows;</~~lang~~syntaxhighlight> {{out}} <pre> Line 1,047 ⟶ 2,090: =={{header\|REXX}}== <~~lang~~syntaxhighlight lang="rexx">/ REXX / patients='patients.csv' l=linein(patients) Line 1,127 ⟶ 2,170: swl=swl wa.i End Return strip(swl)</~~lang~~syntaxhighlight> {{out}} <pre>\| PATIENT_ID \| LASTNAME \| LAST_VISIT \| SCORE_SUM \| SCORE_AVG \| Line 1,135 ⟶ 2,178: \| 4004 \| Wirth \| 2020-11-05 \| 15.4 \| 7.7 \| \| 5005 \| Kurtz \| \| \| \|</pre> =={{header\|SAS}}== <syntaxhighlight lang="sas"> %let datefmt=E8601DA10.; data patient; infile "patient.csv" dsd dlm=','; attrib id length=4 lastname length=$10; input id lastname; data visit; infile "visit.csv" dsd dlm=','; attrib id length=4 date informat=&datefmt format=&datefmt score length=8; input id date score; proc sql; select from (select id, max(date) format=&datefmt as max_date, sum(score) as sum_score, avg(score) as avg_score from visit group by id) natural right join patient order by id;</syntaxhighlight> {{out}} <pre>: id lastname max_date sum_score avg_score : ------------------------------------------------------ : 1001 Hopper 2020-11-19 17.4 5.8 : 2002 Gosling 2020-10-08 6.8 6.8 : 3003 Kemeny 2020-11-12 . . : 4004 Wirth 2020-11-05 15.4 7.7 : 5005 Kurtz . . .</pre> =={{header\|SPSS}}== Line 1,140 ⟶ 2,212: Note: the PATIENT_ID 5005 is missing from the result, because MATCH FILES cannot add rows for missing keys, i.e. AFAIK a "left join" is not possible is SPSS. <~~lang~~syntaxhighlight ~~SPSS~~lang="spss">* set working directory to location of .csv files CD 'C:\Temp\csv\'. Line 1,205 ⟶ 2,277: /last_visit = MAX(VISIT_DATE) /score_avg = MEAN(SCORE) /score_sum = SUM(SCORE).</~~lang~~syntaxhighlight> {{out}} Line 1,218 ⟶ 2,290: =={{header\|SQL}}== <~~lang~~syntaxhighlight ~~SQL~~lang="sql">-- drop tables DROP TABLE IF EXISTS tmp_patients; DROP TABLE IF EXISTS tmp_visits; Line 1,276 ⟶ 2,348: p.LASTNAME ORDER BY p.PATIENT_ID;</~~lang~~syntaxhighlight> {{out}} <pre> Line 1,286 ⟶ 2,358: 5005 Kurtz NULL NULL NULL </pre> =={{header\|Transd}}== <syntaxhighlight lang="scheme">#lang transd MainModule: { tbl: `1001,Hopper 4004,Wirth 3003,Kemeny 2002,Gosling 5005,Kurtz`, tbl1: ` 2002,2020-09-10,6.8 1001,2020-09-17,5.5 4004,2020-09-24,8.4 2002,2020-10-08, 1001,,6.6 3003,2020-11-12, 4004,2020-11-05,7.0 1001,2020-11-19,5.3`, cols: `@key_PATIENT_ID:Int, LASTNAME:String, VISIT_DATE:DateTime, SCORE:Double, SCORE_AVG:Double, NUM_VISITS:Int`, Record : typealias(Tuple<Int DateTime Double>()), _start: (λ (with base TSDBase() (load-table base tbl colNames: cols) (build-index base "PATIENT_ID") (with vizs Vector<Record>() (load-table vizs tbl1 :mixedTypes fieldSep: "," rowSep: "\n" ) (for viz in vizs do (tsd-query base :update set: (lambda PATIENT_ID Int() VISIT_DATE DateTime() SCORE Double() SCORE_AVG Double() NUM_VISITS Int() (+= NUM_VISITS 1) (set VISIT_DATE (get viz 1)) (set SCORE (+ SCORE (get viz 2))) (set SCORE_AVG (/ SCORE NUM_VISITS))) where: (lambda PATIENT_ID Int() (eq PATIENT_ID (get viz 0)))) )) (with cols ["PATIENT_ID","LASTNAME","VISIT_DATE","SCORE","SCORE_AVG"] (with recs (tsd-query base select: cols as: [[Int(), String(), DateTime(), Double(), Double()]] where: (lambda PATIENT_ID Int() true) ) (for i in cols do (textout width: 10 i "\|")) (lout "") (for rec in recs do (for-each rec (λ i :Data() (textout width: 10 i "\|" ))) (lout "")) )) )) }</syntaxhighlight>{{out}} <pre> PATIENT_ID\| LASTNAME\|VISIT_DATE\| SCORE\| SCORE_AVG\| 1001\| Hopper\|2020-11-19\| 17.4\| 5.8\| 4004\| Wirth\| 2020-11-5\| 15.4\| 7.7\| 3003\| Kemeny\|2020-11-12\| 0\| 0\| 2002\| Gosling\| 2020-10-8\| 6.8\| 3.4\| 5005\| Kurtz\| \| 0\| 0\| </pre> =={{header\|TutorialD}}== <syntaxhighlight lang="text">BEGIN; TYPE Date UNION; TYPE DateValid IS {Date POSSREP {year INTEGER, month INTEGER, day INTEGER}}; TYPE DateNone IS {Date POSSREP {}}; TYPE DateUnknown IS {Date POSSREP {}}; END; VAR patient REAL RELATION {id INT, lastname CHAR} KEY {id}; INSERT patient RELATION {TUPLE {id 1001, lastname 'Hopper'}, TUPLE {id 4004, lastname 'Wirth'}, TUPLE {id 3003, lastname 'Kemeny'}, TUPLE {id 2002, lastname 'Gosling'}, TUPLE {id 5005, lastname 'Kurtz'} }; VAR visit REAL RELATION {id INT, date Date, score RATIONAL} KEY {id, date}; INSERT visit RELATION { TUPLE {id 2002, date DateValid(2020,09,10), score 6.8}, TUPLE {id 1001, date DateValid(2020,09,17), score 5.5}, TUPLE {id 4004, date DateValid(2020,09,24), score 8.4}, TUPLE {id 2002, date DateValid(2020,10,08), score NAN}, TUPLE {id 1001, date DateNone(), score 6.6}, TUPLE {id 3003, date DateValid(2020,11,12), score NAN}, TUPLE {id 4004, date DateValid(2020,11,05), score 7.0}, TUPLE {id 1001, date DateValid(2020,11,19), score 5.3} }; ((SUMMARIZE (visit WHERE score>0.0) BY {id}: {sumscore := SUM(score), avgscore := AVG(score)}) UNION (EXTEND (patient {id} MINUS ((visit WHERE score>0.0) {id})): {sumscore := NaN, avgscore := NaN})) JOIN (SUMMARIZE visit BY {id}: {maxdate := MAX(date)} UNION (EXTEND (patient {id} MINUS (visit {id})): {maxdate := DateUnknown()})) JOIN patient</syntaxhighlight> {{out}} <pre> id sumscore avgscore maxdate lastname 1001 17.4 5.8 DateValid(2020, 11, 19) Hopper 2002 6.8 6.8 DateValid(2020, 10, 8) Gosling 4004 15.4 7.7 DateValid(2020, 11, 5) Wirth 3003 NaN NaN DateValid(2020, 11, 12) Kemeny 5005 NaN NaN DateUnknown() Kurtz</pre> =={{header\|Wren}}== {{libheader\|Wren-sort}} {{libheader\|Wren-fmt}} <~~lang~~syntaxhighlight ~~ecmascript~~lang="wren">import "./fmt" for Fmt import "./sort" for Sort class Patient { Line 1,385 ⟶ 2,572: var merges = Patient.ids.map { \|id\| Merge.new(id) }.toList Merge.print(merges)</~~lang~~syntaxhighlight> {{out}}