Text processing/2

From Rosetta Code

{{task|Text processing}}


The following task concerns data that came from a pollution monitoring station with twenty-four instruments monitoring twenty-four aspects of pollution in the air. Periodically a record is added to the file, each record being a line of 49 fields separated by white-space, which can be one or more space or tab characters.

The fields (from the left) are:
 DATESTAMP [ VALUEn FLAGn ] * 24
i.e. a datestamp followed by twenty-four repetitions of a floating-point instrument value and that instrument's associated integer flag. Flag values are >= 1 if the instrument is working and < 1 if there is some problem with it, in which case that instrument's value should be ignored.

A sample from the full data file [http://rosettacode.org/resources/readings.zip readings.txt], which is also used in the [[Text processing/1]] task, follows:

Data is no longer available at that link. Zipped mirror available [https://github.com/thundergnat/rc/blob/master/resouces/readings.zip here]
<pre>
1991-03-30 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1
1991-03-31 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 10.000 1 20.000 1 20.000 1 20.000 1 35.000 1 50.000 1 60.000 1 40.000 1 30.000 1 30.000 1 30.000 1 25.000 1 20.000 1 20.000 1 20.000 1 20.000 1 20.000 1 35.000 1
</pre>


;Task:
# Confirm the general field format of the file.
# Identify any DATESTAMPs that are duplicated.
# Report the number of records that have good readings for all instruments.
<br><br>
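Taken together, the three checks are small; the following Python is an illustrative sketch only, not one of the task entries below (the `analyze` helper and its regular expression are this sketch's own). It validates the general field format, collects duplicated DATESTAMPs, and counts records whose 24 flags are all good:

```python
import re
from collections import Counter

# One DATESTAMP, then 24 white-space separated (float value, integer flag) pairs.
LINE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(\s+[-+]?\d+\.\d+\s+-?\d+){24}\s*$")

def analyze(lines):
    """Return (bad_format_lines, duplicated_dates, good_record_count)."""
    seen = Counter()
    bad_format, good = [], 0
    for line in lines:
        if not LINE_RE.match(line):      # task 1: confirm general field format
            bad_format.append(line)
            continue
        fields = line.split()
        seen[fields[0]] += 1             # task 2: note each DATESTAMP
        flags = [int(f) for f in fields[2::2]]
        if all(f >= 1 for f in flags):   # task 3: all 24 readings good
            good += 1
    dups = sorted(d for d, n in seen.items() if n > 1)
    return bad_format, dups, good
```

Run over readings.txt, this should report the same five duplicated dates and the 5017 good records that the entries below find.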

=={{header|11l}}==
{{trans|Python}}

<syntaxhighlight lang="11l">V debug = 0B
V datePat = re:‘\d{4}-\d{2}-\d{2}’
V valuPat = re:‘[-+]?\d+\.\d+’
V statPat = re:‘-?\d+’
V totalLines = 0
Set[String] dupdate
Set[String] badform
Set[String] badlen
V badreading = 0
Set[String] datestamps

L(line) File(‘readings.txt’).read().rtrim("\n").split("\n")
   totalLines++
   V fields = line.split("\t")
   V date = fields[0]
   V pairs = (1 .< fields.len).step(2).map(i -> (@fields[i], @fields[i + 1]))

   V lineFormatOk = datePat.match(date)
    & all(pairs.map(p -> :valuPat.match(p[0])))
    & all(pairs.map(p -> :statPat.match(p[1])))
   I !lineFormatOk
      I debug
         print(‘Bad formatting ’line)
      badform.add(date)

   I pairs.len != 24 | any(pairs.map(p -> Int(p[1]) < 1))
      I debug
         print(‘Missing values ’line)
      I pairs.len != 24
         badlen.add(date)
      I any(pairs.map(p -> Int(p[1]) < 1))
         badreading++

   I date C datestamps
      I debug
         print(‘Duplicate datestamp ’line)
      dupdate.add(date)

   datestamps.add(date)

print("Duplicate dates:\n "sorted(Array(dupdate)).join("\n "))
print("Bad format:\n "sorted(Array(badform)).join("\n "))
print("Bad number of fields:\n "sorted(Array(badlen)).join("\n "))
print("Records with good readings: #. = #2.2%\n".format(
   totalLines - badreading, (totalLines - badreading) / Float(totalLines) * 100))
print(‘Total records: ’totalLines)</syntaxhighlight>

{{out}}
<pre>
Duplicate dates:
 1990-03-25
 1991-03-31
 1992-03-29
 1993-03-28
 1995-03-26
Bad format:

Bad number of fields:

Records with good readings: 5017 = 91.70%

Total records: 5471
</pre>


=={{header|Ada}}==
{{libheader|Simple components for Ada}}
<syntaxhighlight lang="ada">with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO; use Ada.Text_IO;
with Strings_Edit; use Strings_Edit;

   Close (File);
   Put_Line ("Valid records " & Image (Count) & " of " & Image (Line_No) & " total");
end Data_Munging_2;</syntaxhighlight>
Sample output
<pre>
Valid records 5017 of 5471 total
</pre>


=={{header|Aime}}==
<syntaxhighlight lang="aime">check_format(list l)
{
    integer i;
    text s;

    if (~l != 49) {
        error("bad field count");
    }

    s = l[0];
    if (match("????-??-??", s)) {
        error("bad date format");
    }
    l[0] = s.delete(7).delete(4).atoi;

    i = 1;
    while (i < 49) {
        atof(l[i]);
        i += 1;
        l[i >> 1] = atoi(l[i]);
        i += 1;
    }

    l.erase(25, -1);
}

integer
main(void)
{
    integer goods, i, v;
    file f;
    list l;
    index x;

    goods = 0;

    f.affix("readings.txt");

    while (f.list(l, 0) != -1) {
        if (!trap(check_format, l)) {
            if ((x[v = lf_x_integer(l)] += 1) != 1) {
                v_form("duplicate ~ line\n", v);
            }

            i = 1;
            l.ucall(min_i, 1, i);
            goods += iclip(0, i, 1);
        }
    }

    o_(goods, " good lines\n");

    0;
}</syntaxhighlight>
{{out}} (the "readings.txt" needs to be converted to UNIX end-of-line)
<pre>duplicate 19900325 line
duplicate 19910331 line
duplicate 19920329 line
duplicate 19930328 line
duplicate 19950326 line
5017 good lines</pre>


=={{header|Amazing Hopper}}==
{{Trans|AWK}}
<syntaxhighlight lang="c">
#include <basico.h>

algoritmo

número de campos correcto = `awk 'NF != 49' basica/readings.txt`

fechas repetidas = `awk '++count[$1] >= 2{print $1, "(",count[$1],")"}' basica/readings.txt`

resultados buenos = `awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}' basica/readings.txt`

"Check field number by line: ", #( !(number(número de campos correcto)) ? "Ok\n" : "Nok\n";),\
"\nCheck duplicated dates:\n", fechas repetidas,NL, \
"Number of records have good readings for all instruments:\n",resultados buenos,\
"(including "
fijar separador( NL )
contar tokens en 'fechas repetidas'
" duplicated records)\n", luego imprime todo

terminar
</syntaxhighlight>
{{out}}
<pre>
Check field number by line: Ok

Check duplicated dates:
1990-03-25 ( 2 )
1991-03-31 ( 2 )
1992-03-29 ( 2 )
1993-03-28 ( 2 )
1995-03-26 ( 2 )

Number of records have good readings for all instruments:
Total records 5471 OK records 5017 or 91,7017 %
(including 5 duplicated records)

</pre>


=={{header|AutoHotkey}}==

<syntaxhighlight lang="autohotkey">; Author: AlephX Aug 17 2011
data = %A_scriptdir%\readings.txt

msgbox, Duplicate Dates:`n%wrongDates%`nRead Lines: %lines%`nValid Lines: %valid%`nwrong lines: %totwrong%`nDuplicates: %TotWrongDates%`nWrong Formatted: %unvalidformat%`n
</syntaxhighlight>

Sample Output:
=={{header|AWK}}==


If there are any scientific-notation fields then there will be an "e" in the file:
<syntaxhighlight lang="awk">bash$ awk '/[eE]/' readings.txt
bash$</syntaxhighlight>
Quick check on the number of fields:
<syntaxhighlight lang="awk">bash$ awk 'NF != 49' readings.txt
bash$</syntaxhighlight>
Full check on the file format using a regular expression:
<syntaxhighlight lang="awk">bash$ awk '!(/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+)+$/ && NF==49)' readings.txt
bash$</syntaxhighlight>
Full check on the file format as above but using regular expressions allowing intervals (gnu awk):
<syntaxhighlight lang="awk">bash$ awk --re-interval '!(/^[0-9]{4}-[0-9]{2}-[0-9]{2}([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+){24}+$/ )' readings.txt
bash$</syntaxhighlight>






Accomplished by counting how many times the first field occurs and noting any second occurrences.
<syntaxhighlight lang="awk">bash$ awk '++count[$1]==2{print $1}' readings.txt
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
bash$</syntaxhighlight>






<div style="width:100%;overflow:scroll">
<syntaxhighlight lang="awk">bash$ awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}' readings.txt
Total records 5471 OK records 5017 or 91.7017 %
bash$</syntaxhighlight>
</div>


=={{header|C}}==
<syntaxhighlight lang="c">#include <stdio.h>
#include <string.h>
#include <stdlib.h>
    read_file("readings.txt");
    return 0;
}</syntaxhighlight>

{{out}}
<pre>
5017 out 5471 lines good
</pre>



=={{header|C sharp|C#}}==
<syntaxhighlight lang="csharp">using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
        }
    }
}</syntaxhighlight>

<pre>
1993-03-28 is duplicated at Lines : 1183,1184
1995-03-26 is duplicated at Lines : 1910,1911
</pre>

=={{header|C++}}==
{{libheader|Boost}}
<syntaxhighlight lang="cpp">#include <boost/regex.hpp>
#include <fstream>
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <cstdlib>
#include <algorithm>
#include <iterator> //for ostream_iterator
using namespace std ;

boost::regex e ( "\\s+" ) ;

int main( int argc , char *argv[ ] ) {
ifstream infile( argv[ 1 ] ) ;
vector<string> duplicates ;
set<string> datestamps ; //for the datestamps
if ( ! infile.is_open( ) ) {
cerr << "Can't open file " << argv[ 1 ] << '\n' ;
return 1 ;
}
int all_ok = 0 ;//all_ok for lines in the given pattern e
int pattern_ok = 0 ; //overall field pattern of record is ok
while ( infile ) {
string eingabe ;
getline( infile , eingabe ) ;
boost::sregex_token_iterator i ( eingabe.begin( ), eingabe.end( ) , e , -1 ), j ;//we tokenize on empty fields
vector<string> fields( i, j ) ;
if ( fields.size( ) == 49 ) //we expect 49 fields in a record
pattern_ok++ ;
else
cout << "Format not ok!\n" ;
if ( datestamps.insert( fields[ 0 ] ).second ) { //not duplicated
int howoften = ( fields.size( ) - 1 ) / 2 ;//number of measurement
//devices and values
for ( int n = 1 ; atoi( fields[ 2 * n ].c_str( ) ) >= 1 ; n++ ) {
if ( n == howoften ) {
all_ok++ ;
break ;
}
}
}
else {
duplicates.push_back( fields[ 0 ] ) ;//first field holds datestamp
}
}
infile.close( ) ;
cout << "The following " << duplicates.size() << " datestamps were duplicated:\n" ;
copy( duplicates.begin( ) , duplicates.end( ) ,
ostream_iterator<string>( cout , "\n" ) ) ;
cout << all_ok << " records were complete and ok!\n" ;
return 0 ;
}</syntaxhighlight>

{{out}}
<pre>
Format not ok!
The following 6 datestamps were duplicated:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
2004-12-31
</pre>


=={{header|Clojure}}==
<syntaxhighlight lang="clojure">
(require '[clojure.string :as str]) ; needed for the str/ alias used below

(defn parse-line [s]
  (let [[date & data-toks] (str/split s #"\s+")
        data-fields (map read-string data-toks)
        valid-date? (fn [s] (re-find #"\d{4}-\d{2}-\d{2}" s))
        valid-line? (and (valid-date? date)
                         (= 48 (count data-toks))
                         (every? number? data-fields))
        readings (for [[v flag] (partition 2 data-fields)]
                   {:val v :flag flag})]
    (when (not valid-line?)
      (println "Malformed Line: " s))
    {:date date
     :no-missing-readings? (and (= 48 (count data-toks))
                                (every? pos? (map :flag readings)))}))

(defn analyze-file [path]
  (reduce (fn [m line]
            (let [{:keys [all-dates dupl-dates n-full-recs invalid-lines]} m
                  this-date (:date line)
                  dupl? (contains? all-dates this-date)
                  full? (:no-missing-readings? line)]
              (cond-> m
                dupl? (update-in [:dupl-dates] conj this-date)
                full? (update-in [:n-full-recs] inc)
                true (update-in [:all-dates] conj this-date))))
          {:dupl-dates #{} :all-dates #{} :n-full-recs 0}
          (->> (slurp path)
               clojure.string/split-lines
               (map parse-line))))

(defn report-summary [path]
  (let [m (analyze-file path)]
    (println (format "%d unique dates" (count (:all-dates m))))
    (println (format "%d duplicated dates [%s]"
                     (count (:dupl-dates m))
                     (clojure.string/join " " (sort (:dupl-dates m)))))
    (println (format "%d lines with no missing data" (:n-full-recs m)))))
</syntaxhighlight>

{{out}}
<pre>
5466 unique dates
5 duplicated dates [1990-03-25 1991-03-31 1992-03-29 1993-03-28 1995-03-26]
5017 lines with no missing data
</pre>


=={{header|COBOL}}==
{{works with|OpenCOBOL}}
<syntaxhighlight lang="cobol">       IDENTIFICATION DIVISION.
       PROGRAM-ID. text-processing-2.

           INSPECT input-data (offset:) TALLYING data-len
               FOR CHARACTERS BEFORE delim
           .</syntaxhighlight>

{{out}}


=={{header|D}}==
<syntaxhighlight lang="d">void main() {
    import std.stdio, std.array, std.string, std.regex, std.conv,
        std.algorithm;

            repeatedDates.byKey.filter!(k => repeatedDates[k] > 1));
    writeln("Good reading records: ", goodReadings);
}</syntaxhighlight>
{{out}}
<pre>Duplicated timestamps: 1990-03-25, 1991-03-31, 1992-03-29, 1993-03-28, 1995-03-26
Good reading records: 5017</pre>

=={{header|Eiffel}}==
<syntaxhighlight lang="eiffel">
class
    APPLICATION

create
    make

feature

    make
            -- Finds double date stamps and wrong formats.
        local
            found: INTEGER
            double: STRING
        do
            read_wordlist
            fill_hash_table
            across
                hash as h
            loop
                if h.key.has_substring ("_double") then
                    io.put_string ("Double date stamp: %N")
                    double := h.key
                    double.remove_tail (7)
                    io.put_string (double)
                    io.new_line
                end
                if h.item.count /= 24 then
                    io.put_string (h.key.out + " has the wrong format. %N")
                    found := found + 1
                end
            end
            io.put_string (found.out + " records have not 24 readings.%N")
            good_records
        end

    good_records
            -- Number of records that have flag values > 0 for all readings.
        local
            count, total: INTEGER
            end_date: STRING
        do
            create end_date.make_empty
            across
                hash as h
            loop
                count := 0
                across
                    h.item as d
                loop
                    if d.item.flag > 0 then
                        count := count + 1
                    end
                end
                if count = 24 then
                    total := total + 1
                end
            end
            io.put_string ("%NGood records: " + total.out + ". %N")
        end

    original_list: STRING = "readings.txt"

    read_wordlist
            -- Preprocesses data in 'data'.
        local
            l_file: PLAIN_TEXT_FILE
        do
            create l_file.make_open_read_write (original_list)
            l_file.read_stream (l_file.count)
            data := l_file.last_string.split ('%N')
            l_file.close
        end

    data: LIST [STRING]

    fill_hash_table
            -- Fills 'hash' using the date as key.
        local
            by_dates: LIST [STRING]
            date: STRING
            data_tup: TUPLE [val: REAL; flag: INTEGER]
            data_arr: ARRAY [TUPLE [val: REAL; flag: INTEGER]]
            i: INTEGER
        do
            create hash.make (data.count)
            across
                data as d
            loop
                if not d.item.is_empty then
                    by_dates := d.item.split ('%T')
                    date := by_dates [1]
                    by_dates.prune (date)
                    create data_tup
                    create data_arr.make_empty
                    from
                        i := 1
                    until
                        i > by_dates.count - 1
                    loop
                        data_tup := [by_dates [i].to_real, by_dates [i + 1].to_integer]
                        data_arr.force (data_tup, data_arr.count + 1)
                        i := i + 2
                    end
                    hash.put (data_arr, date)
                    if not hash.inserted then
                        date.append ("_double")
                        hash.put (data_arr, date)
                    end
                end
            end
        end

    hash: HASH_TABLE [ARRAY [TUPLE [val: REAL; flag: INTEGER]], STRING]

end
</syntaxhighlight>
{{out}}
<pre>
Double date stamp:
1990-03-25
Double date stamp:
1991-03-31
Double date stamp:
1992-03-29
Double date stamp:
1993-03-28
Double date stamp:
1995-03-26
0 records have not 24 readings.

Good records: 5017.
</pre>


=={{header|Erlang}}==
Uses function from [[Text_processing/1]]. It does some correctness checks for us.
<syntaxhighlight lang="erlang">
-module( text_processing2 ).

value_flag_records() -> 24.
</syntaxhighlight>
{{out}}
<pre>
</pre>


=={{header|F Sharp|F#}}==
<syntaxhighlight lang="fsharp">
let file = @"readings.txt"

        ok <- ok + 1
printf "%d records were ok\n" ok
</syntaxhighlight>
Prints:
<syntaxhighlight lang="fsharp">
Date 1990-03-25 is duplicated
Date 1991-03-31 is duplicated
Date 1992-03-29 is duplicated
Date 1993-03-28 is duplicated
Date 1995-03-26 is duplicated
5017 records were ok
</syntaxhighlight>

=={{header|Factor}}==
{{works with|Factor|0.99 2020-03-02}}
<syntaxhighlight lang="factor">USING: io io.encodings.ascii io.files kernel math math.parser
prettyprint sequences sequences.extras sets splitting ;

: check-format ( seq -- )
    [ " \t" split length 49 = ] all?
    "Format okay." "Format not okay." ? print ;

"readings.txt" ascii file-lines [ check-format ] keep
[ "Duplicates:" print [ "\t" split1 drop ] map duplicates . ]
[ [ " \t" split rest <odds> [ string>number 0 <= ] none? ] count ]
bi pprint " records were good." print</syntaxhighlight>
{{out}}
<pre>
Format okay.
Duplicates:
{
    "1990-03-25"
    "1991-03-31"
    "1992-03-29"
    "1993-03-28"
    "1995-03-26"
}
5017 records were good.
</pre>

=={{header|Fortran}}==
The trouble with the dates rather suggests that they should be checked for correctness in themselves, and that the sequence check should be that each new record advances the date by one day. Daynumber calculations were long ago presented by H. F. Fliegel and T.C. van Flandern, in Communications of the ACM, Vol. 11, No. 10 (October, 1968).
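The Fliegel and van Flandern algorithm referred to above is only a few integer operations, so it is worth showing inline. This is a sketch in Python rather than Fortran (the function name is this sketch's own); the one subtlety is that Fortran integer division truncates toward zero, which differs from Python's floor division for the negative (M-14)/12 term:

```python
from math import trunc

def jdn(y, m, d):
    """Julian Day Number per Fliegel & van Flandern (CACM 11(10), 1968)."""
    a = trunc((m - 14) / 12)  # -1 for January/February, 0 otherwise
    return (d - 32075
            + 1461 * (y + 4800 + a) // 4      # remaining operands are positive,
            + 367 * (m - 2 - 12 * a) // 12    # so // matches Fortran truncation
            - 3 * ((y + 4900 + a) // 100) // 4)
```

With such a day number, the sequence check suggested here reduces to verifying that each record's `jdn` is exactly one more than the previous record's.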

Rather than copy today's data to a PDATA holder so that on the next read the new data may be compared to the old, a two-row array is used, with IT flip-flopping 1,2,1,2,1,2,... Comparison of the data as numerical values rather than text strings means that different texts that evoke the same value will not be regarded as different. If the data format were invalid, there would be horrible messages. There aren't, so ... the values should be read and plotted...

<syntaxhighlight lang="fortran">
Crunches a set of hourly data. Starts with a date, then 24 pairs of value,indicator for that day, on one line.
      INTEGER Y,M,D             !Year, month, and day.
      INTEGER GOOD(24,2)        !The indicators.
      REAL*8 V(24,2)            !The grist.
      CHARACTER*10 DATE(2)      !Along with the starting date.
      INTEGER IT,TI             !A flipper and its antiflipper.
      INTEGER NV                !Number of entirely good records.
      INTEGER I,NREC,HIC        !Some counters.
      LOGICAL INGOOD            !State flipper for the runs of data.
      INTEGER IN,MSG            !I/O mnemonics.
      CHARACTER*666 ACARD       !Scratchpad, of sufficient length for all expectation.
      IN = 10           !Unit number for the input file.
      MSG = 6           !Output.
      OPEN (IN,FILE="Readings1.txt", FORM="FORMATTED",  !This should be a function.
     1 STATUS ="OLD",ACTION="READ")  !Returning success, or failure.
      NV = 0            !No pure records seen.
      NREC = 0          !No records read.
      HIC = 0           !Provoking no complaints.
      DATE = "snargle"  !No date should look like this!
      IT = 2            !Syncopation for the 1-2 flip flop.
Chew into the file.
   10 READ (IN,11,END=100,ERR=666) L,ACARD(1:MIN(L,LEN(ACARD)))  !With some protection.
      NREC = NREC + 1   !So, a record has been read.
   11 FORMAT (Q,A)      !Obviously, Q ascertains the length of the record being read.
      READ (ACARD,12,END=600,ERR=601) Y,M,D  !The date part is trouble, as always.
   12 FORMAT (I4,2(1X,I2))  !Because there are no delimiters between the parts.
      TI = IT           !Thus finger the previous value.
      IT = 3 - IT       !Flip between 1 and 2.
      DATE(IT) = ACARD(1:10)  !Save the date field.
      READ (ACARD(11:L),*,END=600,ERR=601) (V(I,IT),GOOD(I,IT),I = 1,24)  !But after the date, delimiters abound.
Comparisons. Should really convert the date to a daynumber, check it by reversion, and then check for + 1 day only.
   20 IF (DATE(IT).EQ.DATE(TI)) THEN  !Same date?
        IF (ALL(V(:,IT) .EQ.V(:,TI)) .AND.  !Yes. What about the data?
     1      ALL(GOOD(:,IT).EQ.GOOD(:,TI))) THEN  !This disregards details of the spacing of the data.
          WRITE (MSG,21) NREC,DATE(IT),"same."  !Also trailing zeroes, spurious + signs, blah blah.
   21     FORMAT ("Record",I8," Duplicate date field (",A,"), data ",A)  !Say it.
        ELSE            !But if they're not all equal,
          WRITE (MSG,21) NREC,DATE(IT),"different!"  !They're different!
        END IF          !So much for comparing the data.
      END IF            !So much for just comparing the date's text.
      IF (ALL(GOOD(:,IT).GT.0)) NV = NV + 1  !A fully healthy record, either way?
      GO TO 10          !More! More! I want more!!

Complaints. Should really distinguish between trouble in the date part and in the data part.
  600 WRITE (MSG,*) '"END" declared - insufficient data?'  !Not enough numbers, presumably.
      GO TO 602         !Reveal the record.
  601 WRITE (MSG,*) '"ERR" declared - improper number format?'  !Ah, but which number?
  602 WRITE (MSG,603) NREC,L,ACARD(1:L)  !Anyway, reveal the uninterpreted record.
  603 FORMAT("Record",I8,", length ",I0," reads ",A)  !Just so.
      HIC = HIC + 1     !This may grow into a habit.
      IF (HIC.LE.12) GO TO 10  !But if not yet, try the next record.
      STOP "Enough distaste."  !Or, give up.
  666 WRITE (MSG,101) NREC,"format error!"  !For A-style data? Should never happen!
      GO TO 900         !But if it does, give up!

Closedown.
  100 WRITE (MSG,101) NREC,"then end-of-file"  !Discovered on the next attempt.
  101 FORMAT ("Record",I8,": ",A)  !A record number plus a remark.
      WRITE (MSG,102) NV  !The overall results.
  102 FORMAT (" with",I8," having all values good.")  !This should do.
  900 CLOSE(IN)         !Done.
      END               !Spaghetti rules.
</syntaxhighlight>

Output:
 Record      85 Duplicate date field (1990-03-25), data different!
 Record     456 Duplicate date field (1991-03-31), data different!
 Record     820 Duplicate date field (1992-03-29), data different!
 Record    1184 Duplicate date field (1993-03-28), data different!
 Record    1911 Duplicate date field (1995-03-26), data different!
 Record    5471: then end-of-file
  with    5017 having all values good.

=={{header|Go}}==
<syntaxhighlight lang="go">package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strconv"
    "strings"
    "time"
)

const (
    filename   = "readings.txt"
    readings   = 24             // per line
    fields     = readings*2 + 1 // per line
    dateFormat = "2006-01-02"
)

func main() {
    file, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    var allGood, uniqueGood int
    // map records not only dates seen, but also if an all-good record was
    // seen for the key date.
    m := make(map[time.Time]bool)
    s := bufio.NewScanner(file)
    for s.Scan() {
        f := strings.Fields(s.Text())
        if len(f) != fields {
            log.Fatal("unexpected format,", len(f), "fields.")
        }
        ts, err := time.Parse(dateFormat, f[0])
        if err != nil {
            log.Fatal(err)
        }
        good := true
        for i := 1; i < fields; i += 2 {
            flag, err := strconv.Atoi(f[i+1])
            if err != nil {
                log.Fatal(err)
            }
            if flag > 0 { // value is good
                _, err := strconv.ParseFloat(f[i], 64)
                if err != nil {
                    log.Fatal(err)
                }
            } else { // value is bad
                good = false
            }
        }
        if good {
            allGood++
        }
        previouslyGood, seen := m[ts]
        if seen {
            fmt.Println("Duplicate datestamp:", f[0])
        }
        m[ts] = previouslyGood || good
        if !previouslyGood && good {
            uniqueGood++
        }
    }
    if err := s.Err(); err != nil {
        log.Fatal(err)
    }

    fmt.Println("\nData format valid.")
    fmt.Println(allGood, "records with good readings for all instruments.")
    fmt.Println(uniqueGood,
        "unique dates with good readings for all instruments.")
}</syntaxhighlight>
{{out}}
<pre>
Duplicate datestamp: 1990-03-25
Duplicate datestamp: 1991-03-31
Duplicate datestamp: 1992-03-29
Duplicate datestamp: 1993-03-28
Duplicate datestamp: 1995-03-26

Data format valid.
5017 records with good readings for all instruments.
5013 unique dates with good readings for all instruments.
</pre>


=={{header|Haskell}}==
<syntaxhighlight lang="haskell">
import Data.List (nub, (\\))

  putStr (unlines ("duplicated dates:": duplicatedDates (map date inputs)))
  putStrLn ("number of good records: " ++ show (length $ goodRecords inputs))
</syntaxhighlight>

this script outputs:

=={{header|Icon}} and {{header|Unicon}}==
duplicated timestamps that are on well-formed records.


<syntaxhighlight lang="unicon">procedure main(A)
    dups := set()
    goodRecords := 0

        }
end</syntaxhighlight>

Sample run:


=={{header|J}}==
<syntaxhighlight lang="j"> require 'tables/dsv dates'
dat=: TAB readdsv jpath '~temp/readings.txt'
Dates=: getdate"1 >{."1 dat
1992 3 29
1993 3 28
1995 3 26</syntaxhighlight>


=={{header|Java}}==
{{trans|C++}}
{{works with|Java|1.5+}}
<syntaxhighlight lang="java5">import java.util.*;
import java.util.regex.*;
import java.io.*;
}
}
}</syntaxhighlight>
The program produces the following output:
<pre>
=={{header|JavaScript}}==
=={{header|JavaScript}}==
{{works with|JScript}}
{{works with|JScript}}
<syntaxhighlight lang="javascript">// wrap up the counter variables in a closure.
function analyze_func(filename) {
var dates_seen = {};


var analyze = analyze_func('readings.txt');
analyze();</syntaxhighlight>

=={{header|jq}}==
{{works with|jq|with regex support}}

For this problem, it is convenient to use jq in a pipeline: the first invocation of jq will convert the text file into a stream of JSON arrays (one array per line):
<syntaxhighlight lang="sh">$ jq -R '[splits("[ \t]+")]' Text_processing_2.txt</syntaxhighlight>

The second part of the pipeline performs the task requirements. The following program is used in the second invocation of jq.

'''Generic Utilities'''
<syntaxhighlight lang="jq"># Given any array, produce an array of [item, count] pairs for each run.
def runs:
reduce .[] as $item
( [];
if . == [] then [ [ $item, 1] ]
else .[length-1] as $last
| if $last[0] == $item then (.[0:length-1] + [ [$item, $last[1] + 1] ] )
else . + [[$item, 1]]
end
end ) ;

def is_float: test("^[-+]?[0-9]*[.][0-9]*([eE][-+]?[0-9]+)?$");

def is_integral: test("^[-+]?[0-9]+$");

def is_date: test("[12][0-9]{3}-[0-9][0-9]-[0-9][0-9]");</syntaxhighlight>

'''Validation''':
<syntaxhighlight lang="jq"># Report line and column numbers using conventional numbering (IO=1).
def validate_line(nr):
def validate_date:
if is_date then empty else "field 1 in line \(nr) has an invalid date: \(.)" end;
def validate_length(n):
if length == n then empty else "line \(nr) has \(length) fields" end;
def validate_pair(i):
( .[2*i + 1] as $n
| if ($n | is_float) then empty else "field \(2*i + 2) in line \(nr) is not a float: \($n)" end),
( .[2*i + 2] as $n
| if ($n | is_integral) then empty else "field \(2*i + 3) in line \(nr) is not an integer: \($n)" end);
(.[0] | validate_date),
(validate_length(49)),
(range(0; (length-1) / 2) as $i | validate_pair($i)) ;

def validate_lines:
. as $in
| range(0; length) as $i | ($in[$i] | validate_line($i + 1));</syntaxhighlight>

'''Check for duplicate timestamps'''
<syntaxhighlight lang="jq">def duplicate_timestamps:
[.[][0]] | sort | runs | map( select(.[1]>1) );</syntaxhighlight>

'''Number of valid readings for all instruments''':
<syntaxhighlight lang="jq"># The following ignores any issues with respect to duplicate dates,
# but does check the validity of the record, including the date format:
def number_of_valid_readings:
def check:
. as $in
| (.[0] | is_date)
and length == 49
and all(range(0; 24) | $in[2*. + 1] | is_float)
and all(range(0; 24) | $in[2*. + 2] | (is_integral and tonumber >= 1) );

map(select(check)) | length ;</syntaxhighlight>

'''Generate Report'''
<syntaxhighlight lang="jq">validate_lines,
"\nChecking for duplicate timestamps:",
duplicate_timestamps,
"\nThere are \(number_of_valid_readings) valid rows altogether."</syntaxhighlight>
{{out}}
'''Part 1: Simple demonstration'''

To illustrate that the program does report invalid lines, we first use the six lines at the top but mangle the last line.
<syntaxhighlight lang="sh">$ jq -R '[splits("[ \t]+")]' Text_processing_2.txt | jq -s -r -f Text_processing_2.jq
field 1 in line 6 has an invalid date: 991-04-03
line 6 has 47 fields
field 2 in line 6 is not a float: 10000
field 3 in line 6 is not an integer: 1.0
field 47 in line 6 is not an integer: x

Checking for duplicate timestamps:
[
[
"1991-03-31",
2
]
]

There are 5 valid rows altogether.</syntaxhighlight>

'''Part 2: readings.txt'''
<syntaxhighlight lang="sh">$ jq -R '[splits("[ \t]+")]' readings.txt | jq -s -r -f Text_processing_2.jq
Checking for duplicate timestamps:
[
[
"1990-03-25",
2
],
[
"1991-03-31",
2
],
[
"1992-03-29",
2
],
[
"1993-03-28",
2
],
[
"1995-03-26",
2
]
]

There are 5017 valid rows altogether.</syntaxhighlight>

=={{header|Julia}}==
Refer to the code at https://rosettacode.org/wiki/Text_processing/1#Julia. Add at the end of that code the following:
<syntaxhighlight lang="julia">
dupdate = df[nonunique(df[:,[:Date]]),:][:Date]
println("The following rows have duplicate DATESTAMP:")
println(df[df[:Date] .== dupdate,:])
println("All values good in these rows:")
println(df[df[:ValidValues] .== 24,:])
</syntaxhighlight>
{{output}}
<pre>
The following rows have duplicate DATESTAMP:
2×29 DataFrames.DataFrame
│ Row │ Date │ Mean │ ValidValues │ MaximumGap │ GapPosition │ 0:00 │ 1:00 │ 2:00 │ 3:00 │ 4:00 │
├─────┼─────────────────────┼─────────┼─────────────┼────────────┼─────────────┼──────┼──────┼──────┼──────┼──────┤
│ 1 │ 1991-03-31T00:00:00 │ 23.5417 │ 24 │ 0 │ 0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 1991-03-31T00:00:00 │ 40.0 │ 1 │ 23 │ 2 │ 40.0 │ NaN │ NaN │ NaN │ NaN │

│ Row │ 5:00 │ 6:00 │ 7:00 │ 8:00 │ 9:00 │ 10:00 │ 11:00 │ 12:00 │ 13:00 │ 14:00 │ 15:00 │ 16:00 │ 17:00 │ 18:00 │
├─────┼──────┼──────┼──────┼──────┼──────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 10.0 │ 10.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │ 50.0 │ 60.0 │ 40.0 │ 30.0 │ 30.0 │ 30.0 │ 25.0 │ 20.0 │
│ 2 │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │ NaN │

│ Row │ 19:00 │ 20:00 │ 21:00 │ 22:00 │ 23:00 │
├─────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 20.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │
│ 2 │ NaN │ NaN │ NaN │ NaN │ NaN │
All values good in these rows:
4×29 DataFrames.DataFrame
│ Row │ Date │ Mean │ ValidValues │ MaximumGap │ GapPosition │ 0:00 │ 1:00 │ 2:00 │ 3:00 │ 4:00 │
├─────┼─────────────────────┼─────────┼─────────────┼────────────┼─────────────┼──────┼──────┼──────┼──────┼──────┤
│ 1 │ 1991-03-30T00:00:00 │ 10.0 │ 24 │ 0 │ 0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 1991-03-31T00:00:00 │ 23.5417 │ 24 │ 0 │ 0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 3 │ 1991-04-02T00:00:00 │ 19.7917 │ 24 │ 0 │ 0 │ 8.0 │ 9.0 │ 11.0 │ 12.0 │ 12.0 │
│ 4 │ 1991-04-03T00:00:00 │ 13.9583 │ 24 │ 0 │ 0 │ 10.0 │ 9.0 │ 10.0 │ 10.0 │ 9.0 │

│ Row │ 5:00 │ 6:00 │ 7:00 │ 8:00 │ 9:00 │ 10:00 │ 11:00 │ 12:00 │ 13:00 │ 14:00 │ 15:00 │ 16:00 │ 17:00 │ 18:00 │
├─────┼──────┼──────┼──────┼──────┼──────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 10.0 │ 10.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │ 50.0 │ 60.0 │ 40.0 │ 30.0 │ 30.0 │ 30.0 │ 25.0 │ 20.0 │
│ 3 │ 12.0 │ 27.0 │ 26.0 │ 27.0 │ 33.0 │ 32.0 │ 31.0 │ 29.0 │ 31.0 │ 25.0 │ 25.0 │ 24.0 │ 21.0 │ 17.0 │
│ 4 │ 10.0 │ 15.0 │ 24.0 │ 28.0 │ 24.0 │ 18.0 │ 14.0 │ 12.0 │ 13.0 │ 14.0 │ 15.0 │ 14.0 │ 15.0 │ 13.0 │

│ Row │ 19:00 │ 20:00 │ 21:00 │ 22:00 │ 23:00 │
├─────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2 │ 20.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0 │
│ 3 │ 14.0 │ 15.0 │ 12.0 │ 12.0 │ 10.0 │
│ 4 │ 13.0 │ 13.0 │ 12.0 │ 10.0 │ 10.0 │
</pre>

=={{header|Kotlin}}==
<syntaxhighlight lang="scala">// version 1.2.31

import java.io.File

fun main(args: Array<String>) {
val rx = Regex("""\s+""")
val file = File("readings.txt")
var count = 0
var invalid = 0
var allGood = 0
var map = mutableMapOf<String, Int>()
file.forEachLine { line ->
count++
val fields = line.split(rx)
val date = fields[0]
if (fields.size == 49) {
if (map.containsKey(date))
map[date] = map[date]!! + 1
else
map.put(date, 1)
var good = 0
for (i in 2 until fields.size step 2) {
if (fields[i].toInt() >= 1) {
good++
}
}
if (good == 24) allGood++
}
else invalid++
}

println("File = ${file.name}")
println("\nDuplicated dates:")
for ((k,v) in map) {
if (v > 1) println(" $k ($v times)")
}
println("\nTotal number of records : $count")
var percent = invalid.toDouble() / count * 100.0
println("Number of invalid records : $invalid (${"%5.2f".format(percent)}%)")
percent = allGood.toDouble() / count * 100.0
println("Number which are all good : $allGood (${"%5.2f".format(percent)}%)")
}</syntaxhighlight>

{{out}}
<pre>
File = readings.txt

Duplicated dates:
1990-03-25 (2 times)
1991-03-31 (2 times)
1992-03-29 (2 times)
1993-03-28 (2 times)
1995-03-26 (2 times)

Total number of records : 5471
Number of invalid records : 0 ( 0.00%)
Number which are all good : 5017 (91.70%)
</pre>

=={{header|Lua}}==
<syntaxhighlight lang="lua">filename = "readings.txt"
io.input( filename )


for i = 1, #bad_format do
print( " ", bad_format[i] )
end</syntaxhighlight>
Output:
<pre>Lines read: 5471


</pre>
=={{header|M2000 Interpreter}}==
The file is in the user directory. Use Win Dir$ to open an explorer window and copy readings.txt there.


<syntaxhighlight lang="m2000 interpreter">Module TestThis {
Document a$, exp$
\\ automatically find the encoding and the line break
Load.doc a$, "readings.txt"
m=0
n=doc.par(a$)
k=list
nl$={
}
l=0
exp$=format$("Records: {0}", n)+nl$
For i=1 to n
b$=paragraph$(a$, i)
If exist(k,Left$(b$, 10)) then
m++ : where=eval(k)
exp$=format$("Duplicate for {0} at {1}",where, i)+nl$
Else
Append k, Left$(b$, 10):=i
End if
Stack New {
Stack Mid$(Replace$(chr$(9)," ", b$), 11)
while not empty {
Read a, b
if b<=0 then l++ : exit
}
}
Next
exp$= format$("Duplicates {0}",m)+nl$
exp$= format$("Valid Records {0}",n-l)+nl$
clipboard exp$
report exp$
}
TestThis
</syntaxhighlight>
{{out}}
<pre>
Records: 5471
Duplicate for 84 at 85
Duplicate for 455 at 456
Duplicate for 819 at 820
Duplicate for 1183 at 1184
Duplicate for 1910 at 1911
Duplicates 5
Valid Records 5017

</pre>

=={{header|Mathematica}}/{{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">data = Import["Readings.txt","TSV"]; Print["duplicated dates: "];
Select[Tally@data[[;;,1]], #[[2]]>1&][[;;,1]]//Column
Print["number of good records: ", Count[(Times@@#[[3;;All;;2]])& /@ data, 1],
" (out of a total of ", Length[data], ")"]</syntaxhighlight>
{{out}}

<pre>duplicated dates:
1990-03-25
1993-03-28
1995-03-26

number of good records: 5017 (out of a total of 5471)</pre>


=={{header|MATLAB}} / {{header|Octave}}==


<syntaxhighlight lang="matlab">function [val,count] = readdat(configfile)
% READDAT reads readings.txt file
%
dix = find(diff(d)==0) % check for two consecutive timestamps with zero difference


printf('number of valid records: %i\n ', sum( all( val(:,5:2:end) >= 1, 2) ) );</syntaxhighlight>


<pre>>> [val,count]=readdat;
number of valid records: 5017
</pre>

=={{header|Nim}}==
<syntaxhighlight lang="nim">import strutils, tables

const NumFields = 49
const DateField = 0
const FlagGoodValue = 1

var badRecords: int # Number of records that have invalid formatted values.
var totalRecords: int # Total number of records in the file.
var badInstruments: int # Total number of records that have at least one instrument showing error.
var seenDates: Table[string, bool] # Table to keep track of what dates we have seen.

proc checkFloats(floats: seq[string]): bool =
## Ensure we can parse all records as floats (except the date stamp).
for index in 1..<NumFields:
try:
# We're assuming all instrument flags are floats not integers.
discard parseFloat(floats[index])
except ValueError:
return false
true

proc areAllFlagsOk(instruments: seq[string]): bool =
## Ensure that all sensor flags are ok.

# Flags start at index 2, and occur every 2 fields.
for index in countup(2, NumFields, 2):
# We're assuming all instrument flags are floats not integers
var flag = parseFloat(instruments[index])
if flag < FlagGoodValue: return false
true


# Note: we're not checking the format of the date stamp.

# Main.

var currentLine = 0
for line in "readings.txt".lines:
currentLine.inc
if line.len == 0: continue # Empty lines don't count as records.

var tokens = line.split({' ', '\t'})
totalRecords.inc

if tokens.len != NumFields:
badRecords.inc
continue

if not checkFloats(tokens):
badRecords.inc
continue

if not areAllFlagsOk(tokens):
badInstruments.inc

if seenDates.hasKeyOrPut(tokens[DateField], true):
echo tokens[DateField], " duplicated on line ", currentLine

let goodRecords = totalRecords - badRecords
let goodInstruments = goodRecords - badInstruments

echo "Total Records: ", totalRecords
echo "Records with wrong format: ", badRecords
echo "Records where all instruments were OK: ", goodInstruments</syntaxhighlight>

{{out}}
<pre>1990-03-25 duplicated on line 85
1991-03-31 duplicated on line 456
1992-03-29 duplicated on line 820
1993-03-28 duplicated on line 1184
1995-03-26 duplicated on line 1911
Total Records: 5471
Records with wrong format: 0
Records where all instruments were OK: 5017</pre>

=={{header|OCaml}}==
<syntaxhighlight lang="ocaml">#load "str.cma"
open Str


let strip_cr str =
let last = pred (String.length str) in
if str.[last] <> '\r' then str else String.sub str 0 last


let map_records =
Line 1,263: Line 2,002:
aux (e::acc) tail
aux (e::acc) tail
| [_] -> invalid_arg "invalid data"
| [] -> List.rev acc
in
aux [] ;;
aux acc tl
| [] ->
List.rev acc
in
aux [] ;;


let record_ok (_,record) =
let is_ok (_,v) = v >= 1 in
let sum_ok =
List.fold_left (fun sum this ->
if is_ok this then succ sum else sum) 0 record
in
sum_ok = 24


let num_good_records =
Line 1,295: Line 2,034:
let li = split (regexp "[ \t]+") line in
let li = split (regexp "[ \t]+") line in
let records = map_records (List.tl li)
and date = List.hd li in
(date, records)

Line 1,301: Line 2,040:
let ic = open_in "readings.txt" in
let ic = open_in "readings.txt" in
let rec read_loop acc =
try
with End_of_file -> None
let line = strip_cr(input_line ic) in
in
read_loop ((parse_line line) :: acc)
with End_of_file ->
match line_opt with
close_in ic;
None -> close_in ic; List.rev acc
(List.rev acc)
| Some line -> read_loop (parse_line line :: acc)
in
in
let inputs = read_loop [] in


Printf.printf "number of good records: %d\n" (num_good_records inputs);
;;</syntaxhighlight>


this script outputs:


=={{header|Perl}}==
<syntaxhighlight lang="perl">use List::MoreUtils 'natatime';
use constant FIELDS => 49;


map {" $_\n"}
grep {$dates{$_} > 1}
sort keys %dates;</syntaxhighlight>


Output:
1995-03-26</pre>


=={{header|Phix}}==
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">-- demo\rosetta\TextProcessing2.exw</span>
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> <span style="color: #000080;font-style:italic;">-- (include version/first of next three lines only)</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">readings</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span> <span style="color: #000080;font-style:italic;">-- global constant lines, or:
--assert(write_lines("readings.txt",lines)!=-1) -- first run, then:
--constant lines = read_lines("readings.txt")</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">\</span><span style="color: #004080;">timedate</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>

<span style="color: #004080;">integer</span> <span style="color: #000000;">all_good</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">fmt</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">"%d-%d-%d\t"</span><span style="color: #0000FF;">&</span><span style="color: #7060A8;">join</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"%f"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">48</span><span style="color: #0000FF;">),</span><span style="color: #008000;">'\t'</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">extset</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sq_mul</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #000000;">24</span><span style="color: #0000FF;">),</span><span style="color: #000000;">2</span><span style="color: #0000FF;">),</span> <span style="color: #000080;font-style:italic;">-- {2,4,6,..48}</span>
<span style="color: #000000;">curr</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">last</span>

<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">li</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">lines</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">r</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">scanf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">li</span><span style="color: #0000FF;">,</span><span style="color: #000000;">fmt</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">)!=</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"bad line [%d]:%s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">i</span><span style="color: #0000FF;">,</span><span style="color: #000000;">li</span><span style="color: #0000FF;">})</span>
<span style="color: #008080;">else</span>
<span style="color: #000000;">curr</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..</span><span style="color: #000000;">3</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">></span><span style="color: #000000;">1</span> <span style="color: #008080;">and</span> <span style="color: #000000;">curr</span><span style="color: #0000FF;">=</span><span style="color: #000000;">last</span> <span style="color: #008080;">then</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"duplicate line for %04d/%02d/%02d\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">last</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">last</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">curr</span>
<span style="color: #000000;">all_good</span> <span style="color: #0000FF;">+=</span> <span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sq_le</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">extract</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">][</span><span style="color: #000000;">4</span><span style="color: #0000FF;">..$],</span><span style="color: #000000;">extset</span><span style="color: #0000FF;">),</span><span style="color: #000000;">0</span><span style="color: #0000FF;">))=</span><span style="color: #000000;">0</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>

<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Valid records %d of %d total\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">all_good</span><span style="color: #0000FF;">,</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">)})</span>

<span style="color: #0000FF;">?</span><span style="color: #008000;">"done"</span>
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">wait_key</span><span style="color: #0000FF;">()</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
duplicate line for 1990/03/25
duplicate line for 1991/03/31
duplicate line for 1992/03/29
duplicate line for 1993/03/28
duplicate line for 1995/03/26
Valid records 5017 of 5471 total
</pre>

=={{header|PHP}}==
<syntaxhighlight lang="php">$handle = fopen("readings.txt", "rb");
$missformcount = 0;
$totalcount = 0;
foreach ($duplicates as $key => $val){
echo $val . ' at Line : ' . $key . '<br>';
}</syntaxhighlight>
<pre>Valid records 5017 of 5471 total
Duplicates :
1993-03-28 at Line : 1184
1995-03-26 at Line : 1911</pre>

=={{header|Picat}}==
<syntaxhighlight lang="picat">import util.

go =>
Readings = [split(Record) : Record in read_file_lines("readings.txt")],
DateStamps = new_map(),
GoodReadings = 0,
foreach({Rec,Id} in zip(Readings,1..Readings.length))
if Rec.length != 49 then printf("Entry %d has bad_length %d\n", Id, Rec.length) end,
Date = Rec[1],
if DateStamps.has_key(Date) then
printf("Entry %d (date %w) is a duplicate of entry %w\n", Id, Date, DateStamps.get(Date))
else
if sum([1: I in 3..2..49, check_field(Rec[I])]) == 0 then
GoodReadings := GoodReadings + 1
end
end,
DateStamps.put(Date, Id)
end,
nl,
printf("Total readings: %d\n",Readings.len),
printf("Good readings: %d\n",GoodReadings),
nl.

check_field(Field) =>
Field == "-2" ; Field == "-1" ; Field == "0".</syntaxhighlight>

{{out}}
<pre>Entry 85 (date 1990-03-25) is a duplicate of entry 84
Entry 456 (date 1991-03-31) is a duplicate of entry 455
Entry 820 (date 1992-03-29) is a duplicate of entry 819
Entry 1184 (date 1993-03-28) is a duplicate of entry 1183
Entry 1911 (date 1995-03-26) is a duplicate of entry 1910

Total readings: 5471
Good readings: 5013</pre>


=={{header|PicoLisp}}==
Put the following into an executable file "checkReadings":
<syntaxhighlight lang="picolisp">#!/usr/bin/picolisp /usr/lib/picolisp/lib.l

(load "@lib/misc.l")

(in (opt)
(until (eof)
(let Lst (split (line) "^I")
(unless
(and
(= 49 (length Lst)) # Check total length
($dat (car Lst) "-") # Check for valid date
(fully # Check data format
'((L F)
(if F # Alternating:
(format L 3) # Number
(>= 9 (format L) -9) ) ) # or flag
(cdr Lst)
'(T NIL .) ) )
(prinl "Bad line format: " (glue " " Lst))
(bye 1) ) ) ) )

(bye)</syntaxhighlight>
Then it can be called as
<pre>$ ./checkReadings readings.txt</pre>


=={{header|PL/I}}==
<syntaxhighlight lang="pli">
/* To process readings produced by automatic reading stations. */


put skip list ('There were ' || k-faulty || ' good readings' );
end check;
</syntaxhighlight>


=={{header|PowerShell}}==
<syntaxhighlight lang="powershell">$dateHash = @{}
$goodLineCount = 0
get-content c:\temp\readings.txt |
}
[string]$goodLineCount + " good lines"
</syntaxhighlight>


Output:


An alternative using regular expression syntax:
<syntaxhighlight lang="powershell">
$dateHash = @{}
$goodLineCount = 0
}
[string]$goodLineCount + " good lines"
</syntaxhighlight>


Output:
5017 good lines
</pre>

=={{header|PureBasic}}==
Using regular expressions.
<syntaxhighlight lang="purebasic">Define filename.s = "readings.txt"
#instrumentCount = 24
#instrumentCount = 24


Line 1,711: Line 2,460:
CloseConsole()
CloseConsole()
EndIf
EndIf
EndIf</lang>
EndIf</syntaxhighlight>
Sample output:
Sample output:
<pre>Duplicate date: 1990-03-25 occurs on lines 85 and 84.
<pre>Duplicate date: 1990-03-25 occurs on lines 85 and 84.
Line 1,722: Line 2,471:


=={{header|Python}}==
<syntaxhighlight lang="python">import re
import zipfile
import StringIO
#readings = StringIO.StringIO(zfs.read('readings.txt'))
readings = open('readings.txt','r')
munge2(readings)</syntaxhighlight>
The results indicate 5013 good records, which differs from the Awk implementation. The final few lines of the output are as follows
<pre style="height:10ex;overflow:scroll">
* Generate mostly summary information that is easier to compare to other solutions.

<syntaxhighlight lang="python">import re
import zipfile
import StringIO
readings = open('readings.txt','r')
munge2(readings)</syntaxhighlight>
<pre>bash$ /cygdrive/c/Python26/python munge2.py
Duplicate dates:

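Only fragments of the Python program survive in the diff above. As a self-contained illustration of the same checks (field-format validation, duplicate-datestamp detection, and counting records whose 24 flags are all >= 1), here is a minimal sketch; it is not the original <code>munge2</code>, and the names in it are invented for this sketch:

```python
import re
from collections import defaultdict

def check_readings(lines):
    """Hypothetical stand-in for the elided munge2 logic."""
    date_re = re.compile(r'\d{4}-\d{2}-\d{2}$')
    seen = defaultdict(list)              # datestamp -> line numbers where it occurs
    good = bad_format = 0
    for lineno, line in enumerate(lines, 1):
        fields = line.split()             # any run of spaces/tabs separates fields
        if len(fields) != 49 or not date_re.match(fields[0]):
            bad_format += 1
            continue
        seen[fields[0]].append(lineno)
        flags = [int(f) for f in fields[2::2]]     # every second field is a flag
        if all(f >= 1 for f in flags):
            good += 1
    dups = {d: ns for d, ns in seen.items() if len(ns) > 1}
    return good, bad_format, dups

# Tiny in-memory sample in the readings.txt layout:
sample = [
    "1991-03-30" + "\t10.000\t1" * 24,
    "1991-03-31" + "\t10.000\t1" * 24,
    "1991-03-31" + "\t0.000\t-2" * 24,
]
```

Run over the full readings.txt, this style of count (which does not exclude duplicated dates) should agree with the 5017 figure and the five duplicated datestamps reported by several other solutions on this page.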
=={{header|R}}==
<syntaxhighlight lang="r"># Read in data from file
dfr <- read.delim("d:/readings.txt", colClasses=c("character", rep(c("numeric", "integer"), 24)))
dates <- strptime(dfr[,1], "%Y-%m-%d")
# Number of rows with no bad values
flags <- as.matrix(dfr[,seq(3,49,2)])>0
sum(apply(flags, 1, all))</syntaxhighlight>

=={{header|Racket}}==
<syntaxhighlight lang="racket">#lang racket
(read-decimal-as-inexact #f)
;; files to read is a sequence, so it could be either a list or vector of files

(printf "~a records have good readings for all instruments~%"
(text-processing/2 (current-command-line-arguments)))</syntaxhighlight>
Example session:
<pre>$ racket 2.rkt readings/readings.txt
duplicate datestamp: 1995-03-26 at line: 1911 (first seen at: 1910)
5013 records have good readings for all instruments</pre>

=={{header|Raku}}==
(formerly Perl 6)
{{trans|Perl}}
{{works with|Rakudo|2018.03}}

This version does validation with a single Raku regex that is much more readable than the typical regex, and arguably expresses the data structure more straightforwardly.
Here we use normal quotes for literals, and <tt>\h</tt> for horizontal whitespace.

Variables like <tt>$good-record</tt> that are going to be autoincremented do not need to be initialized.

The <tt>.push</tt> method on a hash is magical and loses no information; if a duplicate key is found in the pushed pair, an array of values is automatically created of the old value and the new value pushed. Hence we can easily track all the lines that a particular duplicate occurred at.

The <tt>.all</tt> method does "junctional" logic: it autothreads through comparators as any English speaker would expect. Junctions can also short-circuit as soon as they find a value that doesn't match, and the evaluation order is up to the computer, so it can be optimized or parallelized.

The final line simply greps out the pairs from the hash whose value is an array with more than 1 element. (Those values that are not arrays nevertheless have a <tt>.elems</tt> method that always reports <tt>1</tt>.) The <tt>.pairs</tt> is merely there for clarity; grepping a hash directly has the same effect.
Note that we sort the pairs after we've grepped them, not before; this works fine in Raku, sorting on the key and value as primary and secondary keys. Finally, pairs and arrays provide a default print format that is sufficient without additional formatting in this case.

<syntaxhighlight lang="raku" line>my $good-records;
my $line;
my %dates;

for lines() {
$line++;
/ ^
(\d ** 4 '-' \d\d '-' \d\d)
[ \h+ \d+'.'\d+ \h+ ('-'?\d+) ] ** 24
$ /
or note "Bad format at line $line" and next;
%dates.push: $0 => $line;
$good-records++ if $1.all >= 1;
}

say "$good-records good records out of $line total";

say 'Repeated timestamps (with line numbers):';
.say for sort %dates.pairs.grep: *.value.elems > 1;</syntaxhighlight>
Output:
<pre>5017 good records out of 5471 total
Repeated timestamps (with line numbers):
1990-03-25 => [84 85]
1991-03-31 => [455 456]
1992-03-29 => [819 820]
1993-03-28 => [1183 1184]
1995-03-26 => [1910 1911]</pre>
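The hash-accumulation idiom the Raku prose describes (push every line number under its datestamp, then keep only keys whose value list has more than one element) is not Raku-specific. A minimal Python equivalent of just that bookkeeping, with invented sample data:

```python
from collections import defaultdict

# Equivalent of Raku's  %dates.push: $0 => $line  followed by
# grepping the pairs whose value array has more than one element.
dates = defaultdict(list)
for line_no, stamp in enumerate(
        ["1990-03-24", "1990-03-25", "1990-03-25", "1990-03-26"], 1):
    dates[stamp].append(line_no)

repeated = sorted((k, v) for k, v in dates.items() if len(v) > 1)
```

Here `repeated` holds each duplicated stamp together with every line number it occurred on, mirroring the `1990-03-25 => [84 85]` style of the Raku output.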


=={{header|REXX}}==
This REXX program processes the file mentioned in the [[Text processing/1]] task and performs further validation of the dates, flags, and data.
<br><br>
Some of the checks performed are:
::* &nbsp; checks for duplicated date records.
::* &nbsp; checks for a bad date (YYYY-MM-DD) format, among:
::* &nbsp; wrong length
::* &nbsp; year > current year
::* &nbsp; year < 1970 (to allow for posthumous data)
::* &nbsp; mm < 1 or mm > 12
::* &nbsp; dd < 1 or dd > days for the month
::* &nbsp; yyyy, dd, mm isn't numeric
::* &nbsp; missing data (or flags)
::* &nbsp; flag isn't an integer
::* &nbsp; flag contains a decimal point
::* &nbsp; data isn't numeric
In addition, all of the presented numbers may have commas inserted.
<br><br>
The program has (negated) code to write the report to a file in addition to the console.
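As a cross-check of the date rules listed above (length, numeric parts, year range, month/day ranges, with February's length decided by the leap-year rule), the same validation can be sketched in Python; the 1970 lower bound and current-year upper bound mirror the REXX checks, and the function name is invented for this sketch:

```python
import datetime

def valid_datestamp(stamp, lo_year=1970):
    # YYYY-MM-DD shape: exactly 10 characters with dashes in place.
    if len(stamp) != 10 or stamp[4] != '-' or stamp[7] != '-':
        return False
    y, m, d = stamp[:4], stamp[5:7], stamp[8:10]
    if not (y.isdigit() and m.isdigit() and d.isdigit()):
        return False
    try:
        # datetime.date rejects month 0/13, day 0, and days past the
        # month's length, applying the 100/400-year leap rule itself.
        date = datetime.date(int(y), int(m), int(d))
    except ValueError:
        return False
    return lo_year <= date.year <= datetime.date.today().year
```

For example, `1992-02-29` passes (1992 is a leap year) while `1991-02-29` fails.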
<syntaxhighlight lang="rexx">/*REXX program to process instrument data from a data file. */
numeric digits 20 /*allow for bigger numbers. */
ifid='READINGS.TXT' /*name of the input file. */
ofid='READINGS.OUT' /* " " " output " */
grandSum=0 /*grand sum of the whole file. */
grandFlg=0 /*grand number of flagged data. */
grandOKs=0
Lflag=0 /*longest period of flagged data. */
Cflag=0 /*longest continuous flagged data. */
oldDate =0 /*placeholder of penultimate date. */
w =16 /*width of fields when displayed. */
dupDates=0 /*count of duplicated timestamps. */
badFlags=0 /*count of bad flags (not integer). */
badDates=0 /*count of bad dates (bad format). */
badData =0 /*count of bad data (not numeric). */
ignoredR=0 /*count of ignored records, bad records*/
maxInstruments=24 /*maximum number of instruments. */
yyyyCurr=right(date(),4) /*get the current year (today). */
monDD. =31 /*number of days in every month. */
/*# days in Feb. is figured on the fly.*/
monDD.4 =30
monDD.6 =30
monDD.11=30


do records=1 while lines(ifid)\==0 /*read until finished. */
rec=linein(ifid) /*read the next record (line). */
parse var rec datestamp Idata /*pick off the dateStamp and data. */
if datestamp==oldDate then do /*found a duplicate timestamp. */
dupDates=dupDates+1 /*bump the dupDate counter*/
call sy datestamp copies('~',30),
'is a duplicate of the',
"previous datestamp."
ignoredR=ignoredR+1 /*bump # of ignoredRecs.*/
iterate /*ignore this duplicate record. */
end

parse var datestamp yyyy '-' mm '-' dd /*obtain YYYY, MM, and the DD. */
monDD.2=28+leapyear(yyyy) /*how long is February in year YYYY ? */
/*check for various bad formats. */
if verify(yyyy||mm||dd,1234567890)\==0 |,
length(datestamp)\==10 |,
yyyy<1970 |,
yyyy>yyyyCurr |,
mm=0 | dd=0 |,
mm>12 | dd>monDD.mm then do
badDates=badDates+1
call sy datestamp copies('~'),
'has an illegal format.'
ignoredR=ignoredR+1 /*bump number ignoredRecs.*/
iterate /*ignore this bad record. */
end
oldDate=datestamp /*save datestamp for the next read. */
sum=0
flg=0
OKs=0

do j=1 until Idata='' /*process the instrument data. */
parse var Idata data.j flag.j Idata

if pos('.',flag.j)\==0 |, /*does flag have a decimal point -or- */
\datatype(flag.j,'W') then do /* ··· is the flag not a whole number? */
badFlags=badFlags+1 /*bump badFlags counter*/
call sy datestamp copies('~'),
'instrument' j "has a bad flag:",
flag.j
iterate /*ignore it and its data. */
end

if \datatype(data.j,'N') then do /*is the data not numeric? */
badData=badData+1 /*bump counter.*/
call sy datestamp copies('~'),
'instrument' j "has bad data:",
data.j
iterate /*ignore it and its flag. */
end

if flag.j>0 then do /*if good data, ~~~ */
OKs=OKs+1
sum=sum+data.j
if Cflag>Lflag then do
Ldate=datestamp
Lflag=Cflag
end
Cflag=0
end
else do /*flagged data ~~~ */
flg=flg+1
Cflag=Cflag+1
end
end /*j*/

if j>maxInstruments then do
badData=badData+1 /*bump the badData counter.*/
call sy datestamp copies('~'),
'too many instrument datum'
end

if OKs\==0 then avg=format(sum/OKs,,3)
else avg='[n/a]'
grandOKs=grandOKs+OKs
_=right(commas(avg),w)
grandSum=grandSum+sum
grandFlg=grandFlg+flg
end /*records*/

records=records-1 /*adjust for reading the end─of─file. */
if grandOKs\==0 then grandAvg=format(grandsum/grandOKs,,3)
else grandAvg='[n/a]'
call sy
call sy copies('=',60)
call sy ' records read:' right(commas(records ),w)
call sy ' records ignored:' right(commas(ignoredR),w)
call sy ' grand sum:' right(commas(grandSum),w+4)
call sy ' grand average:' right(commas(grandAvg),w+4)
call sy ' grand OK data:' right(commas(grandOKs),w)
call sy ' grand flagged:' right(commas(grandFlg),w)
call sy ' duplicate dates:' right(commas(dupDates),w)
call sy ' bad dates:' right(commas(badDates),w)
call sy ' bad data:' right(commas(badData ),w)
call sy ' bad flags:' right(commas(badFlags),w)
if Lflag\==0 then call sy ' longest flagged:' right(commas(LFlag),w) " ending at " Ldate
call sy copies('=',60)
exit /*stick a fork in it, we're all done.*/
/*────────────────────────────────────────────────────────────────────────────*/
commas: procedure; parse arg _; n=_'.9'; #=123456789; b=verify(n,#,"M")
e=verify(n,#'0',,verify(n,#"0.",'M'))-4
do j=e to b by -3; _=insert(',',_,j); end /*j*/; return _
/*────────────────────────────────────────────────────────────────────────────*/
leapyear: procedure; arg y /*year could be: Y, YY, YYY, or YYYY*/
if length(y)==2 then y=left(right(date(),4),2)y /*adjust for YY year.*/
if y//4\==0 then return 0 /* not divisible by 4? Not a leapyear*/
return y//100\==0 | y//400==0 /*apply the 100 and the 400 year rule.*/
/*────────────────────────────────────────────────────────────────────────────*/
sy: say arg(1); call lineout ofid,arg(1); return</syntaxhighlight>
'''output''' &nbsp; when using the default input file:
<pre style="height:35ex">

=={{header|Ruby}}==
<syntaxhighlight lang="ruby">require 'set'

def munge2(readings, debug=false)
open('readings.txt','r') do |readings|
munge2(readings)
end</syntaxhighlight>


=={{header|Scala}}==
{{works with|Scala|2.8}}
<syntaxhighlight lang="scala">object DataMunging2 {
import scala.io.Source
import scala.collection.immutable.{TreeMap => Map}
dateMap.valuesIterable.sum))
}
}</syntaxhighlight>

Sample output:
Invalid data records: 454
Total records: 5471
</pre>

=={{header|Sidef}}==
{{trans|Raku}}
<syntaxhighlight lang="ruby">var good_records = 0;
var dates = Hash();

ARGF.each { |line|
var m = /^(\d\d\d\d-\d\d-\d\d)((?:\h+\d+\.\d+\h+-?\d+){24})\s*$/.match(line);
m || (warn "Bad format at line #{$.}"; next);
dates{m[0]} := 0 ++;
var i = 0;
m[1].words.all{|n| i++.is_even || (n.to_num >= 1) } && ++good_records;
}

say "#{good_records} good records out of #{$.} total";
say 'Repeated timestamps:';
say dates.to_a.grep{ .value > 1 }.map { .key }.sort.join("\n");</syntaxhighlight>
{{out}}
<pre>
$ sidef script.sf < readings.txt
5017 good records out of 5471 total
Repeated timestamps:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
</pre>

=={{header|Snobol4}}==

Developed using the Snobol4 dialect Spitbol for Linux, version 4.0

<syntaxhighlight lang="snobol4">* Read text/2

v = array(24)
f = array(24)
tos = char(9) " " ;* break characters are both tab and space
pat1 = break(tos) . dstamp
pat2 = span(tos) break(tos) . *v[i] span(tos) (break(tos) | (len(1) rem)) . *f[i]
rowcount = 0
hold_dstamp = ""
num_bad_rows = 0
num_invalid_rows = 0

in0
row = input :f(endinput)
rowcount = rowcount + 1
row ? pat1 = :f(invalid_row)

* duplicated datestamp?
* if dstamp = hold_dstamp then duplicated
hold_dstamp = differ(hold_dstamp,dstamp) dstamp :s(nodup)
output = dstamp ": datestamp at row " rowcount " duplicates datestamp at " rowcount - 1
nodup

i = 1
in1
row ? pat2 = :f(invalid_row)
i = lt(i,24) i + 1 :s(in1)

* Is this a goodrow?
* if any flag is < 1 then row has bad data
c = 0
goodrow
c = lt(c,24) c + 1 :f(goodrow2)
num_bad_rows = lt(f[c],1) num_bad_rows + 1 :s(goodrow2)f(goodrow)
goodrow2

:(in0)
invalid_row
num_invalid_rows = num_invalid_rows + 1
:(in0)
endinput
output =
output = "Total number of rows : " rowcount
output = "Total number of rows with invalid format: " num_invalid_rows
output = "Total number of rows with bad data : " num_bad_rows
output = "Total number of good rows : " rowcount - num_invalid_rows - num_bad_rows

end

</syntaxhighlight>
{{out}}
<pre>1990-03-25: datestamp at row 85 duplicates datestamp at 84
1991-03-31: datestamp at row 456 duplicates datestamp at 455
1992-03-29: datestamp at row 820 duplicates datestamp at 819
1993-03-28: datestamp at row 1184 duplicates datestamp at 1183
1995-03-26: datestamp at row 1911 duplicates datestamp at 1910

Total number of rows : 5471
Total number of rows with invalid format: 0
Total number of rows with bad data : 454
Total number of good rows : 5017
</pre>


=={{header|Tcl}}==

<syntaxhighlight lang="tcl">set data [lrange [split [read [open "readings.txt" "r"]] "\n"] 0 end-1]
set total [llength $data]
set correct $total

puts "$correct records with good readings = [expr $correct * 100.0 / $total]%"
puts "Total records: $total"</syntaxhighlight>
<pre>$ tclsh munge2.tcl
Duplicate datestamp: 1990-03-25
Line 2,266: Line 3,149:
To demonstate a different method to iterate over the file, and different ways to verify data types:
To demonstate a different method to iterate over the file, and different ways to verify data types:


<lang tcl>set total [set good 0]
<syntaxhighlight lang="tcl">set total [set good 0]
array set seen {}
array set seen {}
set fh [open readings.txt]
set fh [open readings.txt]
Line 2,304: Line 3,187:


puts "total: $total"
puts "total: $total"
puts [format "good: %d = %5.2f%%" $good [expr {100.0 * $good / $total}]]</lang>
puts [format "good: %d = %5.2f%%" $good [expr {100.0 * $good / $total}]]</syntaxhighlight>
Results:
Results:
<pre>duplicate date on line 85: 1990-03-25
<pre>duplicate date on line 85: 1990-03-25
compiled and run in a single step, with the input file accessed as a list of strings
pre-declared in readings_dot_txt
<syntaxhighlight lang="ursala">#import std
#import nat

#show+

main = valid_format?(^C/good_readings duplicate_dates,-[invalid format]-!) readings</syntaxhighlight>
output:
<pre>5017 good readings
1991-03-31
1990-03-25</pre>

=={{header|VBScript}}==
<syntaxhighlight lang="vb">Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(objFSO.GetParentFolderName(WScript.ScriptFullName) &_
"\readings.txt",1)
Set objDateStamp = CreateObject("Scripting.Dictionary")

Total_Records = 0
Valid_Records = 0
Duplicate_TimeStamps = ""

Do Until objFile.AtEndOfStream
line = objFile.ReadLine
If line <> "" Then
token = Split(line,vbTab)
If objDateStamp.Exists(token(0)) = False Then
objDateStamp.Add token(0),""
Total_Records = Total_Records + 1
If IsValid(token) Then
Valid_Records = Valid_Records + 1
End If
Else
Duplicate_TimeStamps = Duplicate_TimeStamps & token(0) & vbCrLf
Total_Records = Total_Records + 1
End If
End If
Loop

Function IsValid(arr)
IsValid = True
Bad_Readings = 0
n = 1
Do While n <= UBound(arr)
If n + 1 <= UBound(arr) Then
If CInt(arr(n+1)) < 1 Then
Bad_Readings = Bad_Readings + 1
End If
End If
n = n + 2
Loop
If Bad_Readings > 0 Then
IsValid = False
End If
End Function

WScript.StdOut.Write "Total Number of Records = " & Total_Records
WScript.StdOut.WriteLine
WScript.StdOut.Write "Total Valid Records = " & Valid_Records
WScript.StdOut.WriteLine
WScript.StdOut.Write "Duplicate Timestamps:"
WScript.StdOut.WriteLine
WScript.StdOut.Write Duplicate_TimeStamps
WScript.StdOut.WriteLine

objFile.Close
Set objFSO = Nothing</syntaxhighlight>

{{Out}}
<pre>
Total Number of Records = 5471
Total Valid Records = 5013
Duplicate Timestamps:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
</pre>


=={{header|Vedit macro language}}==
* Reads flag value and checks if it is positive
* Requires 24 value/flag pairs on each line
<syntaxhighlight lang="vedit">#50 = Buf_Num // Current edit buffer (source data)
File_Open("|(PATH_ONLY)\output.txt")
#51 = Buf_Num // Edit buffer for output file
IT("Date format errors: ") Num_Ins(#14)
IT("Invalid data records:") Num_Ins(#15)
IT("Total records: ") Num_Ins(#12)</syntaxhighlight>
Sample output:
<syntaxhighlight lang="vedit">1990-03-25: duplicate record at 85
1991-03-31: duplicate record at 456
1992-03-29: duplicate record at 820
Date format errors: 0
Invalid data records: 454
Total records: 5471</syntaxhighlight>

=={{header|Wren}}==
{{trans|Kotlin}}
{{libheader|Wren-pattern}}
{{libheader|Wren-fmt}}
{{libheader|Wren-sort}}
<syntaxhighlight lang="wren">import "io" for File
import "./pattern" for Pattern
import "./fmt" for Fmt
import "./sort" for Sort

var p = Pattern.new("+1/s")
var fileName = "readings.txt"
var lines = File.read(fileName).trimEnd().split("\r\n")
var count = 0
var invalid = 0
var allGood = 0
var map = {}
for (line in lines) {
count = count + 1
var fields = p.splitAll(line)
var date = fields[0]
if (fields.count == 49) {
map[date] = map.containsKey(date) ? map[date] + 1 : 1
var good = 0
var i = 2
while (i < fields.count) {
if (Num.fromString(fields[i]) >= 1) good = good + 1
i = i + 2
}
if (good == 24) allGood = allGood + 1
} else {
invalid = invalid + 1
}
}

Fmt.print("File = $s", fileName)
System.print("\nDuplicated dates:")
var keys = map.keys.toList
Sort.quick(keys)
for (k in keys) {
var v = map[k]
if (v > 1) Fmt.print(" $s ($d times)", k, v)
}
Fmt.print("\nTotal number of records : $d", count)
var percent = invalid/count * 100
Fmt.print("Number of invalid records : $d ($5.2f)\%", invalid, percent)
percent = allGood/count * 100
Fmt.print("Number which are all good : $d ($5.2f)\%", allGood, percent)</syntaxhighlight>

{{out}}
<pre>
File = readings.txt

Duplicated dates:
1990-03-25 (2 times)
1991-03-31 (2 times)
1992-03-29 (2 times)
1993-03-28 (2 times)
1995-03-26 (2 times)

Total number of records : 5471
Number of invalid records : 0 ( 0.00)%
Number which are all good : 5017 (91.70)%
</pre>


=={{header|zkl}}==
=={{header|zkl}}==
<syntaxhighlight lang="zkl"> // the RegExp engine has a low limit on groups so
 // I can't use it to select all fields, only verify them
re:=RegExp(0'|^(\d+-\d+-\d+)| + 0'|\s+\d+\.\d+\s+-*\d+| * 24 + ".+$");
w:=[1..].zip(File("readings.txt")); //-->lazy (line #,line)
reg datep,N, good=0, dd=0;
foreach n,line in (w){
N=n; // since n is local to this scope
if (not re.search(line)){ println("Line %d: malformed".fmt(n)); continue; }
datep=date;
if (line.replace("\t"," ").split(" ").filter()[1,*] // blow fields apart, drop date
.pump(Void,Void.Read, // get (reading,status)
fcn(_,s){ // stop on first problem status and return True
if(s.strip().toInt()<1) T(Void.Stop,True) else False
good+=1;
}
println("%d records read, %d duplicate dates, %d valid".fmt(N,dd,good));</syntaxhighlight>
{{out}}
<pre>

Latest revision as of 11:41, 14 February 2024

A sample from the full data file readings.txt, which is also used in the Text processing/1 task, follows:

Data is no longer available at that link. Zipped mirror available here

1991-03-30	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1
1991-03-31	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	10.000	1	20.000	1	20.000	1	20.000	1	35.000	1	50.000	1	60.000	1	40.000	1	30.000	1	30.000	1	30.000	1	25.000	1	20.000	1	20.000	1	20.000	1	20.000	1	20.000	1	35.000	1
1991-03-31	40.000	1	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2	0.000	-2
1991-04-01	0.000	-2	13.000	1	16.000	1	21.000	1	24.000	1	22.000	1	20.000	1	18.000	1	29.000	1	44.000	1	50.000	1	43.000	1	38.000	1	27.000	1	27.000	1	24.000	1	23.000	1	18.000	1	12.000	1	13.000	1	14.000	1	15.000	1	13.000	1	10.000	1
1991-04-02	8.000	1	9.000	1	11.000	1	12.000	1	12.000	1	12.000	1	27.000	1	26.000	1	27.000	1	33.000	1	32.000	1	31.000	1	29.000	1	31.000	1	25.000	1	25.000	1	24.000	1	21.000	1	17.000	1	14.000	1	15.000	1	12.000	1	12.000	1	10.000	1
1991-04-03	10.000	1	9.000	1	10.000	1	10.000	1	9.000	1	10.000	1	15.000	1	24.000	1	28.000	1	24.000	1	18.000	1	14.000	1	12.000	1	13.000	1	14.000	1	15.000	1	14.000	1	15.000	1	13.000	1	13.000	1	13.000	1	12.000	1	10.000	1	10.000	1
Task
  1. Confirm the general field format of the file.
  2. Identify any DATESTAMPs that are duplicated.
  3. Report the number of records that have good readings for all instruments.



=={{header|11l}}==
{{trans|Python}}
<syntaxhighlight lang="11l">V debug = 0B
V datePat = re:‘\d{4}-\d{2}-\d{2}’
V valuPat = re:‘[-+]?\d+\.\d+’
V statPat = re:‘-?\d+’
V totalLines = 0
Set[String] dupdate
Set[String] badform
Set[String] badlen
V badreading = 0
Set[String] datestamps

L(line) File(‘readings.txt’).read().rtrim("\n").split("\n")
   totalLines++
   V fields = line.split("\t")
   V date = fields[0]
   V pairs = (1 .< fields.len).step(2).map(i -> (@fields[i], @fields[i + 1]))

   V lineFormatOk = datePat.match(date)
      & all(pairs.map(p -> :valuPat.match(p[0])))
      & all(pairs.map(p -> :statPat.match(p[1])))
   I !lineFormatOk
      I debug
         print(‘Bad formatting ’line)
      badform.add(date)

   I pairs.len != 24 | any(pairs.map(p -> Int(p[1]) < 1))
      I debug
         print(‘Missing values ’line)
   I pairs.len != 24
      badlen.add(date)
   I any(pairs.map(p -> Int(p[1]) < 1))
      badreading++

   I date C datestamps
      I debug
         print(‘Duplicate datestamp ’line)
      dupdate.add(date)

   datestamps.add(date)

print("Duplicate dates:\n  "sorted(Array(dupdate)).join("\n  "))
print("Bad format:\n  "sorted(Array(badform)).join("\n  "))
print("Bad number of fields:\n  "sorted(Array(badlen)).join("\n  "))
print("Records with good readings: #. = #2.2%\n".format(
   totalLines - badreading, (totalLines - badreading) / Float(totalLines) * 100))
print(‘Total records:  ’totalLines)</syntaxhighlight>
Output:
<pre>
Duplicate dates:
  1990-03-25
  1991-03-31
  1992-03-29
  1993-03-28
  1995-03-26
Bad format:

Bad number of fields:

Records with good readings: 5017 = 91.70%

Total records:  5471
</pre>

Ada

with Ada.Calendar;           use Ada.Calendar;
with Ada.Text_IO;            use Ada.Text_IO;
with Strings_Edit;           use Strings_Edit;
with Strings_Edit.Floats;    use Strings_Edit.Floats;
with Strings_Edit.Integers;  use Strings_Edit.Integers;

with Generic_Map;

procedure Data_Munging_2 is
   package Time_To_Line is new Generic_Map (Time, Natural);
   use Time_To_Line;
   File    : File_Type;
   Line_No : Natural := 0;
   Count   : Natural := 0;
   Stamps  : Map;
begin
   Open (File, In_File, "readings.txt");
   loop
      declare
         Line    : constant String := Get_Line (File);
         Pointer : Integer := Line'First;
         Flag    : Integer;
         Year, Month, Day : Integer;
         Data    : Float;
         Stamp   : Time;
         Valid   : Boolean := True;
      begin
         Line_No := Line_No + 1;
         Get (Line, Pointer, SpaceAndTab);
         Get (Line, Pointer, Year);
         Get (Line, Pointer, Month);
         Get (Line, Pointer, Day);
         Stamp := Time_Of (Year_Number (Year), Month_Number (-Month), Day_Number (-Day));
         begin
            Add (Stamps, Stamp, Line_No);
         exception
            when Constraint_Error =>
               Put (Image (Year) & Image (Month) & Image (Day) & ": record at " & Image (Line_No));
               Put_Line (" duplicates record at " & Image (Get (Stamps, Stamp)));
         end;
         Get (Line, Pointer, SpaceAndTab);
         for Reading in 1..24 loop
            Get (Line, Pointer, Data);
            Get (Line, Pointer, SpaceAndTab);
            Get (Line, Pointer, Flag);
            Get (Line, Pointer, SpaceAndTab);
            Valid := Valid and then Flag >= 1;
         end loop;
         if Pointer <= Line'Last then
            Put_Line ("Unrecognized tail at " & Image (Line_No) & ':' & Image (Pointer));
         elsif Valid then
            Count := Count + 1;
         end if;
      exception
         when End_Error | Data_Error | Constraint_Error | Time_Error =>
            Put_Line ("Syntax error at " & Image (Line_No) & ':' & Image (Pointer));
      end;
   end loop;
exception
   when End_Error =>
      Close (File);
      Put_Line ("Valid records " & Image (Count) & " of " & Image (Line_No) & " total");
end Data_Munging_2;

Sample output

1990-3-25: record at 85 duplicates record at 84
1991-3-31: record at 456 duplicates record at 455
1992-3-29: record at 820 duplicates record at 819
1993-3-28: record at 1184 duplicates record at 1183
1995-3-26: record at 1911 duplicates record at 1910
Valid records 5017 of 5471 total

Aime

check_format(list l)
{
    integer i;
    text s;

    if (~l != 49) {
        error("bad field count");
    }

    s = l[0];
    if (match("????-??-??", s)) {
        error("bad date format");
    }
    l[0] = s.delete(7).delete(4).atoi;

    i = 1;
    while (i < 49) {
        atof(l[i]);
        i += 1;
        l[i >> 1] = atoi(l[i]);
        i += 1;
    }

    l.erase(25, -1);
}

main(void)
{
    integer goods, i, v;
    file f;
    list l;
    index x;

    goods = 0;

    f.affix("readings.txt");

    while (f.list(l, 0) != -1) {
        if (!trap(check_format, l)) {
            if ((x[v = lf_x_integer(l)] += 1) != 1) {
                v_form("duplicate ~ line\n", v);
            }

            i = 1;
            l.ucall(min_i, 1, i);
            goods += iclip(0, i, 1);
        }
    }

    o_(goods, " good lines\n");

    0;
}
Output:

(the "readings.txt" file needs to be converted to UNIX line endings)

duplicate 19900325 line
duplicate 19910331 line
duplicate 19920329 line
duplicate 19930328 line
duplicate 19950326 line
5017 good lines


Amazing Hopper

Translation of: AWK
#include <basico.h>

algoritmo

     número de campos correcto = `awk 'NF != 49' basica/readings.txt`

     fechas repetidas = `awk '++count[$1] >= 2{print $1, "(",count[$1],")"}' basica/readings.txt`

     resultados buenos = `awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}'  basica/readings.txt`
     

     "Check field number by line: ", #( !(number(número de campos correcto)) ? "Ok\n" : "Nok\n";),\
     "\nCheck duplicated dates:\n", fechas repetidas,NL, \
     "Number of records have good readings for all instruments:\n",resultados buenos,\
     "(including "
          fijar separador( NL )
          contar tokens en 'fechas repetidas'
     " duplicated records)\n", luego imprime todo

terminar
Output:
Check field number by line: Ok

Check duplicated dates:
1990-03-25 ( 2 )
1991-03-31 ( 2 )
1992-03-29 ( 2 )
1993-03-28 ( 2 )
1995-03-26 ( 2 )

Number of records have good readings for all instruments:
Total records 5471 OK records 5017 or 91,7017 %
(including 5 duplicated records)

AutoHotkey

; Author: AlephX Aug 17 2011
data = %A_scriptdir%\readings.txt

Loop, Read, %data%
	{
	Lines := A_Index	
    StringReplace, dummy, A_LoopReadLine, %A_Tab%,, All UseErrorLevel
		
    Loop, parse, A_LoopReadLine, %A_Tab%
		{
		wrong := 0
		if A_index = 1
			{
			Date := A_LoopField
			if (Date == OldDate)
				{
				WrongDates = %WrongDates%%OldDate% at %Lines%`n
				TotwrongDates++
				Wrong := 1
				break
				}
			}
		else
			{		
			if (A_loopfield/1 < 0)
				{
				Wrong := 1
				break
				}

			}
		}

	if (wrong == 1)
		totwrong++
	else
		valid++
	
	if (errorlevel <> 48)
		{
		if (wrong == 0)
			{	
			totwrong++
			valid--
			}
		unvalidformat++
		}	
		
	olddate := date
	}
	
msgbox, Duplicate Dates:`n%wrongDates%`nRead Lines: %lines%`nValid Lines: %valid%`nwrong lines: %totwrong%`nDuplicates: %TotWrongDates%`nWrong Formatted: %unvalidformat%`n

Sample Output:

Duplicate Dates:
1990-03-25 at 85
1991-03-31 at 456
1992-03-29 at 820
1993-03-28 at 1184
1995-03-26 at 1911

Read Lines: 5471
Valid Lines: 5129
wrong lines: 342
Duplicates: 5
Wrong Formatted: 0

AWK

A series of AWK one-liners is shown, as this is often how such checks are done in practice. If this information were needed repeatedly (and that is not known), a more permanent shell script might be created combining multi-line versions of the scripts below.

Gradually tie down the format.

(In each case, offending lines will be printed.)

If there are any scientific-notation fields, there will be an e in the file:

bash$ awk '/[eE]/' readings.txt
bash$

Quick check on the number of fields:

bash$ awk 'NF != 49' readings.txt
bash$

Full check on the file format using a regular expression:

bash$ awk '!(/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+)+$/ && NF==49)' readings.txt
bash$

Full check on the file format as above but using regular expressions allowing intervals (gnu awk):

bash$ awk --re-interval '!(/^[0-9]{4}-[0-9]{2}-[0-9]{2}([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+){24}$/ )' readings.txt
bash$


Identify any DATESTAMPs that are duplicated.

Accomplished by counting how many times the first field occurs and noting any second occurrences.

bash$ awk '++count[$1]==2{print $1}' readings.txt
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
bash$


Count the number of records that have good readings for all instruments.

bash$ awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}'  readings.txt
Total records 5471 OK records 5017 or 91.7017 %
bash$

C

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

typedef struct { const char *s; int ln, bad; } rec_t;
int cmp_rec(const void *aa, const void *bb)
{
	const rec_t *a = aa, *b = bb;
	return a->s == b->s ? 0 : !a->s ? 1 : !b->s ? -1 : strncmp(a->s, b->s, 10);
}

int read_file(const char *fn)
{
	int fd = open(fn, O_RDONLY);
	if (fd == -1) return 0;

	struct stat s;
	fstat(fd, &s);

	char *txt = malloc(s.st_size);
	read(fd, txt, s.st_size);
	close(fd);

	int i, j, lines = 0, k, di, bad;
	for (i = lines = 0; i < s.st_size; i++)
		if (txt[i] == '\n') {
			txt[i] = '\0';
			lines++;
		}

	rec_t *rec = calloc(lines, sizeof(rec_t));
	const char *ptr, *end;
	rec[0].s = txt;
	rec[0].ln = 1;
	for (i = 0; i < lines; i++) {
		if (i + 1 < lines) {
			rec[i + 1].s = rec[i].s + strlen(rec[i].s) + 1;
			rec[i + 1].ln = i + 2;
		}
		if (sscanf(rec[i].s, "%4d-%2d-%2d", &di, &di, &di) != 3) {
			printf("bad line %d: %s\n", i, rec[i].s);
			rec[i].s = 0;
			continue;
		}
		ptr = rec[i].s + 10;

		for (j = k = 0; j < 25; j++) {
			if (!strtod(ptr, (char**)&end) && end == ptr) break;
			k++, ptr = end;
			if (!(di = strtol(ptr, (char**)&end, 10)) && end == ptr) break;
			k++, ptr = end;
			if (di < 1) rec[i].bad = 1;
		}

		if (k != 48) {
			printf("bad format at line %d: %s\n", i, rec[i].s);
			rec[i].s = 0;
		}
	}

	qsort(rec, lines, sizeof(rec_t), cmp_rec);
	for (i = 1, bad = rec[0].bad, j = 0; i < lines && rec[i].s; i++) {
		if (rec[i].bad) bad++;
		if (strncmp(rec[i].s, rec[j].s, 10)) {
			j = i;
		} else
			printf("dup line %d: %.10s\n", rec[i].ln, rec[i].s);
	}

	free(rec);
	free(txt);
	printf("\n%d out %d lines good\n", lines - bad, lines);
	return 0;
}

int main()
{
	read_file("readings.txt");
	return 0;
}
Output:
dup line 85: 1990-03-25
dup line 456: 1991-03-31
dup line 820: 1992-03-29
dup line 1184: 1993-03-28
dup line 1911: 1995-03-26

5017 out 5471 lines good

C#

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.IO;

namespace TextProc2
{
    class Program
    {
        static void Main(string[] args)
        {
            Regex multiWhite = new Regex(@"\s+");
            Regex dateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
            Regex valEx = new Regex(@"^\d+\.{1}\d{3}$");
            Regex flagEx = new Regex(@"^[1-9]{1}$");
            
            int missformcount = 0, totalcount = 0;
            Dictionary<int, string> dates = new Dictionary<int, string>();

            using (StreamReader sr = new StreamReader("readings.txt"))
            {
                string line = sr.ReadLine();
                while (line != null)
                {
                    line = multiWhite.Replace(line, @" ");                    
                    string[] splitLine = line.Split(' ');
                    if (splitLine.Length != 49)
                        missformcount++;
                    if (!dateEx.IsMatch(splitLine[0]))                        
                        missformcount++;                    
                    else
                        dates.Add(totalcount + 1, dateEx.Match(splitLine[0]).ToString());
                    int err = 0;                    
                    for (int i = 1; i < splitLine.Length; i++)
                    {
                        if (i%2 != 0)
                        {
                            if (!valEx.IsMatch(splitLine[i]))                          
                                err++;
                        }
                        else
                        {
                            if (!flagEx.IsMatch(splitLine[i]))
                                err++;                                                        
                        }                        
                    }
                    if (err != 0) missformcount++;
                    line = sr.ReadLine();
                    totalcount++;                    
                }
            }

            int goodEntries = totalcount - missformcount;
            Dictionary<string,List<int>> dateReverse = new Dictionary<string,List<int>>();

            foreach (KeyValuePair<int, string> kvp in dates)
            {
                if (!dateReverse.ContainsKey(kvp.Value))
                    dateReverse[kvp.Value] = new List<int>();
                dateReverse[kvp.Value].Add(kvp.Key);
            }

            Console.WriteLine(goodEntries + " valid Records out of " + totalcount);

            foreach (KeyValuePair<string, List<int>> kvp in dateReverse)
            {
                if (kvp.Value.Count > 1)
                    Console.WriteLine("{0} is duplicated at Lines : {1}", kvp.Key, string.Join(",", kvp.Value));                    
            }
        }
    }
}
Output:
5017 valid Records out of 5471
1990-03-25 is duplicated at Lines : 84,85
1991-03-31 is duplicated at Lines : 455,456
1992-03-29 is duplicated at Lines : 819,820
1993-03-28 is duplicated at Lines : 1183,1184
1995-03-26 is duplicated at Lines : 1910,1911

C++

Library: Boost
#include <boost/regex.hpp>
#include <fstream>
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <cstdlib>
#include <algorithm>
#include <iterator>
using namespace std ;

boost::regex e ( "\\s+" ) ; 

int main( int argc , char *argv[ ] ) { 
   ifstream infile( argv[ 1 ] ) ; 
   vector<string> duplicates ;
   set<string> datestamps ; //for the datestamps
   if ( ! infile.is_open( ) ) { 
      cerr << "Can't open file " << argv[ 1 ] << '\n' ;
      return 1 ; 
   }   
   int all_ok = 0  ;//all_ok for lines in the given pattern e
   int pattern_ok = 0 ; //overall field pattern of record is ok
   while ( infile ) { 
      string eingabe ;
      getline( infile , eingabe ) ;
      boost::sregex_token_iterator i ( eingabe.begin( ), eingabe.end( ) , e , -1 ), j  ;//we tokenize on empty fields
      vector<string> fields( i, j ) ;
      if ( fields.size( ) == 49 ) //we expect 49 fields in a record
         pattern_ok++ ;
      else
         cout << "Format not ok!\n" ;
      if ( datestamps.insert( fields[ 0 ] ).second ) { //not duplicated
         int howoften = ( fields.size( ) - 1 ) / 2 ;//number of measurement
                                                    //devices and values
         for ( int n = 1 ; atoi( fields[ 2 * n ].c_str( ) ) >= 1 ; n++ ) {
            if ( n == howoften ) {
               all_ok++ ;
               break ;
            }
         }
      }
      else {
         duplicates.push_back( fields[ 0 ] ) ;//first field holds datestamp
      }
   }
   infile.close( ) ;
   cout << "The following " << duplicates.size() << " datestamps were duplicated:\n" ;
   copy( duplicates.begin( ) , duplicates.end( ) ,
         ostream_iterator<string>( cout , "\n" ) ) ;
   cout << all_ok << " records were complete and ok!\n" ;
   return 0 ;
}
Output:
Format not ok!
The following 6 datestamps were duplicated:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
2004-12-31


Clojure

(require '[clojure.string :as str])

(defn parse-line [s]
  (let [[date & data-toks] (str/split s #"\s+")
        data-fields (map read-string data-toks)
        valid-date? (fn [s] (re-find #"\d{4}-\d{2}-\d{2}" s))
        valid-line? (and (valid-date? date)
                         (= 48 (count data-toks))
                         (every? number? data-fields))
        readings    (for [[v flag] (partition 2 data-fields)]
                      {:val v :flag flag})]
    (when (not valid-line?)
      (println "Malformed Line: " s))
    {:date date
     :no-missing-readings? (and (= 48 (count data-toks))
                                (every? pos? (map :flag readings)))}))

(defn analyze-file [path]
  (reduce (fn [m line]
            (let [{:keys [all-dates dupl-dates n-full-recs invalid-lines]} m
                  this-date (:date line)
                  dupl? (contains? all-dates this-date)
                  full? (:no-missing-readings? line)]
              (cond-> m
                dupl? (update-in [:dupl-dates]  conj this-date)
                full? (update-in [:n-full-recs] inc)
                true  (update-in [:all-dates]   conj this-date))))
          {:dupl-dates #{} :all-dates #{} :n-full-recs 0}
          (->> (slurp path)
               clojure.string/split-lines
               (map parse-line))))

(defn report-summary [path]
  (let [m (analyze-file path)]
    (println (format "%d unique dates" (count (:all-dates m))))
    (println (format "%d duplicated dates [%s]"
                     (count (:dupl-dates m))
                     (clojure.string/join " " (sort (:dupl-dates m)))))
    (println (format "%d lines with no missing data" (:n-full-recs m)))))
Output:
5466 unique dates
5 duplicated dates [1990-03-25 1991-03-31 1992-03-29 1993-03-28 1995-03-26]
5017 lines with no missing data

COBOL

Works with: OpenCOBOL
       IDENTIFICATION DIVISION.
       PROGRAM-ID. text-processing-2.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT readings ASSIGN Input-File-Path
               ORGANIZATION LINE SEQUENTIAL
               FILE STATUS file-status.
       
       DATA DIVISION.
       FILE SECTION.
       FD  readings.
       01  reading-record.
           03  date-stamp          PIC X(10).
           03  FILLER              PIC X.
           03  input-data          PIC X(300).

       LOCAL-STORAGE SECTION.
       78  Input-File-Path         VALUE "readings.txt".
       78  Num-Data-Points         VALUE 48.

       01  file-status             PIC XX.

       01  current-line            PIC 9(5).

       01  num-date-stamps-read    PIC 9(5).
       01  read-date-stamps-area.
           03  read-date-stamps    PIC X(10) OCCURS 1 TO 10000 TIMES
                                   DEPENDING ON num-date-stamps-read
                                   INDEXED BY date-stamp-idx.

       01  offset                  PIC 999.
       01  data-len                PIC 999.
       01  data-flag               PIC X.
           88  data-not-found      VALUE "N".

       01  data-field              PIC X(25).

       01  i                       PIC 99.

       01  num-good-readings       PIC 9(5).

       01  reading-flag            PIC X.
           88 bad-reading          VALUE "B".

       01  delim                   PIC X.

       PROCEDURE DIVISION.
       DECLARATIVES.
       readings-error SECTION.
           USE AFTER ERROR ON readings

           DISPLAY "An error occurred while using " Input-File-Path
           DISPLAY "Error code " file-status
           DISPLAY "The program will terminate."

           CLOSE readings
           GOBACK
           .
       END DECLARATIVES.

       main-line.
           OPEN INPUT readings

           *> Process each line of the file.
           PERFORM FOREVER
               READ readings
                   AT END
                       EXIT PERFORM
               END-READ

               ADD 1 TO current-line

               IF reading-record = SPACES
                   DISPLAY "Line " current-line " is blank."
                   EXIT PERFORM CYCLE
               END-IF

               PERFORM check-duplicate-date-stamp

               *> Check there are 24 data pairs and see if all the
               *> readings are ok.
               INITIALIZE offset, reading-flag, data-flag
               PERFORM VARYING i FROM 1 BY 1 UNTIL Num-Data-Points < i
                   PERFORM get-next-field
                   IF data-not-found
                       DISPLAY "Line " current-line " has missing "
                           "fields."
                       SET bad-reading TO TRUE
                       EXIT PERFORM
                   END-IF

                   *> Every other data field is the instrument flag.
                   IF FUNCTION MOD(i, 2) = 0 AND NOT bad-reading
                       IF FUNCTION NUMVAL(data-field) <= 0
                           SET bad-reading TO TRUE
                       END-IF
                   END-IF

                   ADD data-len TO offset
               END-PERFORM

               IF NOT bad-reading
                   ADD 1 TO num-good-readings
               END-IF
           END-PERFORM

           CLOSE readings

           *> Display results.
           DISPLAY SPACE
           DISPLAY current-line " lines read."
           DISPLAY num-good-readings " have good readings for all "
               "instruments."

           GOBACK
           .
       check-duplicate-date-stamp.
           SEARCH read-date-stamps
               AT END
                   ADD 1 TO num-date-stamps-read
                   MOVE date-stamp
                       TO read-date-stamps (num-date-stamps-read)

               WHEN read-date-stamps (date-stamp-idx) = date-stamp
                   DISPLAY "Date " date-stamp " is duplicated at "
                       "line " current-line "."
           END-SEARCH
           .
       get-next-field.
           INSPECT input-data (offset:) TALLYING offset
               FOR LEADING X"09"

           *> The fields are normally delimited by a tab.
           MOVE X"09" TO delim
           PERFORM find-num-chars-before-delim

           *> If the delimiter was not found...
           IF FUNCTION SUM(data-len, offset) > 300
               *> The data may be delimited by a space if it is at the
               *> end of the line.
               MOVE SPACE TO delim
               PERFORM find-num-chars-before-delim

               IF FUNCTION SUM(data-len, offset) > 300
                   SET data-not-found TO TRUE
                   EXIT PARAGRAPH
               END-IF
           END-IF

           IF data-len = 0
               SET data-not-found TO TRUE
               EXIT PARAGRAPH
           END-IF

           MOVE input-data (offset:data-len) TO data-field
           .
       find-num-chars-before-delim.
           INITIALIZE data-len
           INSPECT input-data (offset:) TALLYING data-len
               FOR CHARACTERS BEFORE delim
           .
Output:
Date 1990-03-25 is duplicated at line 00084.
Date 1991-03-31 is duplicated at line 00455.
Date 1992-03-29 is duplicated at line 00819.
Date 1993-03-28 is duplicated at line 01183.
Date 1995-03-26 is duplicated at line 01910.
 
05470 lines read.
05016 have good readings for all instruments.

D

void main() {
    import std.stdio, std.array, std.string, std.regex, std.conv,
           std.algorithm;

    auto rxDate = `^\d\d\d\d-\d\d-\d\d$`.regex;
    // Works but eats lot of RAM in DMD 2.064.
    // auto rxDate = ctRegex!(`^\d\d\d\d-\d\d-\d\d$`);

    int[string] repeatedDates;
    int goodReadings;
    foreach (string line; "readings.txt".File.lines) {
        try {
            auto parts = line.split;
            if (parts.length != 49)
                throw new Exception("Wrong column count");
            if (parts[0].match(rxDate).empty)
                throw new Exception("Date is wrong");
            repeatedDates[parts[0]]++;
            bool noProblem = true;
            for (int i = 1; i < 48; i += 2) {
                if (parts[i + 1].to!int < 1)
                    // don't break loop because it's validation too.
                    noProblem = false;
                if (!parts[i].isNumeric)
                    throw new Exception("Reading is wrong: "~parts[i]);
            }
            if (noProblem)
                goodReadings++;
        } catch(Exception ex) {
            writefln(`Problem in line "%s": %s`, line, ex);
        }
    }

    writefln("Duplicated timestamps: %-(%s, %)",
            repeatedDates.byKey.filter!(k => repeatedDates[k] > 1));
    writeln("Good reading records: ", goodReadings);
}
Output:
Duplicated timestamps: 1990-03-25, 1991-03-31, 1992-03-29, 1993-03-28, 1995-03-26
Good reading records: 5017

Eiffel

class
	APPLICATION

create
	make

feature

	make
			-- Finds double date stamps and wrong formats.
		local
			found: INTEGER
			double: STRING
		do
			read_wordlist
			fill_hash_table
			across
				hash as h
			loop
				if h.key.has_substring ("_double") then
					io.put_string ("Double date stamp: %N")
					double := h.key
					double.remove_tail (7)
					io.put_string (double)
					io.new_line
				end
				if h.item.count /= 24 then
					io.put_string (h.key.out + " has the wrong format. %N")
					found := found + 1
				end
			end
			io.put_string (found.out + " records have not 24 readings.%N")
			good_records
		end

	good_records
			-- Number of records that have flag values > 0 for all readings.
		local
			count, total: INTEGER
			end_date: STRING
		do
			create end_date.make_empty
			across
				hash as h
			loop
				count := 0
				across
					h.item as d
				loop
					if d.item.flag > 0 then
						count := count + 1
					end
				end
				if count = 24 then
					total := total + 1
				end
			end
			io.put_string ("%NGood records: " + total.out + ". %N")
		end

	original_list: STRING = "readings.txt"

	read_wordlist
			--Preprocesses data in 'data'.
		local
			l_file: PLAIN_TEXT_FILE
		do
			create l_file.make_open_read_write (original_list)
			l_file.read_stream (l_file.count)
			data := l_file.last_string.split ('%N')
			l_file.close
		end

	data: LIST [STRING]

	fill_hash_table
			--Fills 'hash' using the date as key.
		local
			by_dates: LIST [STRING]
			date: STRING
			data_tup: TUPLE [val: REAL; flag: INTEGER]
			data_arr: ARRAY [TUPLE [val: REAL; flag: INTEGER]]
			i: INTEGER
		do
			create hash.make (data.count)
			across
				data as d
			loop
				if not d.item.is_empty then
					by_dates := d.item.split ('%T')
					date := by_dates [1]
					by_dates.prune (date)
					create data_tup
					create data_arr.make_empty
					from
						i := 1
					until
						i > by_dates.count - 1
					loop
						data_tup := [by_dates [i].to_real, by_dates [i + 1].to_integer]
						data_arr.force (data_tup, data_arr.count + 1)
						i := i + 2
					end
					hash.put (data_arr, date)
					if not hash.inserted then
						date.append ("_double")
						hash.put (data_arr, date)
					end
				end
			end
		end

	hash: HASH_TABLE [ARRAY [TUPLE [val: REAL; flag: INTEGER]], STRING]

end
Output:
Double date stamp:
1990-03-25
Double date stamp:
1991-03-31
Double date stamp:
1992-03-29
Double date stamp:
1993-03-28
Double date stamp:
1995-03-26
0 records have not 24 readings.

Good records: 5017.

Erlang

Uses function from Text_processing/1. It does some correctness checks for us.

-module( text_processing2 ).

-export( [task/0] ).

task() ->
	Name = "priv/readings.txt",
	try
	File_contents = text_processing:file_contents( Name ),
	[correct_field_format(X) || X<- File_contents],
	{_Previous, Duplicates} = lists:foldl( fun date_duplicates/2, {"", []}, File_contents ),
	io:fwrite( "Duplicates: ~p~n", [Duplicates] ),
	Good = [X || X <- File_contents, is_all_good_readings(X)],
	io:fwrite( "Good readings: ~p~n", [erlang:length(Good)] )

	catch
	_:Error ->
		io:fwrite( "Error: Failed when checking ~s: ~p~n", [Name, Error] )
	end.



correct_field_format( {_Date, Value_flags} ) ->
	Correct_number = value_flag_records(),
	{correct_field_format, Correct_number} = {correct_field_format, erlang:length(Value_flags)}.

date_duplicates( {Date, _Value_flags}, {Date, Acc} ) -> {Date, [Date | Acc]};
date_duplicates( {Date, _Value_flags}, {_Other, Acc} ) -> {Date, Acc}.

is_all_good_readings( {_Date, Value_flags} ) -> value_flag_records() =:= erlang:length( [ok || {_Value, ok} <-  Value_flags] ).

value_flag_records() -> 24.
Output:
12> text_processing2:task().
Duplicates: ["1995-03-26","1993-03-28","1992-03-29","1991-03-31","1990-03-25"]
Good readings: 5017

F#

open System.Collections.Generic

let file = @"readings.txt"

let dates = HashSet(HashIdentity.Structural)
let mutable ok = 0

do
  for line in System.IO.File.ReadAllLines file do
    match String.split [' '; '\t'] line with
    | [] -> ()
    | date::xys ->
        if dates.Contains date then
          printf "Date %s is duplicated\n" date
        else
          dates.Add date |> ignore
        let f (b, t) h = not b, if b then int h::t else t
        let _, states = Seq.fold f (false, []) xys
        if Seq.forall (fun s -> s >= 1) states then
          ok <- ok + 1
  printf "%d records were ok\n" ok

Prints:

Date 1990-03-25 is duplicated
Date 1991-03-31 is duplicated
Date 1992-03-29 is duplicated
Date 1993-03-28 is duplicated
Date 1995-03-26 is duplicated
5017 records were ok

Factor

Works with: Factor version 0.99 2020-03-02
USING: io io.encodings.ascii io.files kernel math math.parser
prettyprint sequences sequences.extras sets splitting ;

: check-format ( seq -- )
    [ " \t" split length 49 = ] all?
    "Format okay." "Format not okay." ? print ;

"readings.txt" ascii file-lines [ check-format ] keep
[ "Duplicates:" print [ "\t" split1 drop ] map duplicates . ]
[ [ " \t" split rest <odds> [ string>number 0 <= ] none? ] count ]
bi pprint " records were good." print
Output:
Format okay.
Duplicates:
{
    "1990-03-25"
    "1991-03-31"
    "1992-03-29"
    "1993-03-28"
    "1995-03-26"
}
5017 records were good.

Fortran

The trouble with the dates rather suggests that they should be checked for correctness in themselves, and that the sequence check should be that each new record advances the date by one day. Daynumber calculations were long ago presented by H. F. Fliegel and T.C. van Flandern, in Communications of the ACM, Vol. 11, No. 10 (October, 1968).
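That daynumber calculation can be transcribed directly. A brief Python sketch of the Fliegel and van Flandern expression follows (the helper names tdiv and jdn are ours; note that Fortran integer division truncates toward zero, which Python's floor division does not):

```python
def tdiv(a, b):
    """Integer division truncating toward zero (Fortran's default)."""
    q = a // b
    return q + 1 if q < 0 and q * b != a else q

def jdn(y, m, d):
    """Fliegel-van Flandern day number for a Gregorian date y-m-d."""
    return (tdiv(1461 * (y + 4800 + tdiv(m - 14, 12)), 4)
            + tdiv(367 * (m - 2 - 12 * tdiv(m - 14, 12)), 12)
            - tdiv(3 * tdiv(y + 4900 + tdiv(m - 14, 12), 100), 4)
            + d - 32075)
```

Consecutive calendar days then differ by exactly 1, which is the sequence check suggested above.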

Rather than copy today's data to a PDATA holder so that on the next read the new data may be compared to the old, a two-row array is used, with IT flip-flopping 1,2,1,2,1,2,... Comparison of the data as numerical values rather than text strings means that different texts that evoke the same value will not be regarded as different. If the data format were invalid, there would be horrible messages. There aren't, so ... the values should be read and plotted...

Crunches a set of hourly data. Starts with a date, then 24 pairs of value,indicator for that day, on one line.
      INTEGER Y,M,D		!Year, month, and day.
      INTEGER GOOD(24,2)	!The indicators.
      REAL*8     V(24,2)	!The grist.
      CHARACTER*10 DATE(2)	!Along with the starting date.
      INTEGER IT,TI		!A flipper and its antiflipper.
      INTEGER NV		!Number of entirely good records.
      INTEGER I,NREC,HIC	!Some counters.
      LOGICAL INGOOD		!State flipper for the runs of data.
      INTEGER IN,MSG		!I/O mnemonics.
      CHARACTER*666 ACARD	!Scratchpad, of sufficient length for all expectation.
      IN = 10		!Unit number for the input file.
      MSG = 6		!Output.
      OPEN (IN,FILE="Readings1.txt", FORM="FORMATTED",	!This should be a function.
     1 STATUS ="OLD",ACTION="READ")			!Returning success, or failure.
      NV = 0		!No pure records seen.
      NREC = 0		!No records read.
      HIC = 0		!Provoking no complaints.
      DATE = "snargle"	!No date should look like this!
      IT = 2		!Syncopation for the 1-2 flip flop.
Chew into the file.
   10 READ (IN,11,END=100,ERR=666) L,ACARD(1:MIN(L,LEN(ACARD)))	!With some protection.
      NREC = NREC + 1		!So, a record has been read.
   11 FORMAT (Q,A)		!Obviously, Q ascertains the length of the record being read.
      READ (ACARD,12,END=600,ERR=601) Y,M,D	!The date part is trouble, as always.
   12 FORMAT (I4,2(1X,I2))				!Because there are no delimiters between the parts.
      TI = IT			!Thus finger the previous value.
      IT = 3 - IT		!Flip between 1 and 2.
      DATE(IT) = ACARD(1:10)	!Save the date field.
      READ (ACARD(11:L),*,END=600,ERR=601) (V(I,IT),GOOD(I,IT),I = 1,24)	!But after the date, delimiters abound.
Comparisons. Should really convert the date to a daynumber, check it by reversion, and then check for + 1 day only.
   20 IF (DATE(IT).EQ.DATE(TI)) THEN	!Same date?
        IF (ALL(V(:,IT)   .EQ.V(:,TI)) .AND.	!Yes. What about the data?
     1      ALL(GOOD(:,IT).EQ.GOOD(:,TI))) THEN	!This disregards details of the spacing of the data.
          WRITE (MSG,21) NREC,DATE(IT),"same."	!Also trailing zeroes, spurious + signs, blah blah.
   21     FORMAT ("Record",I8," Duplicate date field (",A,"), data ",A)	!Say it.
         ELSE				!But if they're not all equal,
          WRITE (MSG,21) NREC,DATE(IT),"different!"	!They're different!
        END IF					!So much for comparing the data.
      END IF				!So much for just comparing the date's text.
      IF (ALL(GOOD(:,IT).GT.0)) NV = NV + 1	!A fully healthy record, either way?
      GO TO 10		!More! More! I want more!!

Complaints. Should really distinguish between trouble in the date part and in the data part.
  600 WRITE (MSG,*) '"END" declared - insufficient data?'	!Not enough numbers, presumably.
      GO TO 602				!Reveal the record.
  601 WRITE (MSG,*) '"ERR" declared - improper number format?'	!Ah, but which number?
  602 WRITE (MSG,603) NREC,L,ACARD(1:L)	!Anyway, reveal the uninterpreted record.
  603 FORMAT("Record",I8,", length ",I0," reads ",A)	!Just so.
      HIC = HIC + 1			!This may grow into a habit.
      IF (HIC.LE.12) GO TO 10		!But if not yet, try the next record.
      STOP "Enough distaste."		!Or, give up.
  666 WRITE (MSG,101) NREC,"format error!"	!For A-style data? Should never happen!
      GO TO 900				!But if it does, give up!

Closedown.
  100 WRITE (MSG,101) NREC,"then end-of-file"	!Discovered on the next attempt.
  101 FORMAT ("Record",I8,": ",A)		!A record number plus a remark.
      WRITE (MSG,102) NV	!The overall results.
  102 FORMAT ("  with",I8," having all values good.")	!This should do.
  900 CLOSE(IN)		!Done.
      END	!Spaghetti rules.

Output:

Record      85 Duplicate date field (1990-03-25), data different!
Record     456 Duplicate date field (1991-03-31), data different!
Record     820 Duplicate date field (1992-03-29), data different!
Record    1184 Duplicate date field (1993-03-28), data different!
Record    1911 Duplicate date field (1995-03-26), data different!
Record    5471: then end-of-file
  with    5017 having all values good.

Go

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
	"time"
)

const (
	filename   = "readings.txt"
	readings   = 24             // per line
	fields     = readings*2 + 1 // per line
	dateFormat = "2006-01-02"
)

func main() {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	var allGood, uniqueGood int
	// m records not only the dates seen, but also whether an all-good
	// record was seen for each date.
	m := make(map[time.Time]bool)
	s := bufio.NewScanner(file)
	for s.Scan() {
		f := strings.Fields(s.Text())
		if len(f) != fields {
			log.Fatal("unexpected format,", len(f), "fields.")
		}
		ts, err := time.Parse(dateFormat, f[0])
		if err != nil {
			log.Fatal(err)
		}
		good := true
		for i := 1; i < fields; i += 2 {
			flag, err := strconv.Atoi(f[i+1])
			if err != nil {
				log.Fatal(err)
			}
			if flag > 0 { // value is good
				_, err := strconv.ParseFloat(f[i], 64)
				if err != nil {
					log.Fatal(err)
				}
			} else { // value is bad
				good = false
			}
		}
		if good {
			allGood++
		}
		previouslyGood, seen := m[ts]
		if seen {
			fmt.Println("Duplicate datestamp:", f[0])
		}
		m[ts] = previouslyGood || good
		if !previouslyGood && good {
			uniqueGood++
		}
	}
	if err := s.Err(); err != nil {
		log.Fatal(err)
	}

	fmt.Println("\nData format valid.")
	fmt.Println(allGood, "records with good readings for all instruments.")
	fmt.Println(uniqueGood,
		"unique dates with good readings for all instruments.")
}
Output:
Duplicate datestamp: 1990-03-25
Duplicate datestamp: 1991-03-31
Duplicate datestamp: 1992-03-29
Duplicate datestamp: 1993-03-28
Duplicate datestamp: 1995-03-26

Data format valid.
5017 records with good readings for all instruments.
5013 unique dates with good readings for all instruments.

Haskell

import Data.List (nub, (\\))

data Record = Record {date :: String, recs :: [(Double, Int)]}
 
duplicatedDates rs = rs \\ nub rs

goodRecords = filter ((== 24) . length . filter ((>= 1) . snd) . recs)

parseLine l = let ws = words l in Record (head ws) (mapRecords (tail ws))
 
mapRecords [] = []
mapRecords [_] = error "invalid data"
mapRecords (value:flag:tail) = (read value, read flag) : mapRecords tail
 
main = do
  inputs <- (map parseLine . lines) `fmap` readFile "readings.txt"
  putStr (unlines ("duplicated dates:": duplicatedDates (map date inputs)))
  putStrLn ("number of good records: " ++ show (length $ goodRecords inputs))

This script outputs:

duplicated dates:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
number of good records: 5017

Icon and Unicon

The following works in both languages. It assumes there is nothing wrong with duplicated timestamps that are on well-formed records.

procedure main(A)
    dups := set()
    goodRecords := 0
    lastDate := badFile := &null
    f := A[1] | "readings.txt"
    fin := open(f) | stop("Cannot open file '",f,"'")
 
    while (fields := 0, badReading := &null, line := read(fin)) do {
        line ? {
            ldate := tab(many(&digits ++ '-')) | (badFile := "yes", next)
            if \lastDate == ldate then insert(dups, ldate)
            lastDate := ldate
            while tab(many(' \t')) do {
                (value := real(tab(many(&digits++'-.'))),
                 tab(many(' \t')),
                 flag := integer(tab(many(&digits++'-'))),
                 fields +:= 1) | (badFile := "yes")
                if flag < 1 then badReading := "yes"
                }
            }
        if fields = 24 then goodRecords +:= (/badReading, 1)
        else badFile := "yes"
        }

    if (\badFile) then write(f," has field format issues.")
    write("There are ",goodRecords," records with all good readings.")
    if *dups > 0 then {
        write("The following dates have multiple records:")
        every writes(" ",!sort(dups))
        write()
        }
 
end

Sample run:

->tp2
There are 5017 records with all good readings.
The following dates have multiple records:
 1990-03-25 1991-03-31 1992-03-29 1993-03-28 1995-03-26
->

J

   require 'tables/dsv dates'
   dat=: TAB readdsv jpath '~temp/readings.txt'
   Dates=: getdate"1 >{."1 dat
   Vals=:  _99 ". >(1 + +: i.24){"1 dat
   Flags=: _99 ". >(2 + +: i.24){"1 dat

   # Dates                      NB. Total # lines
5471
   +/ *./"1 ] 0 = Dates         NB. # lines with invalid date formats
0
   +/ _99 e."1 Vals,.Flags      NB. # lines with invalid value or flag formats
0
   +/ *./"1   [0 < Flags        NB. # lines with only valid flags
5017  
   ~. (#~ (i.~ ~: i:~)) Dates   NB. Duplicate dates
1990 3 25
1991 3 31
1992 3 29
1993 3 28
1995 3 26

Java

Translation of: C++
Works with: Java version 1.5+
import java.util.*;
import java.util.regex.*;
import java.io.*;

public class DataMunging2 {

    public static final Pattern e = Pattern.compile("\\s+");

    public static void main(String[] args) {
        try {
            BufferedReader infile = new BufferedReader(new FileReader(args[0]));
            List<String> duplicates = new ArrayList<String>();
            Set<String> datestamps = new HashSet<String>(); //for the datestamps

            String eingabe;
            int all_ok = 0; // counts records where all flags are good
            while ((eingabe = infile.readLine()) != null) { 
                String[] fields = e.split(eingabe); //split the line on whitespace
                if (fields.length != 49) //we expect 49 fields in a record
                    System.out.println("Format not ok!");
                if (datestamps.add(fields[0])) { //not duplicated
                    int howoften = (fields.length - 1) / 2; //number of value/flag pairs
                    for (int n = 1; Integer.parseInt(fields[2*n]) >= 1; n++) {
                        if (n == howoften) {
                            all_ok++ ;
                            break ;
                        }
                    }
                } else {
                    duplicates.add(fields[0]); //first field holds datestamp
                }
            }
            infile.close();
            System.out.println("The following " + duplicates.size() + " datestamps were duplicated:");
            for (String x : duplicates)
                System.out.println(x);
            System.out.println(all_ok + " records were complete and ok!");
        } catch (IOException e) {
            System.err.println("Can't open file " + args[0]);
            System.exit(1);
        }
    }
}

The program produces the following output:

The following 5 datestamps were duplicated:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
5013 records were complete and ok!

JavaScript

Works with: JScript
// wrap up the counter variables in a closure.
function analyze_func(filename) {
    var dates_seen = {};
    var format_bad = 0;
    var records_all = 0;
    var records_good = 0;
    return function() {
        var fh = new ActiveXObject("Scripting.FileSystemObject").openTextFile(filename, 1); // 1 = for reading
        while ( ! fh.atEndOfStream) {
            records_all ++;
            var allOK = true;
            var line = fh.ReadLine();
            var fields = line.split('\t');
            if (fields.length != 49) {
                format_bad ++;
                continue;
            }

            var date = fields.shift();
            if (has_property(dates_seen, date)) 
                WScript.echo("duplicate date: " + date);
            else
                dates_seen[date] = 1;

            while (fields.length > 0) {
                var value = parseFloat(fields.shift());
                var flag = parseInt(fields.shift(), 10);
                if (isNaN(value) || isNaN(flag)) {
                    format_bad ++;
                }
                else if (flag <= 0) {
                    allOK = false;
                }
            }
            if (allOK)
                records_good ++;
        }
        fh.close();
        WScript.echo("total records: " + records_all);
        WScript.echo("Wrong format: " + format_bad);
        WScript.echo("records with no bad readings: " + records_good);
    }
}

function has_property(obj, propname) {
    return typeof(obj[propname]) == "undefined" ? false : true;
}

var analyze = analyze_func('readings.txt');
analyze();

jq

Works with: jq version with regex support

For this problem, it is convenient to use jq in a pipeline: the first invocation of jq will convert the text file into a stream of JSON arrays (one array per line):

$ jq -R '[splits("[ \t]+")]' Text_processing_2.txt

The second part of the pipeline performs the task requirements. The following program is used in the second invocation of jq.

Generic Utilities

# Given any array, produce an array of [item, count] pairs for each run.
def runs:
  reduce .[] as $item
    ( [];
      if . == [] then [ [ $item, 1] ] 
      else  .[length-1] as $last
            | if $last[0] == $item then (.[0:length-1] + [ [$item, $last[1] + 1] ] )
              else . + [[$item, 1]]
              end
      end ) ;

def is_float: test("^[-+]?[0-9]*[.][0-9]*([eE][-+]?[0-9]+)?$");

def is_integral: test("^[-+]?[0-9]+$");

def is_date: test("[12][0-9]{3}-[0-9][0-9]-[0-9][0-9]");

Validation:

# Report line and column numbers using conventional 1-based numbering.
def validate_line(nr):
  def validate_date:
    if is_date then empty else "field 1 in line \(nr) has an invalid date: \(.)" end;
  def validate_length(n):
    if length == n then empty else "line \(nr) has \(length) fields" end;
  def validate_pair(i):
    ( .[2*i + 1] as $n
      | if ($n | is_float) then empty else "field \(2*i + 2) in line \(nr) is not a float: \($n)" end),
    ( .[2*i + 2] as $n
      | if ($n | is_integral) then empty else "field \(2*i + 3) in line \(nr) is not an integer: \($n)" end);
      
  (.[0] | validate_date),
  (validate_length(49)),
  (range(0; (length-1) / 2) as $i | validate_pair($i)) ;

def validate_lines:
 . as $in
 | range(0; length) as $i | ($in[$i] | validate_line($i + 1));

Check for duplicate timestamps

def duplicate_timestamps:
  [.[][0]] | sort | runs | map( select(.[1]>1) );

Number of valid readings for all instruments:

# The following ignores any issues with respect to duplicate dates,
# but does check the validity of the record, including the date format:
def number_of_valid_readings:
  def check:
    . as $in
    | (.[0] | is_date) 
      and length == 49 
      and all(range(0; 24) | $in[2*. + 1] | is_float) 
      and all(range(0; 24) | $in[2*. + 2] | (is_integral and tonumber >= 1) );

   map(select(check)) | length ;

Generate Report

validate_lines,
"\nChecking for duplicate timestamps:",
duplicate_timestamps,
"\nThere are \(number_of_valid_readings) valid rows altogether."
Output:

Part 1: Simple demonstration

To illustrate that the program does report invalid lines, we first use the six lines at the top but mangle the last line.

$ jq -R  '[splits("[ \t]+")]' Text_processing_2.txt | jq -s -r -f  Text_processing_2.jq
field 1 in line 6 has an invalid date: 991-04-03
line 6 has 47 fields
field 2 in line 6 is not a float: 10000
field 3 in line 6 is not an integer: 1.0
field 47 in line 6 is not an integer: x

Checking for duplicate timestamps:
[
  [
    "1991-03-31",
    2
  ]
]

There are 5 valid rows altogether.

Part 2: readings.txt

$ jq -R  '[splits("[ \t]+")]' readings.txt | jq -s -r -f  Text_processing_2.jq
Checking for duplicate timestamps:
[
  [
    "1990-03-25",
    2
  ],
  [
    "1991-03-31",
    2
  ],
  [
    "1992-03-29",
    2
  ],
  [
    "1993-03-28",
    2
  ],
  [
    "1995-03-26",
    2
  ]
]

There are 5017 valid rows altogether.

Julia

Refer to the code at https://rosettacode.org/wiki/Text_processing/1#Julia. Add at the end of that code the following:

dupdate = df[nonunique(df[:,[:Date]]),:][:Date]
println("The following rows have duplicate DATESTAMP:")
println(df[df[:Date] .== dupdate,:])
println("All values good in these rows:")
println(df[df[:ValidValues] .== 24,:])
Output:
The following rows have duplicate DATESTAMP:
2×29 DataFrames.DataFrame
│ Row │ Date                │ Mean    │ ValidValues │ MaximumGap │ GapPosition │ 0:00 │ 1:00 │ 2:00 │ 3:00 │ 4:00 │
├─────┼─────────────────────┼─────────┼─────────────┼────────────┼─────────────┼──────┼──────┼──────┼──────┼──────┤
│ 1   │ 1991-03-31T00:00:00 │ 23.5417 │ 24          │ 0          │ 0           │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2   │ 1991-03-31T00:00:00 │ 40.0    │ 1           │ 23         │ 2           │ 40.0 │ NaN  │ NaN  │ NaN  │ NaN  │

│ Row │ 5:00 │ 6:00 │ 7:00 │ 8:00 │ 9:00 │ 10:00 │ 11:00 │ 12:00 │ 13:00 │ 14:00 │ 15:00 │ 16:00 │ 17:00 │ 18:00 │
├─────┼──────┼──────┼──────┼──────┼──────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 10.0 │ 10.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0  │ 50.0  │ 60.0  │ 40.0  │ 30.0  │ 30.0  │ 30.0  │ 25.0  │ 20.0  │
│ 2   │ NaN  │ NaN  │ NaN  │ NaN  │ NaN  │ NaN   │ NaN   │ NaN   │ NaN   │ NaN   │ NaN   │ NaN   │ NaN   │ NaN   │

│ Row │ 19:00 │ 20:00 │ 21:00 │ 22:00 │ 23:00 │
├─────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 20.0  │ 20.0  │ 20.0  │ 20.0  │ 35.0  │
│ 2   │ NaN   │ NaN   │ NaN   │ NaN   │ NaN   │
All values good in these rows:
4×29 DataFrames.DataFrame
│ Row │ Date                │ Mean    │ ValidValues │ MaximumGap │ GapPosition │ 0:00 │ 1:00 │ 2:00 │ 3:00 │ 4:00 │
├─────┼─────────────────────┼─────────┼─────────────┼────────────┼─────────────┼──────┼──────┼──────┼──────┼──────┤
│ 1   │ 1991-03-30T00:00:00 │ 10.0    │ 24          │ 0          │ 0           │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 2   │ 1991-03-31T00:00:00 │ 23.5417 │ 24          │ 0          │ 0           │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │
│ 3   │ 1991-04-02T00:00:00 │ 19.7917 │ 24          │ 0          │ 0           │ 8.0  │ 9.0  │ 11.0 │ 12.0 │ 12.0 │
│ 4   │ 1991-04-03T00:00:00 │ 13.9583 │ 24          │ 0          │ 0           │ 10.0 │ 9.0  │ 10.0 │ 10.0 │ 9.0  │

│ Row │ 5:00 │ 6:00 │ 7:00 │ 8:00 │ 9:00 │ 10:00 │ 11:00 │ 12:00 │ 13:00 │ 14:00 │ 15:00 │ 16:00 │ 17:00 │ 18:00 │
├─────┼──────┼──────┼──────┼──────┼──────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0 │ 10.0  │ 10.0  │ 10.0  │ 10.0  │ 10.0  │ 10.0  │ 10.0  │ 10.0  │ 10.0  │
│ 2   │ 10.0 │ 10.0 │ 20.0 │ 20.0 │ 20.0 │ 35.0  │ 50.0  │ 60.0  │ 40.0  │ 30.0  │ 30.0  │ 30.0  │ 25.0  │ 20.0  │
│ 3   │ 12.0 │ 27.0 │ 26.0 │ 27.0 │ 33.0 │ 32.0  │ 31.0  │ 29.0  │ 31.0  │ 25.0  │ 25.0  │ 24.0  │ 21.0  │ 17.0  │
│ 4   │ 10.0 │ 15.0 │ 24.0 │ 28.0 │ 24.0 │ 18.0  │ 14.0  │ 12.0  │ 13.0  │ 14.0  │ 15.0  │ 14.0  │ 15.0  │ 13.0  │

│ Row │ 19:00 │ 20:00 │ 21:00 │ 22:00 │ 23:00 │
├─────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 10.0  │ 10.0  │ 10.0  │ 10.0  │ 10.0  │
│ 2   │ 20.0  │ 20.0  │ 20.0  │ 20.0  │ 35.0  │
│ 3   │ 14.0  │ 15.0  │ 12.0  │ 12.0  │ 10.0  │
│ 4   │ 13.0  │ 13.0  │ 12.0  │ 10.0  │ 10.0  │

Kotlin

// version 1.2.31

import java.io.File

fun main(args: Array<String>) {
    val rx = Regex("""\s+""")
    val file = File("readings.txt")
    var count = 0
    var invalid = 0
    var allGood = 0
    var map = mutableMapOf<String, Int>()
    file.forEachLine { line ->
        count++
        val fields = line.split(rx)
        val date = fields[0]
        if (fields.size == 49) {
            if (map.containsKey(date))
                map[date] = map[date]!! + 1
            else
                map.put(date, 1)
            var good = 0
            for (i in 2 until fields.size step 2) {
                if (fields[i].toInt() >= 1) {
                    good++
                }
            }
            if (good == 24) allGood++
        }
        else invalid++
    }

    println("File = ${file.name}")
    println("\nDuplicated dates:")
    for ((k,v) in map) {
        if (v > 1) println("  $k ($v times)")
    }
    println("\nTotal number of records   : $count")
    var percent = invalid.toDouble() / count * 100.0
    println("Number of invalid records : $invalid (${"%5.2f".format(percent)}%)")
    percent = allGood.toDouble() / count * 100.0
    println("Number which are all good : $allGood (${"%5.2f".format(percent)}%)")
}
Output:
File = readings.txt

Duplicated dates:
  1990-03-25 (2 times)
  1991-03-31 (2 times)
  1992-03-29 (2 times)
  1993-03-28 (2 times)
  1995-03-26 (2 times)

Total number of records   : 5471
Number of invalid records : 0 ( 0.00%)
Number which are all good : 5017 (91.70%)

Lua

filename = "readings.txt"
io.input( filename )

dates = {}
duplicated, bad_format = {}, {}
num_good_records, lines_total = 0, 0

while true do
    line = io.read( "*line" )
    if line == nil then break end
    
    lines_total = lines_total + 1

    date = string.match( line, "%d+%-%d+%-%d+" )
    if dates[date] ~= nil then
        duplicated[#duplicated+1] = date
    end    
    dates[date] = 1
    
    count_pairs, bad_values = 0, false
    for v, w in string.gmatch( line, "%s(%d+[%.%d+]*)%s(%-?%d)" ) do        
        count_pairs = count_pairs + 1        
        if tonumber(w) <= 0 then 
            bad_values = true 
        end        
    end
    if count_pairs ~= 24 then 
        bad_format[#bad_format+1] = date
    end
    if not bad_values then
        num_good_records = num_good_records + 1
    end
end

print( "Lines read:", lines_total )
print( "Valid records: ", num_good_records )
print( "Duplicate dates:" )
for i = 1, #duplicated do
    print( "   ", duplicated[i] )
end
print( "Bad format:" )
for i = 1, #bad_format do
    print( "   ", bad_format[i] )
end

Output:

Lines read:	5471
Valid records: 	5017
Duplicate dates:
   	1990-03-25
   	1991-03-31
   	1992-03-29
   	1993-03-28
   	1995-03-26
Bad format:

M2000 Interpreter

File is in user dir. Use Win Dir$ to open the explorer window and copy there the readings.txt

Module TestThis {
	Document a$, exp$
	\\ automatically find the encoding and the line break
	Load.doc a$, "readings.txt"
	m=0
	n=doc.par(a$)
	k=list
	nl$={
	}
	l=0
	exp$=format$("Records: {0}", n)+nl$
	For i=1 to n
		b$=paragraph$(a$, i)
		If exist(k,Left$(b$, 10)) then
			m++ : where=eval(k)
			exp$=format$("Duplicate for {0} at {1}",where, i)+nl$
		Else
			Append k, Left$(b$, 10):=i
		End if
		Stack New {
			Stack  Mid$(Replace$(chr$(9)," ", b$), 11)
			while not empty {
				Read a, b
				if b<=0 then l++ : exit
			}
		}
	Next
	exp$= format$("Duplicates {0}",m)+nl$
	exp$= format$("Valid Records {0}",n-l)+nl$
	clipboard exp$
	report exp$
}
TestThis
Output:
Records: 5471
Duplicate for 84 at 85
Duplicate for 455 at 456
Duplicate for 819 at 820
Duplicate for 1183 at 1184
Duplicate for 1910 at 1911
Duplicates 5
Valid Records 5017

Mathematica/Wolfram Language

data = Import["Readings.txt","TSV"]; Print["duplicated dates: "];
Select[Tally@data[[;;,1]], #[[2]]>1&][[;;,1]]//Column
Print["number of good records: ", Count[(Times@@#[[3;;All;;2]])& /@ data, 1],
" (out of a total of ", Length[data], ")"]
Output:
duplicated dates: 
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
number of good records: 5017 (out of a total of 5471)

MATLAB / Octave

function [val,count] = readdat(filename)
% READDAT reads the readings.txt file
%
% The value of boolean parameters can be tested with 
%    exist(parameter,'var')

if nargin<1, 
   filename = 'readings.txt';
end;

fid = fopen(filename); 
if fid<0, error('cannot open file %s\n',filename); end; 
[val,count] = fscanf(fid,'%04d-%02d-%02d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d %f %d \n');
fclose(fid); 

count = count/51;

if (count<1) || count~=floor(count),
     error('file has incorrect format\n')
end;

val = reshape(val,51,count)';   % make matrix with 51 rows and count columns, then transpose it. 

d = datenum(val(:,1:3));	% compute timestamps 

printf('The following records are followed by a duplicate:');
dix = find(diff(d)==0)		% check for two consecutive timestamps with zero difference

printf('number of valid records: %i\n ', sum( all( val(:,5:2:end) >= 1, 2) ) );
>> [val,count]=readdat;
The following records are followed by a duplicate:dix =

     84
    455
    819
   1183
   1910

number of valid records: 5017

Nim

import strutils, tables

const NumFields = 49
const DateField = 0
const FlagGoodValue = 1

var badRecords: int       # Number of records that have invalid formatted values.
var totalRecords: int     # Total number of records in the file.
var badInstruments: int   # Total number of records that have at least one instrument showing error.
var seenDates: Table[string, bool]  # Table to keep track of what dates we have seen.

proc checkFloats(floats: seq[string]): bool =
  ## Ensure we can parse all records as floats (except the date stamp).
  for index in 1..<NumFields:
    try:
      # We're assuming all instrument flags are floats not integers.
      discard parseFloat(floats[index])
    except ValueError:
      return false
  true

proc areAllFlagsOk(instruments: seq[string]): bool =
  ## Ensure that all sensor flags are ok.

  # Flags start at index 2, and occur every 2 fields.
  for index in countup(2, NumFields, 2):
    # We're assuming all instrument flags are floats not integers
    var flag = parseFloat(instruments[index])
    if flag < FlagGoodValue: return false
  true


# Note: we're not checking the format of the date stamp.

# Main.

var currentLine = 0
for line in "readings.txt".lines:
  currentLine.inc
  if line.len == 0: continue    # Empty lines don't count as records.

  var tokens = line.split({' ', '\t'})
  totalRecords.inc

  if tokens.len != NumFields:
    badRecords.inc
    continue

  if not checkFloats(tokens):
    badRecords.inc
    continue

  if not areAllFlagsOk(tokens):
    badInstruments.inc

  if seenDates.hasKeyOrPut(tokens[DateField], true):
    echo tokens[DateField], " duplicated on line ", currentLine

let goodRecords = totalRecords - badRecords
let goodInstruments = goodRecords - badInstruments

echo "Total Records: ", totalRecords
echo "Records with wrong format: ", badRecords
echo "Records where all instruments were OK: ", goodInstruments
Output:
1990-03-25 duplicated on line 85
1991-03-31 duplicated on line 456
1992-03-29 duplicated on line 820
1993-03-28 duplicated on line 1184
1995-03-26 duplicated on line 1911
Total Records: 5471
Records with wrong format: 0
Records where all instruments were OK: 5017

OCaml

#load "str.cma"
open Str

let strip_cr str =
  let last = pred (String.length str) in
  if str.[last] <> '\r' then str else String.sub str 0 last

let map_records =
  let rec aux acc = function
    | value::flag::tail ->
        let e = (float_of_string value, int_of_string flag) in
        aux (e::acc) tail
    | [_] -> invalid_arg "invalid data"
    | [] -> List.rev acc
  in
  aux [] ;;

let duplicated_dates =
  let same_date (d1,_) (d2,_) = (d1 = d2) in
  let date (d,_) = d in
  let rec aux acc = function
    | a::b::tl when same_date a b ->
        aux (date a::acc) tl
    | _::tl ->
        aux acc tl
    | [] ->
        List.rev acc
  in
  aux [] ;;

let record_ok (_,record) =
  let is_ok (_,v) = v >= 1 in
  let sum_ok =
    List.fold_left (fun sum this ->
      if is_ok this then succ sum else sum) 0 record
  in
  sum_ok = 24

let num_good_records =
  List.fold_left  (fun sum record ->
    if record_ok record then succ sum else sum) 0 ;;

let parse_line line =
  let li = split (regexp "[ \t]+") line in
  let records = map_records (List.tl li)
  and date = List.hd li in
  (date, records)

let () =
  let ic = open_in "readings.txt" in
  let rec read_loop acc =
    let line_opt = try Some (strip_cr (input_line ic))
                   with End_of_file -> None
    in
    match line_opt with
      None -> close_in ic; List.rev acc
    | Some line -> read_loop (parse_line line :: acc)
  in
  let inputs = read_loop [] in

  Printf.printf "%d total lines\n" (List.length inputs);

  Printf.printf "duplicated dates:\n";
  let dups = duplicated_dates inputs in
  List.iter print_endline dups;

  Printf.printf "number of good records: %d\n" (num_good_records inputs);
;;

This script outputs:

5471 total lines
duplicated dates:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26
number of good records: 5017

Perl

use List::MoreUtils 'natatime';
use constant FIELDS => 49;

binmode STDIN, ':crlf';
  # Read the newlines properly even if we're not running on
  # Windows.

my ($line, $good_records, %dates) = (0, 0);
while (<>)
   {++$line;
    my @fs = split /\s+/;
    @fs == FIELDS or die "$line: Bad number of fields.\n";
    for (shift @fs)
       {/\d{4}-\d{2}-\d{2}/ or die "$line: Bad date format.\n";
        ++$dates{$_};}
    my $iterator = natatime 2, @fs;
    my $all_flags_okay = 1;
    while ( my ($val, $flag) = $iterator->() )
       {$val =~ /\d+\.\d+/ or die "$line: Bad value format.\n";
        $flag =~ /\A-?\d+/ or die "$line: Bad flag format.\n";
        $flag < 1 and $all_flags_okay = 0;}
    $all_flags_okay and ++$good_records;}

print "Good records: $good_records\n",
   "Repeated timestamps:\n",
   map {"  $_\n"}
   grep {$dates{$_} > 1}
   sort keys %dates;

Output:

Good records: 5017
Repeated timestamps:
  1990-03-25
  1991-03-31
  1992-03-29
  1993-03-28
  1995-03-26

Phix

-- demo\rosetta\TextProcessing2.exw
with javascript_semantics -- (include version/first of next three lines only)
include readings.e -- global constant lines, or:
--assert(write_lines("readings.txt",lines)!=-1) -- first run, then:
--constant lines = read_lines("readings.txt")
 
include builtins\timedate.e
 
integer all_good = 0
 
string fmt = "%d-%d-%d\t"&join(repeat("%f",48),'\t')
sequence extset = sq_mul(tagset(24),2), -- {2,4,6,..48}
         curr, last 

for i=1 to length(lines) do
    string li = lines[i]
    sequence r = scanf(li,fmt)
    if length(r)!=1 then
        printf(1,"bad line [%d]:%s\n",{i,li})
    else
        curr = r[1][1..3]
        if i>1 and curr=last then
            printf(1,"duplicate line for %04d/%02d/%02d\n",last)
        end if
        last = curr
        all_good += sum(sq_le(extract(r[1][4..$],extset),0))=0
    end if
end for
 
printf(1,"Valid records %d of %d total\n",{all_good, length(lines)})

?"done"
{} = wait_key()
Output:
duplicate line for 1990/03/25
duplicate line for 1991/03/31
duplicate line for 1992/03/29
duplicate line for 1993/03/28
duplicate line for 1995/03/26
Valid records 5017 of 5471 total

PHP

$handle = fopen("readings.txt", "rb");
$missformcount = 0;
$totalcount = 0;
$dates = array();
while (!feof($handle)) {
    $buffer = fgets($handle);
	$line = preg_replace('/\s+/',' ',$buffer);
	$line = explode(' ',trim($line));
	$datepattern = '/^\d{4}-\d{2}-\d{2}$/';
	$valpattern = '/^\d+\.{1}\d{3}$/';
	$flagpattern = '/^[1-9]{1}$/';
	
	if(count($line) != 49) $missformcount++;
	if(!preg_match($datepattern,$line[0],$check)) $missformcount++;
	else $dates[$totalcount+1] = $check[0];
	
	$errcount = 0;
	for($i=1;$i<count($line);$i++){
		if($i%2!=0){
			if(!preg_match($valpattern,$line[$i],$check)) $errcount++;
		}else{
			if(!preg_match($flagpattern,$line[$i],$check)) $errcount++;
		}
	}
	if($errcount != 0) $missformcount++;
	$totalcount++;
}
fclose ($handle);
$good = $totalcount - $missformcount;
$duplicates = array_diff_key( $dates , array_unique( $dates ));
echo 'Valid records ' . $good . ' of ' . $totalcount . ' total<br>';
echo 'Duplicates : <br>';
foreach ($duplicates as $key => $val){
	echo $val . ' at Line : ' . $key . '<br>';
}
Output:
Valid records 5017 of 5471 total
Duplicates :
1990-03-25 at Line : 85
1991-03-31 at Line : 456
1992-03-29 at Line : 820
1993-03-28 at Line : 1184
1995-03-26 at Line : 1911

Picat

import util.

go =>
  Readings = [split(Record) : Record in read_file_lines("readings.txt")],
  DateStamps = new_map(),
  GoodReadings = 0,
  foreach({Rec,Id} in zip(Readings,1..Readings.length))
    if Rec.length != 49 then printf("Entry %d has bad_length %d\n", Id, Rec.length) end,
    Date = Rec[1],
    if DateStamps.has_key(Date) then
      printf("Entry %d (date %w) is a duplicate of entry %w\n", Id, Date, DateStamps.get(Date))
    else 
      if sum([1: I in 3..2..49, check_field(Rec[I])]) == 0 then
         GoodReadings := GoodReadings + 1
      end
    end,
    DateStamps.put(Date, Id)
  end,
  nl,
  printf("Total readings: %d\n",Readings.len),
  printf("Good readings: %d\n",GoodReadings),    
  nl.

check_field(Field) =>
  Field == "-2" ; Field == "-1" ; Field == "0".
Output:
Entry 85 (date 1990-03-25) is a duplicate of entry 84
Entry 456 (date 1991-03-31) is a duplicate of entry 455
Entry 820 (date 1992-03-29) is a duplicate of entry 819
Entry 1184 (date 1993-03-28) is a duplicate of entry 1183
Entry 1911 (date 1995-03-26) is a duplicate of entry 1910

Total readings: 5471
Good readings: 5013


PicoLisp

Put the following into an executable file "checkReadings":

#!/usr/bin/picolisp /usr/lib/picolisp/lib.l

(load "@lib/misc.l")

(in (opt)
   (until (eof)
      (let Lst (split (line) "^I")
         (unless
            (and
               (= 49 (length Lst))     # Check total length
               ($dat (car Lst) "-")    # Check for valid date
               (fully                  # Check data format
                  '((L F)
                     (if F                         # Alternating:
                        (format L 3)               # Number
                        (>= 9 (format L) -9) ) )   # or flag
                  (cdr Lst)
                  '(T NIL .) ) )
            (prinl "Bad line format: " (glue " " Lst))
            (bye 1) ) ) ) )

(bye)

Then it can be called as

$ ./checkReadings readings.txt

PL/I

/* To process readings produced by automatic reading stations. */

check: procedure options (main);
   declare 1 date, 2 (yy, mm, dd) character (2),
           (j1, j2) character (1);
   declare old_date character (6);
   declare line character (330) varying;
   declare R(24) fixed decimal, Machine(24) fixed binary;
   declare (i, k, n, faulty static initial (0)) fixed binary;
   declare input file;

   open file (input) title ('/READINGS.TXT,TYPE(CRLF),RECSIZE(300)');

   on endfile (input) go to done;

   old_date = '';
   k = 0;
   do forever;
      k = k + 1;

      get file (input) edit (line) (L);
      get string(line) edit (yy, j1, mm, j2, dd) (a(4), a(1), a(2), a(1), a(2));

      line = substr(line, 11);

      do i = 1 to length(line);
         if substr(line, i, 1) = '09'x then substr(line, i, 1) = ' ';
      end;
      line = trim(line);
      n = tally(line, ' ') - tally (line, '  ') + 1;

      if n ^= 48 then
         do;
            put skip list ('There are ' || n || ' readings in line ' || k);
         end;

      n = n/2;
      line = line || ' ';

      get string(line) list ((R(i), Machine(i) do i = 1 to n));

      if any(Machine < 1) ^= '0'B then
         faulty = faulty + 1;
      if old_date ^= ' ' then if old_date = string(date) then
         put skip list ('Dates are the same at line' || k);
      old_date = string(date);
   end;
done:
   put skip list ('There were ' || k || ' sets of readings');
   put skip list ('There were ' || faulty || ' faulty readings' );
   put skip list ('There were ' || k-faulty || ' good readings' );
end check;

PowerShell

$dateHash = @{}
$goodLineCount = 0
get-content c:\temp\readings.txt |
    ForEach-Object {
        $line = $_.split(" |`t",2)
        if ($dateHash.containskey($line[0])) {
            $line[0] + " is duplicated"
        } else {
            $dateHash.add($line[0], $line[1])
        }
        $readings = $line[1].split()
        $goodLine = $true
        if ($readings.count -ne 48) { $goodLine = $false; "incorrect line length : $($line[0])"  }
        for ($i=0; $i -lt $readings.count; $i++) {
            if ($i % 2 -ne 0) {                                
                if ([int]$readings[$i] -lt 1) {
                    $goodLine = $false
                }
            }
        }
        if ($goodLine) { $goodLineCount++ } 
    }
[string]$goodLineCount + " good lines"

Output:

1990-03-25 is duplicated
1991-03-31 is duplicated
1992-03-29 is duplicated
1993-03-28 is duplicated
1995-03-26 is duplicated
5017 good lines

An alternative using regular expression syntax:

$dateHash = @{}
$goodLineCount = 0
ForEach ($rawLine in ( get-content c:\temp\readings.txt) ){
    $line = $rawLine.split(" |`t",2)
    if ($dateHash.containskey($line[0])) {
        $line[0] + " is duplicated"
    } else {
        $dateHash.add($line[0], $line[1])
    }
    $readings = [regex]::matches($line[1],"\d+\.\d+\s-?\d")
    if ($readings.count -ne 24) { "incorrect number of readings for date " + $line[0] }
    $goodLine = $true
    foreach ($flagMatch in [regex]::matches($line[1],"\d\.\d*\s(?<flag>-?\d)")) {
        if ([int][string]$flagMatch.groups["flag"].value -lt 1) { 
            $goodLine = $false 
        }
    }
    if ($goodLine) { $goodLineCount++}
}
[string]$goodLineCount + " good lines"

Output:

1990-03-25 is duplicated
1991-03-31 is duplicated
1992-03-29 is duplicated
1993-03-28 is duplicated
1995-03-26 is duplicated
5017 good lines

PureBasic

Using regular expressions.

Define filename.s = "readings.txt"
#instrumentCount = 24

Enumeration
  #exp_date
  #exp_instruments
  #exp_instrumentStatus
EndEnumeration

Structure duplicate
  date.s
  firstLine.i
  line.i
EndStructure

NewMap dates() ;records line date occurs first
NewList duplicated.duplicate()
NewList syntaxError()
Define goodRecordCount, totalLines, line.s, i
Dim inputDate.s(0)
Dim instruments.s(0)
  
If ReadFile(0, filename)
  CreateRegularExpression(#exp_date, "\d+-\d+-\d+")
  CreateRegularExpression(#exp_instruments, "(\t|\x20)+(\d+\.\d+)(\t|\x20)+\-?\d")
  CreateRegularExpression(#exp_instrumentStatus, "(\t|\x20)+(\d+\.\d+)(\t|\x20)+")
  Repeat
    line = ReadString(0, #PB_Ascii)
    If line = "": Break: EndIf
    totalLines + 1
  
    ExtractRegularExpression(#exp_date, line, inputDate())
    If FindMapElement(dates(), inputDate(0))
      AddElement(duplicated())
      duplicated()\date = inputDate(0)
      duplicated()\firstLine = dates()
      duplicated()\line = totalLines
    Else
      dates(inputDate(0)) = totalLines
    EndIf
    
    ExtractRegularExpression(#exp_instruments, Mid(line, Len(inputDate(0)) + 1), instruments())
    Define pairsCount = ArraySize(instruments()), containsBadValues = #False
    For i =  0 To pairsCount
      If Val(ReplaceRegularExpression(#exp_instrumentStatus, instruments(i), "")) < 1
        containsBadValues = #True
        Break
      EndIf
    Next
    
    If pairsCount <> #instrumentCount - 1
      AddElement(syntaxError()): syntaxError() = totalLines
    EndIf
    If Not containsBadValues
      goodRecordCount + 1
    EndIf
  ForEver
  CloseFile(0)
  
  If OpenConsole()
    ForEach duplicated()
      PrintN("Duplicate date: " + duplicated()\date + " occurs on lines " + Str(duplicated()\line) + " and " + Str(duplicated()\firstLine) + ".")
    Next
    ForEach syntaxError()
      PrintN( "Syntax error in line " + Str(syntaxError()))
    Next
    PrintN(#CRLF$ + Str(goodRecordCount) + " of " + Str(totalLines) + " lines read were valid records.")
    
    Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input()
    CloseConsole()
  EndIf
EndIf

Sample output:

Duplicate date: 1990-03-25 occurs on lines 85 and 84.
Duplicate date: 1991-03-31 occurs on lines 456 and 455.
Duplicate date: 1992-03-29 occurs on lines 820 and 819.
Duplicate date: 1993-03-28 occurs on lines 1184 and 1183.
Duplicate date: 1995-03-26 occurs on lines 1911 and 1910.

5017 of 5471 lines read were valid records.

Python

import re
import zipfile
import StringIO

def munge2(readings):

   datePat = re.compile(r'\d{4}-\d{2}-\d{2}')
   valuPat = re.compile(r'[-+]?\d+\.\d+')
   statPat = re.compile(r'-?\d+')
   allOk, totalLines = 0, 0
   datestamps = set([])
   for line in readings:
      totalLines += 1
      fields = line.split('\t')
      date = fields[0]
      pairs = [(fields[i],fields[i+1]) for i in range(1,len(fields),2)]

      lineFormatOk = datePat.match(date) and \
         all( valuPat.match(p[0]) for p in pairs ) and \
         all( statPat.match(p[1]) for p in pairs )
      if not lineFormatOk:
         print 'Bad formatting', line
         continue
		
      if len(pairs)!=24 or any( int(p[1]) < 1 for p in pairs ):
         print 'Missing values', line
         continue

      if date in datestamps:
         print 'Duplicate datestamp', line
         continue
      datestamps.add(date)
      allOk += 1

   print 'Lines with all readings: ', allOk
   print 'Total records: ', totalLines

#zfs = zipfile.ZipFile('readings.zip','r')
#readings = StringIO.StringIO(zfs.read('readings.txt'))
readings = open('readings.txt','r')
munge2(readings)

The results indicate 5013 good records, which differs from the AWK implementation: because of the continue statements, a line with a duplicate datestamp is never counted as good, even when all of its readings are valid. The final few lines of the output are as follows:

Missing values 2004-12-29	2.900	1	2.700	1	2.800	1	3.300	1	2.900	1	2.300	1	0.000	0	1.700	1	1.900	1	2.300	1	2.600	1	2.900	1	2.600	1	2.600	1	2.600	1	2.700	1	2.300	1	2.200	1	2.100	1	2.000	1	2.100	1	2.100	1	2.300	1	2.400	1

Missing values 2004-12-30	2.400	1	2.600	1	2.600	1	2.600	1	3.000	1	0.000	0	3.300	1	2.600	1	2.900	1	2.400	1	2.300	1	2.900	1	3.500	1	3.700	1	3.600	1	4.000	1	3.400	1	2.400	1	2.500	1	2.600	1	2.600	1	2.800	1	2.400	1	2.200	1

Missing values 2004-12-31	2.400	1	2.500	1	2.500	1	2.400	1	0.000	0	2.400	1	2.400	1	2.400	1	2.200	1	2.400	1	2.500	1	2.000	1	1.700	1	1.400	1	1.500	1	1.900	1	1.700	1	2.000	1	2.000	1	2.200	1	1.700	1	1.500	1	1.800	1	1.800	1

Lines with all readings:  5013
Total records:  5471
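The 5013-versus-5017 discrepancy is purely a counting convention. A minimal sketch (hypothetical three-line dataset, not taken from readings.txt) contrasting the AWK-style tally with the skip-duplicates tally used above:

```python
# Toy illustration of why the two counting conventions disagree: a
# duplicated datestamp whose readings are all good is counted by the
# AWK-style tally but skipped by the version above.
records = [
    ("2000-01-01", [1, 1]),   # good readings
    ("2000-01-02", [1, 1]),   # good readings
    ("2000-01-02", [1, 1]),   # duplicate date, readings still good
]

# AWK-style: every line with all flags >= 1 counts.
awk_style = sum(1 for _, flags in records if all(f >= 1 for f in flags))

# First-version style: duplicate dates are skipped before the good-line tally.
seen, skip_dups = set(), 0
for date, flags in records:
    if date in seen:
        continue
    seen.add(date)
    if all(f >= 1 for f in flags):
        skip_dups += 1

print(awk_style, skip_dups)  # 3 2
```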

Second Version

Modification of the version above to:

  • Remove continue statements so it counts as the AWK example does.
  • Generate mostly summary information that is easier to compare to other solutions.
import re
import zipfile
import StringIO
 
def munge2(readings, debug=False):
 
   datePat = re.compile(r'\d{4}-\d{2}-\d{2}')
   valuPat = re.compile(r'[-+]?\d+\.\d+')
   statPat = re.compile(r'-?\d+')
   totalLines = 0
   dupdate, badform, badlen, badreading = set(), set(), set(), 0
   datestamps = set([])
   for line in readings:
      totalLines += 1
      fields = line.split('\t')
      date = fields[0]
      pairs = [(fields[i],fields[i+1]) for i in range(1,len(fields),2)]
 
      lineFormatOk = datePat.match(date) and \
         all( valuPat.match(p[0]) for p in pairs ) and \
         all( statPat.match(p[1]) for p in pairs )
      if not lineFormatOk:
         if debug: print 'Bad formatting', line
         badform.add(date)
         
      if len(pairs)!=24 or any( int(p[1]) < 1 for p in pairs ):
         if debug: print 'Missing values', line
      if len(pairs)!=24: badlen.add(date)
      if any( int(p[1]) < 1 for p in pairs ): badreading += 1
 
      if date in datestamps:
         if debug: print 'Duplicate datestamp', line
         dupdate.add(date)

      datestamps.add(date)

   print 'Duplicate dates:\n ', '\n  '.join(sorted(dupdate)) 
   print 'Bad format:\n ', '\n  '.join(sorted(badform)) 
   print 'Bad number of fields:\n ', '\n  '.join(sorted(badlen)) 
   print 'Records with good readings: %i = %5.2f%%\n' % (
      totalLines-badreading, (totalLines-badreading)/float(totalLines)*100 )
   print 'Total records: ', totalLines
 
readings = open('readings.txt','r')
munge2(readings)
bash$  /cygdrive/c/Python26/python  munge2.py 
Duplicate dates:
  1990-03-25
  1991-03-31
  1992-03-29
  1993-03-28
  1995-03-26
Bad format:
  
Bad number of fields:
  
Records with good readings: 5017 = 91.70%

Total records:  5471
bash$ 

R

# Read in data from file
dfr <- read.delim("d:/readings.txt", colClasses=c("character", rep(c("numeric", "integer"), 24)))
dates <- strptime(dfr[,1], "%Y-%m-%d")

# Any bad values?
dfr[which(is.na(dfr))]

# Any duplicated dates
dates[duplicated(dates)]

# Number of rows with no bad values
flags <- as.matrix(dfr[,seq(3,49,2)])>0
sum(apply(flags, 1, all))

Racket

#lang racket
(read-decimal-as-inexact #f)
;; files to read is a sequence, so it could be either a list or vector of files
(define (text-processing/2 files-to-read)
  (define seen-datestamps (make-hash))
  (define (datestamp-seen? ds) (hash-ref seen-datestamps ds #f))
  (define (datestamp-seen! ds pos) (hash-set! seen-datestamps ds pos))
  
  (define (fold-into-pairs l (acc null))
    (match l ['() (reverse acc)]
      [(list _) (reverse (cons l acc))]
      [(list-rest a b tl) (fold-into-pairs tl (cons (list a b) acc))]))
  
  (define (match-valid-field line pos)
    (match (string-split line)
      ;; if we don't hit an error, then the file is valid
      ((list-rest (not (pregexp #px"[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}")) _)
       (error 'match-valid-field "invalid format non-datestamp at head: ~a~%" line))
      
      ;; check for duplicates
      ((list-rest (? datestamp-seen? ds) _)
       (printf "duplicate datestamp: ~a at line: ~a (first seen at: ~a)~%"
               ds pos (datestamp-seen? ds))
       #f)
      
      ;; register the datestamp as seen, then move on to rest of match
      ((list-rest ds _) (=> next-match-rule) (datestamp-seen! ds pos) (next-match-rule))
      
      ((list-rest
        _
        (app fold-into-pairs
             (list (list (app string->number (and (? number?) vs))
                         (app string->number (and (? integer?) statuss)))
                   ...)))
       (=> next-match-rule)
       (unless (= (length vs) 24) (next-match-rule))
       (not (for/first ((s statuss) #:unless (positive? s)) #t)))
      
      ;; if we don't hit an error, then the file is valid
      (else (error 'match-valid-field "bad field format: ~a~%" line))))
  
  (define (sub-t-p/1)
    (for/sum ((line (in-lines))
              (line-number (in-naturals 1)))
      (if (match-valid-field line line-number) 1 0)))  
  (for/sum ((file-name files-to-read))
    (with-input-from-file file-name sub-t-p/1)))

(printf "~a records have good readings for all instruments~%"
        (text-processing/2 (current-command-line-arguments)))

Example session:

$ racket 2.rkt readings/readings.txt
duplicate datestamp: 1990-03-25 at line: 85 (first seen at: 84)
duplicate datestamp: 1991-03-31 at line: 456 (first seen at: 455)
duplicate datestamp: 1992-03-29 at line: 820 (first seen at: 819)
duplicate datestamp: 1993-03-28 at line: 1184 (first seen at: 1183)
duplicate datestamp: 1995-03-26 at line: 1911 (first seen at: 1910)
5013 records have good readings for all instruments

Raku

(formerly Perl 6)

Translation of: Perl
Works with: Rakudo version 2018.03

This version does validation with a single Raku regex that is much more readable than the typical regex, and arguably expresses the data structure more straightforwardly. Here we use normal quotes for literals, and \h for horizontal whitespace.

Variables like $good-record that are going to be autoincremented do not need to be initialized.

The .push method on a hash is magical and loses no information; if a duplicate key is found in the pushed pair, an array of values is automatically created of the old value and the new value pushed. Hence we can easily track all the lines that a particular duplicate occurred at.

The .all method does "junctional" logic: it autothreads through comparators as any English speaker would expect. Junctions can also short-circuit as soon as they find a value that doesn't match, and the evaluation order is up to the computer, so it can be optimized or parallelized.

The final line simply greps out the pairs from the hash whose value is an array with more than 1 element. (Those values that are not arrays nevertheless have a .elems method that always reports 1.) The .pairs is merely there for clarity; grepping a hash directly has the same effect. Note that we sort the pairs after we've grepped them, not before; this works fine in Raku, sorting on the key and value as primary and secondary keys. Finally, pairs and arrays provide a default print format that is sufficient without additional formatting in this case.
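For readers unfamiliar with Raku, the duplicate-tracking and junction idioms described above can be approximated in Python (hypothetical toy data; a defaultdict of lists plays the role of the auto-vivifying .push, and all() plays the role of the .all junction):

```python
from collections import defaultdict

dates = defaultdict(list)          # date -> every line number it was seen on
good_records = 0

lines = [
    ("1990-03-24", [1, 1, 2]),
    ("1990-03-25", [1, 0, 1]),     # a flag < 1: not a good record
    ("1990-03-25", [1, 1, 1]),     # duplicate datestamp
]

for lineno, (date, flags) in enumerate(lines, 1):
    dates[date].append(lineno)     # like %dates.push: $0 => $line
    if all(f >= 1 for f in flags): # like $1.all >= 1
        good_records += 1

repeated = {d: ns for d, ns in dates.items() if len(ns) > 1}
print(good_records, repeated)      # 2 {'1990-03-25': [2, 3]}
```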

my $good-records;
my $line;
my %dates;

for lines() {
    $line++;
    / ^
    (\d ** 4 '-' \d\d '-' \d\d)
    [ \h+ \d+'.'\d+ \h+ ('-'?\d+) ] ** 24
    $ /
        or note "Bad format at line $line" and next;
    %dates.push: $0 => $line;
    $good-records++ if $1.all >= 1;
}

say "$good-records good records out of $line total";

say 'Repeated timestamps (with line numbers):';
.say for sort %dates.pairs.grep: *.value.elems > 1;

Output:

5017 good records out of 5471 total
Repeated timestamps (with line numbers):
1990-03-25 => [84 85]
1991-03-31 => [455 456]
1992-03-29 => [819 820]
1993-03-28 => [1183 1184]
1995-03-26 => [1910 1911]

REXX

This REXX program processes the file mentioned in "text processing 1" and performs further validation of the dates, flags, and data.

Some of the checks performed are:

  •   checks for duplicated date records.
  •   checks for a bad date (YYYY-MM-DD) format, among:
  •   wrong length
  •   year > current year
  •   year < 1970 (to allow for historical data)
  •   mm < 1 or mm > 12
  •   dd < 1 or dd > days for the month
  •   yyyy, mm, or dd isn't numeric
  •   missing data (or flags)
  •   flag isn't an integer
  •   flag contains a decimal point
  •   data isn't numeric

In addition, all numbers presented in the report are formatted with commas.

The program also writes the report to a file in addition to displaying it on the console.
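Independently of the REXX code below, the date checks listed above can be sketched as follows (an illustrative helper, not part of the REXX program; valid_datestamp is a hypothetical name):

```python
from datetime import date

# Days per month; February is adjusted on the fly for leap years,
# mirroring the monDD. stem in the REXX program.
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def leap_year(y):
    # divisible by 4, with the 100- and 400-year rules
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def valid_datestamp(ds):
    # wrong length or misplaced separators
    if len(ds) != 10 or ds[4] != '-' or ds[7] != '-':
        return False
    yyyy, mm, dd = ds[:4], ds[5:7], ds[8:10]
    # non-numeric parts
    if not (yyyy.isdigit() and mm.isdigit() and dd.isdigit()):
        return False
    y, m, d = int(yyyy), int(mm), int(dd)
    # year range: 1970 .. current year
    if not 1970 <= y <= date.today().year:
        return False
    if not 1 <= m <= 12:
        return False
    days = MONTH_DAYS[m - 1] + (1 if m == 2 and leap_year(y) else 0)
    return 1 <= d <= days

print(valid_datestamp("1991-03-31"))   # True
print(valid_datestamp("1991-02-29"))   # False (1991 is not a leap year)
```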

/*REXX program to process  instrument data  from a  data file.                */
numeric digits 20                      /*allow for bigger numbers.            */
ifid='READINGS.TXT'                    /*name of the   input  file.           */
ofid='READINGS.OUT'                    /*  "   "  "   output    "             */
grandSum=0                             /*grand sum of the whole file.         */
grandFlg=0                             /*grand number of flagged data.        */
grandOKs=0
Lflag=0                                /*longest period of flagged data.      */
Cflag=0                                /*longest continuous flagged data.     */
oldDate =0                             /*placeholder of penultimate date.     */
w       =16                            /*width of fields when displayed.      */
dupDates=0                             /*count of duplicated timestamps.      */
badFlags=0                             /*count of bad flags  (not integer).   */
badDates=0                             /*count of bad dates  (bad format).    */
badData =0                             /*count of bad data   (not numeric).   */
ignoredR=0                             /*count of ignored records, bad records*/
maxInstruments=24                      /*maximum number of instruments.       */
yyyyCurr=right(date(),4)               /*get the current year (today).        */
monDD.  =31                            /*number of days in every month.       */
                                       /*# days in Feb. is figured on the fly.*/
monDD.4 =30
monDD.6 =30
monDD.9 =30
monDD.11=30

  do records=1  while lines(ifid)\==0  /*read until finished.                 */
  rec=linein(ifid)                     /*read the next record (line).         */
  parse var rec datestamp Idata        /*pick off the dateStamp and data.     */
  if datestamp==oldDate  then do       /*found a duplicate timestamp.         */
                              dupDates=dupDates+1   /*bump the dupDate counter*/
                              call sy datestamp copies('~',30),
                                       'is a duplicate of the',
                                       "previous datestamp."
                              ignoredR=ignoredR+1     /*bump # of ignoredRecs.*/
                              iterate  /*ignore this duplicate record.        */
                              end

  parse var datestamp yyyy '-' mm '-' dd   /*obtain YYYY, MM, and the DD.     */
  monDD.2=28+leapyear(yyyy)            /*how long is February in year  YYYY ? */
                                       /*check for various bad formats.       */
  if verify(yyyy||mm||dd,1234567890)\==0 |,
     length(datestamp)\==10   |,
     length(yyyy)\==4         |,
     length(mm  )\==2         |,
     length(dd  )\==2         |,
     yyyy<1970                |,
     yyyy>yyyyCurr            |,
     mm=0  | dd=0             |,
     mm>12 | dd>monDD.mm  then do
                               badDates=badDates+1
                               call sy datestamp copies('~'),
                                                 'has an illegal format.'
                               ignoredR=ignoredR+1  /*bump number ignoredRecs.*/
                               iterate              /*ignore this bad record. */
                               end
  oldDate=datestamp                    /*save datestamp for the next read.    */
  sum=0
  flg=0
  OKs=0

    do j=1  until Idata=''             /*process the instrument data.         */
    parse var Idata data.j flag.j Idata

    if pos('.',flag.j)\==0 |,          /*does flag have a decimal point  -or- */
       \datatype(flag.j,'W')  then do  /* ··· is the flag not a whole number? */
                                   badFlags=badFlags+1 /*bump badFlags counter*/
                                   call sy datestamp copies('~'),
                                           'instrument' j "has a bad flag:",
                                           flag.j
                                   iterate       /*ignore it and its data.    */
                                   end

    if \datatype(data.j,'N')  then do  /*is the data not numeric?       */
                                   badData=badData+1      /*bump counter.*/
                                   call sy datestamp copies('~'),
                                           'instrument' j "has bad data:",
                                           data.j
                                   iterate       /*ignore it & its flag. */
                                   end

    if flag.j>0  then do               /*if good data, ~~~                    */
                      OKs=OKs+1
                      sum=sum+data.j
                      if Cflag>Lflag  then do
                                           Ldate=datestamp
                                           Lflag=Cflag
                                           end
                      Cflag=0
                      end
                 else do               /*flagged data ~~~                     */
                      flg=flg+1
                      Cflag=Cflag+1
                      end
    end   /*j*/

  if j>maxInstruments then do
                           badData=badData+1       /*bump the badData counter.*/
                           call sy datestamp copies('~'),
                                   'too many instrument datum'
                           end

  if OKs\==0  then avg=format(sum/OKs,,3)
              else avg='[n/a]'
  grandOKs=grandOKs+OKs
  _=right(commas(avg),w)
  grandSum=grandSum+sum
  grandFlg=grandFlg+flg
  if flg==0  then  call sy datestamp ' average='_
             else  call sy datestamp ' average='_ '  flagged='right(flg,2)
  end   /*records*/

records=records-1                      /*adjust for reading the  end─of─file. */
if grandOKs\==0  then grandAvg=format(grandsum/grandOKs,,3)
                 else grandAvg='[n/a]'
call sy
call sy copies('=',60)
call sy '      records read:'  right(commas(records ),w)
call sy '   records ignored:'  right(commas(ignoredR),w)
call sy '     grand     sum:'  right(commas(grandSum),w+4)
call sy '     grand average:'  right(commas(grandAvg),w+4)
call sy '     grand OK data:'  right(commas(grandOKs),w)
call sy '     grand flagged:'  right(commas(grandFlg),w)
call sy '   duplicate dates:'  right(commas(dupDates),w)
call sy '         bad dates:'  right(commas(badDates),w)
call sy '         bad  data:'  right(commas(badData ),w)
call sy '         bad flags:'  right(commas(badFlags),w)
if Lflag\==0 then call sy '   longest flagged:' right(commas(LFlag),w) " ending at " Ldate
call sy copies('=',60)
exit                                   /*stick a fork in it,  we're all  done.*/
/*────────────────────────────────────────────────────────────────────────────*/
commas: procedure;  parse arg _;   n=_'.9';    #=123456789;    b=verify(n,#,"M")
        e=verify(n,#'0',,verify(n,#"0.",'M'))-4
           do j=e  to b  by -3;   _=insert(',',_,j);    end  /*j*/;     return _
/*────────────────────────────────────────────────────────────────────────────*/
leapyear: procedure; arg y             /*year could be:  Y,  YY,  YYY, or YYYY*/
if length(y)==2 then y=left(right(date(),4),2)y      /*adjust for   YY   year.*/
if y//4\==0     then return 0          /* not divisible by 4?   Not a leapyear*/
return y//100\==0 | y//400==0          /*apply the 100  and the 400 year rule.*/
/*────────────────────────────────────────────────────────────────────────────*/
sy:     say arg(1);               call lineout ofid,arg(1);             return

output   when using the default input file:

  ∙
  ∙
  ∙
1991-03-31  average=          23.542
1991-03-31 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ is a duplicate of the previous datestamp.
1991-04-01  average=          23.217   flagged= 1
1991-04-02  average=          19.792
1991-04-03  average=          13.958
  ∙
  ∙
  ∙
============================================================
      records read:            5,471
   records ignored:                5
     grand     sum:        1,357,152.400
     grand average:               10.496
     grand OK data:          129,306
     grand flagged:            1,878
   duplicate dates:                5
         bad dates:                0
         bad  data:                0
         bad flags:                0
   longest flagged:              589  ending at  1993-03-05
============================================================

Ruby

require 'set'

def munge2(readings, debug=false)
   datePat = /^\d{4}-\d{2}-\d{2}/
   valuPat = /^[-+]?\d+\.\d+/
   statPat = /^-?\d+/
   totalLines = 0
   dupdate, badform, badlen, badreading = Set[], Set[], Set[], 0
   datestamps = Set[]
   for line in readings
      totalLines += 1
      fields = line.split(/\t/)
      date = fields.shift
      pairs = fields.each_slice(2).to_a
 
      lineFormatOk = date =~ datePat &&
        pairs.all? { |x,y| x =~ valuPat && y =~ statPat }
      if !lineFormatOk
         puts 'Bad formatting ' + line if debug
         badform << date
      end
         
      if pairs.length != 24 ||
           pairs.any? { |x,y| y.to_i < 1 }
         puts 'Missing values ' + line if debug
      end
      if pairs.length != 24
         badlen << date
      end
      if pairs.any? { |x,y| y.to_i < 1 }
         badreading += 1
      end
 
      if datestamps.include?(date)
         puts 'Duplicate datestamp ' + line if debug
         dupdate << date
      end

      datestamps << date
   end

   puts 'Duplicate dates:', dupdate.sort.map { |x| '  ' + x }
   puts 'Bad format:', badform.sort.map { |x| '  ' + x }
   puts 'Bad number of fields:', badlen.sort.map { |x| '  ' + x }
   puts 'Records with good readings: %i = %5.2f%%' % [
      totalLines-badreading, (totalLines-badreading)/totalLines.to_f*100 ]
   puts
   puts 'Total records:  %d' % totalLines
end

open('readings.txt','r') do |readings|
   munge2(readings)
end

Scala

Works with: Scala version 2.8
object DataMunging2 {
  import scala.io.Source
  import scala.collection.immutable.{TreeMap => Map}

  val pattern = """^(\d+-\d+-\d+)""" + """\s+(\d+\.\d+)\s+(-?\d+)""" * 24 + "$" r;

  def main(args: Array[String]) {
    val files = args map (new java.io.File(_)) filter (file => file.isFile && file.canRead)
    val (numFormatErrors, numValidRecords, dateMap) =
      files.iterator.flatMap(file => Source fromFile file getLines ()).
        foldLeft((0, 0, new Map[String, Int] withDefaultValue 0)) {
        case ((nFE, nVR, dM), line) => pattern findFirstMatchIn line map (_.subgroups) match {
          case Some(List(date, rawData @ _*)) =>
            val allValid = (rawData map (_ toDouble) iterator) grouped 2 forall (_.last > 0)
            (nFE, nVR + (if (allValid) 1 else 0), dM.updated(date, dM.getOrElse(date, 0) + 1))
          case None => (nFE + 1, nVR, dM)
        }
      }

    dateMap foreach {
      case (date, repetitions) if repetitions > 1 => println(date+": "+repetitions+" repetitions")
      case _ =>
    }

    println("""|
               |Valid records: %d
               |Duplicated dates: %d
               |Duplicated records: %d
               |Data format errors: %d
               |Invalid data records: %d
               |Total records: %d""".stripMargin format (
              numValidRecords,
              dateMap filter { case (_, repetitions) => repetitions > 1 } size,
              dateMap.valuesIterable filter (_ > 1) map (_ - 1) sum,
              numFormatErrors,
              dateMap.valuesIterable.sum - numValidRecords,
              dateMap.valuesIterable.sum))
  }
}

Sample output:

1990-03-25: 2 repetitions
1991-03-31: 2 repetitions
1992-03-29: 2 repetitions
1993-03-28: 2 repetitions
1995-03-26: 2 repetitions

Valid records: 5017
Duplicated dates: 5
Duplicated records: 5
Data format errors: 0
Invalid data records: 454
Total records: 5471

Sidef

Translation of: Raku
var good_records = 0;
var dates = Hash();

ARGF.each { |line|
    var m = /^(\d\d\d\d-\d\d-\d\d)((?:\h+\d+\.\d+\h+-?\d+){24})\s*$/.match(line);
    m || (warn "Bad format at line #{$.}"; next);
    dates{m[0]} := 0 ++;
    var i = 0;
    m[1].words.all{|n| i++.is_even || (n.to_num >= 1) } && ++good_records;
}

say "#{good_records} good records out of #{$.} total";
say 'Repeated timestamps:';
say dates.to_a.grep{ .value > 1 }.map { .key }.sort.join("\n");
Output:
$ sidef script.sf < readings.txt
5017 good records out of 5471 total
Repeated timestamps:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26

Snobol4

Developed using the Snobol4 dialect Spitbol for Linux, version 4.0

* Read text/2

	v = array(24)
	f = array(24)
	tos = char(9) " " ;* break characters are both tab and space
	pat1 = break(tos) . dstamp
	pat2 = span(tos) break(tos) . *v[i] span(tos) (break(tos) | (len(1) rem)) . *f[i]
	rowcount = 0
	hold_dstamp = ""
	num_bad_rows = 0
	num_invalid_rows = 0

in0
	row = input :f(endinput)
	rowcount = rowcount + 1
	row ? pat1 = :f(invalid_row)

* duplicated datestamp?
* if dstamp = hold_dstamp then duplicated
	hold_dstamp = differ(hold_dstamp,dstamp) dstamp :s(nodup)
	output = dstamp ": datestamp at row " rowcount " duplicates datestamp at " rowcount - 1
nodup

	i = 1
in1
	row ? pat2 = :f(invalid_row)
	i = lt(i,24) i + 1 :s(in1)

* Is this a goodrow?
* if any flag is < 1 then row has bad data
	c = 0
goodrow
	c = lt(c,24) c + 1 :f(goodrow2)
	num_bad_rows = lt(f[c],1) num_bad_rows + 1 :s(goodrow2)f(goodrow)
goodrow2

	:(in0)
	
invalid_row
	num_invalid_rows = num_invalid_rows + 1
	:(in0)
	
endinput
	output =
	output = "Total number of rows                    : " rowcount
	output = "Total number of rows with invalid format: " num_invalid_rows
	output = "Total number of rows with bad data      : " num_bad_rows
	output = "Total number of good rows               : " rowcount - num_invalid_rows - num_bad_rows 

end
Output:
1990-03-25: datestamp at row 85 duplicates datestamp at 84
1991-03-31: datestamp at row 456 duplicates datestamp at 455
1992-03-29: datestamp at row 820 duplicates datestamp at 819
1993-03-28: datestamp at row 1184 duplicates datestamp at 1183
1995-03-26: datestamp at row 1911 duplicates datestamp at 1910

Total number of rows                    : 5471
Total number of rows with invalid format: 0
Total number of rows with bad data      : 454
Total number of good rows               : 5017

Tcl

set data [lrange [split [read [open "readings.txt" "r"]] "\n"] 0 end-1]
set total [llength $data]
set correct $total
set datestamps {}

foreach line $data {
    set formatOk true
    set hasAllMeasurements true

    set date [lindex $line 0]
    if {[llength $line] != 49} { set formatOk false }
    if {![regexp {\d{4}-\d{2}-\d{2}} $date]} { set formatOk false }
    if {[lsearch $datestamps $date] != -1} { puts "Duplicate datestamp: $date" } {lappend datestamps $date}

    foreach {value flag} [lrange $line 1 end] {
        if {$flag < 1} { set hasAllMeasurements false }

        if {![regexp -- {[-+]?\d+\.\d+} $value] || ![regexp -- {-?\d+} $flag]} {set formatOk false}
    }   
    if {!$hasAllMeasurements} { incr correct -1 }
    if {!$formatOk} { puts "line \"$line\" has wrong format" }
}

puts "$correct records with good readings = [expr $correct * 100.0 / $total]%"
puts "Total records: $total"
$ tclsh munge2.tcl 
Duplicate datestamp: 1990-03-25
Duplicate datestamp: 1991-03-31
Duplicate datestamp: 1992-03-29
Duplicate datestamp: 1993-03-28
Duplicate datestamp: 1995-03-26
5017 records with good readings = 91.7016998721%
Total records: 5471

Second version

To demonstrate a different method of iterating over the file, and different ways to verify data types:
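For comparison, the per-field type checks that Tcl performs with [string is double] and [string is int] can be sketched in Python using try/except helpers (hypothetical helper names, toy record):

```python
def is_float(s):
    # accepts anything float() parses, much like [string is double -strict]
    try:
        float(s)
        return True
    except ValueError:
        return False

def is_int(s):
    # whole numbers only, much like [string is int -strict]
    try:
        int(s)
        return True
    except ValueError:
        return False

# A toy record: datestamp followed by value/flag pairs.
fields = "1990-03-24 10.000 1 9.500 -2".split()
pairs = list(zip(fields[1::2], fields[2::2]))
format_ok = all(is_float(v) and is_int(f) for v, f in pairs)
ignored = sum(1 for _, f in pairs if int(f) < 1)
print(format_ok, ignored)   # True 1
```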

set total [set good 0]
array set seen {}
set fh [open readings.txt]
while {[gets $fh line] != -1} {
    incr total
    set fields [regexp -inline -all {[^ \t\r\n]+} $line]
    if {[llength $fields] != 49} {
        puts "bad format: not 49 fields on line $total"
        continue
    }
    if { ! [regexp {^(\d{4}-\d\d-\d\d)$} [lindex $fields 0] -> date]} {
        puts "bad format: invalid date on line $total: '[lindex $fields 0]'"
        continue
    }

    if {[info exists seen($date)]} {
        puts "duplicate date on line $total: $date"
    }
    incr seen($date)
    
    set line_format_ok true
    set readings_ignored 0
    foreach {value flag} [lrange $fields 1 end] {
        if { ! [string is double -strict $value]} {
            puts "bad format: value not a float on line $total: '$value'"
            set line_format_ok false
        }
        if { ! [string is int -strict $flag]} {
            puts "bad format: flag not an integer on line $total: '$flag'"
            set line_format_ok false
        }
        if {$flag < 1} {incr readings_ignored}
    }
    if {$line_format_ok && $readings_ignored == 0} {incr good}
}
close $fh

puts "total: $total"
puts [format "good:  %d = %5.2f%%" $good [expr {100.0 * $good / $total}]]

Results:

duplicate date on line 85: 1990-03-25
duplicate date on line 456: 1991-03-31
duplicate date on line 820: 1992-03-29
duplicate date on line 1184: 1993-03-28
duplicate date on line 1911: 1995-03-26
total: 5471
good:  5017 = 91.70%
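The duplicate detection above keys the `seen` array on the datestamp and reports a date the moment it turns up a second time. The same bookkeeping, sketched in Python (the sample dates here are made up for illustration; the real input is readings.txt):

```python
# Count each datestamp as it arrives; report any date already seen.
# The `dates` list stands in for the datestamp column of the file.
seen = {}
dups = []
dates = ["1990-03-24", "1990-03-25", "1990-03-25", "1990-03-26"]
for lineno, date in enumerate(dates, start=1):
    if date in seen:
        dups.append((lineno, date))
    seen[date] = seen.get(date, 0) + 1
print(dups)  # [(3, '1990-03-25')]
```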

Ursala

Compiled and run in a single step, with the input file accessed as a list of strings pre-declared as readings_dot_txt:

#import std
#import nat

readings        = (*F ~&c/;digits+ rlc ==+ ~~ -={` ,9%cOi&,13%cOi&}) readings_dot_txt 

valid_format    = all -&length==49,@tK27 all ~&w/`.&& ~&jZ\digits--'-.',@tK28 all ~&jZ\digits--'-'&-

duplicate_dates = :/'duplicated dates:'+ ~&hK2tFhhPS|| -[(none)]-!

good_readings   = --' good readings'@h+ %nP+ length+ *~ @tK28 all ~='0'&& ~&wZ/`-

#show+

main = valid_format?(^C/good_readings duplicate_dates,-[invalid format]-!) readings

Output:

5017 good readings
duplicated dates:
1995-03-26
1993-03-28
1992-03-29
1991-03-31
1990-03-25

VBScript

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(objFSO.GetParentFolderName(WScript.ScriptFullName) &_
			"\readings.txt",1)
Set objDateStamp = CreateObject("Scripting.Dictionary")

Total_Records = 0
Valid_Records = 0
Duplicate_TimeStamps = ""

Do Until objFile.AtEndOfStream
	line = objFile.ReadLine
	If line <> "" Then
		token = Split(line,vbTab)
		If objDateStamp.Exists(token(0)) = False Then
			objDateStamp.Add token(0),""
			Total_Records = Total_Records + 1
			If IsValid(token) Then
				Valid_Records = Valid_Records + 1
			End If
		Else
			Duplicate_TimeStamps = Duplicate_TimeStamps & token(0) & vbCrLf
			Total_Records = Total_Records + 1
		End If
	End If 	
Loop

Function IsValid(arr)
	IsValid = True
	Bad_Readings = 0
	n = 1
	Do While n <= UBound(arr)
		If n + 1 <= UBound(arr) Then
			If CInt(arr(n+1)) < 1 Then
				Bad_Readings = Bad_Readings + 1	
			End If 
		End If
		n = n + 2
	Loop
	If Bad_Readings > 0 Then
		IsValid = False
	End If
End Function

WScript.StdOut.Write "Total Number of Records = " & Total_Records
WScript.StdOut.WriteLine
WScript.StdOut.Write "Total Valid Records = " & Valid_Records
WScript.StdOut.WriteLine
WScript.StdOut.Write "Duplicate Timestamps:"
WScript.StdOut.WriteLine
WScript.StdOut.Write Duplicate_TimeStamps
WScript.StdOut.WriteLine

objFile.Close
Set objFSO = Nothing
Output:
Total Number of Records = 5471
Total Valid Records = 5013
Duplicate Timestamps:
1990-03-25
1991-03-31
1992-03-29
1993-03-28
1995-03-26

Vedit macro language

This implementation does the following checks:

  • Checks for duplicate date fields. Note: duplicates can still be counted as valid records, as in other implementations.
  • Checks date format.
  • Checks that value fields have one or more digits, followed by a decimal point, followed by three digits.
  • Reads the flag value and checks that it is positive.
  • Requires 24 value/flag pairs on each line.

#50 = Buf_Num           // Current edit buffer (source data)
File_Open("|(PATH_ONLY)\output.txt")
#51 = Buf_Num           // Edit buffer for output file
Buf_Switch(#50)

#11 = #12 = #13 = #14 = #15 = 0
Reg_Set(15, "xxx")

While(!At_EOF) {
    #10 = 0
    #12++

    // Check for repeated date field
    if (Match(@15) == 0) {
        #20 = Cur_Line
        Buf_Switch(#51)   // Output file
        Reg_ins(15) IT(": duplicate record at ") Num_Ins(#20)
        Buf_Switch(#50)   // Input file
        #13++
    }

    // Check format of date field
    if (Match("|d|d|d|d-|d|d-|d|d|w", ADVANCE) != 0) {
        #10 = 1
        #14++
    }
    Reg_Copy_Block(15, BOL_pos, Cur_Pos-1)

    // Check data fields and flags:
    Repeat(24) {
        if ( Match("|d|*.|d|d|d|w", ADVANCE) != 0 || Num_Eval(ADVANCE) < 1) {
            #10 = 1
            #15++
            Break
        }
        Match("|W", ADVANCE)
    }
    if (#10 == 0) { #11++ }             // record was OK
    Line(1, ERRBREAK)
}

Buf_Switch(#51)         // buffer for output data
IN
IT("Valid records:       ") Num_Ins(#11)
IT("Duplicates:          ") Num_Ins(#13)
IT("Date format errors:  ") Num_Ins(#14)
IT("Invalid data records:") Num_Ins(#15)
IT("Total records:       ") Num_Ins(#12)

Sample output:

1990-03-25: duplicate record at    85
1991-03-31: duplicate record at   456
1992-03-29: duplicate record at   820
1993-03-28: duplicate record at  1184
1995-03-26: duplicate record at  1911

Valid records:        5017
Duplicates:              5
Date format errors:      0
Invalid data records:  454
Total records:        5471
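The per-line checks enumerated for this implementation amount to a single record-validity predicate. A minimal Python sketch of that predicate (the pattern names and the line_ok helper are illustrative, not part of the Vedit macro):

```python
import re

# A record is valid when it has a YYYY-MM-DD datestamp plus 24 value/flag
# pairs: each value has digits, a decimal point, and three digits; each
# flag is an integer >= 1.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}$")
VALUE_RE = re.compile(r"\d+\.\d{3}$")
FLAG_RE = re.compile(r"-?\d+$")

def line_ok(line):
    fields = line.split()
    if len(fields) != 49 or not DATE_RE.match(fields[0]):
        return False
    return all(VALUE_RE.match(v) and FLAG_RE.match(f) and int(f) >= 1
               for v, f in zip(fields[1::2], fields[2::2]))

print(line_ok("1991-03-31" + "  10.000 1" * 24))  # True
```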

Wren

Translation of: Kotlin
Library: Wren-pattern
Library: Wren-fmt
Library: Wren-sort
import "io" for File
import "./pattern" for Pattern
import "./fmt" for Fmt
import "./sort" for Sort

var p = Pattern.new("+1/s")
var fileName = "readings.txt"
var lines = File.read(fileName).trimEnd().split("\r\n")
var count = 0
var invalid = 0
var allGood = 0
var map = {}
for (line in lines) {
    count = count + 1
    var fields = p.splitAll(line)
    var date = fields[0]
    if (fields.count == 49) {
        map[date] = map.containsKey(date) ? map[date] + 1 : 1
        var good = 0
        var i = 2
        while (i < fields.count) {
            if (Num.fromString(fields[i]) >= 1) good = good + 1
            i = i + 2
        }
        if (good == 24) allGood = allGood + 1
    } else {
        invalid = invalid + 1
    }
}

Fmt.print("File = $s", fileName)
System.print("\nDuplicated dates:")
var keys = map.keys.toList
Sort.quick(keys)
for (k in keys) {
    var v = map[k]
    if (v > 1) Fmt.print("  $s  ($d times)", k, v)
}
Fmt.print("\nTotal number of records   : $d", count)
var percent = invalid/count * 100
Fmt.print("Number of invalid records : $d ($5.2f)\%", invalid, percent)
percent = allGood/count * 100
Fmt.print("Number which are all good : $d ($5.2f)\%", allGood, percent)
Output:
File = readings.txt

Duplicated dates:
  1990-03-25  (2 times)
  1991-03-31  (2 times)
  1992-03-29  (2 times)
  1993-03-28  (2 times)
  1995-03-26  (2 times)

Total number of records   : 5471
Number of invalid records : 0 ( 0.00)%
Number which are all good : 5017 (91.70)%

zkl

   // the RegExp engine has a low limit on groups so
   // I can't use it to select all fields, only verify them
re:=RegExp(0'|^(\d+-\d+-\d+)| + 0'|\s+\d+\.\d+\s+-*\d+| * 24 + ".+$");
w:=[1..].zip(File("readings.txt"));  //-->lazy (line #,line)
reg datep,N, good=0, dd=0;
foreach n,line in (w){
   N=n;		// since n is local to this scope
   if (not re.search(line)){ println("Line %d: malformed".fmt(n)); continue; }
   date:=line[re.matchedNs[1].xplode()];  // I can group the date field
   if (datep==date){ dd+=1; println("Line %4d: dup date: %s".fmt(n,date)); }
   datep=date;
   if (line.replace("\t"," ").split(" ").filter()[1,*]  // blow fields apart, drop date
         .pump(Void,Void.Read, // get (reading,status)
            fcn(_,s){  // stop on first problem status and return True
               if(s.strip().toInt()<1) T(Void.Stop,True) else False
       })) continue;
   good+=1;
}
println("%d records read, %d duplicate dates, %d valid".fmt(N,dd,good));
Output:
Line   85: dup date: 1990-03-25
Line  456: dup date: 1991-03-31
Line  820: dup date: 1992-03-29
Line 1184: dup date: 1993-03-28
Line 1911: dup date: 1995-03-26
5471 records read, 5 duplicate dates, 5017 valid
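The comment in the zkl solution notes that its RegExp engine caps the number of capture groups, so the pattern verifies every field while grouping only the datestamp. In an engine without that limit the same single-pattern approach can be sketched in Python (the names here are illustrative):

```python
import re

# One anchored pattern verifies the whole record and captures only the
# datestamp, mirroring the zkl regex: date group, then 24 value/flag pairs.
LINE_RE = re.compile(r"^(\d+-\d+-\d+)" + r"\s+\d+\.\d+\s+-?\d+" * 24 + r"\s*$")

line = "1991-03-31" + "\t10.000\t1" * 24
m = LINE_RE.match(line)
print(m.group(1))  # 1991-03-31
```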