Text processing/2: Difference between revisions

Line 1,877:

=={{header|Perl 6}}==

This version does validation with a single Perl 6 regex that is much more readable than the typical regex, and arguably expresses the data structure more straightforwardly.

<lang perl6>my $fields = 49;


Here we use normal quotes for literals, and <tt>\h</tt> for horizontal whitespace.


Variables like <tt>$good-records</tt> that are going to be autoincremented do not need to be initialized.

my ($good-records, %dates) = 0;

for 1 .. * Z $*IN.lines -> $line, $s {

my @fs = split /\s+/, $s;

@fs == $fields or die "$line: Bad number of fields";

given shift @fs {

m/\d**4 \- \d**2 \- \d**2/ or die "$line: Bad date format";

++%dates{$_};

}

my $all-flags-okay = True;

for @fs -> $val, $flag {

$val ~~ /\d+ \. \d+/ or die "$line: Bad value format";

$flag ~~ /^ \-? \d+/ or die "$line: Bad flag format";

$flag < 1 and $all-flags-okay = False;

}

$all-flags-okay and ++$good-records;

}


The <tt>.push</tt> method on a hash is magical and loses no information; if a duplicate key is found in the pushed pair, an array of values is automatically created of the old value and the new value pushed. Hence we can easily track all the lines that a particular duplicate occurred at.

say 'Good records: ', $good-records;

say 'Repeated timestamps:';


The <tt>.all</tt> method does "junctional" logic: it autothreads through comparators as any English speaker would expect. Junctions can also short-circuit as soon as they find a value that doesn't match, and the evaluation order is up to the computer, so it can be optimized or parallelized.

say ' ', $_ for grep { %dates{$_} > 1 }, sort keys %dates;</lang>


The final line simply greps out the pairs from the hash whose value is an array with more than 1 element. (Those values that are not arrays nevertheless have a <tt>.elems</tt> method that always reports <tt>1</tt>.) The <tt>.pairs</tt> is merely there for clarity; grepping a hash directly has the same effect.


Note that we sort the pairs after we've grepped them, not before; this works fine in Perl 6, sorting on the key and value as primary and secondary keys. Finally, pairs and arrays provide a default print format that is sufficient without additional formatting in this case.

Output:

<pre>Good records: 5017

Repeated timestamps:

1990-03-25

1991-03-31

1992-03-29

1993-03-28

1995-03-26</pre>

The first version demonstrates that you can program Perl 6 almost like Perl 5. Here's a more idiomatic Perl 6 version that runs several times faster:

<lang perl6>my $good-records;

my $line;

Line 1,933:

Line 1,913:

<pre>5017 good records out of 5471 total

Repeated timestamps (with line numbers):

1990-03-25 84 85

1990-03-25 => [84 85]

1991-03-31 455 456

1991-03-31 => [455 456]

1992-03-29 819 820

1992-03-29 => [819 820]

1993-03-28 1183 1184

1993-03-28 => [1183 1184]

1995-03-26 1910 1911</pre>

1995-03-26 => [1910 1911]</pre>


~~Note how this~~ version does validation with a single Perl 6 regex that is much more readable than the typical regex, and arguably expresses the data structure more straightforwardly.


Here we use normal quotes for literals, and <tt>\h</tt> for horizontal whitespace.


Variables like <tt>$good-records</tt> that are going to be autoincremented do not need to be initialized. ~~(Perl 6 allows hyphens in variable names, as you can see.)~~


The <tt>.push</tt> method on a hash is magical and loses no information; if a duplicate key is found in the pushed pair, an array of values is automatically created of the old value and the new value pushed. Hence we can easily track all the lines that a particular duplicate occurred at.


The <tt>.all</tt> method does "junctional" logic: it autothreads through comparators as any English speaker would expect. Junctions can also short-circuit as soon as they find a value that doesn't match, and the evaluation order is up to the computer, so it can be optimized or parallelized.
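As a minimal sketch of the junction behaviour described here (the <tt>@flags</tt> values are made up for illustration and are not taken from the task's data file):

<lang perl6># Hypothetical flag values for a single record.
my @flags = 1, 2, 1, 3;

# .all builds a junction; the comparison autothreads over every element,
# and "so" collapses the result to a single Bool.
say so @flags.all >= 1;   # true only if every flag is at least 1

# Add one bad flag and test again; now the junction no longer holds.
@flags.push: -2;
say so @flags.all >= 1;</lang>

The same comparison could be written as a loop with an explicit boolean, as the first version does with <tt>$all-flags-okay</tt>; the junction form simply states the condition directly.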


The final line simply greps out the pairs from the hash whose value is an array with more than 1 element. (Those values that are not arrays nevertheless have a <tt>.elems</tt> method that always reports <tt>1</tt>.) The <tt>.pairs</tt> is merely there for clarity; grepping a hash directly has the same effect.


Note that we sort the pairs after we've grepped them, not before; this works fine in Perl 6, sorting on the key and value as primary and secondary keys. Finally, pairs and arrays provide a default print format that is sufficient without additional formatting in this case.
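The hash-push and grep steps can also be tried in isolation. The following is a small sketch, not the solution itself; <tt>%seen</tt> and the line numbers fed to it are hypothetical stand-ins for what the program collects while reading the file:

<lang perl6>my %seen;

# Push date => line-number pairs; a repeated key turns its value into an array.
%seen.push: '1990-03-24' => 83;
%seen.push: '1990-03-25' => 84;
%seen.push: '1990-03-25' => 85;

# Keep only the pairs whose value holds more than one element
# (a plain, non-array value reports .elems as 1), then sort what is left.
.say for %seen.pairs.grep(*.value.elems > 1).sort;</lang>

With this input only the <tt>1990-03-25</tt> pair should survive the grep, printed in the same <tt>key => [values]</tt> shape as the output above.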

=={{header|PHP}}==