Text processing/2: Difference between revisions

m (Fixed lang tags.)
Line 25:
=={{header|Ada}}==
{{libheader|Simple components for Ada}}
<lang ada>with Ada.Calendar; use Ada.Calendar;
<lang ada>
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO; use Ada.Text_IO;
with Strings_Edit; use Strings_Edit;
Line 88 ⟶ 87:
Close (File);
Put_Line ("Valid records " & Image (Count) & " of " & Image (Line_No) & " total");
end Data_Munging_2;</lang>
</lang>
Sample output
<pre>
Line 108 ⟶ 106:
 
If there are any scientific notation fields then there will be an e in the file:
<lang awk>bash$ awk '/[eE]/' readings.txt
bash$ </lang>
Quick check on the number of fields:
<lang awk>bash$ awk 'NF != 49' readings.txt
bash$ </lang>
Full check on the file format using a regular expression:
<lang awk>bash$ awk '!(/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+)+$/ && NF==49)' readings.txt
bash$ </lang>
Full check on the file format as above, but using regular-expression interval expressions (GNU awk):
<lang awk>bash$ awk --re-interval '!(/^[0-9]{4}-[0-9]{2}-[0-9]{2}([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+){24}$/ )' readings.txt
bash$ </lang>
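For comparison, the same full format check can be sketched in Python. This is only an illustration of the regular expression above, not part of the AWK solution; the names <code>LINE_RE</code>, <code>is_well_formed</code> and <code>malformed_lines</code> are made up for this sketch, and it assumes each record is a date followed by exactly 24 value/flag pairs separated by spaces or tabs.

```python
import re

# Date, then exactly 24 (decimal value, integer flag) pairs,
# separated by runs of spaces or tabs. Mirrors the interval regex above.
LINE_RE = re.compile(
    r'[0-9]{4}-[0-9]{2}-[0-9]{2}'
    r'(?:[ \t]+-?[0-9]+\.[0-9]+[ \t]+-?[0-9]+){24}'
)

def is_well_formed(line):
    """Return True if the whole line matches the expected record layout."""
    return LINE_RE.fullmatch(line.rstrip('\r\n')) is not None

def malformed_lines(lines):
    """Return the lines that fail the format check, in input order."""
    return [line for line in lines if not is_well_formed(line)]
```

Calling <code>malformed_lines(open('readings.txt'))</code> would then play the role of the negated AWK pattern: an empty list means every record passed.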
 
 
Line 124 ⟶ 122:
 
Accomplished by counting how many times the first field occurs and noting any second occurrences.
<lang awk>bash$ awk '++count[$1]==2{print $1}' readings.txt
1990-03-25
1991-03-31
Line 130 ⟶ 128:
1993-03-28
1995-03-26
bash$ </lang>
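The counting idea used above (report a datestamp the second time it is seen) can be sketched in Python as follows. The function name <code>duplicate_dates</code> is hypothetical, and the sketch assumes the datestamp is the first whitespace-separated field of each line.

```python
from collections import Counter

def duplicate_dates(lines):
    """Return each datestamp once, at the point of its second occurrence."""
    seen = Counter()
    duplicates = []
    for line in lines:
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        date = fields[0]
        seen[date] += 1
        if seen[date] == 2:     # same test as ++count[$1]==2 in AWK
            duplicates.append(date)
    return duplicates
```

Because only the second occurrence is recorded, a datestamp that appears three or more times is still listed once.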
 
 
'''What number of records have good readings for all instruments.'''
 
<lang awk>bash$ awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}' readings.txt
Total records 5471 OK records 5017 or 91.7017 %
bash$ </lang>
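A rough Python equivalent of the count above, given only as a sketch: a record is treated as good when all 24 flag fields (fields 3, 5, ..., 49 in AWK numbering) are at least 1. The name <code>count_good_records</code> is invented here, and the code assumes the flags are valid integers, as established by the format checks earlier.

```python
def count_good_records(lines):
    """Return (total records, records whose 24 flags are all >= 1)."""
    total = good = 0
    for line in lines:
        fields = line.split()
        if not fields:              # skip blank lines
            continue
        total += 1
        flags = fields[2::2]        # every second field, starting at the first flag
        if len(flags) == 24 and all(int(flag) >= 1 for flag in flags):
            good += 1
    return total, good
```

The percentage printed by the AWK version corresponds to <code>good / total * 100</code>.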
 
=={{header|C++}}==
Line 615 ⟶ 613:
readings = open('readings.txt','r')
munge2(readings)</lang>
</lang>
<pre>bash$ /cygdrive/c/Python26/python munge2.py
Duplicate dates:
Line 634 ⟶ 631:
 
=={{header|R}}==
<lang R># Read in data from file
# Read in data from file
dfr <- read.delim("d:/readings.txt", colClasses=c("character", rep(c("numeric", "integer"), 24)))
dates <- strptime(dfr[,1], "%Y-%m-%d")
Line 647 ⟶ 643:
# Number of rows with no bad values
flags <- as.matrix(dfr[,seq(3,49,2)])>0
sum(apply(flags, 1, all))</lang>
</lang>
 
=={{header|Ruby}}==
<lang ruby>require 'set'
require 'set'
 
def munge2(readings, debug=false)
Line 704 ⟶ 698:
open('readings.txt','r') do |readings|
munge2(readings)
end</lang>
</lang>
 
=={{header|Tcl}}==
 
<lang tcl> set data [lrange [split [read [open "readings.txt" "r"]] "\n"] 0 end-1]
set total [llength $data]
set correct $total
set datestamps {}
 
foreach line $data {
set formatOk true
set hasAllMeasurements true
 
set date [lindex $line 0]
if {[llength $line] != 49} { set formatOk false }
if {![regexp {^\d{4}-\d{2}-\d{2}$} $date]} { set formatOk false }
if {[lsearch $datestamps $date] != -1} { puts "Duplicate datestamp: $date" } {lappend datestamps $date}
 
foreach {value flag} [lrange $line 1 end] {
if {$flag < 1} { set hasAllMeasurements false }
 
if {![regexp -- {^[-+]?\d+\.\d+$} $value] || ![regexp -- {^-?\d+$} $flag]} {set formatOk false}
}
if {!$hasAllMeasurements} { incr correct -1 }
if {!$formatOk} { puts "line \"$line\" has wrong format" }
}
 
puts "$correct records with good readings = [expr {$correct * 100.0 / $total}]%"
puts "Total records: $total"</lang>
</lang>
<pre>$ tclsh munge2.tcl
Duplicate datestamp: 1990-03-25
Line 830 ⟶ 822:
* Reads flag value and checks if it is positive
* Requires 24 value/flag pairs on each line
<lang vedit>#50 = Buf_Num // Current edit buffer (source data)
<pre>
#50 = Buf_Num // Current edit buffer (source data)
File_Open("|(PATH_ONLY)\output.txt")
#51 = Buf_Num // Edit buffer for output file
Line 878 ⟶ 869:
IT("Date format errors: ") Num_Ins(#14)
IT("Invalid data records:") Num_Ins(#15)
IT("Total records: ") Num_Ins(#12)</lang>
</pre>
Sample output:
<lang vedit>1990-03-25: duplicate record at 85
<pre>
1990-03-25: duplicate record at 85
1991-03-31: duplicate record at 456
1992-03-29: duplicate record at 820
Line 892 ⟶ 881:
Date format errors: 0
Invalid data records: 454
Total records: 5471</lang>
</pre>
Anonymous user