Older revision: 04:58, 14 November 2009, Tikkanz (minor edit, →{{header|J}})
Newer revision: 21:47, 19 November 2009, UnderBot (minor edit, Fixed lang tags.)
Line 25:
=={{header|Ada}}==
{{libheader|Simple components for Ada}}
<lang ada>with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO; use Ada.Text_IO;
with Strings_Edit; use Strings_Edit;
Line 88 ⟶ 87:
   Close (File);
   Put_Line ("Valid records " & Image (Count) & " of " & Image (Line_No) & " total");
end Data_Munging_2;</lang>
Sample output
<pre>
Line 108 ⟶ 106:
If any fields are in scientific notation, there will be an e in the file:
<lang awk>bash$ awk '/[eE]/' readings.txt
bash$ </lang>
Quick check on the number of fields:
<lang awk>bash$ awk 'NF != 49' readings.txt
bash$ </lang>
Full check on the file format using a regular expression:
<lang awk>bash$ awk '!(/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+)+$/ && NF==49)' readings.txt
bash$ </lang>
Full check on the file format as above, but using regular expressions with interval expressions (GNU awk). The interval <code>{24}</code> requires exactly 24 value/flag pairs, so no separate NF test is needed:
<lang awk>bash$ awk --re-interval '!(/^[0-9]{4}-[0-9]{2}-[0-9]{2}([ \t]+[-]?[0-9]+\.[0-9]+[\t ]+[-]?[0-9]+){24}$/ )' readings.txt
bash$ </lang>
Line 124 ⟶ 122:
Accomplished by counting how many times the first field occurs and printing the date on its second occurrence.
<lang awk>bash$ awk '++count[$1]==2{print $1}' readings.txt
1990-03-25
1991-03-31
Line 130 ⟶ 128:
1993-03-28
1995-03-26
bash$ </lang>
'''How many records have good readings for all instruments?'''
<lang awk>bash$ awk '{rec++;ok=1; for(i=0;i<24;i++){if($(2*i+3)<1){ok=0}}; recordok += ok} END {print "Total records",rec,"OK records", recordok, "or", recordok/rec*100,"%"}' readings.txt
Total records 5471 OK records 5017 or 91.7017 %
bash$ </lang>
=={{header|C++}}==
Line 615 ⟶ 613:
readings = open('readings.txt','r')
munge2(readings)</lang>
<pre>bash$ /cygdrive/c/Python26/python munge2.py
Duplicate dates:
Line 634 ⟶ 631:
=={{header|R}}==
<lang R># Read in data from file (the file has no header line)
dfr <- read.delim("d:/readings.txt", header=FALSE, colClasses=c("character", rep(c("numeric", "integer"), 24)))
dates <- strptime(dfr[,1], "%Y-%m-%d")
Line 647 ⟶ 643:
# Number of rows with no bad values
flags <- as.matrix(dfr[,seq(3,49,2)])>0
sum(apply(flags, 1, all))</lang>
=={{header|Ruby}}==
<lang ruby>require 'set'

def munge2(readings, debug=false)
Line 704 ⟶ 698:
open('readings.txt','r') do |readings|
  munge2(readings)
end</lang>
=={{header|Tcl}}==
<lang tcl>
# Read all lines, dropping the empty element after the final newline
set fh [open "readings.txt" "r"]
set data [lrange [split [read $fh] "\n"] 0 end-1]
close $fh
set total [llength $data]
set correct $total
set datestamps {}

foreach line $data {
    set formatOk true
    set hasAllMeasurements true

    set date [lindex $line 0]
    if {[llength $line] != 49} { set formatOk false }
    if {![regexp {^\d{4}-\d{2}-\d{2}$} $date]} { set formatOk false }
    if {[lsearch -exact $datestamps $date] != -1} {
        puts "Duplicate datestamp: $date"
    } else {
        lappend datestamps $date
    }

    # Each remaining field pair is a reading value and its flag
    foreach {value flag} [lrange $line 1 end] {
        if {$flag < 1} { set hasAllMeasurements false }
        if {![regexp -- {^[-+]?\d+\.\d+$} $value] || ![regexp -- {^-?\d+$} $flag]} { set formatOk false }
    }
    if {!$hasAllMeasurements} { incr correct -1 }
    if {!$formatOk} { puts "line \"$line\" has wrong format" }
}

puts "$correct records with good readings = [expr {$correct * 100.0 / $total}]%"
puts "Total records: $total"</lang>
<pre>$ tclsh munge2.tcl
Duplicate datestamp: 1990-03-25
Line 830 ⟶ 822:
* Reads flag value and checks if it is positive
* Requires 24 value/flag pairs on each line
<lang vedit>#50 = Buf_Num // Current edit buffer (source data)
File_Open("|(PATH_ONLY)\output.txt")
#51 = Buf_Num // Edit buffer for output file
Line 878 ⟶ 869:
IT("Date format errors: ") Num_Ins(#14)
IT("Invalid data records:") Num_Ins(#15)
IT("Total records: ") Num_Ins(#12)</lang>
Sample output:
<lang vedit>1990-03-25: duplicate record at 85
1991-03-31: duplicate record at 456
1992-03-29: duplicate record at 820
Line 892 ⟶ 881:
Date format errors: 0
Invalid data records: 454
Total records: 5471</lang>
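The entries above perform the same three checks: field format, duplicate datestamps, and a count of records whose flags are all at least 1. For comparison, those checks can be combined in a single Python function. This is a minimal sketch and not the Python entry's munge2. It assumes whitespace-separated records, each a date followed by 24 value/flag pairs, and treats a flag below 1 as a bad reading, as the AWK one-liner does. The names check_readings and report are hypothetical.

<lang python>import re
from collections import Counter

# Assumed field patterns, mirroring the AWK format check:
# date YYYY-MM-DD, value like -1.234, integer flag.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
VALUE_RE = re.compile(r"-?\d+\.\d+")
FLAG_RE = re.compile(r"-?\d+")
PAIRS = 24  # assumed number of value/flag pairs per record


def check_readings(lines):
    """Hypothetical helper returning (bad_lines, duplicate_dates, total, good).

    bad_lines: 1-based numbers of lines failing the format check
    duplicate_dates: each repeated datestamp, reported once
    total: number of non-blank records
    good: well-formed records whose flags are all >= 1
    """
    bad_lines, duplicates = [], []
    seen = Counter()
    total = good = 0
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        total += 1
        date, rest = fields[0], fields[1:]
        values, flags = rest[0::2], rest[1::2]
        well_formed = (
            DATE_RE.fullmatch(date) is not None
            and len(rest) == 2 * PAIRS
            and all(VALUE_RE.fullmatch(v) for v in values)
            and all(FLAG_RE.fullmatch(f) for f in flags)
        )
        if not well_formed:
            bad_lines.append(lineno)
        seen[date] += 1
        if seen[date] == 2:  # same idea as ++count[$1]==2 in AWK
            duplicates.append(date)
        # Only well-formed records reach int(), so it cannot fail on bad text.
        if well_formed and all(int(f) >= 1 for f in flags):
            good += 1
    return bad_lines, duplicates, total, good


def report(path="readings.txt"):
    """Print a summary like the AWK one-liner (not called automatically)."""
    with open(path) as fh:
        bad_lines, duplicates, total, good = check_readings(fh)
    for date in duplicates:
        print("Duplicate date:", date)
    print("Lines with format errors:", len(bad_lines))
    pct = 100.0 * good / total if total else 0.0
    print("Total records", total, "OK records", good, "or", pct, "%")</lang>

check_readings accepts any iterable of strings, such as an open file or a list of test lines. Unlike the AWK one-liner, the sketch counts a record as good only if it also passes the format check.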

Text processing/2: Difference between revisions

