Selective file copy: Difference between revisions

Revision as of 13:32, 13 May 2016

Copy part of input records to an output file.

Show how file processing as known from PL/I or COBOL can be implemented in the language of your choice.
Here, a file is not 'just' a sequence of bytes or lines but a sequence of records (structured data). The structure is usually described by declarations contained in an INCLUDE file (PL/I) or COPY BOOK (COBOL).
Assignment by name is a small extra that is available in PL/I.
Data conversions may be necessary (as shown here for data element c in the Go listing).
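
Go has no built-in counterpart to the PL/I by name assignment. The following is only a minimal sketch of the idea using the reflect package; the helper copyByName and the struct types input and output are hypothetical names chosen for illustration and do not appear in the listings on this page.

<lang go>package main

import (
	"fmt"
	"reflect"
)

// input and output are hypothetical record layouts; only A and C are shared.
type input struct {
	A, B string
	C    int
	D    string
}

type output struct {
	A string
	C int
	X string
}

// copyByName copies each field of src into the field of dst that has the
// same name and an assignable type. dst must be a pointer to a struct with
// exported fields; fields without a match in src are left untouched.
func copyByName(dst, src interface{}) {
	d := reflect.ValueOf(dst).Elem()
	s := reflect.ValueOf(src)
	for i := 0; i < d.NumField(); i++ {
		f := s.FieldByName(d.Type().Field(i).Name)
		if f.IsValid() && f.Type().AssignableTo(d.Field(i).Type()) {
			d.Field(i).Set(f)
		}
	}
}

func main() {
	in := input{A: "AAA", B: "BBB", C: 3, D: "DDD"}
	out := output{X: "XXXXX"}
	copyByName(&out, in)
	fmt.Printf("%+v\n", out) // A and C come from in; X keeps its initial value
}</lang>

Unlike PL/I, this sketch performs no data conversion: a field whose type differs between the two structures is simply skipped.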

COBOL

Seeing as this task mentioned COBOL, here is a sample, but there is no JSON involved, just a simple data record layout and a MOVE CORRESPONDING.

Works with: GnuCOBOL

This is GnuCOBOL on a GNU/Linux system, so let's start with a text file holding fields A through D of 5 bytes each, in lines of up to 21 characters counting the newline. It is then converted into a more COBOLish fixed length form of 20 bytes per record with no newlines.

prompt$ cat selective-input-file.txt
A    B    0001+D
AA   BB   0002+DD
AAA  BBB  0003+DDD
AAAA BBBB 0004-DDDD
AAAAABBBBB0005-DDDDD

prompt$ dd if=selective-input-file.txt of=selective-input-file cbs=20 conv=block
0+1 records in
0+1 records out
100 bytes (100 B) copied, 0.000451344 s, 222 kB/s

There are no newlines in the SEQUENTIAL access file now; just five 20 byte records. The numeric field-c is explained below.
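
For comparison, here is a rough Go sketch of reading such a blocked file: fixed 20 byte chunks with no line terminators, with the fields sliced out by offset. The function name readRecords and the in-memory sample data are assumptions made for illustration; a real program would pass an opened file instead of a strings.Reader.

<lang go>package main

import (
	"fmt"
	"io"
	"strings"
)

const recLen = 20 // record length produced by dd cbs=20 conv=block

// readRecords splits r into fixed length records of recLen bytes.
// A trailing partial record is reported as an error.
func readRecords(r io.Reader) ([]string, error) {
	var recs []string
	buf := make([]byte, recLen)
	for {
		_, err := io.ReadFull(r, buf)
		if err == io.EOF {
			return recs, nil // clean end of input on a record boundary
		}
		if err != nil {
			return recs, err // io.ErrUnexpectedEOF for a short record
		}
		recs = append(recs, string(buf))
	}
}

func main() {
	// Stand-in for the blocked file: two space padded records, no newlines.
	data := fmt.Sprintf("%-20s%-20s", "A    B    0001+D", "AAAAABBBBB0005-DDDDD")
	recs, err := readRecords(strings.NewReader(data))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range recs {
		// Fields A to D occupy bytes 0-4, 5-9, 10-14 and 15-19.
		fmt.Printf("a=%q b=%q c=%q d=%q\n", rec[0:5], rec[5:10], rec[10:15], rec[15:20])
	}
}</lang>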

Copy books

In the selective-copy.cob example, an inline REPLACE directive takes the place of COPY directives with input and output record layout copy books, to keep the source file self-contained. The data between the "==" markers after the BY phrase would normally be saved in a separate file and read in during compilation with COPY filename. Those copy books could then be used by any source module that needs the record layout. In this example, the :INPUT-RECORD: and :OUTPUT-RECORD: pseudo-text markers are where the COPY directives would be placed.

Far more complicated record hierarchies can be created with COBOL data level groupings than the one demonstrated here. An item with a higher level number belongs to the group started by the nearest preceding item with a lower level number. Each group can be treated as a single unit, or its fields can be addressed separately, down to the highest numbered elementary items. Levels from 01 to 49 can be used to create data record layouts, with subscripted repeats (arrays) allowed at almost all levels. In this example only levels 01 and 05 are used.

<lang COBOL>
      01 ws-input-record.
      :INPUT-RECORD:

</lang> will become

<lang COBOL>
      01 ws-input-record.
         05 field-a    pic x(5).
         05 field-b    pic x(5).
         05 field-c    pic s9(4) sign is trailing separate.
         05 field-d    pic x(5).

</lang> after the REPLACE preprocessor directive is finished.

And finally, the example selective-copy.cob <lang COBOL>

     *> Tectonics:
     *>   cobc -xj selective-copy.cob
     *>   cobc -xjd -DSHOWING selective-copy.cob
     *> ***************************************************************
      identification division.
      program-id. selective-copy.

      environment division.
      configuration section.
      repository.
          function all intrinsic.

      input-output section.
      file-control.
          select input-file
          assign to input-filename
          organization is sequential
          status is input-status.

          select output-file
          assign to output-filename
          organization is sequential
          status is output-status.

     *> emulate a COPY book, with an inline REPLACE

      REPLACE ==:INPUT-RECORD:== BY
      ==
         05 field-a    pic x(5).
         05 field-b    pic x(5).
         05 field-c    pic s9(4) sign is trailing separate.
         05 field-d    pic x(5).
      ==

      ==:OUTPUT-RECORD:== BY
      ==
         05 field-a    pic x(5).
         05 field-c    pic ----9.
         05 field-x    pic x(5).
      ==.

      data division.
      file section.
      fd input-file.
      01 fd-input-record.
      :INPUT-RECORD:

      fd output-file.
      01 fd-output-record.
      :OUTPUT-RECORD:

      working-storage section.
      01 input-filename.
         05 filler            value "selective-input-file".
      01 input-status         pic xx.
         88 ok-input          values '00' thru '09'.
         88 eof-input         value '10'.
      01 ws-input-record.
      :INPUT-RECORD:

      01 output-filename.
         05 filler            value "selective-output-file".
      01 output-status        pic xx.
         88 ok-output         values '00' thru '09'.
         88 eof-output        value '10'.
      01 ws-output-record.
      :OUTPUT-RECORD:

      77 file-action          pic x(11).

      77 math pic s9(5).

     *> ***************************************************************
      procedure division.
      main-routine.
      perform open-files

      perform read-input-file
      perform until eof-input
      >>IF SHOWING IS DEFINED
          display "input  :" ws-input-record ":"
      >>END-IF
          move corresponding ws-input-record to ws-output-record
          move "XXXXX" to field-x in ws-output-record
          perform write-output-record
          perform read-input-file
      end-perform

      perform close-files
      goback.
      
     *> ***************************************************************
      open-files.
      open input input-file
      move "open input" to file-action
      perform check-input-file

      open output output-file
      move "open output" to file-action
      perform check-output-file
      .

     *> **********************
      read-input-file.
      read input-file into ws-input-record
      move "reading" to file-action
      perform check-input-with-eof
      .

     *> **********************
      write-output-record.
      write fd-output-record from ws-output-record
      move "writing" to file-action
      perform check-output-file
      >>IF SHOWING IS DEFINED
          display "output :" ws-output-record ":"
      >>END-IF
      .

     *> **********************
      close-files.
      close input-file output-file
      perform check-input-with-eof
      perform check-output-file
      .

     *> **********************
      check-input-file.
      if not ok-input then
          perform input-file-error
      end-if
      .

     *> **********************
      check-input-with-eof.
      if not ok-input and not eof-input then
          perform input-file-error
      end-if
      .

     *> **********************
      input-file-error.
      display "error " file-action space input-filename
              space input-status upon syserr
      move 1 to return-code
      goback
      .

     *> **********************
      check-output-file.
      if not ok-output then
          display "error " file-action space output-filename
                  space output-status upon syserr
          move 1 to return-code
          goback
      end-if
      .

      end program selective-copy.</lang>

The output file has no newlines, so normal cat type commands are not of much use. We turn to the dd command again, this time with an unblock conversion.

Output:

prompt$ cobc -xj selective-copy.cob
prompt$ dd if=selective-output-file cbs=15 conv=unblock
A        1XXXXX
AA       2XXXXX
AAA      3XXXXX
AAAA    -4XXXXX
AAAAA   -5XXXXX
0+1 records in
0+1 records out
80 bytes (80 B) copied, 0.000173387 s, 461 kB/s

80 bytes output, 5 records of 15 bytes each, with a newline added to each of the 5 records. The dd option status=none will turn off the statistics report, making the output more cat like, just showing the records as lines of data on standard out.

Another wrinkle is the numeric values in field-c. Normally COBOL will store USAGE DISPLAY numerics using one byte per digit, with an overpunch for the sign of a value, when saving to or reading from disk or memory. A sign bit is or'ed into one of the digits, either first or last, depending on platform and compile time options. This can be overridden in source code with SIGN IS LEADING (or TRAILING) SEPARATE. It was set to TRAILING SEPARATE in this example, making for a 4 digit number followed by a sign character in the 5 character field.
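
To make the trailing separate sign concrete, here is a small Go sketch that converts a field such as "0004-" into an integer. The function name parseTrailingSeparate is hypothetical, and the sketch deliberately does not handle overpunched signs.

<lang go>package main

import (
	"fmt"
	"strconv"
)

// parseTrailingSeparate converts a DISPLAY field with a trailing separate
// sign, such as "0004-", into an int. The last byte must be '+' or '-' and
// all bytes before it must be decimal digits.
func parseTrailingSeparate(field string) (int, error) {
	if len(field) < 2 {
		return 0, fmt.Errorf("field %q is too short", field)
	}
	digits, sign := field[:len(field)-1], field[len(field)-1]
	if sign != '+' && sign != '-' {
		return 0, fmt.Errorf("field %q has no trailing sign", field)
	}
	// ParseUint rejects embedded signs and spaces in the digit part.
	u, err := strconv.ParseUint(digits, 10, 32)
	if err != nil {
		return 0, err
	}
	n := int(u)
	if sign == '-' {
		n = -n
	}
	return n, nil
}

func main() {
	for _, f := range []string{"0001+", "0004-"} {
		n, err := parseTrailingSeparate(f)
		fmt.Println(f, n, err)
	}
}</lang>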

The output record then converts this field-c input form to a more human friendly NUMERIC-EDITED format. This version of field-c is no longer NUMERIC in a disk/memory sense, as the spaces in the field are not numerical values. Spaces are not digits in COBOL raw form, and are not assumed to be zeroes. Financial institutions seem to like it that way.
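
The effect of the ----9 picture on small values can be approximated in Go with a width specifier. This is a sketch under the assumption that values have at most four digits, as in this example; editNumeric is a hypothetical name, and COBOL itself would truncate rather than reject larger values.

<lang go>package main

import "fmt"

// editNumeric mimics the picture ----9 for values of up to four digits:
// right justified in five characters, leading zeros shown as spaces, and
// a minus sign placed directly before the first digit of negative values.
func editNumeric(n int) (string, error) {
	if n < -9999 || n > 9999 {
		return "", fmt.Errorf("%d does not fit picture ----9", n)
	}
	return fmt.Sprintf("%5d", n), nil
}

func main() {
	for _, n := range []int{1, -4, 0} {
		s, _ := editNumeric(n)
		fmt.Printf("[%s]\n", s) // brackets make the padding visible
	}
}</lang>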

An alternative would be COMPUTATIONAL (or USAGE BINARY) storage format, in which case the values would not look like text at all, just as happens when C saves an int value directly to disk or memory. During terminal output with a DISPLAY statement the numbers look like text, but the memory representation would be computational bit patterns.
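
The difference between the two storage forms can be seen by dumping the bytes. The Go sketch below is illustrative only: asBinary and asDisplay are hypothetical helpers, a 4 byte field is assumed, and big-endian byte order is an assumption, since the actual order depends on platform and compiler.

<lang go>package main

import (
	"encoding/binary"
	"fmt"
)

// asBinary returns n as a 4 byte two's complement field.
// Big-endian order is an assumption made for this sketch.
func asBinary(n int32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, uint32(n))
	return b
}

// asDisplay returns n as four text digits followed by a separate sign,
// the layout used for field-c in the input record.
func asDisplay(n int32) string {
	sign := '+'
	if n < 0 {
		sign = '-'
		n = -n
	}
	return fmt.Sprintf("%04d%c", n, sign)
}

func main() {
	for _, n := range []int32{5, -5} {
		// Print both forms, with the raw bytes in hexadecimal.
		fmt.Printf("%3d  binary: % x  display: %q (% x)\n",
			n, asBinary(n), asDisplay(n), asDisplay(n))
	}
}</lang>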

Go

JSON is popular these days for structured data and Go has support for this kind of selective copy in the JSON reader. The reader also supports custom conversions on fields. The common idiom for record construction and field initialization is shown. <lang go>package main

import (

   "encoding/json"
   "log"
   "os"
   "bytes"
   "errors"
   "strings"

)

// structure of test file, generated and then used as input.
type s1 struct {

   A string
   B string
   C int
   D string

}

// structure of output file
type s2 struct {

   A string
   C intString // it's a string, but unmarshals from a JSON int.
   X string

}

type intString string

func (i *intString) UnmarshalJSON(b []byte) error {

   if len(b) == 0 || bytes.IndexByte([]byte("0123456789-"), b[0]) < 0 {
       return errors.New("Unmarshal intString expected JSON number")
   }
   *i = intString(b)
   return nil

}

// "constructor" initializes X
func NewS2() *s2 {

   return &s2{X: "XXXXX"}

}

func main() {

   // generate test file
   o1, err := os.Create("o1.json")
   if err != nil {
       log.Fatal(err)
   }
   e := json.NewEncoder(o1)
   for i := 1; i <= 5; i++ {
       err := e.Encode(s1{
           strings.Repeat("A", i),
           strings.Repeat("B", i),
           i,
           strings.Repeat("D", i),
       })
       if err != nil {
           log.Fatal(err)
       }
   }
   o1.Close()

   // reopen the test file, also open output file
   in, err := os.Open("o1.json")
   if err != nil {
       log.Fatal(err)
   }
   out, err := os.Create("out.json")
   if err != nil {
       log.Fatal(err)
   }
   // copy input to output, streaming
   d := json.NewDecoder(in)
   e = json.NewEncoder(out)
   for d.More() {
       // a little different than the PL/I example.  PL/I reads into s1, then
       // does the selective copy in memory.  The Go JSON reader can read the
       // s1 formatted JSON directly into the s2 Go struct without needing any
       // intermediate s1 struct.
       s := NewS2()
       if err = d.Decode(s); err != nil {
           log.Fatal(err)
       }
       if err = e.Encode(s); err != nil {
           log.Fatal(err)
       }
   }

}</lang>

o1.json:

{"A":"A","B":"B","C":1,"D":"D"}
{"A":"AA","B":"BB","C":2,"D":"DD"}
{"A":"AAA","B":"BBB","C":3,"D":"DDD"}
{"A":"AAAA","B":"BBBB","C":4,"D":"DDDD"}
{"A":"AAAAA","B":"BBBBB","C":5,"D":"DDDDD"}

out.json:

{"A":"A","C":"1","X":"XXXXX"}
{"A":"AA","C":"2","X":"XXXXX"}
{"A":"AAA","C":"3","X":"XXXXX"}
{"A":"AAAA","C":"4","X":"XXXXX"}
{"A":"AAAAA","C":"5","X":"XXXXX"}

Java

With a little help from my friends <lang java>import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class CopysJ {

 public static void main(String[] args) {
   String ddname_IN  = "copys.in.txt";
   String ddname_OUT = "copys.out.txt";
   if (args.length >= 1) { ddname_IN  = args[0].length() > 0 ? args[0] : ddname_IN; }
   if (args.length >= 2) { ddname_OUT = args[1].length() > 0 ? args[1] : ddname_OUT; }

   File dd_IN = new File(ddname_IN);
   File dd_OUT = new File(ddname_OUT);

   try (
     Scanner scanner_IN = new Scanner(dd_IN);
     BufferedWriter writer_OUT = new BufferedWriter(new FileWriter(dd_OUT))
     ) {
     String a;
     String b;
     String c;
     String d;
     String c1;
     String x = "XXXXX";
     String data_IN;
     String data_OUT;
     int ib;

     while (scanner_IN.hasNextLine()) {
       data_IN = scanner_IN.nextLine();
       ib = 0;
       a = data_IN.substring(ib, ib += 5);
       b = data_IN.substring(ib, ib += 5);
       c = data_IN.substring(ib, ib += 4);
       c1=Integer.toHexString(new Byte((c.getBytes())[0]).intValue());
       if (c1.length()<2) { c1="0" + c1; }
       data_OUT = a + c1 + x;
       writer_OUT.write(data_OUT);
       writer_OUT.newLine();
       System.out.println(data_IN);
       System.out.println(data_OUT);
       System.out.println();
     }
   }
   catch (IOException ex) {
     ex.printStackTrace();
   }
   return;
 }

}</lang>

NetRexx

With a little help from a friend <lang netrexx>/* NetRexx */
-- nrc -keepasjava -savelog copys
options replace format comments java crossref symbols nobinary

parse arg ddname_IN ddname_OUT .
do

 if ddname_IN.length = 0 then ddname_IN = 'copys.in.txt'
 if ddname_OUT.length = 0 then ddname_OUT = 'copys.out.txt'

 dd_IN = File(ddname_IN)
 dd_OUT = File(ddname_OUT)
 scanner_IN = Scanner(dd_IN)
 writer_OUT = BufferedWriter(FileWriter(dd_OUT))

 x = 'XXXXX'
 loop while scanner_IN.hasNextLine()
   data_IN = scanner_IN.nextLine()
   parse data_IN a +5 . /* b */ +5 c +4 . /* d */
   cc=c.left(1).c2x
   data_OUT = a || cc.right(2,0) || x
   writer_OUT.write(data_OUT)
   writer_OUT.newLine()
   end

catch ex = IOException

 ex.printStackTrace()

finally

 do
   if scanner_IN \= null then scanner_IN.close()
   if writer_OUT \= null then writer_OUT.close()
 catch ex = IOException
   ex.printStackTrace()
 end

end</lang>

ooRexx

<lang oorexx>/* REXX */
infile ="in.txt"
outfile="out.txt"

s1=.copys~new(infile,outfile)
loop i=1 to 5
  s1~~input~~output
end
s1~close -- close streams (files)
'type' outfile

::class copys
::attribute a
::attribute b
::attribute c
::attribute d

::method init -- constructor
  expose instream outstream
  parse arg infile, outfile
  instream =.stream~new(infile)~~open
  outstream=.stream~new(outfile)~~open("replace")

::method input -- read an input line
  expose instream a b c d
  parse value instream~linein with a +5 b +5 c +5 d +5

::method output -- write an output line
  expose outstream a c
  outstream~lineout(a || c~c2x~left(2)'XXXXX')

::method close -- close files
  expose instream outstream
  instream~close
  outstream~close</lang>

Output:

AA   01XXXXX
AAA  02XXXXX
AAAA 03XXXXX
AAAAA04XXXXX
AAAAA05XXXXX

PL/I

<lang pli>*process source attributes xref or(!);

copys: Proc Options(Main);
Dcl 1 s1 unal,
     2 a Char(5),
     2 b Char(5),
     2 c Bin Fixed(31),
     2 d Char(5);
Dcl 1 s2,
     2 a Char(5),
     2 c Pic'99',
     2 x Char(5) Init('XXXXX');
Dcl o1  Record Output;   /* create a test file */
Dcl in  Record Input;
Dcl out Record Output;
Do i=1 To 5;
  s1.a=repeat('A',i);
  s1.b=repeat('B',i);
  s1.c=i;
  s1.d=repeat('D',i);
  Write File(o1) From(s1);
  End;
Close File(o1);

On Endfile(in) Goto eoj;
Do i=1 By 1;             /* copy parts of the test file    */
  Read File(in) Into(s1);
  s2=s1, by name;        /* only fields a and c are copied */
  Write File(out) From(s2);
  End;
eoj:
End;

</lang>

Output:

AA   01XXXXX
AAA  02XXXXX
AAAA 03XXXXX
AAAAA04XXXXX
AAAAA05XXXXX

REXX

Translation of: PL/I

<lang rexx>in='in.txt'
out='out.txt'; 'erase' out
Do While lines(in)>0
 l=linein(in)
 Parse Var l a +5 b +5 c +4 d +5
 chex=c2x(c)
 cpic=left(chex,2)
 call lineout out,a||cpic||'XXXXX'
 End
Call lineout in
Call lineout out
'type' out</lang>

Output:

Using the test file produced by PL/I. The data conversion used for c is not very general!

AA   01XXXXX
AAA  02XXXXX
AAAA 03XXXXX
AAAAA04XXXXX
AAAAA05XXXXX
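
As the note above says, taking the first two hex digits of c only works for small values: for c = 10 it would print 0A rather than 10. A more general conversion decodes the whole binary field and then formats it as two decimal digits, in the spirit of the Pic'99' target. The Go sketch below is an assumption-laden illustration: decodeC is a hypothetical name, and little-endian byte order is assumed only because the output above shows the low-order byte first.

<lang go>package main

import (
	"encoding/binary"
	"fmt"
)

// decodeC interprets a 4 byte binary field as a signed 32 bit integer and
// formats it as two decimal digits. Little-endian order is an assumption
// of this sketch; values outside 0 to 99 are rejected instead of truncated.
func decodeC(field []byte) (string, error) {
	if len(field) != 4 {
		return "", fmt.Errorf("expected 4 bytes, got %d", len(field))
	}
	v := int32(binary.LittleEndian.Uint32(field))
	if v < 0 || v > 99 {
		return "", fmt.Errorf("%d does not fit two digits", v)
	}
	return fmt.Sprintf("%02d", v), nil
}

func main() {
	for _, f := range [][]byte{{1, 0, 0, 0}, {10, 0, 0, 0}} {
		s, err := decodeC(f)
		// Show the hex shortcut next to the decoded decimal value.
		fmt.Printf("first byte in hex: %02X  decoded: %s  err: %v\n", f[0], s, err)
	}
}</lang>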