Read a specific line from a file

Some languages have special semantics for obtaining a known line number from a file. The task is to demonstrate how to obtain the contents of a specific line within a file. For the purpose of this task, demonstrate how the contents of the seventh line of a file can be obtained, and store them in a variable or in memory (for potential future use within the program if the code were to become embedded). If the file does not contain seven lines, or the seventh line is empty or too big to be retrieved, output an appropriate message. If no special semantics are available for obtaining the required line, it is permissible to read line by line. Note that empty lines are considered and should still be counted. Note that for functional languages or languages without variables or storage, it is permissible to output the extracted data to standard output.

C

Mmap the file and search for the offsets of the requested line number. Since a mapped file really is memory, no extra storage step is needed once the offsets are found.
<lang c>#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <err.h>

/* The following code assumes all file operations succeed. In practice,
 * return codes from open, close, fstat, mmap, munmap all need to be
 * checked for error.
 */
int read_file_line(const char *path, int line_no)
{
    struct stat s;
    char *buf;
    off_t start = -1, end = -1;
    off_t i;
    int ln, fd, ret = 1;

    if (line_no == 1) start = 0;
    else if (line_no < 1) {
        /* warnx, not warn: errno is not meaningful here */
        warnx("line_no too small");
        return 0; /* line_no starts at 1; less is error */
    }

    line_no--; /* back to zero based, easier */

    fd = open(path, O_RDONLY);
    fstat(fd, &s);

    /* Map the whole file.  If the file is huge (up to GBs), OS will swap
     * pages in and out, and because search for lines goes sequentially
     * and never accesses more than one page at a time, penalty is low.
     * If the file is HUGE, such that OS can't find an address space to map
     * it, we got a real problem.  In practice one would repeatedly map small
     * chunks, say 1MB at a time, and find the offsets of the line along the
     * way.  Although, if file is really so huge, the line itself can't be
     * guaranteed small enough to be "stored in memory", so there.
     */
    buf = mmap(0, s.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* optional; if the file is large, tell OS to read ahead */
    madvise(buf, s.st_size, MADV_SEQUENTIAL);

    /* count newlines: the line starts after newline number line_no
     * and ends after newline number line_no + 1 */
    for (i = ln = 0; i < s.st_size && ln <= line_no; i++) {
        if (buf[i] != '\n') continue;

        if (++ln == line_no) start = i + 1;
        else if (ln == line_no + 1) end = i + 1;
    }

    if (start >= s.st_size || start < 0) {
        warnx("file does not have line %d", line_no + 1);
        ret = 0;
    } else {
        /* the last line of the file may lack a trailing newline */
        if (end < 0) end = s.st_size;

        /* do something with the line here, like
           write(STDOUT_FILENO, buf + start, end - start);
           or copy it out, or something */
    }

    munmap(buf, s.st_size);
    close(fd);

    return ret;
}</lang>

Icon and Unicon

The procedure readline uses repeated alternation (i.e. |read()) to generate the lines of the file one at a time and limitation (i.e. \ n) to limit the generation to n results. If the file is not large enough readline will fail.

While it is certainly possible to use seek to read a file at specific offsets without reading each line, with files that use line-feed-terminated, variable-length records something has to read the data to determine the 7th record. This solution uses a combination of repeated alternation and generation limiting to achieve this. The counter is there simply to discover whether there are enough records.

<lang Icon>procedure main()
   write(readline("foo.bar.txt",7)|"failed")
end

procedure readline(f,n)                          # return n'th line of file f
   f := open(\f,"r") | fail                      # open file
   every i := n & line := |read(f) \ n do i -:= 1   # <== here
   close(f)
   if i = 0 then return line
end</lang>

J

<lang j>readLine=: 4 :0

 (x-1) {:: <;.2 ] 1!:1 boxxopen y

)</lang>

Thus: <lang bash>$ cal 2011 > cal.txt</lang>

<lang j> 7 readLine 'cal.txt'

9 10 11 12 13 14 15  13 14 15 16 17 18 19  13 14 15 16 17 18 19

</lang>

Note that this code assumes that the last character in the file is the line end character, and that the line end character is a part of the line to be retrieved.

Tacit alternative
<lang j>require 'files'   NB. required for versions before J701
readLineT=: <:@[ {:: 'b'&freads@]</lang>
This is not quite equivalent to the code above, as it handles cross-platform line endings and removes the line end character(s) from the result.

Perl 6

<lang perl6>say lines[6] // die "Short file";</lang> Without an argument, the lines function reads filenames from the command line, or defaults to standard input. It then returns a lazy list, which we subscript to get the 7th element. Assuming this code is in a program called line7:

$ cal 2011 > cal.txt
$ line7 cal.txt
16 17 18 19 20 21 22  20 21 22 23 24 25 26  20 21 22 23 24 25 26  
$

This works even on infinite files because lists are lazy:

$ yes | line7
y
$
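
The same lazy idea can be sketched in Python for comparison. This is only an illustration under assumptions: the helper name `nth` is hypothetical and not part of the Perl 6 solution, and `itertools.count` merely stands in for an endless input stream.

```python
from itertools import count, islice


def nth(iterable, n):
    """Return the n-th item (1-based) of any iterable, or None if it is too short.

    islice consumes items lazily, so only the first n items are ever
    produced, even when the source never ends.
    """
    return next(islice(iterable, n - 1, n), None)


if __name__ == "__main__":
    # count(1) is an infinite iterator: 1, 2, 3, ...
    print(nth(count(1), 7))
```

Because nothing past the seventh item is requested, the call returns instead of trying to exhaust the stream.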

Python

<lang python>from itertools import islice

with open('xxx.txt') as f:
    # islice uses zero-based indices, so the seventh line is at index 6
    linelist = list(islice(f, 6, 7))
    assert linelist != [], 'Not 7 lines in file'
    line = linelist[0]</lang>
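
The task also asks for an appropriate message when the seventh line is missing or empty. One way to cover both cases is sketched below; the function names `get_line` and `describe_line` and the message wording are assumptions made for illustration, not part of the solution above.

```python
from itertools import islice


def get_line(path, n=7):
    """Return line n (1-based) without its newline, or None if the file is too short."""
    with open(path) as f:
        # islice lazily skips the first n-1 lines and yields at most one line
        for line in islice(f, n - 1, n):
            return line.rstrip('\n')
    return None


def describe_line(path, n=7):
    """Return the line itself, or a message when it is missing or empty."""
    line = get_line(path, n)
    if line is None:
        return 'File has fewer than %d lines' % n
    if line == '':
        return 'Line %d is empty' % n
    return line
```

Returning `None` for a short file keeps that case distinct from an empty line, which is returned as the empty string.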

Tcl

This code can deal with very large files with very long lines (up to 1 billion characters in a line should work fine, provided enough memory is available) and will return an empty string when the nth line is empty (as an empty line is still a valid line). <lang tcl>proc getNthLineFromFile {filename n} {

   set f [open $filename]
   while {[incr n -1] >= 0} {
       if {[gets $f line] < 0} {
           close $f
           error "no such line"
       }
   }
   close $f
   return $line

}

puts [getNthLineFromFile example.txt 7]</lang> Where it is necessary to provide very fast access to lines of text, it becomes sensible to create an index file describing the locations of the starts of lines so that the reader code can seek directly to the right location. This is rarely needed, but can occasionally be helpful.
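
The indexing idea can be sketched as follows, in Python rather than Tcl. This is a minimal illustration under assumptions: the names `build_line_index` and `read_indexed_line` are hypothetical, and the index is kept as an in-memory list of byte offsets, whereas a real index file would store those offsets on disk.

```python
def build_line_index(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    pos = 0
    with open(path, 'rb') as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)  # len() counts bytes because the file is opened in binary mode
    return offsets


def read_indexed_line(path, offsets, n):
    """Return line n (1-based) as bytes, without its line ending, by seeking to it."""
    if not 1 <= n <= len(offsets):
        raise IndexError('no such line')
    with open(path, 'rb') as f:
        f.seek(offsets[n - 1])
        return f.readline().rstrip(b'\r\n')
```

Building the index costs one full pass over the file. After that, each lookup is a single seek and one line read, so the approach only pays off when the same file is queried many times.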