Talk:Walk a directory/Recursively

From Rosetta Code

I think the precise statement of the problem is a little too restricted. In some cases it's possible to walk the directory tree, printing all the matching filenames with somewhat less code than it is to walk the tree in order to collect a list of the matches or to perform other operations on them. Also there are many criteria on which one might wish to select files beyond just file names. JimD 19:13, 15 October 2007 (MDT)

This is true. The instructions could be changed to call a function, but I've been hesitant to use that abstraction for the sake of simplicity. Really, I'd just like to leave a comment along the lines of /* do something here */ in the appropriate place, but I'm not sure how to word that. --Short Circuit 20:00, 15 October 2007 (MDT)
The current text of the task:
 Walk a given directory tree and print files matching a given pattern. 
My suggestion:
 Walk a given directory tree, calling a function for every filename which matches
 a given wildcard, UNIX glob, or regex pattern (whichever is easiest for the given language).
Sounds fine to me. Go ahead and make the change. I'll fill in what I know of globs and regex later.
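For reference, the reworded task could look roughly like this in Python. This is only a sketch: the name walk_matching and the choice of a shell-style glob (via fnmatch) are my own assumptions, not part of the task text, and the pattern is applied to the file name only, not the full path.

```python
import fnmatch
import os


def walk_matching(root, pattern, action):
    """Call action(path) for every file under root whose name matches pattern.

    walk_matching is a hypothetical helper name; pattern is a shell-style glob.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter tests each bare file name, not the joined path.
        for name in fnmatch.filter(filenames, pattern):
            action(os.path.join(dirpath, name))
```

Passing print as the action, as in walk_matching(".", "*.txt", print), gives back the original "print matching files" behaviour, so the old wording becomes a special case of the new one.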
Question: would we want to create a small set of more complex tree walking tasks which ask how one would do things like: follow (or refrain from following) symbolic links, refrain from crossing UNIX/Linux mount points, select files based on their stat() criteria (such as link count, dates, ownership, group association, permissions, etc) or on their contents?
I could see the components (creating/identifying symbolic links, creating hard links, getting link counts, getting create/modified dates, getting and setting file and directory ownership, getting and setting group association, getting and setting permissions and identifying mount points) as their own tasks. None of them are even UNIX-specific; even Windows supports symbolic and hard links. But building a find alternative is too complicated for your average task, or even a puzzle. --Short Circuit 21:54, 16 October 2007 (MDT)
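To give a feel for how much extra code the stat()-based selection would involve, here is a rough Python sketch. The name walk_selected, its predicate argument and the cross_devices flag are all hypothetical; comparing st_dev against the root's is one common way to avoid crossing mount points, assumed here rather than taken from any task description.

```python
import os
import stat


def walk_selected(root, predicate, cross_devices=False):
    """Yield regular files under root whose lstat() result satisfies predicate.

    walk_selected and its parameters are hypothetical names for illustration.
    """
    root_dev = os.lstat(root).st_dev
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        if not cross_devices:
            # Prune subdirectories that live on a different device,
            # which is how a mount point shows up in stat data.
            dirnames[:] = [
                d for d in dirnames
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
            ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe a symlink itself, not its target
            if stat.S_ISREG(st.st_mode) and predicate(st):
                yield path
```

A predicate such as lambda st: st.st_nlink > 1 would select files with more than one hard link; dates, ownership and permission bits are available on the same stat result.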

Is the problem to just find filenames, excluding the path, that match the pattern? That's usually the example, and in that case, many of the snippets here have bugs because they apply the regex to the entire path, not just the filename. (Unsigned comment added by 69.211.121.158 at 22:21, 24 October 2010)
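The difference is easy to demonstrate. A small Python sketch, with match_by_name and match_by_path as made-up helper names and the two sample paths invented for illustration:

```python
import os
import re


def match_by_name(paths, regex):
    """Keep paths whose final component (the file name) matches regex."""
    pattern = re.compile(regex)
    return [p for p in paths if pattern.search(os.path.basename(p))]


def match_by_path(paths, regex):
    """Keep paths where regex matches anywhere in the whole path."""
    pattern = re.compile(regex)
    return [p for p in paths if pattern.search(p)]


# Hypothetical sample paths: one has "log" only in a directory name.
paths = ["logs/readme.md", "src/logs.py"]
print(match_by_name(paths, "log"))  # only src/logs.py
print(match_by_path(paths, "log"))  # both paths
```

A snippet that matches against the whole path reports logs/readme.md even though the file name itself contains no "log".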

Does someone want to clarify the task description? If we're walking a directory tree, we would normally be considering entries within each directory we examine. It may also be worthwhile setting up an example directory structure with anticipated results. (Particularly considering things like matches against directory names, such as a search for *.txt, and there being a 'files.txt' directory in the tree.) --Michael Mol 13:09, 25 October 2010 (UTC)
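One possible shape for such an example tree, sketched in Python. The layout (top.txt plus a directory named files.txt holding inner.txt and notes.md) and the helper names build_example_tree and find_entries are assumptions of mine, not an agreed test fixture.

```python
import fnmatch
import os
import tempfile


def build_example_tree(root):
    """Create a hypothetical tree containing a directory named files.txt."""
    os.makedirs(os.path.join(root, "files.txt"))
    for rel in ("top.txt", "files.txt/inner.txt", "files.txt/notes.md"):
        with open(os.path.join(root, *rel.split("/")), "w"):
            pass


def find_entries(root, pattern, include_dirs=False):
    """Return sorted '/'-separated relative paths of entries matching pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = (filenames + dirnames) if include_dirs else filenames
        for name in fnmatch.filter(names, pattern):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            hits.append(rel.replace(os.sep, "/"))
    return sorted(hits)


with tempfile.TemporaryDirectory() as root:
    build_example_tree(root)
    print(find_entries(root, "*.txt"))
    # ['files.txt/inner.txt', 'top.txt']
    print(find_entries(root, "*.txt", include_dirs=True))
    # ['files.txt', 'files.txt/inner.txt', 'top.txt']
```

Whether the directory files.txt belongs in the anticipated results is exactly the point the task description would need to settle.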

symlinks?

Just curious, how many of these hand-rolled solutions can deal with a symlink (or hardlink) to a higher directory (i.e. cyclic graph)? If encountering one, would it bail with "pathname too long", or loop until memory exhausted? --Ledrug 04:56, 13 June 2011 (UTC)

In Python, the full docs for os.walk show that by default, symlinks are not followed. There is an optional parameter that allows symlinks to be followed and a banner note states:
Note: Be aware that setting followlinks to True can lead to infinite recursion if a link points to a parent directory of itself. walk() does not keep track of the directories it visited already.
There is also a note and warning about using relative pathnames and the assumption that code will not change the current directory during calls to os.walk. --Paddy3118 06:04, 13 June 2011 (UTC)
Yes, I have no doubt a proper library would have thought about it. But some of the code samples didn't use a library and just used recursion, which can have funny results. I guess it's ok for examples here, though. And I don't know why I said "harlink" above, bah. --Ledrug 06:57, 13 June 2011 (UTC)
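For what it's worth, a hand-rolled recursive walk can guard against such cycles by remembering which directories it has already entered. A minimal Python sketch, assuming a platform where (st_dev, st_ino) identifies a directory uniquely; walk_following_links is a hypothetical name, and this is not how os.walk itself behaves, since os.walk keeps no such record.

```python
import os


def walk_following_links(root, visit, _seen=None):
    """Call visit(path) for each non-directory entry, following symlinked
    directories but entering each real directory at most once.

    Hypothetical helper; _seen is internal state shared across recursion.
    """
    seen = set() if _seen is None else _seen
    st = os.stat(root)  # os.stat follows symlinks, so links resolve to their target
    key = (st.st_dev, st.st_ino)
    if key in seen:
        return  # already entered via another path: a cycle or a duplicate link
    seen.add(key)
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=True):
                walk_following_links(entry.path, visit, seen)
            else:
                visit(entry.path)
```

With a symlink inside the tree pointing back at the root, the recursion stops at the link instead of descending until the path gets too long.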