Talk:Make a backup file

From Rosetta Code

Assumes unix

This task assumes unix, for example: "keep in mind symlinks". (So presumably it's legal to use os-specific code that would fail on windows?) --Rdm 19:43, 9 November 2011 (UTC) Then again, it specifies "avoid external commands" so presumably this means that libc should not be used from non-C languages? I'm not sure which requirements take precedence over which other requirements here. --Rdm 19:46, 9 November 2011 (UTC)

It looks like later versions of Windows do have symbolic links, but I never used symbolic links very much on any platform, so I don't know how important the operational differences are. --Mwn3d 19:50, 9 November 2011 (UTC)
Interesting, and they seem to work. And the "It is assumed" part of this task presumably means that if the program needs administrative privileges, it can be assumed to have them (I do not know what privileges are needed by default to rename or delete these links but administrator is needed by default to create them). --21:53, 9 November 2011 (UTC)
yes, i don't want to introduce another set of complications. and i certainly hope that links in windows can be manipulated without admin privileges though. but actually, the link itself does not need to be manipulated, only the target of the link.--eMBee 03:11, 10 November 2011 (UTC)
the task is written towards unix because that's all i know. :-) if you can help make it more generic, then i'd appreciate that. one of the points is to work out quirks when you deal with renaming files that actually point to a different location.
libc may be used if it is linked to your language runtime or into the executable. what should be avoided is things like exec or popen which fork an external command, and even more so try not to use things like system which execute a shell to run the string you pass. the last one might make the task be a mere wrapper around any UNIX Shell entry.
the reason for avoiding exec is that in some environments executing other commands has its own set of problems. path issues, security, or even simply availability, and also portability. (a solution that executes mv is going to be harder to port to windows than one that uses some libc rename function)--eMBee 03:11, 10 November 2011 (UTC)
You did not address dynamic loading at all (dlopen, ...). Also, you did not give enough detail for me to decide what to do about the case where file rename is implemented in a utility which comes with the language and which uses mv on unix and MoveFileW from kernel32 on windows. --Rdm 14:31, 10 November 2011 (UTC)
i think dlopen is ok, i am not sure though, one question i am interested in is: can i deploy a program without relying on dependencies that i can not control? what if a user wants to run the program in a chroot or jail environment where mv might be missing. dlopen would be ok here only if the library to be opened somehow comes with the language, like it is a standard dependency of the language (as opposed to a dependency of just this task). this partly answers the case where the language implements rename using mv because in such a case mv would also be a standard dependency of the language. although i still would like to prefer a version that doesn't rely on external tools and processes.
in any case, if you can not avoid running an external process or if you dlopen a 3rd party library then please point this out in the description.--eMBee 17:07, 10 November 2011 (UTC)
Hmm... this warrants some thought: libc is the portable (documented) interface to the unix kernel. It can hypothetically be a static library but that is extremely rare nowadays -- almost everything requires an external libc. That said, there's also the question of "which libc", and the one used at build time is probably the right answer to that question (there will be a hard coded path in the executable which references libc for almost every working program out there in unix land). Or, that's how I would like to characterize the problem. And I think this thinking routinely applies in most all chroots. --Rdm 17:14, 10 November 2011 (UTC)
sure, libc is dynamically linked in most cases, but only in a few cases would you access it manually with dlopen.--eMBee 17:40, 10 November 2011 (UTC)
Ok, but one of those cases would be an interpreter which was designed to be portable across a variety of platforms. Here, you might have a core that gets you running and then everything else is done in the interpreter. That said, I can see an argument for providing special case support for libc on unix platforms. --Rdm 17:53, 10 November 2011 (UTC)
as i said above, if the core language implements rename/move using external commands then those commands become a direct dependency of the language and are ok to use. presumably such a language will rely on external commands for other things as well and thus it doesn't make much sense to avoid one and leave the others. in a situation where external commands are not allowed by policy, such a language would not be usable anyways. the limitations should only apply to languages where a portable method is not already available and different options could be chosen. in that case the choice should be made according to the restrictions given.--eMBee 03:11, 11 November 2011 (UTC)
Note that "rename is atomic" assumes unix (or maybe a recent version of windows and an appropriate file system). --Rdm 14:14, 14 November 2011 (UTC)
true, but it is only stated as an advantage not a requirement for this task. even without being atomic rename is cheaper and thus less likely to fail...--eMBee 14:42, 14 November 2011 (UTC)
Note that this still assumes unix -- here's some examples illustrating this point: http://stackoverflow.com/questions/7147577/programmatically-rename-open-file-on-windows and http://stackoverflow.com/questions/1261269/how-to-open-file-in-windows-while-not-blocking-its-renaming --Rdm (talk) 21:33, 17 May 2013 (UTC)

why external commands are bad

the motivation to avoid external commands can be illustrated by an experience i had just recently: on a website a framework uses cvs to manage changes to the contents. yesterday i wanted to add something to that site, and i was presented with this error: fork() failed with ENOMEM. Out of memory?. draw your own conclusions...--eMBee 03:18, 11 November 2011 (UTC)

Why no copying?

Backup involves copying, and must do since otherwise it is the same file and will be modified by the subsequent update. (Or alternatively it has to have some very special support from the OS; there's no POSIX operation for “checkpoint this file to this other name without copying” IIRC.) The whole strength of backups comes from copying. –Donal Fellows 15:42, 11 November 2011 (UTC)

This depends on the OS and on the pattern of accesses applications use on the file. Under unix, if anything has the file open for writing, then renaming it means they will update the backup.
this is of course a concern, but only if multiple processes deal with a file, which is not the concern of this task. also, if a copy of a file is made while another process writes to it, the problems are not any less.--eMBee 16:06, 11 November 2011 (UTC)
But if everything uses the "rename and write new copy" system, then it can be safe (though, of course, there's also the issue of more recent backups overwriting older backups). --Rdm 15:48, 11 November 2011 (UTC)
It seems faster to just rename the file. With copying it goes like this: create the new file (.backup), copy the contents of the old file to the new file, clear the old file, write new data to the old file. Without it goes like this: rename the old file to a new name (.backup), create the old file again (already empty), write new data to the newly created file (with the old name). --Mwn3d 15:52, 11 November 2011 (UTC)
good question. thanks for asking. copying is more expensive than rename. copying can fail (due to lack of space for example). if the machine dies before the copied file is written to the disk, which may be some time after the OS signaled that the copy is complete, and you already started to write to the old file, then both may be lost. rename guarantees that the data is not touched, and thus can hardly be corrupted. and i don't think a rename could cause a file to be deleted if the machine crashes while a rename happens. it's either got the old name or the new one (or in very obscure situations maybe both). as far as i can tell, rename() is posix. at least the rename(2) manpage makes that claim. it is atomic too...--eMBee 16:06, 11 November 2011 (UTC)
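The rename-then-recreate sequence described above could be sketched roughly as follows in Python. This is only an illustration, not one of the task's solutions: the function name `backup_and_write` and the `.backup` suffix are assumptions made for the example. It uses `os.replace`, which calls the platform's rename primitive without running an external command and overwrites an older backup of the same name.

```python
import os

def backup_and_write(path, data, suffix=".backup"):
    """Rename the existing file to path + suffix, then write data to a new file at path.

    Hypothetical helper for illustration; raises FileNotFoundError if path is missing.
    """
    backup = path + suffix
    # The old contents are never copied or rewritten, only given a new name.
    os.replace(path, backup)
    # The original name is now free, so this creates a fresh, empty file.
    with open(path, "w") as f:
        f.write(data)
    return backup
```

Because the old data is only renamed, a failure while writing the new file leaves the backup untouched. The sketch does not address the case discussed above where another process still has the file open.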

No existing file

"Some examples on this page assume that the original file already exists. They might fail if some user is trying to create a new file." So, is it a task requirement that solutions should simply create a new file if there is no existing file? That is not something I would read into "In this task you should create a backup file from an existing file..." If this case is desired it should be added as one of the bullet points. —Sonia 23:11, 16 February 2012 (UTC)

if the file does not exist, it should not be created. it would be nice if the code would fail gracefully if the file is missing, but i don't think this is necessary. it is just a code snippet to solve a particular problem. i'd expect developers to adapt the code if their situation is slightly different. Ensure that a file exists solves that part for example. no need to repeat it here.--eMBee 03:07, 17 February 2012 (UTC)
Oh good. I'll change the solutions I just posted. —Sonia 03:13, 17 February 2012 (UTC)
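For what it's worth, a "fail gracefully, create nothing" variant might look something like this in Python. It is a sketch under assumptions: the name `make_backup`, the `.backup` suffix, and the choice to return `None` for a missing file are all made up for the example and are not task requirements.

```python
import os

def make_backup(path, suffix=".backup"):
    """Rename path to path + suffix and return the backup name.

    Hypothetical helper: returns None if path does not exist, creating nothing.
    """
    backup = path + suffix
    try:
        os.replace(path, backup)
    except FileNotFoundError:
        # Nothing to back up; the missing file is deliberately not created.
        return None
    return backup
```

A caller that needs the file to exist beforehand can handle that separately, as noted above.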

Follow symlinks

FWIW, following symlinks seems like a really bad idea. It's fine in the context of something like Emacs (which sounds like a possible motivation) with a feature to visit files under their "real" names, but in those cases the user is usually aware of the new name via the UI (as in getting a different buffer name). But for a script this is just wrong, since you get a script which works in a way that can change in the presence of symlinks -- and the whole point of symlinks is to get things to work even when a file is elsewhere. I think that it would be better to simplify this by ignoring symlinks completely, and introduce a separate task for resolving symlinks. --Elibarzilay (talk) 21:01, 17 May 2013 (UTC)

Indeed. A simple application reading/writing files should not normally care (or check) if they are reading/writing via symlinks. The Go code for example is broken since it blindly assumes that any symlink doesn't point to another symlink. There are far too many ways to screw it up unless you really know what you're doing and you really understand symlinks (and how any specific user might choose to use them and want them to behave). IMO it shouldn't be an application's job to make file backups at all (except perhaps as an optional "feature" of an editor or some such; and for example editors like vim have a lot of options related to this so it will do what a user wants; assuming you can blindly look up where a symlink points to and mess around in that directory is just bad). —dchapes (talk | contribs) 14:00, 6 September 2014 (UTC)
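If a solution does follow symlinks as the task currently asks, the chain problem mentioned above (a symlink pointing to another symlink) can be avoided by resolving the whole chain rather than reading a single link. A minimal Python sketch, assuming a hypothetical helper named `backup_target` and a `.backup` suffix; it says nothing about whether following symlinks is a good idea in the first place:

```python
import os

def backup_target(path, suffix=".backup"):
    """Resolve path through any chain of symlinks and rename the final target.

    Hypothetical helper for illustration; the symlinks themselves are left untouched.
    """
    # realpath follows every link in the chain, not just the first one.
    target = os.path.realpath(path)
    backup = target + suffix
    os.replace(target, backup)
    return target, backup
```

The backup ends up in the directory of the real file, which is exactly the behaviour being questioned in this section, so a caller would need to be sure that is what the user wants.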

Atomicity

After coming back to this task... the requirements (stated requirements and to some degree implied requirements) stumble over the OS's support for atomic operations on a file system.

If atomicity is not an issue (if it's understood that the backup process may produce unintended consequences when some other mechanism is manipulating one or more of the path names being used to "backup" the file), the task is fairly straightforward.

If it is an issue, then all sorts of problems arise (for example, the file in question is on a network file system ...).

In a "real life" context this requires some sort of external attention (and redundancy -- backups being just one form of redundancy) to catch and recover from the occasional failures. Depending on the context, we wind up with quite a variety of cost/benefit issues.

So this winds up being a "best effort" problem, and many of the details are more about the underlying OS and hardware than about the language. It's an interesting problem. (But it's not a great fit as a rosettacode task.) --Rdm (talk) 10:29, 19 July 2022 (UTC)