Sanitize user input

Revision as of 12:30, 8 September 2021 by Thundergnat

"Never trust user input." If the Super Mario Bros. 3 Wrong Warp or Bobby Tables has taught programmers anything, it's that user input can be dangerous in unexpected ways.

Sanitize user input is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Create a function that takes a list of 20 first and last names, and copies them to a record or struct. The list of names won't be provided here, because exploits like the Bobby Tables one are often language-specific. Try to show an example of a "Bobby Tables" style input in your list of names and how your function prevents it from being executed at runtime.

Related tasks


Raku

It would be helpful if the task author would be a little more specific about what he is after. How user inputs must be "sanitized" entirely depends on how the data is going to be used.

For internal usage, in Raku, where you are simply storing data into an internal data structure, it is pretty much a non-issue. Variables in Raku aren't executed without specific instructions to do so. Full stop.

Your name is a string of 2.6 million null bytes? Ok. Good luck typing that in.

You're called 'rm -rf /'? Wow. Sucks to be you.

Now, it may be a good idea to check for a maximum permitted length (so that 2.6e6 null bytes gets rejected), but Raku would handle it with no problem.
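A minimal sketch of the internal-storage case, under some loudly stated assumptions: the Person class, its attribute names, the copy-names routine and the 100 character limit are all made up for illustration, and a three entry list stands in for the 20 names the task asks for. The "Bobby Tables" style entry is copied like any other string; nothing in the routine evaluates it.

```raku
# Hypothetical record type; Person and its attribute names are made up here.
class Person {
    has Str $.first-name;
    has Str $.last-name;
}

# Arbitrary limit chosen for this sketch, not something the task specifies.
my constant MAX-LENGTH = 100;

# Copy a list of (first, last) pairs into Person records.
# The strings are only stored; nothing in here evaluates them.
sub copy-names(@names) {
    my @people;
    for @names -> ($first, $last) {
        die "Name longer than {MAX-LENGTH} characters"
            if max($first.chars, $last.chars) > MAX-LENGTH;
        @people.push: Person.new(first-name => $first, last-name => $last);
    }
    return @people;
}

my @input =
    ('Alice', 'Smith'),
    ("Robert'); DROP TABLE Students;--", 'Tables'),
    ('rm -rf /', 'Jones');

my @people = copy-names(@input);
say .first-name, ' | ', .last-name for @people;
```

Each record comes back holding exactly the text that went in. Whether that text is acceptable as a name is a separate question from whether it is dangerous to store.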

The problem mostly comes in when you need to interchange data with some 3rd party library / system, but every different system is going to have its own quirks and caveats. There is no "one size fits all" solution.

In general, when it comes to sanitizing user input, the best way to go about it is: don't. It's a losing game.

Instead, either validate input to make sure it follows a certain format, whitelist input so only a known few commands are permitted, or, if those aren't possible, use the tools the 3rd party system provides to make arbitrary input "safe" to run. Which one of these is used depends on what system you need to interact with.

For the case given (Bobby Tables), where you are presumably putting names into some 3rd party data storage (nominally a database of some kind), you would use bound parameters to automatically "make safe" any user input. See the Raku entry under the Parametrized SQL statement task.
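As a rough illustration of bound parameters, here is a sketch that assumes the third-party DBIish module and its SQLite driver are installed; that choice is an assumption, not something the task prescribes. The file name, table name and column names are invented for the example.

```raku
use DBIish;   # third-party module, assumed to be installed

# File, table and column names below are made up for this sketch.
my $dbh = DBIish.connect('SQLite', database => 'names.sqlite3');

$dbh.execute(q:to/SQL/);
    CREATE TABLE IF NOT EXISTS students (first_name TEXT, last_name TEXT)
    SQL

# The ? placeholders are bound by the driver, so the values are passed
# as data and are never spliced into the SQL text.
my $insert = $dbh.prepare(q:to/SQL/);
    INSERT INTO students (first_name, last_name) VALUES (?, ?)
    SQL

$insert.execute("Robert'); DROP TABLE Students;--", 'Tables');

# Read the rows back to confirm the name was stored verbatim.
my $select = $dbh.prepare('SELECT first_name, last_name FROM students');
$select.execute;
my @rows = $select.allrows;
say @rows.elems, ' row(s) stored';

$insert.dispose;
$select.dispose;
$dbh.dispose;
```

The point of the design is that the SQL text is fixed before any user data arrives, so there is nothing for a hostile name to rewrite.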

Validating means making sure the input matches some predetermined format, usually with some sort of regular expression. For names, you probably want to allow a fixed maximum (and minimum!) number of characters drawn from: any word or digit character, space and period characters, and possibly some small selection of non-word characters. It is a careful balance between too restrictive and too permissive. You need to avoid falling into preconceived assumptions about names, time, gender... the list goes on.
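One possible shape for such a check, purely as a sketch: the routine name is-valid-name, the 1 to 64 character range, and the choice of apostrophe and hyphen as the permitted non-word characters are all assumptions picked for illustration, and a real policy would need more thought.

```raku
# Assumed policy for this sketch: 1 to 64 characters, each one a word
# character (letters, digits or underscore, in any script), a space,
# a period, an apostrophe or a hyphen.
sub is-valid-name(Str $name --> Bool) {
    so $name ~~ / ^ [ \w | ' ' | '.' | "'" | '-' ] ** 1..64 $ /
}

for 'Alice', "O'Brien-Smith Jr.", "Robert'); DROP TABLE Students;--", '' -> $name {
    say is-valid-name($name) ?? 'accepted' !! 'rejected', ": $name";
}
```

The Bobby Tables entry fails here only because of its parenthesis and semicolons, which shows the limits of the approach: a pattern like this describes what a name may look like, and says nothing about what a downstream system will do with it.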

When passing a user command to the operating system, you probably want to use whitelisting. Only a very few commands from a predetermined list are allowed to be used.

   # run starts the program directly, without going through a shell
   if $command ∈ <ls time cd df> { run $command }

or some such. What the whitelist contains and how to determine if the input matches is a whole article in itself.
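A slightly fuller sketch of the matching step, assuming the simplest possible policy: the whole input must be exactly one of the permitted names. The constant ALLOWED-COMMANDS, its contents and the routine is-allowed-command are hypothetical.

```raku
# Illustrative whitelist; which commands belong on it is a policy decision.
my constant ALLOWED-COMMANDS = set <ls df date uptime>;

# True only when the whole input is exactly one of the permitted names.
sub is-allowed-command(Str $input --> Bool) {
    $input ∈ ALLOWED-COMMANDS
}

for 'ls', 'df', 'ls; rm -rf /', 'reboot' -> $input {
    say is-allowed-command($input) ?? 'allowed' !! 'refused', ": $input";
}
```

Because the test is set membership on the complete string, an input such as 'ls; rm -rf /' is refused outright instead of being partially matched. A caller could then hand an accepted name to run, which starts the program without involving a shell.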

Unfortunately, this is very vague and hand-wavy due to the vagueness of the task description. Really, any language could copy/paste 95% or better of the above, change the language name, and be done with it. But until the task description is made a little more focused, it will have to do.