Sanitize user input

Revision as of 10:27, 11 September 2021 by Petelomax (removed "task")

"Never trust user input." If the Super Mario Bros. 3 Wrong Warp or [Bobby Tables] have taught programmers anything, it's that user input can be dangerous in unexpected ways.

In general, preventing errors such as the above is best left to the built-in security features of the language rather than to a filter of your own creation. This exercise tests your ability to think about all the possible ways user input could break your program.

Task

Create a function that takes a list of 20 first and last names, and copies them to a record or struct. The list of names won't be provided here, because exploits like the Bobby Tables one are often language-specific. Try to show an example of a "Bobby Tables" style input in your list of names and how your function prevents it from being executed at runtime. For example, create a filter that prevents input that looks like your language's instructions from being entered, or escape it with the appropriate escape characters.

Phix

As noted, there is no magic "one size fits all" solution. In the specific case of SQL, the use of sqlite3_prepare() and sqlite3_bind_text() is strongly recommended in preference to sqlite3_exec() or sqlite3_get_table(), at least for any questionable input.

The inverse problem recently arose in p2js, whereby otherwise perfectly valid code on desktop/Phix could and would generate invalid HTML/Javascript if and when we tried to self-host (an effort which is still very much in progress, albeit not apace, btw):

with javascript_semantics
string header = """
<!DOCTYPE html>
<html lang="en" >
 <head>
  <title>%%s</title>%s
 </head>
 <body>
  <scr!ipt src="p2js.js"></scr!ipt>%%s%s
"""
-- ...
header = substitute(header,"scr!ipt","script")

puts(1,header)  -- (make the example runnable)

In other words I had to "sanitize" a constant in the source code, in this particular case, and I could have gone further and done something similar with all the other tags, but in practice there was no need to because the generated JavaScript was already always inside a script tag.
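
The same hazard appears whenever text is embedded in an inline script element. As an illustration only (in Python rather than Phix, and not how p2js handles it), the sketch below serializes a string as a JavaScript string literal and escapes every `<`, so that no `</script>` or `<!--` sequence reaches the HTML parser. The function name `js_string_literal` is hypothetical.

```python
import json


def js_string_literal(text: str) -> str:
    """Return text as a JavaScript string literal that is safe inside <script>.

    Hypothetical helper for illustration; not part of p2js.
    """
    # json.dumps produces a double-quoted literal that is also valid JavaScript.
    literal = json.dumps(text)
    # Every "<" in the output lies inside the literal, so replacing it with its
    # \u escape changes the HTML source but not the decoded string value.
    return literal.replace("<", "\\u003c")


snippet = js_string_literal('<script src="p2js.js"></script>')
print("<script>var header = " + snippet + ";</script>")
```

The browser's JavaScript engine decodes `\u003c` back to `<`, so the program sees the original string while the HTML parser never sees a nested tag.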

Raku

It would be helpful if the task author would be a little more specific about what he is after. How user inputs must be "sanitized" entirely depends on how the data is going to be used.

For internal usage in Raku, where you are simply storing data into an internal data structure, it is pretty much a non-issue. Variables in Raku aren't executed without specific instructions to do so. Full stop.

Your name is a string of 2.6 million null bytes? Ok. Good luck typing that in.

You're called 'rm -rf /'? Wow. Sucks to be you.

Now, it may be a good idea to check for a maximum permitted length, but Raku would handle even 2.6e6 null bytes with no problem.

The problem mostly comes in when you need to interchange data with some 3rd party library / system, but every different system is going to have its own quirks and caveats. There is no "one size fits all" solution.

In general, when it comes to sanitizing user input, the best way to go about it is: don't. It's a losing game.

Instead, either validate input to make sure it follows a certain format, whitelist input so only a known few commands are permitted, or, if those aren't possible, use the tools the 3rd party system provides to make arbitrary input "safe" to run. Which one of these is used depends on what system you need to interact with.

For the case given, (Bobby Tables), where you are presumably putting names into some 3rd party data storage (nominally a database of some kind), you would use bound parameters to automatically "make safe" any user input. See the Raku entry under the Parametrized SQL statement task.
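
A minimal sketch of bound parameters, using Python's standard `sqlite3` module rather than Raku. The table name `students`, the function `store_names` and the sample names are assumptions made for illustration. The point is that the SQL text is fixed and the names travel separately as data, so they are never parsed as SQL.

```python
import sqlite3


def store_names(conn: sqlite3.Connection, names) -> None:
    """Insert (first, last) pairs using "?" placeholders (hypothetical helper)."""
    conn.execute("CREATE TABLE IF NOT EXISTS students (first TEXT, last TEXT)")
    # The statement text never changes; each tuple is bound to the two
    # placeholders, so quotes and semicolons in a name are stored verbatim.
    conn.executemany("INSERT INTO students (first, last) VALUES (?, ?)", names)
    conn.commit()


conn = sqlite3.connect(":memory:")
bobby = "Robert'); DROP TABLE students;--"
store_names(conn, [("Donald", "Duck"), (bobby, "Tables")])

rows = conn.execute("SELECT first, last FROM students ORDER BY rowid").fetchall()
for first, last in rows:
    print(first, last)
```

Building the same statement by string concatenation is what makes the "Bobby Tables" input dangerous; with placeholders the table survives and the odd name is simply stored.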

Validating is making sure the input matches some predetermined format, usually with some sort of regular expression. For names, you probably want to allow a fixed maximum (and minimum!) number of: any word or digit character, space and period characters, and possibly some small selection of non-word characters. It is a careful balance between too restrictive and too permissive. You need to avoid falling into preconceived assumptions about names, time, gender, addresses, phone numbers... the list goes on.
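
One way such a validation might look, sketched in Python rather than Raku. The length limits (1 to 40) and the permitted punctuation (space, period, apostrophe, hyphen) are assumptions picked for the example, not a recommendation; `NAME_RE` and `is_valid_name` are hypothetical names.

```python
import re

# Assumed policy: 1 to 40 characters, each one a Unicode word character
# (letters, digits, underscore), a space, a period, an apostrophe or a hyphen.
NAME_RE = re.compile(r"[\w .'-]{1,40}")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string fits the assumed name format."""
    # fullmatch anchors the pattern at both ends, so a trailing newline or
    # any character outside the class causes rejection.
    return NAME_RE.fullmatch(name) is not None


for candidate in ["Pépé", "O'Riley", "", "Robert'); DROP TABLE students;--"]:
    print(repr(candidate), is_valid_name(candidate))
```

Note that a legitimate name such as O'Riley still contains an apostrophe, so passing validation does not make a string safe to paste into SQL; validation complements bound parameters rather than replacing them.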

When passing a user command to the operating system, you probably want to use whitelisting. Only a very few commands from a predetermined list are allowed to be used.

   if $command ∈ <ls time cd df> then { execute $command }

or some such. What the whitelist contains and how to determine if the input matches is a whole article in itself.
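
A rough Python sketch of the whitelist idea, under stated assumptions: the allowed set `{"ls", "df", "date"}` and the helper names `parse_allowed` and `run_allowed` are made up for illustration. The command line is split into words, only the first word is checked against the list, and the result is run without a shell so that `;`, `&&` and similar metacharacters are never interpreted.

```python
import shlex
import subprocess

# Hypothetical whitelist; a real one depends entirely on the application.
ALLOWED_COMMANDS = {"ls", "df", "date"}


def parse_allowed(command_line: str):
    """Return the argument list if the command is whitelisted, else None."""
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Raised for malformed input such as an unterminated quote.
        return None
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return None
    return argv


def run_allowed(command_line: str) -> subprocess.CompletedProcess:
    """Run a whitelisted command without a shell, or raise PermissionError."""
    argv = parse_allowed(command_line)
    if argv is None:
        raise PermissionError("command not permitted: " + repr(command_line))
    # Passing a list with shell=False hands the words directly to the program.
    return subprocess.run(argv, shell=False, capture_output=True, text=True)


print(parse_allowed("ls -l"))
print(parse_allowed("rm -rf /"))
print(parse_allowed("ls; rm -rf /"))
```

This only restricts which program runs. The arguments are left unconstrained here, and checking them is part of the "whole article in itself".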

Unfortunately, this is very vague and hand-wavey due to the vagueness of the task description. Really, any language could copy/paste 95% or better of the above, change the language name, and be done with it. But until the task description is made a little more focused, it will have to do.

Wren

Library: Wren-ioutil
Library: Wren-pattern
Library: Wren-str
Library: Wren-trait


I'll start by saying that I agree with everything that was said in the Raku entry but, in the interests of writing some code, I've taken a very simplistic view of which names are acceptable if, say, we're trying to build a database.

Basically, names are only valid if they contain letters or digits (yes, digits have been known to be used) in the ISO 8859 range and also hyphens, underscores or apostrophes. However, the first or last character of a name can't be a punctuation character.

Furthermore, there is a blacklist of unacceptable names, though in practice this would probably be much longer than the one I've used here.

<lang ecmascript>import "/ioutil" for Input
import "/pattern" for Pattern
import "/str" for Str
import "/trait" for Indexed

class Person {
    construct new(firstName, lastName) {
        _firstName = firstName
        _lastName  = lastName
    }
    firstName { _firstName }
    lastName  { _lastName }
    toString { _firstName + " " + _lastName }
}

var persons = []

// names (compared in lower case) that are rejected outright
var blacklist = [
    "drop", "delete", "erase", "kill", "wipe", "remove",
    "file", "files", "directory", "directories",
    "table", "tables", "record", "records", "database", "databases"
]

var p = Pattern.new("+1&y", Pattern.whole)
var punct = "'-_\xad" // allowable punctuation

// returns an error message, or an empty string if the name is acceptable
var sanitizeInput = Fn.new { |name|
    var ok = p.isMatch(name) && !(punct.contains(name[0]) || punct.contains(name[-1]))
    if (!ok) return "Sorry, your name contains unacceptable characters."
    name = Str.lower(name)
    if (blacklist.contains(name)) return "Sorry, your name is unacceptable."
    return ""
}

for (i in 1..20) {
    var names = List.filled(2, null)
    var outer = false
    for (se in Indexed.new(["first", "last "])) {
        var name = Input.text("Enter your %(se.value) name : ", 1, 20)
        var msg = sanitizeInput.call(name)
        if (msg != "") {
            System.print(msg + "\n")
            outer = true // skip this person entirely
            break
        }
        names[se.index] = name
    }
    if (outer) continue
    persons.add(Person.new(names[0], names[1]))
    System.print()
}

var count = persons.count
System.print("The following %(count) person(s) have been added to the database:")
for (person in persons) System.print(person)</lang>

Output:

Sample (abridged) input/output:

Enter your first name : Donald
Enter your last  name : Duck

Enter your first name : Mickey Mouse
Sorry, your name contains unacceptable characters.

Enter your first name : Bobby
Enter your last  name : Tables
Sorry, your name is unacceptable.

Enter your first name : Fred
Enter your last  name : rm -rf /
Sorry, your name contains unacceptable characters.

Enter your first name : David
Enter your last  name : Wipe
Sorry, your name is unacceptable.

Enter your first name : Nicolas
Enter your last  name : Pépé

Enter your first name : Marilyn
Enter your last  name : Monroe

Enter your first name : Bridget
Enter your last  name : O'Riley

Enter your first name : 'Prince-
Sorry, your name contains unacceptable characters.

Enter your first name : Blaine
Enter your last  name : Wolfeschlegelsteinhausenbergerdorff
Must have a length between 1 and 20 characters, try again.
Enter your last  name : Wolfeschlegelstein'h 

... (plus another 10 acceptable people)

The following 15 person(s) have been added to the database:
Donald Duck
Nicolas Pépé
Marilyn Monroe
Bridget O'Riley
Blaine Wolfeschlegelstein'h 
... (10 more)