Sanitize user input
{{draft task|Text processing}}
"Never trust user input." If the Super Mario Bros. 3 Wrong Warp or Bobby Tables have taught programmers anything, it's that user input can be dangerous in unexpected ways.
In general, the task of preventing errors such as the above is best left to the built-in security features of the language rather than a filter of your own creation. This exercise is to test your ability to think about all the possible ways user input could break your program.
Create a function that takes a list of 20 first and last names, and copies them to a record or struct. Ten of them must be typical input (i.e. consist of only letters of the alphabet and punctuation), but the other ten must be deliberately chosen to cause problems with a program that expects only letters and punctuation. A few examples:
* ASCII control codes such as NUL, CR, LF
* Code for the language you are using that can result in damage (e.g. rm -rf, delete System32, DROP TABLE, etc.)
* Numbers, symbols, foreign languages, emojis, etc.
(There were already solutions provided before the requirement that ten names are "normal" and ten are potentially harmful was added. Those answers satisfied the task requirements at the time they were submitted.)
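As a rough illustration of the three kinds of problem input listed above, here is a minimal Python sketch. It is not one of the submitted solutions; the `Person` record, the `problems` and `copy_names` helpers, and the choice of `'` and `-` as the only allowed punctuation are all assumptions made for the example.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    first: str
    last: str

ALLOWED_PUNCT = "'-"  # assumption: the only punctuation a "typical" name uses

def problems(name):
    """Return the reasons a name is not 'typical'; an empty list means it is fine."""
    found = []
    if not name:
        found.append("empty")
    for ch in name:
        if unicodedata.category(ch) == "Cc":      # NUL, CR, LF, TAB, DEL, ...
            found.append(f"control code U+{ord(ch):04X}")
        elif not ch.isascii():                    # accented letters, emojis, ...
            found.append(f"non-ASCII character U+{ord(ch):04X}")
        elif not (ch.isalpha() or ch in ALLOWED_PUNCT):  # digits, spaces, symbols
            found.append(f"disallowed character {ch!r}")
    return found

def copy_names(pairs):
    """Copy acceptable (first, last) pairs into Person records; collect the rest."""
    accepted, rejected = [], []
    for first, last in pairs:
        issues = problems(first) + problems(last)
        if issues:
            rejected.append(((first, last), issues))
        else:
            accepted.append(Person(first, last))
    return accepted, rejected
```

Calling `copy_names` with a mixed list returns the records that were created together with the rejected pairs and the reason each one failed, so a caller can report exactly which character caused the rejection.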
;Related tasks
* [[Parametrized SQL statement]]
'''Adapted from [[#Wren]]'''
The jq program presented here offers an interactive approach to the problem
along the lines of the Wren solution. It accepts either "stop" or the end-of-stream
as the signal to finish gathering names.
The main program, `interact`, is somewhat convoluted because
jq does not currently offer much support for the type of interaction
envisioned in the Wren answer. It would be easy to simplify things
by using `stderr` for the prompt, but currently `stderr` cannot be
used to print "raw" strings.
<syntaxhighlight lang=jq>
def Person::new(firstName; lastName):
  {firstName: firstName,
   lastName: lastName };

def Person::tostring: .firstName + " " + .lastName;

def blacklist: [
    "drop", "delete", "erase", "kill", "wipe", "remove",
    "file", "files", "directory", "directories",
    "table", "tables", "record", "records", "database", "databases",
    "system", "system32", "system64", "rm", "rf", "rmdir", "format", "reformat"
];

def punct: "'-"; # allowable punctuation

def permissible:
  def ok: "[A-Za-z\(punct)]+";
  test("^" + ok +"$");

# Emit null or else the text of an error message
def checkInput:
  . as $name
  | (permissible and (((punct|contains($name[0:1])) or (punct|contains($name[-1:]))) | not)) as $ok
  | if $ok
    then
      if blacklist|index($name|ascii_downcase) then "Sorry, that name is unacceptable."
      else null
      end
    else "Sorry, that name contains unacceptable characters."
    end ;

# Attempt to obtain a valid response until "stop" or EOS.
# Set .invalid and .answer of the incoming object.
def ask:
  .invalid = false
  # Use `first(inputs)` to avoid error on EOS.
  | (first(inputs) // null) as $x
  | if $x | IN(null, "stop") then .answer = true # i.e. stop
    else .invalid = ($x|checkInput)
    | .answer = (if .invalid then null else $x end)
    end ;

# $max is the maximum number of full names to request (-1 for arbitrarily many)
def interact($max):
  # An array of Person
  def summary:
    "The following \(length) person(s) have been added to the database:",
    (.[] | Person::tostring);

  ["first", "last"] as $prompts
  | label $out
  # .question is the question number we are currently focused on.
  # .emit is the string to emit if it has been set.
  | foreach range(0; infinite) as $i (
      {question: 0, emit: null, array: []};
      if .array | length == $max
      then .finished = .array
      elif .emit then ask
        | if .answer == true then .finished = .array
          elif .invalid then .emit = .invalid + " Please re-enter:"
          else .emit = null
          | if .question == 0
            then .first = .answer | .question = 1
            else .last = .answer | .question = 0
            | .array = .array + [Person::new(.first; .last)]
            end
          end
      else .
      end
      # update .emit
      | if .finished then .emit = null
        elif .emit then .
        else .emit = "Enter your \($prompts[.question]) name : "
        end;
      if .finished then ., break $out
      elif .emit then .
      else empty
      end )
  | (select(.emit) | .emit),
    (select(.finished) | .array | summary) ;

# Entry point (assumed here; the limit of 20 follows the task description)
interact(20)
</syntaxhighlight>
jq -nrR -f sanitize-user-input.jq
Enter your first name :
Sorry, that name contains unacceptable characters. Please re-enter:
Enter your last name :
Enter your first name :
The following 1 person(s) have been added to the database:
John Doe
With the notorious exception of some older SQL languages, most languages never evaluate external input as code. Because of this,
sanitizing of user input in languages such as Julia is not needed unless the program is designed specifically to run user input
as a system command. The task given does not require such system level evaluation.
<syntaxhighlight lang="julia">import Base: string

const BLACKLIST = [
    "drop", "delete", "erase", "kill", "wipe", "remove",
    "file", "files", "directory", "directories",
    "table", "tables", "record", "records", "database", "databases",
    "system", "system32", "system64", "rm", "rf", "rmdir", "format", "reformat"
]
const PUNC = ['\'', '-']
const LETT = ['a':'z'; 'A':'Z']

"""
    function validator(s)

Validation of `s` requires:
    `s` is valid utf-8
    `s` only has chars that are in okc
    `s` is not in the `blist`, and if `csense` is false (the default),
        the lowercase version of `s` is not in the lowercase version of `blist`.
Returns (true, s) if valid and (false, error message) if invalid.
"""
function validator(stri, okc = vcat(LETT, PUNC), blist = copy(BLACKLIST), csense = false)
    s = ""
    if !csense
        blist = lowercase.(blist)
    end
    try # some binary sequences are invalid utf8 and may throw error
        s = string(stri)
        lcs = csense ? s : lowercase(s)
        lcs ∈ blist && return false, "Sorry, name $s is forbidden."
        any(x -> x ∉ okc, s) && return false, "Sorry, name $s contains bad characters."
    catch y
        return false, y
    end
    return true, s
end

""" class for Person with firstname and lastname identity strings """
struct Person
    firstname::String
    lastname::String
end

""" convert a Person to its string representation """
Base.string(p::Person) = "$(p.firstname) $(p.lastname)"

""" Add Persons to plist with validation by validator """
function addsanitized!(plist, validator = validator)
    println("""\n    INSTRUCTIONS
    Enter new names as first name then last name.
    Allowable characters are a through z (A-Z), along with ' and - in names.
    Some words are reserved for use by the system and are thus excluded.
    Enter two blank lines to exit (Hit Enter for a blank entry).
    """)
    while true
        print("Enter first name: ")
        fn = strip(readline())
        ok, firstname = validator(fn)
        if !ok
            println(firstname) # on failure the second value is the error message
            continue
        end
        print("Enter last name: ")
        ln = strip(readline())
        ok, lastname = validator(ln)
        if !ok
            println(lastname)
            continue
        end
        firstname == "" && lastname == "" && break
        push!(plist, Person(firstname, lastname))
    end
    return plist
end

const persons = addsanitized!(Person[])
# join() prints via show(), so convert explicitly to use Base.string(::Person)
println("\nAdded:\n", join(string.(persons), "\n"))
</syntaxhighlight>
As noted, there is no magic "one size fits all" solution, and in the specific case of SQL the use of sqlite3_prepare() and sqlite3_bind_text() is strongly recommended in preference to sqlite3_exec() or sqlite3_get_table(), at least for any questionable input. Using sqlite3_bind_text() there is no problem whatsoever with having a student named (say) "Robert'); DROP TABLE students;--".
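The same point can be sketched with Python's standard `sqlite3` module, where a `?` placeholder plays the role of the bind call: the name travels as data and is never parsed as SQL. This is only an illustration, not the Phix code; the `students` table, the in-memory database and the `add_student` helper are assumptions made for the example.

```python
import sqlite3

def add_student(conn, name):
    # The "?" placeholder binds `name` as a value; it is never parsed as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE students (name TEXT)")

add_student(conn, "Robert'); DROP TABLE students;--")
add_student(conn, "Mary O'Brien")

# Both names are stored verbatim and the table still exists.
rows = [row[0] for row in conn.execute("SELECT name FROM students ORDER BY rowid")]
print(rows)
```

Building the statement by string concatenation instead would hand the quote characters in the first name to the SQL parser, which is exactly the situation the bind functions avoid.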
Given some suspect [Phix] source code to be run, it is simply not practical to cover cases such as system(rot13(reverse("se- ze"))) or any of the other myriad ways in which harmful content could be disguised. In case you have not guessed, that would execute "rm -rf", assuming the code also contains a working rot13() implementation.
Of course you could block all use, even legitimate, of things like system(), as covered by [[Safe_mode]] and [[Untrusted_environment]], or whitelist as per the Raku entry below.
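To make the disguise mentioned above concrete, the following Python sketch decodes the string and shows that a naive keyword filter does not notice it. The `BLACKLIST` set and the `looks_dangerous` helper are hypothetical stand-ins for whatever filter a program might use.

```python
import codecs

BLACKLIST = {"rm", "rf", "drop", "delete"}  # hypothetical keyword filter

def looks_dangerous(text):
    """Naive check: does any blacklisted word appear in the text?"""
    words = text.lower().replace("-", " ").split()
    return any(word in BLACKLIST for word in words)

disguised = "se- ze"
# Reverse the string, then apply rot13, as in system(rot13(reverse("se- ze")))
revealed = codecs.decode(disguised[::-1], "rot13")

print(looks_dangerous(disguised))  # the filter sees nothing suspicious
print(revealed)                    # the command that would actually run
print(looks_dangerous(revealed))   # only the decoded form is caught
```

Any reversible transformation would serve equally well, which is why scanning source text for forbidden words cannot be relied upon.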
The inverse problem recently arose in p2js, whereby otherwise perfectly
valid code on desktop/Phix could and would generate invalid HTML/Javascript
if and when we tried to self-host (an effort which is still very much in progress, albeit not apace, btw):
<syntaxhighlight lang="phix">
with javascript_semantics
string header = """
<!DOCTYPE html>
<html lang="en" >
<scr!ipt src="p2js.js"></scr!ipt>%%s%s
"""
-- ...
header = substitute(header,"scr!ipt","script")
puts(1,header) -- (make the example runnable)
</syntaxhighlight>
In other words I had to "sanitize" a constant in the source code, in this particular case, and I could have gone further and done something similar with all the other tags, but in practice there was no need to because the generated JavaScript was already always inside a script tag.
Unfortunately, this is very vague and hand-wavey due to the vagueness of the task description. Really, any language could copy/paste 95% or better of the above, change the language name, and be done with it. But until the task description is made a little more focused, it will have to do.
The following assumes that names are only valid if they contain ASCII letters, hyphens or apostrophes. However, the first or last character of a name can't be a punctuation character and a name must be between 1 and 20 characters long. A single character name is allowed to cater for an initial where the full name is not known. People are given a chance to abbreviate their names if they are too long.
No other characters are allowed including control characters, spaces, symbols, emojis and non-English letters. Names which include them are simply rejected.
Furthermore, there is a blacklist of unacceptable names though in practice this would probably be longer or more sophisticated than the one I've used here, depending on what will be done with the records later.
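For readers who do not know Wren, the rules just described can be approximated in a few lines of Python. This is a sketch of the stated rules only, not a translation of the Wren program; `check_name` and `NAME_RE` are names invented for the example, and the blacklist is the one used by the entries on this page.

```python
import re

# A letter, optionally followed by letters/punctuation and a closing letter,
# so that neither the first nor the last character can be ' or -.
NAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z'-]*[A-Za-z])?")

BLACKLIST = {
    "drop", "delete", "erase", "kill", "wipe", "remove",
    "file", "files", "directory", "directories",
    "table", "tables", "record", "records", "database", "databases",
    "system", "system32", "system64", "rm", "rf", "rmdir", "format", "reformat",
}

def check_name(name):
    """Return "" if the name is acceptable, otherwise an error message."""
    if not 1 <= len(name) <= 20:
        return "Must have a length between 1 and 20 characters."
    if not NAME_RE.fullmatch(name):
        return "Sorry, your name contains unacceptable characters."
    if name.lower() in BLACKLIST:
        return "Sorry, your name is unacceptable."
    return ""
```

Using `fullmatch` rather than `search` matters here: the whole name has to fit the pattern, so a valid prefix followed by a control character or emoji is still rejected.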
<syntaxhighlight lang="wren">import "./ioutil" for Input
import "./pattern" for Pattern
import "./str" for Str
import "./iterate" for Indexed

class Person {
    construct new(firstName, lastName) {
        _firstName = firstName
        _lastName = lastName
    }

    firstName { _firstName }
    lastName { _lastName }

    toString { _firstName + " " + _lastName }
}

var persons = []

var blacklist = [
    "drop", "delete", "erase", "kill", "wipe", "remove",
    "file", "files", "directory", "directories",
    "table", "tables", "record", "records", "database", "databases",
    "system", "system32", "system64", "rm", "rf", "rmdir", "format", "reformat"
]

var punct = "'-" // allowable punctuation

var i = Pattern.letter + punct
var p = Pattern.new("+1&i", Pattern.whole, i)

// Returns an empty string if the name is acceptable, otherwise an error message.
var sanitizeInput = Fn.new { |name|
    var ok = p.isMatch(name) && !(punct.contains(name[0]) || punct.contains(name[-1]))
    if (!ok) return "Sorry, your name contains unacceptable characters."
    name = Str.lower(name)
    if (blacklist.contains(name)) return "Sorry, your name is unacceptable."
    return ""
}

for (i in 1..20) {
    var names = List.filled(2, null)
    var outer = false
    for (se in Indexed.new(["first", "last "])) {
        var name = Input.text("Enter your %(se.value) name : ", 1, 20)
        var msg = sanitizeInput.call(name)
        if (msg != "") {
            System.print(msg + "\n")
            outer = true
            break
        }
        names[se.index] = name
    }
    if (outer) continue
    persons.add(Person.new(names[0], names[1]))
}

var count = persons.count
System.print("The following %(count) person(s) have been added to the database:")
for (person in persons) System.print(person)</syntaxhighlight>
Sample (abridged) input/output. The ninth person's name contains a tab character.
Enter your first name : Mickey_mouse
Sorry, your name contains unacceptable characters.
Enter your first name : Bobby
Enter your last name : Tables
Sorry, your name is unacceptable.
Enter your first name : Fred
Enter your last name : rm -rf/
Sorry, your name contains unacceptable characters.
Enter your first name : David
Enter your last name : Wipe
Sorry, your name is unacceptable.
Enter your first name : Beyoncé
Sorry, your name contains unacceptable characters.
Enter your first name : A-12
Sorry, your name contains unacceptable characters.
Enter your first name : 'Andrew-
Sorry, your name contains unacceptable characters.
Enter your first name : 👨👨‍👩‍👦
Sorry, your name contains unacceptable characters.
Enter your first name : Don ald
Sorry, your name contains unacceptable characters.
Enter your first name : Eric
Enter your last name : Schäfer
Sorry, your name contains unacceptable characters.
Enter your first name : Blaine
Enter your last name : Wolfeschlegelsteinhausenbergerdorff
Must have a length between 1 and 20 characters, try again.
Enter your last name : Wolfeschlegelstein'f
Enter your first name : Marilyn
Enter your last name : Monroe
Enter your first name : Bridget
Enter your last name : O'Riley
... (plus another 7 acceptable people)
The following 10 person(s) have been added to the database:
Blaine Wolfeschlegelstein'f
Marilyn Monroe
Bridget O'Riley
... (plus 7 more)
