URL encoding

From Rosetta Code
URL encoding is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found on its talk page.

The task is to provide a function or mechanism to convert a provided string into its URL-encoded representation.

In URL encoding, special characters, control characters and extended characters are converted into a percent sign followed by a two-digit hexadecimal code. For example, a space character is encoded as %20 within the string.

The following characters require conversion:

  • ASCII control codes: 00-1F hex (0-31 decimal) and 7F hex (127 decimal)
  • Space and ASCII symbols: 20-2F hex (32-47 decimal)
  • ASCII symbols: 3A-40 hex (58-64 decimal)
  • ASCII symbols: 5B-60 hex (91-96 decimal)
  • ASCII symbols: 7B-7E hex (123-126 decimal)
  • Extended characters: 80 hex (128 decimal) and above

Therefore, every character except 0-9, A-Z and a-z requires conversion.
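The rule above can be sketched in Python. This is a minimal sketch, not a reference solution: the name url_encode is hypothetical, and it assumes that extended characters are first converted to UTF-8 bytes (the task does not say which encoding to use), with every byte outside 0-9, A-Z and a-z escaped.

```python
import string

# ASCII letters and digits as byte values; every other byte gets escaped.
ALNUM_BYTES = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def url_encode(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode every byte of `text` that is not 0-9, A-Z or a-z.

    Assumption: extended characters are converted to bytes with
    `encoding` (UTF-8 by default) and each resulting byte is escaped.
    """
    return "".join(
        chr(b) if b in ALNUM_BYTES else f"%{b:02X}"
        for b in text.encode(encoding)
    )


print(url_encode("http://foo bar/"))  # http%3A%2F%2Ffoo%20bar%2F
print(url_encode("€"))                # %E2%82%AC
```

Working on bytes rather than characters means a multi-byte character such as "€" turns into several %XX escapes, one per UTF-8 byte.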

The standards give different rules. RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, section 2.3, says that the unreserved characters "-._~" should not be encoded. HTML 5, section 4.10.22.5 URL-encoded form data, says to preserve "-._*" and to encode a space " " as "+".
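The two standard variants differ only in which symbols they keep and in how they treat a space. The Python sketch below illustrates that difference; the function names encode_rfc3986 and encode_form are hypothetical, and UTF-8 is assumed for extended characters.

```python
import string

ALNUM = string.ascii_letters + string.digits


def _percent_encode(text: str, keep: str, space_as_plus: bool = False) -> str:
    """Escape every UTF-8 byte of `text` whose character is not in `keep`."""
    keep_bytes = frozenset(keep.encode("ascii"))
    out = []
    for b in text.encode("utf-8"):
        if b in keep_bytes:
            out.append(chr(b))
        elif space_as_plus and b == 0x20:
            out.append("+")
        else:
            out.append(f"%{b:02X}")
    return "".join(out)


def encode_rfc3986(text: str) -> str:
    # RFC 3986 section 2.3: the unreserved characters "-._~" stay as-is.
    return _percent_encode(text, ALNUM + "-._~")


def encode_form(text: str) -> str:
    # HTML 5 form data: "-._*" stay as-is and a space becomes "+".
    return _percent_encode(text, ALNUM + "-._*", space_as_plus=True)


s = "http://foo bar/~*"
print(encode_rfc3986(s))  # http%3A%2F%2Ffoo%20bar%2F~%2A
print(encode_form(s))     # http%3A%2F%2Ffoo+bar%2F%7E*
```

Because a literal "+" is always escaped to %2B, any "+" in the form-encoded output unambiguously stands for a space.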

Example

The string "http://foo bar/" would be encoded as "http%3A%2F%2Ffoo%20bar%2F".

Options

An exception string (a set of symbols that do not need to be converted) may be used. However, this is optional and not a requirement of this task.
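An exception string fits naturally as an optional parameter. This Python sketch assumes a hypothetical exceptions argument that lists ASCII symbols to pass through unchanged, with UTF-8 assumed for extended characters. An empty string gives the strict behaviour the task requires.

```python
import string

ALNUM_BYTES = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def url_encode(text: str, exceptions: str = "") -> str:
    """Percent-encode `text`, leaving letters, digits and every ASCII
    character listed in `exceptions` unchanged (UTF-8 assumed)."""
    keep = ALNUM_BYTES | frozenset(exceptions.encode("ascii"))
    return "".join(
        chr(b) if b in keep else f"%{b:02X}"
        for b in text.encode("utf-8")
    )


print(url_encode("http://foo bar/"))                     # http%3A%2F%2Ffoo%20bar%2F
print(url_encode("http://foo bar/", exceptions="-._~"))  # RFC 3986 unreserved set
print(url_encode("http://foo bar/", exceptions=":/"))    # http://foo%20bar/
```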

See also

URL decoding

AWK

This program follows the HTML 5 rule: it preserves "-._*" and converts " " to "+". The ord[] array uses the idea from Character codes#AWK.

<lang awk>BEGIN { for (i = 0; i <= 255; i++) ord[sprintf("%c", i)] = i }

# Encode string with application/x-www-form-urlencoded escapes.
function escape(str,    c, i, len, res) {
	len = length(str); res = ""
	for (i = 1; i <= len; i++) {
		c = substr(str, i, 1)
		if (c ~ /[-._*0-9A-Za-z]/)
			res = res c
		else if (c == " ")
			res = res "+"
		else
			res = res "%" sprintf("%02X", ord[c])
	}
	return res
}
# Escape every line of input.
{ print escape($0) }</lang>

Go

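<lang go>package main

import (
	"fmt"
	"net/url"
	"strings"
)

func main() {
	// url.QueryEscape encodes ' ' as '+' and a literal '+' as "%2B",
	// so every '+' in its result stands for a space.
	// It leaves "-", "_", "." and "~" unescaped.
	encoded := url.QueryEscape("http://foo bar/")
	encoded = strings.ReplaceAll(encoded, "+", "%20")
	fmt.Println(encoded)
}</lang>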

Icon and Unicon

<lang Icon>link hexcvt

procedure main()
   write("text    = ",image(u := "http://foo bar/"))
   write("encoded = ",image(ue := encodeURL(u)))
end

procedure encodeURL(s)          #: encode data for inclusion in a URL/URI
static en
initial {                       # build lookup table for everything
   en := table()
   every en[c := !string(~(&digits++&letters))] := "%"||hexstring(ord(c),2)
   every /en[c := !string(&cset)] := c
   }

   every (c := "") ||:= en[!s]  # re-encode everything
   return c
end</lang>

hexcvt provides hexstring

Output:

text    = "http://foo bar/"
encoded = "http%3A%2F%2Ffoo%20bar%2F"

J

J has a urlencode in the gethttp package, but this task requires that all non-alphanumeric characters be encoded.

Here's an implementation that does that:

<lang j>require'strings convert'
urlencode=: rplc&((#~2|_1 47 57 64 90 96 122 I.i.@#)a.;"_1'%',.hfd i.#a.)</lang>

Example use:

<lang j>   urlencode 'http://foo bar/'
http%3A%2F%2Ffoo%20bar%2F</lang>

Java

The built-in URLEncoder in Java converts the space " " into a plus sign "+" instead of "%20":

<lang java>import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Main {

   public static void main(String[] args) throws UnsupportedEncodingException
   {
       String normal = "http://foo bar/";
       String encoded = URLEncoder.encode(normal, "utf-8");
       System.out.println(encoded);
   }

}</lang>

Output:

http%3A%2F%2Ffoo+bar%2F

JavaScript

Confusingly, there are three different URI encoding functions in JavaScript: escape() (now deprecated), encodeURI(), and encodeURIComponent(). Each of them encodes a different set of characters. See this article and this article for more information and comparisons. encodeURIComponent() comes closest to this task: it leaves only letters, digits and - _ . ! ~ * ' ( ) unencoded.

<lang javascript>var normal = 'http://foo/bar/';
var encoded = encodeURIComponent(normal);</lang>

Objective-C

Works with: Cocoa version Mac OS X 10.3+

<lang objc>NSString *normal = @"http://foo bar/";
NSString *encoded = [normal stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", encoded);</lang>

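Note that stringByAddingPercentEscapesUsingEncoding: only escapes characters that are not legal in a URL, so reserved characters such as ":" and "/" are left unchanged. The Core Foundation function CFURLCreateStringByAddingPercentEscapes() provides more options, including a list of legal characters that should be escaped anyway.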

Perl

<lang perl>use URI::Escape;

my $s = 'http://foo/bar/';
print uri_escape($s);</lang>

Using the CGI module (part of the core Perl distribution until Perl 5.22):

<lang perl>use 5.10.0;
use CGI;

my $s = 'http://foo/bar/';
say $s = CGI::escape($s);
say $s = CGI::unescape($s);</lang>

Perl 6

<lang perl6>my $url = 'http://foo bar/';

say $url.subst(/<-[ A..Z a..z 0..9 ]>/, *.ord.fmt("%%%02X"), :g);</lang>

Output:

http%3A%2F%2Ffoo%20bar%2F

PHP

<lang php><?php
$s = 'http://foo/bar/';
$s = rawurlencode($s);
?></lang>

There is also urlencode(), which encodes spaces as "+" signs rather than "%20".

PicoLisp

<lang PicoLisp>(de urlEncodeTooMuch (Str)

  (pack
     (mapcar
        '((C)
           (if (or (>= "9" C "0") (>= "Z" (uppc C) "A"))
              C
              (list '% (hex (char C))) ) )
        (chop Str) ) ) )</lang>

Test:

: (urlEncodeTooMuch "http://foo bar/")
-> "http%3A%2F%2Ffoo%20bar%2F"

PureBasic

<lang PureBasic>URL$ = URLEncoder("http://foo bar/")</lang>

Python

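<lang python>from urllib.parse import quote

s = 'http://foo/bar/'
s = quote(s, safe='')  # the default safe='/' would leave slashes unencoded</lang>

There is also urllib.parse.quote_plus(), which encodes spaces as "+" signs instead. (In Python 2 these functions live in the urllib module.)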
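The effect of the safe parameter and of quote_plus() is easiest to see side by side. This sketch uses only the standard urllib.parse functions and assumes Python 3.7 or later, where "~" is no longer escaped by default.

```python
from urllib.parse import quote, quote_plus

s = "http://foo bar/"
print(quote(s))           # http%3A//foo%20bar/   (default safe="/")
print(quote(s, safe=""))  # http%3A%2F%2Ffoo%20bar%2F
print(quote_plus(s))      # http%3A%2F%2Ffoo+bar%2F
```

Only quote(s, safe="") produces the output this task asks for; quote_plus() matches the form-encoding convention for spaces.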

Ruby

CGI.escape encodes all characters except '-.0-9A-Z_a-z'.

<lang ruby>require 'cgi'
puts CGI.escape("http://foo bar/").gsub("+", "%20")
# => "http%3A%2F%2Ffoo%20bar%2F"</lang>

URI.encode_www_form_component was added in Ruby 1.9.2. It follows HTML 5 and encodes all characters except '-.0-9A-Z_a-z' and '*'.

Works with: Ruby version 1.9.2

<lang ruby>require 'uri'
puts URI.encode_www_form_component("http://foo bar/").gsub("+", "%20")
# => "http%3A%2F%2Ffoo%20bar%2F"</lang>

Programs should not call URI.encode, because it fails to encode some characters. URI.encode has been obsolete since Ruby 1.9.2 and was removed in Ruby 3.0.

Tcl

<lang tcl># Encode all except "unreserved" characters; use UTF-8 for extended chars.
# See http://tools.ietf.org/html/rfc3986 §2.4 and §2.5
proc urlEncode {str} {
    set uStr [encoding convertto utf-8 $str]
    set chRE {[^-A-Za-z0-9._~\n]};		# Newline is special case!
    set replacement {%[format "%02X" [scan "\\\0" "%c"]]}
    return [string map {"\n" "%0A"} [subst [regsub -all $chRE $uStr $replacement]]]
}</lang>

Demonstrating:

<lang tcl>puts [urlEncode "http://foo bar/€"]</lang>

Output:

http%3A%2F%2Ffoo%20bar%2F%E2%82%AC

TUSCRIPT

<lang tuscript>$$ MODE TUSCRIPT
text="http://foo bar/"
BUILD S_TABLE spez_char="::>/:</::<%:"
spez_char=STRINGS (text,spez_char)
LOOP/CLEAR c=spez_char
c=ENCODE(c,hex),c=concat("%",c),spez_char=APPEND(spez_char,c)
ENDLOOP
url_encoded=SUBSTITUTE(text,spez_char,0,0,spez_char)
print "text:    ", text
PRINT "encoded: ", url_encoded
</lang>

Output:

text:    http://foo bar/
encoded: http%3A%2F%2Ffoo%20bar%2F