URL encoding

From Rosetta Code

Revision as of 13:57, 18 June 2011

URL encoding is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The task is to provide a function or mechanism to convert a provided string into its URL-encoded representation.

In URL encoding, special characters, control characters and extended characters are converted into a percent symbol followed by a two-digit hexadecimal code, so a space character is encoded as %20 within the string.

The following characters require conversion:

  • ASCII control codes (Character ranges 00-1F hex (0-31 decimal) and 7F hex (127 decimal))
  • ASCII symbols (Character ranges 32-47 decimal (20-2F hex))
  • ASCII symbols (Character ranges 58-64 decimal (3A-40 hex))
  • ASCII symbols (Character ranges 91-96 decimal (5B-60 hex))
  • ASCII symbols (Character ranges 123-126 decimal (7B-7E hex))
  • Extended characters with character codes of 128 decimal (80 hex) and above.

Example

The string "http://foo bar/" would be encoded as "http%3A%2F%2Ffoo%20bar%2F".

See also

URL decoding

Icon and Unicon

<lang Icon>link hexcvt

procedure main()
   write("text    = ", image(u := "http://foo bar/"))
   write("encoded = ", image(ue := encodeURL(u)))
end

procedure encodeURL(s)                       #: encode data for inclusion in a URL/URI
   static en
   initial {                                 # build lookup table for everything
      en := table()
      every en[c := !string(~(&digits++&letters))] := "%" || hexstring(ord(c), 2)
      every /en[c := !string(&cset)] := c
      }

   every (c := "") ||:= en[!s]               # re-encode everything
   return c
end</lang>

hexcvt provides hexstring

Output:

text    = "http://foo bar/"
encoded = "http%3A%2F%2Ffoo%20bar%2F"

J

J has a urlencode in the gethttp package, but this task requires that all non-alphanumeric characters be encoded.

Here's an implementation that does that:

<lang j>require'strings convert'
urlencode=: rplc&((#~2|_1 47 57 64 90 96 122 I.i.@#)a.;"_1'%',.hfd i.#a.)</lang>

Example use:

<lang j>   urlencode 'http://foo bar/'
http%3A%2F%2Ffoo%20bar%2F</lang>

Perl

Use the standard CGI module:

<lang perl>use 5.10.0;
use CGI;

my $s = 'http://foo bar/';
say $s = CGI::escape($s);
say $s = CGI::unescape($s);</lang>

PureBasic

<lang PureBasic>URL$ = URLEncoder("http://foo bar/")</lang>

Tcl

<lang tcl># Encode all except "unreserved" characters; use UTF-8 for extended chars.
# See http://tools.ietf.org/html/rfc3986 §2.4 and §2.5
proc urlEncode {str} {
    set uStr [encoding convertto utf-8 $str]
    set chRE {[^-A-Za-z0-9._~\n]};	# Newline is special case!
    set replacement {%[format "%02X" [scan "\\\0" "%c"]]}
    return [string map {"\n" "%0A"} [subst [regsub -all $chRE $uStr $replacement]]]
}</lang>
Demonstrating:
<lang tcl>puts [urlEncode "http://foo bar/"]</lang>
Output:

http%3A%2F%2Ffoo%20bar%2F