Talk:URL encoding



:: I suppose as a bonus, we could provide an exception string, which contains a list of characters that do not become encoded. --[[User:Markhobley|Markhobley]] 17:57, 20 June 2011 (UTC)
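A minimal sketch of the exception-string idea follows. The function name <code>url_encode</code> and its <code>exceptions</code> parameter are hypothetical, and it assumes non-ASCII characters are encoded as their UTF-8 bytes. Every character outside 0-9A-Za-z is encoded unless it appears in the exception string.

```python
def url_encode(text, exceptions=""):
    """Percent-encode text (hypothetical helper).

    Characters in 0-9A-Za-z, plus any character listed in `exceptions`,
    are left as-is; everything else becomes %XX per UTF-8 byte.
    """
    safe = set("0123456789"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "abcdefghijklmnopqrstuvwxyz") | set(exceptions)
    out = []
    for ch in text:
        if ch in safe:
            out.append(ch)
        else:
            # Assumption: encode each UTF-8 byte of the character as %XX (uppercase hex).
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


print(url_encode("http://foo bar/"))        # http%3A%2F%2Ffoo%20bar%2F
print(url_encode("http://foo bar/", "/"))   # http%3A//foo%20bar/
```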

== Encoding by RFC 3986 or HTML 5 ==

The current task lists six groups of characters to encode. The question then becomes: which groups of characters should be preserved?

* The current task preserves only "0-9A-Za-z".
* My interpretation of RFC 3986 is to preserve "-._~0-9A-Za-z".
* My interpretation of HTML 5, [http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#url-encoded-form-data URL-encoded form data], is to preserve "-._*0-9A-Za-z" and to encode " " to "+".
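The three interpretations differ only in the preserved set and in how the space is handled, so one encoder can be parameterized over them. This is a rough sketch: the names <code>encode</code>, <code>TASK_SAFE</code>, <code>RFC3986_SAFE</code> and <code>HTML5_SAFE</code> are hypothetical, and it assumes the input is encoded to UTF-8 bytes first.

```python
ALNUM = ("0123456789"
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
         "abcdefghijklmnopqrstuvwxyz")

# Preserved sets, following the three interpretations listed above.
TASK_SAFE = set(ALNUM)
RFC3986_SAFE = set(ALNUM + "-._~")
HTML5_SAFE = set(ALNUM + "-._*")


def encode(text, safe, space_as_plus=False):
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)  # bytes >= 0x80 map to chars never in `safe`, so they get encoded
        if ch in safe:
            out.append(ch)
        elif space_as_plus and ch == " ":
            out.append("+")  # HTML 5 form encoding: space becomes "+"
        else:
            out.append("%{:02X}".format(byte))  # uppercase hex digits
    return "".join(out)


s = "a b~c*d"
print(encode(s, TASK_SAFE))                        # a%20b%7Ec%2Ad
print(encode(s, RFC3986_SAFE))                     # a%20b~c%2Ad
print(encode(s, HTML5_SAFE, space_as_plus=True))   # a+b%7Ec*d
```

The outputs show the '~' versus '*' difference directly: RFC 3986 keeps '~' and encodes '*', HTML 5 does the opposite.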

I added this information to the task. If I understand correctly, RFC 3986 preserves '~' and encodes '*', while HTML 5 preserves '*' and encodes '~'. RFC 3986 also permits lowercase hex digits, so "http%3a%2f%2ffoo%20bar%2f" is valid. HTML 5 has a specific rule to always encode with uppercase hex digits. --[[User:Kernigh|Kernigh]] 00:29, 31 July 2011 (UTC)
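On the lowercase point: a decoder treats "%3a" and "%3A" the same, and an encoder that wants uppercase output can normalize afterwards. A small sketch, where <code>normalize_percent_case</code> is a hypothetical helper; <code>urllib.parse.unquote</code> is Python's standard decoder.

```python
import re
from urllib.parse import unquote


def normalize_percent_case(encoded):
    # Uppercase the two hex digits of every %xx triplet; other characters are untouched.
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), encoded)


lower = "http%3a%2f%2ffoo%20bar%2f"
print(normalize_percent_case(lower))                           # http%3A%2F%2Ffoo%20bar%2F
print(unquote(lower) == unquote("http%3A%2F%2Ffoo%20bar%2F"))  # True
```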