Talk:URL encoding

: --[[User:Dgamey|Dgamey]] 09:54, 17 June 2011 (UTC)
The point of encoding strings is to avoid confusion. Some characters, such as '+' and '?', tend to be metacharacters used by the CGI interface ('?' for the beginning of the query string, '+' for separating parameters), while '\r' and '\n' must be encoded because they signify end of input. Encoding can also carry any text outside the printable 7-bit ASCII range as "normal text", so dumber server or client software won't get totally confused. I don't know how much we need to conform to the various RFCs here; maybe common sense would suffice. In principle you can escape the "http" too and still conform to most standards, but that would be utterly pointless, wouldn't it? --[[User:Ledrug|Ledrug]] 02:23, 19 June 2011 (UTC)
 
:: I suppose as a bonus, we could provide an exception string, which contains a list of characters that do not become encoded. --[[User:Markhobley|Markhobley]] 17:57, 20 June 2011 (UTC)
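A rough sketch of what such an exception string could look like, in Python. The function name <code>url_encode</code> and its <code>keep</code> and <code>space_as_plus</code> parameters are made up for illustration and are not part of the task description. The sketch assumes the input is encoded as UTF-8 before escaping and that hex digits are emitted in uppercase.

```python
import string

# Characters the task always preserves: 0-9, A-Z, a-z.
ALNUM = string.ascii_letters + string.digits


def url_encode(text, keep="", space_as_plus=False):
    """Percent-encode text, preserving alphanumerics plus anything in `keep`.

    `keep` plays the role of the proposed exception string (hypothetical name).
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Only ASCII bytes can be preserved; bytes of multi-byte
        # UTF-8 sequences are always escaped.
        if byte < 128 and (ch in ALNUM or ch in keep):
            out.append(ch)
        elif space_as_plus and ch == " ":
            out.append("+")
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(url_encode("http://foo bar/"))
print(url_encode("a-b_c.d~e*f", keep="-._~"))
```

With an empty exception string this follows the task as currently written; passing a non-empty <code>keep</code> lets the caller opt in to whichever set a given standard preserves.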
 
== Encoding by RFC 3986 or HTML 5 ==
 
The current task lists six groups of characters to encode. The question then becomes: which characters should be preserved?
 
* The current task preserves only "0-9A-Za-z".
* My interpretation of RFC 3986 is to preserve "-._~0-9A-Za-z".
* My interpretation of HTML 5, [http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#url-encoded-form-data URL-encoded form data], is to preserve "-._*0-9A-Za-z" and to encode " " to "+".
 
I added this information to the task. If I understand correctly, RFC 3986 preserves '~' and encodes '*', while HTML 5 preserves '*' and encodes '~'. RFC 3986 also permits lowercase hex digits, so "http%3a%2f%2ffoo%20bar%2f" is valid. HTML 5 has a specific rule to always encode to uppercase. --[[User:Kernigh|Kernigh]] 00:29, 31 July 2011 (UTC)
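For comparison, here is how the RFC 3986 reading can be checked against Python's standard library. This is only an illustration, assuming Python 3.7 or later, where <code>urllib.parse.quote</code> treats "-._~" plus alphanumerics as always safe; the sample strings are arbitrary.

```python
from urllib.parse import quote, unquote

# safe="" removes the default "/" exemption, so only letters, digits
# and "-._~" are left unescaped (Python 3.7+).
encoded = quote("http://foo bar/~x*", safe="")
print(encoded)

# Decoding accepts lowercase hex digits as well as uppercase.
decoded = unquote("http%3a%2f%2ffoo%20bar%2f")
print(decoded)
```

Note that <code>quote</code> cannot be made to escape "-._~", so it does not reproduce the stricter set the task currently asks for.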
 
:I can think of several ways to approach this. One would be to move the information to another page and link to it from here. Another would be to change the task itself. That said, personally, I do not see much use in "preserving characters". There is a minor bulk advantage, but all encoded characters will pass through safely. So the safest interpretation of multiple standards would be to encode any character suggested by any of them (and there are a variety of standards...). --[[User:Rdm|Rdm]] 01:04, 1 August 2011 (UTC)
::I think we have mostly covered this now. The provision for an exception string allows for variations. For this task I don't really want a space to be encoded as a plus symbol, as this is not common and would also require a decoder with the reverse capability. A separate task is required for such a variation, if desired.
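The point about needing a matching decoder can be illustrated with Python's standard library, which ships separate functions for the two conventions. This is a small sketch with an arbitrary sample string, not a proposal for the task text.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

# Form-style encoding: space becomes "+", and a literal "+" becomes "%2B".
form_encoded = quote_plus("foo bar+baz")
print(form_encoded)

# A plain percent-decoder leaves "+" alone, so the space is lost.
plain_decoded = unquote(form_encoded)
print(plain_decoded)

# Only the "+"-aware decoder recovers the original string.
plus_decoded = unquote_plus(form_encoded)
print(plus_decoded)
```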
 
I think this task is ready for promotion. [[User:Markhobley|Markhobley]] 17:06, 13 August 2011 (UTC)