Talk:URL encoding

From Rosetta Code
Revision as of 13:01, 17 June 2011 by rosettacode>Markhobley (This task is not restricted to HTTP urls)

I believe that some symbols are usually not encoded in URLs. The exact list varies, but it usually includes the period (.) and hyphen (-), and sometimes the underscore (_). Should we leave those unencoded? This is complicated by the fact that there are several standards on URI syntax (RFC 1738, RFC 3986), additional restrictions for specific protocols like HTTP (e.g. the plus character (+) is encoded in form data), and lots of slightly different implementations across languages (and sometimes even within the same language). So any solution that uses library functions will invariably encode a slightly smaller set of characters than the task specification asks for. It would be hard to keep all the solutions consistent. --98.210.210.193 07:06, 17 June 2011 (UTC)
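To make the point about library variation concrete, here is a minimal sketch using Python's standard urllib.parse module. The sample string is made up for illustration and is not part of the task. It shows three encodings of the same input from one library, depending on which function and which "safe" set is used.

```python
from urllib.parse import quote, quote_plus

# Hypothetical sample string (not from the task description).
sample = "a.b-c_d/e f+g"

# Default: "/" is treated as safe; ".", "-" and "_" are never encoded.
default_quote = quote(sample)

# Empty safe set: "/" is now encoded, but ".", "-" and "_" still are not.
strict_quote = quote(sample, safe="")

# Form-data style: space becomes "+", and a literal "+" becomes %2B.
form_quote = quote_plus(sample)

print(default_quote)  # a.b-c_d/e%20f%2Bg
print(strict_quote)   # a.b-c_d%2Fe%20f%2Bg
print(form_quote)     # a.b-c_d%2Fe+f%2Bg
```

Note that none of the three variants encodes the period, hyphen, or underscore, which is the kind of mismatch with the task specification described above.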

Nailing this down would help since there are two tasks dependent on this (URL encoding and decoding). Sorting out and making sense of the current set of RFCs is probably a prerequisite.
RFC 3986 is about URIs and updates RFC 1738; these two appear to be the most relevant RFCs
RFC 1738 is about URLs
Superseding RFCs may only supersede some of the functionality (such as for a protocol like gopher)
Superseded RFCs should be ignored
As this task seems to be about HTTP URLs, we should ignore some of the RFCs for other protocols like mail, tn3270, etc. There are also RFCs that extend protocols, such as WebDAV, which would seem not to be part of the core task. Also, some of these RFCs have been marked as 'historic', a polite way of saying obsolete.
This task is not restricted to HTTP URLs, and can be applied to any string that can be encoded into this format.
I believe the example of an encoded url is in error (or not described properly). Specifically,
The string "http://foo bar/" would be encoded as "http%3A%2F%2Ffoo%20bar%2F".
That string would only be encoded this way if the URL were being passed as data within another URL. See the RFC sections on Reserved Characters and When to Encode or Decode.
The task is to demonstrate the encoding mechanism, rather than when to apply it, so we can assume that it will be used in applications where the URL string requires encoding. --Markhobley 13:01, 17 June 2011 (UTC)
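As a rough illustration of the mechanism under discussion, the sketch below percent-encodes every byte that is not an ASCII letter or digit. That rule is an assumption about the strictest reading of the task, and encode_strict is a hypothetical helper name, not something from the task page. It is compared with Python's urllib.parse.quote called with an empty safe set.

```python
from urllib.parse import quote

# Assumption: only ASCII letters and digits are left unencoded.
ALNUM = frozenset(
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
)

def encode_strict(text: str) -> str:
    """Percent-encode every UTF-8 byte that is not an ASCII letter or digit."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in ALNUM else f"%{byte:02X}")
    return "".join(out)

example = "http://foo bar/"
strict_result = encode_strict(example)
library_result = quote(example, safe="")

print(strict_result)   # http%3A%2F%2Ffoo%20bar%2F
print(library_result)  # http%3A%2F%2Ffoo%20bar%2F

# The two differ once the input contains ".", "-" or "_":
print(encode_strict("a.b"))   # a%2Eb
print(quote("a.b", safe=""))  # a.b
```

For the string quoted above both approaches agree. They diverge only on characters such as the period, which the library treats as unreserved.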
There probably should be some required input(s) and output(s). I noticed the Perl example is very cryptic, uses a library, and provides no output. The output it would produce doesn't match the 'example' string, as it only encodes data in the path portion of the URL and not the entire URL.
--Dgamey 09:54, 17 June 2011 (UTC)
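For comparison, here is a sketch of what "encode only the path portion" could look like. This is Python, not the Perl solution mentioned above. The function name encode_path_only and the example.com URL are hypothetical, chosen only to show how the result differs from encoding the whole string.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path_only(url: str) -> str:
    """Percent-encode the path component and leave the rest of the URL alone."""
    parts = urlsplit(url)
    # quote() keeps "/" by default, so the path separators survive.
    return urlunsplit(parts._replace(path=quote(parts.path)))

# Hypothetical URL with a space in the path.
url = "http://example.com/foo bar/"

path_only = encode_path_only(url)
whole_string = quote(url, safe="")

print(path_only)     # http://example.com/foo%20bar/
print(whole_string)  # http%3A%2F%2Fexample.com%2Ffoo%20bar%2F
```

The first result is still usable as a URL. The second is the form needed when the whole URL is carried as data inside another URL, which is the distinction raised earlier in this discussion.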