Talk:Find URI in text: Difference between revisions

[http://www.ietf.org/rfc/rfc3986.txt RFC 3986] defines URIs and does not allow Unicode; however, the IETF addresses this in [http://www.ietf.org/rfc/rfc3987.txt RFC 3987] via the IRI mechanism, which is related but separate. The syntactic definitions are very similar, with most of the elements extended. Two lower-level elements are added, 'iprivate' and 'ucschar', which are specific ranges of Unicode code points (written in the RFC's ABNF as hexadecimal ranges such as %xA0-D7FF, not as percent-encoded values). These elements percolate up through most of the higher syntax elements, such as the authority, paths, and segments, which have i-versions. Other elements, such as 'scheme' and the IP address elements, are left alone. There is also no 'ireserved' element. --[[User:Dgamey|Dgamey]] 14:50, 8 January 2012 (UTC)
: Having worked on a couple of projects that involve parsing things defined by RFCs, I've found that, unless it's a use-once-and-throw-away solution, straying from the RFCs or reinterpreting them is generally asking for trouble. --[[User:Dgamey|Dgamey]] 14:50, 8 January 2012 (UTC)
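For reference, the 'ucschar' and 'iprivate' rules from RFC 3987 can be written down as plain code point ranges. The following is only a minimal Python sketch; the names <code>UCSCHAR_RANGES</code>, <code>IPRIVATE_RANGES</code>, <code>is_ucschar</code> and <code>is_iprivate</code> are made up for illustration and do not come from the RFC or from any library.

```python
# Code point ranges of the 'ucschar' rule (RFC 3987, section 2.2).
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13, each without its last two code points.
UCSCHAR_RANGES += [(p << 16, (p << 16) + 0xFFFD) for p in range(1, 14)]
# Plane 14 only contributes E1000-EFFFD.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

# Code point ranges of the 'iprivate' rule (private use areas).
IPRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def _in_ranges(code_point, ranges):
    """Return True if code_point lies in any inclusive (low, high) range."""
    return any(low <= code_point <= high for low, high in ranges)


def is_ucschar(code_point):
    """Hypothetical helper: is this code point allowed by 'ucschar'?"""
    return _in_ranges(code_point, UCSCHAR_RANGES)


def is_iprivate(code_point):
    """Hypothetical helper: is this code point allowed by 'iprivate'?"""
    return _in_ranges(code_point, IPRIVATE_RANGES)
```

Under these assumptions, <code>is_ucschar(ord("é"))</code> is true while <code>is_ucschar(ord("a"))</code> is false, since plain ASCII is already covered by the unextended RFC 3986 rules.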
:: there is also the general rule: be strict in what you produce, but be liberal in what you accept. i believe this applies here. but thank you for pointing to RFC 3987. it looks like that is exactly what i meant, and i wouldn't mind if that is used as a base to decide what is valid and what is not. however, i believe that accepting "any text" and treating only <code>" ' < > </code> and whitespace as delimiters that end a URI is sufficient for most use cases.
::as for one-off or throwaway code, i see rosettacode not as a place to provide finalized libraries but as a place for code that anyone can use as a starting point to implement their own. for that, i favor simpler code that is easier to understand and modify over complete code that solves all edge cases, which a user may not even be interested in.--[[User:EMBee|eMBee]] 15:55, 8 January 2012 (UTC)
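The liberal approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed reference solution: the scheme part follows the RFC 3986 'scheme' rule, everything after the colon is accepted until one of the delimiters <code>" ' < > </code> or whitespace, and the names <code>URI_PATTERN</code> and <code>find_uris</code> are invented here.

```python
import re

# A scheme (letter, then letters, digits, '+', '-' or '.'), a colon,
# then any run of characters that is not whitespace or one of " ' < >.
URI_PATTERN = re.compile(r"""[A-Za-z][A-Za-z0-9+.-]*:[^\s"'<>]+""")


def find_uris(text):
    """Return every candidate URI or IRI found in text, in order."""
    return URI_PATTERN.findall(text)


if __name__ == "__main__":
    sample = 'see <http://example.org/a_(b)> and "http://ex.org/Kästner" or ftp://h/x.'
    for uri in find_uris(sample):
        print(uri)
```

Because only those delimiters end a match, non-ASCII characters and parentheses are kept, but so is trailing punctuation: the final period in the sample stays attached to <code>ftp://h/x.</code>. The pattern also accepts any word followed directly by a colon and more text, so a stricter variant would need extra rules.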