Talk:Find URI in text


:: This gets a bit into the details. The link is encoded with &auml; which is allowed in a URI. If it's Unicode then it is not technically a URI but an IRI (see below). --[[User:Dgamey|Dgamey]] 15:19, 8 January 2012 (UTC)

::: (no it isn't (ok, i don't have a german keyboard and i was just lazy ;-)) i am not talking about the encoding here in the text, but the display in the browser address bar (imagine looking at a screenshot). it is conceivable, and to be expected, that a person would type such an address as she sees it, and expect it to work.--[[User:EMBee|eMBee]] 15:29, 8 January 2012 (UTC)

:::: That would be the wiki then. The character I get back is a single-byte extended ASCII (Latin-1) value of 228, or 0xE4. --[[User:Dgamey|Dgamey]] 15:32, 8 January 2012 (UTC)
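
::::: To make the byte values in this thread concrete, here is a minimal Python sketch. It only illustrates how the same character comes out as one byte in Latin-1 and two bytes in UTF-8, and what the percent-encoded form looks like in each case. The helper name <code>percent_encode</code> and the sample word are assumptions made for illustration, not part of the task.

```python
from urllib.parse import quote

def percent_encode(text, encoding="utf-8"):
    """Percent-encode non-ASCII characters in text (hypothetical helper).

    The encoding parameter decides which bytes get escaped; "/" and ":"
    are left alone so that a whole address can be passed in.
    """
    return quote(text, safe="/:", encoding=encoding)

char = "ä"

# One byte in Latin-1 (decimal 228), two bytes in UTF-8.
latin1_bytes = char.encode("latin-1")
utf8_bytes = char.encode("utf-8")

print(latin1_bytes, latin1_bytes[0])          # the single Latin-1 byte and its decimal value
print(utf8_bytes)                             # the two-byte UTF-8 sequence
print(percent_encode("Kästner"))              # UTF-8 based escaping
print(percent_encode("Kästner", "latin-1"))   # Latin-1 based escaping
```

::::: Which of the two escaped forms a server accepts depends on the server, so a solution that wants to handle the address as typed probably has to keep the raw character and treat the result as an IRI.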

:it is not necessary to copy the example input exactly. if you can think of other examples that are worth testing, please include them too.

:as for the expected output, this is a question of the balance between following the rfc and handling user expectations. for example, a <code> . </code> or <code> , </code> at the end of a URI is most likely not part of the URI according to user expectation, but it is a legal character in the RFC. which rule is better? i don't know. until someone can show a live URI that has <code> . </code> or <code> , </code> at the end i am inclined to remove them. in contrast the <code>()</code> case is somewhat easier to decide. if there is a <code>(</code> before the URI, then clearly the <code>)</code> at the end is also not part of the URI, but there are edge cases too.--[[User:EMBee|eMBee]] 06:58, 8 January 2012 (UTC)
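
:: The trailing-punctuation heuristic described above could be sketched roughly as follows in Python. This is only one possible reading, under stated assumptions: the pattern <code>URI_PATTERN</code> is deliberately loose and is not the RFC 3986 grammar, the function name <code>find_uris</code> is made up for illustration, and the <code>)</code> rule is swapped for a simpler balance check (drop a trailing <code>)</code> only when the candidate contains more closing than opening parentheses) rather than looking at the text before the URI.

```python
import re

# Assumption: a loose "scheme:rest" pattern, NOT the RFC 3986 grammar.
# It grabs everything up to whitespace or an angle bracket / double quote.
URI_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:[^\s<>\"]+")

def find_uris(text):
    """Return URI candidates with user-expectation trimming applied."""
    results = []
    for match in URI_PATTERN.finditer(text):
        uri = match.group(0)
        # Trim from the right: "." and "," always go; ")" goes only
        # while it is unbalanced within the candidate itself.
        while uri[-1] in ".,)":
            if uri[-1] == ")" and uri.count("(") >= uri.count(")"):
                break  # balanced, so the ")" is treated as part of the URI
            uri = uri[:-1]
        results.append(uri)
    return results

sample = (
    "a link (http://example.com/page). "
    "another http://example.com/wiki/Name_(disambiguation), "
    "and (see http://example.com/a_(b))"
)
for uri in find_uris(sample):
    print(uri)
```

:: The balance check keeps a <code>)</code> that closes a <code>(</code> inside the URI and drops one that closes a parenthesis in the surrounding text. It still guesses wrong for a URI that legitimately ends in an unbalanced <code>)</code>, or in <code> . </code> or <code> , </code>, which is exactly the edge case raised above.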