Talk:URL decoding: Difference between revisions

:::::::::::::::::You are correct, my mistake. Kevin's function is based on the extended ASCII table. It encodes the character "á" as %E1 (which works in most browsers), but it can't independently decode the UTF-8 sequence %C3%A1 back to "á"; that depends on the OS locale settings. The reason for wanting UTF-8 is that most browsers and HTML pages encode in UTF-8, so when you do web scraping and extract a URL (say, the href of a link tag), it's encoded in UTF-8, and if you then want to display part of that URL (say, the name of a search term) you have to convert it back to visible characters. I've yet to find an OS-independent way to do it in Awk that doesn't rely on an external tool (such as Bill Poser's [http://billposer.org/Software/uni2ascii.html ascii2uni], which isn't very portable as an external dependency). Really what I'm looking for is an Awk program that will convert RFC 2396 URI escapes (e.g. %C3%A9) to Unicode, independent of locale settings. -- [[User:3havj7t3nps8z8wij3g9|3havj7t3nps8z8wij3g9]] ([[User talk:3havj7t3nps8z8wij3g9|talk]]) 16:14, 29 May 2015 (UTC)
:::::::::::::::::: Why not make the locale settings a part of the implementation? --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 17:42, 29 May 2015 (UTC)
::::::::::::::::::: Right. There's no way to modify the environment from within awk, but the locale could be made a requirement before running, or set by a wrapper bash script. -- [[User:3havj7t3nps8z8wij3g9|3havj7t3nps8z8wij3g9]] ([[User talk:3havj7t3nps8z8wij3g9|talk]]) 23:36, 29 May 2015 (UTC)
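
A minimal sketch of the wrapper idea from this thread, offered as one possible approach and not as a tested solution for every awk. It assumes that forcing <code>LC_ALL=C</code> for the awk process makes <code>sprintf("%c", n)</code> emit the single byte <code>n</code>, which holds for gawk and mawk as far as I know. The function name <code>urldecode</code> is hypothetical. Each %XX escape is turned into one raw byte, so a UTF-8 sequence such as %C3%A1 comes out as the two bytes that a UTF-8 terminal displays as "á".

```shell
#!/bin/sh
# urldecode (hypothetical name): read lines on stdin, write them to stdout
# with every %XX escape replaced by the raw byte it denotes.
# Assumption: under LC_ALL=C, awk treats strings as bytes and
# sprintf("%c", n) yields exactly one byte for n in 1..255.
urldecode() {
    LC_ALL=C awk '
    BEGIN {
        hex = "0123456789ABCDEF"    # lookup table; avoids gawk-only strtonum()
    }
    {
        out = ""
        s = $0
        # Consume the line left to right, one escape at a time.
        while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            out = out substr(s, 1, RSTART - 1)
            hi = index(hex, toupper(substr(s, RSTART + 1, 1))) - 1
            lo = index(hex, toupper(substr(s, RSTART + 2, 1))) - 1
            out = out sprintf("%c", hi * 16 + lo)
            s = substr(s, RSTART + RLENGTH)
        }
        # Whatever is left contains no further valid escapes.
        print out s
    }'
}
```

Usage would look like <code>printf '%s\n' 'q=%C3%A1' | urldecode</code>. The locale is changed only for the awk child process, so the caller's environment is untouched. The sketch leaves malformed escapes (for example "%zz") as they are, does not translate "+" to a space, and makes no promise about %00, which some awk implementations cannot hold in a string.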