Talk:URL decoding

Task update suggestion: support for non-ASCII characters encoded as UTF-8. -- 3havj7t3nps8z8wij3g9 (talk) 05:31, 26 May 2015 (UTC)

In what way? --Rdm (talk) 08:21, 26 May 2015 (UTC)

Say, for example, a Google search for `Abdu'l-Bahá (https://www.google.com/search?q=%60Abdu%27l-Bah%C3%A1): how should %60Abdu%27l-Bah%C3%A1 be decoded to `Abdu'l-Bahá? -- 3havj7t3nps8z8wij3g9 (talk) 16:04, 26 May 2015 (UTC)

Any existing implementation should have no problem with the url https://www.google.com/search?q=%60Abdu%27l-Bah%C3%A1 - so it would be reasonable to add that as a test case. --Rdm (talk) 18:29, 26 May 2015 (UTC)
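
For reference, here is a minimal sketch of why a byte-oriented decoder should cope with this URL: each %XX escape yields one byte, and the resulting byte sequence is only interpreted as UTF-8 at the end. It is written in Python purely for illustration; the name percent_decode is hypothetical and not taken from any solution on the task page.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"

def percent_decode(text: str) -> str:
    """Decode %XX escapes to bytes, then interpret the bytes as UTF-8.

    Illustrative sketch only: malformed escapes (for example "%zz" or a
    trailing "%") are copied through unchanged, and byte sequences that
    are not valid UTF-8 are shown as U+FFFD replacement characters.
    """
    data = text.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        # A valid escape is '%' followed by exactly two hex digits.
        if (data[i] == ord("%")
                and i + 2 < len(data)
                and data[i + 1] in HEX_DIGITS
                and data[i + 2] in HEX_DIGITS):
            out.append(int(data[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return out.decode("utf-8", errors="replace")
```

With this sketch, percent_decode("%60Abdu%27l-Bah%C3%A1") gives `Abdu'l-Bahá, because %C3 and %A1 are emitted as two separate bytes that together form the UTF-8 encoding of á.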

OK, I added it as a test case. I know it breaks the Awk code. I left a note saying where to find working gawk code, but that code lists every potential UTF-8 character, so it is large (and, given the number of possibilities, not even complete). I suspect other languages could have similar problems. -- 3havj7t3nps8z8wij3g9 (talk) 00:47, 27 May 2015 (UTC)

I had no serious problem with the existing awk implementation on your new example. I did have two minor issues I needed to deal with:

The url being decoded is hardcoded into the example. I dealt with this by replacing the hardcoded url. A more general solution might place the url on stdin.
I was using LC_ALL=C, which prevented the text from being displayed as UTF-8. I dealt with this by unsetting that environment variable. (LC_CTYPE and LANG might have similar effects, but I was not using them.)
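
Both points can be illustrated with a short sketch, again in Python and not a proposed task solution. It assumes one URL per input line, takes the input from a binary stream such as stdin rather than a hardcoded string, and writes the decoded bytes out unchanged so that the decoding step itself does not depend on the locale. The name decode_lines is hypothetical; unquote_to_bytes is the standard-library function from urllib.parse.

```python
from urllib.parse import unquote_to_bytes

def decode_lines(source, sink):
    """Percent-decode each line from a binary stream into another.

    Hypothetical helper for illustration: 'source' yields bytes lines and
    'sink' accepts bytes, so no locale-dependent text conversion happens
    inside this function.
    """
    for line in source:
        sink.write(unquote_to_bytes(line))
```

To use it as a filter, call decode_lines(sys.stdin.buffer, sys.stdout.buffer) after importing sys. Whether the output is then displayed as `Abdu'l-Bahá still depends on the terminal interpreting the bytes as UTF-8, which is where settings such as LC_ALL come in.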

I suspect that if you were encountering issues, they might be similar. --Rdm (talk) 04:31, 27 May 2015 (UTC)