Talk:XML/Input: Difference between revisions

From Rosetta Code
(→‎Interpreting XML?: Task description needs to be updated.)
(→‎Interpreting XML?: AWK not really suitable)

Revision as of 10:17, 2 June 2009

Interpreting XML?

The name of this task is XML Reading. Are we supposed to interpret the XML structure, or just extract the names in this particular example?

The AWK implementation simply extracts any text between double quotes. That would not be useful for any practical purpose. I think the task should at least require extracting only the contents of the fields named "Name". Maybe the example input file should also contain some other fields that are not to be extracted. --PauliKL 13:00, 1 June 2009 (UTC)

I'm tempted to say let the AWK example stand with comments about how it is scraping the XML and not properly parsing it; disappointingly many languages have to do it that way anyway and it is a common (if nasty) technique. —Donal Fellows 13:25, 1 June 2009 (UTC)
This task should definitely require structured XML parsing. We already have Web Scraping for more ad-hoc methods. To aid this, I would change the XML to something less trivial. --IanOsgood 19:04, 1 June 2009 (UTC)
I added a numeric character reference, since XML processors in general need to be able to handle & and the full character set. --Kevin Reid 00:44, 2 June 2009 (UTC)
Are you suggesting that the program should convert HTML entities and numeric references into some character encoding? I think that should be a separate task. And, AFAIK, it is HTML-specific, not XML. --PauliKL 09:03, 2 June 2009 (UTC)
Donal, the problem is that the AWK implementation does not interpret the structure at all. It is quite possible to do some parsing even if there are no ready-made library routines for that. But that does not mean that we should implement a full XML parser. The task should be kept relatively simple.
I notice that the XML input file has now been changed. But the task description needs to be changed, too. --PauliKL 09:14, 2 June 2009 (UTC)
Being the poster of the AWK solution, I have to admit it was a bit tongue-in-cheek - but also true to the XP rule "do the simplest thing that might possibly work", which the original code did for the original task. But rather than implement an XML parser in AWK, I'm rather ok with withdrawing the AWK code. --Suchenwi 10:17, 2 June 2009 (UTC)
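
To make the distinction discussed above concrete, here is a minimal Python sketch contrasting the two approaches: scraping every double-quoted string versus reading only the "Name" attribute through an XML parser. The sample input, the element name `Student`, the extra attributes and the function names are all assumptions made for illustration; this is not the task's actual input file, nor any poster's solution.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample input (an assumption, not the task's real file).
# It has attributes other than Name, plus one numeric character reference.
SAMPLE = """<Students>
  <Student Name="April" Gender="F" DateOfBirth="1989-01-02" />
  <Student Name="Bob" Gender="M" DateOfBirth="1990-03-04" />
  <Student Name="&#x00C9;mily" Gender="F" DateOfBirth="1989-12-25" />
</Students>"""


def scrape_quoted(text):
    """Scraping: return every double-quoted string, whatever it belongs to."""
    return re.findall(r'"([^"]*)"', text)


def student_names(text):
    """Structured parsing: return only the Name attribute of each Student."""
    root = ET.fromstring(text)
    return [
        student.get("Name")
        for student in root.iter("Student")
        if student.get("Name") is not None
    ]


if __name__ == "__main__":
    # Scraping also picks up genders and dates, and leaves the
    # character reference undecoded.
    print(scrape_quoted(SAMPLE))
    # The parser returns only the names and decodes the reference itself.
    print(student_names(SAMPLE))
```

With input like this, the scraping version returns the genders and birth dates along with the names and leaves `&#x00C9;` as literal text, while the parser-based version returns only the three names and resolves the reference as part of normal XML processing.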