Too broad

Right now, I think the task is a bit too broad. In particular, there are too many options for what to do, and we should instead focus on a more restricted set of things that everyone can implement “the same”. Since we're really talking about something like wikitext or markdown, we should use something like that (including allowing people to make use of useful libraries if they wish). Paragraphs are really the minimum; inline bold, italic and fixed-width are also very useful in the minimum set (if only there were a single accepted standard for doing them…) –Donal Fellows 12:03, 5 January 2012 (UTC)

we are specifically not talking about wikitext or markdown, but text without any markup at all, except for indenting, empty lines and numbers and bullets, things you would use in plain text. specifically things like inline bold, italic and fixed-width are not possible without some kind of markup, and thus not what this task is looking for.
Bold and italic can be recognized by things like *foo* and /foo/ which people use in plain text anyway.
what i am looking for is to go beyond just recognizing paragraphs, to explore what else can be analyzed out of plain text. i very much expect that the task description will be in flux for a while until we can work out a reasonable set of requirements.
think of what you would get when using a command-line browser like lynx or w3m on a terminal without colors or bold text. what you see there is potential input for this task. i do not expect all such input to be parsable, but a reasonable set that goes beyond just paragraphs.--eMBee 12:36, 5 January 2012 (UTC)
It is going to be difficult to compare implementations if none of them are doing the same thing. For example, the open-ended concept of "plain text tables" pretty much guarantees that any implementation which does not ignore that part of the task will differ from every other implementation that was not copied from it. A lack of examples will also make comparison difficult. --Rdm 18:11, 5 January 2012 (UTC)

Concrete requirements?

    Recognize
a leading indentation.

Also hanging
    indentations.

   Block
   indentations.

A paragraph
* with
* bullets, some
of which are like this, but
the additional lines should line up with the first word.

Treat     this
as a      table
because   of
alignment.

A little convention for *bold* or /italic/ or _underline_ is not
such a terrible thing.

Horizontal rule:

----------------

+-----------------------+
| Box                   |
+-----------+-----------+
| Structure |           |
+-----------------------+
| How about it?         |
+-----------------------+

Of course, it should go without saying that HTML characters like
< and & must be properly escaped.

But http://this.is.a/url turned into a link.

192.139.122.42 21:14, 5 January 2012 (UTC)
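
To make a few of the items in the sample above concrete, here is a minimal sketch in Python. It is only one possible reading of those requirements, not an agreed specification: it handles blank-line-separated paragraphs, a horizontal rule made of dashes, escaping of < and &, bare URLs, and the *bold*, /italic/ and _underline_ conventions. The function names (`render_inline`, `text_to_html`), the choice of `<b>`, `<i>` and `<u>` tags, and the rule that a marker only counts when it is not glued to a neighbouring word character are all assumptions made for illustration. Indentation, bullets, aligned tables and boxes are deliberately left out.

```python
import html
import re

# Assumption: a URL starts with http:// or https:// and does not end in
# sentence punctuation, so a trailing "." or "," is left outside the link.
URL_RE = re.compile(r"https?://[^\s<>\"]*[^\s<>\".,;:!?)]")

# Assumption: a marker pair only counts when it hugs a word on the inside
# and is not attached to a word character on the outside. This keeps
# snake_case names, fractions like 1/2 and paths like /usr/bin untouched.
INLINE_RE = re.compile(r"(?<!\w)([*/_])(\w(?:[^*/_\n]*\w)?)\1(?!\w)")
TAGS = {"*": "b", "/": "i", "_": "u"}


def _markup(segment):
    """Escape HTML characters, then convert *bold*, /italic/, _underline_."""
    escaped = html.escape(segment, quote=False)
    return INLINE_RE.sub(
        lambda m: "<{0}>{1}</{0}>".format(TAGS[m.group(1)], m.group(2)),
        escaped,
    )


def render_inline(text):
    """Render one run of text, linking URLs and marking up the rest."""
    parts = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Text before the URL gets inline markup; the URL itself does not,
        # so the slashes inside it are never mistaken for italics.
        parts.append(_markup(text[pos:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append('<a href="{0}">{0}</a>'.format(url))
        pos = match.end()
    parts.append(_markup(text[pos:]))
    return "".join(parts)


def text_to_html(text):
    """Split on blank lines; emit <hr> for dash-only lines, <p> otherwise."""
    output = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        if not lines:
            continue
        if len(lines) == 1 and re.fullmatch(r"-{3,}", lines[0]):
            output.append("<hr>")
        else:
            output.append("<p>" + render_inline(" ".join(lines)) + "</p>")
    return "\n".join(output)


if __name__ == "__main__":
    sample = "A *bold* claim.\n\n----------------\n\nSee http://this.is.a/url now."
    print(text_to_html(sample))
```

Treating URLs before the inline markers is the main design choice here: it avoids having to special-case slashes inside links. Whether the marker rule is too strict or too loose is exactly the kind of thing the task description would need to pin down for implementations to be comparable.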
