Revision as of 10:37, 5 May 2015

When developing a website it is occasionally necessary to handle text that is received without formatting and present it in a pleasing manner. To achieve this, the text needs to be converted to HTML.

Write a converter from plain text to HTML.

The plain text has no formatting information.

It may have centered headlines, numbered sections, paragraphs, lists, and URIs. It could even have tables.

Simple converters restrict themselves to identifying paragraphs, but I believe more can be done if the text is analyzed.
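As a rough illustration of what "analyzing the text" could mean, here is a minimal Python sketch that guesses the kind of each blank-line-separated block. The function names, the regular expressions and the assumed line width of 72 columns are all hypothetical choices made for this example; they are not part of the task.

```python
import re

# Assumption: URIs are recognised only for a few common schemes, and
# trailing sentence punctuation is not treated as part of the URI.
URI_RE = re.compile(r'\b(?:https?|ftp)://[^\s<>"]*[^\s<>".,;:!?)]')

def classify_block(block, width=72):
    """Guess the kind of one blank-line-separated block of plain text."""
    lines = [ln.rstrip() for ln in block.splitlines() if ln.strip()]
    if not lines:
        return "empty"
    first = lines[0]
    # A multi-level number such as "1.2" or "3.4.1" suggests a numbered section.
    if re.match(r"\s*\d+(\.\d+)+\s+\S", first):
        return "section"
    # A bullet character or "1." / "1)" suggests a list item.
    if re.match(r"\s*([*+-]|\d+[.)])\s+\S", first):
        return "list"
    # A single line with roughly equal margins on both sides looks centered.
    if len(lines) == 1:
        left = len(first) - len(first.lstrip())
        right = width - len(first)
        if left >= 4 and abs(left - right) <= 2:
            return "headline"
    return "paragraph"

def find_uris(text):
    """Return the URIs found in the text, in order of appearance."""
    return URI_RE.findall(text)
```

A real converter would probably need more context than a single block (for example, the indentation of neighbouring blocks) to recognise tables, so this is only a starting point.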

You are not requested to copy the algorithm from the existing solutions, but to use whatever facilities are available in your language to best solve the problem.

The only requirement is to ensure that the result is valid XHTML.
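One cheap sanity check for this requirement is to feed the generated markup to an XML parser. The Python sketch below does that with the standard library; `is_well_formed` is a name invented for this example. Note that it only tests well-formedness (balanced tags, escaped `&` and `<`), which is necessary for valid XHTML but not sufficient: it does not check the document against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

For example, `is_well_formed("<p>a &amp; b</p>")` is true, while an unclosed `<br>` or a bare `&` makes the check fail.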

Racket

This task seems very under-defined, but the discussion seems to be heading towards a simple markdown specification. I therefore do this with a small interface to cmark to render CommonMark text.

(Note that this is not contrived code: it comes from code that I'm using to render class notes, and hopefully it will be useful to have such an example here. It certainly seems more useful to me than some half-baked not-really-markdown-or-anything implementation.)

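<lang racket>#lang at-exp racket

(require ffi/unsafe ffi/unsafe/define)

;; Bind definitions to the cmark C library
(define-ffi-definer defcmark (ffi-lib "libcmark"))

;; Option flags accepted by cmark_markdown_to_html
(define _cmark_opts
  (_bitmask '(sourcepos = 1 hardbreaks = 2 normalize = 4 smart = 8)))
(define-cpointer-type _node)
;; Passes the input bytes, their length and the options; the returned
;; buffer is converted to a string and then freed
(defcmark cmark_markdown_to_html
  (_fun [bs : _bytes] [_int = (bytes-length bs)] _cmark_opts
        -> [r : _bytes] -> (begin0 (bytes->string/utf-8 r) (free r))))

;; Convenience wrapper: joins the text pieces and renders them as HTML
(define (cmark-markdown-to-html #:options [opts '(normalize smart)] . text)
  (cmark_markdown_to_html (string->bytes/utf-8 (string-append* text)) opts))

(display @cmark-markdown-to-html{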

 This is
 a paragraph

     a block of
     code

 * A one-bullet list
   > With quoted text
   >
   >     and code

}) </lang>

Output:

<p>This is
a paragraph</p>
<pre><code>a block of
code
</code></pre>
<ul>
<li>A one-bullet list
<blockquote>
<p>With quoted text</p>
<pre><code>and code
</code></pre>
</blockquote>
</li>
</ul>

Tcl

This renderer doesn't do all that much. Indeed, it deliberately avoids the complexity that is possible; instead it seeks to provide just the minimum that could be useful to someone who is doing very simple text pages.
<lang tcl>package require Tcl 8.5

# Split the text into paragraphs at runs of blank lines
proc splitParagraphs {text} {
    split [regsub -all {\n\s*(\n\s*)+} [string trim $text] \u0000] "\u0000"
}

# Classify a paragraph as ul, ol, heading or normal, and strip its marker
proc determineParagraph {para} {
    set para [regsub -all {\s*\n\s*} $para " "]
    switch -regexp -- $para {
        {^\s*\*+\s} {
            return [list ul [string trimleft $para " \t*"]]
        }
        {^\s*\d+\.\s} {
            set para [string trimleft $para " \t\n0123456789"]
            set para [string range $para 1 end]
            return [list ol [string trimleft $para " \t"]]
        }
        {^#+\s} {
            return [list heading [string trimleft $para " \t#"]]
        }
    }
    return [list normal $para]
}

# Escape HTML metacharacters first, then apply the inline markup
proc markupParagraphContent {para} {
    set para [string map {& &amp; < &lt; > &gt;} $para]
    regsub -all {_([\w&;]+)_} $para {<i>\1</i>} para
    regsub -all {\*([\w&;]+)\*} $para {<b>\1</b>} para
    regsub -all {`([\w&;]+)`} $para {<tt>\1</tt>} para
    return $para
}

# Render the whole text; the state tracks which list (if any) is open
proc markupText {title text} {
    set title [string map {& &amp; < &lt; > &gt;} $title]
    set result "<html>"
    append result "<head><title>" $title "</title>\n</head>"
    append result "<body>" "<h1>$title</h1>\n"
    set state normal
    foreach para [splitParagraphs $text] {
        lassign [determineParagraph $para] type para
        set para [markupParagraphContent $para]
        switch $state,$type {
            normal,normal {append result "<p>" $para "</p>\n"}
            normal,heading {
                append result "<h2>" $para "</h2>\n"
                set type normal
            }
            normal,ol {append result "<ol>" "<li>" $para "</li>\n"}
            normal,ul {append result "<ul>" "<li>" $para "</li>\n"}

            ul,normal {append result "</ul>" "<p>" $para "</p>\n"}
            ul,heading {
                append result "</ul>" "<h2>" $para "</h2>\n"
                set type normal
            }
            ul,ol {append result "</ul>" "<ol>" "<li>" $para "</li>\n"}
            ul,ul {append result "<li>" $para "</li>\n"}

            ol,normal {append result "</ol>" "<p>" $para "</p>\n"}
            ol,heading {
                append result "</ol>" "<h2>" $para "</h2>\n"
                set type normal
            }
            ol,ol {append result "<li>" $para "</li>\n"}
            ol,ul {append result "</ol>" "<ul>" "<li>" $para "</li>\n"}
        }
        set state $type
    }
    if {$state ne "normal"} {
        append result "</$state>"
    }
    return [append result "</body></html>"]
}</lang>
Here's an example of how it would be used.
<lang tcl>set sample "
This is an example of how a pseudo-markdown-ish formatting scheme could
work. It's really much simpler than markdown, but does support a few things.

# Block paragraph types

* This is a bulleted list

* And this is the second item in it

1. Here's a numbered list

2. Second item

3. Third item

# Inline formatting types

The formatter can render text with _italics_, *bold* and in a `typewriter` font. It also does the right thing with <angle brackets> and &ersands, but relies on the encoding of the characters to be conveyed separately."

puts [markupText "Sample" $sample]</lang>

Output:

<lang html><html><head><title>Sample</title>
</head><body><h1>Sample</h1>
<p>This is an example of how a pseudo-markdown-ish formatting scheme could work. It's really much simpler than markdown, but does support a few things.</p>
<h2>Block paragraph types</h2>
<ul><li>This is a bulleted list</li>
<li>And this is the second item in it</li>
</ul><ol><li>Here's a numbered list</li>
<li>Second item</li>
<li>Third item</li>
</ol><h2>Inline formatting types</h2>
<p>The formatter can render text with <i>italics</i>, <b>bold</b> and in a <tt>typewriter</tt> font. It also does the right thing with &lt;angle brackets&gt; and &amp;ersands, but relies on the encoding of the characters to be conveyed separately.</p>
</body></html></lang>
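The order of operations in the inline step matters: the metacharacters are escaped first and the inline markup is applied afterwards, so the tags that the renderer adds are not themselves escaped. A minimal Python sketch of the same idea follows; the function name `markup_inline` and the choice of `<i>`, `<b>` and `<tt>` as output tags are assumptions made for this illustration.

```python
import html
import re

def markup_inline(text):
    """Escape HTML metacharacters, then convert _x_, *x* and `x` spans."""
    # Escape first; doing this after the substitutions would also
    # escape the tags inserted below.
    text = html.escape(text, quote=False)
    # The character class allows & and ; so that escaped entities
    # inside a span are still matched.
    text = re.sub(r"_([\w&;]+)_", r"<i>\1</i>", text)
    text = re.sub(r"\*([\w&;]+)\*", r"<b>\1</b>", text)
    text = re.sub(r"`([\w&;]+)`", r"<tt>\1</tt>", text)
    return text
```

As in the Tcl version, only single-word spans are matched, which keeps a stray `*` or `_` elsewhere in a sentence from being misread as markup.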

Text to HTML: Difference between revisions
