Separate the house number from the street name

From Rosetta Code
Revision as of 15:21, 10 June 2014

Separate the house number from the street name is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In Germany and the Netherlands, a postal address takes the form of the street name followed by the house number, in accordance with the national standards DIN 5008 and NEN 5825 respectively. The problem is that some street names contain numbers (for example, commemorative years) and some [http://en.wikipedia.org/wiki/House_numbering#Europe house numbers] have letters as an extension.
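To illustrate why numbers inside street names make this hard, here is a minimal Scala sketch of the most naive rule, which takes the last whitespace-separated token as the house number. `NaiveSplit` and `splitAtLastSpace` are hypothetical names used only for this illustration; the rule is not proposed as a solution.

```scala
object NaiveSplit {
  // Hypothetical naive rule: everything after the last whitespace character
  // is taken to be the house number, everything before it the street name.
  def splitAtLastSpace(address: String): (String, String) = {
    val trimmed = address.trim
    val cut = trimmed.lastIndexWhere(_.isWhitespace)
    if (cut < 0) (trimmed, "") // no whitespace at all: no house number found
    else (trimmed.substring(0, cut).trim, trimmed.substring(cut + 1))
  }
}
```

For example, `splitAtLastSpace("Straat 12 II")` gives `("Straat 12", "II")` instead of `("Straat", "12 II")`, and `splitAtLastSpace("Laan ’40-’45")` wrongly treats the year range `’40-’45` as a house number.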

Task

Develop a program that separates the house number from the street name and displays both parts. A test set:

Plataanstraat 5
Straat 12
Straat 12 II
Dr. J. Straat   12
Dr. J. Straat 12 a
Dr. J. Straat 12-14
Laan 1940 – 1945 37
Plein 1940 2
1213-laan 11
16 april 1944 Pad 1
1e Kruisweg 36
Laan 1940-’45 66
Laan ’40-’45
Langeloërduinen 3 46
Marienwaerdt 2e Dreef 2
Provincialeweg N205 1
Rivium 2e Straat 59.
Nieuwe gracht 20rd
Nieuwe gracht 20rd 2
Nieuwe gracht 20zw /2
Nieuwe gracht 20zw/3
Nieuwe gracht 20 zw/4
Bahnhofstr. 4
Wertstr. 10
Lindenhof 1
Nordesch 20
Weilstr. 6
Harthauer Weg 2
Mainaustr. 49
August-Horch-Str. 3
Marktplatz 31
Schmidener Weg 3
Karl-Weysser-Str. 6

The Scala solution produces the correct separations.

PL/SQL <lang PL/SQL> _ </lang>

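Scala <lang Scala>import scala.io.Source.fromString
import scala.util.matching.Regex

object HouseNumber extends App {

  // Test set, one address per line; stripMargin removes the leading "|" margins.
  val adressen =
    """Plataanstraat 5
    |Straat 12
    |Straat 12 II
    |Dr. J. Straat   12
    |Dr. J. Straat 12 a
    |Dr. J. Straat 12-14
    |Laan 1940 – 1945 37
    |Plein 1940 2
    |1213-laan 11
    |16 april 1944 Pad 1
    |1e Kruisweg 36
    |Laan 1940-’45 66
    |Laan ’40-’45
    |Langeloërduinen 3 46
    |Marienwaerdt 2e Dreef 2
    |Provincialeweg N205 1
    |Rivium 2e Straat 59.
    |Nieuwe gracht 20rd
    |Nieuwe gracht 20rd 2
    |Nieuwe gracht 20zw /2
    |Nieuwe gracht 20zw/3
    |Nieuwe gracht 20 zw/4
    |Bahnhofstr. 4
    |Wertstr. 10
    |Lindenhof 1
    |Nordesch 20
    |Weilstr. 6
    |Harthauer Weg 2
    |Mainaustr. 49
    |August-Horch-Str. 3
    |Marktplatz 31
    |Schmidener Weg 3
    |Karl-Weysser-Str. 6""".stripMargin

  // Alternative 1: a number range such as " 12-14" (a "-" or "/" between two numbers).
  // Alternative 2, anchored at the end: a number followed by optional lowercase
  // letters, "I", dots, spaces or slashes and optional digits ("12 II", "20zw /2").
  // The lookahead skips numbers beginning with 1940 or 1945, years that occur
  // in street names ("Plein 1940 2", "Laan 1940 – 1945 37").
  val extractor = new Regex("""(\s\d+[-/]\d+)|(\s(?!1940|1945)\d+[a-zI. /]*\d*)$""")

  // Street name: the input with the matched text removed; house number: the
  // match (including its leading whitespace), or "" if nothing matches.
  def splitsAdressen(input: String) = (extractor.split(input).mkString, extractor.findFirstIn(input).getOrElse(""))

  // Source.getLines is used instead of String.lines, which on JDK 11+ resolves to
  // java.lang.String.lines() (a Java stream without foreach).
  fromString(adressen).getLines().foreach(s => println(f"$s%-25s split as ${splitsAdressen(s)}"))

}</lang>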
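The `extractor` pattern above has two alternatives, and it can help to see which one fires for a given address. The following sketch copies the pattern verbatim and reports the alternative behind the first match; `ExtractorProbe` and `alternative` are hypothetical names.

```scala
import scala.util.matching.Regex

object ExtractorProbe {
  // Same pattern as `extractor` in the Scala solution.
  val extractor = new Regex("""(\s\d+[-/]\d+)|(\s(?!1940|1945)\d+[a-zI. /]*\d*)$""")

  // Returns 1 or 2 for the alternative that produced the first match,
  // or 0 when the pattern finds no house number at all.
  def alternative(address: String): Int =
    extractor.findFirstMatchIn(address) match {
      case Some(m) if m.group(1) != null => 1 // group 1 took part: a number range
      case Some(_)                       => 2 // otherwise group 2 matched
      case None                          => 0
    }
}
```

For instance, `alternative("Dr. J. Straat 12-14")` is 1, `alternative("Straat 12 II")` is 2, and `alternative("Laan ’40-’45")` is 0, which is why that address is printed with an empty house number.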

Output:
Plataanstraat 5           split as (Plataanstraat, 5)
Straat 12                 split as (Straat, 12)
Straat 12 II              split as (Straat, 12 II)
Dr. J. Straat   12        split as (Dr. J. Straat  , 12)
Dr. J. Straat 12 a        split as (Dr. J. Straat, 12 a)
Dr. J. Straat 12-14       split as (Dr. J. Straat, 12-14)
Laan 1940 – 1945 37       split as (Laan 1940 – 1945, 37)
Plein 1940 2              split as (Plein 1940, 2)
1213-laan 11              split as (1213-laan, 11)
16 april 1944 Pad 1       split as (16 april 1944 Pad, 1)
1e Kruisweg 36            split as (1e Kruisweg, 36)
Laan 1940-’45 66          split as (Laan 1940-’45, 66)
Laan ’40-’45              split as (Laan ’40-’45,)
Langeloërduinen 3 46      split as (Langeloërduinen, 3 46)
Marienwaerdt 2e Dreef 2   split as (Marienwaerdt 2e Dreef, 2)
Provincialeweg N205 1     split as (Provincialeweg N205, 1)
Rivium 2e Straat 59.      split as (Rivium 2e Straat, 59.)
Nieuwe gracht 20rd        split as (Nieuwe gracht, 20rd)
Nieuwe gracht 20rd 2      split as (Nieuwe gracht, 20rd 2)
Nieuwe gracht 20zw /2     split as (Nieuwe gracht, 20zw /2)
Nieuwe gracht 20zw/3      split as (Nieuwe gracht, 20zw/3)
Nieuwe gracht 20 zw/4     split as (Nieuwe gracht, 20 zw/4)
Bahnhofstr. 4             split as (Bahnhofstr., 4)
Wertstr. 10               split as (Wertstr., 10)
Lindenhof 1               split as (Lindenhof, 1)
Nordesch 20               split as (Nordesch, 20)
Weilstr. 6                split as (Weilstr., 6)
Harthauer Weg 2           split as (Harthauer Weg, 2)
Mainaustr. 49             split as (Mainaustr., 49)
August-Horch-Str. 3       split as (August-Horch-Str., 3)
Marktplatz 31             split as (Marktplatz, 31)
Schmidener Weg 3          split as (Schmidener Weg, 3)
Karl-Weysser-Str. 6       split as (Karl-Weysser-Str., 6)
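Since this output is taken as the reference separation, a small harness can compare any other splitter with it. The sketch below copies six rows of the output as (address, street name, house number) triples with surrounding spaces trimmed; `ReferenceCheck` and `failures` are hypothetical names.

```scala
object ReferenceCheck {
  // (address, street name, house number) rows copied from the Scala output.
  val expected: Seq[(String, String, String)] = Seq(
    ("Straat 12 II", "Straat", "12 II"),
    ("Plein 1940 2", "Plein 1940", "2"),
    ("1213-laan 11", "1213-laan", "11"),
    ("Laan ’40-’45", "Laan ’40-’45", ""),
    ("Langeloërduinen 3 46", "Langeloërduinen", "3 46"),
    ("Nieuwe gracht 20zw /2", "Nieuwe gracht", "20zw /2")
  )

  // Returns the addresses for which `splitter` disagrees with the reference,
  // comparing both parts after trimming surrounding whitespace.
  def failures(splitter: String => (String, String)): Seq[String] =
    expected.filter { case (address, street, number) =>
      val (s, n) = splitter(address)
      (s.trim, n.trim) != ((street, number))
    }.map(_._1)
}
```

For example, `ReferenceCheck.failures(a => (a, ""))`, a splitter that never finds a house number, reports every copied address except `Laan ’40-’45`.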


J

This example may be incorrect due to a recent change in the task requirements or a lack of testing. Please verify it and remove this message. If the example does not match the requirements or does not work, replace this message with Template:incorrect or fix the code yourself.

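Solution (native):<lang j>   NB. split at the index of the first decimal digit
   din5008 =: split~ i.&1@:e.&'0123456789'</lang>
Solution (regex):<lang j>   require'regex'
   NB. same rule, with the first digit located by the regex \d
   din5008 =: split~ [: {.@, '\d'&rxmatch</lang>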

Example:<lang j>   din5008"1 ];._2 noun define
Straat 12
Straat 12 II
Dr. J. Straat 12
Dr. J. Straat 12 a
Dr. J. Straat 12-14
Laan 1940 – 1945 37
Plein 1940 2
1213-laan 11
16 april 1944 Pad 1
1e Kruisweg 36
Laan 1940-’45 66
Laan ’40-’45
Langeloërduinen 3 46
Marienwaerdt 2e Dreef 2
Provincialeweg N205 1
Rivium 2e Straat 59.
Nieuwe gracht 20rd
Nieuwe gracht 20rd 2
Nieuwe gracht 20zw /2
Nieuwe gracht 20zw/3
Nieuwe gracht 20 zw/4
Bahnhofstr. 4
Wertstr. 10
Lindenhof 1
Nordesch 20
Weilstr. 6
Harthauer Weg 2
Mainaustr. 49
August-Horch-Str. 3
Marktplatz 31
Schmidener Weg 3
Karl-Weysser-Str. 6
)
+------------------+-----------------------+
|Straat            |12                     |
+------------------+-----------------------+
|Straat            |12 II                  |
+------------------+-----------------------+
|Dr. J. Straat     |12                     |
+------------------+-----------------------+
|Dr. J. Straat     |12 a                   |
+------------------+-----------------------+
|Dr. J. Straat     |12-14                  |
+------------------+-----------------------+
|Laan              |1940 – 1945 37         |
+------------------+-----------------------+
|Plein             |1940 2                 |
+------------------+-----------------------+
|                  |1213-laan 11           |
+------------------+-----------------------+
|                  |16 april 1944 Pad 1    |
+------------------+-----------------------+
|                  |1e Kruisweg 36         |
+------------------+-----------------------+
|Laan              |1940-’45 66            |
+------------------+-----------------------+
|Laan ’            |40-’45                 |
+------------------+-----------------------+
|Langeloërduinen   |3 46                   |
+------------------+-----------------------+
|Marienwaerdt      |2e Dreef 2             |
+------------------+-----------------------+
|Provincialeweg N  |205 1                  |
+------------------+-----------------------+
|Rivium            |2e Straat 59.          |
+------------------+-----------------------+
|Nieuwe gracht     |20rd                   |
+------------------+-----------------------+
|Nieuwe gracht     |20rd 2                 |
+------------------+-----------------------+
|Nieuwe gracht     |20zw /2                |
+------------------+-----------------------+
|Nieuwe gracht     |20zw/3                 |
+------------------+-----------------------+
|Nieuwe gracht     |20 zw/4                |
+------------------+-----------------------+
|Bahnhofstr.       |4                      |
+------------------+-----------------------+
|Wertstr.          |10                     |
+------------------+-----------------------+
|Lindenhof         |1                      |
+------------------+-----------------------+
|Nordesch          |20                     |
+------------------+-----------------------+
|Weilstr.          |6                      |
+------------------+-----------------------+
|Harthauer Weg     |2                      |
+------------------+-----------------------+
|Mainaustr.        |49                     |
+------------------+-----------------------+
|August-Horch-Str. |3                      |
+------------------+-----------------------+
|Marktplatz        |31                     |
+------------------+-----------------------+
|Schmidener Weg    |3                      |
+------------------+-----------------------+
|Karl-Weysser-Str. |6                      |
+------------------+-----------------------+</lang>

Notes: I'm jumping on this task very early in its development; at the moment, it lacks explicit rules for identifying the location where the house number begins. So, since I don't read German or Dutch, pending more explicit rules, I'm going to assume the number starts at the first decimal digit in the string and continues to the end, and that everything preceding that point is considered the street name.
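The rule assumed in these notes can also be written in Scala, which makes it easy to compare with the Scala solution's output. This is a minimal sketch; `FirstDigitRule` and `split` are hypothetical names, and only the ASCII digits 0-9 count as digits, as in the J code.

```scala
object FirstDigitRule {
  // Split at the index of the first ASCII decimal digit, mirroring the J
  // definition din5008 =: split~ i.&1@:e.&'0123456789'.
  // Without any digit, the whole string is the street name.
  def split(address: String): (String, String) = {
    val i = address.indexWhere(c => c >= '0' && c <= '9')
    if (i < 0) (address, "") else address.splitAt(i)
  }
}
```

Like the J version, `FirstDigitRule.split("1213-laan 11")` returns an empty street name, whereas the Scala solution separates it as `(1213-laan, 11)`.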