URL parser

From Rosetta Code
Revision as of 14:14, 17 October 2015 by rosettacode>Jabbson (Add Python code)
Task
URL parser
You are encouraged to solve this task according to the task description, using any language you may know.

URLs are very common strings with a simple syntax:

scheme://[username:password@]domain[:port]/path?query_string#fragment_id

This task (which has nothing to do with URL encoding or URL decoding) is to parse a well-formed URL to retrieve the relevant information: scheme, domain, path, ...

According to the standards, the characters [!*'();:@&=+$,/?%#[]] only need to be percent-encoded in case of possible confusion, so be wary of naive splits and regular expressions. Note also that the path, query and fragment are case sensitive, even though the scheme and domain are not.
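One way to heed that warning is to split on the delimiters in a fixed order instead of on every occurrence. The following is a minimal Python sketch, assuming the generic regular expression given in RFC 3986, Appendix B. The names `URI_RE` and `split_uri` are illustrative only and are not part of the task.

```python
import re

# Generic URI regular expression, adapted from RFC 3986, Appendix B.
# Each component stops at the first delimiter that can end it, so a ':' or
# '@' inside the path or query is never mistaken for a separator.
URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?"   # scheme
    r"(?://([^/?#]*))?"    # authority
    r"([^?#]*)"            # path
    r"(?:\?([^#]*))?"      # query
    r"(?:#(.*))?"          # fragment
)


def split_uri(uri):
    """Split a URI into its five generic components (hypothetical helper)."""
    scheme, authority, path, query, fragment = URI_RE.match(uri).groups()
    return {
        # The scheme is case-insensitive, so it is normalised here.
        # Path, query and fragment are case sensitive and are left untouched.
        "scheme": scheme.lower() if scheme else None,
        "authority": authority,
        "path": path,
        "query": query,
        "fragment": fragment,
    }


if __name__ == "__main__":
    print(split_uri("FOO://example.com:8042/over/There?name=ferret#nose"))
    print(split_uri("urn:example:animal:ferret:nose"))
```

Components that are absent come back as `None`, which keeps "missing" distinct from "present but empty".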

The way the returned information is provided (set of variables, array, structure, record, object, ...) is language dependent and left to the programmer, but the code should be clear enough to reuse.

Extra credit is given for clear error diagnostics.
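As an illustration of what such diagnostics might look like, here is a small Python sketch that collects human-readable problems instead of failing on the first one. The function name `diagnose`, the set of checks and the message wording are all assumptions made for this example. The task does not prescribe any of them.

```python
import re

# A scheme starts with a letter, followed by letters, digits, '+', '-' or '.'.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")


def diagnose(url):
    """Return a list of problems found in `url` (hypothetical helper)."""
    problems = []
    scheme, colon, rest = url.partition(":")
    if not colon:
        return ["no ':' found, so there is no scheme"]
    if not SCHEME_RE.match(scheme):
        problems.append(f"invalid scheme {scheme!r}")
    if rest.startswith("//"):
        # The authority runs up to the first '/', '?' or '#'.
        authority = re.split(r"[/?#]", rest[2:], maxsplit=1)[0]
        # Drop any "user:password@" prefix.
        hostport = authority.rpartition("@")[2]
        if hostport.startswith("["):
            # Bracketed IPv6 literal: the port, if any, follows the ']'.
            _, bracket, after = hostport.partition("]")
            if not bracket:
                problems.append("IPv6 host is missing its closing ']'")
            port = after[1:] if after.startswith(":") else after
        else:
            port = hostport.partition(":")[2]
        if port and not re.fullmatch(r"[0-9]+", port):
            problems.append(f"port {port!r} is not numeric")
    return problems


if __name__ == "__main__":
    for candidate in ("telnet://192.0.2.16:80/",
                      "http://example.com:80a/",
                      "This is not a URI!"):
        print(candidate, "->", diagnose(candidate) or "ok")
```

Returning a list keeps the checks independent, so a caller can report every problem at once.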



Test cases

According to T. Berners-Lee

foo://example.com:8042/over/there?name=ferret#nose should parse into:

  • scheme = foo
  • domain = example.com
  • port = :8042
  • path = over/there
  • query = name=ferret
  • fragment = nose

urn:example:animal:ferret:nose should parse into:

  • scheme = urn
  • path = example:animal:ferret:nose

Other URLs that must be parsed include:

  • jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • http://www.ietf.org/rfc/rfc2396.txt#header1
  • ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
  • mailto:John.Doe@example.com
  • news:comp.infosystems.www.servers.unix
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2

Elixir

<lang elixir>test_cases = [

 "foo://example.com:8042/over/there?name=ferret#nose",
 "urn:example:animal:ferret:nose",
 "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
 "ftp://ftp.is.co.za/rfc/rfc1808.txt",
 "http://www.ietf.org/rfc/rfc2396.txt#header1",
 "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
 "mailto:John.Doe@example.com",
 "news:comp.infosystems.www.servers.unix",
 "tel:+1-816-555-1212",
 "telnet://192.0.2.16:80/",
 "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
 "ssh://alice@example.com",
 "https://bob:pass@example.com/place",
 "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"

]

Enum.each(test_cases, fn str ->

 IO.puts "\n#{str}"
 IO.inspect URI.parse(str)

end)</lang>

Output:
foo://example.com:8042/over/there?name=ferret#nose
%URI{authority: "example.com:8042", fragment: "nose", host: "example.com",
 path: "/over/there", port: 8042, query: "name=ferret", scheme: "foo",
 userinfo: nil}

urn:example:animal:ferret:nose
%URI{authority: nil, fragment: nil, host: nil,
 path: "example:animal:ferret:nose", port: nil, query: nil, scheme: "urn",
 userinfo: nil}

jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
%URI{authority: nil, fragment: nil, host: nil,
 path: "mysql://test_user:ouupppssss@localhost:3306/sakila", port: nil,
 query: "profileSQL=true", scheme: "jdbc", userinfo: nil}

ftp://ftp.is.co.za/rfc/rfc1808.txt
%URI{authority: "ftp.is.co.za", fragment: nil, host: "ftp.is.co.za",
 path: "/rfc/rfc1808.txt", port: 21, query: nil, scheme: "ftp", userinfo: nil}

http://www.ietf.org/rfc/rfc2396.txt#header1
%URI{authority: "www.ietf.org", fragment: "header1", host: "www.ietf.org",
 path: "/rfc/rfc2396.txt", port: 80, query: nil, scheme: "http", userinfo: nil}

ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
%URI{authority: "2001:db8::7", fragment: nil, host: "2001:db8::7",
 path: "/c=GB", port: 389, query: "objectClass=one&objectClass=two",
 scheme: "ldap", userinfo: nil}

mailto:John.Doe@example.com
%URI{authority: nil, fragment: nil, host: nil, path: "John.Doe@example.com",
 port: nil, query: nil, scheme: "mailto", userinfo: nil}

news:comp.infosystems.www.servers.unix
%URI{authority: nil, fragment: nil, host: nil,
 path: "comp.infosystems.www.servers.unix", port: nil, query: nil,
 scheme: "news", userinfo: nil}

tel:+1-816-555-1212
%URI{authority: nil, fragment: nil, host: nil, path: "+1-816-555-1212",
 port: nil, query: nil, scheme: "tel", userinfo: nil}

telnet://192.0.2.16:80/
%URI{authority: "192.0.2.16:80", fragment: nil, host: "192.0.2.16", path: "/",
 port: 80, query: nil, scheme: "telnet", userinfo: nil}

urn:oasis:names:specification:docbook:dtd:xml:4.1.2
%URI{authority: nil, fragment: nil, host: nil,
 path: "oasis:names:specification:docbook:dtd:xml:4.1.2", port: nil, query: nil,
 scheme: "urn", userinfo: nil}

ssh://alice@example.com
%URI{authority: "alice@example.com", fragment: nil, host: "example.com",
 path: nil, port: nil, query: nil, scheme: "ssh", userinfo: "alice"}

https://bob:pass@example.com/place
%URI{authority: "bob:pass@example.com", fragment: nil, host: "example.com",
 path: "/place", port: 443, query: nil, scheme: "https", userinfo: "bob:pass"}

http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
%URI{authority: "example.com", fragment: nil, host: "example.com", path: "/",
 port: 80, query: "a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64", scheme: "http",
 userinfo: nil}

Go

This uses Go's standard net/url package. The source code for this package (excluding tests) is in a single file of ~720 lines.

<lang go>package main

import (
	"fmt"
	"log"
	"net"
	"net/url"
)

func main() {
	for _, in := range []string{
		"foo://example.com:8042/over/there?name=ferret#nose",
		"urn:example:animal:ferret:nose",
		"jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
		"ftp://ftp.is.co.za/rfc/rfc1808.txt",
		"http://www.ietf.org/rfc/rfc2396.txt#header1",
		"ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
		"mailto:John.Doe@example.com",
		"news:comp.infosystems.www.servers.unix",
		"tel:+1-816-555-1212",
		"telnet://192.0.2.16:80/",
		"urn:oasis:names:specification:docbook:dtd:xml:4.1.2",

		"ssh://alice@example.com",
		"https://bob:pass@example.com/place",
		"http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64",
	} {
		fmt.Println(in)
		u, err := url.Parse(in)
		if err != nil {
			log.Println(err)
			continue
		}
		// Report any URL that does not round-trip unchanged.
		if in != u.String() {
			fmt.Printf("Note: reassembles as %q\n", u)
		}
		printURL(u)
	}
}

// printURL prints only the components that are present in u.
func printURL(u *url.URL) {
	fmt.Println("    Scheme:", u.Scheme)
	if u.Opaque != "" {
		fmt.Println("    Opaque:", u.Opaque)
	}
	if u.User != nil {
		fmt.Println("    Username:", u.User.Username())
		if pwd, ok := u.User.Password(); ok {
			fmt.Println("    Password:", pwd)
		}
	}
	if u.Host != "" {
		if host, port, err := net.SplitHostPort(u.Host); err == nil {
			fmt.Println("    Host:", host)
			fmt.Println("    Port:", port)
		} else {
			fmt.Println("    Host:", u.Host)
		}
	}
	if u.Path != "" {
		fmt.Println("    Path:", u.Path)
	}
	if u.RawQuery != "" {
		fmt.Println("    RawQuery:", u.RawQuery)
		m, err := url.ParseQuery(u.RawQuery)
		if err == nil {
			for k, v := range m {
				fmt.Printf("        Key: %q Values: %q\n", k, v)
			}
		}
	}
	if u.Fragment != "" {
		fmt.Println("    Fragment:", u.Fragment)
	}
}</lang>

Output:
foo://example.com:8042/over/there?name=ferret#nose
    Scheme: foo
    Host: example.com
    Port: 8042
    Path: /over/there
    RawQuery: name=ferret
        Key: "name" Values: ["ferret"]
    Fragment: nose
urn:example:animal:ferret:nose
    Scheme: urn
    Opaque: example:animal:ferret:nose
jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
    Scheme: jdbc
    Opaque: mysql://test_user:ouupppssss@localhost:3306/sakila
    RawQuery: profileSQL=true
        Key: "profileSQL" Values: ["true"]
ftp://ftp.is.co.za/rfc/rfc1808.txt
    Scheme: ftp
    Host: ftp.is.co.za
    Path: /rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt#header1
    Scheme: http
    Host: www.ietf.org
    Path: /rfc/rfc2396.txt
    Fragment: header1
ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
    Scheme: ldap
    Host: [2001:db8::7]
    Path: /c=GB
    RawQuery: objectClass=one&objectClass=two
        Key: "objectClass" Values: ["one" "two"]
mailto:John.Doe@example.com
    Scheme: mailto
    Opaque: John.Doe@example.com
news:comp.infosystems.www.servers.unix
    Scheme: news
    Opaque: comp.infosystems.www.servers.unix
tel:+1-816-555-1212
    Scheme: tel
    Opaque: +1-816-555-1212
telnet://192.0.2.16:80/
    Scheme: telnet
    Host: 192.0.2.16
    Port: 80
    Path: /
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
    Scheme: urn
    Opaque: oasis:names:specification:docbook:dtd:xml:4.1.2
ssh://alice@example.com
    Scheme: ssh
    Username: alice
    Host: example.com
https://bob:pass@example.com/place
    Scheme: https
    Username: bob
    Password: pass
    Host: example.com
    Path: /place
http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
    Scheme: http
    Host: example.com
    Path: /
    RawQuery: a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
        Key: "a" Values: ["1"]
        Key: "b" Values: ["2 2"]
        Key: "c" Values: ["3" "4"]
        Key: "d" Values: ["encoded"]

J

As most errors are contextual (e.g. invalid authority, invalid path, unrecognized scheme), we shall defer error testing to the relevant consumers. This might offend some on the grounds of temporary safety, but consumers already bear responsibility to parse and validate their relevant uri element(s).

Our parsing strategy is fixed format recursive descent. (Please do not criticize this on efficiency grounds without first investigating the implementations of other parsers.)

Implementation:

<lang J>split=:1 :0

 ({. ; ] }.~ 1+[)~ i.&m

)

uriparts=:3 :0

 'server fragment'=. '#' split y
 'sa query'=. '?' split server
 'scheme authpath'=. ':' split sa
 scheme;authpath;query;fragment

)

queryparts=:3 :0

 (0<#y)#<;._1 '?',y

)

authpathparts=:3 :0

 if. '//' -: 2{.y do.
   split=. <;.1 y
   (}.1{::split);;2}.split
 else.
   ;y
 end.

)

authparts=:3 :0

 if. '@' e. y do.
   'userinfo hostport'=. '@' split y
 else.
   hostport=. y [ userinfo=.''
 end.
 if. '[' = {.hostport do.
    'host_t port_t'=. ']' split hostport
    assert. (0=#port_t)+.':'={.port_t
    (':' split userinfo),(host_t,']');}.port_t
 else.
    (':' split userinfo),':' split hostport
 end.

)

taskparts=:3 :0

 'scheme authpath querystring fragment'=. uriparts y
 'auth path'=. authpathparts authpath
 'user creds host port'=. authparts auth
 query=. queryparts querystring
 export=. ;:'scheme user creds host port path query fragment'
 (#~ 0<#@>@{:"1) (,. do each) export

)</lang>

Task examples:

<lang j>   taskparts 'foo://example.com:8042/over/there?name=ferret#nose'
┌────────┬─────────────┐
│scheme  │foo          │
├────────┼─────────────┤
│host    │example.com  │
├────────┼─────────────┤
│port    │8042         │
├────────┼─────────────┤
│path    │/over/there  │
├────────┼─────────────┤
│query   │┌───────────┐│
│        ││name=ferret││
│        │└───────────┘│
├────────┼─────────────┤
│fragment│nose         │
└────────┴─────────────┘

   taskparts 'urn:example:animal:ferret:nose'
┌──────┬──────────────────────────┐
│scheme│urn                       │
├──────┼──────────────────────────┤
│path  │example:animal:ferret:nose│
└──────┴──────────────────────────┘

   taskparts 'jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true'
┌──────┬──────────────────────────────────────────────────┐
│scheme│jdbc                                              │
├──────┼──────────────────────────────────────────────────┤
│path  │mysql://test_user:ouupppssss@localhost:3306/sakila│
├──────┼──────────────────────────────────────────────────┤
│query │┌───────────────┐                                 │
│      ││profileSQL=true│                                 │
│      │└───────────────┘                                 │
└──────┴──────────────────────────────────────────────────┘

   taskparts 'ftp://ftp.is.co.za/rfc/rfc1808.txt'
┌──────┬────────────────┐
│scheme│ftp             │
├──────┼────────────────┤
│host  │ftp.is.co.za    │
├──────┼────────────────┤
│path  │/rfc/rfc1808.txt│
└──────┴────────────────┘

   taskparts 'http://www.ietf.org/rfc/rfc2396.txt#header1'
┌────────┬────────────────┐
│scheme  │http            │
├────────┼────────────────┤
│host    │www.ietf.org    │
├────────┼────────────────┤
│path    │/rfc/rfc2396.txt│
├────────┼────────────────┤
│fragment│header1         │
└────────┴────────────────┘

   taskparts 'ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two'
┌──────┬─────────────────────────────────┐
│scheme│ldap                             │
├──────┼─────────────────────────────────┤
│host  │[2001:db8::7]                    │
├──────┼─────────────────────────────────┤
│path  │/c=GB                            │
├──────┼─────────────────────────────────┤
│query │┌───────────────────────────────┐│
│      ││objectClass=one&objectClass=two││
│      │└───────────────────────────────┘│
└──────┴─────────────────────────────────┘

   taskparts 'mailto:John.Doe@example.com'
┌──────┬────────────────────┐
│scheme│mailto              │
├──────┼────────────────────┤
│path  │John.Doe@example.com│
└──────┴────────────────────┘

   taskparts 'news:comp.infosystems.www.servers.unix'
┌──────┬─────────────────────────────────┐
│scheme│news                             │
├──────┼─────────────────────────────────┤
│path  │comp.infosystems.www.servers.unix│
└──────┴─────────────────────────────────┘

   taskparts 'tel:+1-816-555-1212'
┌──────┬───────────────┐
│scheme│tel            │
├──────┼───────────────┤
│path  │+1-816-555-1212│
└──────┴───────────────┘

   taskparts 'telnet://192.0.2.16:80/'
┌──────┬──────────┐
│scheme│telnet    │
├──────┼──────────┤
│host  │192.0.2.16│
├──────┼──────────┤
│port  │80        │
├──────┼──────────┤
│path  │/         │
└──────┴──────────┘

   taskparts 'urn:oasis:names:specification:docbook:dtd:xml:4.1.2'
┌──────┬───────────────────────────────────────────────┐
│scheme│urn                                            │
├──────┼───────────────────────────────────────────────┤
│path  │oasis:names:specification:docbook:dtd:xml:4.1.2│
└──────┴───────────────────────────────────────────────┘</lang>

Note that the path of the example jdbc uri is itself a uri which may be parsed:

<lang J>   taskparts 'mysql://test_user:ouupppssss@localhost:3306/sakila'
┌──────┬──────────┐
│scheme│mysql     │
├──────┼──────────┤
│user  │test_user │
├──────┼──────────┤
│creds │ouupppssss│
├──────┼──────────┤
│host  │localhost │
├──────┼──────────┤
│port  │3306      │
├──────┼──────────┤
│path  │/sakila   │
└──────┴──────────┘</lang>

Also, examples borrowed from the Go implementation:

<lang J>   taskparts 'ssh://alice@example.com'
┌──────┬───────────┐
│scheme│ssh        │
├──────┼───────────┤
│user  │alice      │
├──────┼───────────┤
│host  │example.com│
└──────┴───────────┘

   taskparts 'https://bob:pass@example.com/place'
┌──────┬───────────┐
│scheme│https      │
├──────┼───────────┤
│user  │bob        │
├──────┼───────────┤
│creds │pass       │
├──────┼───────────┤
│host  │example.com│
├──────┼───────────┤
│path  │/place     │
└──────┴───────────┘

   taskparts 'http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64'
┌──────┬───────────────────────────────────────────┐
│scheme│http                                       │
├──────┼───────────────────────────────────────────┤
│host  │example.com                                │
├──────┼───────────────────────────────────────────┤
│path  │/                                          │
├──────┼───────────────────────────────────────────┤
│query │┌─────────────────────────────────────────┐│
│      ││a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64││
│      │└─────────────────────────────────────────┘│
└──────┴───────────────────────────────────────────┘</lang>

Note that escape decoding is left to the consumer (as well as decoding things like '+' as a replacement for the space character and determining the absolute significance of relative paths and the details of ip address parsing and so on...). This seems like a good match to the hierarchical nature of uri parsing. See URL decoding for an implementation of escape decoding.
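For comparison, the decoding step that a consumer would perform on the last example's query can be sketched in Python with the standard `urllib.parse` module. This is only an illustration of the consumer side and is not part of the J solution. The variable names are arbitrary.

```python
from urllib.parse import parse_qs, unquote

# The raw query string, exactly as a parser would hand it to the consumer.
raw_query = "a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"

# parse_qs splits on '&', decodes percent escapes, and treats '+' as a space.
# Repeated names are collected into one list per name.
decoded = parse_qs(raw_query)
print(decoded)

# unquote decodes percent escapes only; it leaves '+' alone.
print(unquote("2+2%3D4"))
```

The difference between the two calls shows why '+' handling belongs to the query consumer and not to the generic parser: '+' means a space only in form-encoded query strings.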

Note that taskparts was engineered specifically for the requirements of this task -- in idiomatic use you should instead expect to call the relevant ____parts routines directly as illustrated by the first four lines of taskparts.

Note that w3c recommends a handling for query strings which differs from that of RFC-3986. For example, the use of ; as replacement for the & delimiter, or the use of the query element name as the query element value when the = delimiter is omitted from the name/value pair. We do not implement that here, as it's not a part of this task. But that sort of implementation could be achieved by replacing the definition of queryparts. And, of course, other treatments of query strings are also possible, should that become necessary...
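The alternative treatment described above could be sketched as follows, here in Python and not in J. The function name `query_pairs` is hypothetical, and the behaviour simply follows the two rules mentioned in the paragraph: ';' is accepted as well as '&', and a name without '=' is used as its own value.

```python
def query_pairs(query):
    """Split a raw query string into (name, value) pairs (hypothetical helper).

    Both '&' and ';' are accepted as delimiters. When an item has no '=',
    the name doubles as the value.
    """
    pairs = []
    for item in query.replace(";", "&").split("&"):
        if not item:
            continue  # skip empty items such as the one in "a=1&&b=2"
        name, equals, value = item.partition("=")
        pairs.append((name, value if equals else name))
    return pairs


if __name__ == "__main__":
    print(query_pairs("a=1;b=2&flag"))
```

No percent decoding is done here, so escape handling stays a separate step.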

JavaScript

As JavaScript is (at the time of writing) still the native language of the DOM, the simplest first-pass approach will be to set the href property of a DOM element, and read off the various components of the DOM parse from that element.

Here is an example, tested against the JavaScript engines of current versions of Chrome and Safari, of taking this 'Gordian knot' approach to the task:

<lang JavaScript>(function (lstURL) {

   var e = document.createElement('a'),
       lstKeys = [
           'hash',
           'host',
           'hostname',
           'origin',
           'pathname',
           'port',
           'protocol',
           'search'
       ],
       fnURLParse = function (strURL) {
           e.href = strURL;
           return lstKeys.reduce(
               function (dct, k) {
                   dct[k] = e[k];
                   return dct;
               }, {}
           );
       };
   return JSON.stringify(
       lstURL.map(fnURLParse),
       null, 2
   );

})([

 "foo://example.com:8042/over/there?name=ferret#nose",
 "urn:example:animal:ferret:nose",
 "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
 "ftp://ftp.is.co.za/rfc/rfc1808.txt",
 "http://www.ietf.org/rfc/rfc2396.txt#header1",
 "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
 "mailto:John.Doe@example.com",
 "news:comp.infosystems.www.servers.unix",
 "tel:+1-816-555-1212",
 "telnet://192.0.2.16:80/",
 "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
 "ssh://alice@example.com",
 "https://bob:pass@example.com/place",
 "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"

]);</lang>

Results of applying this approach in the JavaScript of Safari 8:

<lang JSON>[

 {
   "hash": "#nose",
   "host": "example.com:8042",
   "hostname": "example.com",
   "origin": "foo://example.com:8042",
   "pathname": "/over/there",
   "port": "8042",
   "protocol": "foo:",
   "search": "?name=ferret"
 },
 {
   "hash": "",
   "host": "",
   "hostname": "",
   "origin": "urn://",
   "pathname": "example:animal:ferret:nose",
   "port": "",
   "protocol": "urn:",
   "search": ""
 },
 {
   "hash": "",
   "host": "",
   "hostname": "",
   "origin": "jdbc://",
   "pathname": "mysql://test_user:ouupppssss@localhost:3306/sakila",
   "port": "",
   "protocol": "jdbc:",
   "search": "?profileSQL=true"
 },
 {
   "hash": "",
   "host": "ftp.is.co.za",
   "hostname": "ftp.is.co.za",
   "origin": "ftp://ftp.is.co.za",
   "pathname": "/rfc/rfc1808.txt",
   "port": "",
   "protocol": "ftp:",
   "search": ""
 },
 {
   "hash": "#header1",
   "host": "www.ietf.org",
   "hostname": "www.ietf.org",
   "origin": "http://www.ietf.org",
   "pathname": "/rfc/rfc2396.txt",
   "port": "",
   "protocol": "http:",
   "search": ""
 },
 {
   "hash": "",
   "host": "[2001:db8::7]",
   "hostname": "[2001:db8::7]",
   "origin": "ldap://[2001:db8::7]",
   "pathname": "/c=GB",
   "port": "",
   "protocol": "ldap:",
   "search": "?objectClass=one&objectClass=two"
 },
 {
   "hash": "",
   "host": "",
   "hostname": "",
   "origin": "mailto://",
   "pathname": "John.Doe@example.com",
   "port": "",
   "protocol": "mailto:",
   "search": ""
 },
 {
   "hash": "",
   "host": "",
   "hostname": "",
   "origin": "news://",
   "pathname": "comp.infosystems.www.servers.unix",
   "port": "",
   "protocol": "news:",
   "search": ""
 },
 {
   "hash": "",
   "host": "",
   "hostname": "",
   "origin": "tel://",
   "pathname": "+1-816-555-1212",
   "port": "",
   "protocol": "tel:",
   "search": ""
 },
 {
   "hash": "",
   "host": "192.0.2.16:80",
   "hostname": "192.0.2.16",
   "origin": "telnet://192.0.2.16:80",
   "pathname": "/",
   "port": "80",
   "protocol": "telnet:",
   "search": ""
 },
 {
   "hash": "",
   "host": "",
   "hostname": "",
   "origin": "urn://",
   "pathname": "oasis:names:specification:docbook:dtd:xml:4.1.2",
   "port": "",
   "protocol": "urn:",
   "search": ""
 },
 {
   "hash": "",
   "host": "example.com",
   "hostname": "example.com",
   "origin": "ssh://example.com",
   "pathname": "",
   "port": "",
   "protocol": "ssh:",
   "search": ""
 },
 {
   "hash": "",
   "host": "example.com",
   "hostname": "example.com",
   "origin": "https://example.com",
   "pathname": "/place",
   "port": "",
   "protocol": "https:",
   "search": ""
 },
 {
   "hash": "",
   "host": "example.com",
   "hostname": "example.com",
   "origin": "http://example.com",
   "pathname": "/",
   "port": "",
   "protocol": "http:",
   "search": "?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"
 }

]</lang>

Julia

This solution uses Julia's URIParser package. The detailview function shows all of the non-empty components of the URI object created by this parser. No attempt is made to further parse more complex components, e.g. query or userinfo. Error detection is limited to indicating whether a string is parsable as a URI and providing a hint as to whether the URI is valid (according to this package's isvalid function).

<lang Julia>using URIParser

const FIELDS = names(URI)

function detailview(uri::URI, indentlen::Int=4)

   indent = " "^indentlen
   s = String[]
   for f in FIELDS
       d = string(getfield(uri, f))
       !isempty(d) || continue
       f != :port || d != "0" || continue
       push!(s, @sprintf("%s%s:  %s", indent, string(f), d))
   end
   join(s, "\n")

end

test = ["foo://example.com:8042/over/there?name=ferret#nose",

       "urn:example:animal:ferret:nose",
       "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
       "ftp://ftp.is.co.za/rfc/rfc1808.txt",
       "http://www.ietf.org/rfc/rfc2396.txt#header1",
       "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
       "mailto:John.Doe@example.com",
       "news:comp.infosystems.www.servers.unix",
       "tel:+1-816-555-1212",
       "telnet://192.0.2.16:80/",
       "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
       "This is not a URI!",
       "ssh://alice@example.com",
       "https://bob:pass@example.com/place",
       "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"]

isfirst = true
for st in test

   if isfirst
       isfirst = false
   else
       println()
   end
   println("Attempting to parse\n  \"", st, "\" as a URI:")
   uri = try
       URI(st)
   catch
       println("URIParser failed to parse this URI, is it OK?")
       continue
   end
   print("This URI is parsable ")
   if isvalid(uri)
       println("and appears to be valid.")
   else
       println("but may be invalid.")
   end
   println(detailview(uri))

end </lang>

Output:
Attempting to parse
  "foo://example.com:8042/over/there?name=ferret#nose" as a URI:
This URI is parsable but may be invalid.
    schema:  foo
    host:  example.com
    port:  8042
    path:  /over/there
    query:  name=ferret
    fragment:  nose
    specifies_authority:  true

Attempting to parse
  "urn:example:animal:ferret:nose" as a URI:
This URI is parsable and appears to be valid.
    schema:  urn
    path:  example:animal:ferret:nose
    specifies_authority:  false

Attempting to parse
  "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true" as a URI:
This URI is parsable but may be invalid.
    schema:  jdbc
    path:  mysql://test_user:ouupppssss@localhost:3306/sakila
    query:  profileSQL=true
    specifies_authority:  false

Attempting to parse
  "ftp://ftp.is.co.za/rfc/rfc1808.txt" as a URI:
This URI is parsable and appears to be valid.
    schema:  ftp
    host:  ftp.is.co.za
    path:  /rfc/rfc1808.txt
    specifies_authority:  true

Attempting to parse
  "http://www.ietf.org/rfc/rfc2396.txt#header1" as a URI:
This URI is parsable and appears to be valid.
    schema:  http
    host:  www.ietf.org
    path:  /rfc/rfc2396.txt
    fragment:  header1
    specifies_authority:  true

Attempting to parse
  "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two" as a URI:
This URI is parsable and appears to be valid.
    schema:  ldap
    host:  2001:db8::7
    path:  /c=GB
    query:  objectClass=one&objectClass=two
    specifies_authority:  true

Attempting to parse
  "mailto:John.Doe@example.com" as a URI:
This URI is parsable and appears to be valid.
    schema:  mailto
    path:  John.Doe@example.com
    specifies_authority:  false

Attempting to parse
  "news:comp.infosystems.www.servers.unix" as a URI:
This URI is parsable and appears to be valid.
    schema:  news
    path:  comp.infosystems.www.servers.unix
    specifies_authority:  false

Attempting to parse
  "tel:+1-816-555-1212" as a URI:
This URI is parsable and appears to be valid.
    schema:  tel
    path:  +1-816-555-1212
    specifies_authority:  false

Attempting to parse
  "telnet://192.0.2.16:80/" as a URI:
This URI is parsable and appears to be valid.
    schema:  telnet
    host:  192.0.2.16
    port:  80
    path:  /
    specifies_authority:  true

Attempting to parse
  "urn:oasis:names:specification:docbook:dtd:xml:4.1.2" as a URI:
This URI is parsable and appears to be valid.
    schema:  urn
    path:  oasis:names:specification:docbook:dtd:xml:4.1.2
    specifies_authority:  false

Attempting to parse
  "This is not a URI!" as a URI:
URIParser failed to parse this URI, is it OK?

Attempting to parse
  "ssh://alice@example.com" as a URI:
This URI is parsable but may be invalid.
    schema:  ssh
    host:  example.com
    userinfo:  alice
    specifies_authority:  true

Attempting to parse
  "https://bob:pass@example.com/place" as a URI:
This URI is parsable and appears to be valid.
    schema:  https
    host:  example.com
    path:  /place
    userinfo:  bob:pass
    specifies_authority:  true

Attempting to parse
  "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64" as a URI:
This URI is parsable and appears to be valid.
    schema:  http
    host:  example.com
    path:  /
    query:  a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
    specifies_authority:  true

Python

Links to Python Documentation: v2: [1], v3: [2]

<lang Python>import urllib.parse as up # urllib.parse for Python v3

url = up.urlparse('http://user:pass@example.com:8081/path/file.html;params?query1=1#fragment')

print('url.scheme = ', url.scheme)
print('url.netloc = ', url.netloc)
print('url.hostname = ', url.hostname)
print('url.port = ', url.port)
print('url.path = ', url.path)
print('url.params = ', url.params)
print('url.query = ', url.query)
print('url.fragment = ', url.fragment)
print('url.username = ', url.username)
print('url.password = ', url.password)
</lang>

Output:
url.scheme =  http
url.netloc =  user:pass@example.com:8081
url.hostname =  example.com
url.port =  8081
url.path =  /path/file.html
url.params =  params
url.query =  query1=1
url.fragment =  fragment
url.username =  user
url.password =  pass
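The example above parses a single URL of its own. A sketch that runs some of the task's test cases through the same standard module might look like this. The helper name `describe` and the choice of fields to report are assumptions made for illustration, and `urlsplit` is used in place of `urlparse` because the task does not ask for the `;params` component.

```python
from urllib.parse import urlsplit

TESTS = [
    "foo://example.com:8042/over/there?name=ferret#nose",
    "urn:example:animal:ferret:nose",
    "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
    "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
    "mailto:John.Doe@example.com",
    "https://bob:pass@example.com/place",
]


def describe(url):
    """Return the non-empty components of `url` as a dict (hypothetical helper)."""
    parts = urlsplit(url)
    fields = {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    # Absent components are None or the empty string; drop both.
    return {k: v for k, v in fields.items() if v not in (None, "")}


if __name__ == "__main__":
    for url in TESTS:
        print(url)
        for name, value in describe(url).items():
            print(f"    {name} = {value}")
```

As with the Elixir and Go solutions, the jdbc example is reported with everything after `jdbc:` (up to the query) as its path, because that text does not start with `//`.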

Racket

Links: url structure in Racket documentation.

<lang racket>#lang racket/base
(require racket/match net/url)
(define (debug-url-string U)

 (match-define (url s u h p pa? (list (path/param pas prms) ...) q f) (string->url U))
 (printf "URL: ~s~%" U)
 (printf "-----~a~%" (make-string (string-length (format "~s" U)) #\-))
 (when #t          (printf "scheme:         ~s~%" s))
 (when u           (printf "user:           ~s~%" u))
 (when h           (printf "host:           ~s~%" h))
 (when p           (printf "port:           ~s~%" p))
 ;; From documentation link in text:
 ;; > For Unix paths, the root directory is not included in `path';
 ;; > its presence or absence is implicit in the path-absolute? flag.
 (printf "path-absolute?: ~s~%" pa?)
 (printf "path  bits:     ~s~%" pas)
 ;; prms will often be a list of lists. this will print iff
 ;; one of the inner lists is not null
 (when (memf pair? prms) 
   (printf "param bits:     ~s [interleaved with path bits]~%" prms))
 (unless (null? q) (printf "query:          ~s~%" q))
 (when f           (printf "fragment:       ~s~%" f))
 (newline))

(for-each

debug-url-string
'("foo://example.com:8042/over/there?name=ferret#nose"
  "urn:example:animal:ferret:nose"
  "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true"
  "ftp://ftp.is.co.za/rfc/rfc1808.txt"
  "http://www.ietf.org/rfc/rfc2396.txt#header1"
  "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two"
  "mailto:John.Doe@example.com"
  "news:comp.infosystems.www.servers.unix"
  "tel:+1-816-555-1212"
  "telnet://192.0.2.16:80/"
  "urn:oasis:names:specification:docbook:dtd:xml:4.1.2"))</lang>
Output:
URL: "foo://example.com:8042/over/there?name=ferret#nose"
---------------------------------------------------------
scheme:         "foo"
host:           "example.com"
port:           8042
path-absolute?: #t
path  bits:     ("over" "there")
query:          ((name . "ferret"))
fragment:       "nose"

URL: "urn:example:animal:ferret:nose"
-------------------------------------
scheme:         "urn"
path-absolute?: #f
path  bits:     ("example:animal:ferret:nose")

URL: "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true"
------------------------------------------------------------------------------
scheme:         "jdbc"
path-absolute?: #f
path  bits:     ("mysql:" "" "test_user:ouupppssss@localhost:3306" "sakila")
query:          ((profileSQL . "true"))

URL: "ftp://ftp.is.co.za/rfc/rfc1808.txt"
-----------------------------------------
scheme:         "ftp"
host:           "ftp.is.co.za"
path-absolute?: #t
path  bits:     ("rfc" "rfc1808.txt")

URL: "http://www.ietf.org/rfc/rfc2396.txt#header1"
--------------------------------------------------
scheme:         "http"
host:           "www.ietf.org"
path-absolute?: #t
path  bits:     ("rfc" "rfc2396.txt")
fragment:       "header1"

URL: "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two"
----------------------------------------------------------------
scheme:         "ldap"
host:           "[2001"
path-absolute?: #f
path  bits:     ("db8::7]" "c=GB")
query:          ((objectClass . "one") (objectClass . "two"))

IPv6 URL address parses incorrectly. See issue https://github.com/plt/racket/issues/980

URL: "mailto:John.Doe@example.com"
----------------------------------
scheme:         "mailto"
path-absolute?: #f
path  bits:     ("John.Doe@example.com")

URL: "news:comp.infosystems.www.servers.unix"
---------------------------------------------
scheme:         "news"
path-absolute?: #f
path  bits:     ("comp.infosystems.www.servers.unix")

URL: "tel:+1-816-555-1212"
--------------------------
scheme:         "tel"
path-absolute?: #f
path  bits:     ("+1-816-555-1212")

URL: "telnet://192.0.2.16:80/"
------------------------------
scheme:         "telnet"
host:           "192.0.2.16"
port:           80
path-absolute?: #t
path  bits:     ("")

URL: "urn:oasis:names:specification:docbook:dtd:xml:4.1.2"
----------------------------------------------------------
scheme:         "urn"
path-absolute?: #f
path  bits:     ("oasis:names:specification:docbook:dtd:xml:4.1.2")

Tcl

Library: tcllib

Tcllib's uri package already knows how to decompose many kinds of URIs. The implementation is a quite readable example of this kind of parsing. For this task, we'll use it directly.

Schemes can be added with uri::register, but the rules for this task assume HTTP-style decomposition for unknown schemes, which is done below by reaching into the documented interfaces $::uri::schemes and uri::SplitHttp.

For some URI types (such as urn, news, mailto), this provides more information than the task description demands, which is simply to parse them all as HTTP URIs.

The uri package doesn't presently handle IPv6 syntax as used in the example: a bug and patch will be submitted presently.
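To show what handling that syntax involves, here is a minimal Python sketch that splits a `host[:port]` string while keeping a bracketed IPv6 literal intact. It is not the tcllib patch mentioned above, and the function name `split_hostport` is hypothetical.

```python
def split_hostport(hostport):
    """Split 'host[:port]' into (host, port) strings (hypothetical helper).

    A bracketed IPv6 literal such as '[2001:db8::7]' is kept whole, so the
    colons inside the brackets are not mistaken for the port separator.
    """
    if hostport.startswith("["):
        end = hostport.find("]")
        if end == -1:
            raise ValueError(f"missing ']' in {hostport!r}")
        host, rest = hostport[:end + 1], hostport[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError(f"unexpected text after ']' in {hostport!r}")
        return host, rest[1:]
    # No brackets: the first ':' (if any) separates host and port.
    host, _, port = hostport.partition(":")
    return host, port


if __name__ == "__main__":
    for example in ("[2001:db8::7]", "[2001:db8::7]:389", "example.com:8042"):
        print(example, "->", split_hostport(example))
```

Splitting on the first ':' without the bracket check is what produces a host of `[2001` for the ldap example.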

<lang Tcl>package require uri
package require uri::urn

# a little bit of trickery to format results:

proc pdict {d} {

   array set \t $d
   parray \t

}

proc parse_uri {uri} {

   regexp {^(.*?):(.*)$} $uri -> scheme rest
   if {$scheme in $::uri::schemes} {
       # uri already knows how to split it:
       set parts [uri::split $uri]
   } else {
       # parse as though it's http:
       set parts [uri::SplitHttp $rest]
       dict set parts scheme $scheme
   }
   dict filter $parts value ?* ;# omit empty sections

}

set tests {

   foo://example.com:8042/over/there?name=ferret#nose
   urn:example:animal:ferret:nose
   jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
   ftp://ftp.is.co.za/rfc/rfc1808.txt
   http://www.ietf.org/rfc/rfc2396.txt#header1
   ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
   mailto:John.Doe@example.com
   news:comp.infosystems.www.servers.unix
   tel:+1-816-555-1212
   telnet://192.0.2.16:80/
   urn:oasis:names:specification:docbook:dtd:xml:4.1.2 

}

foreach uri $tests {

   puts \n$uri
   pdict [parse_uri $uri]

}</lang>

Output:
foo://example.com:8042/over/there?name=ferret#nose
	(fragment) = nose
	(host)     = example.com
	(path)     = over/there
	(port)     = 8042
	(query)    = name=ferret
	(scheme)   = foo

urn:example:animal:ferret:nose
	(nid)    = example
	(nss)    = animal:ferret:nose
	(scheme) = urn

jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
	(path)   = mysql://test_user:ouupppssss@localhost:3306/sakila
	(query)  = profileSQL=true
	(scheme) = jdbc

ftp://ftp.is.co.za/rfc/rfc1808.txt
	(host)   = ftp.is.co.za
	(path)   = rfc/rfc1808.txt
	(scheme) = ftp

http://www.ietf.org/rfc/rfc2396.txt#header1
	(fragment) = header1
	(host)     = www.ietf.org
	(path)     = rfc/rfc2396.txt
	(scheme)   = http

ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
	(host)   = [2001
	(scheme) = ldap

mailto:John.Doe@example.com
	(host)   = example.com
	(scheme) = mailto
	(user)   = John.Doe

news:comp.infosystems.www.servers.unix
	(newsgroup-name) = comp.infosystems.www.servers.unix
	(scheme)         = news

tel:+1-816-555-1212
	(path)   = +1-816-555-1212
	(scheme) = tel

telnet://192.0.2.16:80/
	(host)   = 192.0.2.16
	(port)   = 80
	(scheme) = telnet

urn:oasis:names:specification:docbook:dtd:xml:4.1.2
	(nid)    = oasis
	(nss)    = names:specification:docbook:dtd:xml:4.1.2
	(scheme) = urn
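
In the ldap case above the host comes out as "[2001", which is what happens when host and port are separated at the first colon. A bracket-aware split avoids this. The following Python sketch shows one way to do it; `split_hostport` is a hypothetical helper written for this illustration and is not part of tcllib or of any patch to it.

```python
def split_hostport(hostport):
    """Split 'host[:port]' where host may be a bracketed IPv6 literal.

    Returns a (host, port) pair of strings; port is '' when absent.
    """
    if hostport.startswith("["):
        end = hostport.find("]")
        if end == -1:
            raise ValueError(f"unterminated IPv6 literal in {hostport!r}")
        # Keep the brackets as part of the host, then look at what follows.
        host, tail = hostport[:end + 1], hostport[end + 1:]
        if tail and not tail.startswith(":"):
            raise ValueError(f"unexpected text after ']' in {hostport!r}")
        port = tail[1:]
    else:
        # No brackets: the first colon, if any, introduces the port.
        host, _, port = hostport.partition(":")
    return host, port
```

Colons inside the brackets are never considered as port separators; only a colon directly after the closing bracket is.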

VBScript

<lang vb>Function parse_url(url)
    parse_url = "URL: " & url
    If InStr(url,"//") Then
        'parse the scheme
        scheme = Split(url,"//")
        parse_url = parse_url & vbcrlf & "Scheme: " & Mid(scheme(0),1,Len(scheme(0))-1)
        'parse the domain
        domain = Split(scheme(1),"/")
        'check if the domain includes a username, password, and port
        If InStr(domain(0),"@") Then
            cred = Split(domain(0),"@")
            If InStr(cred(0),".") Then
                username = Mid(cred(0),1,InStr(1,cred(0),".")-1)
                password = Mid(cred(0),InStr(1,cred(0),".")+1,Len(cred(0))-InStr(1,cred(0),"."))
            ElseIf InStr(cred(0),":") Then
                username = Mid(cred(0),1,InStr(1,cred(0),":")-1)
                password = Mid(cred(0),InStr(1,cred(0),":")+1,Len(cred(0))-InStr(1,cred(0),":"))
            End If
            parse_url = parse_url & vbcrlf & "Username: " & username & vbCrLf &_
                "Password: " & password
            'check if the domain have a port
            If InStr(cred(1),":") Then
                host = Mid(cred(1),1,InStr(1,cred(1),":")-1)
                port = Mid(cred(1),InStr(1,cred(1),":")+1,Len(cred(1))-InStr(1,cred(1),":"))
                parse_url = parse_url & vbCrLf & "Domain: " & host & vbCrLf & "Port: " & port
            Else
                parse_url = parse_url & vbCrLf & "Domain: " & cred(1)
            End If
        ElseIf InStr(domain(0),":") And Instr(domain(0),"[") = False And Instr(domain(0),"]") = False Then
            host = Mid(domain(0),1,InStr(1,domain(0),":")-1)
            port = Mid(domain(0),InStr(1,domain(0),":")+1,Len(domain(0))-InStr(1,domain(0),":"))
            parse_url = parse_url & vbCrLf & "Domain: " & host & vbCrLf & "Port: " & port
        ElseIf Instr(domain(0),"[") And Instr(domain(0),"]:") Then
            host = Mid(domain(0),1,InStr(1,domain(0),"]"))
            port = Mid(domain(0),InStr(1,domain(0),"]")+2,Len(domain(0))-(InStr(1,domain(0),"]")+1))
            parse_url = parse_url & vbCrLf & "Domain: " & host & vbCrLf & "Port: " & port
        Else
            parse_url = parse_url & vbCrLf & "Domain: " & domain(0)
        End If
        'parse the path if exist
        If UBound(domain) > 0 Then
            For i = 1 To UBound(domain)
                If i < UBound(domain) Then
                    path = path & domain(i) & "/"
                ElseIf InStr(domain(i),"?") Then
                    path = path & Mid(domain(i),1,InStr(1,domain(i),"?")-1)
                    If InStr(domain(i),"#") Then
                        query = Mid(domain(i),InStr(1,domain(i),"?")+1,InStr(1,domain(i),"#")-InStr(1,domain(i),"?")-1)
                        fragment = Mid(domain(i),InStr(1,domain(i),"#")+1,Len(domain(i))-InStr(1,domain(i),"#"))
                        path = path & vbcrlf & "Query: " & query & vbCrLf & "Fragment: " & fragment
                    Else
                        query = Mid(domain(i),InStr(1,domain(i),"?")+1,Len(domain(i))-InStr(1,domain(i),"?"))
                        path = path & vbcrlf & "Query: " & query
                    End If
                ElseIf InStr(domain(i),"#") Then
                    fragment = Mid(domain(i),InStr(1,domain(i),"#")+1,Len(domain(i))-InStr(1,domain(i),"#"))
                    path = path & Mid(domain(i),1,InStr(1,domain(i),"#")-1) & vbCrLf &_
                        "Fragment: " & fragment
                Else
                    path = path & domain(i)
                End If
            Next
            parse_url = parse_url & vbCrLf & "Path: " & path
        End If
    ElseIf InStr(url,":") Then
        scheme = Mid(url,1,InStr(1,url,":")-1)
        path = Mid(url,InStr(1,url,":")+1,Len(url)-InStr(1,url,":"))
        parse_url = parse_url & vbcrlf & "Scheme: " & scheme & vbCrLf & "Path: " & path
    Else
        parse_url = parse_url & vbcrlf & "Invalid!!!"
    End If
End Function

'test the convoluted function :-(
WScript.StdOut.WriteLine parse_url("foo://example.com:8042/over/there?name=ferret#nose")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("ftp://ftp.is.co.za/rfc/rfc1808.txt")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("http://www.ietf.org/rfc/rfc2396.txt#header1")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("mailto:John.Doe@example.com")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("news:comp.infosystems.www.servers.unix")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("tel:+1-816-555-1212")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("telnet://192.0.2.16:80/")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")
WScript.StdOut.WriteLine "-------------------------------"
WScript.StdOut.WriteLine parse_url("this code is messy, long, and needs a makeover!!!")
</lang>

Output:
URL: foo://example.com:8042/over/there?name=ferret#nose
Scheme: foo
Domain: example.com
Port: 8042
Path: over/there
Query: name=ferret
Fragment: nose
-------------------------------
URL: jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
Scheme: jdbc:mysql
Username: test_user
Password: ouupppssss
Domain: localhost
Port: 3306
Path: sakila
Query: profileSQL=true
-------------------------------
URL: ftp://ftp.is.co.za/rfc/rfc1808.txt
Scheme: ftp
Domain: ftp.is.co.za
Path: rfc/rfc1808.txt
-------------------------------
URL: http://www.ietf.org/rfc/rfc2396.txt#header1
Scheme: http
Domain: www.ietf.org
Path: rfc/rfc2396.txt
Fragment: header1
-------------------------------
URL: ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
Scheme: ldap
Domain: [2001:db8::7]
Path: c=GB
Query: objectClass=one&objectClass=two
-------------------------------
URL: mailto:John.Doe@example.com
Scheme: mailto
Path: John.Doe@example.com
-------------------------------
URL: news:comp.infosystems.www.servers.unix
Scheme: news
Path: comp.infosystems.www.servers.unix
-------------------------------
URL: tel:+1-816-555-1212
Scheme: tel
Path: +1-816-555-1212
-------------------------------
URL: telnet://192.0.2.16:80/
Scheme: telnet
Domain: 192.0.2.16
Port: 80
Path: 
-------------------------------
URL: urn:oasis:names:specification:docbook:dtd:xml:4.1.2
Scheme: urn
Path: oasis:names:specification:docbook:dtd:xml:4.1.2
-------------------------------
URL: this code is messy, long, and needs a makeover!!!
Invalid!!!