URL parser
 
=={{header|Java}}==
In Java, you can use the <code>URI</code> class for this, so it's pretty straightforward. I just did a bit of tweaking to the output.
<syntaxhighlight lang="java">
import java.net.URI;
import java.net.URISyntaxException;

public class WebAddressParser {
    public static void main(String[] args) {
        URI uri;
        try {
            uri = new URI("foo://example.com:8042/over/there?name=ferret#nose");
        } catch (URISyntaxException exception) {
            /* invalid URI */
            System.err.println("Oops: " + exception);
            return;
        }
        /* 'uri' is now parsed; query its components, e.g. uri.getScheme() */
    }
}
</syntaxhighlight>
You would then use any of the accompanying class methods to retrieve the value you're looking for.<br />
For example, to get the scheme value, you would use the following.
<syntaxhighlight lang="java">
uri.getScheme()
</syntaxhighlight>
It successfully parsed a majority of the test URLs.
<pre>
foo://example.com:8042/over/there?name=ferret#nose
scheme: foo
userinfo:
host: example.com
port: 8042
authority: example.com:8042
path: /over/there
query: name=ferret
fragment: nose
</pre>
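Here's a minimal complete sketch that prints the components in the layout above. The class name <code>UrlComponents</code> and the helpers <code>describe</code> and <code>line</code> are hypothetical. It also assumes that an absent component should print as an empty field. An absent component is a <code>null</code> return, or <code>-1</code> from <code>getPort()</code>.
<syntaxhighlight lang="java">
import java.net.URI;

public class UrlComponents {
    /* formats one "name: value" line; an absent component (null) prints as "name:" */
    static String line(String name, Object value) {
        return value == null ? name + ":" : name + ": " + value;
    }

    /* lists the components of a URL in the same order as the output above;
       URI.create throws IllegalArgumentException for an invalid URI */
    static String describe(String address) {
        URI uri = URI.create(address);
        return String.join("\n",
                address,
                line("scheme", uri.getScheme()),
                line("userinfo", uri.getUserInfo()),
                line("host", uri.getHost()),
                /* getPort() returns -1 when no port is given */
                line("port", uri.getPort() == -1 ? null : uri.getPort()),
                line("authority", uri.getAuthority()),
                line("path", uri.getPath()),
                line("query", uri.getQuery()),
                line("fragment", uri.getFragment()));
    }

    public static void main(String[] args) {
        System.out.println(describe("foo://example.com:8042/over/there?name=ferret#nose"));
        System.out.println();
        System.out.println(describe("mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true"));
    }
}
</syntaxhighlight>
The getters return decoded values. <code>URI</code> also has <code>getRawQuery()</code>, <code>getRawPath()</code> and so on, if you need the text exactly as written.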
To parse the 'jdbc:' URL you'll need to remove the prefix.
<pre>
mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
scheme: mysql
userinfo: test_user:ouupppssss
host: localhost
port: 3306
authority: test_user:ouupppssss@localhost:3306
path: /sakila
query: profileSQL=true
fragment:
</pre>
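The prefix matters because nothing after <code>jdbc:</code> starts with a '/'. That makes <code>java.net.URI</code> treat the whole string as an opaque URI with scheme <code>jdbc</code> and no host, port, or path. A minimal sketch of stripping it follows. The class <code>JdbcUrl</code> and method <code>parse</code> are hypothetical names, and it assumes only a single leading <code>jdbc:</code> (matched case-insensitively) needs removing.
<syntaxhighlight lang="java">
import java.net.URI;

public class JdbcUrl {
    private static final String PREFIX = "jdbc:";

    /* strips a leading "jdbc:" (ignoring case) so the rest parses as a hierarchical URI */
    static URI parse(String url) {
        boolean hasPrefix = url.regionMatches(true, 0, PREFIX, 0, PREFIX.length());
        return URI.create(hasPrefix ? url.substring(PREFIX.length()) : url);
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true";
        /* with the prefix, the URI is opaque: no host, port, or path */
        URI raw = URI.create(url);
        System.out.println("opaque = " + raw.isOpaque() + ", host = " + raw.getHost());
        URI uri = parse(url);
        System.out.println("scheme = " + uri.getScheme() + ", host = " + uri.getHost()
                + ", port = " + uri.getPort() + ", path = " + uri.getPath());
    }
}
</syntaxhighlight>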
The others work as expected, except for 'mailto', 'news', and 'tel'. <code>URI</code> treats these as opaque, so it returns only the scheme and the scheme-specific part.<br />
That said, these are not too complicated to parse. For 'mailto', I used a class with two records.<br />
RFC 6068 defines the 'mailto' scheme: https://datatracker.ietf.org/doc/html/rfc6068.
<syntaxhighlight lang="java">
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MailTo {
    private final To to;
    private final List<Field> fields = new ArrayList<>();

    public MailTo(String string) {
        if (string == null)
            throw new NullPointerException();
        if (string.isBlank() || !string.toLowerCase(Locale.ROOT).startsWith("mailto:"))
            throw new IllegalArgumentException("Requires 'mailto' scheme");
        string = string.substring(string.indexOf(':') + 1);
        /* the address and fields are separated by a '?' */
        int indexOf = string.indexOf('?');
        String address = indexOf == -1 ? string : string.substring(0, indexOf);
        if (indexOf != -1) {
            /* each field is separated by a '&' */
            for (String value : string.substring(indexOf + 1).split("&")) {
                if (value.isEmpty())
                    continue;
                /* split on the first '=' only; a value may contain more */
                String[] field = value.split("=", 2);
                fields.add(new Field(decode(field[0]), field.length > 1 ? decode(field[1]) : ""));
            }
        }
        /* the host follows the last '@'; a quoted local part may contain another */
        address = decode(address);
        int at = address.lastIndexOf('@');
        to = at == -1
                ? new To(address, "")
                : new To(address.substring(0, at), address.substring(at + 1));
    }

    /* we can use the 'URLDecoder' class to decode any entities. It is applied
       after splitting, so an encoded '?', '&' or '=' cannot break the structure.
       URLDecoder uses form decoding ('+' becomes a space), so protect '+' first */
    private static String decode(String s) {
        return URLDecoder.decode(s.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public To to() { return to; }
    public List<Field> fields() { return fields; }

    record To(String user, String host) { }
    record Field(String name, String value) { }
}
</syntaxhighlight>
I'm only showing two examples, but the others work too, honest.
I ran a majority of the examples from RFC 6068 and got the expected results.
{{Out}}
<pre>
mailto:infobot@example.com?subject=current-issue
user infobot
host example.com
name subject
value current-issue

mailto:list@example.org?In-Reply-To=%3C3469A91.D10AF4C@example.com%3E
user list
host example.org
name In-Reply-To
value <3469A91.D10AF4C@example.com>
</pre>
Both 'tel' and 'news' are similar, and actually simpler.<br />
RFC 3966 outlines the 'tel' syntax: https://datatracker.ietf.org/doc/html/rfc3966.<br />
And RFC 5538 outlines the 'news' syntax: https://www.rfc-editor.org/rfc/rfc5538.html.
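Because <code>URI</code> already parses 'tel' and 'news' as opaque URIs, the scheme-specific part is everything after the scheme. Here's a rough sketch for 'tel' that only separates the number from its ';'-separated parameters. It does not validate the number against RFC 3966's grammar. The class <code>TelAndNews</code> and record <code>Tel</code> are hypothetical names.
<syntaxhighlight lang="java">
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class TelAndNews {
    /* the number plus its parameters, e.g. "ext" or "phone-context" */
    record Tel(String number, Map<String, String> parameters) { }

    static Tel parse(String text) {
        URI uri = URI.create(text);
        if (!"tel".equalsIgnoreCase(uri.getScheme()))
            throw new IllegalArgumentException("Requires 'tel' scheme");
        /* 'tel' has no "//" authority, so the URI is opaque; the raw
           (still percent-encoded) scheme-specific part is split on ';' */
        String[] parts = uri.getRawSchemeSpecificPart().split(";");
        Map<String, String> parameters = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] pair = parts[i].split("=", 2);
            parameters.put(pair[0], pair.length > 1 ? pair[1] : "");
        }
        return new Tel(parts[0], parameters);
    }

    public static void main(String[] args) {
        System.out.println(parse("tel:+1-816-555-1212"));
        System.out.println(parse("tel:7042;phone-context=example.com"));
        /* 'news' is opaque too: the scheme-specific part is the newsgroup name */
        URI news = URI.create("news:comp.infosystems.www.servers.unix");
        System.out.println(news.getScheme() + " " + news.getSchemeSpecificPart());
    }
}
</syntaxhighlight>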
 
=={{header|Raku}}==
Uses the URI library which implements a Raku grammar based on the RFC 3986 BNF grammar.
<syntaxhighlight lang="raku" line>use URI;
use URI::Escape;
 
my @test-uris = <
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
ssh://alice@example.com
https://bob:pass@example.com/place
http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
>;
 
for <scheme host port path query frag> -> $t {
my $token = try {$u."$t"()} || '';
say "$t:\t", uri-unescape $token.Str if $token;
}
say '';
URI: tel:+1-816-555-1212
scheme: tel
path: + 1-816-555-1212
 
URI: telnet://192.0.2.16:80/
scheme: urn
path: oasis:names:specification:docbook:dtd:xml:4.1.2
 
URI: ssh://alice@example.com
scheme: ssh
host: example.com
port: 22
path:
 
URI: https://bob:pass@example.com/place
scheme: https
host: example.com
port: 443
path: /place
 
URI: http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
scheme: http
host: example.com
port: 80
path: /
query: a=1&b=2 2&c=3&c=4&d=encoded</pre>
 
=={{header|Wren}}==
{{trans|VBScript}}
... though modified quite a bit.
<syntaxhighlight lang="wren">var urlParse = Fn.new { |url|
var parseUrl = "URL = " + url
var index