Web scraping: Difference between revisions

From Rosetta Code
m (Punctuation, link)
(Added Java)

Revision as of 20:27, 20 August 2008

Task
Web scraping
You are encouraged to solve this task according to the task description, using any language you may know.

Create a program that downloads the page at this URL: http://tycho.usno.navy.mil/cgi-bin/timer.pl, extracts just the UTC time from the page's HTML, and prints it.

Only use libraries that come at no extra monetary cost with the programming language and that are widely available and popular, such as CPAN for Perl or Boost for C++.

Java

<java>import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class WebTime {
    public static void main(String[] args) {
        try {
            URL address = new URL(
                    "http://tycho.usno.navy.mil/cgi-bin/timer.pl");
            URLConnection conn = address.openConnection();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()));
            String line;
            // Scan for the line containing "UTC"; readLine() returns null
            // at end of stream, so stop there instead of dereferencing it.
            while ((line = in.readLine()) != null) {
                if (line.contains("UTC")) {
                    // Drop the first four characters, which precede the date.
                    System.out.println(line.substring(4));
                    break;
                }
            }
            in.close();
        } catch (IOException e) {
            System.err.println("error connecting to server.");
            e.printStackTrace();
        }
    }
}
</java>

Python

<python>import urllib

page = urllib.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
for line in page:
    if ' UTC\n' in line:
        print line.strip()[4:]
        break
page.close()</python>

Sample Output:

Aug. 20, 19:50:38 UTC
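
The extraction step can also be tried without contacting the server. The following is only a sketch in Python 3 syntax, not part of the solution above: the function name `extract_utc` and the `sample` lines are hypothetical, and the sample assumes that each time line on the page starts with a four-character `<BR>` tag, which is why the first four characters are dropped.

```python
def extract_utc(lines):
    """Return the first line containing ' UTC\\n', stripped and without
    its first four characters, or None if no such line exists."""
    for line in lines:
        if ' UTC\n' in line:
            return line.strip()[4:]
    return None


# Hypothetical stand-in for the downloaded page; the real markup is assumed.
sample = [
    '<html><body>\n',
    '<BR>Aug. 20, 19:50:38 UTC\n',
    '<BR>Aug. 20, 03:50:38 PM EDT\n',
    '</body></html>\n',
]

print(extract_utc(sample))
```

Separating the line scan from the download keeps the parsing logic testable on its own; any iterable of text lines, including an open page object that yields strings, could be passed in.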