Web scraping: Difference between revisions

Revision as of 20:12, 20 August 2008

Create a program that downloads the time from this URL: http://tycho.usno.navy.mil/cgi-bin/timer.pl and then prints the current UTC time by extracting just the UTC time from the web page's HTML.

Only use libraries that come at no extra monetary cost with the programming language and that are widely available and popular such as CPAN for Perl or Boost for C++.

Python

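<python>
import urllib

page = urllib.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
for line in page:
    # The UTC entry is the line that ends in ' UTC'
    if ' UTC\n' in line:
        # Skip the first four characters of the line, then print the rest
        print line.strip()[4:]
        break
page.close()
</python>

Sample Output: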

Aug. 20, 19:50:38 UTC
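
The entry above is written for Python 2 (`urllib.urlopen`, `print` statement). As a rough sketch of the same idea in Python 3, the parsing step can be kept separate from the download so it can be tried on a saved copy of the page. The function names `extract_utc` and `fetch_utc` are hypothetical, and the code assumes the page still lists each time on its own line, with the UTC line ending in ` UTC`; it uses only the standard library.

```python
import re
import urllib.request

URL = 'http://tycho.usno.navy.mil/cgi-bin/timer.pl'


def extract_utc(html):
    """Return the first line that ends in ' UTC', with HTML tags removed.

    Returns None if no such line is found. Assumes one time per line,
    as in the sample output above.
    """
    for line in html.splitlines():
        # Strip anything that looks like an HTML tag, then surrounding spaces
        text = re.sub(r'<[^>]+>', '', line).strip()
        if text.endswith(' UTC'):
            return text
    return None


def fetch_utc(url=URL, timeout=10):
    """Download the page and return the extracted UTC line (or None)."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        html = page.read().decode('ascii', errors='replace')
    return extract_utc(html)
```

Calling `print(fetch_utc())` would perform the download, provided the server is reachable. `extract_utc` can also be called directly on any HTML string, which makes it possible to check the extraction without network access.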

@@ Line 1: / Line 1: @@
 {{task}}
-Create a program that downloads the time from this URL: [http://tycho.usno.navy.mil/cgi-bin/timer.pl http://tycho.usno.navy.mil/cgi-bin/timer.pl] and then prints the current UTC time by extracting just the UTC time from the web pages HTML.
+Create a program that downloads the time from this URL: [http://tycho.usno.navy.mil/cgi-bin/timer.pl http://tycho.usno.navy.mil/cgi-bin/timer.pl] and then prints the current UTC time by extracting just the UTC time from the web page's [[HTML]].
 Only use libraries that come at no ''extra'' monetary cost with the programming language and that are widely available and popular such as [http://www.cpan.org/ CPAN] for Perl or [http://www.boost.org/ Boost] for C++.