Web scraping: Difference between revisions
Revision as of 20:12, 20 August 2008
Web scraping
You are encouraged to solve this task according to the task description, using any language you may know.
Create a program that downloads the time from this URL: http://tycho.usno.navy.mil/cgi-bin/timer.pl and then prints the current UTC time by extracting just the UTC time from the web page's HTML.
Only use libraries that come at no extra monetary cost with the programming language and that are widely available and popular such as CPAN for Perl or Boost for C++.
Python
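<python>
import urllib

# Fetch the page and scan it line by line.
page = urllib.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
for line in page:
    # The UTC entry is the line that ends in " UTC".
    if ' UTC\n' in line:
        # Drop the four-character prefix that precedes the date.
        print line.strip()[4:]
        break
page.close()
</python>
Sample Output: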
Aug. 20, 19:50:38 UTC
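The example above is written for Python 2 (`urllib.urlopen`, the `print` statement). As a rough sketch only, the same idea could be expressed in Python 3 roughly as follows. The names `extract_utc`, `fetch_utc`, `UTC_PATTERN` and `URL` are made up for illustration, and the regular expression assumes the timestamp is shaped like the sample output above. Extraction is kept separate from the download so it can be tried on a saved copy of the page.

```python
import re
from urllib.request import urlopen

URL = 'http://tycho.usno.navy.mil/cgi-bin/timer.pl'

# Assumption: the timestamp looks like the sample output,
# e.g. "Aug. 20, 19:50:38 UTC".
UTC_PATTERN = re.compile(
    r'[A-Z][a-z]{2,3}\.? \d{1,2}, \d{2}:\d{2}:\d{2} UTC')


def extract_utc(html):
    """Return the first UTC timestamp found in html, or None."""
    match = UTC_PATTERN.search(html)
    return match.group(0) if match else None


def fetch_utc(url=URL):
    """Download the page at url and return its UTC timestamp, or None."""
    with urlopen(url) as page:
        # Decode leniently; the page encoding is not known here.
        html = page.read().decode('ascii', errors='replace')
    return extract_utc(html)
```

Calling `print(fetch_utc())` needs network access and a server that still answers at that URL. `extract_utc` can be tried offline on any string, for example `extract_utc('<BR>Aug. 20, 19:50:38 UTC\n')`, where the surrounding markup is itself an assumed layout.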