User:Bukzor: Difference between revisions

Latest revision as of 15:37, 26 April 2010

My Favorite Languages
Language	Proficiency
Python	Expert
UNIX Shell	Very Active
SQL	Active
JavaScript	Semi-Active
C	Rusty
Perl	Hacker/Hater
C++	Rusty

Right now I'm just having fun improving other people's Python. That's probably not what this is all about, but I like it.

Automatic pylint

The current state of this project can be found here: rosetta_pylint.py

use the MediaWiki API to grab a list of the pages in Category:Python
- The MediaWiki API is pretty straightforward. I feel done with that part.
grab the HTML for those pages, put them into a DOM
- I'm having trouble getting any of the builtin html or xml parsers to give me a DOM. htmlparser is just a ghetto little state machine, and the xml parsers are too strict (&nbsp; is an 'unknown entity').
- I've posted a stackoverflow question on this subject here. --Bukzor 16:31, 20 April 2010 (UTC)
- Despite everyone agreeing that Python doesn't have a builtin HTML->DOM parser, I've parsed the site A-Z with ElementTree with minimal effort. I had to fix a bunch of invalid HTML though. Look at my edits for the previous couple days for details.
select for "python" as a CSS class, and get lumps of Python code.
- Now I have ~700 python snippets that I'm working on pylint'ing and analyzing. --Bukzor 01:29, 24 April 2010 (UTC)
- I note that plenty of the Python solutions are using plain <pre> tags which are being skipped in my current scheme. I'll have to add some code to detect this...
automate feeding that code through pylint
- The current Ubuntu pylint (0.18) throws up on 'import curses' for unknown reasons, but installing the latest version (0.20) allows me to pylint all of the scraped snippets as a whole. Collectively, they're rated at -1.58/10 (that's negative). I hope to get that up to 10/10 someday. --Bukzor 05:02, 26 April 2010 (UTC)
save a report of pages->scores
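
The first three steps above could be sketched roughly as follows. This is not the code in rosetta_pylint.py, just a minimal illustration in modern Python 3. The function names and the example endpoint are hypothetical, and rewriting named entities to numeric ones before parsing is one assumed workaround for the "unknown entity" problem, not necessarily the one actually used. It also assumes the page markup is otherwise well-formed, which (as noted above) took some manual fixing.

```python
import json
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint
from urllib.parse import urlencode

# Entities that every XML parser already understands; leave these alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def category_query_url(api_url, category, cmcontinue=None):
    """Build a list=categorymembers query URL for a MediaWiki api.php endpoint."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return api_url + "?" + urlencode(params)


def page_titles(response_text):
    """Return (titles, continuation token or None) from one JSON response."""
    data = json.loads(response_text)
    titles = [member["title"] for member in data["query"]["categorymembers"]]
    token = data.get("continue", {}).get("cmcontinue")
    return titles, token


def numeric_entities(markup):
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


def python_snippets(markup):
    """Return the text of every element whose class list contains 'python'."""
    root = ET.fromstring(numeric_entities(markup))
    snippets = []
    for elem in root.iter():
        if "python" in (elem.get("class") or "").split():
            snippets.append("".join(elem.itertext()))
    return snippets
```

Fetching is left out on purpose: the URL from category_query_url can be passed to urllib.request.urlopen, repeating with the returned continuation token until it comes back as None. Note that python_snippets only matches on the class attribute, so code in plain <pre> tags is skipped, exactly the gap mentioned above.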

--Bukzor 05:02, 26 April 2010 (UTC)
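
For the last two steps (feeding code through pylint and saving a pages->scores report), a rough sketch might look like the following. It assumes a pylint that is importable by the running interpreter and that prints its usual "Your code has been rated at X/10" summary line. The helper names and the CSV layout are hypothetical choices for illustration, not the format used by rosetta_pylint.py.

```python
import csv
import re
import subprocess
import sys

# Matches pylint's summary line, including negative scores such as -1.58/10.
SCORE_RE = re.compile(r"rated at (-?\d+(?:\.\d+)?)/10")


def parse_score(pylint_output):
    """Extract the numeric score from pylint's text output, or None if absent."""
    match = SCORE_RE.search(pylint_output)
    return float(match.group(1)) if match else None


def lint_file(path):
    """Run pylint on one file and return its score (assumes pylint is installed)."""
    result = subprocess.run(
        [sys.executable, "-m", "pylint", path],
        capture_output=True,
        text=True,
        check=False,  # pylint exits non-zero whenever it reports messages
    )
    return parse_score(result.stdout)


def write_report(scores, stream):
    """Write a page -> score mapping as CSV rows, sorted by page title."""
    writer = csv.writer(stream)
    writer.writerow(["page", "score"])
    for page, score in sorted(scores.items()):
        writer.writerow([page, "" if score is None else "%.2f" % score])
```

A driver would write each scraped snippet to a temporary .py file, call lint_file on it, collect the results in a dict keyed by page title, and hand that dict to write_report with an open file.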

@@ Line 1: / Line 1: @@
 {{mylangbegin}}
-{{mylang|Python|Expert(?)}}
+{{mylang|Python|Expert}}
 {{mylang|UNIX Shell|Very Active}}
+{{mylang|SQL|Active}}
+{{mylang|JavaScript|Semi-Active}}
+{{mylang|C|Rusty}}
+{{mylang|Perl|Hacker/Hater}}
+{{mylang|C++|Rusty}}
+{{mylangend}}
+Right now I'm just having fun improving other people's python.
+That's probably not what this is all about, but I like it.
+== Automatic pylint ==
+The current state of this project can be found here: [http://bukzor.hopto.org/svn/software/python/rosetta_pylint.py rosetta_pylint.py]
+# use the MediaWiki API to grab a list of the pages in Category:Python
+#*The mediawiki API is pretty straightforward. I feel done with that part.
+# grab the HTML for those pages, put them into a DOM
+#*I'm having trouble getting any of the builtin html or xml parsers to give me a DOM. [http://docs.python.org/library/htmlparser.html htmlparser] is just a ghetto little state machine, and the xml parsers are too strict (&amp;nbsp; is an 'unknown entity').
+#*I've posted a stackoverflow question on this subject [http://stackoverflow.com/questions/2676872/how-to-parse-malformed-html-in-python-using-standard-libraries here]. --Bukzor 16:31, 20 April 2010 (UTC)
+#*Despite everyone agreeing that Python doesn't have a builtin HTML->DOM parser, I've parsed the site A-Z with ElementTree with minimal effort. I had to fix a bunch of invalid HTML though. Look at my edits for the previous couple days for details.
+# select for "python" as a CSS class, and get lumps of Python code.
+#* Now I have ~700 python snippets that I'm working on pylint'ing and analyzing. --Bukzor 01:29, 24 April 2010 (UTC)
+#* I note that plenty of the Python solutions are using plain &lt;pre> tags which are being skipped in my current scheme. I'll have to add some code to detect this...
+# automate feeding that code through pylint
+#* The current Ubuntu pylint (0.18) throws up on 'import curses' for unknown reasons, but installing the latest version (0.20) allows me to pylint all of the scraped snippets as a whole. Collectively, they're rated at -1.58/10 (that's negative). I hope to get that up to 10/10 someday. --[[User:Bukzor|Bukzor]] 05:02, 26 April 2010 (UTC)
+# save a report of pages->scores
+--[[User:Bukzor|Bukzor]] 05:02, 26 April 2010 (UTC)