Village Pump:Home/Syntax Highlighting (archived 2009-06-18)

This page is to discuss changes to the syntax highlighting system.
 
==About bugs in GeSHi==
Please don't hesitate to report code highlighting bugs (missing keywords, problems with strings, mishighlighted stuff) upstream at BenBE@geshi.org to have them fixed in the official release. Also if you happen to have a new language file I'm happy to include it (if it passes the langcheck script included with current releases or you have a good excuse for it not to ;-)).
 
Regarding some issues mentioned here:
* Double escape in D has been fixed in SVN
* The Lisp problem could only be fixed because someone reported it ...
* The C fixes have been included; c_mac, cpp and cpp_qt are fixed as well
* The multiline preprocessor directives are fixed in SVN trunk; RC coming soon
* Modula3 included in SVN trunk; coming with the RC
* The whitespace being trimmed is a MediaWiki bug. Whitespace is an esoteric language; I didn't intend for people to test their sites with it ;-) Though it works beautifully for finding CSS bugs (I have to patch the CSS on the official GeSHi page ;-))
* Cannot reproduce the OCaml issue locally. Your site is simply lacking CSS for class coMULTI.
 
Latest language files can be found at https://geshi.svn.sourceforge.net/svnroot/geshi/trunk/geshi-1.0.X/src/, although some changes between releases might require you to update the parser too. Latest bugfixes not yet considered stable are reflected there. Release Candidate grade updates can be found at https://geshi.svn.sourceforge.net/svnroot/geshi/branches/RELEASE_1_0_X_STABLE/geshi-1.0.X/src/
 
BenBE.
Author of GeSHi
: I'll be moving Rosetta Code over to GeSHi svn HEAD to take advantage of these fixes, and to make merging our fixes with upstream simpler. --[[User:Short Circuit|Short Circuit]] 19:16, 8 February 2009 (UTC)
 
==Code tag change==
: This is going to require a massive site-wide effort for all supported languages. See [[Help:GeSHi]] for details. As soon as we're confident most of the pages have been handled, I'll disable support for <nowiki>&lt;lang&gt;</nowiki> entirely. --[[User:Short Circuit|Short Circuit]] 02:33, 23 January 2009 (UTC)
:: When the <nowiki><lang></nowiki> tags are removed, [[Special:Version]] will no longer list all the languages in the tags section, right? Maybe there should be a (perhaps auto-generated) page listing all supported languages (or maybe it's even possible to include them in [[Special:Version]] as a separate section). Or maybe there's already such a list somewhere which I simply didn't notice? --[[User:Ce|Ce]] 08:18, 23 January 2009 (UTC)
::: We are now only using <nowiki><lang cpp></nowiki> tags, where "cpp" is replaced by the language ID in question. Additionally, all language IDs are now case-insensitive, so <nowiki><lang C></nowiki> works the same as <nowiki><lang c></nowiki>. This is a Good Thing, because the supported case was originally determined by the case of the language file name. And those were all lower case... --[[User:Short Circuit|Short Circuit]] 08:11, 11 February 2009 (UTC)
:::You mean [[Help:GeSHi#Supported_source_tags|this]]? --[[User:Mwn3d|Mwn3d]] 13:27, 23 January 2009 (UTC)
::: Another problem I just noticed: The old meaning of <nowiki><code>text</code></nowiki> has now changed (it now creates a div block instead of inline text). Those tags might have been used already in this wiki. Maybe it would have been a better idea to use a tag which didn't yet have a meaning (say, <nowiki><source></nowiki>). I'm not sure whether it's a good idea to change the GeSHi tag now, or to find and change all previous usages of <nowiki><code></nowiki> instead. --[[User:Ce|Ce]] 08:50, 23 January 2009 (UTC)
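:::: For illustration, the case-insensitive lookup described above can be sketched as follows. This is only a sketch in Python, not the code of the tag extension (which is PHP); the set <tt>LANGUAGE_FILES</tt> and the function name <tt>resolve_language</tt> are made up for the example. The only thing taken from the discussion is that the language file names are all lower case.

```python
# Hypothetical set of installed language files. The discussion above only
# tells us that the real file names are all lower case.
LANGUAGE_FILES = {"c", "cpp", "java5", "ocaml", "tcl"}


def resolve_language(tag_id):
    """Map the ID written in a <lang ...> tag to a language file ID.

    Returns None when no matching language file exists.
    """
    normalized = tag_id.strip().lower()  # fold case before the lookup
    return normalized if normalized in LANGUAGE_FILES else None


if __name__ == "__main__":
    for tag in ("C", "c", "Cpp", "foo"):
        print(tag, "->", resolve_language(tag))
```

:::: Folding the ID once, before the lookup, keeps the language files themselves untouched.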
:: I've now found a trick to get Whitespace highlighting without any (visible) text around it: Use the Unicode character U+FEFF (ZERO WIDTH NO-BREAK SPACE) to mark the start/end of the code:
<lang whitespace>


</lang>
:: Note that U+FEFF renders as absolutely nothing, and is not whitespace as defined by Whitespace (although it ''is'' whitespace according to Unicode), and therefore should be ignored as "comment" by whitespace interpreters (I didn't test that, though). It apparently also isn't considered whitespace by the start/end line removing code, therefore it's not removed. --[[User:Ce|Ce]] 15:22, 26 January 2009 (UTC)
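::: A small Python sketch of the idea, under two assumptions that are not verified against the wiki software: that a Whitespace interpreter only looks at space, tab and line feed, and that the line-trimming code uses a notion of whitespace similar to Python's. The helper names are invented for the example. Note that Python itself does not count U+FEFF as whitespace (<tt>isspace()</tt> is false for it, since Unicode classifies it as a format character), which would fit the observation that it is not trimmed.

```python
ZWNBSP = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE

# The three characters that carry meaning in the Whitespace language.
SIGNIFICANT = {" ", "\t", "\n"}


def whitespace_tokens(source):
    """Drop everything a Whitespace interpreter would treat as a comment."""
    return "".join(ch for ch in source if ch in SIGNIFICANT)


def wrap_for_display(program):
    """Mark the start and end of a program with an invisible character."""
    return ZWNBSP + program + ZWNBSP


if __name__ == "__main__":
    program = " \t\n\n"
    wrapped = wrap_for_display(program)
    print(whitespace_tokens(wrapped) == program)  # markers are ignored
    print(wrapped.strip() == wrapped)             # nothing gets trimmed
```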
</lang>
:::: --[[User:Ce|Ce]] 17:26, 30 January 2009 (UTC)
 
::::: Please do not forget this! --[[User:Ce|Ce]] 15:31, 25 March 2009 (UTC)
:::::: Consider it no longer forgotten. :-) I believe I tried the code back in February, but it didn't work in some way. (I forget why not.) I'm heading over to Qrush, Slawmaster and Mwn3d's place this weekend for a brief vacation, and I'll poke at it some more while I'm there. --[[User:Short Circuit|Short Circuit]] 18:41, 25 March 2009 (UTC)
::::::: Thanks. As I said, I don't really know PHP, so it's not really a surprise if it doesn't work correctly out of the box. Probably I missed some detail that any real PHP hacker would know ... --[[User:Ce|Ce]] 14:33, 26 March 2009 (UTC)
 
==D problem==
 
: I've just found that the "fail" on unsupported languages indeed fails, i.e. doesn't work correctly. See e.g. [http://www.rosettacode.org/wiki/Apply_a_callback_to_an_Array#E]. Looking at the generated HTML, you obviously inserted &lt;code&gt; tags instead of &lt;pre&gt; tags. --[[User:Ce|Ce]] 16:57, 30 January 2009 (UTC)
 
===Inconsistent content model===
There's a problem with this fallback: if the language is recognized, then the content is treated literally (except for &lt;/lang>, of course), but if it is unrecognized, then it is treated as HTML. The following two examples are of &lt;lang foo&gt;abc &lt;fnord&gt; def&lt;/lang&gt; and &lt;lang c&gt;abc &lt;fnord&gt; def&lt;/lang&gt;:
<lang foo>abc <fnord> def</lang>
<lang c>abc <fnord> def</lang>
This difference means that code examples containing &lt; or &amp; will stop displaying correctly if the specified language becomes supported.
I would prefer that everything be treated as HTML (as &lt;pre>, indenting, and the current unsupported-language behavior do) so that it's possible to insert markup in examples (e.g. hyperlinks in comments), &lt;/lang> is not a magic string impossible to include, and for consistency with most of the rest of HTML.
On the other hand, treating the content literally does have the advantage of making it easier to paste in examples containing &lt;s.
--[[User:Kevin Reid|Kevin Reid]] 00:36, 9 February 2009 (UTC)
: I can fix that inconsistency by feeding the source code snippet through the PHP htmlentities or htmlspecialchars functions, so they show up literally. Of course, if &lt;code&gt; obviates that need, I can use that instead of &lt;pre&gt;. If I do that, though, any code example for an unsupported language already using an HTML entity as a workaround is going to break. --[[User:Short Circuit|Short Circuit]] 05:21, 9 February 2009 (UTC)
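:: To make the trade-off concrete, here is a sketch in Python, using <tt>html.escape</tt> as a stand-in for the PHP functions mentioned above. The function <tt>render_unsupported</tt> is hypothetical and is not the extension's actual code. It shows both effects: markup in the snippet becomes literal, and a snippet that already uses an entity as a workaround gets escaped a second time.

```python
import html


def render_unsupported(snippet):
    """Escape a snippet so it shows up literally inside a <pre> block."""
    # quote=False: only &, < and > are escaped, which is all that is
    # needed for text content.
    return "<pre>" + html.escape(snippet, quote=False) + "</pre>"


if __name__ == "__main__":
    # Markup is now shown literally, as for recognized languages.
    print(render_unsupported("abc <fnord> def"))
    # An existing workaround is escaped again and displays as "&lt;".
    print(render_unsupported("a &lt; b"))
```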
 
== Code tag special behaviour ==
<lang ocaml>(* a comment *)</lang>
: More generally, can we get this for any language? "Pascal family" languages all use (* foo *) for comments, so adding it would add comment highlighting for lots of languages. --[[User:Mbishop|Mbishop]] 07:02, 5 February 2009 (UTC)
:: Non-trivial. Doing something like that would require different languages to be able to inherit from other languages, and would very likely break compatibility with upgrades to the GeSHi engine. It'd be nice, though... --[[User:Short Circuit|Short Circuit]] 07:21, 5 February 2009 (UTC)
::: You sure it's not just the default color theme or whatever that doesn't highlight (* ... *) style comments? --[[User:Mbishop|Mbishop]] 15:44, 5 February 2009 (UTC)
: Looking at the source for the ocaml language file, it's being defined correctly according to GeSHi language file syntax. I suspect an engine bug. --[[User:Short Circuit|Short Circuit]] 07:21, 5 February 2009 (UTC)
:: Modula-3 comments should also be highlighted, as well as Pascal comments, but they are not. --[[User:Mbishop|Mbishop]] 15:44, 5 February 2009 (UTC)
::: Hehe. A quick "view selection source" indicates you're correct; I'd assumed somebody had already checked that. Colors needed to be added for the class "coMULTI". --[[User:Short Circuit|Short Circuit]] 17:09, 5 February 2009 (UTC)
::: Fixed. --[[User:Short Circuit|Short Circuit]] 17:00, 9 February 2009 (UTC)
 
==C# Break==
 
Take a look at [[99 Bottles of Beer]]: the [[C sharp]] break statement is not highlighted.
Can someone fix the csharp.php file? --[[User:Guga360|Guga360]] 20:22, 15 February 2009 (UTC)
: It's in the language file as a keyword, but the HTML source of that snippet shows that it's not being given a CSS style. I don't know what's going on; it might be an engine bug. I plan to try switching to GeSHi's SVN HEAD some time today, so perhaps that engine bug has been fixed. --[[User:Short Circuit|Short Circuit]] 21:01, 15 February 2009 (UTC)
: Is this still an issue? I didn't see the break keyword in the code example. --[[User:Short Circuit|Short Circuit]] 03:01, 3 April 2009 (UTC)
 
==C# List Comprehension==
 
Take a look at [[Yuletide Holiday]]: the [[C sharp]] keywords "from", "select" and "where" are not highlighted.
--[[User:Guga360|Guga360]] 22:18, 2 April 2009 (UTC)
: Fixed, I think. Can someone verify that that code example actually compiles? I want to make sure I'm not adding stuff to the language file that isn't supported by the language. If it's good, I'll send the revised file upstream to be included with the next GeSHi release. --[[User:Short Circuit|Short Circuit]] 03:06, 3 April 2009 (UTC)
:: Yes, it works.
 
==Highlighting of [[Tcl]]==
===Braces aren't comments===
Is it possible to change the highlighting of Tcl so that sequences where there is an open and close brace on the same line are not highlighted as (presumably) comments? This makes expressions and one-liners much more difficult to read than they otherwise would be. For example, this is a one line <tt>if</tt>:
<lang tcl>if {[incr $a] == [list $b $c]} {puts [$d $a]} {error "$e $a"}</lang>
It's probably best for “{…}” to be not treated specially at all. (At some point we could also do with updating the list of “keywords”, but that's nothing like as important.) —[[User:Dkf|Dkf]] 09:07, 22 May 2009 (UTC)
:Thanks for fixing this. —[[User:Dkf|Donal Fellows]] 14:34, 17 June 2009 (UTC)
===Keywords===
The current list of "keywords" for Tcl 8.6 (which is quite a bit longer than for previous versions) is:
:'''Normal Keywords:''' append apply bgerror break catch cd class close concat constructor continue copy define deletemethod destructor else elseif eof error eval exec exit export expr fblocked fconfigure fcopy fileevent filter finally flush for foreach format gets glob if incr join lappend lassign lindex linsert list llength load lrange lrepeat lreplace lreverse lsearch lset lsort mixin my next objdefine object on open parray pid puts pwd read regexp regsub rename renamemethod return scan seek self set socket source split subst superclass switch tell then throw time trap try unexport unload unset uplevel vwait while
:'''Function Definition Keywords:''' create forward method new proc
:'''Variable Definition Keywords:''' global upvar variable
:'''Compound Keywords:''' after array binary chan clock dde dict encoding file info interp namespace package prefix registry string trace update zlib
(With compound keywords, the word after the listed keyword should also be highlighted.) OK, they're not formally keywords, but they should be formatted like they are. —[[User:Dkf|Donal Fellows]] 09:58, 17 June 2009 (UTC)
:The following words are linkable to <code><nowiki>http://www.tcl.tk/man/tcl8.6/TclCmd/</nowiki>''blah''<nowiki>.htm</nowiki></code>:
::proc global upvar variable after append apply array bgerror binary break catch cd chan clock close concat continue dde dict encoding eof error eval exec exit expr fblocked fconfigure fcopy file fileevent flush for foreach format gets glob if incr info interp join lappend lassign lindex linsert list llength load lrange lrepeat lreplace lreverse lsearch lset lsort my namespace next open package parray pid prefix puts pwd read regexp registry regsub rename return scan seek self set socket source split string subst switch tell throw time trace try unload unset update uplevel vwait while zlib
:Alternatively go to <code><nowiki>http://wiki.tcl.tk/</nowiki>''blah''</code> for any identified keyword and, if the page isn't there now it soon will be... ;-) —[[User:Dkf|Donal Fellows]] 14:14, 17 June 2009 (UTC)
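:: A sketch of how such links could be generated, in Python for illustration only. The URL pattern is the one given above; the set <tt>LINKABLE</tt> is just a small subset of the linkable words listed above, and the function name <tt>man_url</tt> is invented for the example. How GeSHi itself attaches URLs to keywords is not shown here.

```python
# URL pattern quoted in the discussion above.
MAN_BASE = "http://www.tcl.tk/man/tcl8.6/TclCmd/"

# A subset of the linkable words, for illustration only.
LINKABLE = {
    "proc", "global", "upvar", "variable", "after",
    "append", "foreach", "puts", "set", "while",
}


def man_url(word):
    """Return the manual page URL for a linkable word, else None."""
    if word in LINKABLE:
        return MAN_BASE + word + ".htm"
    return None


if __name__ == "__main__":
    print(man_url("puts"))
    print(man_url("class"))  # highlighted as a keyword, but not linkable
```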
===Variables===
A “$” followed by alphanumerics should be highlighted as a variable reference (if you highlight such things in other languages, of course). —[[User:Dkf|Donal Fellows]] 14:34, 17 June 2009 (UTC)
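: As a rough sketch of that rule (in Python, since the PHP encoding is not settled here), the pattern below matches a “$” followed by ASCII letters and digits. Allowing the underscore is an assumption added for the example, and forms such as <tt>${name}</tt> or array references are deliberately left out.

```python
import re

# "$" followed by alphanumerics; the underscore is an added assumption.
VARIABLE_RE = re.compile(r"\$[A-Za-z0-9_]+")


def find_variables(source):
    """Return every simple $name reference, in order of appearance."""
    return VARIABLE_RE.findall(source)


if __name__ == "__main__":
    line = 'if {[incr $a] == [list $b $c]} {puts [$d $a]} {error "$e $a"}'
    print(find_variables(line))
```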
===Comments===
The comment regexp should (probably) be:
(?:^|[[{;])[ \t]*(#[^\n]*)
(However that is encoded in PHP, I don't know.) It doesn't handle multi-line comments but we're not really using those on RC anyway. —[[User:Dkf|Donal Fellows]] 14:34, 17 June 2009 (UTC)
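: To try the regexp out, here is a sketch in Python rather than PHP. Two things are assumptions of the example: the pattern is compiled in multi-line mode so that <tt>^</tt> matches at every line start, and the “[” inside the character class is backslash-escaped, which Python prefers but which does not change what is matched. Like the original, it does not know about strings, so a “;#” inside a quoted string would be reported as a comment.

```python
import re

# The proposed pattern; group 1 is the comment text itself.
COMMENT_RE = re.compile(r"(?:^|[\[{;])[ \t]*(#[^\n]*)", re.MULTILINE)


def find_comments(source):
    """Return the comments the proposed regexp would pick out."""
    return [match.group(1) for match in COMMENT_RE.finditer(source)]


if __name__ == "__main__":
    sample = (
        "# leading comment\n"
        "set a 1 ;# trailing comment\n"
        'puts "#not a comment"\n'
        "if {$a} {# inside a brace\n"
        "}\n"
    )
    for comment in find_comments(sample):
        print(comment)
```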
 
==Java5 messed up==
Recently the highlighting for "java5" got messed up. For example:
<lang java5>String</lang>
It appears that some "span" tags got inserted in the middle of the URL for the link. --[[Special:Contributions/76.173.203.58|76.173.203.58]] 07:03, 14 June 2009 (UTC)
: Upgraded GeSHi last night. Will look to see what changed in the java5 language file. --[[User:Short Circuit|Short Circuit]] 19:34, 14 June 2009 (UTC)
:: It seems to be fixed now. Thanks. --[[Special:Contributions/76.173.203.58|76.173.203.58]] 05:37, 18 June 2009 (UTC)
 
==Smalltalk oddness==
Take a look at [[Mode#Smalltalk]]... a stray <tt>1/></tt> appears (after the "s := "). While editing, I can't see anything special; if I use a generic lang, it does not appear. --[[User:ShinTakezou|ShinTakezou]] 13:55, 14 June 2009 (UTC)
: If I change "self" to "solf" (or whatever), the problem disappears, so the problem is in the tags for highlighting the special word "self". --[[User:ShinTakezou|ShinTakezou]] 14:01, 14 June 2009 (UTC)
: The same happens for other special words, like <tt>nil</tt>; cf. e.g. [[Gnome sort#Smalltalk]]. --[[User:ShinTakezou|ShinTakezou]] 14:40, 14 June 2009 (UTC)
 
==APL==
It seems APL (APL2) on this page (and maybe more) is badly encoded: [[Mean#APL]]. (When editing the example, I can see the right symbols, likely UTF-8 encoded.) --[[User:ShinTakezou|ShinTakezou]] 16:05, 14 June 2009 (UTC)
: Rather: the lang tag seems not suitable for APL, or at least a fake GeSHi APL descriptor should be created specifying that the APL source encoding is UTF-8 rather than whatever is assumed now... if it is possible to specify such information. (Otherwise, must we use indentation for APL code rather than the lang tag? An ugly solution.) --[[User:ShinTakezou|ShinTakezou]] 16:10, 14 June 2009 (UTC)
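:: One common way to end up with garbage like this is UTF-8 bytes being read back under a single-byte encoding. Whether that is what happens on this page is not confirmed. The Python sketch below only demonstrates the effect, with Latin-1 picked as an assumed wrong encoding and helper names invented for the example.

```python
def misdecode(text, wrong_encoding="latin-1"):
    """Simulate UTF-8 bytes being interpreted under a single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Undo misdecode(), provided no bytes were lost along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    rho = "\u2374"  # APL FUNCTIONAL SYMBOL RHO
    garbled = misdecode(rho)
    print(len(garbled))            # one symbol turns into three characters
    print(repair(garbled) == rho)  # the damage is reversible in this case
```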
 
==Groovy also not copacetic==
Groovy highlighting is exhibiting a problem similar to the Java problem.
<lang groovy>Binding</lang>
Just for the record, so that future generations will know what the heck we were talking about even after the bug is fixed, the above looks something like this when rendered:
<pre style="color:blue;">5.0%2Fdocs%2Fapi%2F">Binding</pre>
Maybe somebody has something against the JVM? (besides the usual, I mean) --[[User:Balrog|Balrog]] 03:29, 17 June 2009 (UTC)
: For some reason the GeSHi update seems to have messed up several languages; my current list has Java (fixed?), Smalltalk, Matlab, Groovy ... keep your eyes open for more... --[[User:ShinTakezou|ShinTakezou]] 13:00, 17 June 2009 (UTC)
::Java isn't fixed. The tag "java" was never broken, but "java5" is the problem. Check it out:
::<lang java5>this is a test String</lang>
::Which (for future generations) renders as:
::<pre>this is a test 1.5.0/docs/api/java/lang/String.html">String</pre>
::--[[User:Mwn3d|Mwn3d]] 13:16, 17 June 2009 (UTC)
:::I hereby move that we ask [[User:Short Circuit|Short Circuit]] to back out the most recent GeSHi update if that's possible. Do we have a second? --[[User:Balrog|Balrog]] 21:43, 17 June 2009 (UTC)
:::Java 5 appears to be fixed. Groovy is still hosed. --[[User:Balrog|Balrog]] 00:26, 18 June 2009 (UTC)
 
::::From the moment I woke up this morning to a couple hours past when I was supposed to be at work, I've been working with the GeSHi and Tcl folks to get things fixed. I now have GeSHi SVN commit access, and I appear to have been handed some degree of responsibility to intake and process new language files, as well as finish the langfile creation wizard they started working on at my behest a couple weeks ago. I'm also planning on adding a JavaScript widget that allows the user to change the syntax highlighting CSS on the local side, so I can get better defaults for sitewide CSS. Syntax highlighting on RC is about to improve significantly.
::::I apologize for the issues we've been having lately. I'll roll back GeSHi this evening, and modify the tag extension to enable the old version of GeSHi for most users, and use GeSHi 1.0.x HEAD for anyone interested in helping test new languages and language support. (All anonymous visits would still see the old system, or at least some working intermediate revision.) Enabling the devel version of GeSHi for any given account will have to be done by someone with Bureaucrat access or higher (there aren't many of these right now, but if someone wants to volunteer for the role, email me and we'll discuss it).
::::Oh, and for the record, I hate [[wp:Robert's Rules of Order]]...They always seem to create more problems than they solve. --[[User:Short Circuit|Short Circuit]] 03:32, 18 June 2009 (UTC)
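::::: The split described above (old GeSHi for most visitors, 1.0.x HEAD for volunteers) could look roughly like the sketch below. It is Python for illustration only. The labels, the <tt>geshi_tester</tt> flag and the function name are all hypothetical, and the real tag extension may handle this quite differently.

```python
# Hypothetical labels for the two installed GeSHi versions.
STABLE = "stable"
DEVEL = "1.0.x-HEAD"


def geshi_version_for(user):
    """Pick the GeSHi version to render with.

    `user` is None for anonymous visits, otherwise a dict that may carry a
    'geshi_tester' flag set by someone with Bureaucrat access.
    """
    if user is not None and user.get("geshi_tester", False):
        return DEVEL
    return STABLE  # anonymous visitors and everyone else


if __name__ == "__main__":
    print(geshi_version_for(None))
    print(geshi_version_for({"name": "Example", "geshi_tester": True}))
```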
:::::Just in case I haven't said that you're amazing lately, let me just say (for future generations) that you're amazing. --[[User:Balrog|Balrog]] 03:54, 18 June 2009 (UTC) (hoping that you're okay with emoticons :-)
:::::Hmmmm... I don't know if [[User:Short Circuit|Short Circuit]] has rolled back GeSHi yet (is there a way I could check?), but Groovy still has junk in the GeSHi rendering --[[User:Balrog|Balrog]] 17:00, 18 June 2009 (UTC)
::::::I thought I had, but I may have erred. I'll have to double-check when I get home tonight.