Talk:CRC-32

From Rosetta Code
Revision as of 08:26, 18 August 2013 by rosettacode>Gerard Schildberger (→‎REXX: added results from test runs. -- ~~~~)

Task goal

Hi, What do you want the task to be about? --Paddy3118 17:45, 29 November 2011 (UTC)

The questions that come to my mind...Which polynominal? Is this in pursuit of a particular protocol's check? (i.e. ethernet frame checksums, ZIP file stream checksums, etc). "CRC" isn't any more specific than, say, "draw a shape".--Michael Mol 18:02, 29 November 2011 (UTC)
I would like a task that calculates the ubiquitous CRC-32 as in RFC 1952 for GZIP. (The RFC contains an algorithm in C code, and refers to section 8.1.1.6.2 "32-bit frame check sequence" from ITU-T Rec. V.42.) However, I am not author of this draft task. --Kernigh 21:13, 29 November 2011 (UTC)
Given that there are language (and even CPU) accelerators for different common cases of CRC32, I'd say go ahead and create a separate task for each form of interest. Probably use a naming scheme of "CRC32/$name", where $name would be the common name for the CRC32 variant. CRC32/CRC32c, for example. I realize this would lead to a small explosion in tasks which would be mundane and uninteresting after the second or so variation, but that leaves low-hanging fruit for whoever comes along later. --Michael Mol 15:27, 30 November 2011 (UTC)
Isn't it possible to write a program that calculates an N-bit CRC of a string with polynomial P ? --Spekkio 21:14, 29 November 2011 (UTC)

Hmm, ok I see. There are many types of algoritms that can be called a CRC. I think the most common CRC algoritm would be the CRC-32 used by IEEE 802.3, also mentioned by Kernigh. But it would also be interesting to make CRC-8,CRC-16 etc... --Spekkio 07:47, 30 November 2011 (UTC)

Maybe (I don't know), and perhaps its of theoretical interest, but who wants to use a generic algorithm? CRC is used specifically because it's a fast, low-cost way to generate a reasonable checksum. --Michael Mol 15:27, 30 November 2011 (UTC)
I'm not really sure what you mean? Why use a different kind of CRC algoritm that is not a standard? I have seen different kinds of CRC algoritms, like in Modbus, though Modbus is old.--Spekkio 16:09, 30 November 2011 (UTC)
I didn't mean to suggest that you'd want to use a CEC algorithm that wasn't either de facto or de jure standard. --Michael Mol 18:15, 30 November 2011 (UTC)
Why use a standard CRC algoritm, I think to reduce confusion :) --Spekkio 16:09, 30 November 2011 (UTC)
Certainly, but there are many meaningful, standard CRC algorithms. I noted above that it might be reasonable to implement several of them as different tasks. --Michael Mol 18:15, 30 November 2011 (UTC)
Or do you mean why make a function that can calculate N bit CRC for poly P? Maybe it would be interesting to have such a function if an MCU is communicating with several devices that uses different polynomials and different bit length CRC, maybe it could save space, but it would be slower --Spekkio 16:09, 30 November 2011 (UTC)
A generic function, while theoretically interesting (and probably certainly worthwhile as a generic, illustrative case) is not something you'd see in a practical setting. Practical settings tend to call for a great deal of speed, and thus rely on optimized or specially-provided forms. As a consequence, languages like PHP provide a crc32 function, and Intel is even including a CPU instruction for accelerating CRC32 with their Nehalem architecture. These aren't likely to be able to be illustrated with a good generic function, but they're going to be the best way forward for some CRC32s. That's why I noted different tasks above. --Michael Mol 18:15, 30 November 2011 (UTC)
That would have been the basic argument behind the splitting of the MD5 and MD5/Implementation tasks. The other was that for the complexity of MD5 it seemed those calling a library function got an easy pass. Although in this case CRC32 is less complex so the two tasks could live in one. The other argument for implementation is also practical, in low use situations a generic works just fine - especially given that there may not be a library version available. --Dgamey 13:44, 7 December 2011 (UTC)
Now there's an old thread! As far as CRC32 vs MD5 sums go, CRC32 adds a complication...there are several different variations of CRC32, whereas there is only one md5sum. Spekkio's generic function would be akin to MD5/Implementation in that it's almost necessarily a pure-language implementation of the algorithm. As an aside...anyone have any idea which polynominal PHP's crc32 function uses? It relies on these tables, but there's no documentation on where the tables came from. --Michael Mol 15:04, 7 December 2011 (UTC)
No idea. It bears no obvious relationship with the cited WP articles. I thought he the intended algorithm was specified by --Dgamey 03:50, 8 December 2011 (UTC)
Maybe related to [1]
I don't know about the one named crc32_table, but the one named crc32b_table is the polynomial used in zip files. I verified this by comparing with the tables in the info-zip source. The values in php are endian-swapped compared to the info-zip table, but that could just be due to operating in the opposite direction. --Coderjoe 08:31, 8 December 2011 (UTC)
oops. I just realized I was looking at the big-endian table in info-zip. after checking the little endian table, it matches the php table named crc32b_table. --Coderjoe 08:35, 8 December 2011 (UTC)

Exact task requirement

We still don't have an answer to the question "What do we need to do to fulfill this task?". How about stating:

"Implement the CRC commonly known as ... whose equation is ... . Use it to generate a checksum for the following bytes ..."

--Paddy3118 05:39, 1 December 2011 (UTC)

Why is it important to define a string of bytes to use, it's not in the MD5 task description for example ? --Spekkio 09:59, 2 December 2011 (UTC)
All the language examples have to work with the same input leading to greater uniformity of purpose and hopefully comparability in resultant code. As soon as one example gives a checksum, other examples can compare their checksums. --Paddy3118 10:55, 2 December 2011 (UTC)
Yes I understand, but what is meant was it is not in the task description of MD5 for example, or is it just implied? --Spekkio 13:25, 2 December 2011 (UTC)
I can't answer that. But it is in the MD5/Implementation which is closer in principle to this task. MD5 frequently just calls libraries that have been validated elsewhere. --Dgamey 13:18, 7 December 2011 (UTC)

Use a library?

Is using a built-in implementation acceptable? Or a library? –Donal Fellows 15:17, 6 December 2011 (UTC)

Given some of the above discussion (for example: "A generic function, while theoretically interesting...") I assumed that using a built-in was desirable. --Rdm 15:38, 6 December 2011 (UTC)
I don't think this is clear. See above discussion comparing with MD5/Implementation and MD5 tasks. --Dgamey 03:50, 8 December 2011 (UTC)
To me, when he said "implement the Cryclic ... Algorithms are described ... " meant to write a program to implement it, as described in the Wiki doc. Just calling a function so show what it returns isn't interesting, just mundane. -- Gerard Schildberger 00:55, 31 March 2012 (UTC)
I don't think using a built in function is very interesting --Spekkio 12:15, 8 December 2011 (UTC)
Nor do I. Also it's trivial. --Dgamey 12:32, 8 December 2011 (UTC)
So? Some people round here seem to think that doing everything with a hair shirt on is fine, but in reality it is the people who use the built-in functions and well-known libraries that will really prosper. In particular, they are working idiomatically with their language. If there is anything that people should be doing when posting solutions here, it is ensuring that those solutions are — to the greatest extent practicable — idiomatic and easy to read. After all, that is how to best encourage the use of the language and to compare and contrast. Moreover, some languages really do slam-dunk their way through tasks that others find exceptionally challenging (and for many reasons). That is interesting, bit manipulations… not so much. –Donal Fellows 15:04, 8 December 2011 (UTC)
Mostly concur with Donal, except that I do think seeing bit-twiddling solutions useful as well; it's best to use a language's idiomatic solution, but it's almost as important, in my opinion, to have an understanding of how it works. (Pragmatic vs theoretical arguments are abound, but it's often pragmatic to have an understanding of the theory) I'll retreat on my original recommendation of having polynomial-specific pages, though, and just suggest that different polynomials beyond the most common be optional entries; if a language has a shortcut for CRC32 of a particular polynomial, then that shortcut should be included as an additional solution. --Michael Mol 15:28, 8 December 2011 (UTC)
(responding to Donal Fellows) -- I agree, you should use built-in functions. But they do not always exist, for example when using the HI-TECH C-compiler or Microchips C compilers. And if I would use a built in function, it would produce unessecary large code. It would be better if you had a good understanding of the function and make your own. We're programming in different enviroments. --Spekkio 19:14, 8 December 2011 (UTC)
And if I started writing my own from scratch it would produce unreasonably large (and slow) code. Delegating to a well-known library (which might or might not use special system instructions) is good. Most programmers — especially most on this site — don't use those compilers you mentioned; having a basic implementation is acceptable as a possible method, but it should not be the required method for something as well known as CRC32 which has many library implementations already. –Donal Fellows 08:43, 9 December 2011 (UTC)
I do not agree, I rewrite the CRC function every time. Sometimes it just needs to compute a CRC8 for 2 bytes. I feel this is not a good place for electronic engineers :) --Spekkio 08:55, 9 December 2011 (UTC)
(responding to Michael Mol) -- Or if the language has a shortcut for CRC polynomials in general... (The implementation I posted uses a library routine which currently supports running any 33 bit polynomial, where the leading bit is set, against a sequence of character literals.) --Rdm 19:16, 8 December 2011 (UTC)
(responding to Donal) Don't take my opinion that calling a library routine isn't useful. I wasn't griping. I was just commenting that the task is less interesting. Again, my real point was that the task originally wasn't all that clear on that (and a number of other points). Hence my comments on library versus implementation tasks. If the task(s) are clear then these debates become far simpler and people get less caught up. In my view BOTH types of tasks are useful. One to show how to get the specific job done. The other shows how to do it in detail and by extension how to do something like it that isn't in a library somewhere. In my case there was no crc32 for my language that I known of so I needed to write one. The folks caught in this position had a small challenge digging through the references to find the right spec. --Dgamey 13:41, 9 December 2011 (UTC)

Error correction

I work alot with different microcontrollers and communication with differnt devices, and everytime I get in contact with a CRC it takes some time to get it working. Sometimes it's CRC-8 or CRC-15 or something else, I've never really got a good grip of CRC. And I have never programmed a function that corrects an error using CRC, from what I understand this is possible? --Spekkio 14:48, 8 December 2011 (UTC)

No. CRC is for error detection, not error correction. You'd be more interested in Hamming code. We have Hamming numbers, but I don't think they're related. (If they are, it's not clear how, at least not on RC.) --Michael Mol 15:18, 8 December 2011 (UTC)
I suggest something based on Gallager codes would be appropriately beefy, as those are used in very high performance applications like transmissions from satellites. If that doesn't satisfy you, Spekkio, I don't know what will. :-) Different task though. –Donal Fellows 17:27, 11 December 2011 (UTC)
Thanks, that is very interesting :D --Spekkio 10:02, 12 December 2011 (UTC)

CRC library ported to several languages

I modified the CRC library from Lammert Bies (http://www.lammertbies.nl) to add many 8,16,24,32,40 and 64-bit CRC algorithms. I translated them in several computer languages. I put them at SourceForge: http://sourceforge.net/projects/crc/

REXX

added Numeric Digits 10 to avoid this error (on ooRexx):

    17 *-*   say 'hex CRC-32 checksum =' c2x(checksum) left('',15),    "dec CRC-32 checksum =" c2d(checksum)     /*show CRC-32 in hex & dec*/
     6 *-* call show a_string
Error 93 running H:\crc32a.rex line 17:  Incorrect call to method
Error 93.936:  C2D result is not a valid whole number with NUMERIC DIGITS 9

---Walterpachl (talk) 19:55, 17 August 2013 (UTC)

Question: Is this an 'extension' introduced by ooRexx?
cd2: Returns the decimal value of the binary representation of string. If the result cannot be expressed as a
whole number, an error results. That is, the result must not have more digits than the current setting of
NUMERIC DIGITS

I would think that Rexx on VM or TSO would fail as well but have no longer access to try. Anyone else?? --Walterpachl (talk) 05:09, 18 August 2013 (UTC)

Could you please run this test on your Rexx(es)?

<lang rexx>/* REXX ***************************************************************

  • 18.08.2013 Walter Pachl Test c2d behavior
                                                                                                                                            • /

cnt.=0 c=d2c(999999999) If c2x(c)='3B9AC9FF' Then cnt.0ok=cnt.0ok+1 d=c2d(c) Signal on Syntax d1=d2c(d+1) cnt.0err=cnt.0err+1 back: Say cnt.0ok 'Tests ok' Say cnt.0err 'Tests failed' Exit syntax: cnt.0ok=cnt.0ok+1 Signal back</lang> --Walterpachl (talk) 05:53, 18 August 2013 (UTC)


I don't quite understand the reasoning/logic behind the test program:

if the program gets a SYNTAX error anywhere, it counts as OK.
if the program successfully does the two D2Cs, it counts as an error in one case, but not the other.
if the program successfully does the 2nd D2C, it is counted as an error.
no check is made to verify that the value of D matches the value of the 1st argument (999999999).

A     parse version x; say x     was added after the 1st statement.
A     say 'digits=' digits()     would be a nice addition.

But in any case   (all were executed under a Windows/XP DOS window (cmd.exe):

output from Regina:

REXX-Regina_3.7(MT) 5.00 14 Oct 2012
1 Tests ok
1 Tests failed

output from Personal REXX:

REXX/Personal 4.00 21 Mar 1992
2 Tests ok
0 Tests failed

output from R4:

REXX-r4 4.00 27 Jan 2013
2 Tests ok
0 Tests failed

output from ROO:

REXX-roo 4.00 28 Jan 2007 
Error 40 : Incorrect call to routine (OFF)
Information: D2C argument 1 is invalid. Whole number is required.
Argument value: 1.00000000E+9

Error occurred in statement# 9
Statement source: d1= d2c(d+1)
Statement context: D:\crud0.rex, procedure: badwalter

Clearly, different REXXes which are different are behaving differently. -- Gerard Schildberger (talk) 08:26, 18 August 2013 (UTC)