Talk:CRC-32

Task goal

Hi, What do you want the task to be about? --Paddy3118 17:45, 29 November 2011 (UTC)

The questions that come to my mind...Which polynominal? Is this in pursuit of a particular protocol's check? (i.e. ethernet frame checksums, ZIP file stream checksums, etc). "CRC" isn't any more specific than, say, "draw a shape".--Michael Mol 18:02, 29 November 2011 (UTC)

I would like a task that calculates the ubiquitous CRC-32 as in RFC 1952 for GZIP. (The RFC contains an algorithm in C code, and refers to section 8.1.1.6.2 "32-bit frame check sequence" from ITU-T Rec. V.42.) However, I am not author of this draft task. --Kernigh 21:13, 29 November 2011 (UTC)

Given that there are language (and even CPU) accelerators for different common cases of CRC32, I'd say go ahead and create a separate task for each form of interest. Probably use a naming scheme of "CRC32/$name", where $name would be the common name for the CRC32 variant. CRC32/CRC32c, for example. I realize this would lead to a small explosion in tasks which would be mundane and uninteresting after the second or so variation, but that leaves low-hanging fruit for whoever comes along later. --Michael Mol 15:27, 30 November 2011 (UTC)

Isn't it possible to write a program that calculates an N-bit CRC of a string with polynomial P ? --Spekkio 21:14, 29 November 2011 (UTC)

Hmm, ok I see. There are many types of algoritms that can be called a CRC. I think the most common CRC algoritm would be the CRC-32 used by IEEE 802.3, also mentioned by Kernigh. But it would also be interesting to make CRC-8,CRC-16 etc... --Spekkio 07:47, 30 November 2011 (UTC)

Maybe (I don't know), and perhaps its of theoretical interest, but who wants to use a generic algorithm? CRC is used specifically because it's a fast, low-cost way to generate a reasonable checksum. --Michael Mol 15:27, 30 November 2011 (UTC)

I'm not really sure what you mean? Why use a different kind of CRC algoritm that is not a standard? I have seen different kinds of CRC algoritms, like in Modbus, though Modbus is old.--Spekkio 16:09, 30 November 2011 (UTC)

I didn't mean to suggest that you'd want to use a CEC algorithm that wasn't either de facto or de jure standard. --Michael Mol 18:15, 30 November 2011 (UTC)

Why use a standard CRC algoritm, I think to reduce confusion :) --Spekkio 16:09, 30 November 2011 (UTC)

Certainly, but there are many meaningful, standard CRC algorithms. I noted above that it might be reasonable to implement several of them as different tasks. --Michael Mol 18:15, 30 November 2011 (UTC)

Or do you mean why make a function that can calculate N bit CRC for poly P? Maybe it would be interesting to have such a function if an MCU is communicating with several devices that uses different polynomials and different bit length CRC, maybe it could save space, but it would be slower --Spekkio 16:09, 30 November 2011 (UTC)

A generic function, while theoretically interesting (and ~~probably~~ certainly worthwhile as a generic, illustrative case) is not something you'd see in a practical setting. Practical settings tend to call for a great deal of speed, and thus rely on optimized or specially-provided forms. As a consequence, languages like PHP provide a crc32 function, and Intel is even including a CPU instruction for accelerating CRC32 with their Nehalem architecture. These aren't likely to be able to be illustrated with a good generic function, but they're going to be the best way forward for some CRC32s. That's why I noted different tasks above. --Michael Mol 18:15, 30 November 2011 (UTC)

That would have been the basic argument behind the splitting of the MD5 and MD5/Implementation tasks. The other was that for the complexity of MD5 it seemed those calling a library function got an easy pass. Although in this case CRC32 is less complex so the two tasks could live in one. The other argument for implementation is also practical, in low use situations a generic works just fine - especially given that there may not be a library version available. --Dgamey 13:44, 7 December 2011 (UTC)

Now there's an old thread! As far as CRC32 vs MD5 sums go, CRC32 adds a complication...there are several different variations of CRC32, whereas there is only one md5sum. Spekkio's generic function would be akin to MD5/Implementation in that it's almost necessarily a pure-language implementation of the algorithm. As an aside...anyone have any idea which polynominal PHP's crc32 function uses? It relies on these tables, but there's no documentation on where the tables came from. --Michael Mol 15:04, 7 December 2011 (UTC)

No idea. It bears no obvious relationship with the cited WP articles. I thought he the intended algorithm was specified by

x^{32}+x^{26}+x^{23}+x^{22}+x^{16}+x^{12}+x^{11}+x^{10}+x^{8}+x^{7}+x^{5}+x^{4}+x^{2}+x+1

--Dgamey 03:50, 8 December 2011 (UTC)

Maybe related to [1]

I don't know about the one named crc32_table, but the one named crc32b_table is the polynomial used in zip files. I verified this by comparing with the tables in the info-zip source. The values in php are endian-swapped compared to the info-zip table, but that could just be due to operating in the opposite direction. --Coderjoe 08:31, 8 December 2011 (UTC)

oops. I just realized I was looking at the big-endian table in info-zip. after checking the little endian table, it matches the php table named crc32b_table. --Coderjoe 08:35, 8 December 2011 (UTC)

We still don't have an answer to the question "What do we need to do to fulfill this task?". How about stating:

"Implement the CRC commonly known as ... whose equation is ... . Use it to generate a checksum for the following bytes ..."

--Paddy3118 05:39, 1 December 2011 (UTC)

Why is it important to define a string of bytes to use, it's not in the MD5 task description for example ? --Spekkio 09:59, 2 December 2011 (UTC)

All the language examples have to work with the same input leading to greater uniformity of purpose and hopefully comparability in resultant code. As soon as one example gives a checksum, other examples can compare their checksums. --Paddy3118 10:55, 2 December 2011 (UTC)

Yes I understand, but what is meant was it is not in the task description of MD5 for example, or is it just implied? --Spekkio 13:25, 2 December 2011 (UTC)

I can't answer that. But it is in the MD5/Implementation which is closer in principle to this task. MD5 frequently just calls libraries that have been validated elsewhere. --Dgamey 13:18, 7 December 2011 (UTC)

Is using a built-in implementation acceptable? Or a library? –Donal Fellows 15:17, 6 December 2011 (UTC)

Given some of the above discussion (for example: "A generic function, while theoretically interesting...") I assumed that using a built-in was desirable. --Rdm 15:38, 6 December 2011 (UTC)

I don't think this is clear. See above discussion comparing with MD5/Implementation and MD5 tasks. --Dgamey 03:50, 8 December 2011 (UTC)

I don't think using a built in function is very interesting --Spekkio 12:15, 8 December 2011 (UTC)

Nor do I. Also it's trivial. --Dgamey 12:32, 8 December 2011 (UTC)

So? Some people round here seem to think that doing everything with a hair shirt on is fine, but in reality it is the people who use the built-in functions and well-known libraries that will really prosper. In particular, they are working idiomatically with their language. If there is anything that people should be doing when posting solutions here, it is ensuring that those solutions are — to the greatest extent practicable — idiomatic and easy to read. After all, that is how to best encourage the use of the language and to compare and contrast. Moreover, some languages really do slam-dunk their way through tasks that others find exceptionally challenging (and for many reasons). That is interesting, bit manipulations… not so much. –Donal Fellows 15:04, 8 December 2011 (UTC)

Mostly concur with Donal, except that I do think seeing bit-twiddling solutions useful as well; it's best to use a language's idiomatic solution, but it's almost as important, in my opinion, to have an understanding of how it works. (Pragmatic vs theoretical arguments are abound, but it's often pragmatic to have an understanding of the theory) I'll retreat on my original recommendation of having polynomial-specific pages, though, and just suggest that different polynomials beyond the most common be optional entries; if a language has a shortcut for CRC32 of a particular polynomial, then that shortcut should be included as an additional solution. --Michael Mol 15:28, 8 December 2011 (UTC)

(responding to Donal Fellows) -- I agree, you should use built-in functions. But they do not always exist, for example when using the HI-TECH C-compiler or Microchips C compilers. And if I would use a built in function, it would produce unessecary large code. It would be better if you had a good understanding of the function and make your own. We're programming in different enviroments. --Spekkio 19:14, 8 December 2011 (UTC)

And if I started writing my own from scratch it would produce unreasonably large (and slow) code. Delegating to a well-known library (which might or might not use special system instructions) is good. Most programmers — especially most on this site — don't use those compilers you mentioned; having a basic implementation is acceptable as a possible method, but it should not be the required method for something as well known as CRC32 which has many library implementations already. –Donal Fellows 08:43, 9 December 2011 (UTC)

I do not agree, I rewrite the CRC function every time. Sometimes it just needs to compute a CRC8 for 2 bytes. I feel this is not a good place for electronic engineers :) --Spekkio 08:55, 9 December 2011 (UTC)

(responding to Michael Mol) -- Or if the language has a shortcut for CRC polynomials in general... (The implementation I posted uses a library routine which currently supports running any 33 bit polynomial, where the leading bit is set, against a sequence of character literals.) --Rdm 19:16, 8 December 2011 (UTC)

(responding to Donal) Don't take my opinion that calling a library routine isn't useful. I wasn't griping. I was just commenting that the task is less interesting. Again, my real point was that the task originally wasn't all that clear on that (and a number of other points). Hence my comments on library versus implementation tasks. If the task(s) are clear then these debates become far simpler and people get less caught up. In my view BOTH types of tasks are useful. One to show how to get the specific job done. The other shows how to do it in detail and by extension how to do something like it that isn't in a library somewhere. In my case there was no crc32 for my language that I known of so I needed to write one. The folks caught in this position had a small challenge digging through the references to find the right spec. --Dgamey 13:41, 9 December 2011 (UTC)

Error correction

I work alot with different microcontrollers and communication with differnt devices, and everytime I get in contact with a CRC it takes some time to get it working. Sometimes it's CRC-8 or CRC-15 or something else, I've never really got a good grip of CRC. And I have never programmed a function that corrects an error using CRC, from what I understand this is possible? --Spekkio 14:48, 8 December 2011 (UTC)

No. CRC is for error detection, not error correction. You'd be more interested in Hamming code. We have Hamming numbers, but I don't think they're related. (If they are, it's not clear how, at least not on RC.) --Michael Mol 15:18, 8 December 2011 (UTC)

I suggest something based on Gallager codes would be appropriately beefy, as those are used in very high performance applications like transmissions from satellites. If that doesn't satisfy you, Spekkio, I don't know what will. :-) Different task though. –Donal Fellows 17:27, 11 December 2011 (UTC)