Talk:Bitwise IO

From Rosetta Code

Real intention

The real intention of this task and code was to have a bunch of functions to test the LZW compressor/decompressor on real binary output (instead of an array or similar output); this way, compression-ratio statistics for the LZW of the task LZW compression can be computed. --ShinTakezou 16:57, 19 December 2008 (UTC)

Hmm, strictly speaking it is not bit-oriented. It is rather a bit-stream interface to byte-oriented I/O. Exactly because it is not bit I/O (as, for instance, serial I/O is), you have an endianness issue here. So in your task you should specify whether the encoding of bits into bytes is big- or little-endian. Your code looks big-endian. Was that your intention? --Dmitry-kazakov 18:28, 19 December 2008 (UTC)
It is bit-oriented in the way common software I/O can be (it is not my intention to drive serial hardware this way!): you think you are writing one or more bits, but in fact they are sent only in groups of eight, since we can only write bytes (and no more than one byte at a time, or we would surely have endianness issues). Endianness while computing could be a point, but here I can't see how, since endianness issues are about how bytes are stored in memory. Take 0xFF packed into a 16-bit word. It is written to memory, on a little-endian arch, as 0xFF 0x00. But when you load it, you can consider it logically (I love big endian!) as 0x00FF, and you can be sure that if you perform a left shift you obtain 0x01FE... If you write that to memory, you again have 0xFE 0x01 on a LE arch. But if you shift left 9 times, you obtain 0xFE00 with a carry (or whatever the status flag for a bit that slipped off the left is called). Again, if you write that to memory and pretend to read it as sequential bytes instead of a word, you get an issue: in memory it is 0x00 0xFE on a LE arch. Luckily, even LE processors handle data in registers in a more logical way!
You can object that a variable such as d is stored in memory somewhere, so it is LE-encoded. But all operations in a processor are endianness-logical (!): when I left-shift a 0x00FF that is stored in memory as 0xFF 0x00, the processor first loads the datum, so that you can now think of it as 0x00FF, then performs the left shift, obtaining e.g. 0xFF00, then stores it to memory as 0x00 0xFF. If I were reading from memory byte by byte, loading and shifting to build a datum longer than one byte, I would have to take endianness seriously. But that is also why the read and write operations are performed byte by byte rather than by accumulating into a 16- or 32-bit word.
To say it briefly, I can't see any endianness issue. I am not interested in how the processor stores the unsigned int I take the bits from; the fact is that when I compute datum << (32-4) for the 4-bit datum 0xF, what I obtain is (luckily) 0xF0000000 (stored as 0x00 0x00 0x00 0xF0). When I shift one more bit left, I expect to get 0xE0000000, not 0xE0010000 (stored as 0x00 0x00 0x01 0xE0). I am making it sound harder than it is. It's late for me... hopefully tomorrow I will find better words. --ShinTakezou 01:07, 20 December 2008 (UTC)
Endianness is an issue here because you have to specify how a tuple of N bits is packed into an integral unit called a byte, since your I/O deals with these units, not with bits. Note that this has nothing to do with the endianness of the machine. If N=8, then, to illustrate the case, let B = (B1, ..., B8) = (1,0,0,0,1,0,0,1); then B is 137 (decimal) using big-endian and 145 (decimal) using little-endian encoding, and there are N! variants of packing in total. The code you provided looks big-endian. If you don't like the term endian, no problem. You can simply provide a formula, like byte = Σ Bi·2^(i-1) (= little endian) or byte = Σ Bi·2^(8-i) (= big endian).
As far as I know, packing into bytes works as expected in common mathematics (that is, if we want to talk about endianness, big-endian): less significant bits are on the right, more significant bits on the left, so that, with Ai for i ∈ [0,7] being the binary digits, A0 is the units bit, which is so to say A0·2^0. So the binary number 10010001 is packed into a byte simply as 10010001. When we are interested in fewer than 8 bits, they are counted (always so to say) from right to left; e.g. if I want to write the 4 bits 1001, these must be right-aligned in the variable holding them. But these are conventions the user of the functions must know; nothing forces them to be the way I chose. I implemented everything so that you must always right-align the bits; the functions then shift left so that the intended most significant bit of the datum becomes the leftmost bit in the container (an unsigned int in the C implementation). It is enough that the user of the functions knows in which order they extract bits from the data passed to them; the first bit (be it the leftmost or the rightmost according to your convention) is then the first bit of the output stream, and thus the first bit you read when you ask the stream for a single bit. Maybe your misunderstanding comes from the fact that you handle single bits (in your explanation; I have not yet looked at your code) as units held by an array. In C it would be like having char bitarray[N], where each char can only be 0 or 1. Done that way, you can choose your preferred convention, i.e. whether bitarray[0] or bitarray[N-1] is the first bit to be output.
In C that implementation would be hard to use. If I want to write integers less than 16, which fit in just 4 bits, it is enough to pass the number to the function, like 12 (binary 1100), and tell the function I want just (the first) 4 bits; otherwise I (as user of the functions) would have to split the integer into chars, each representing a bit, pack them into an array with the right endianness, and then pass the array to the function. Maybe this is easier in Ada (I don't know), but it would be harder in C. Consider e.g. the integer output array of the LZW task: in C it is enough to take one integer at a time and tell the function to output (e.g.) 11 bits of that integer (of course, 11 bits must be enough to hold the value!); you are then outputting 11-bit words (the words of the LZW dictionary). I hope I have understood what you said and expressed my explanation well (too long as usual, sorry). --ShinTakezou 21:53, 20 December 2008 (UTC)