Talk:Bitwise IO: Difference between revisions

→‎Real intention: too long half reply
(Number vs. numeral system vs. memory representation)
(→‎Real intention: too long half reply)
Line 21:
--[[User:Dmitry-kazakov|Dmitry-kazakov]] 09:08, 21 December 2008 (UTC)
</blockquote>
 
Interesting. I am planning to rewrite the task text, maybe in a more formal way (and hopefully clearer), but now I am more busy on LZW and Xmas stuffs. I don't think the following clarify the task, but it is worth noting (I believe so).
# All the task can be rewritten shortly: provide functions to write integer numbers packed into ''n'' bits (binary digits), and read integer numbers packed into ''n'' bits. Consider a very long binary number, composed by N bits. Bits of this huge number can be grouped in M integral number n<sub>i</sub>; the grouping is done taking L<sub>1</sub> bits for n<sub>1</sub>, L<sub>2</sub> bits for n<sub>i</sub> and so on. This suggests also the reading process (which is clarified in the next sentece, since we must know the ''convention'' of the grouping). About writing: consider M integral numbers n<sub>i</sub>; the number N will be n<sub>1</sub> + n<sub>2</sub> &sdot; 2<sup>L<sub>1</sub></sup> + n<sub>3</sub> &sdot; 2<sup>L<sub>1</sub> + L<sub>2</sub></sup> + ... which can be written as N = &sum;<sub>i</sub> n<sub>i</sub> &sdot; 2<sup>&sum;<sub>j&lt;i</sub>L<sub>j</sub></sup>, ''i'' from 1 to M, ''j'' from 0 to i-1, and with L<sub>0</sub> = 0. This also should clarify the reading (but to me it makes no clearer the task for everyone). (In this mathematicsish, n<sub>M</sub> is the first number the user requested to output; we could say it is the last, as natural; but this way reading from a stream would mean to have the whole stream &mdash;the huge number&mdash; into memory, instead of splitted into bytes)
# Yes I suppose you can see it that way. It is not important how you define the thing, the important is that the result is as expected. If I want to output the number 1100 in 4 bits, I must obtain the byte 1100'''0000''', bold for padding. If I want to output the number 1001 in 9 bits, I must obtain 00000100 1'''0000000''', bold for padding. If I want to output the number 1100, 4 bits, "followed" by the number 1001 in 9 bits, I must obtain 11000000 01001'''000'''. You can tell the user of your functions that to write the number 1100 (and obtain as output 1100'''0000''') s/he must pack the bits into an array so that <nowiki>A[0] = 1, A[1] = 1, A[2] = 0, A[3] = 0</nowiki>, or <nowiki>A[0] = 0, A[1] = 0, A[2] = 1, A[3] = 1</nowiki> or in any other of the N! variants. Why the output must be that way? Conventionally, I want that if I read a bit per time and build the "number" by shifting leftward already stored bits, I obtain the intended "number". While writing 1100, I want that shifting leftward in the requested size (4 bits), the first bit which goes out is the first bit of output, the second is the second and so on. So let us have an accumulator of infinite size (not important) and the huge number of N bits, the MSB being 1100.... Put the "accumulator" A in front of the huge number H, like A|H (you can consider the whole as the same huge number H, since at the beginning the accumulator is void, so that it is just as writing a lot of zeros in front of the H, like ...00000H, where H are N binary digits). Now, leftshift the "number" A|H (ie multiply it by 2), you obtain 1|100.... The accumulator holds (infinite sequence of 0s)1. It is as if you have read into the accumulator one bit. Leftshift again. Now, we have 11|00....., you can leftshift two more times, obtaining 1100|.... Now the accumulator holds the number 1100, since we have read the same number of bits we wrote (of course, this must be known!). If we want a "new" number, we clear the accumulator and continue the process. Imagining it this way should also make clearer why I called this bit oriented reading (writing is not so different). When building the tape, I must provide a way I give the bits I want to write. Here, A and H are swapped: the accumulator A holds exactly the bits I want to write, H is a growing number, at the beginning just 0 (i.e. an infinite sequence of zeros), but of course in any point of the process it just hold a sequence of bits. I put in the accumulator (shrunk to 4 bits) the number 1100, having H|1100 (it would have been 01100 if I wanted to write the same number, but packed into 5 bits). Leftshifting gives H1|100'''0''', then H11|00'''00''', then H110|0'''000''', then H1100|'''0000'''. At this point I have "appended" the 4bits binary number 1100 to H, building a bigger number. This all suggests an implementation, but I can't say it is mandatory to implement it this way. I just want the following: if the huge number is 1100H, and I ask for 4 bits, I must obtain 1100 (as if I say, in the natural language, to take the first four bits, write it apart, and then delete them, e.g. <del>1100</del>H). If the number is H, and I say I want to write the number 1100, I must obtain H1100. In base 10 seeing it like a turn-game: player 1 write a number (he can write also leading 0 digits), e.g. 06744. Player 2 writes another number, just after the previous; he wants to write 998, so that on the paper now we can read 06744998. Now back: player 1 must extracts digits in order to obtain back its number; he extracts 06744 and to signal how many digits he took, delete them: <del>06744</del>; player 2 must do the same, ignoring of course deleted digits; he takes 998. If we have N players, we obtain a big "unique" number composed by N numbers. Of course, each player, in order to get back its number, must remember rightly how many digits it had, and must hope the player coming before remembered its number of digits too. E.g., if the player 1 deletes just 0674, even though player 2 remembered he wrote 3 digits, he obtains 499, which is wrong... Hope this stream of consciousness helps you to help me to understand how to write the task clearly and (oh yes!) briefly! Rather than as maths, I would say it in a manipulation-of-bits-data way.
# Can't understand the sentence. HL languages sometimes make it harder to make things that in assembly would be a lot easier and "direct". C is HL, but not so HL. The sentence "Maybe this is easier in Ada (I don't know), but it would be '''harder''' in C" (did you refer to this "hard"?), refer to the fact that it would be hard (difficult) to implement in C the task the way you implemented in Ada (i.e. using array to hold the bits): it would make the code not so usable (for LZW, e.g.); the best way (at least, in C) is to manipulate bits with shifts, ands and ors (nearest to operations a processor can do). Not all implementation will follow this way; I wonder how the code can be used to implement a real-world LZW compressor based on the code given in [[LZW compression]].
# Mathematically, there's a "first" bit and a "last" bit. Mathematicians can feel disgust for the expression, but once one learn about numerical positional systems, it is easily understandable what one can mean by saying first digit and last digit of a number. Take it as a short form for "the least significative digit" (LSB for binary) and "the most significative non-zero digit" (but in computer world where we are used to consider integers packed into "fixed size" container, so this can be simply "the most significative digit", MSB for binary). My usage of ''first'' is clear by the context (or so I believed), and sometimes refers to the order of intended output rather than mathematical stuffs, so that the first bit of a stream is the one we wrote first, and the last the one we wrote last, e.g. F1100.....0101L. Users of the functions must know if the first bit of output will be the rightmost (or the bit held into the first/last element of the array) or the leftmost (or the bit held into the last/first element of the array) of the input "bit string", so they can ''accomodate'' the bits to obtain the output they wanted. Rappresentations of numbers are numbers, unless you want to consider physical processes, since bits are "rappresentations" of physical processes into the hardware; but noone is interested in these details while programming. I suppose "representation" is what I've called "packing" (packaging?). We pack numbers into fixed finite size container since for a computer numbers must be "real", not abstract entities. The concept of LSB (least significative bit) is ok for abstract entities too. The MSB no, we must use the "most significative non-zero bit" form. A file has an ordering too. We can imagine a file as a sequences of bytes, so there's a first byte and a last byte. Since each byte "contains" bits, if we look at this bitstream, there's again a first bit and a last bit in the sequence, as said before. If we consider it like a whole huge number with N digits (the number 00030 has 5 digits, even if we can write as 30, and it's the same number but with 2 digits only), as I explained before, then the first being the LSB, would be the "last" in the precious speech. (But I believed it was clear the meaning of my usage). Noone is interested in the physical ordering of the bits into the memory; likely they are rather scattered; but these are uninteresting details we can disregard while programming: we can consider all bits as if they had an ordering. The same for file: we are not interested in how the data are organized on disk; it is enough we can get them in the order we expect!
 
I will try to rewrite the task taking into account these misunderstandings, but not before I make the LZW working (which by the way is also a test-bed for the bits_read/write functions:D!), understand how to split into smaller tasks, and iff this Xmas won't kill me. --[[User:ShinTakezou|ShinTakezou]] 00:19, 22 December 2008 (UTC)