Talk:Bitwise IO

::I see Shin's point. In the real world, the LZW symbol stream is always packed into octets. The GIF and TIFF standards each specify a packing scheme. However, the method is not as simple as you might think: as the symbol table fills, the maximum code length grows by one bit at specific points (e.g. when the table grows from 511 to 512 entries, codes widen from 9 to 10 bits), and the packing scheme takes advantage of that. Perhaps Shin could reference one of these standards for the task.
::If Shin's actual goal is to measure the compression achieved by LZW compared to the input stream, that is more easily accomplished. The output symbols start at 9 bits, so simply multiply the number of output symbols by 9 and divide by 8. The test string of the task, "TOBEORNOTTOBEORTOBEORNOT" (24 bytes), compresses to 16 symbols, which would pack into ((9*16)+7)/8 = 18 bytes. The calculation becomes more complex if there are more than 256 output symbols, because the symbol size increases. (I implemented an 8086 assembly TIFF LZW codec once upon a time.) --[[User:IanOsgood|IanOsgood]] 17:09, 22 December 2008 (UTC)
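::For illustration only, here is a rough sketch in C of how the growing code width affects the packed size. It is not the exact GIF or TIFF scheme: it ignores the clear/EOI codes, and TIFF's LZW famously bumps the width one code earlier than this ("early change"); the standards differ in such details.
<pre>
/* Sketch: estimate the packed size of an LZW symbol stream whose code
   width grows as the table fills.  Codes start at 9 bits; a bit is
   added whenever the next code to be assigned needs one more bit
   (512 needs 10 bits, 1024 needs 11, and so on). */
#include <stdio.h>

unsigned long packed_bytes(unsigned long nsymbols)
{
    unsigned long bits = 0;
    unsigned width = 9, code = 256, limit = 512;

    while (nsymbols--) {
        bits += width;
        if (++code == limit) {  /* table grew past 511, 1023, ... */
            width++;
            limit <<= 1;
        }
    }
    return (bits + 7) / 8;      /* round up to whole octets */
}

int main(void)
{
    printf("%lu\n", packed_bytes(16));  /* prints 18 */
    return 0;
}
</pre>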
 
:::I think I have to rewrite the text of the task. The approach I've used has a direct analog in the streaming of bytes. No one would have talked about endianness or fine points of mathematics if I had said "write a stream of bytes, and read them back keeping the streaming order"; poorly phrased, but a single example would have been enough to understand: <tt>printf("hello world")</tt>, or <tt>fwrite("\x45\x20\x50\xAA", 1, 4, out)</tt>. These two C functions would have completed such a task (fwrite being more general, since printf cannot emit a null byte). I simply wanted something similar, but for bits, and that is what my C implementation does. One could also implement the whole thing so that what was a left shift in my previous explanation becomes a right shift; it is enough to pick bytes from "the bottom" and walk backward, which means seeking within a file (or, if using buffering, processing the bytes from the end of the buffer). This would be inefficient if one wanted to use the technique on a pipe, since the decompression could not start until the end of the stream arrives, which could mean buffering the whole stream and consuming a lot of memory. I did not want a way of measuring efficiency (as you say, that can be done by estimation); I have actually implemented, in C, a working LZW compressor: instead of outputting an array, I output bits; I added a big-endian 32-bit integer to store how many bytes there were in the uncompressed file, and I have a compressor. I needed these bit routines to achieve that aim. The LZW C compression code cannot yet fit into RC (I have just written one in Obj-C, which has a dictionary object, the way the others did, i.e. outputting an array), since I had to implement a dictionary and a general way to handle strings of arbitrary bytes in order to translate the Java code.
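:::A minimal sketch of the kind of interface I mean (the names are illustrative, not my actual code): the low bits of a value are streamed out MSB-first, accumulated with a left shift into a single byte, which is written whenever it fills — exactly like <tt>fwrite</tt>, but at the bit level.
<pre>
#include <stdio.h>

/* Sketch of an "fwrite for bits": the low 'count' bits of 'value'
   are streamed out MSB-first, buffered one byte at a time. */
struct bitwriter { FILE *f; unsigned acc; int nbits; };

void put_bits(struct bitwriter *w, unsigned value, int count)
{
    while (count--) {
        w->acc = (w->acc << 1) | ((value >> count) & 1); /* MSB first */
        if (++w->nbits == 8) {          /* a whole byte: flush it */
            fputc((int)w->acc, w->f);
            w->acc = 0;
            w->nbits = 0;
        }
    }
}

void flush_bits(struct bitwriter *w)    /* pad the last byte with 0s */
{
    while (w->nbits != 0)
        put_bits(w, 0, 1);
}
</pre>
:::(The reader for decompression is symmetric: fetch a byte from the stream, then hand out its bits from the most significant downward.)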
 
4. You are dealing with words of a language, which always have a somewhat elastic meaning and usage, and that is what I tried to stress. Whether a number is abstract for a mathematician (what is an abstract object without representation? could a definition be counted as a form of representation?), or whether it is defined just as a successor of a successor of a successor ... of 0 (e.g. S(S(S(S(0)))) would be a representation of 4, and is itself a fairly widespread representation), is not so interesting after all. There is no single human being (mathematicians included) able to use abstract numbers. You can talk about them, but they would be useless for a lot of tasks. Bit is the short form of Binary digIT; so let's talk about digits. What is a digit? A matter of definition. I say a matching pair {} is a digit. This is how digits come into Zermelo–Fraenkel. You could say {} itself is a representation of something. Yes, indeed every symbol could be considered a representation of something, more or less abstract. So mathematicians using symbols to talk about abstract numbers are using a representation of something, and I can define the word "digit" to match a unit of their representation. And I can also somehow define what the first digit and the last digit are. Philosophically, I can say that a representation of something could be the thing itself; my body is a representation of myself, and I also feel it is myself (my body is really me, a sort of citation of a Cohen song). And it becomes harder if we consider that the way we communicate is through symbols and representations. If representations are nothing (since there exist Real Things which are not representations, and are not even only representations of themselves), our speeches are void. We exist by reference. Everything exists by reference. (I use ''referencing'' to mean "assigning a representation", or a definition.) Referencing is what makes things exist. And that is why I believe there is such a digit in a number: I have just "created" it by referencing it in my speech (I have defined it, more or less explicitly). Back to earth: we use representations of numbers which have a concept of digits, and of first and last digit, as I defined before; human languages allow us to say simply that numbers are made of digits. If you write a math article about number theory, you can write as you prefer; but outside of that world and that article (or outside the world of the article), it makes little sense: for all practical and communicative uses, numbers have digits. By the way, I define the least significant digit of the ZF representation as the innermost {}. As you can see, there is no escape, since to express your thoughts (no matter the concepts involved!) you '''need''' to use symbols, which are always representations of something!!
 
Definitions are important; they make things exist, as said before. A byte is an entity that is made of bits (bits are binary digits): by convention, eight of them. A byte-oriented file has bytes, and bytes, being made of bits, contain bits (sometimes I could put quotation marks, but now I won't). A byte can be interpreted (interpretation and representation are different, even though one can link them in some conventional way) as a character or a number. We are interested in the number interpretation. Since a byte is made of 8 binary digits, we understand that a byte is just a number represented in base 2 with at most 8 digits. So, it is a number from 0 to 255, no more. We learned about numeral systems, and so we know what the least significant digit is (and if not, I can reference my previous talk, and it is enough to continue speaking about LS digits). When I must write my age, I write 30, not 03. The same applies to a number written in base 2. It has a well-known order. Conventional, yes, but of that kind of common, widespread convention that one can't ignore, since it is one of the bases of communication; one can change the convention, but then can't be surprised that a lot of people misunderstand what s/he says. So, we have data called bytes that we can interpret as binary numbers; this interpretation is suitable for many tasks, and there are simple ways to link different interpretations. Everything, maybe by chance, maybe not, is related.
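For instance (an illustration only; the function name is mine), printing a byte's binary digits most significant first, the same way we write "30" rather than "03":
<pre>
#include <stdio.h>

/* Print the eight binary digits of a byte, most significant first,
   i.e. in the conventional written order of a base-2 number. */
void print_bits(unsigned char byte)
{
    int i;
    for (i = 7; i >= 0; i--)
        putchar(((byte >> i) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    print_bits(0x45);   /* prints 01000101, i.e. 69 in base 2 */
    return 0;
}
</pre>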
And after all, it is all related to how the hardware is built and to the physical processes involved in keeping/retrieving a "''quantum''" of information in a thing we simply call "memory". Of course you are right, the meaning is in the interpretation. And haven't I given you a lot of different (but correlated) interpretations of the task? I've talked about a huge number, ''represented'' in base 2, and how to "manipulate" it in order to obtain smaller base-2 numbers, and how these are related to the inputs of the functions of the task. '''But''' the fact that a byte in a computer is made of bits sits a little bit (!) below the level of an application. It is just something an application "must" know beforehand; it is not the application that decided how bits are packed into a byte, nor the order that makes it meaningful to say a+b=c, i.e. to take a human representation that (by chance!) matches what we know about the sum of binary numbers (with special conventions for the computer world). Applications just use those conventions. They can indeed change them totally... but you need to write a lot of code to do so! Relying on the underlying conventions makes it all easier. (In fact, try to use any of the other 8!−1 arrangements of bits in a byte and still use your preferred language's +, = and so on... I suppose you would have to write a lot of code to use them the same way as you normally do.)
 
I believed my "mathematical" approach of the "huge number {''represented in'' base 2}" was clear; maybe I will rewrite the task using the byte-streaming analogy (print "hello world" and print bin"11010100101001010110" should be similar...). Good Xmas and 2009 to all --[[User:ShinTakezou|ShinTakezou]] 02:01, 23 December 2008 (UTC)