
Talk:Bitwise IO: Difference between revisions

:4. Ouch, no: in mathematics, numbers have neither bits nor digits. For that matter, in [http://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory Zermelo–Fraenkel set theory] numbers are first introduced as sets: {} (the empty set is 0), {{}} (the set with 0 inside is 1), {{{}}} (the set with 1 inside is 2). No bits in sight at all. Nor does that mean that numbers have first and last brackets! Digits, bits, LSB, MSB, exponents, brackets etc. are entities of representations. There are countless ways to represent numbers, and representations themselves are not numbers. A representation R is a mapping from some set S to N (the set of numbers): R:S->N. When S is the set of English words, "two" is a representation of the number {{{}}}. When S is the set of binary numerals, 10<sub>2</sub> is a representation of the same number {{{}}}. The division between "physical" and what? is off my radar, because you could not define "physical" anyway. About files: a byte-oriented file has bytes, and these bytes contain only themselves. They don't contain bits, characters, images or Shakespeare's plays in PDF format. They just do not. That is a ''media layer'' in [http://en.wikipedia.org/wiki/OSI_layer#Layer_2:_Data_Link_Layer OSI] terms. The content (meaning), e.g. bits, characters etc., lies above it, in the application that deals with the file. Your '''application''' knows the meaning of the bytes, as defined by the task, i.e. to keep sequences of bits in a '''certain''' way. Merry Christmas! --[[User:Dmitry-kazakov|Dmitry-kazakov]] 08:59, 22 December 2008 (UTC)
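A small Python sketch of the ideas in the comment above, under assumptions of our own: numbers built with Zermelo's construction (0 = {}, n+1 = {n}) as nested frozensets, two different representation mappings R:S->N (English words and binary numerals) that land on the same number, and one byte turned into a bit sequence under two conventions that the application, not the file, chooses. All function names are hypothetical.

```python
def zermelo(n: int) -> frozenset:
    """Zermelo's construction of the naturals: 0 = {}, n+1 = {n}."""
    s = frozenset()
    for _ in range(n):
        s = frozenset([s])  # wrap the previous number in a new set
    return s

# Representation set S = a few English words (hypothetical, tiny table).
ENGLISH = {"zero": 0, "one": 1, "two": 2}

def from_english(word: str) -> frozenset:
    """R: English words -> N."""
    return zermelo(ENGLISH[word])

def from_binary_numeral(numeral: str) -> frozenset:
    """R: binary numerals -> N."""
    return zermelo(int(numeral, 2))

# The same byte yields different bit sequences depending on the convention
# the application picks: most significant bit first, or least significant first.
def bits_msb_first(byte: int) -> list[int]:
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

def bits_lsb_first(byte: int) -> list[int]:
    return [(byte >> i) & 1 for i in range(8)]

if __name__ == "__main__":
    print(from_english("two") == from_binary_numeral("10"))  # True: same number
    print(bits_msb_first(0x41))  # [0, 1, 0, 0, 0, 0, 0, 1]
    print(bits_lsb_first(0x41))  # [1, 0, 0, 0, 0, 0, 1, 0]
```

Neither bit list is "the" content of the byte 0x41; each is one reading of it.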
 
::I see Shin's point. In the real world, the LZW symbol stream is always packed into octets. The GIF and TIFF standards each specify a packing scheme. However, the method is not as simple as you might think. As the symbol table fills, the code width grows by one bit at specific points (e.g. 511 -> 512), and the packing scheme takes advantage of that. Perhaps Shin could reference one of these standards for the task.
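A minimal Python sketch of variable-width code packing into octets, MSB-first (the bit order TIFF uses; GIF packs LSB-first). It is a simplification and is not byte-compatible with either format: it assumes no Clear or End-of-Information codes and no 12-bit cap, and it widens the code exactly when the largest code the encoder could have assigned stops fitting. The k-th code (counting from 0) is emitted after k dictionary entries have been added, so it is at most 255 + k; the real formats differ in exactly when they switch. Function names are hypothetical.

```python
def code_width(k: int, min_width: int = 9) -> int:
    """Bits for the k-th code: enough to hold 255 + k, never fewer than 9."""
    return max(min_width, (255 + k).bit_length())

def pack_msb_first(codes: list[int]) -> bytes:
    out = bytearray()
    acc = 0    # pending bits, most significant first
    nbits = 0  # how many bits are pending in acc
    for k, code in enumerate(codes):
        width = code_width(k)
        if code >= (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:          # flush every complete octet
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1    # keep only the unflushed bits
    if nbits:                      # pad the last partial octet with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_msb_first(data: bytes, count: int) -> list[int]:
    codes = []
    acc = 0
    nbits = 0
    it = iter(data)
    for k in range(count):
        width = code_width(k)      # the reader tracks the same width schedule
        while nbits < width:
            acc = (acc << 8) | next(it)
            nbits += 8
        nbits -= width
        codes.append((acc >> nbits) & ((1 << width) - 1))
        acc &= (1 << nbits) - 1
    return codes
```

Because the width depends only on the position k, the reader can follow the same schedule without any extra side information, which is the property the comment above refers to.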
::If Shin's actual goal is to measure the compression LZW achieves relative to the input stream, that is more easily accomplished. The output symbols start at 9 bits, so simply multiply the number of output symbols by 9 and divide by 8, rounding up. For the task's test string "TOBEORNOTTOBEORTOBEORNOT" (24 bytes), the output is 16 symbols, which would pack into ((9*16)+7)/8 = 18 bytes (integer division). The calculation becomes more complex if there are more than 256 output symbols, because the symbol size increases. (I implemented an 8086 assembly TIFF LZW codec once upon a time.) --[[User:IanOsgood|IanOsgood]] 17:09, 22 December 2008 (UTC)
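A Python sketch of that size estimate, assuming a plain LZW encoder (unbounded dictionary, no Clear or End-of-Information codes, unlike GIF and TIFF, which cap codes at 12 bits) and a simple width rule: the k-th output code (counting from 0) takes max(9, bit length of 255 + k) bits. Function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Plain LZW: start with all 256 single-byte strings, grow without limit."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for b in data:                 # iterating bytes yields ints
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc                 # keep extending the current match
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = bytes([b])
    if w:
        out.append(dictionary[w])  # flush the final match
    return out

def packed_size_bytes(num_codes: int, min_width: int = 9) -> int:
    """Octets needed when the width grows as the dictionary passes 511, 1023, ..."""
    total_bits = sum(max(min_width, (255 + k).bit_length()) for k in range(num_codes))
    return (total_bits + 7) // 8   # round up to whole octets

if __name__ == "__main__":
    text = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(text)
    print(len(text), len(codes), packed_size_bytes(len(codes)))  # 24 16 18
```

For the test string this reproduces the hand calculation: 16 codes of 9 bits each, 144 bits, 18 bytes. With more than 257 codes the later ones take 10 bits, so the simple "times 9" formula starts to underestimate.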