Talk:Bitwise IO: Difference between revisions

: But you don't need all this philosophy in order to unambiguously specify the task... (:-)) Merry Christmas! --[[User:Dmitry-kazakov|Dmitry-kazakov]] 09:51, 23 December 2008 (UTC)
 
 
==After task rewriting; but still on the task sense dialog==
 
And so on. If the task is not well explained in these details, examples should clarify it. But maybe I am wrong. --[[User:ShinTakezou|ShinTakezou]] 01:14, 28 December 2008 (UTC)
 
: My point is about [http://en.wikipedia.org/wiki/Bit_numbering bit-endianness]; it must be specified. You did it by providing an example in the task description. Note that the talk about writing strings into (binary?) files is superfluous and, worse, misleading, for exactly the same reasons, BTW. Text is '''encoded''' when written as bytes (or any other storage unit). It is the sloppiness of the [[C]] language that lets you mix these issues. If the file were UCS-32 encoded, your text output would be rubbish.
 
: TCP header defines bits because it contains fields shorter than one byte, and because the physical layer is bit-oriented, which is '''not''' the case for byte-oriented files.
 
: If MIPS architecture has preferred bit-endianness, why should that wonder anybody? --[[User:Dmitry-kazakov|Dmitry-kazakov]] 10:32, 28 December 2008 (UTC)
 
The task specifies we are handling ASCII (encoded) strings. Hopefully this is enough to avoid losing information, which would happen with any other encoding that uses the "full" byte. The bit-endianness is just a labelling problem. Even in the wiki page you linked, no matter whether the left bit is labelled 7 or 0, the byte (a binary number with at most 8 digits) is still 10010110, which we can read as the "number" 96 in hex (too lazy to convert it to decimal now :D); and if I write such a byte, "formed" by such bits, into a file, I expect that a hexadecimal dump of that file will show me 96. These are details hidden in the code; no matter how you label the bits, the important fact is that when you use the functions to write the bits 10010110 as a "whole",
you get the byte 96 in the output; and vice versa, when you read the first 8 bits from a file whose first byte is 96, you must get 10010110 (i.e. 96 :D). The same holds if you write an arbitrary sequence, like 100101101100, as a "whole": when you read back 12 bits, you get back 100101101100 (which is the "integer" 96C in hex).
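The round-trip behaviour described above can be sketched in Python (a hypothetical illustration of MSB-first packing, not any particular implementation from the task page):

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes, MSB-first,
    padding the last byte with zeros on the right."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8].ljust(8, "0")   # pad a final partial byte
        out.append(int(chunk, 2))
    return bytes(out)

def unpack_bits(data, nbits):
    """Expand bytes back into the first nbits bits, MSB-first."""
    s = "".join(format(b, "08b") for b in data)
    return s[:nbits]

print(pack_bits("10010110").hex())        # 96
print(pack_bits("100101101100").hex())    # 96c0 (four padding zeros)
print(unpack_bits(b"\x96\xc0", 12))       # 100101101100
```

Writing 10010110 as a whole does give the byte 96, and reading back 12 bits of 96 C0 recovers 100101101100.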
 
I still can't get the point of the statement in the second paragraph. When I "code" software at some not-too-low hardware level, I deal with bytes; I can't see the bit orientation. That is why the RFC can talk that way and still let programmers understand and correctly write applications handling TCP data, disregarding the physical layer. These things could be an issue when writing low-level drivers, dealing with serial communication or the like... but we are a little ''bit'' higher than that! Files are byte-oriented, and that is why we need to pad with "spurious" bits if the bit sequence we want to write does not have a length that is a multiple of 8 (supposing a byte "contains" 8 bits); but if we "expand" the bits of each byte, we have a sequence of bits (where perhaps the last bits are padding bits...); this is the "vision" of the task.
 
It should not surprise anyone; the task simply hasn't specified a bit-endianness, which, as said before, is a labelling problem; the encoding of the addu instruction is
 
<pre>
0000 00ss ssst tttt dddd d000 0010 0001
</pre>
 
and nobody is saying whether the leftmost bit is bit 0 or bit 31. It doesn't matter, since the encoding of the instruction remains the same, and it is written into memory in the same way. So here, indeed, we don't know if MIPS prefers to call the leftmost bit 0 or 31.
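To make the field layout concrete, here is a small Python sketch (the register numbers are chosen arbitrarily for illustration) that packs the s, t and d fields into the 32-bit addu word shown above, with no bit labels needed:

```python
def encode_addu(s, t, d):
    """Encode MIPS addu: 000000 sssss ttttt ddddd 00000 100001."""
    return (s << 21) | (t << 16) | (d << 11) | 0b100001

# addu with s=2, t=3, d=1 -> 0x00430821, however we label the bits
print(format(encode_addu(2, 3, 1), "08x"))
```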
 
One could think about what happens with a little-endian processor; to get a feel for it, take
 
<pre>
0000033A 681000 push word 0x10
</pre>
 
from a disassembly; we have 68 (binary 01101000) followed by a 16-bit LE-encoded value. If the bits in the first instruction byte have meaning, we could say the encoding would be:
 
<pre>
push byte/word/dword + DATA -> 0110AB00 + DATA
</pre>
 
(It is a fantasy example; the x86 push is not really encoded this way!) The bits AB specify whether we are pushing a byte, a word or a dword (32 bits): AB=00 pushes a byte, AB=10 a word, AB=11 a dword (while AB=01 could be used to specify another kind of instruction); and somewhere it will be stated that DATA must be LE-encoded. But one must remember that once the DATA is fetched from memory into the (8/16/32-bit) register, there is no endianness; if the stored DATA is hex 1000, then in the register it is just the 16-bit "integer" hex 1000. To talk about the encoding of push byte/word/dword, I don't need to specify a bit-endianness. I need it only when using instructions that manipulate single bits of a register (as said before, the Motorola 680x0 labels the LSB as 0).
 
<pre>
00000336 50 push ax
00000395 51 push cx
0000038A 52 push dx
0000045F 53 push bx
00000136 55 push bp
00000300 58 pop ax
</pre>
 
These "pushes"/"pop" suggest that the encoding of the push REG instruction is something like
 
<pre>
0101PRRR    RRR = 000 ax    P = 0 push
                  011 bx        1 pop
                  001 cx
                  010 dx
                  101 bp
</pre>
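A tiny decoder for the 0101PRRR pattern (a sketch following the table above, which agrees with the opcodes in the previous dump):

```python
# register field per the 0101PRRR table above
REGS = {0b000: "ax", 0b001: "cx", 0b010: "dx", 0b011: "bx", 0b101: "bp"}

def decode(opcode):
    """Decode a one-byte 0101PRRR push/pop opcode."""
    assert opcode >> 4 == 0b0101
    op = "pop" if (opcode >> 3) & 1 else "push"
    return f"{op} {REGS[opcode & 0b111]}"

print(decode(0x50))  # push ax
print(decode(0x58))  # pop ax
```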
 
It happens that x86 instructions are not all of the same length, but that should not cause confusion; the way we say how x86 instructions are encoded is the same as for the MIPS, the 680x0 or whatever else. And despite the ''preferred'' endianness(!!), if we like to say it in a bit-wise manner:
 
<pre>
push word DATA -> 0110 1000 LLLL LLLL HHHH HHHH
L = bits of the LS byte (Low)
H = bits of the MS byte (High)
</pre>
 
And this way, which is sometimes used, doesn't need to specify any "bit-endianness": it is clear how the bits of the LS byte LLLL LLLL must be put. E.g. for the "integer" 0010, the LS byte is 10 (binary 00010000) and the MS byte is 00, so we fill L and H this way:
 
<pre>
LLLL LLLL HHHH HHHH
0001 0000 0000 0000
</pre>
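The filling of L and H can be checked with Python's struct module, which makes the byte-endianness explicit while never mentioning any bit-endianness; the opcode byte 68 is the one from the disassembly fragment above:

```python
import struct

# "push word 0x10": the opcode byte followed by the 16-bit immediate,
# low byte first ('<H' = little-endian unsigned 16-bit)
insn = bytes([0x68]) + struct.pack("<H", 0x0010)
print(insn.hex())  # 681000, as in the disassembly
```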
 
The endianness that could lead to problems is the one concerning the bytes of "integers" stored in more than a single byte. At this (not so low) level, bit-endianness is just a labelling issue and matters only when using instructions like the 680x0's bset, bclr and so on.
 
Hopefully the task is clear(er) now (at least an OCaml programmer seems to have got it!), and I've learned that
«Ada allows specifying a bit order for data type representation» (but the underlying implementation will need to map it to the hardware convention, so I suppose it would be faster just to use the "default"!) --[[User:ShinTakezou|ShinTakezou]] 00:10, 6 January 2009 (UTC)
: Everything in mathematics is just a labeling problem; mathematics is a process of labeling, no more. As is computing itself, by the way. Your subsequent reasoning makes no sense to me. When a byte is a container of bits, you cannot name the ordinal number corresponding to the byte '''before''' you label its bits (more precisely, define the encoding). The fallacy of your further reasoning is that you use a certain encoding (binary, positional, right to left) without naming it, and then start to argue that there is no other, that this one is natural (so there are others?), that everything else is superfluous, etc. In logic A=>A holds, but it proves nothing.
: Here are some examples of encoding in between bits and bytes: [http://en.wikipedia.org/wiki/RADIX-50 4-bit character codes], [http://en.wikipedia.org/wiki/Binary-coded_decimal packed decimal numbers].
: This is an example of a serial bit-oriented protocol: [http://en.wikipedia.org/wiki/Controller_Area_Network CAN]; note how transmission conflicts are resolved in CAN using the identifier's bits put on the wire. Also note that a CAN controller is responsible for delivering CAN messages to the CPU in the endianness of the latter, i.e. it must recode sequences of bits on the wire into 8-byte data frames + identifiers.
: More about [http://www.linuxjournal.com/article/6788 endianness] --[[User:Dmitry-kazakov|Dmitry-kazakov]] 10:08, 6 January 2009 (UTC)
 
:: Sorry, at this point I don't think we can understand each other. I believe I've explained the point in a rather straightforward (even though too long) way, and can't do better myself. In my computing experience, the "problem" and the task are understandable, clear and not ambiguous. In an implementation-driven way, I can say that you've got it as I intended iff the output of the program fed with the byte sequence (bytes written in hex)
 
<pre>
41 42 41 43 55 53
</pre>
 
:: (which in ASCII can be read as "ABACUS") is
 
<pre>
83 0a 0c 3a b4 c0
</pre>
 
:: i.e. if you save the output in a file and look at it with a hex dumper, you see it; e.g.
 
<pre>
[mauro@beowulf-1w bitio]$ echo -n "ABACUS" |./asciicompress |hexdump -C
00000000 83 0a 0c 3a b4 c0 |...:..|
00000006
[mauro@beowulf-1w bitio]$ echo -n "ABACUS" |./asciicompress |./asciidecompress
ABACUS[mauro@beowulf-1w bitio]$
</pre>
 
::--[[User:ShinTakezou|ShinTakezou]] 18:10, 13 January 2009 (UTC)
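:: For reference, the expected bytes above can be reproduced by packing the 7-bit ASCII codes of "ABACUS" MSB-first with zero padding; a minimal Python sketch of the presumed scheme (not the actual asciicompress source):

```python
def ascii_compress(text):
    """Pack 7-bit ASCII codes into bytes, MSB-first, zero-padded."""
    bits = "".join(format(ord(c), "07b") for c in text)
    bits += "0" * (-len(bits) % 8)           # pad to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(ascii_compress("ABACUS").hex(" "))  # 83 0a 0c 3a b4 c0
```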
 
::: As a small point: you said that most-to-least significant is the "natural" order, but I'd like to point out that that is only true in Western languages that are written left-to-right. In Arabic and Hebrew, decimal digits appear in the same order despite the surrounding text being read right-to-left, so the digits appear in least-to-most significant order. --[[User:Markjreed|Markjreed]] ([[User talk:Markjreed|talk]]) 13:12, 28 March 2024 (UTC)
 
 
== PL/I bitstring longer than REXX'... ==
 
because the input seems to be STRINGS followed by '0D0A00'x
[[User:Walterpachl|Walterpachl]] 20:22, 2 November 2012 (UTC)
 
 