Talk:Binary strings: Difference between revisions

From Rosetta Code

Revision as of 21:53, 15 April 2009

Break out?

I think this task should be broken out into smaller tasks and then put into a category. String concatenation is already a task, so it would be grouped with these tasks. --Mwn3d 18:04, 14 April 2009 (UTC)

Maybe it is a good idea... But then a path similar to Basic bitmap storage should be followed. I mean, the struct String is shared among all the tasks, so we also need a task like "provide a basic storage for a (binary) string", so that later tasks can refer to it instead of replicating the struct, or linking to wherever it is defined because a specific function needs it, e.g. see "String concatenation" for struct... --ShinTakezou 11:51, 15 April 2009 (UTC)
Then I took a look at those tasks: they do not focus on the concept of "byte strings"; rather, they refer to text strings. This is an issue if the text string implementation uses a terminator character, as C does; in fact the C solutions to those tasks (Copy a string, String concatenation, String length) work only for null-terminated strings (i.e. the "null" char can't be part of the string). (Of course this does not happen in every language, but C is among those having this "problem".) I think it is enough to add some more C code to those tasks... Or maybe I gave the wrong name; should it be "Basic binary string manipulation functions"? ("binary" or, according to Wikipedia, "bytestring") --ShinTakezou 15:23, 15 April 2009 (UTC)
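To make the terminator issue concrete, here is a minimal sketch. It is written in Java only because that is the language used for the example further down this thread, and it merely imitates what a C-style length function does. The class name ByteStringSketch and both method names are made up for illustration; they are not taken from any of the tasks mentioned.

```java
import java.util.Arrays;

public class ByteStringSketch {
    // Imitates a terminator-based length: counts bytes up to,
    // but not including, the first zero byte.
    static int terminatedLength(byte[] buf) {
        int n = 0;
        while (n < buf.length && buf[n] != 0) {
            n++;
        }
        return n;
    }

    // Length-carrying concatenation: every byte is copied,
    // zero bytes included, because the array knows its own length.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {'a', 'b', 0, 'c', 'd'};
        System.out.println(terminatedLength(data));    // 2: stops at the zero byte
        System.out.println(data.length);               // 5: the real length
        System.out.println(concat(data, data).length); // 10
    }
}
```

The terminator-based count loses everything after the embedded zero, while the length-carrying version keeps all the bytes; that is the sense in which a byte string needs its length stored separately.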
You're in a better position to figure that out than I am. I don't think I was ever really clear on byte strings and binary strings (probably because most of the string work I've done is in Java where most of the details are hidden or irrelevant). --Mwn3d 19:58, 15 April 2009 (UTC)
The distinction is only a conventional one (though the following Java example suggests it may not be so conventional after all...). Generally a string is just a sequence of "symbols" (bytes); even text is made of bytes, of course... The distinction just stresses the fact that the bytes can be interpreted as text (according to which encoding...?) and are not generic binary data. Strings are not exactly "binary safe" in Java, but there's no terminator in use:

<lang java>public class binsafe {
    public static void main(String[] args) {
        System.out.print("\000\000test\001\377");
    }
}</lang>

Outputs

<pre>$ java -cp . binsafe |hexdump -C
00000000  00 00 74 65 73 74 01 c3  bf                       |..test...|
00000009</pre>
This looks odd, since the byte 255 (octal 377) comes out UTF-8 encoded; in fact:
<pre>$ printf "\xc3\xbf" |iconv -f utf-8 -t latin1 |hexdump -C
00000000  ff                                                |.|
00000001</pre>
Maybe there's a method in the String class that tells Java not to "interpret" the string, or maybe such a task in Java should be accomplished using a custom class that internally uses byte[]. --ShinTakezou 21:53, 15 April 2009 (UTC)
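A possible way to get the raw bytes out, offered as a sketch rather than as the settled answer to the question above: a Java String holds characters, not bytes, so "\377" is the character U+00FF, and print() encodes it with the platform's default charset (UTF-8 in the run shown above). Encoding explicitly with ISO-8859-1, which maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF, and writing the resulting byte[] directly avoids the two-byte c3 bf form. The class name RawBytes and the helper toLatin1Bytes are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

public class RawBytes {
    // Encode each char U+0000..U+00FF as the single byte with the same value.
    static byte[] toLatin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\000\000test\001\377";

        byte[] raw = toLatin1Bytes(s);                    // 8 bytes, last one is 0xff
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // 9 bytes, ends with c3 bf

        // Write the bytes as they are, bypassing any character encoding.
        System.out.write(raw, 0, raw.length);
        System.out.flush();
    }
}
```

This only round-trips strings whose characters all fall in U+0000..U+00FF; for generic binary data, keeping a byte[] from the start (as suggested above) is probably the cleaner design.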