Strip control codes and extended characters from a string
Revision as of 15:13, 25 June 2011
You are encouraged to solve this task according to the task description, using any language you may know.
The task is to strip control codes and extended characters from a string. The solution should demonstrate how to achieve each of the following results:
- a string with control codes stripped (but extended characters not stripped)
- a string with control codes and extended characters stripped
In ASCII, the control codes have decimal codes 0 through 31, plus 127, and the extended characters have decimal codes greater than 127. On an ASCII-based system, stripping both the control codes and the extended characters leaves a string whose characters all lie within the range 32 to 126 decimal on the ASCII table.
On a non-ASCII-based system, any character that does not have a corresponding glyph in the ASCII table (within the ASCII range 32 to 126 decimal) counts as an extended character for the purpose of this task.
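Both required results can be phrased as filters built from two predicates that follow directly from the ranges above. A minimal Python 3 sketch, assuming characters are classified by code point (`ord`), so that every code point above 127 counts as extended; the function names are hypothetical:

```python
def is_control(ch):
    """True for ASCII control codes: 0 through 31, plus 127 (DEL)."""
    code = ord(ch)
    return code < 32 or code == 127


def is_extended(ch):
    """True for any code point above 127, i.e. outside 7-bit ASCII."""
    return ord(ch) > 127


def strip_control(s):
    # Result 1: drop control codes but keep extended characters.
    return "".join(ch for ch in s if not is_control(ch))


def strip_control_and_extended(s):
    # Result 2: keep only the printable ASCII range 32..126.
    return "".join(ch for ch in s if not is_control(ch) and not is_extended(ch))


sample = "\ta\x7fb\u00e9c\n"
print(ascii(strip_control(sample)))               # 'ab\xe9c'
print(ascii(strip_control_and_extended(sample)))  # 'abc'
```

`ascii()` is used only so the printed result stays readable on terminals that cannot display non-ASCII characters.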
Icon and Unicon
We'll use deletec to remove unwanted characters (2nd argument) from a string (1st argument). The procedure below coerces types back and forth between string and cset. The set of unwanted characters is the difference between all ASCII characters and the printable ones with codes 32 to 126. Because Icon string positions start at 1, those printable characters occupy positions 33 through 127 of the string form of &ascii, so the subscript is [33:128]. <lang Icon>procedure main(A)
   write(image(deletec(&ascii,&ascii--(&ascii)[33:128])))
end

link strings</lang>
The IPL procedure deletec is equivalent to this: <lang Icon>procedure deletec(s, c)  #: delete characters
   result := ""
   s ? {
      while result ||:= tab(upto(c)) do   # copy the wanted characters, then
         tab(many(c))                     # skip the run of unwanted ones
      return result ||:= tab(0)           # append whatever remains
      }
end</lang>
Output:
" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
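For comparison, the cset difference and the scanning loop in deletec can be mirrored in Python. This is a sketch, not the IPL code: Python sets stand in for Icon csets, and the hypothetical `deletec` below copies a run of wanted characters (as `tab(upto(c))` does) and then skips a run of unwanted ones (as `tab(many(c))` does).

```python
# Python sets play the role of Icon csets here.
ASCII = {chr(n) for n in range(128)}          # like &ascii
PRINTABLE = {chr(n) for n in range(32, 127)}  # codes 32..126
UNWANTED = ASCII - PRINTABLE                  # all ASCII minus the printable range


def deletec(s, c):
    """Return s with every character in the set c removed (hypothetical port)."""
    result = []
    i, n = 0, len(s)
    while i < n:
        # Copy the run of characters not in c (Icon: tab(upto(c))).
        j = i
        while j < n and s[j] not in c:
            j += 1
        result.append(s[i:j])
        # Skip the run of characters in c (Icon: tab(many(c))).
        while j < n and s[j] in c:
            j += 1
        i = j
    return "".join(result)


all_ascii = "".join(chr(n) for n in range(128))
kept = deletec(all_ascii, UNWANTED)
print(len(kept), kept[0] == " ", kept[-1])  # 95 True ~
```

The 95 surviving characters run from space to `~`, which is the full printable range the task asks for.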
J
Solution: <lang j>stripControlCodes=: -.&(DEL,32{.a.)
stripControlExtCodes=: ([ -. -.)&(32}.127{.a.)</lang>
Usage: <lang j>   mystring=: a. {~ ?~256  NB. ascii chars 0-255 in random order
   #mystring  NB. length of string
256
   #stripControlCodes mystring  NB. length of string without control codes
223
   #stripControlExtCodes mystring  NB. length of string without control codes or extended chars
95
   #myunicodestring=: u: ?~1000  NB. unicode characters 0-999 in random order
1000
   #stripControlCodes myunicodestring
967
   #stripControlExtCodes myunicodestring
95
   stripControlExtCodes myunicodestring
k}w:]U3xEh9"GZdr/#^B.Sn%\uFOo[(`t2-J6*IA=Vf&N;lQ8,${XLz5?D0~s)'Y7Kq|ip4<WRCaM!b@cgv_T +mH>1ejPy</lang>
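The counts in the J session follow from simple arithmetic: there are 33 control codes (0 to 31 plus DEL) and 95 printable codes (32 to 126), so 256 - 33 = 223 and 1000 - 33 = 967, while the fully stripped string always has 95 characters. The Python sketch below re-checks those numbers. The names are hypothetical; `less` imitates J's dyadic `-.`, and `strip_control_ext` copies the `([ -. -.)` fork, which computes `x -. (x -. y)`, i.e. the items of x that also occur in y.

```python
import random

CONTROL = {chr(n) for n in range(32)} | {chr(127)}  # J: DEL,32{.a.
PRINTABLE = {chr(n) for n in range(32, 127)}        # J: 32}.127{.a.


def less(x, y):
    """Like J's dyadic -. : the items of x that do not occur in y."""
    ys = set(y)
    return "".join(ch for ch in x if ch not in ys)


def strip_control(s):
    return less(s, CONTROL)


def strip_control_ext(s):
    # x ([ -. -.) y  is  x -. (x -. y)
    return less(s, less(s, PRINTABLE))


def shuffled(count):
    """Code points 0..count-1 in random order, like J's ?~count."""
    codes = list(range(count))
    random.shuffle(codes)
    return "".join(chr(n) for n in codes)


for size in (256, 1000):
    s = shuffled(size)
    print(len(s), len(strip_control(s)), len(strip_control_ext(s)))
# 256 223 95
# 1000 967 95
```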
PicoLisp
Control characters in strings are written with a hat (^) in PicoLisp; ^? is the DEL character. <lang PicoLisp># Keep C unless it is DEL or sorts below space (a control code)
(de stripCtrl (Str)
   (pack
      (filter
         '((C) (nor (= "^?" C) (> " " C)))
         (chop Str) ) ) )

# Keep C only if it lies strictly between ^_ (31) and ^? (127)
(de stripCtrlExt (Str)
   (pack
      (filter '((C) (> "^?" C "^_")) (chop Str)) ) )</lang>
Test:
: (char "^?")
-> 127
: (char "^_")
-> 31
: (stripCtrl "^I^M a b c^? d äöüß")
-> " a b c d äöüß"
: (stripCtrlExt "^I^M a b c^? d äöüß")
-> " a b c d "
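PicoLisp's hat notation appears to follow the usual caret convention, in which `^X` stands for the character whose code is that of X XOR 64; that is why `^?` gives 127 and `^_` gives 31 in the test above. The Python sketch below decodes that notation with a hypothetical `caret` helper and uses a chained comparison as the analogue of the range test `(> "^?" C "^_")`.

```python
def caret(notation):
    """Decode caret notation such as '^A' or '^?' (hypothetical helper)."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError("expected '^' followed by one character")
    return chr(ord(notation[1]) ^ 0x40)


DEL = caret("^?")       # chr(127)
UNIT_SEP = caret("^_")  # chr(31), the last control code below space


def strip_ctrl(s):
    # Keep c unless it is DEL or sorts below space.
    return "".join(c for c in s if not (c == DEL or c < " "))


def strip_ctrl_ext(s):
    # Keep c only if ^_ < c < ^?, i.e. codes 32..126.
    return "".join(c for c in s if UNIT_SEP < c < DEL)


sample = caret("^I") + caret("^M") + " a b c" + DEL + " d äöüß"
print(ascii(strip_ctrl(sample)))      # ' a b c d \xe4\xf6\xfc\xdf'
print(ascii(strip_ctrl_ext(sample)))  # ' a b c d '
```

Python compares one-character strings by code point, so the chained comparison reads almost exactly like the PicoLisp form.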
PureBasic
<lang PureBasic>Procedure.s stripControlCodes(source.s)
  ;keep every character except codes 0-31 and 127 (DEL)
  Protected i, *ptrChar.Character, length = Len(source), result.s
  *ptrChar = @source
  For i = 1 To length
    If *ptrChar\c > 31 And *ptrChar\c <> 127
      result + Chr(*ptrChar\c)
    EndIf
    *ptrChar + SizeOf(Character)
  Next
  ProcedureReturn result
EndProcedure

Procedure.s stripControlExtCodes(source.s)
  ;keep only codes 32-126
  Protected i, *ptrChar.Character, length = Len(source), result.s
  *ptrChar = @source
  For i = 1 To length
    If *ptrChar\c > 31 And *ptrChar\c < 127
      result + Chr(*ptrChar\c)
    EndIf
    *ptrChar + SizeOf(Character)
  Next
  ProcedureReturn result
EndProcedure
If OpenConsole()
  ;create sample string
  Define i, s.s
  For i = 1 To 80
    s + Chr(Random(254) + 1) ;include character values from 1 to 255
  Next

  PrintN(stripControlCodes(s)) ;string without control codes
  PrintN("---------")
  PrintN(stripControlExtCodes(s)) ;string without control codes or extended chars

  Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input()
  CloseConsole()
EndIf</lang>
Sample output:
»╫=┐C─≡G(═ç╤â√╝÷╔¬ÿ▌x è4∞|)ï└⌐ƒ9²òτ┌ºáj)▓<~-vPÿφQ╨ù¿╖îFh"[ü╗dÉ₧q#óé├p╫■
---------
=CG(x 4|)9j)<~-vPQFh"[dq#p
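Because the PureBasic demo builds its input at random, the sample output above cannot be checked character by character. A property check is a more useful test. The Python sketch below is an assumption, not part of the PureBasic entry: it builds 80 random characters with codes 1 to 255, as the demo does, and asserts that each filter leaves only the codes the task allows.

```python
import random


def strip_control_codes(source):
    # Drop codes 0-31 and DEL (127); keep everything else.
    return "".join(ch for ch in source if ord(ch) > 31 and ord(ch) != 127)


def strip_control_ext_codes(source):
    # Keep only codes 32-126.
    return "".join(ch for ch in source if 31 < ord(ch) < 127)


def sample_string(length=80):
    # Like Chr(Random(254) + 1): codes 1 through 255 inclusive.
    return "".join(chr(random.randint(1, 255)) for _ in range(length))


s = sample_string()
a = strip_control_codes(s)
b = strip_control_ext_codes(s)
assert all(ord(ch) > 31 and ord(ch) != 127 for ch in a)
assert all(32 <= ord(ch) <= 126 for ch in b)
# b is exactly a with the extended characters (above 127) removed.
assert b == "".join(ch for ch in a if ord(ch) < 128)
print(len(s), len(a), len(b))
```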
Python
<lang Python>def stripped(x):
    """Keep only characters in the printable ASCII range 32-126."""
    return "".join(i for i in x if 32 <= ord(i) < 127)

print(stripped("\ba\x00b\n\rc\fd\xc3"))</lang>Output:<lang>abcd</lang>
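The `stripped` function produces only the second result the task asks for. A companion for the first result (control codes removed, extended characters kept) could look like the following Python 3 sketch; `stripped_control` is a hypothetical name, and the same test string is reused.

```python
def stripped_control(x):
    """Remove ASCII control codes (0-31 and 127) but keep extended characters."""
    return "".join(i for i in x if ord(i) >= 32 and ord(i) != 127)


print(ascii(stripped_control("\ba\x00b\n\rc\fd\xc3")))  # 'abcd\xc3'
```

The only difference from `stripped` is the missing upper bound, so characters above 126 other than DEL pass through.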
Tcl
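<lang tcl>proc stripAsciiCC str {
    # remove control codes only (0-31 and 127); extended characters stay
    regsub -all {[\u0000-\u001f\u007f]+} $str ""
}
proc stripCC str {
    # remove everything outside printable ASCII (32-126)
    regsub -all {[^\u0020-\u007e]+} $str ""
}</lang>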
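The Tcl procedures rely on two bracket expressions: one lists the control codes, the other negates the printable range. Both patterns carry over to Python's `re` module almost unchanged, as in this sketch (the function names are hypothetical, chosen to echo the Tcl ones):

```python
import re

# Same character classes as the Tcl procs, written with \x escapes.
CONTROL_RUN = re.compile(r"[\x00-\x1f\x7f]+")
NON_PRINTABLE_RUN = re.compile(r"[^\x20-\x7e]+")


def strip_ascii_cc(s):
    # Control codes only; extended characters survive.
    return CONTROL_RUN.sub("", s)


def strip_cc(s):
    # Everything outside printable ASCII (32-126) is removed.
    return NON_PRINTABLE_RUN.sub("", s)


text = "\ba\x00b\n\rc\fd\xc3"
print(ascii(strip_ascii_cc(text)), ascii(strip_cc(text)))  # 'abcd\xc3' 'abcd'
```

Matching runs (`+`) rather than single characters, as the Tcl version does, lets each substitution remove a whole block of unwanted characters in one step.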