Deepcopy: Difference between revisions

From Rosetta Code
Content added Content deleted
(Added PicoLisp)
m ({{omit from|UNIX Shell}})
Line 435: Line 435:
{{omit from|GUISS}}
{{omit from|GUISS}}
{{omit from|PureBasic}}
{{omit from|PureBasic}}
{{omit from|UNIX Shell|Does not have complex data structures}}
{{omit from|ZX Spectrum Basic}}
{{omit from|ZX Spectrum Basic}}

Revision as of 16:17, 5 September 2011

Task
Deepcopy
You are encouraged to solve this task according to the task description, using any language you may know.

Demonstrate how to copy data structures containing complex hetrogeneous and cyclic semantics. This is often referred to as deep copying, and is normally required where structures are mutable and to ensure that independent copies can be manipulated without side-effects.

If this facility is not built into the language, it is permissible to use functions from a common library, or a coded procedure.

The task should show:

  • Relevant semantics of structures, such as their homogeneous or heterogeneous properties, or containment of (self- or mutual-reference) cycles.
  • Any limitations of the method.
  • That the structure and its copy are different.
  • Suitable links to external documentation for common libraries.

E

In E, serialization is generalized to transforming object graphs from one representation to another. Deep copying, therefore, consists of transforming a live object graph into a live object graph, by connecting deSubgraphKit's output to its input. No intermediate serialized form is needed.

<lang e>def deSubgraphKit := <elib:serial.deSubgraphKit> def deepcopy(x) {

 return deSubgraphKit.recognize(x, deSubgraphKit.makeBuilder())

}</lang>

As befits a serialization system, this deep copy may operate on any serializable structure, whether standard or user-defined, and the structure may contain cycles.

<lang e>? def x := ["a" => 1, "b" => [x].diverge()]

  1. value: ["a" => 1, "b" => [<***CYCLE***>].diverge()]

? def y := deepcopy(x)

  1. value: ["a" => 1, "b" => [<***CYCLE***>].diverge()]

? y["b"].push(2)

? y

  1. value: ["a" => 1, "b" => [<***CYCLE***>, 2].diverge()]

? x

  1. value: ["a" => 1, "b" => [<***CYCLE***>].diverge()]

? y["b"][0] == y

  1. value: true

? y["b"][0] == x

  1. value: false

? x["b"][0] == x

  1. value: true</lang>

(.diverge() produces mutable data structures, and <***CYCLE***> is what is printed when printing some object meets itself again.)

See also: Polymorphic copy#E

Go

Go does not have direct support for deep copy. To make deep copies of specific data structures, it is most efficient to write your own copy function and just copy what needs to be copied. <lang go>package main

import "fmt"

// a complex data structure type cds struct {

   i int            // no special handling needed for deep copy
   s string         // no special handling
   b []byte         // copied easily with append function
   m map[int]bool   // deep copy requires looping

}

// a method func (c *cds) deepcopy() *cds {

   // copy what you can in one line
   r := &cds{c.i, c.s, append([]byte{}, c.b...), make(map[int]bool)}
   // populate map with a loop
   for k, v := range c.m {
       r.m[k] = v
   }
   return r

}

// demo func main() {

   // create and populate a structure 
   c1 := &cds{1, "one", []byte("unit"), map[int]bool{1: true}}
   fmt.Println(c1)      // show it
   c2 := c1.deepcopy()  // copy it
   fmt.Println(c2)      // show copy
   c1.i = 0             // change original
   c1.s = "nil"
   copy(c1.b, "zero")
   c1.m[1] = false
   fmt.Println(c1)      // show changes
   fmt.Println(c2)      // show copy unaffected

}</lang> Output:

&{1 one [117 110 105 116] map[1:true]}
&{1 one [117 110 105 116] map[1:true]}
&{0 nil [122 101 114 111] map[1:false]}
&{1 one [117 110 105 116] map[1:true]}

If you need a generalized deep copy, one can be cobbled with the gob package, which does type safe serialization, and an os.Pipe. The deepcopy function shown below works on arbitrary data with a few limitations. It handles data types with recursive or cyclic definitions, but does not handle cycles in the data itself. For example, it handles a linked list, but not a ring data structure. Another limitiation is that struct fields must be exported. (That is, fields must start with an upper case letter. This makes the field visible outside the package.) <lang go>package main

import (

   "fmt"
   "gob"
   "os"

)

// capability requested by task func deepcopy(dst, src interface{}) os.Error {

   r, w, err := os.Pipe()
   if err != nil {
       return err
   }
   enc := gob.NewEncoder(w)
   err = enc.Encode(src)
   if err != nil {
       return err
   }
   dec := gob.NewDecoder(r)
   return dec.Decode(dst)

}

// define linked list type, an example of a recursive type type link struct {

   Value string
   Next *link

}

// method satisfies stringer interface for fmt.Println func (l *link) String() string {

   if l == nil {
       return "nil"
   }
   s := "(" + l.Value
   for l = l.Next; l != nil; l = l.Next {
       s += " " + l.Value
   }
   return s + ")"

}

func main() {

   // create a linked list with two elements
   l1 := &link{"a", &link{Value: "b"}}
   // print original
   fmt.Println(l1)
   // declare a variable to hold deep copy
   var l2 *link
   // copy
   if err := deepcopy(&l2, l1); err != nil {
       fmt.Println(err)
       return
   }
   // print copy
   fmt.Println(l2)
   // now change contents of original list
   l1.Value, l1.Next.Value = "c", "d"
   // show that it is changed
   fmt.Println(l1)
   // show that copy is unaffected
   fmt.Println(l2)

}</lang> Output:

(a b)
(a b)
(c d)
(a b)

A final limitation to mention of the technique above is the unnecessary overhead of serialization and piping. Using the reflect package, it would be possible to write a generalized deep copy function in Go with fewer limitations. You could copy data directly without going through a pipe, for example, and you could code an algorithm to detect and properly handle cyclic data. Estimated LOC is about 500, and not recomended when a simpler solution will do.

Icon and Unicon

Unicon and Icon support heterogeneous structures with loops. The Unicon book has an example of a simple algorithm for producing a deep copy of a structured value (set, list, table, or record); however, that code did not handle graph structures that are not trees. The code for deepcopy below from Unilib is a modification that addresses loops.

The code requires modification to run under Icon as Unicon extended key(X) to operate on lists and records not just tables.

<lang Unicon>procedure deepcopy(A, cache) #: return a deepcopy of A

   local k
   /cache := table()        # used to handle multireferenced objects
   if \cache[A] then return cache[A]
   case type(A) of {
       "table"|"list": {
           cache[A] := copy(A)
           every cache[A][k := key(A)] := deepcopy(A[k], cache)
           }
       "set": {
           cache[A] := set()
           every insert(cache[A], deepcopy(!A, cache))
           }
       default: {           # records and objects (encoded as records)
           cache[A] := copy(A)
           if match("record ",image(A)) then {
               every cache[A][k := key(A)] := deepcopy(A[k], cache)
               }
           }
       }
   return .cache[A]

end</lang>

The following code demonstrates deepcopy usage and that the resulting structure is different from the original by comparing assignment, copy, and deepcopy.

<lang Icon>link printf,ximage

procedure main()

  knot    := makeknot()     # create a structure with loops
  knota   := knot           # copy by assignment (reference)
  knotc   := copy(knot)     # built-in copy (shallow)
  knotdc  := deepcopy(knot) # deep copy


  showdeep("knota  (assignment) vs. knot",knota,knot) 
  showdeep("knotc  (copy)  vs. knot",knotc,knot) 
  showdeep("knotdc (deepcopy)  vs. knot",knotdc,knot) 
  xdump("knot   (original)",knot)
  xdump("knota  (assignment)",knota)
  xdump("knotc  (copy)",knotc)
  xdump("knotdc (deepcopy)",knotdc)

end

record rec1(a,b,c) # record for example

class Class1(a1,a2) # class - looks like a record under the covers

  method one()
     self.a1 := 1
     return
  end

initially

  self.a1 := 0

end


procedure makeknot() #: return a homogeneous structure with loops

  L := [9,8,7]
  T := table()
  T["a"] := 1
  R := rec1(T)
  S := set(R)
  C := Class1()
  C.one()
  T["knot"] := [L,R,S,C]
  put(L,R,S,T,C)
  return L

end

procedure showdeep(tag,XC,X) #: demo to show (non-)equivalence of list elements

  printf("Analysis of copy depth for %s:\n",tag)
  showequiv(XC,X)
  every showequiv(XC[i := 1 to *X],X[i])

end

procedure showequiv(x,y) #: show (non-)equivalence of two values

  return printf("   %i %s %i\n",x,if x === y then "===" else "~===",y)

end</lang>

DeepCopy.icn

printf.icn provides printf ximage.icn provides xdump

The sample of output below compares all elements of a copy of a structure against the original. Immutable types like numbers, strings, and csets will show as the same (i.e. ===) and different mutable types will show as not the same (i.e. ~===). This clearly shows the difference between assignment, copy, and deepcopy.

Analysis of copy depth for knota  (assignment) vs. knot:
   list_11(7) === list_11(7)
   9 === 9
   8 === 8
   7 === 7
   record rec1_2(3) === record rec1_2(3)
   set_2(1) === set_2(1)
   table_2(2) === table_2(2)
   record Class1__state_2(4) === record Class1__state_2(4)
Analysis of copy depth for knotc  (copy)  vs. knot:
   list_13(7) ~=== list_11(7)
   9 === 9
   8 === 8
   7 === 7
   record rec1_2(3) === record rec1_2(3)
   set_2(1) === set_2(1)
   table_2(2) === table_2(2)
   record Class1__state_2(4) === record Class1__state_2(4)
Analysis of copy depth for knotdc (deepcopy)  vs. knot:
   list_14(7) ~=== list_11(7)
   9 === 9
   8 === 8
   7 === 7
   record rec1_3(3) ~=== record rec1_2(3)
   set_3(1) ~=== set_2(1)
   table_4(2) ~=== table_2(2)
   record Class1__state_3(4) ~=== record Class1__state_2(4)
...

Another way to show the difference in the structures is to use the xdump procedure will produce the following in stderr (&errout):

knot   (original)"
L11 := list(7)
   L11[1] := 9
   L11[2] := 8
   L11[3] := 7
   L11[4] := R_rec1_2 := rec1()
      R_rec1_2.a := T2 := table(&null)
         T2["a"] := 1
         T2["knot"] := L12 := list(4)
            L12[1] := L11
            L12[2] := R_rec1_2
            L12[3] := S2 := set()
               insert(S2,R_rec1_2)
            L12[4] := R_Class1__state_2 := Class1__state()
               R_Class1__state_2.a1 := 1
   L11[5] := S2
   L11[6] := T2
   L11[7] := R_Class1__state_2
...

J

J uses pass by value semantics (typically implemented as copy on write) so Deepcopy is trivial -- values inside the language are immutable.

<lang j> a=:b=: 2 2 2 2 2 NB. two copies of the same array

  b=: 3 (2)} b  NB. modify one of the arrays
  b

2 2 3 2 2

  a

2 2 2 2 2</lang>

That said, J can reference values outside the language. But Deepcopy of those values is, by definition, outside the scope of the language. Usually, bringing the values into the language is sufficient.

Another possible exception would be classes and objects (which are not values but collections of references to values). But as a general rule copying of an object should be delegated to the object itself rather than imposed from the outside. Also, "deepcopy" violates the Law of Demeter as well as the concept of black-box reuse -- if you need deepcopy in J, you probably should not be representing your data structure as objects.

PARI/GP

All copies in GP are deep; this is one of its major inefficiencies when working with large objects.

In PARI, this is accomplished with the command gcopy rather than shallowcopy or its specialized versions (leafcopy, icopy, icopy). The function takes and returns a GEN. See section 10.6 of the User's Guide to the PARI Library.

PicoLisp

A shallow copy can be done with 'copy'. This function takes care of cons pairs and lists, no matter whether they are cyclic, or end in NIL or some other data structure.

For a known depth, it might be used in combination with other list functions. For example, to copy a non-cyclic structure of depth 2 with 'mapcar': <lang PicoLisp>(mapcar copy List)</lang> Copying non-cyclic structures of arbitrary depth and list-termination could be handled with a custom function (using 'cons'): <lang PicoLisp>(de deepCopy (X)

  (if (atom X)
     X
     (cons (deepCopy (car X)) (deepCopy (cdr X))) ) )</lang>

Test: <lang PicoLisp>: (setq A '((a . b) (c d e) f g . e)) -> ((a . b) (c d e) f g . e)

(setq B (deepCopy A))

-> ((a . b) (c d e) f g . e)

A

-> ((a . b) (c d e) f g . e)

B

-> ((a . b) (c d e) f g . e)

(= A B)

-> T # A and its copy B are structure-equal

(== A B)

-> NIL # but they are not identical (pointer-equal)

(cadr A)

-> (c d e)

(cadr B)

-> (c d e)

(== (cadr A) (cadr B))

-> NIL # The same holds for sub-structures</lang> For cyclic structures, the above 'deepCopy' function could be extended, to remember already visited structures and their copies in a mark list: <lang PicoLisp>(de deepCopy (X)

  (let Mark NIL
     (recur (X)
        (cond
           ((atom X) X)
           ((asoq X Mark) (cdr @))
           (T
              (prog1 (cons)
                 (push 'Mark (cons X @))
                 (set @ (recurse (car X)))
                 (con @ (recurse (cdr X))) ) ) ) ) ) )</lang>

Test: <lang PicoLisp>: (setq A '(a b .) B (deepCopy A)) -> (a b .)

A

-> (a b .)

B

-> (a b .)

(= A B)

-> T # A and its copy B are structure-equal

(== A B)

-> NIL # but they are not identical (pointer-equal)</lang>

Ruby

Rubyists can hack a deep copy by using the core class Marshal. The intermediate form of Marshal.load(Marshal.dump object) saves the object and any descendant objects.

<lang ruby># _orig_ is a Hash that contains an Array. orig = { :num => 1, :ary => [2, 3] } orig[:cycle] = orig # _orig_ also contains itself.

  1. _copy_ becomes a deep copy of _orig_.

copy = Marshal.load(Marshal.dump orig)

  1. These changes to _orig_ never affect _copy_,
  2. because _orig_ and _copy_ are disjoint structures.

orig[:ary] << 4 orig[:rng] = (5..6)

  1. Because of deep copy, orig[:ary] and copy[:ary]
  2. refer to different Arrays.

p orig # => {:num=>1, :ary=>[2, 3, 4], :cycle=>{...}, :rng=>5..6} p copy # => {:num=>1, :ary=>[2, 3], :cycle=>{...}}

  1. The original contains itself, and the copy contains itself,
  2. but the original and the copy are not the same object.

p [(orig.equal? orig[:cycle]),

  (copy.equal? copy[:cycle]),
  (not orig.equal? copy)]	# => [true, true, true]</lang>

Marshal cannot dump an object that relates to the system (like Dir or IO), relates to the program (like MatchData or Thread), uses an anonymous class or module, or has a singleton method. (ri Marshal.dump documents this restriction.) If Marshal encounters any such object, then the deep copy fails.

Marshal can dump internal objects, but never copies them. The internal objects are nil, false, true and instances of Fixnum or Symbol. For example, Marshal.load(Marshal.dump :sym) returns the original :sym, not a copy.

The internal objects are almost immutable, so there is almost no reason to copy them. Yet, there are esoteric ways to modify them. For example, nil.instance_eval { @i = 1 } would modify nil. A program cannot have another copy of nil to escape such modification. If there was a deep copy of some object that contains nil, then such modification would also affect nil inside such copy.

Tcl

Tcl uses an immutable value model that is implemented as copy-on-write at the low level, so deep copies are generally not required. However, they can be achieved by simply appending a letter to the value and stripping it off again: <lang tcl>set deepCopy [string range ${valueToCopy}x 0 end-1]</lang> For objects (as introduced in Tcl 8.6), there is a command to create a copy: <lang tcl>set copiedObject [oo::copy $originalObject]</lang>