Subset sum problem

Revision as of 03:54, 3 January 2012 by rosettacode>Ledrug (→‎{{header|C}}: fix: previous reasoning was wrong)

Implement a function/procedure/method/subroutine that takes a set/array/list/stream/table/collection of words with integer weights, and identifies a non-empty subset of them whose weights sum to zero (cf. the Dropbox Diet candidate screening exercise and the Subset sum problem Wikipedia article).

Subset sum problem is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

For example, for this set of weighted words, one solution would be the set of words {elysee, efferent, deploy, departure, centipede, bonnet, balm, archbishop}, because their respective weights of -326, 54, 44, 952, -658, 452, 397, and -915 sum to zero.

Table of weighted words
word weight
alliance -624
archbishop -915
balm 397
bonnet 452
brute 870
centipede -658
cobol 362
covariate 590
departure 952
deploy 44
diophantine 645
efferent 54
elysee -326
eradicate 376
escritoire 856
exorcism -983
fiat 170
filmy -874
flatworm 503
gestapo 915
infra -847
isis -982
lindholm 999
markham 475
mincemeat -880
moresby 756
mycenae 183
plugging -266
smokescreen 423
speakeasy -745
vein 813

Another solution would be the set of words {flatworm, gestapo, infra, isis, lindholm, plugging, smokescreen, speakeasy}, because their respective weights of 503, 915, -847, -982, 999, -266, 423, and -745 also sum to zero.

You may assume the weights range from -1000 to 1000. If there are multiple solutions, only one needs to be found. Use any algorithm you want and demonstrate it on a set of at least 30 weighted words with the results shown in a human readable form. Note that an implementation that depends on enumerating all possible subsets is likely to be infeasible.

C

<lang c>#include <stdio.h>

  1. include <stdlib.h>
  2. include <values.h>
  3. include <string.h>
  1. define MAX_WEIGHT 1024

typedef struct { void* data; int weight; } item;

inline void setbit(int *x, int n) { x[n/INTBITS] |= (1 << (n % INTBITS)); }

inline int hasbit(int *x, int n) { return x[n/INTBITS] & (1 << (n % INTBITS)); }

int* subsum(item *x, int n) { int *buf, *from, *to, sums, alloc, i; int len = 1 + (n + (INTBITS - 1)) / INTBITS; /* 1 weight, rest bit field */ int rlen = len * sizeof(int); int *seen = calloc(sizeof(int), 2 * n * (MAX_WEIGHT / INTBITS) + 2);

buf = calloc(len, sizeof(int)); alloc = sums = 1;

for (i = 0; i < n; i++) { if (alloc < sums * 2) { alloc = sums * 2; buf = realloc(buf, rlen * alloc); }

for (from = to = buf + len * sums; from > buf; to += len) { from -= len; if (hasbit(seen, from[0] + x[i].weight + MAX_WEIGHT * n)) continue;

setbit(seen, from[0] + x[i].weight + MAX_WEIGHT * n);

memcpy(to, from, rlen); setbit(to + 1, i); sums++;

if (to[0] += x[i].weight) continue;

/* found it */ memcpy(buf, to, rlen); free(seen); return realloc(buf, rlen); } }

/* no solution */ free(seen); free(buf); return 0; }

int main(void) { item em[] = { {"alliance", -624}, {"archbishop", -915}, {"balm", 397}, {"bonnet", 452}, {"brute", 870}, {"centipede", -658}, {"cobol", 362}, {"covariate", 590}, {"departure", 952}, {"deploy", 44}, {"diophantine", 645}, {"efferent", 54}, {"elysee", -326}, {"eradicate", 376}, {"escritoire", 856}, {"exorcism", -983}, {"fiat", 170}, {"filmy", -874}, {"flatworm", 503}, {"gestapo", 915}, {"infra", -847}, {"isis", -982}, {"lindholm", 999}, {"markham", 475}, {"mincemeat", -880}, {"moresby", 756}, {"mycenae", 183}, {"plugging", -266}, {"smokescreen", 423}, {"speakeasy", -745}, {"vein", 813} };

int *ret, n = sizeof(em)/sizeof(em[0]);

if (!(ret = subsum(em, n))) { puts("no zero sums\n"); return 1; }

while (n--) if (hasbit(ret + 1, n)) printf("%2d| %5d %s\n", n, em[n].weight, (char*)em[n].data);

free(ret);

return 0; }</lang> output<lang>12| -326 elysee 11| 54 efferent

9|    44 deploy
8|   952 departure
5|  -658 centipede
3|   452 bonnet
2|   397 balm
1|  -915 archbishop</lang>

Ursala

This solution scans the set sequentially while maintaining a record of all distinct sums obtainable by words encountered thus far, and stops when a zero sum is found. <lang Ursala>#import std

  1. import int

weights =

{

  'alliance': -624,
  'archbishop': -915,
  'balm': 397,
  'bonnet': 452,
  'brute': 870,
  'centipede': -658,
  'cobol': 362,
  'covariate': 590,
  'departure': 952,
  'deploy': 44,
  'diophantine': 645,
  'efferent': 54,
  'elysee': -326,
  'eradicate': 376,
  'escritoire': 856,
  'exorcism': -983,
  'fiat': 170,
  'filmy': -874,
  'flatworm': 503,
  'gestapo': 915,
  'infra': -847,
  'isis': -982,
  'lindholm': 999,
  'markham': 475,
  'mincemeat': -880,
  'moresby': 756,
  'mycenae': 183,
  'plugging': -266,
  'smokescreen': 423,
  'speakeasy': -745,
  'vein': 813}

nullset = ~&nZFihmPB+ =><> ~&ng?r\~&r ^TnK2hS\~&r ^C/~&lmPlNCX *D ^A/sum@lmPrnPX ~&lrmPC

  1. cast %zm

main = nullset weights</lang> The name of the function that takes the weighted set is nullset. It manipulates a partial result represented as a list of pairs, each containing a subset of weighted words and the sum of their weights. Here is a rough translation:

  • =><> fold right combinator with the empty list as the vacuuous case
  • ~&ng?r\~&r If the partial result contains a zero sum, return it.
  • ^TnK2hS\~&r Concatenate the partial result with the new list of subsets (computed as follows) and delete duplicate sums.
  • ^C/~&lmPlNCX Cons a singleton subset containing the next word to the partial results.
  • *D Distribute the next word in the set to the partial results and do the following to each.
  • sum@lmPrnPX Add the weight of the new word to the existing sum.
  • ~&lrmPC Cons the new word to the list of existing ones.
  • ~&nZFihmPB+ To conclude, search for a result with a zero sum, if any, and return its associated subset of weighted words.

output:

<
   'flatworm': 503,
   'gestapo': 915,
   'infra': -847,
   'isis': -982,
   'lindholm': 999,
   'plugging': -266,
   'smokescreen': 423,
   'speakeasy': -745>