Subset sum problem

Implement a function/procedure/method/subroutine that takes a set/array/list/stream/table/collection of words with integer weights, and identifies a non-empty subset of them whose weights sum to zero (cf. the Dropbox Diet candidate screening exercise and the Subset sum problem Wikipedia article).

For example, for this set of weighted words, one solution would be the set of words {elysee, efferent, deploy, departure, centipede, bonnet, balm, archbishop}, because their respective weights of -326, 54, 44, 952, -658, 452, 397, and -915 sum to zero.

Table of weighted words
word	weight
alliance	-624
archbishop	-915
balm	397
bonnet	452
brute	870
centipede	-658
cobol	362
covariate	590
departure	952
deploy	44
diophantine	645
efferent	54
elysee	-326
eradicate	376
escritoire	856
exorcism	-983
fiat	170
filmy	-874
flatworm	503
gestapo	915
infra	-847
isis	-982
lindholm	999
markham	475
mincemeat	-880
moresby	756
mycenae	183
plugging	-266
smokescreen	423
speakeasy	-745
vein	813

Another solution would be the set of words {flatworm, gestapo, infra, isis, lindholm, plugging, smokescreen, speakeasy}, because their respective weights of 503, 915, -847, -982, 999, -266, 423, and -745 also sum to zero.

You may assume the weights range from -1000 to 1000. If there are multiple solutions, only one needs to be found. Use any algorithm you want and demonstrate it on a set of at least 30 weighted words with the results shown in a human readable form. Note that an implementation that depends on enumerating all possible subsets is likely to be infeasible.

C

<lang c>#include <stdio.h>

include <stdlib.h>
include <values.h>
include <string.h>

define MAX_WEIGHT 1024

typedef struct { void* data; int weight; } item;

inline void setbit(int *x, int n) { x[n/INTBITS] |= (1 << (n % INTBITS)); }

inline int hasbit(int *x, int n) { return x[n/INTBITS] & (1 << (n % INTBITS)); }

int* subsum(item *x, int n) { int *buf, *from, *to, sums, alloc, i; int len = 1 + (n + (INTBITS - 1)) / INTBITS; /* 1 weight, rest bit field */ int rlen = len * sizeof(int); int *seen = calloc(sizeof(int), 2 * n * (MAX_WEIGHT / INTBITS) + 2);

buf = calloc(len, sizeof(int)); alloc = sums = 1;

for (i = 0; i < n; i++) { if (alloc < sums * 2) { alloc = sums * 2; buf = realloc(buf, rlen * alloc); }

for (from = to = buf + len * sums; from > buf; to += len) { from -= len; if (hasbit(seen, from[0] + x[i].weight + MAX_WEIGHT * n)) continue;

setbit(seen, from[0] + x[i].weight + MAX_WEIGHT * n);

memcpy(to, from, rlen); setbit(to + 1, i); sums++;

if (to[0] += x[i].weight) continue;

/* found it */ memcpy(buf, to, rlen); free(seen); return realloc(buf, rlen); } }

/* no solution */ free(seen); free(buf); return 0; }

int main(void) { item em[] = { {"alliance", -624}, {"archbishop", -915}, {"balm", 397}, {"bonnet", 452}, {"brute", 870}, {"centipede", -658}, {"cobol", 362}, {"covariate", 590}, {"departure", 952}, {"deploy", 44}, {"diophantine", 645}, {"efferent", 54}, {"elysee", -326}, {"eradicate", 376}, {"escritoire", 856}, {"exorcism", -983}, {"fiat", 170}, {"filmy", -874}, {"flatworm", 503}, {"gestapo", 915}, {"infra", -847}, {"isis", -982}, {"lindholm", 999}, {"markham", 475}, {"mincemeat", -880}, {"moresby", 756}, {"mycenae", 183}, {"plugging", -266}, {"smokescreen", 423}, {"speakeasy", -745}, {"vein", 813} };

int *ret, n = sizeof(em)/sizeof(em[0]);

if (!(ret = subsum(em, n))) { puts("no zero sums\n"); return 1; }

while (n--) if (hasbit(ret + 1, n)) printf("%2d| %5d %s\n", n, em[n].weight, (char*)em[n].data);

free(ret);

return 0; }</lang> output<lang>12| -326 elysee 11| 54 efferent

9|    44 deploy
8|   952 departure
5|  -658 centipede
3|   452 bonnet
2|   397 balm
1|  -915 archbishop</lang>

Ursala

This solution scans the set sequentially while maintaining a record of all distinct sums obtainable by words encountered thus far, and stops when a zero sum is found. <lang Ursala>#import std

import int

weights =

{

  'alliance': -624,
  'archbishop': -915,
  'balm': 397,
  'bonnet': 452,
  'brute': 870,
  'centipede': -658,
  'cobol': 362,
  'covariate': 590,
  'departure': 952,
  'deploy': 44,
  'diophantine': 645,
  'efferent': 54,
  'elysee': -326,
  'eradicate': 376,
  'escritoire': 856,
  'exorcism': -983,
  'fiat': 170,
  'filmy': -874,
  'flatworm': 503,
  'gestapo': 915,
  'infra': -847,
  'isis': -982,
  'lindholm': 999,
  'markham': 475,
  'mincemeat': -880,
  'moresby': 756,
  'mycenae': 183,
  'plugging': -266,
  'smokescreen': 423,
  'speakeasy': -745,
  'vein': 813}

nullset = ~&nZFihmPB+ =><> ~&ng?r\~&r ^TnK2hS\~&r ^C/~&lmPlNCX *D ^A/sum@lmPrnPX ~&lrmPC

cast %zm

main = nullset weights</lang> The name of the function that takes the weighted set is nullset. It manipulates a partial result represented as a list of pairs, each containing a subset of weighted words and the sum of their weights. Here is a rough translation:

=><> fold right combinator with the empty list as the vacuuous case
~&ng?r\~&r If the partial result contains a zero sum, return it.
^TnK2hS\~&r Concatenate the partial result with the new list of subsets (computed as follows) and delete duplicate sums.
^C/~&lmPlNCX Cons a singleton subset containing the next word to the partial results.
*D Distribute the next word in the set to the partial results and do the following to each.
sum@lmPrnPX Add the weight of the new word to the existing sum.
~&lrmPC Cons the new word to the list of existing ones.
~&nZFihmPB+ To conclude, search for a result with a zero sum, if any, and return its associated subset of weighted words.

output:

<
   'flatworm': 503,
   'gestapo': 915,
   'infra': -847,
   'isis': -982,
   'lindholm': 999,
   'plugging': -266,
   'smokescreen': 423,
   'speakeasy': -745>