Search a list of records: Difference between revisions

Content added Content deleted

Inline

Revision as of 04:23, 19 October 2015

Given a vector type (such as list or array), which may be homogeneous or heterogeneous, search for something. Demonstrate by searching for at least two types of things (which may require multiple examples).

For one of your examples, you should:

Show an ordered list or array in which each element has two properties: one string and one numeric. For example, use the names and populations of the 10 most populous municipalities in Africa, as reported in https://en.wikipedia.org/wiki/List_of_metropolitan_areas_in_Africa
Return the index (in this list/array) of a particular city as returned by a search on its name
Return the name of the first city with a population exceeding a particular threshold.

For example, from an array of the African municipality data (population in millions), shown here in a JSON format (but use whatever representation is most idiomatic in your language):

<lang JavaScript>[{"name":"Lagos", "population":21}, {"name":"Cairo", "population":15.2}, {"name":"Kinshasa-Brazzaville", "population":11.3}, {"name":"Greater Johannesburg", "population":7.55}, {"name":"Mogadishu", "population":5.85}, {"name":"Khartoum-Omdurman", "population":4.98}, {"name":"Dar Es Salaam", "population":4.7}, {"name":"Alexandria", "population":4.58}, {"name":"Abidjan", "population":4.4}, {"name":"Casablanca", "population":3.98}]</lang>

Show code which uses general search functions to return:

. The (zero-based) index of Dar Es Salaam in the list,
. The name of the first city in this list (sorted by descending size of municipality) which has a population of less than 5 million.

The search function(s) which you demonstrate should normally take at least two arguments:

The vector (array/list) itself
A predicate function which, applied to an element in the list/array, returns true if that element matches the search requirement.

( If this is not the approach which would be most natural or idiomatic in your language, explain why, and show an alternative approach to obtaining the same results )

EchoLisp

We demonstrate the vector-search primitive, which takes as input a vector, and a predicate. <lang scheme> (require 'struct) (require 'json)

importing data

(define cities

<<

[{"name":"Lagos", "population":21}, {"name":"Cairo", "population":15.2}, {"name":"Kinshasa-Brazzaville", "population":11.3}, {"name":"Greater Johannesburg", "population":7.55}, {"name":"Mogadishu", "population":5.85}, {"name":"Khartoum-Omdurman", "population":4.98}, {"name":"Dar Es Salaam", "population":4.7}, {"name":"Alexandria", "population":4.58}, {"name":"Abidjan", "population":4.4}, {"name":"Casablanca", "population":3.98}] >>#)

define a structure matching data heterogenous slots values

(struct city (name population))

convert JSON to EchoLisp instances of structures

(set! cities (vector-map (lambda(x) (json->struct x struct:city)) (json-import cities)))

search by name, case indifferent

(define (city-index name) (vector-search (lambda(x) (string-ci=? (city-name x) name)) cities))

returns first city name such as population < seuil

(define (city-pop seuil) (define idx (vector-search (lambda(x) (< (city-population x) seuil)) cities)) (if idx (city-name (vector-ref cities idx)) (cons seuil 'not-found)))

(city-index "Dar Es Salaam") → 6 (city-pop 5) → "Khartoum-Omdurman" (city-pop -666) → (-666 . not-found) (city-index "alexandra") → #f </lang>

J

J supports several "searching" primitives.

i. finds the indices of the things being looked for (with 0 being the first index). Nonmatches get a result of 1+largest valid index.

<lang j> 1 2 3 4 5 6 7 8 9 i. 2 3 5 60 1 2 4 9

  (;:'one two three four five six seven eight nine') i. ;:'two three five sixty'

1 2 4 9</lang>

e. finds whether items are members of a set, returning a bitmask to select the members:

<lang j> 1 2 3 4 5 6 7 8 9 e. 2 3 5 60 0 1 1 0 1 0 0 0 0

  (;:'one two three four five six seven eight nine') e. ;:'two three five sixty'

0 1 1 0 1 0 0 0 0</lang>

I. finds indices, but performs a binary search (which requires that the list being searched is sorted). This can be useful for finding non-exact matches (the index of the next value is returned for non-exact matches).

<lang j> 1 2 3 4 5 6 7 8 9 I. 2 3 5 60 6.66 1 2 4 9 6

  (;:'eight five four nine one seven six three two') I. ;:'two three five sixty'

8 7 1 7</lang>

And, for the tabular example in the current task description, here is the data we will be using:

<lang J>colnumeric=: 0&".&.>@{`[`]}

data=: 1 colnumeric |: fixcsv 0 :0 Lagos, 21 Cairo, 15.2 Kinshasa-Brazzaville, 11.3 Greater Johannesburg, 7.55 Mogadishu, 5.85 Khartoum-Omdurman, 4.98 Dar Es Salaam, 4.7 Alexandria, 4.58 Abidjan, 4.4 Casablanca, 3.98 )</lang>

And here are the required computations:

<lang J> (0 { data) i. <'Dar Es Salaam' 6

  (i. >./)@(* 5&>)@:>@{: data

5

  5 {:: 0 {data

Khartoum-Omdurman</lang>

The "general search function" mentioned in the task does not seem a natural fit for this set of data, because of the multi-column nature of this data. Nevertheless, we could for example define:

<lang j>gsf=: 1 :0

  I. u x { y

)</lang>

This uses the single argument aspect of the definition of I. to convert a bit mask to the corresponding sequence of indices. And the column(s) we are searching on are exposed as a parameter for the interface, which allows us to ignore (for this problem) the irrelevant columns...

Thus, we could say:

But this doesn't seem any clearer or more concise than our previous expression which finds the index of the first example of the most populous city with a population less than five million. Not only that, but if there were multiple cities which had the same population number which satisfied this constraint, this version would return all of those indices where the task explicitly required we return the first example.

J: Another approach

<lang j> city=: <;._1 ';Lagos;Cairo;Kinshasa-Brazzaville;Greater Johannesburg;Mogadishu;Khartoum-Omdurman;Dar Es Salaam;Alexandria;Abidjan;Casablanca'

  popln=: 21 15.2 11.3 7.55 5.85 4.98 4.7 4.58 4.4 3.98
  city i. <'Dar Es Salaam'            NB. index of Dar Es Salaam

6

  city {~ (popln < 5) {.@# \: popln   NB. name of first city with population less than 5 million

┌─────────────────┐ │Khartoum-Omdurman│ └─────────────────┘</lang>

JavaScript

ES5

For arrays containing simple string and numeric datatypes, JavaScript provides Array.toIndex().

Note that JS uses two different equality operators for simple types:

Under Abstract equality, 3.0 == '3' -> true (http://www.ecma-international.org/ecma-262/5.1/#sec-9.12)
Under Strict equality, 3.0 === '3' -> false http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.6

and Array.toIndex matches only on Strict equality. Therefore:

<lang JavaScript>(function () {

 var blnAbstractEquality = (3.0 == '3'), // true  http://www.ecma-international.org/ecma-262/5.1/#sec-9.12
     blnStrictEquality = (3.0 === '3'); // false;  http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.6

 var lstNumerics = [1, 1.0, '1', 2, 2.0, '2', 3, 3.0, '3'];

 return [
   blnAbstractEquality,
   blnStrictEquality,
   lstNumerics.indexOf(3.0),
   lstNumerics.indexOf('3')
 ]

})();</lang>

Returns:

<lang JavaScript>[

 true,
 false,
 6,
 8

]</lang>

Strict Equality does not, however, return true for two instances of objects which match identically in terms of their keys and values. This means that Array.indexOf() will alway return a -1 value (not found) in searches for objects other than simple strings and numbers. For strings, Strict Equality is case-sensitive.

To find more complex objects in a JS ES5 array, or search more flexibly, we can define find(), and findIndex(), which take predicate functions (defining the match required) as arguments:

The following code:

<lang JavaScript>(function (fnNameMatch, fnPopulationMatch) {

   function find(fnPredicate, list) {
     for (var i = 0, lng = list.length; i < lng; i++) {
       if (fnPredicate(list[i])) {
         return list[i];
       }
     }
     return undefined;
   };

   function findIndex(fnPredicate, list) {
     for (var i = 0, lng = list.length; i < lng; i++) {
       if (fnPredicate(list[i])) {
         return i;
       }
     }
     return undefined;
   };

   var lstCities = [{
     "name": "Lagos",
     "population": 21
   }, {
     "name": "Cairo",
     "population": 15.2
   }, {
     "name": "Kinshasa-Brazzaville",
     "population": 11.3
   }, {
     "name": "Greater Johannesburg",
     "population": 7.55
   }, {
     "name": "Mogadishu",
     "population": 5.85
   }, {
     "name": "Khartoum-Omdurman",
     "population": 4.98
   }, {
     "name": "Dar Es Salaam",
     "population": 4.7
   }, {
     "name": "Alexandria",
     "population": 4.58
   }, {
     "name": "Abidjan",
     "population": 4.4
   }, {
     "name": "Casablanca",
     "population": 3.98
 }];

   return [
     lstCities.indexOf({
       "name": "Alexandria",
       "population": 4.58
     }),

     find(fnNameMatch, lstCities),
     findIndex(fnNameMatch, lstCities),

     find(fnPopulationMatch, lstCities),
     findIndex(fnPopulationMatch, lstCities)
   ]

 })(
   function (e) {
     return e.name && e.name.toLowerCase() === 'dar es salaam';

   },
   function (e) {
     return e.population && e.population < 5.0;
   }
 );

})();</lang>

returns: <lang JavaScript>[

 -1, // the Alexandria object can not be found with Array.indexOf()
 {
   "name": "Dar Es Salaam",
   "population": 4.7
 },
 6,
 {
   "name": "Khartoum-Omdurman",
   "population": 4.98
 },
 5

]</lang>

ES6

In ES6, Array.find() and Array.findIndex() are provided as built-in methods.

Perl

<lang perl>if(grep $_ eq $needle, @haystack) { print 'Found'; }

print grep($_ eq 'needle', ('apple', 'orange')) ? 'Found' : 'Not found';</lang>

Perl 6

There are several search operations that may be used. It mostly depends on whether you want to find actual values or pointers, and/or all possible values or a single value matching your criteria. The most appropriate for the given test data/operations are shown here.

<lang perl6>use JSON::Tiny;

my $cities = from-json(' [{"name":"Lagos", "population":21}, {"name":"Cairo", "population":15.2}, {"name":"Kinshasa-Brazzaville", "population":11.3}, {"name":"Greater Johannesburg", "population":7.55}, {"name":"Mogadishu", "population":5.85}, {"name":"Khartoum-Omdurman", "population":4.98}, {"name":"Dar Es Salaam", "population":4.7}, {"name":"Alexandria", "population":4.58}, {"name":"Abidjan", "population":4.4}, {"name":"Casablanca", "population":3.98}] ');

Find the indicies of the cities named 'Dar Es Salaam'.

say grep-index( *<name> eq 'Dar Es Salaam', @$cities ); # (6)

Find the name of the first city with a population less
than 5 when sorted by population, largest to smallest.

say ($cities.sort( -*.<population> ).first: *.<population> < 5)<name>; # Khartoum-Omdurman

Find all of the city names that contain an 'm'

say join ', ', sort grep( {$_<name>.lc ~~ /'m'/}, @$cities )»<name>; # Dar Es Salaam, Khartoum-Omdurman, Mogadishu</lang>

PHP

<lang PHP>echo in_array('needle', ['hay', 'stack']) ? 'Found' : 'Not found';</lang>

Racket

The more idiomatic functions for the task is findf but it doesn't provide the position of the element in the list, so we write a variant. If the item is not found we return #f as most of the Racket primitives do in these cases. <lang Racket>#lang racket

(define (findf/pos proc lst)

 (let loop ([lst lst] [pos 0])
   (cond
     [(null? lst) #f]
     [(proc (car lst)) pos]
     [else (loop (cdr lst) (add1 pos))])))</lang>

Now we define the list that has the data for the task. <lang Racket>(define data '(("Lagos" 21)

              ("Cairo" 15.2)
              ("Kinshasa-Brazzaville" 11.3)
              ("Greater Johannesburg" 7.55)
              ("Mogadishu" 5.85)
              ("Khartoum-Omdurman" 4.98)
              ("Dar Es Salaam" 4.7)
              ("Alexandria" 4.58)
              ("Abidjan" 4.4)
              ("Casablanca" 3.98)))</lang>

We write tiny wrappers to solve the specific task. <lang Racket>(define (city-pos name)

 (findf/pos (lambda (x) (string=? (car x) name)) data))

(city-pos "Dar Es Salaam") (city-pos "Buenos Aires")

(define (city-smaller pop)

 (let ([city (findf (lambda (x) (< (cadr x) pop)) data)])
   (and city (car city))))

(city-smaller 5) (city-smaller -1)</lang>

Output:

6
#f
"Khartoum-Omdurman"
#f

REXX

It is more idiomatic in REXX to use sparse arrays to express a list of CSV values which have embedded blanks in them.

Most REXX interpreters use hashing to index spare arrays, which is much faster than performing a sequential search. <lang rexx>/*REXX pgm (using criteria) finds values or indices in a list (of 2 datatypes)*/ $= 'Lagos=21, Cairo=15.2, Kinshasa-Brazzaville=11.3,' ,

  'Greater Johannesburg=7.55,    Mogadishu=5.85,     Khartoum-Omdurman=4.98,' ,
  'Dar Es Salaam=4.7,    Alexandria=4.58,    Abidjan=4.4,     Casablanca=3.98'

@.='(not found)' /*default value for "not found" cities.*/ w=8 /*W: used for formatting city's name. */

    do #=0  while $\=               /* [↓]  extract all cities&populations.*/
    parse var  $  c  '='  p  ','  $   /*destructive parse of the  $  string. */
    c=space(c);   w=max(w, length(c)) /*remove superfluous spaces in table.  */
    town.#=c; pop.#=p; pop.c=#; @.c=# /*assign city&pop──arrays; & city──►idx*/
    end   /*#*/                       /* [↑]   TOWN. array starts at 0 index.*/

_='═' city= 'Dar Es Salaam' /*show (zero─based) index of this city.*/ many= 5 /*show 1st city whose population is <5.*/ /*▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ task 1: show the ordered city list.*/ say center('index',9,_) center('city',w,_) center('population (millions)',27,_)

    do j=0  for #                                       /*get pertinent info.*/
    say center(j,9) center(town.j,w) center(pop.j,27)   /*show index,city,pop*/
    end   /*j*/

say /*▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ task 2: show the INDEX of a city.*/ say 'The city of ' city " has an index of: " findIndex(city) /*▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ task 3: show 1st city whose pop<5 m*/ say

    do k=0  for #;       p=getPop(k)  /*get a city's population from an index*/
    if \lessPop(p,many)  then iterate /*does the city fail predicate test?   */
    say 'The city of ' town.k " has a population less than "  many   'million.'
    leave                             /*only show the first city of pop < 5 m*/
    end   /*k*/

if k># then say 'no city found with a population of less than ' many "million." exit /*stick a fork in it, we're all done. */ /*────────────────────────────────────────────────────────────────────────────*/ findIndex: parse arg _; return @._ /*returns the INDEX of a city or null. */ getPop: parse arg _; return pop._ /*returns the pop of an index of city. */ lessPop: return arg(1) < arg(2) /*predicate function tests if pop < 5 m*/</lang> output when using the default inputs:

══index══ ════════city════════ ═══population (millions)═══
    0            Lagos                     21
    1            Cairo                    15.2
    2     Kinshasa-Brazzaville            11.3
    3     Greater Johannesburg            7.55
    4          Mogadishu                  5.85
    5      Khartoum-Omdurman              4.98
    6        Dar Es Salaam                 4.7
    7          Alexandria                 4.58
    8           Abidjan                    4.4
    9          Casablanca                 3.98

The city of  Dar Es Salaam  has an index of:  6

The city of  Khartoum-Omdurman  has a population less than  5 million.

Tcl

Tcl's lsearch command takes many useful options. This task illustrates the -index and -bisect options.

<lang Tcl>set cities {

   {"Lagos"    21}
   {"Cairo"    15.2}
   {"Kinshasa Brazzaville"     11.3}
   {"Greater Johannesburg"     7.55}
   {"Mogadishu"        5.85}
   {"Khartoum Omdurman"        4.98}
   {"Dar Es Salaam"    4.7}
   {"Alexandria"       4.58}
   {"Abidjan"  4.4}
   {"Casablanca"       3.98}

} puts "Dar Es Salaam is at position: [

   lsearch -index 0 $cities "Dar Es Salaam"

]."

set cities_by_size [

   lsort -index 1 -real $cities

]

puts "The largest city of < 5m is: [

   lsearch -inline -index 1 -bisect -real $cities_by_size 5.0

]"</lang>

Output:

Dar Es Salaam is at position: 6.
The largest city of < 5m is: "Khartoum Omdurman"        4.98

zkl

<lang zkl>list:=T(SD("name","Lagos", "population",21), SD("name","Cairo", "population",15.2), SD("name","Kinshasa-Brazzaville", "population",11.3), SD("name","Greater Johannesburg", "population",7.55), SD("name","Mogadishu", "population",5.85), SD("name","Khartoum-Omdurman", "population",4.98), SD("name","Dar Es Salaam", "population",4.7), SD("name","Alexandria", "population",4.58), SD("name","Abidjan", "population",4.4), SD("name","Casablanca", "population",3.98));

n:=list.filter1n(fcn(city){ city["name"]=="Dar Es Salaam" });

  // or city["name"].matches("dar es salaam") for case insensitive wild card match

n.println(" == index of ",list[n].values);

city:=list.filter1(fcn(city){ city["population"]<5 }); city.values.println(" is the first city with population under 5 million."); list.filter(fcn(city){ city["population"]<5.0 }).apply("get","name").println(); list.filterNs(fcn(city){ city["population"]<5.0 }).println();

list.filter1n(fcn(city){ city["name"]=="alexandra" }).println();</lang> where a SD is a small read only dictionary and filter1 is a filter that stops at the first match (returning the matched item), filter1N is the same but returns the index. Both filter methods return False on failure.

Note that the dictionarys contain either int or float values and the filter could use either int or float (in this case, the 5 is converted to the type in the dictionary).

The YAJL library could be used to parse the JSON data directly.

Output:

6 == index of L("Dar Es Salaam",4.7)
L("Khartoum-Omdurman",4.98) is the first city with population under 5 million.
L("Khartoum-Omdurman","Dar Es Salaam","Alexandria","Abidjan","Casablanca")
L(5,6,7,8,9)
False