Inverted index: Difference between revisions

From Rosetta Code
Revision as of 07:49, 21 April 2010

Task
Inverted index
You are encouraged to solve this task according to the task description, using any language you may know.

An inverted index is a data structure that maps each term to the documents containing it; it is the basis of full-text search.

Given a set of text files, implement a program that creates an inverted index. Also create a user interface that searches the inverted index and returns the list of files containing the query term or terms. The index may be held in memory.
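The task can be illustrated with a minimal Python sketch, independent of the AutoHotkey solution on this page. It assumes the documents are already in memory as strings (the file names and contents are made up for illustration), treats a word as a run of two or more ASCII letters, and answers a multi-term query by intersecting the document sets of its terms. The names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of two or more ASCII letters.
WORD = re.compile(r"[a-zA-Z]{2,}")

def build_index(docs):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in WORD.findall(text):
            index[word.lower()].add(name)
    return index

def search(index, query):
    """Return the sorted names of documents that contain every query term."""
    terms = [t.lower() for t in WORD.findall(query)]
    if not terms:
        return []
    # Copy the first posting set so the index itself is never modified.
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)

# Hypothetical sample documents standing in for text files.
docs = {
    "a.txt": "it is what it is",
    "b.txt": "what is it",
    "c.txt": "it is a banana",
}
index = build_index(docs)
print(search(index, "what is"))  # ['a.txt', 'b.txt']
print(search(index, "banana"))   # ['c.txt']
```

Storing a set per word keeps each document listed once however often the word occurs, and makes the multi-term query a plain set intersection.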

AutoHotkey

Works with: AutoHotkey_L

<lang AutoHotkey>; http://www.autohotkey.com/forum/viewtopic.php?t=41479
inputbox, files, files, file pattern such as c:\files\*.txt

word2docs := object() ; autohotkey_L is needed.

stime := A_tickcount
Loop, %files%, 0,1
{

  tooltip,%A_index%  / 500  
  
  wordList := WordsIn(A_LoopFileFullPath)
  InvertedIndex(wordList, A_loopFileFullpath)   

}

tooltip ; hide the progress tooltip
msgbox, % "total time " (A_tickcount-stime)/1000

gosub, search
return

search:
Loop
{
  InputBox, keyword, input single keyword only
  if ErrorLevel ; Cancel was pressed or the dialog timed out: stop searching
    break
  msgbox, % foundDocs := findword(keyword)
}
return

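WordsIn(docpath) {
  ; Return the distinct words (runs of two or more letters) in a file, one per line
  FileRead, content, %docpath%
  spos := 1
  Loop
  {
    ; RegExMatch returns the position of the next match, or 0 when none is left
    if !(spos := RegExMatch(content, "[a-zA-Z]{2,}", match, spos))
      break
    spos += StrLen(match)
    this_wordList .= match "`n"
  }
  Sort, this_wordList, U ; sort the words and drop duplicates
  return this_wordList
}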

InvertedIndex(byref words, docpath) {
  global word2docs
  ; Append this document's path to the list kept for every word it contains
  Loop, Parse, words, `n, `r
  {
    if (A_LoopField = "")
      continue
    word2docs[A_LoopField] := word2docs[A_LoopField] docpath "`n"
  }
}

findWord(word2find) {
  global word2docs
  ; A missing key yields an empty string, so no explicit check is needed
  return word2docs[word2find]
}</lang>