Inverted index: Difference between revisions
Underscore (minor edit): moved Inverted Index to Inverted index (Conventions).
Revision as of 07:49, 21 April 2010
Inverted index
You are encouraged to solve this task according to the task description, using any language you may know.
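An inverted index is a data structure used to implement full-text search. It maps each word to the documents that contain it, so a query looks up its terms directly instead of scanning every file.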
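A minimal Python sketch of the data structure, assuming a dictionary from each word to the set of document names containing it. The word rule (runs of two or more ASCII letters) is borrowed from the AutoHotkey solution. Lower-casing words, using sets, the helper name `build_index` and the sample documents are all assumptions of this sketch.

```python
import re
from collections import defaultdict

# Assumed word rule, borrowed from the AutoHotkey solution: two or more ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]{2,}")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in WORD_RE.findall(text):
            index[word.lower()].add(name)
    return dict(index)


# Hypothetical sample documents.
docs = {
    "a.txt": "It is what it is.",
    "b.txt": "What is it?",
    "c.txt": "It is a banana.",
}
index = build_index(docs)
print(sorted(index["what"]))    # ['a.txt', 'b.txt']
print(sorted(index["banana"]))  # ['c.txt']
```

The single-letter word "a" is never indexed because of the two-letter minimum, and "What" and "what" share one entry because of the case folding.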
Given a set of text files, implement a program that creates an inverted index. Also create a user interface that searches this index and returns the list of files containing the query term or terms. The index may be kept in memory.
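One way to meet both requirements, sketched in Python under stated assumptions: `index_files` indexes every file matching a `glob` pattern (recursively when the pattern contains `**`, similar to the recursive file loop in the AutoHotkey solution), and `search` returns the files that contain all query terms by intersecting the per-term file sets. AND semantics for multiple terms, UTF-8 decoding, and the hypothetical helper names `index_files`, `search` and `repl` are assumptions, not requirements of the task.

```python
import glob
import os
import re
from collections import defaultdict

# Same assumed word rule as the AutoHotkey solution: two or more ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]{2,}")


def index_files(pattern: str) -> dict[str, set[str]]:
    """Build an in-memory inverted index over all files matching a glob pattern."""
    index: dict[str, set[str]] = defaultdict(set)
    for path in glob.glob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # "**" patterns can also match directories
        with open(path, encoding="utf-8", errors="replace") as f:
            for word in WORD_RE.findall(f.read()):
                index[word.lower()].add(path)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted paths of files containing every query term (AND)."""
    terms = [t.lower() for t in WORD_RE.findall(query)]
    if not terms:
        return []
    # Intersect the per-term file sets; an unknown term empties the result.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


def repl(index: dict[str, set[str]]) -> None:
    """Minimal text UI: one query per line; an empty line or EOF ends it."""
    while True:
        try:
            query = input("query> ")
        except EOFError:
            break
        if not query.strip():
            break
        hits = search(index, query)
        print("\n".join(hits) if hits else "(no matching files)")
```

For example, `repl(index_files("c:/files/**/*.txt"))` would start an interactive session over every `.txt` file below `c:/files`. Switching the intersection to a set union would give OR semantics instead.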
AutoHotkey
Works with: AutoHotkey_L
<lang AutoHotkey>; http://www.autohotkey.com/forum/viewtopic.php?t=41479
inputbox, files, files, file pattern such as c:\files\*.txt
if ErrorLevel  ; Cancel pressed: nothing to index
    ExitApp

word2docs := object() ; autohotkey_L is needed.

stime := A_TickCount
Loop, %files%, 0,1  ; files only (0), recurse into subfolders (1)
{
    tooltip, Indexing file %A_Index%
    wordList := WordsIn(A_LoopFileFullPath)
    InvertedIndex(wordList, A_LoopFileFullPath)
}
tooltip
msgbox, % "total time: " (A_TickCount - stime) / 1000 " s"

gosub, search
return

search:
Loop
{
    InputBox, keyword, input single keyword only
    if ErrorLevel  ; Cancel ends the search loop
        break
    foundDocs := findWord(keyword)
    msgbox, % (foundDocs = "") ? ("No file contains: " keyword) : foundDocs
}
return

; Return the unique words (two or more ASCII letters) of a file, one per line.
WordsIn(docpath)
{
    FileRead, content, %docpath%
    spos := 1
    Loop
    {
        if !(spos := RegExMatch(content, "[a-zA-Z]{2,}", match, spos))
            break
        spos += StrLen(match)
        this_wordList .= match "`n"
    }
    Sort, this_wordList, U  ; U drops duplicate words
    return this_wordList
}

; Append docpath to the file list of every word in words.
InvertedIndex(byref words, docpath)
{
    global word2docs
    loop, parse, words, `n,`r
    {
        if (A_LoopField = "")
            continue
        word2docs[A_LoopField] := word2docs[A_LoopField] docpath "`n"
    }
}

; Return the newline-separated files containing word2find, or "" if none.
findWord(word2find)
{
    global word2docs
    return word2docs[word2find]  ; a missing key yields ""
}</lang>