FOSDEM 2015
/
Schedule
/
Events
/
Developer rooms
/
Open source search
/
Effective spelling correction with term relation graphs using Lucene FSTs

Effective spelling correction with term relation graphs using Lucene FSTs

Track: Open source search devroom
Room: UA2.114 (Baudoux)
Day: Saturday
Start: 11:25
End: 12:10

Our approach for doing spelling correction has deviated considerably from the default approach Lucene is offering. While Lucene's FuzzyQuery uses a compiled Automaton to filter the dictionary rather efficiently, we directly use normal Automatons and intersect that automaton with a separate FST. This proves to be more efficient for our use case, since it saves memory and time to compile the automaton and also part of the time to identify the matching terms in the dictionary.

This approach gives us the possibility to store any meta-information with each term, which is used then to pick the top N spellcorrections. We build a term co-occurence graph, where each vertex is a possible spelling correction of a term and each link is the co-occurence in the same document. Each vertex and link get a score based on the meta-information and edit distance. Then we use graph reduction techniques until the graph contains the desired number of spellcorrections.

Speakers

Anna Ohanyan

Attachments

Effective Spelling Correction using Lucene FSTs (slides)

FOSDEM15

Brussels / 31 January & 1 February 2015

Effective spelling correction with term relation graphs using Lucene FSTs

Speakers

Attachments

Links

FOSDEM

This year

Practical information

Media and press