#include "lexicon.h"

class Lexicon

This class is used to represent a lexicon, or word list. The main difference between a lexicon and a dictionary is that a lexicon does not provide any mechanism for storing definitions; the lexicon contains only words, with no associated information. It is therefore similar to a set of strings, but with a different internal representation. The Lexicon class supports efficient lookup operations for words and prefixes.

As an example of the use of the Lexicon class, the following program lists all the two-letter words in the lexicon stored in EnglishWords.txt:

   int main() {
      Lexicon english("EnglishWords.txt");
      for (string word : english) {
         if (word.length() == 2) {
            cout << word << endl;
         }
      }
      return 0;
   }

This version of Lexicon is implemented internally using a data structure called a prefix tree or trie (pronounced "try"), which allows for efficient prefix and word searching. Some runtimes below are expressed in terms of "W" where W represents the length of the word being added.

Constructor
Lexicon() O(1) Initializes a new empty lexicon.
Lexicon(filename) 
Lexicon(istream) 
O(WN) Initializes a new lexicon that reads words from the given file or input stream.
Methods
add(word)  O(W) Adds the specified word to the lexicon.
addWordsFromFile(filename) 
addWordsFromFile(istream)
O(WN) Reads the given file/stream and adds all of its words to the lexicon.
clear()  O(N) Removes all words from the lexicon.
contains(word)  O(W) Returns true if word is contained in the lexicon.
containsPrefix(prefix)  O(W) Returns true if any words in the lexicon begin with prefix.
equals(lex)  O(WN) Returns true if the two lexicons contain the same words.
isEmpty()  O(1) Returns true if the lexicon contains no words.
mapAll(fn)  O(N) Calls the specified function on each word in the lexicon.
remove(word)  O(W) Removes the specified word from the lexicon.
removePrefix(prefix)  O(W) Removes all words that begin with the specified prefix from the lexicon.
size()  O(1) Returns the number of words contained in the lexicon.
toString()  O(1) Converts the lexicon to a printable string representation.
Operators
lex1 == lex1  O(WN) Returns true if lex1 and lex2 contain the same elements.
lex1 != lex2  O(WN) Returns true if lex1 and lex2 are different.
ostream << lex O(N) Outputs the contents of the lexicon to the given output stream.
istream >> lex O(WN) Reads the contents of the given input stream into the lexicon.

Constructor detail


Lexicon();
Lexicon(istream& input);
Lexicon(string filename);
Initializes a new lexicon. The default constructor creates an empty lexicon. The second form reads in the contents of the lexicon from the specified data file or stream. The data file must be in one of two formats: (1) a space-efficient precompiled binary format or (2) a text file containing one word per line. The Stanford library distribution includes a binary lexicon file named EnglishWords.dat containing a list of words in English. The standard code pattern to initialize that lexicon looks like this:
Lexicon english("EnglishWords.dat");

Usage:

Lexicon lex;
Lexicon lex(filename);

Method detail


void add(string word);
Adds the specified word to the lexicon.

Usage:

lex.add(word);

void addWordsFromFile(istream& input);
void addWordsFromFile(string filename);
Reads the given file/stream and adds all of its words to the lexicon.

Usage:

lex.addWordsFromFile(filename);

void clear();
Removes all words from the lexicon.

Usage:

lex.clear();

bool contains(string word) const;
Returns true if word is contained in the lexicon. In the Lexicon class, the case of letters is ignored, so "Zoo" is the same as "ZOO" or "zoo".

Usage:

if (lex.contains(word)) ...

bool containsPrefix(string prefix) const;
Returns true if any words in the lexicon begin with prefix. Like containsWord, this method ignores the case of letters so that "MO" is a prefix of "monkey" or "Monday".

Usage:

if (lex.containsPrefix(prefix)) ...

bool equals(const Lexicon& lex) const;
Returns true if the two lexicons contain exactly the same set of words. Identical in behavior to the == operator.

Usage:

if (lex.equals(lex2)) ...

bool isEmpty() const;
Returns true if the lexicon contains no words.

Usage:

if (lex.isEmpty()) ...

void mapAll(void (*fn)(string)) const;
void mapAll(void (*fn)(const string&)) const;
void mapAll(FunctorType fn) const;
Calls the specified function on each word in the lexicon.

Usage:

lexicon.mapAll(fn);

bool remove(string word);
Removes the specified word from the lexicon, if it was present. Returns true if the word was previously contained in the lexicon; in other words, if a word was removed. The empty string cannot be contained in a lexicon, so passing the empty string to this method returns false.

Usage:

lex.remove(word);

bool removePrefix(string prefix);
Removes all words from the lexicon that begin with the given prefix. Returns true if the prefix was previously contained in the lexicon; in other words, if any words were removed. If the empty string is passed, since all words begin with the empty string, all words will be removed and this method will return true if the lexicon was non-empty prior to the call.

Usage:

lex.removePrefix(prefix);

int size() const;
Returns the number of words contained in the lexicon.

Usage:

int n = lex.size();

string toString() const;
Converts the map to a printable string representation, such as "{word1, word2, word3}".

Usage:

string str = lexicon.toString();