How we could “crowd”-source lexical data

WORDS can be maintained in conjunction with a web-based system for correcting and collecting lexical data. Given an appropriate vetting system, a subset of the users of this system can be accredited as reliable Latin scholars, and their contributions can then be adopted as part of the official dataset.

The web-based version of WORDS currently (August 2015) supplies a verbatim transcript of the output of the programme. It is proposed that this be augmented with the following facilities:

Data model

By “lexical item” we mean the collection of forms of a word; e.g., the lexical item mensa comprises the forms mensa, mensam, mensae, etc.

By “citation” we mean a pair consisting of a passage of text (thirty or forty words long) containing the lexical item and a well-formed unique identifier for that passage.

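There shall be a many-to-many relation between lexical items and citations: a lexical item may be attested by several citations, and a single cited passage may contain several lexical items.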
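As an illustration only, the data model above might be sketched in Python as follows. The class names, the identifier format ("example:passage-1") and the sample passages are all assumptions made for this sketch; they are not part of WORDS or of any existing citation scheme, and the passages are invented placeholders rather than real quotations.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class LexicalItem:
    """A word, identified by its headword, together with its forms."""
    lemma: str
    forms: FrozenSet[str]


@dataclass(frozen=True)
class Citation:
    """A passage of text plus a unique identifier for that passage."""
    passage_id: str  # hypothetical identifier format
    text: str


@dataclass
class CitationIndex:
    """Many-to-many relation, stored as two lookup tables kept in step."""
    _by_item: Dict[str, Set[str]] = field(default_factory=dict)
    _by_citation: Dict[str, Set[str]] = field(default_factory=dict)

    def link(self, item: LexicalItem, citation: Citation) -> None:
        # Record the link in both directions so either side can be queried.
        self._by_item.setdefault(item.lemma, set()).add(citation.passage_id)
        self._by_citation.setdefault(citation.passage_id, set()).add(item.lemma)

    def citations_for(self, lemma: str) -> List[str]:
        return sorted(self._by_item.get(lemma, set()))

    def items_in(self, passage_id: str) -> List[str]:
        return sorted(self._by_citation.get(passage_id, set()))


# Invented sample data, for illustration only.
mensa = LexicalItem("mensa", frozenset({"mensa", "mensam", "mensae"}))
hortus = LexicalItem("hortus", frozenset({"hortus", "hortum", "horto"}))
passage_1 = Citation("example:passage-1", "mensam in horto posuit")
passage_2 = Citation("example:passage-2", "mensae longae erant")

index = CitationIndex()
index.link(mensa, passage_1)   # one passage, two lexical items
index.link(hortus, passage_1)
index.link(mensa, passage_2)   # one lexical item, two passages
```

Here index.citations_for("mensa") lists both passage identifiers, and index.items_in("example:passage-1") lists both lexical items. In a relational database the same relation would ordinarily be held in a separate link table with one row per (lexical item, citation) pair.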


It may be necessary to avoid circularity by excluding cited material from the test cases used to develop the programme itself.
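One way to picture this exclusion, as a rough sketch only: assume that every test case records the identifier of the passage it was drawn from, and drop any test case whose passage also appears among the citations. The record layout, the function name exclude_cited and the identifiers are hypothetical and do not describe the existing WORDS test suite.

```python
def exclude_cited(test_cases, cited_ids):
    """Return the test cases whose source passage is not among the cited ones."""
    cited = set(cited_ids)
    return [case for case in test_cases if case["passage_id"] not in cited]


# Hypothetical test cases, each tagged with the passage it came from.
test_cases = [
    {"passage_id": "example:passage-1", "word": "mensam"},
    {"passage_id": "example:passage-3", "word": "horto"},
]

# Passages already used as citations by contributors.
cited_ids = {"example:passage-1", "example:passage-2"}

remaining = exclude_cited(test_cases, cited_ids)
```

After this call, remaining holds only the test case drawn from example:passage-3, so no passage serves both as a citation and as a development test case.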