Hack Jam Log Book: NLP: Extracting Meaning from Noun Phrases

3.11.08

NLP: Extracting Meaning from Noun Phrases

Today we'll look at one way to build up a representation of noun phrases as logical statements (see the first post for an introduction to the problem space we're working in). Once again, I'll be using Haskell code derived from my current project to demonstrate exactly what I mean. One thing that has become more and more apparent to me as I progress is that this kind of representation of the meaning of sentences is the wrong way to go. As I describe the structure I've been using, we'll take a look at some concepts that are difficult to represent in ways that don't depend on the english meaning of predicates themselves.


> module HackJamNLP2 where

First, we need to revisit and expand our grammar somewhat. This is still a simplified grammar relative to English taken as a whole.


> data NounPhrase     = ANoun Word
>                     | APronoun Word
>                     | Name Word
>                     | AdjectivesNoun [Word] Word
>                     | NPRelClause NounPhrase RelativeClause

> data RelativeClause = RelativeClause Word VerbPhrase

A quick grammar refresher. Relative clauses are a kind of extra, descriptive, verb phrase. In the sentence, "He who smelt it, dealt it," the relative clause "who smelt it" modifies he.


> data PartOfSpeech = Noun | Verb | Adjective | Article | Pronoun
>                   | Adverb | QueryWord | Unknown
>                     deriving Eq

We're going to classify the words who what where why when and how as query words. In this naive system, the presense of a query word is going to set off the whole sentence as a question. So this system will flag "That's not how you do it," as a query. This is not a fatal flaw, but requires a bit of analysis to get around.

Now let's update our logical notation.


> data LogicSentence = Atom ID [Term]
>                    | And LogicSentence LogicSentence
>                    | Implies LogicSentence LogicSentence
>                    | ForAll Term LogicSentence
>                    | Exists Term LogicSentence
>                    | MemberOf Term Term
>                      deriving Show
>
> data Term = Variable ID
>           | Set [Term]
>           | HigherOrder LogicSentence
>             deriving Show

From the definitions we used last time, we're adding some elements of higher order logic: sets and higher order terms. (Sets were in the last post but not used.) A number of concepts are much more natural to express in sets, and higher order terms are used sparingly to store meta information to pass on to a higher module, rather than passing around an extra structure to store intent, etc.

One more thing before we get into the meat of this post. Let's define a couple shorthand notations for patterns that will show up a couple of times.


> x                   = Variable "x"
> inContext           = Atom "in-context" [x]

> aSetWhere sent      = let s = Variable "s" in
>                       Exists s (ForAll x ((x `MemberOf` s) `Implies` sent))

aSetWhere defines a set where all it's members satisfy a given sentence.

Throughout convertNounPhrase, which we are about to (re)define, the variable "x" will represent the current noun phrase. We can avoid clashes by being carefull to substitute variables whenever we add two noun phrases together. We'll do that in combineNVSents, another function from last time that we'll expand in just a bit, and we'll use that function in a general way anytime we're combining sentences.

The first important change is that our convertNounPhrase will now return a list of LogicSentences, each representing a /possible/ interpretation of the given grammar parse.


> convertNounPhrase     :: NounPhrase -> [LogicSentence]
> convertNounPhrase (Name name)
>                       = [Exists x (Atom ("called-" ++ wordString name) [x])]

The conversion of Names is unchanged from last time, except for wrapping the result inside a list. Something I didn't point out last time was the awkwardness here in mapping, say, "Bob" to "Ex called-Bob(x)". What this does is put information inside the name of the object /inside/ the name of the predicate. Unfortunately, there's no where else to put it. My first system used a Const type of Term, but that doesn't scale to any system where two different objects have the same name.


> convertNounPhrase (ANoun n)
>       | partOfSpeech n == QueryWord
>                       = [Atom "speaker-asking"
>                               [HigherOrder (ForAll x baseSentence)]]
>       | isPlural n    = [ForAll x baseSentence, aSetWhere baseSentence]
>       | otherwise     = [Exists x baseSentence,
>                          ForAll x baseSentence,
>                          aSetWhere baseSentence]
>   where baseSentence  = Atom (wordString n) [x]

Here's the big reason why Higher order terms were added; it's much more elegant to state the meaning of a question to be "speaker-asking(q)" where q is a sentence describing what the question should answer, Jeopardy style, than to avoid higher oder terms and pass another parameter around all the functions.

The case of a plural noun brings up our first ambiguous meaning. So what does a word like "potatoes" refer to? All potates, or some? Compare the sentences "Potatoes come from the ground," and "Bob had potatoes for breakfast." It's impossible to tell the meaning of "potatoes" without context, so all we can do here is shrug and let something downstream try to disambiguate the meanings.

The last case is a bit odd. I haven't been able to come up with any sentence where it makes sense to use a sigular noun without an article outside of gems like "I can has cheeseburger?" Hopefully our parser throws out parses with this form when more correct parses are available, but if we are asked to convert the NounPhrase, we should do our best. Unfortunately in this case, our best is to throw all the possibilities out there.


> convertNounPhrase (APronoun n)
>                       = case wordString n of
>                               "I"       -> [Exists x (Atom "speaker" [x])]
>                               "you"     -> [Exists x (Atom "audience" [x])]
>                               otherwise -> [Exists x inContext]

Pronouns are the same as last time, just wrapped in lists. I'm still truncating this for brevity, but you can see here a good case for adding the meaning of at least some words into the Word type itself. We could define a "meaningOf" field for the Word type and use that instead of all these Atom clauses. This also allows an agent to modify the results it gets from this module by modifying its dictionary, rather than having to rewrite meanings built in here.

And if we extend our definition of words to include the occasional phrase like "on fire", we can easily have the meanings of "on fire" and "aflame" be the same without any special cases.


> convertNounPhrase (AdjectivesNoun (adj:adjs) noun)
>       | isThe adj && isPlural noun
>                       = [ForAll x (inContext `And` (baseSentence adjs noun)),
>                          aSetWhere (inContext `And` (baseSentence adjs noun))]
>       | isA adj && isPlural noun
>                       = []
>       | isPlural noun = [ForAll x (baseSentence (adj:adjs) noun),
>                          aSetWhere (baseSentence (adj:adjs) noun)]
>       | isThe adj     = [Exists x (inContext `And` (baseSentence adjs noun))]
>       | isA adj       = [ForAll x (baseSentence adjs noun),
>                          Exists x (baseSentence adjs noun)]
>       | otherwise     = [Exists x (baseSentence (adj:adjs) noun)]
>   where
>       isThe adj       = wordString adj == "the"
>       isA adj         = wordString adj == "a" || wordString adj == "an"
>       baseSentence [] n
>                       = Atom (wordString n) [x]
>       baseSentence (adj:adjs) n
>                       = Atom (wordString adj) [x] `And` (baseSentence adjs n)

I've broken an adjective phrase into six different cases based on the article, if any, it begins with and the plurality of the noun. You can see that we ignore phrases like "a potatoes." Above I argued that we should try to find a meaning for anything we get, but here we've got contradictory indicators about the plurality of the term rather than some grammar abuse.

One other part that was suprising to me was the seemingly contradictory interpretations of singular nouns preceeded by a or an. The first case is a sentence like "A book is written by an author," where the intent is to refer to all books through the idea of a book-ness. The second case is the intuitive one: "A book is misssing."

You can also see that adjectives are treated the same way as nouns: as atomic sentences. "A big ant" becomes "big(x) & ant(x)," which is problematic. Through And-Elimination, we get "big(x)"; this ant is no longer big for an ant, it is just /big/. We could introduce big-relative-to-ants(x), or possibly "Ey average-ant(y) & big-relative-to(x, y)," neither of which is particularly appealing. It gets worse however. From the Akhmatova paper I mentioned in my last progress posting: take the sentence "The gastronomic capital of France is Lyon." And-Elimination (and a proper interpretation of the rest of the sentence) will leave you with "The capital of France is Lyon." So adjectives are difficult to shoehorn in to logic like this, regardless of whether they are limiting or modifying the base noun.


> convertNounPhrase (NPRelClause np (RelativeClause word vp))
>                       = [combineNVSents nSent (vSent (subjectOf nSent))
>                                       | nSent <- convertNounPhrase np, >                                         vSent <- convertVerbPhrase vp]

Moving on, the conversion of relative clauses is fairly straightforward, funny notation notwithstanding. This funny looking structure is the one bit of Haskell I decided to sneak in here. It's a list comprehension, and it's best explained via a contrived example. Let's say you wanted a list of strings that consisted of all possible combinations of a set of prefixes and suffixes. In Haskell, you could say:


> contrived           = [ prefix ++ suffix | prefix <- ["un", "im", "non"], >                                            suffix <- ["bad", "good", "ugly"] ]

This creates a list of nine strings. Now compare that with definition for relative clauses above, and you can see that we are making all possible combinations of noun and verb phrases we can muster. (The definition of convertVerbPhrase from the last post returned only a single LogicSentence; here we're using one modified to return a list of Sentences.)

That's it for this post. There are, of course, more types of noun phrases, but the pattern is the same for them: determine the proper representation in your scheme and add a case to convertNounPhrase to handle it. Verb phrases are trickier, and I hope to get to them soon.

I'm including below the handful of defintions that were reused from last time, in case you want to refer to them (it also makes this post compile as well!). This includes the modified convertVerbPhrase and a new convertSentence that deals with convertNounPhrase and convertVerbPhrase returning lists. It should look familiar after the last example.

=====


> data VerbPhrase = AVerb Word

> data Word = Word {
>               wordString    :: String,
>               partOfSpeech  :: PartOfSpeech,
>               isPlural      :: Bool
>             }

>
> type ID = String

> convertVerbPhrase     :: VerbPhrase -> [Term -> LogicSentence]
> convertVerbPhrase (AVerb v) = [\subject -> Atom (wordString v) [subject]]

> subjectOf             :: LogicSentence -> Term
> subjectOf (ForAll t _) = t
> subjectOf (Exists t _) = t

> combineNVSents        :: LogicSentence -> LogicSentence -> LogicSentence
> combineNVSents (ForAll t npDescription) vpDescription
>                    = ForAll t (npDescription `Implies` vpDescription)
> combineNVSents (Exists t npDescription) vpDescription
>                    = Exists t (npDescription `And` vpDescription)

Labels: adam, haskell, nlp

posted by Adam Heck # 23:34

Comments:

Singular nouns without articles often happen with semi-ordinal nouns: "I spy land," and "I need water." (versus "I came to take the waters," and "I come from distant lands.")

I love the and-elimination examples. My brain will now do this on every adjectival phrase I hear. When I burst out giggling at inappropriate moments, I blame you.

# posted by

Sam Bleckley : November 4, 2008 at 10:52 AM