31.10.08
Words about Language
I've had a pretty productive week with regard to my NLP project; I added an implementation of backward-chaining inference, so now I can start playing with interpreting context and unifying concepts. With that built, I'm more or less back up to where I was with my first system, only with a code base that still has room to grow. I can ask simple questions and, so long as the knowledge base is in the right state, get yes or I-don't-know answers. Unfortunately, my approach right now is so general that if I mention two facts about a specific thing, say, emacs, the system won't assume I mean the same thing both times; as far as it's concerned, there could be two different things, both called emacs.
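For the curious, here's a toy sketch of what the chaining step does, assuming a propositional knowledge base of plain facts and Horn-style rules. The names here (Fact, Rule, prove) are made up for illustration, not my actual code, and the real system unifies structured terms rather than comparing whole strings, but the control flow is the same idea:

```haskell
-- Toy backward chaining over propositional facts and rules.
-- A goal holds if it's a known fact, or if some rule concludes it
-- and every premise of that rule can itself be proven.
type Fact = String

data Rule = Rule { ruleHead :: Fact, ruleBody :: [Fact] }

data Answer = Yes | DontKnow deriving Show

prove :: [Fact] -> [Rule] -> Fact -> Answer
prove facts rules goal
  | goal `elem` facts                                  = Yes
  | any provable [r | r <- rules, ruleHead r == goal]  = Yes
  | otherwise                                          = DontKnow
  where
    -- A rule fires only if every one of its premises can be proven.
    provable r = all isYes [prove facts rules p | p <- ruleBody r]
    isYes Yes  = True
    isYes _    = False

-- ghci> prove ["emacs is a program"]
--             [Rule "emacs is software" ["emacs is a program"]]
--             "emacs is software"
-- Yes
```

(Note that this toy version happily recurses forever on cyclic rules; anything real has to track the goals it has already visited.)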
Which brings me to my next target, context and pronouns. Pronouns are obviously a problem for computers; if I say "Bob threw a brick at the window, and it shattered," what shattered? It's easy for us to say that the window shattered, but we get that from our knowledge of the world, not from any particular structure in the sentence. I could also say "Bob threw a vase at the wall, and it shattered," and this time, with the same structure, "it" now refers to the object being thrown, the vase.
We know this in one of two ways, and I'm not sure which. Either we know that a vase or a window can shatter while a brick or a wall is not likely to, or we visualize the event and our internal model of the world predicts which object shatters. Neither is particularly easy for a computer: the latter has obvious computational problems, and the former requires a search through a potentially large amount of data.
For my first crack at it, I'll keep a list of 'things' that have been in context and attempt to unify each pronoun with one of them, trying candidates until I find a binding that doesn't produce a conflict. I'm planning on limiting the domain of 'things' that can be discussed to something small, like the nature of files, programs, and directories, in what I'm sure will be a vain attempt to limit the size of the knowledge base this approach will require.
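Here's a sketch of that resolution step, with the obvious caveat that the conflict check below is just a property lookup standing in for real inference against the knowledge base, and all the names (Entity, Context, resolve) are invented for the example:

```haskell
import Data.List (find)

-- An in-context thing and the properties the knowledge base
-- says it can plausibly have.
data Entity = Entity { name :: String, properties :: [String] }

-- Most recently mentioned entities first.
type Context = [Entity]

-- Bind a pronoun to the first in-context entity for which the
-- predicate raises no conflict; Nothing means resolution failed.
resolve :: Context -> String -> Maybe Entity
resolve ctx predicate = find (`canBe` predicate) ctx

-- Stand-in conflict check: an entity can satisfy a predicate only
-- if the knowledge base lists it as a possibility.
canBe :: Entity -> String -> Bool
canBe e predicate = predicate `elem` properties e

-- ghci> let ctx = [Entity "wall" [], Entity "vase" ["shatter"]]
-- ghci> fmap name (resolve ctx "shatter")
-- Just "vase"
```

Keeping the context ordered by recency means the most recently mentioned compatible referent wins for free, which matches how the vase beats the wall in the example above.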
In other news, my posts on the comprehension module are still coming, but writing them keeps taking longer than I expect. Just when I get into the flow of writing, I notice something I'm doing wrong and have to go fix it.
I've read two papers in the past week. "Modeling Semantic Containment and Exclusion in Natural Language Inference," by MacCartney and Manning, covers using natural logic to model the semantics of language, along with the successes and problems they've found there. "Recognizing Textual Entailment Via Atomic Propositions," by Akhmatova and Molla, describes an approach using traditional logic (like you saw in my Haskell-laden post).