How and Why Has the Empiricist Approach Influenced the Development of NLP?

The central challenge in Natural Language Processing (NLP) is getting computers to perform language-based tasks. The machine-learning approach, which encompasses empiricism and data-oriented parsing, has therefore greatly influenced the field's development.


In my attempt to answer "why" empiricism and data-oriented parsing have influenced NLP, I would say this: since the inception of Artificial Intelligence, one of its primary goals has been to design computational methods that perform natural-language tasks well. Building such systems by hand, however, requires a great deal of engineering-specific knowledge. At the same time, recent years have seen a shift in NLP from rationalist methods based on hand-coded rules toward empirical, data-based models. These approaches are far more data-driven and partially automated through statistical and machine-learning techniques. This is why empiricism and data-oriented parsing have become the favored techniques in NLP.


The next question is "how" empiricism and data-oriented parsing make it feasible to learn linguistic knowledge automatically from large text corpora. Much of the recent work in empirical NLP has involved statistical training techniques for probabilistic models, originally developed for speech recognition, whose generalization accuracy can be analyzed with probability theory. These probabilistic and other data-driven approaches spread from speech into part-of-speech tagging, parsing and attachment-ambiguity resolution, and semantics. The empirical direction was also accompanied by a new focus on model evaluation: testing on held-out data, developing quantitative evaluation metrics, and comparing performance on those metrics against previously published results.
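As a concrete illustration of that train-then-evaluate workflow, here is a minimal sketch in plain Python: a most-frequent-tag part-of-speech tagger is estimated from a tiny hand-made corpus and then scored, with a quantitative metric (tagging accuracy), on a held-out sentence it never saw during training. The corpus, tag set, and fallback tag are invented for this example and stand in for the much larger annotated corpora real systems use.

```python
from collections import Counter, defaultdict

# Tiny hand-made corpus of (word, tag) pairs, split into training data and a
# held-out test sentence; the corpus and tag set are invented for this sketch.
train = [
    [("the", "DET"), ("airplane", "NOUN"), ("flew", "VERB")],
    [("the", "DET"), ("pilot", "NOUN"), ("saw", "VERB"),
     ("the", "DET"), ("airplane", "NOUN")],
    [("john", "NOUN"), ("wrote", "VERB"), ("the", "DET"), ("letter", "NOUN")],
]
held_out = [
    [("the", "DET"), ("pilot", "NOUN"), ("wrote", "VERB"),
     ("a", "DET"), ("letter", "NOUN")],   # "a" never appears in training
]

# Training: count how often each word appears with each tag
# (a maximum-likelihood estimate of P(tag | word)).
counts = defaultdict(Counter)
for sentence in train:
    for word, tag in sentence:
        counts[word][tag] += 1

def predict(word):
    """Return the most frequent tag seen for the word, or NOUN as a fallback."""
    return counts[word].most_common(1)[0][0] if word in counts else "NOUN"

# Evaluation on held-out data with a quantitative metric: tagging accuracy.
total = correct = 0
for sentence in held_out:
    for word, gold in sentence:
        total += 1
        correct += predict(word) == gold

print(f"Held-out tagging accuracy: {correct / total:.2f}")  # 0.80 here
```

Real empirical NLP systems use far larger corpora and richer probabilistic models, but the evaluation discipline is the same: hold out data, measure a metric, and compare against prior results; note how the unseen word "a" is exactly the kind of generalization failure that held-out evaluation exposes.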


From the syntactic-analysis point of view, data-oriented parsing obtains meaningful statistics by lexicalizing the grammar, so that every constituent carries information about the actual words in the sentence. Take the phrase "The airplane flew." In a non-lexicalized parse, the S (Sentence) node simply expands into an NP (Noun Phrase) and a VP (Verb Phrase). In a lexicalized parse, each node is also annotated with its head word: the S node expands into an NP headed by "airplane" (covering "The airplane") and a VP headed by "flew."
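To make the contrast concrete, the sketch below builds the non-lexicalized parse of "The airplane flew" as a plain Python data structure and then annotates each constituent with a head word. The tree encoding and the deliberately simplified head rule are assumptions made for this illustration, not a standard parser API.

```python
# Each tree node is a (label, children) pair; a leaf is just a word string.
# Non-lexicalized parse of "The airplane flew": category labels only.
plain_parse = ("S", [
    ("NP", ["The", "airplane"]),
    ("VP", ["flew"]),
])

def head_of(child):
    """Return a child's head word: the word itself for a leaf, otherwise the
    word already recorded in its label, e.g. 'NP(airplane)' -> 'airplane'."""
    if isinstance(child, tuple):
        return child[0].split("(", 1)[1].rstrip(")")
    return child

def lexicalize(node):
    """Annotate every constituent with its head word, using a simplified head
    rule: the first child's head for VP, the last child's head otherwise."""
    label, children = node
    done = [lexicalize(c) if isinstance(c, tuple) else c for c in children]
    head = head_of(done[0]) if label == "VP" else head_of(done[-1])
    return (f"{label}({head})", done)

print(lexicalize(plain_parse))
# ('S(flew)', [('NP(airplane)', ['The', 'airplane']), ('VP(flew)', ['flew'])])
```

Once nodes carry head words such as S(flew) and NP(airplane), a statistical parser can collect word-level statistics, for instance how often "airplane" heads the subject of "flew", rather than counting only bare category labels.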


In semantic parsing, however, word-sense disambiguation becomes a challenge for machines. For example, 'pen' may refer to a writing instrument or to an enclosure depending on the sentence: compare "John wrote the letter with a pen" with "John saw the pig in the pen" (a toy disambiguation sketch is included at the end of this article). Another example is "silver," which can be a noun ("Silver is expensive these days"), a verb ("He silvered the coins"), or an adjective ("He wore a silver ring"), and so on. Semantic parsing also identifies the semantic roles of the entities referred to in a sentence, such as agent and instrument, while discourse analysis determines how the larger inter-sentential context influences the interpretation of a sentence.

Hence, the two models, "empiricism" and "data-oriented parsing," have contributed immensely to the development of NLP.
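As promised above, here is a minimal, Lesk-style overlap heuristic for the "pen" example: each candidate sense is given a small signature of context words (invented for this sketch, not drawn from any real lexical resource), and the sense whose signature overlaps most with the input sentence wins.

```python
# Illustrative sense inventory for "pen"; the signature words are made up
# for this example, not taken from a real dictionary or corpus.
SENSES = {
    "writing instrument": {"write", "wrote", "letter", "ink", "paper", "sign"},
    "animal enclosure": {"pig", "sheep", "farm", "fence", "animal", "barn"},
}

def disambiguate(sentence):
    """Pick the sense whose signature overlaps most with the sentence context
    (a simplified Lesk-style overlap heuristic)."""
    context = set(sentence.lower().replace(".", "").split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

print(disambiguate("John wrote the letter with a pen"))  # writing instrument
print(disambiguate("John saw the pig in the pen"))       # animal enclosure
```

Corpus-trained disambiguators replace these hand-written signatures with statistics learned from annotated text, which is exactly the empirical, data-driven shift this article describes.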
