The last several LanguidSlog posts have looked microscopically at sentences. This time we widen the focus to show how the use of NG should relate to other endeavours. One conclusion is that, up to now, theoretical linguistics has failed to serve any purpose beyond supporting the careers of its practitioners. It should instead be enabling computational linguistics to realise its full potential.
I hope the NG approach is becoming broadly clear. Of course there’s much more analysis to do. Errors are bound to occur: please point them out. And it’s possible that there’ll be a crisis requiring rework to previous analysis. But the solution will never imply addressable memory!
Groves of academe
The title of this post is the sort of question that irks academics. Whatever the philosophical rights and wrongs, the politics behind university funding means that effective answers are needed.
Studying an innate human ability surely has some value. But what if the sceptic asks: When will you know everything about the subject? Haven’t you even got to where ‘diminishing returns’ kicks in? What subject-specific skills help bachelors/masters students with their subsequent, real-world careers?
Linguistics doesn’t provide convincing answers and could be vulnerable. What it should do is focus its critics’ attention on the undeniable benefits of applying linguistics – principally for clinical and IT purposes. Knowing nothing about the former, I’ll now concentrate on …
Computers had already arrived when the Chomsky era began, and attempts to use them on language started at about the same time. There were several potentially useful types of task. Generating natural language from structured data was comparatively easy. (In the 1980s I built a system that generated case-specific legal documents such as writs.) More difficult were speech recognition and synthesis, and translation. In all of these, to deal with sentences correctly the software needs to parse them, i.e. identify the role of each part.
Early work by the Chomskyans suggested that a grammar could be defined with a manageably-sized set of rules. This proved to be an illusion: rule-sets grew even faster than computing power. Computational linguists were disillusioned. When computing power had grown sufficiently to support brute force, they preferred statistics over rule-based methods. And the results are staggering: the charming voice on your car’s sat-nav can even pronounce that oddly-named street.
The aim of NG
Good as that may be, there is still room for improvement. For example, while Facebook’s translation of Christoph’s native-language post is pretty good, its translation of ChenHao’s is poor. Computational linguists are therefore looking again at possibilities for rule-based parsing. But which rule-set to use?
LS2 to LS6 showed that existing theories are only descriptive. (No one has challenged me on that.) Of course, any theory – even NG – must be empirically based and so there are potential issues about completeness. But the problem with other grammars is that they are derived from the behaviour of individual words – which are highly ambiguous. Therefore the rule-set must be complicated as well as very large. There’s no reason to expect such a grammar to remedy the problems that remain in statistical parsing.
LanguidSlog is working towards an explanatory theory based on junctions between pairs of lexical items. (Has anyone looked at the ambiguity of word-pairs, each with a specific syntactic relation? I don’t know but would bet a lot of money that, compared with individual words, a much smaller proportion are ambiguous.) Sure, building an NG-based parser requires the words and rules to be captured. That’s a big task. But it is achievable, given a very large corpus (everything in Wikipedia, say) and lots of manual effort from the start until enough words and rules have been captured for more to be captured algorithmically – a sort of snowball effect.
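The bet about word-pairs could be checked roughly as follows. This is only a minimal sketch in Python, not an NG parser: the observations are invented for illustration, the relation labels (mod, obj) and the sense tags are hypothetical and are not NG notation, and the function name ambiguity_ratios is my own. A real test would need a large annotated corpus.

```python
from collections import defaultdict

# Hypothetical, hand-made observations (NOT real corpus data):
# (word_a, word_b, relation, sense_a, sense_b)
OBSERVATIONS = [
    ("bank", "river", "mod", "bank:shore", "river:stream"),
    ("bank", "savings", "mod", "bank:finance", "savings:money"),
    ("run", "program", "obj", "run:execute", "program:software"),
    ("run", "marathon", "obj", "run:race", "marathon:race"),
    ("run", "program", "obj", "run:manage", "program:scheme"),
]


def ambiguity_ratios(observations):
    """Return (share of ambiguous words, share of ambiguous word-pairs)."""
    word_senses = defaultdict(set)    # word -> senses seen for it
    pair_readings = defaultdict(set)  # (word_a, word_b, relation) -> readings seen

    for word_a, word_b, relation, sense_a, sense_b in observations:
        word_senses[word_a].add(sense_a)
        word_senses[word_b].add(sense_b)
        pair_readings[(word_a, word_b, relation)].add((sense_a, sense_b))

    def share_ambiguous(table):
        # An entry counts as ambiguous if more than one sense/reading was seen.
        ambiguous = sum(1 for seen in table.values() if len(seen) > 1)
        return ambiguous / len(table)

    return share_ambiguous(word_senses), share_ambiguous(pair_readings)


word_ratio, pair_ratio = ambiguity_ratios(OBSERVATIONS)
print(f"ambiguous words: {word_ratio:.2f}")
print(f"ambiguous pairs: {pair_ratio:.2f}")
```

On these made-up examples half of the words are ambiguous but only a quarter of the word-pairs. That reflects nothing more than how the examples were chosen; the point is the shape of the measurement, which would only mean something when run over a very large corpus.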
Linguistics without explanation is pointless as a theoretical subject and it’s been ineffective when applied to computing.
This harsh judgement on linguistics generally is unavoidable. But all the individuals I’ve met seem intelligent and well-intentioned. Their problem may be group-think. A professional drawing conclusions like those in LanguidSlog would need to be brave to promote them. Mishandling that could be severely career-limiting.
No such danger for me, but the Charmed Circle can ignore the ideas. After all, I’m not qualified as they are.
The database supporting the NG parser will have the potential to generate ‘computable meaning’ from natural language, and vice versa. Whoever works that out – and patents it – could become very rich. So how it might be done is something you’ll want to think about. I’ll not be giving an answer next week – but perhaps when everything else has been covered in LanguidSlog.