The Computations of Language

Monday, November 26, 2007

Response to the new Uriagereka article

#Full Interpretation and vacuous quantifiers: if Case can be vacuous, as Juan stated earlier (a claim I nonetheless strongly, strongly disagree with), why not operators that lack a variable?

# Aren't there cases where we find an operator after a variable in the speech stream? Sure, the LCA puts quantifiers first in English in the speech stream under subject quantification, but not under object quantification. So when the LCA linearizes something that gives us "I think I hate every chef", the quantifier's scope, which is one of its two variables (the other being the restrictor), comes before the quantifier in the speech stream. As I understood things, the parser having difficulties that was mentioned in the article was one parsing the speech stream, the PF, not the LF, which could look radically different and in which the quantifier would be covertly raised. Even though covert operations violate the Uniformity condition.......

#What is meant by a parser -- did Baader and Frazier do a mock-up using a computer? Even if that's the case, because we admittedly (in this paper) know so little about brains, that mock-up may be a very poor analogy. More to the point, if that's not the case and we're talking about a psycholinguistics experiment, isn't this an example of the liberal borrowing of Computer Science buzzwords (parser, memory, computation) whose correctness we're trying to examine and adjudicate?

#Why can't we define the element of 'last unit in a derivation'? In LaTeX, for example, if a reference comes before the point where the antecedent label was created, we simply compile twice in order to sort out the correct sequential numbering schema (see the sketch below). Who's to say the mysterious brain doesn't do the same, engaging in one derivational cycle to get its bearings of length, then recompiling to set proper relations. But, oops, did I just use the sort of computational analogies to the mind that I've been rallying against? If so, I mean to say the two are simply related without supposing identical functionality--the brain might not 'recompile', but it can do whatever it does in language processing twice, as nebulous as that may seem.
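To make the LaTeX analogy a little more concrete, here's a minimal two-pass sketch in Python (the item structure and names are my own invention, not anything from the article):

```python
# Toy two-pass resolution of forward references, in the spirit of LaTeX's
# "compile twice" trick: pass 1 learns where every label lands, pass 2
# fills in references, including ones that precede their label.

def resolve_references(items):
    """items: a list of ('label', name), ('ref', name), or ('text', s) entries."""
    # Pass 1: walk the whole sequence once to record label positions.
    positions = {}
    for i, (kind, value) in enumerate(items):
        if kind == 'label':
            positions[value] = i
    # Pass 2: with the positions known, every reference can be resolved.
    out = []
    for kind, value in items:
        if kind == 'ref':
            out.append(f"see item {positions.get(value, '??')}")
        else:
            out.append(value)
    return out

# A reference that comes before its label is no problem on the second pass.
print(resolve_references([('ref', 'sec:end'), ('text', '...'), ('label', 'sec:end')]))
# -> ['see item 2', '...', 'sec:end']
```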

#In discussing Probe/Goal relations, what happens with Verb-final languages (final in the speech stream, even under an LCA linearizer)? Surely, the V-head whose phi-features need amending is encountered after the Noun Phrase Goal? This has to have been discussed somewhere by now, and if not, then I'm sad for linguistics.

#Who really knows what proto-languages looked like? If we think it's something like "Me Tarzan, You Jane", aren't we watching too many movies? If we're thinking along those lines for some other reason, is it because we're calling to mind infant speech? If the latter, do we explain the typology by invoking processing constraints on a young brain, and do we then hypothesize those same constraints on early humans? My beef: short of reanimating a caveman, we'll never know, as far as I can tell. Besides, the assumption of limited processing mechanisms is far from certain: just because it took us a long while to get around to the industrial revolution doesn't mean early people were simply slower. More likely it was a consequence of living in hunter-gatherer societies that lacked the food base to support a diversified class of thinkers and scientists. Look at the native peoples of Australia or the Americas. They aren't stupid and their brains aren't different; the geography and climate were simply stacked against them. Why should linguistic beings ages ago have been otherwise? That is, if their brains were different, according to some metric of biological anthropology, then how do we know they had language at all, whether or not it was 'simpler'?

#Maybe I'm missing something, but isn't an MLCA language still context-sensitive? The MLCA parser still engages in derivations just like the LCA parser does; the only divergence between the two is precedence order in linearization, where in one case we'd 'speak backwards', as Chris put it in an earlier class. If all this is correct, then I'm missing the strict and exclusive association between 'operational memory' and the LCA. Unless, of course, we mean that because an MLCA linearizer puts operators last and variables first (but maybe not in Verb-final languages), derivations like Agree are prevented from taking place. But if vacuous operators are okay, if Verb-final languages prevent us from making a statement on how to enact Agree, and if quantifiers can still come after variables in an LCA-linearized speech stream (i.e. object quantification), then this exclusive derivational-LCA relationship is tenuous, since the LCA stream does nothing uniquely special.

#Unifying these last two points: might the MLCA have allowed for context-sensitive processing, given that the LCA stream only inconclusively supports parsing ease from operator-variable arrangement? And might there have been no proto-language that's relatively 'simple'? If so, have we lost a motivation for keeping the LCA over the MLCA?

Monday, October 29, 2007

A Unified Theory....Maybe or Maybe Not

Here's a comment from the Phillips and Wagers paper about the mental effects of islands: the two authors noted a disparity in the locus at which island violations are detected. In head-first languages (or however much of the head parameter we as a discipline are still willing to entertain), the confusion point was hypothesized to be the verb; in verb-final languages such as Japanese, I believe, the salient point was detected before the verb, at the gap. The authors then questioned whether there was a test that would yield a uniform result, or whether there was a formal theory under which both disparate results could be nestled comfortably. One possibility is to fall back on a Whorfian hypothesis and claim that differing languages have an impact on how their speakers encounter the world, hence the divergent results in English and head-final languages and the resultant conundrum of which formal theory to back or eschew. Either we simply haven't found the right test (which is possible) or, and stay with me here, there is no single theory of language. I'm willing to entertain at least that a Saussurean division of concepts into vocabulary affects the range of thought in 'mentalese', but perhaps this effect isn't mighty enough to affect language processing. I'm not sure, as I'm not a psycholinguist. But, just as there's no match for everyone on eHarmony.com, a unified theory of grammar may be just as elusive given such results.

Thursday, October 25, 2007

<, {, ?

In the most recent meeting, we discussed extensively the relationship between two observed kinds of syntactic operations: External Merge (née Merge) and Internal Merge (née Move). Furthermore, relations were set up between them in this form:

(1) EM < IM
(2) EM { IM

The first relation is read as "external merge precedes internal merge", where precedence is established in some sense of derivational time, however one prefers it. The second relation is read as "external merge is preferable to internal merge", at some given point of derivational time, possibly based on some notion of economy.

It was held, in class, that these two statements emerge from different sources, the former, at minimum, from empirical observation. And that, separate statements though they may be, they appear to lend support to one another.

Tim Hunter argued that at least one of them---the first one---may be entirely trivial given the second; if that is true, the second statement must cover every instance that the first is claimed to cover. After a brief discussion after class, I am even more inclined to agree with him than I was in class. Consequently, I---and possibly also he---am having difficulty engrokulating* what the attempt at making a distinction between (1) and (2) serves to explain, considering that it seems to be part of the basis for future discussion.

A major part of the distinction between the two appears to be the following reasoning and observations---which I may well have entirely misconstrued:

(3) a. Complex syntactic objects sometimes appear as the Initial Selection from the lexicon.
b. We never see a movement/IM within these objects before some EM occurs. If we did, we might see things like the "glob" and "gleb" verbs mentioned in class. We might also all get positive acceptability judgements for sentences like "The bucket was kicked by John's cohort."
c. Consequently, EM is prior to IM at least in this sense, so an IM never precedes every EM, even though in principle it could.

The bolded statement is the problematic statement, embedding an assumption that is rather too profound for both me and Tim, if I may take the liberty to speak for him in this. When does movement/IM occur? Typically, as I have understood minimalist syntax, it happens when an uninterpretable feature must be checked/valuated/whatever. So what situation is (1) actually blocking? It is blocking the situation in which a complex syntactic structure is selected wholesale from the lexicon, and it has an uninterpretable feature that needs checking at that point in the derivation.

If this is a situation that we need (1) to block, it follows that such an object must exist in the lexicon. Perhaps it is only my limited mind, but it's very hard for me to construct such a lexical item. Idioms like "kicked the bucket" were bandied about as unpassivizable (perhaps!) complex lexical units. But passivization and other operations typically always happen after a merge/EM involving T or something.

So if there are no such objects (an assumption of mine you may challenge), then it's difficult to see a nontrivial distinction between (1) and (2) that gives an independent meaning to (1).
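For what it's worth, here is a toy sketch (in Python, entirely my own construction, not anything presented in class) of the reasoning: if no lexical item is both complex and carries an unchecked uninterpretable feature, then a step-chooser that simply implements preference (2), taking EM whenever EM is available, never performs an IM before the first EM, and (1) seems to add nothing.

```python
# Toy derivation step-chooser. Syntactic objects carry a set of unchecked
# (uninterpretable) features; IM is only ever triggered to check such a feature.

class SO:
    def __init__(self, label, unchecked=(), parts=()):
        self.label = label
        self.unchecked = set(unchecked)   # features still in need of checking
        self.parts = list(parts)          # empty for simple lexical items

def next_step(workspace, lexicon):
    """Preference (2): take EM whenever an item remains to be merged, else IM."""
    if lexicon:                            # EM is available
        return 'EM', lexicon.pop()
    for so in workspace:                   # otherwise look for an IM trigger
        if so.unchecked:
            return 'IM', so
    return 'done', None

# Initial Selection: a complex object straight from the lexicon, but with no
# unchecked feature -- the only kind of complex lexical item I can imagine.
idiom = SO('kick-the-bucket', parts=[SO('kick'), SO('the'), SO('bucket')])
lexicon = [SO('T', unchecked={'EPP'}), SO('John')]
workspace = [idiom]

op, item = next_step(workspace, lexicon)
print(op, item.label)   # -> EM John : external merge fires before any internal merge
```

The situation that (1) is supposed to block would require something like `idiom` to come with an unchecked feature of its own, and that is exactly the sort of lexical item I'm having trouble constructing.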

*engrokulate = to cause to be grokked.

Saturday, October 20, 2007

Looking For Alternative Gurus

In the paper, C&M mention that a context-free grammar can in no way represent the interior of a speaker/hearer, and that we have to instead posit something akin to a transformational model. I'm not sure I entirely agree with this. Really, context-free grammars have done quite a bit of work in the NLP domain, and (S -> NP VP) is quite a standard thing to see; given enough rules with sufficient generality to allow for embedding and such, I don't see why this can't cover the range of acceptable sentences in a given language (a toy sketch below shows the kind of thing I mean). I mean, do we really need transformations? We've gotten rid of D- and S-structure, so why not dispense with the entire paradigm? There are theories out there that do without such frivolities. Shouldn't we as informed consumers expand our intake and shop around? Quite frankly, if it's the goal of this class to critique the foundations of the Generative enterprise, shouldn't we be reading actual critiques? Lakoff, Pollard and Sag...."Women, Fire, and Dangerous Things" is actually a great read. I just don't think rehashing what Chomsky has to say will give us any insight into the potential flaws of the discipline that others with alternative formulations may have already noticed.
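Here's that toy sketch: a handful of context-free rules with one recursive rule already generate unboundedly embedded sentences. The grammar and lexicon below are invented purely for illustration, not meant as a serious grammar of English.

```python
import random

# A tiny context-free grammar of the S -> NP VP variety, with one recursive
# rule (VP -> V S) to get embedding.
GRAMMAR = {
    'S':  [['NP', 'VP']],
    'NP': [['John'], ['Mary'], ['the', 'chef']],
    'VP': [['left'], ['thinks', 'S']],      # 'thinks S' is the recursive rule
}

def generate(symbol='S', depth=0, max_depth=6):
    if symbol not in GRAMMAR:               # terminal: just a word
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                  # cap the recursion so generation halts
        rules = [r for r in rules if 'S' not in r]
    words = []
    for sym in random.choice(rules):
        words.extend(generate(sym, depth + 1, max_depth))
    return words

print(' '.join(generate()))   # e.g. "John thinks Mary thinks the chef left"
```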

Monday, October 15, 2007

Monday lulz: the computations of fashion

From Legal Theory Blog (hat tip Sullivan).

The relevant quote for us:

The reason why analytic philosophers (and similarly mathematicians and cognitive scientists) have a difficult time dressing themselves or dress poorly is that the satisfaction of any sentence involving the "goes with" relation is not finitely decidable. There is no algorithm by which one can in a finite amount of time, much less in the morning before you are too late for class, decide with deductive certainty whether an outfit is sharp and properly accessorized. Now, there are rules by which we can rule out entire classes of ordered pairs, e.g., let x be a member of the class of checked clothing and y be a member of the class of striped clothing, it is fairly trivial to show that for all such x and all such y, Gxy must be false (I leave it as an exercise to the reader to provide a proof). But for the general case there is no finitely executable decision procedure such that for any two arbitrary articles of clothing one may determine the satisfaction of G.
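Just for the fun of it, the one decidable corner the quote concedes is easy to write down (a throwaway Python sketch of my own, obviously not something from the Legal Theory Blog):

```python
# The tractable fragment of the "goes with" relation G: any checked item paired
# with any striped item is ruled out wholesale; everything else is left
# undecided, since (per the quote) the general case has no decision procedure.

def goes_with(x, y):
    """False for the checked/striped class; None stands in for 'undecidable'."""
    if {x.get('pattern'), y.get('pattern')} == {'checked', 'striped'}:
        return False
    return None

print(goes_with({'pattern': 'checked'}, {'pattern': 'striped'}))  # False
print(goes_with({'pattern': 'plain'},   {'pattern': 'striped'}))  # None
```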

Wednesday, October 10, 2007

The Case for Case

Send me back to Ling-610 if you want, but I still find something fishy or funny about the claim that Case is a vacuous, meaningless item void of any interpretation whatsoever. We have to admit that there's a high coincidence, especially in more synthetic languages, between Case marking and theta-role interpretation. In fact, the first thing I go to when learning a language is the Case system, as that's instrumental in expressing a thought (the second would be relative clause markers). Besides, is there anything we could pinpoint as a theta-role marker aside from Case, or, in some languages, word order, or, in some languages, both? I dearly hope the point of disparaging Case marking is not to light a candle at the altar of the supremacy of word order; just because some languages like English don't have profligate Case marking doesn't mean that Case is worthless apart from allegedly driving the syntactic process by highlighting the availability of an alleged Goal. After all, in languages with free word order, Case marking is the only savior in decoding the object-action schemas. And please don't tell me that that device is better served by a 'scrambled' underlying word order, as that just smacks of English hegemony.

So, the point up to here is that Case and Theta seem to overlap quite a bit. As for the two distractors offered in class, ECM and Passive, I think they're trivial. For one, the passive is a marked form. In English, there's something about the 'be + participle' form that tells you Case interpretation isn't what it normally is. There's a Passive marker in Arabic telling you the same thing. In German, too. Secondly, the fact that the Agent in the subordinate clause of an ECM verb is marked with the Accusative form may only be a synchronic fact. Has anyone looked at the history of this construction? Funny things happen all the time in the course of language development. Greek (modern, I think) has no infinitive, but that doesn't mean the infinitive is a useless form; on the contrary, when Greek lost the final 'n' in cases, the infinitive looked identical to the third person singular. Consequently, people started to reinterpret the syntax of control infinitivals. But the fact that Greek lacks an infinitive shouldn't be used as evidence for some theory, because it had an infinitive at some point. Perhaps the same holds for the English case.

As an endnote to all of this, I just discussed these views with a fellow linguist trained in the Generative tradition who also happens to have a PhD in theoretical syntax, and he agrees with the above argument. Why not send him back to Ling-610?

Monday, October 8, 2007

Statistics and the independence of syntax and semantics

I know we're moving on to animal communication, but I had a comment on last week's materials that I felt should be made. It seemed to me that the "take-away" lesson we converged on during our discussion of the LSLT excerpts was that Markovian processes, whether hidden (over POS tags), high-order (lots of history), or both, were dismantled by Chomsky as implausible accounts of natural language phenomena. This is undoubtedly true, but not even remotely controversial for most folks who are interested in statistical models of language. What I thought was also quite evident from reading LSLT but wasn't really addressed was the fact that Chomsky also argues for the independence of meaning and grammaticality (I know this sounds so obvious as to almost be another example of the inanity of people who "work on statistics"). However, the implications of this claim are actually extremely precise for any statistical model. Specifically, we know that the independence of two events A & B has the following properties:
  1. P(A,B) = P(A)P(B)
  2. P(A|B) = P(A)
  3. P(B|A) = P(B)
Therefore, if we are interested in "grammaticality" and are working with a model which conflates meaning and grammaticality (i.e., one that models P(A,B)), we must immediately doubt whether we can really even address the latter without factoring out the former. Conversely, if we have a model which predicts both, we would expect their relationship (if Chomsky is right) to conform approximately to the relationship expressed in (1).
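To make the point in (1) concrete, here's a back-of-the-envelope check in Python; the counts are invented purely to illustrate the arithmetic, not drawn from any real corpus:

```python
# Toy check of independence: compare the empirical joint P(A,B) against the
# product P(A)P(B), where A = "string is grammatical" and B = "string is meaningful".
counts = {  # (grammatical?, meaningful?) -> number of observed strings (made up)
    (True,  True):  40,
    (True,  False): 10,   # grammatical but meaningless ("colorless green ideas...")
    (False, True):  10,
    (False, False): 40,
}
total = sum(counts.values())

p_a  = sum(v for (a, b), v in counts.items() if a) / total   # P(A)
p_b  = sum(v for (a, b), v in counts.items() if b) / total   # P(B)
p_ab = counts[(True, True)] / total                          # P(A,B)

print(f"P(A,B) = {p_ab:.2f}   P(A)P(B) = {p_a * p_b:.2f}")
# If grammaticality and meaning are independent, the two numbers should roughly
# match; with these invented counts they don't (0.40 vs 0.25), so a model trained
# on data like this would be conflating the two.
```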

Any thoughts? Did I miss something obvious?

Wednesday, September 26, 2007

Wouldn't it be Interesting......?

I for one would really like to see the HCF Hypothesis 3 proven wrong. That would be an interesting bit of happenstance. Perhaps we're so conditioned to believe that humans are the only ones lucky enough to indulge in linguistic behaviors that we're oblivious to what's actually going on in nature -- much like Jane Goodall's professor, who was adamant that only humans solved problems with tools. Hence my inquiry about recursion. Are we sure birds lack it? I haven't studied this enough to know many of the facts, but I'd be happy to review them. On the other hand, I know it's probably difficult to deny that recursion obtains in language use (except maybe for the Piraha), but HCF did leave the door open to such a negation when claiming that that capacity may be a characteristic of other cognitive systems, such as navigation and social interpretation. Given this, might there not be at least analogues of such a capability in other species -- other navigators, or other beings that interact with others like themselves? I personally won't be so quick to write off birdsong as finite-state output simply in deference to a theory, which is simply and only a theory. But maybe this is because I'd just be tickled to see it dashed in the name of science.

Friday, September 21, 2007

Thoughts on Evolution of Language Readings

So, I've read the Hauser/Chomsky/Fitch paper, and a few things struck me. For one, they claim that the language faculty, okay, syntax, is a (near) optimal mechanism for connecting the sensory-motor system to the conceptual one. If true, why should displacement be a part of it? Why should recursion be a part of it? Ideas may be expressed in discrete units; the Piraha, in fact, may lack recursion altogether. Of course, at the end of the article, the authors claim that recursion may have evolved independently in a domain-general fashion, becoming specialized only later. I'm willing to accept this conclusion, but it says nothing about how our little group characterizes language. All of that is to say, then: are the supposed feature-checking mechanisms optimal? For all our spouting of 'elegant rules', I would think a system without movement and D-features would be a better exemplar of the cleanliness of mental operations. I believe there are theories out there that do without movement, so maybe being an evolutionary linguist means taking a hard, difficult look at the level of theoretical complexity that we've devised.

Furthermore, if it turns out that recursion isn't a part of the language faculty (per the suggestion that it's used for other mental concerns), does this mean that birds also have language, albeit with a much reduced vocabulary? I believe the article said that was a limiting factor in the classification of their abilities. Along this line, there is a professor at Duke who is studying songbird neurobiology in order to decode how language is learned, eventually applying it to human neurobiology:

Department of Neurobiology, Duke University Medical Center, Durham, North Carolina 27710, USA
Address for correspondence: Eric D. Jarvis, Department of Neurology, Duke University Medical Center, Box 3209, Durham, NC 27710, USA. Voice: 919-681-1680; fax: 919-681-08772. jarvis@neuro.duke.edu

http://www.jarvislab.net/
Vocal learning, the substrate for human language, is a rare trait found to date in only three distantly related groups of mammals (humans, bats, and cetaceans) and three distantly related groups of birds (parrots, hummingbirds, and songbirds). Brain pathways for vocal learning have been studied in the three bird groups and in humans. Here I present a hypothesis on the relationships and evolution of brain pathways for vocal learning among birds and humans. The three vocal learning bird groups each appear to have seven similar but not identical cerebral vocal nuclei distributed into two vocal pathways, one posterior and one anterior. Humans also appear to have a posterior vocal pathway, which includes projections from the face motor cortex to brainstem vocal lower motor neurons, and an anterior vocal pathway, which includes a strip of premotor cortex, the anterior basal ganglia, and the anterior thalamus. These vocal pathways are not found in vocal non-learning birds or mammals, but are similar to brain pathways used for other types of learning. Thus, I argue that if vocal learning evolved independently among birds and humans, then it did so under strong genetic constraints of a pre-existing basic neural network of the vertebrate brain.

Wednesday, September 19, 2007

Reactions to September 19 class

Ok, my first reaction: Whew! As in, "Whew, it's over..." :-)

Seriously, though, I appreciate the really useful discussion. Which has two readings, doesn't it: "I appreciate the (entire) discussion, which was really useful", versus "I appreciate the useful subset of the discussion (in contrast with the useless bits)". I meant the former.

That's all from me for now. I'm looking forward to seeing how the discussion progresses.

Thursday, September 13, 2007

That function

Mwaha. I have been given the keys to the kingdom. Nothing can stop me now!

So, I get that we're now sort of converging on the idea that we want to characterize human language as a function from some set of representations to some form of kind-of-sort-of set membership measure---e.g. (PF, LF) -> {0,1} or (numeration, PF, LF) -> [0,1] or whatever.

But something has often struck me as a little odd when we go this route. It doesn't seem to have that much to do with the machinery of linguistic computation. If we worry whether there is an equivalence between the kind of memory that a Turing Machine has and the kind that the brain has, well, it seems that the machinery of linguistic computation is of central importance. So why aren't we instead characterizing the function as two (potentially inverse) functions: PF->LF (parsing) and LF->PF (generation), rather than attempting to characterize grammaticality judgements on (PF, LF) pairs?
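To pin down the difference in type signatures I'm gesturing at, here's a sketch in Python (the type names and helper are my own made-up illustration, nothing official):

```python
# Two ways of carving up "that function", written mostly as type signatures.
from typing import Callable, Set, Tuple

PF = str   # stand-in for a phonological representation
LF = str   # stand-in for a logical form

# The judgement-style characterization: a verdict on (PF, LF) pairs.
Judge = Callable[[Tuple[PF, LF]], bool]

# The machinery-style characterization: two (potentially inverse) mappings.
Parse    = Callable[[PF], Set[LF]]   # parsing: sound to meaning(s)
Generate = Callable[[LF], Set[PF]]   # generation: meaning to sound(s)

def judge_from(parse: Parse, generate: Generate) -> Judge:
    """A judge is recoverable from the two mappings..."""
    return lambda pair: pair[1] in parse(pair[0]) and pair[0] in generate(pair[1])
# ...but going the other way means enumerating candidate LFs or PFs for a given
# input, which is exactly where questions about the machinery (memory, time,
# computation) start to matter rather than just the extension of the function.
```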

Or am I missing something fundamental, it's late and I'm just wooly-headed, etc, etc?

Wednesday, September 12, 2007

Plans for September 19 class

I've added details for next class, including readings, on the schedule. In addition to requiring Abney (thanks, Tim, for posting the links!), I recommend Manning and Sorace/Keller.

Steven Abney paper for 19 September

http://www.vinartus.net/spa/95c.pdf

Also in the locker here:
http://www.ling.umd.edu/locker/ComputationsOfLanguage/Abney95c.pdf

Friday, September 7, 2007

Reactions to Class 2 and to reading for Class 3

The title says it all....I've posted some comments on the initial 'Reactions to Class 1 and Reading for class 2' and simply didn't think to initiate the post.

Wednesday, August 29, 2007

August 29 class -- reactions after class

Well, I was really pleased with how things went! People had very interesting things to say, and I felt as if the mix (computational, linguistic, etc.) was a good one.

Here are the pointers to the papers from today:

Marr, David. Vision. W.H. Freeman, 1982
http://web.archive.org/web/20051227154554/http://www.psych.upenn.edu/backuslab/psyc111/Readings/Marr_Chapter1.pdf

Kosslyn, S. M., and Maljkovic, V. (1990). Marr's metatheory revisited. Concepts in Neuroscience, 1, 239-251.
http://www.wjh.harvard.edu/~kwn/Kosslyn_pdfs/1990Kosslyn_ConceptsInNeurosci1_MarrMetatheory.pdf

Folks should comment on this posting in order to create their after-class reaction pieces.

For next class (and the one after), these readings are worth looking at:

"Turing Machine", Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/turing-machine/

Big-O Notation, http://en.wikipedia.org/wiki/Big_O_notation
Optionally,

"Church-Turing Thesis", Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/church-turing/

"Computational Theory of Mind", Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/computational-mind/

There will soon be a posting of a pointer to a PDF file for Howard's readings.

How to post reactions

I think it would make sense for each class session to have a single blog posting that includes everyone's reactions before and after as comments. Whoever gets there first can get the honor of creating the initial posting, and everyone else can add comments (and comments on comments, etc.) after that.

Chris and I will show you an example in a second...

(Of course, please also feel free to do fresh new postings on other related topics if you are inspired to do so.)

Welcome to The Computations of Language!

Ok, so we're going to try out Chris's suggestion and see how a blogging setup works as a communication mechanism for the class -- at least in terms of people's reactions and discussion. I'll also set up a regular old mailing list for class announcements, and I'll make sure to quickly provide pointers to the syllabus, readings, etc.

In terms of structure, just a reminder that Juan and I would like folks to now enter the virtuous cycle of (a) reacting to the previous class, (b) doing the readings for the next class, (c) reacting to the readings by, say, 5pm Tuesdays, (d) reading everyone's reactions in advance, (e) optionally returning to step d and reacting to people's reactions, (f) ... ok, you get the idea. Lather, Rinse, Repeat. (I don't think I would have written that if this were not a blog. Hmmm, wonder what this medium does to people...)

As always, also please feel free to e-mail Juan and/or me privately with any concerns, issues, discussions, etc.