Ok, my first reaction: Whew! As in, "Whew, it's over..." :-)
Seriously, though, I appreciate the really useful discussion. Which has two readings, doesn't it: "I appreciate the (entire) discussion, which was really useful", versus "I appreciate the useful subset of the discussion (in contrast with the useless bits)". I meant the former.
That's all from me for now. I'm looking forward to seeing how the discussion progresses.
Wednesday, September 19, 2007
5 comments:
Ok, I think I am going to go first, if only because if I don't post now, I am going to forget what people said, which makes up a substantial part of this post.
First of all, I am *really* glad that I sat in on today's class. As I mentioned in the first lecture, my background is in Electrical and Computer Engineering and I came to computational linguistics purely by way of Computer Science without having any formal training in the discipline of Linguistics.
I think that this lack of background, while disadvantageous in most ways, was actually a big advantage for me in today's discussion because I got to hear the linguists' points of view on a premise whose truth I have always just taken at face value (because of my training). And that was extremely instructive!
Most of the arguments that I heard against Abney's proposals gave me pause at some point or other - probably because I am completely unfamiliar with how theoretical syntacticians conduct research. I just couldn't tease apart the crux of any argument against the "squeezing the balloon" argument that Philip brought up. However, I think the strongest argument was made by Norbert. I want to make sure that I understand it and so I am going to try and repeat it here. The crux of Norbert's argument, IIRC, is that unless there is independent motivation and corroboration to impose probabilities onto the algebraic representational structures proposed by the (discrete) theory of syntax, there is absolutely no reason to assume that that is something we should just do. I also really liked the analogy he gave - Newtonian Physics and Thermodynamics. I believe that this argument makes perfect sense in any scientific domain, not just Linguistics.
However, I have a couple of follow-up questions that would help me understand this a lot better:
(1) What kind of independent corroboration or motivation would convince syntacticians to try and formally add the statistical layer on top of the discrete algebraic structures of syntax?
I remember Norbert distinguishing between two ways of adding such a layer. The first was to add probabilities only as a method for approximate calculations. The second, and this is the one that I refer to in the question above, was to actually realize that the laws (or structures) themselves are stochastic in nature and require probabilities in order to even be compatible with the evidence.
(2) People mentioned that researchers in Linguistics have actually gone back and repeated their experiments incorporating some of what Abney said and the results are still exactly the same. Could someone shed more light (references are perfectly fine) on this?
That's all I can think of now. Please feel free to correct me if I misunderstood the argument. I am sure other people are going to have more intelligent things to say which might lead to further follow-up questions.
For those of you who don't know him, DesiLinguist is Nitin Madnani, a colleague of mine from my Other Universe---or, he introduced himself when I wasn't there? My bad.
I'm not close enough to the research to answer (2), but I think I have a concise way of putting the answer to (1).
The Stochastic Technique Relevance Criterion: do stochastic techniques tell us something about language that is not also equally likely to be true about the rest of the universe?
I suggest that the answer that syntacticians have generally accepted is "No." That doesn't mean that the techniques can't be *useful*. Nor do the ideas necessarily fail in explaining certain parts of acquisition.
BUT there is an irreducible component of language that is *different* from the rest of the universe, on which stochastic techniques/ideas haven't been shown to shed any better light. Until someone can show a plausible way to get "at" that irreducible component with stochastic technologies AND tell us why those techniques would tell us more than "easier", traditional techniques of syntactic investigation, there's not a lot of reason to pursue them. So far no one has shown this, including Abney, and it's not for lack of trying.
It happens that that component is what we are calling, here, the algebraic component of grammar.
And that's the answer to the "squeezing the balloon" critique, in my mind. Yes, a rule change results in perhaps complicated consequences for the rest of the system. That's actually not necessarily a BAD thing. That alone suggests a possibility to us---that perhaps the class of things affected by the change doesn't belong to the algebraic component of grammar, and instead belongs to the "rest of the universe" where stochastic technologies might apply.
I think one way of putting the reason why a lot of linguists (me included) are a little irritated by claims like Abney's is that, well, no one said that stochastic explanations *can't* be used to account for *anything*. But what syntacticians at least are trying to account for is precisely *how much* of the underlying system is *different* from what can be accounted for stochastically. Abney attempts to cut short that discussion, apparently not seeing much value in it.
A good analogy in my mind is the so-called Central Dogma of molecular biology. Discrete "algebraic" expressions---genes---are transcribed to RNA and then translated into proteins. That's pretty discrete and algebraic---and can be described by a set of tables, in fact.
But in "real life" the process and conditions of transcription are pretty stochastic. What activates a gene to be transcribed is a stochastic process at one level. At a lower level, the molecular interactions themselves are quantumishly stochastic. But there's an easily-identifiable middle layer that is discrete.
We have no reason to think that language has no such middle layer, and lots of reasons to think it does, despite the naysayers.
(All of this assumes that you recognize that "stochastic" is not just a placeholder for something we don't know about yet, of course. I'm not always willing to make that assumption, but now we drift into metaphysics. Hence the closing parenthesis...)
I hope this clarifies the matter.
Two things here:
[1]: The pushing-the-balloon effect has entered what we consider to be mental grammar proper, to a degree that I can't see that it's part of a separate system: When the switch was made to a phase-based system in syntax, because CP was considered a phase head, and because phase heads had to have escape Edges so things could eagerly move upward, we lost a lot of descriptive coverage given by Subjacency, e.g. things like '*what did you say the proof that you can't square' became confoundingly good again because Chomsky told us that we no longer believe in the effects of bounding nodes. We therefore can't star that sentence on the grounds that two bounding nodes (CP and NP) were crossed in one fell swoop. So, now, we have to make up different reasons why that sentence is bad, and I had one involving Genitive Case assignment and A-positions, but who will ever know if that's right? At any rate, I think it goes to show the downfall of rule-based systems' performance in regression tests when a change is made: other parts of the systemic coverage simply fall apart. That's not to say that statistical systems are immune from the same sorts of disheartening results, but, in this case, we can't simply write it off to some other component of the head.
[2]: We were discussing today whether or not a stochastic grammar is even possible. I mentioned that you could derive internal probabilities given your experience with language. It was then mentioned that this would involve a chicken-and-the-egg problem, whereby the first person to express language would have needed exposure to language to set their internal numbers. However, isn't this still a problem even without positing a statistical grammar? If we believe in the P&P approach to acquisition, and if we furthermore believe that you have to hear/see language to learn it (I can't see how someone wouldn't believe this) then how do we explain how the first language learners learned a language which hadn't existed before? So doesn't acquisition as we know it still confront a chicken-and-the-egg problem? If that's the case, then we can't really discount the experience-as-corpus argument from this direction.
Now, some would say that to take that last approach, we'd have to have some internal system that kept track of P(lang. instance)/P(Normalizer) to arrive at statistical calculations to assign to rules. Who's to say we don't have one? Children mustn't acquire every single thing they hear as a rule. Else they would hear someone stutter, 'the-the', in hesitation and encode that as a regular expression in the lexicon, using it in free variation with 'the', I would think. Even when we talk about parameter setting (do we still believe in parameter setting? Or has that gone the way of GB?), surely just one occurrence of a construction can't be enough to sway a hard decision? Even David Lightfoot said that there have to be so many occurrences (taken as percentages) for rules to become set.
So, in actuality, I don't see why a stochastic grammar given this formulation is so necessarily out of the question, given concerns about how those probabilities would be set and learned.
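To make the threshold idea above a bit more concrete, here is a minimal sketch in Python. It is purely illustrative: the names `ThresholdLearner`, `min_share`, and `min_count` are hypothetical, and the threshold values are made up rather than taken from Lightfoot or anyone else. It counts how often each form is heard and only treats a form as a "rule" once its relative frequency (its count divided by the total count, which plays the role of the normalizer) passes a threshold.

```python
from collections import Counter


class ThresholdLearner:
    """Toy learner: a form becomes a 'rule' only if it is frequent enough.

    min_share and min_count are hypothetical, made-up thresholds.
    """

    def __init__(self, min_share=0.05, min_count=3):
        self.min_share = min_share
        self.min_count = min_count
        self.counts = Counter()

    def hear(self, form):
        # Record one observed instance of a form.
        self.counts[form] += 1

    def share(self, form):
        # Relative frequency: count(form) / total observations (the normalizer).
        total = sum(self.counts.values())
        return self.counts[form] / total if total else 0.0

    def rules(self):
        # Forms that pass both the absolute and the relative threshold.
        return {
            form
            for form, n in self.counts.items()
            if n >= self.min_count and self.share(form) >= self.min_share
        }


learner = ThresholdLearner()
for _ in range(99):
    learner.hear("the")
learner.hear("the-the")  # a single stutter

acquired = learner.rules()
print(acquired)
```

Under these assumed thresholds the single stuttered 'the-the' never gets set as a rule, while 'the' does. Nothing here says how such thresholds would be fixed in a real learner; it only shows that "keep counts, ignore one-offs" is a coherent mechanism.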
In response to Nitin's request for references showing that more controlled grammaticality-judgment experiments yield virtually the same results as the original method (that is, syntacticians asking a native speaker of a language for a grammaticality judgment of a sentence): the most recent and salient source would be Jon Sprouse's PhD thesis, successfully defended in the UMD ling dept this July. You can find it in the linguistics pdf locker under Islands/sprouse - 2007...
Hope this helps!
Sarah:
[1] As to the "squeezing the balloon" problem, I completely agree that it _is_ a problem and that it is a problem for the mental grammar proper-- changing a rule to account for some data might make you unable to account for other data. (This is actually something that worries me a lot in my own research.)
That said, I think a really important and interesting thing to do would therefore be to come up with algorithms for testing rules against a lot of data at once. This, however, is independent of whether you see grammatical rules as algebraic or not.
However, I'm not entirely sure how a 'stochastic' component to the grammar proper would get around this problem. Surely you would still have cases where changing one rule would have broader implications than foreseen?
I may be ignoring or forgetting important examples, though-- so please correct me.
[2] As for the 'chicken and egg' problem, I think it was an argument that there should be a certain amount of _innateness_ in language: we should be predisposed to interpret linguistic input in certain ways. These might actually lead us to a language that does not entirely match the input (as in creolization or home sign or language change).
This is _not_ an argument about whether grammar is stochastic. All it speaks to is that the processes underlying language acquisition should be partially innate.