Thursday, September 13, 2007

That function

Mwaha. I have been given the keys to the kingdom. Nothing can stop me now!

So, I get that we're now sort of converging on the idea that we want to characterize human language as a function from some set of representations to some form of kind-of-sort-of set membership measure---e.g. (PF, LF) -> {0,1} or (numeration, PF, LF) -> [0,1] or whatever.

But something has often struck me as a little odd when we go this route. It doesn't seem to have that much to do with the machinery of linguistic computation. If we worry whether there is an equivalence between the kind of memory that a Turing Machine has and the kind that the brain has, well, it seems that the machinery of linguistic computation is of central importance. So why aren't we instead characterizing the function as two (potentially inverse) functions: PF->LF (parsing) and LF->PF (generation), rather than attempting to characterize grammaticality judgements on (PF, LF) pairs?

Or am I missing something fundamental, it's late and I'm just wooly-headed, etc, etc?

22 comments:

Asad Sayeed said...

To clarify, I can see why a *syntactician* would care about how representations map to {0,1}, in the aspect of the syntactic endeavour that attempts to characterize how some sentences appear but not others. It seems like in *this* particular intellectual voyage, we're more interested in how a sentence that *did* appear, appeared.

Tim Hunter said...

Here's my own attempt to quasi-formally justify the "standard" conception of how we study language ...

There can be no function from PFs to LFs, because we know that some sounds are ambiguous. Sometimes, upon hearing a particular sound/PF, humans compute a particular meaning/LF, but since the same PF can sometimes be associated with different LFs, this computation must have more than just the PF as an input. We don't know what it is, but to get things off the ground we can bundle all the input other than the PF into a big, inelegant, poorly-understood but arbitrarily fine-grained notion called something like "discourse context".

Then, one function which we can talk about, and which humans do compute, is
F : PF x D -> LF
where PF is the set of all sounds, LF is the set of all meanings, and D is the set of all discourse contexts.

If we agree that this function exists, then we must agree also that there is a particular relation R with PF as its domain and LF as its range, such that a pair (pi,lambda) is in R if and only if there exists some discourse context d such that F(pi,d) = lambda. So the pair (pi,lambda) is in R if and only if, in some discourse context, a human understands the sound pi to have the meaning lambda.
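(For concreteness, here's a minimal sketch of that step in Python, with toy stand-ins; the strings and the behaviour of F are placeholders, not claims about what real PFs, LFs, or discourse contexts look like.)

```python
from itertools import product

# Toy stand-ins; every name and string here is a hypothetical placeholder.
PF = {"John saw Mary", "I saw a man with binoculars"}
D = {"neutral", "talk of people carrying binoculars"}

def F(pf, d):
    """Stand-in for F : PF x D -> LF, one meaning per (sound, context) pair."""
    if pf == "I saw a man with binoculars":
        if d == "talk of people carrying binoculars":
            return "see(I, man-with-binoculars)"
        return "see-using-binoculars(I, man)"
    return "see(John, Mary)"

# R pairs a sound with a meaning iff SOME discourse context yields that meaning.
R = {(pf, F(pf, d)) for pf, d in product(PF, D)}
# R ends up pairing "John saw Mary" with one meaning and the ambiguous sound with two.
```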

Of course, eventually we would like to have a good understanding of how the function F is computed, because this is "the thing that actually happens in the real world". But R is interesting because it does a massive amount of the work required of F. What I mean by that is, there are a priori an infinite number of meanings that a human listener *could* arrive at upon hearing a particular sound pi in a particular discourse context d, of which F must pick out one ... but for any given sound pi, there is a quite astoundingly small number of meanings that F *ever* outputs when the input includes pi, no matter what d is. Usually only one (eg. "John saw Mary") or two (eg. "I saw a man with binoculars"). In other words, F has two inputs, a sound and a discourse context, and just by looking at the sound it seems to narrow the question "Which of the infinitely many possible meanings should I output?" down to "Which of these one/two/three possible meanings should I output?" Even though R is not a function, because there are sounds which it pairs with more than one meaning, it's "interestingly close to a function", in the sense that there are no sounds which it pairs with a very large number of meanings.

In contrast, the discourse context component of the input to F seems to do very little of the work. Whereas considering just the sound component of the input lets F narrow the infinitely many possible outputs (meanings) down to a very small number of candidates, considering just the discourse context component seems to leave *all* the infinitely many possible meanings as candidates. For example, even if a discourse context d is so chock-full of the idea of spotting people carrying binoculars that the sound "I saw a man with binoculars" is interpreted with the PP attached to the NP without even noticing the ambiguity, this won't stop the sound "I used binoculars to see a man" from being interpreted such that the binoculars were only a tool used for seeing, and had no connection with the seen man. In other words the possible meanings that can be output by the computation of F are not constrained by even the richest discourse contexts.

So there appear to be some very, very strict compatibility conditions on the (pi,lambda) pairs in R. No such compatibility conditions at all exist on the pairs in the "mirror" relation R', which contains the pair (d,lambda) if and only if some sound pi is such that F(pi,d) = lambda. (R' seems to be the entire Cartesian product of D and LF.) Observing this about R, keeping in mind that R is defined completely in terms of F, must tell us something about F. Whatever carries out the computation of F is getting bossed around pretty severely by something imposing a lot of rules about which sounds can go with which meanings.

If we ask where "computation" fits in to the study of this whole system, I think there are two very separate places.

The obvious one is in looking at how humans compute the function F. This is a true blue case of computation in all the standard ways, which the mind/brain actually carries out, and which is amenable to Marr-style analysis and so on. This is really hard, because we don't know much about (1) what discourse contexts are, or (2) how the brain computes anything at all.

The other one is in testing hypotheses about a particular aspect of how humans compute F: what the rules of sound-meaning compatibility are that the computation of F is subject to. In order to ask whether a particular set of rules could be the right ones, since we don't know much about the actual computation of F which these rules are fitting in to, we just ask whether a Turing machine could (hypothetically, were there a need) enumerate all and only the permissible (pi,lambda) pairs in R, given only those rules. This is the kind of computation that syntacticians call a derivation. The assumption is that if a Turing machine can't do this, then they can't be the rules imposed on the computation of F.
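(Again just for concreteness, a deliberately silly sketch of what "enumerate all and only the permissible (pi,lambda) pairs, given only those rules" could mean; the single recursive rule here is a placeholder, not a real grammar.)

```python
def enumerate_R():
    """Enumerate all and only the (pi, lambda) pairs licensed by one toy
    recursive rule: 'John is (very)^n tall' pairs with n-fold degree
    modification of tall(John)."""
    n = 0
    while True:
        pf = "John is " + "very " * n + "tall"
        lf = "very(" * n + "tall(John)" + ")" * n
        yield (pf, lf)
        n += 1

# The first few pairs the hypothetical machine would write down:
#   ("John is tall", "tall(John)")
#   ("John is very tall", "very(tall(John))")
#   ("John is very very tall", "very(very(tall(John)))")
```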

I think this is the picture that Norbert and Howard (and possibly Juan, though maybe only to the extent that he didn't want to lose his tenure) were describing at the end of class, but they should correct me if I'm wrong anywhere.

Some syntacticians, I suspect, are interested in the possibility that there could be a closer link between derivations and the computation of F. But what I've described above is my understanding of how the view where there need not be any real connection at all would be justified.

Some questions which I think it opens up are:
- Which computational properties (of the kind that are interesting when discussing real machines) of the hypothetical machine which enumerates/derives all the (pi,lambda) pairs in R are interesting/meaningful?
- What is the Marr-style level one/two/three description of the computation of F?
- What does a discourse context look like? (It probably doesn't contain the current temperature, it probably does contain something from the previous sentence uttered, ...)
- What parts of the mind/brain that let us compute F do non-human animals not have?

Some questions which don't make sense on this view are:
- Why does the hypothetical machine which enumerates/derives all the (pi,lambda) pairs in R fail to compute F?
- What is the Marr-style level two/three description of the hypothetical machine which enumerates/derives all the (pi,lambda) pairs in R? (The level one description is all we care about.)

sarah a. goodman said...

Why are we holding Turing machines to be a model of human computation, or, better said, psychological function, when computation could be a completely inappropriate word for what the extant mental reality actually is? I've been on psychiatric medications about which the biochemists admit they have little clue how they actually work, only that they seem to ameliorate symptoms. So, if the physicians can't decode neural functioning, what hope do psychologists have? Such is to say, we really have little idea what's going on upstairs, so I think it's a bit premature to hold a Turing machine to human standards.

Asad Sayeed said...

So, I'd first like to say that Sarah asks a good question, and that I'd like to maybe recast Sarah's question in the terms that Tim is using to see if it clarifies matters.

To summarize Tim as I see it: we do not start with PF->LF/LF->PF, because there is a Secret Special Sauce involved in those functions that makes them actually into functions (i.e., unambiguous output for a given input). This Secret Special Sauce is beyond present-day linguistic characterization, so we cannot take a direct stab at determining the nature of the "actual" function. Consequently, we retreat to determining the process by which one enumerates all valid (PF,LF) pairs, hoping that this process will expose enough of the machinery that, by the time the Secret Special Sauce comes around, we will be ready to integrate it in well-defined ways into our relation.

Here's where Sarah's critique comes in. Computationally, we tend to characterize these enumerating procedures using formalisms like Turing Machines. Well, it doesn't look like there's any reason to think that this kind of characterization will yield a single solution.

What do I mean by "single solution"? If our ultimate goal is the formal characterization of the "actual" procedures of generation and parsing, then it's not enough to yield a *range* of formal systems that are *notationally equivalent* to one another. We also need a criterion by which one formalization is *better* than another formalization, even if the languages they yield are equivalent.

We know from the history of linguistics for the last few decades that for any given minute little detail of grammar, people are willing to come up with any number of potentially notationally equivalent (...I think that's a useful term, Potential Notational Equivalence, don't you?...) formal explanations. I can't think of any reason why characterizing the enumerating relation wouldn't just yield a lot of PNE descriptions...

This suggests to me that, if the Secret Special Sauce is the barrier to moving directly to characterizing the PF->LF and LF->PF functions, then we actually need to step back and examine the content of the Secret Special Sauce before we do anything else.

Uh oh.

Alex Drummond said...

I don't think characterizing PF→LF and LF→PF can be an alternative to characterizing acceptable <PF, LF> pairs. Production and comprehension have a lot in common (e.g. they both make use of the same lexicon), and a good way to characterize what they have in common is to characterize the set of acceptable <PF, LF> pairs. Of course, it might turn out that production and comprehension are actually completely psychologically distinct, but that seems like the worst case, not something we'd want to assume from the outset.

Focusing exclusively on production and comprehension leaves out a lot of our linguistic ability which seems like it ought to be amenable to theorizing. For example, my ability to give the corresponding yes/no question for any given declarative sentence doesn't follow from my having an LF→PF function or a PF→LF function. It seems to have more to do with my knowledge of relations between pairs of LFs.

I think trying to move to a full-fledged theory of production and/or comprehension (i.e. one which included a theory of the Special Sauce) would be misguided. Actual language production and comprehension involves virtually the entire psychology of the speaker (at least potentially), so it's not obvious that any theory of it can be constructed at all. However, on the assumption that there are linguistic representations which look something like <PF, LF> and which satisfy certain formal properties, we can start to study very small subcomponents of language production and comprehension (e.g. the parser), without having to worry too much about the Special Sauce.

I guess all of that is just the standard argument for the performance/competence distinction. I think it's sometimes forgotten that the performance/competence distinction doesn't mean that syntacticians are free to ignore results in psycholinguistics (or vice versa). From the fact that there is such a thing as competence as distinct from performance, it doesn't follow that we can study one without studying the other. AFAIK, Chomsky has always maintained that studies of performance systems could in principle have implications for theories of competence. However, he's also argued that in practice we know too little at the moment to be able to make many interesting connections.

Tim Hawes said...

In response to Tim’s entry….

While I think Tim’s entry nicely answers Asad’s question, I don’t think it’s a necessity that there be a “discourse context” or any other “special sauce” as input to the function that associates PF and LF. While this may well be the case, it is possible to imagine things from other plausible angles.

For example, nothing stipulates that for any given π language only happens when the appropriate (intended) λ is found. Personally, I'm ok with saying that the computations of language are proceeding just fine if ambiguous sentences don't arrive at the correct meanings given the intention of the utterance. So, I can picture a computation associating PF and LF devoid of this discourse context, D. So, an ambiguous π could pair with λ1, λ2…λn, each of which maps to 1, and the computation is perfectly happy with that fact. Selection of the appropriate λ, given D, could come after the pairing of π and λ. While this would mean that there is indeed a function F: PF x D -> LF, it doesn't imply that this function would ever actually be necessary for computations of language.

This could go another way too (I think much less plausibly, but I'm not convinced it's impossible), where PF isn't really synonymous with "sound"; rather, PF is composed of non-ambiguous symbols. You could then have a function over D and sound that yields an unambiguous selection of symbols that make up PF. You then could have a function from LF to PF. This again doesn't rule out a coherent function like F; it just means that such a function isn't a necessity.

These examples aren't intended as any particular theory I want to adhere to, rather they are why I don't think a function like F is a necessity.

On to one of Tim's questions, ignoring my above comments… Why wouldn't the current temperature, position of the sun, or any other seemingly extralinguistic information be in the discourse context? Even if it is solely used as a disambiguator (which I imagine it wouldn't be), there are probably examples of sentences that would require the state of the environment as context for disambiguation. "I just saw a man with night-vision goggles" ought to be far less ambiguous on a sunny afternoon than it would be in the middle of the night given no prior discourse, if you know that you aren't likely to see much of anything during the day with night-vision goggles. (Sorry, this is a pretty lame example, but I'm having trouble coming up with anything better.) I agree with Tim that the discourse context isn't doing the bulk of the work, but if I'm right here, it might need to consider the bulk of the available information. That seems odd to me.

Asad Sayeed said...

You know your use of fishy symbols is not portable, right? Blogger just ain't cool enough to do it correctly.

sarah a. goodman said...

Re: Tim Hunter's comment on the infinitude of possible meanings for a given sound sequence...is it really infinite? Aren't we forgetting about (strict) compositionality here? It may be infinite before any so-called language processing gets a hold of it, just as a matter of formal supposition, I'm willing to maybe entertain, but, even so, the character of a sound sequence constrains the meaning that can be wrought from it. If I say the word "dog", there can't be 1 million meanings attached to that outside our internal processing of it; if anything, it should have none in that situation, if you take meaning to only be bestowed once it gets received by a hearer, sort of the equivalent of a tree falling in a deserted or populated forest. But that's pure philosophy, and, furthermore, something which I don't believe you can really prove one way or t'other. So, bottom line, what's to prevent us from claiming that a PF has no meaning until processed in grey matter (somehow, we don't really know how, mind any one of you), whereupon the lexical definition of each semantically significant segment affects those of other segments in a particular way determined by sociolinguistic constraints of syntactic compositionality (because SVO vs VSO etc. is nothing but a social construct, says a professor of mine, and I'm inclined to agree), which leads us to a set of tightly constrained meanings for a sentence? I don't see this as a process of picking out one out of many meanings; I see it as a process of constructing meaning from a blank slate. Can anyone really, honestly tell me I'm wrong? Probably not. Can anyone tell me I'm right? Again, probably not. An argument can be made either way, I'm sure.

Tim Hawes said...
This comment has been removed by the author.
Tim Hawes said...

Really, Asad? They show up just fine for me in Firefox 2 and IE 7 (on Windows), with no complaints from Blogger of illegal symbols (unlike when I try to use the sub tag, for example).

Do they at least show up as something identifiable as lambda and pi, or do I need to repost?

Chris said...

In some of the comments (which I may well have misunderstood, because I'm outclassed by nearly everyone in terms of my understanding of syntactic etiquette), I sensed a certain despair about the state of affairs once we had taken the necessary step of trying to think about language without discourse context/world knowledge (Tim's D). The temptation, it seems to me, is that once we've taken that step, we can dispense with anything (computational tractability, for example), since we're dealing with an idealization and/or abstraction. I don't think things are that bad; with a few more (reasonable?) assumptions, they aren't as bad as they seem. It seems that if I take Tim's formalization of language as F: PF × D → LF and mix it (curry it?) with the Fodorian modularity hypothesis, which says that things like discourse and world-knowledge (i.e., D) live at a level above the input modules where things like PFs are converted into LFs, then we have a fairly strong prediction about how this function will be implemented. The modularity hypothesis tells us that D necessarily cannot enter into the picture until the lower-level information-processing modules have delivered their outputs (whatever the granularity in the input systems really is, Fodor is quite clear about them not having access to D), so we are licensed to do several things. 1) Ask how it is we might compute a set of LFs from a PF. 2) Ask if, for the sake of efficiency, the lower-level system is optimized/becomes conditioned to deliver certain LFs faster or with higher probability than others. 3) Exclude certain relationships between PF and LF because, even without knowing the specific details of the implementation, the lower bound on computational resources would make them impractical to compute given any implementation.
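Here's a rough sketch of that curried, two-stage picture; the function names and the toy data are hypothetical, and nothing hangs on the details:

```python
def input_module(pf):
    """Encapsulated stage: PF -> a small set of candidate LFs, with no access to D."""
    candidates = {
        "I saw a man with binoculars": {
            "see-using-binoculars(I, man)",
            "see(I, man-with-binoculars)",
        },
    }
    return candidates.get(pf, set())

def central_system(candidate_lfs, d):
    """Later stage: discourse context D picks among the delivered candidates."""
    if not candidate_lfs:
        return None
    preferred = "see(I, man-with-binoculars)" if "binocular" in d else None
    return preferred if preferred in candidate_lfs else sorted(candidate_lfs)[0]

def F(pf, d):
    # The curried composition: D only enters after the input module has finished.
    return central_system(input_module(pf), d)
```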

I like this viewpoint. On the other hand, Fodor may be completely wrong. We'll have to ask him in November.

Asad Sayeed said...

Um, I think there's something I said that's perhaps figuring too strongly in this discussion. I am not demanding a one to one mapping between PFs and LFs. I am perfectly happy with ambiguity. So F:PF->P(LF) and F':LF->P(PF) don't bother me. If that's what it takes to get rid of the discourse context as disambiguator. I'll call these the Conversion Functions.

That's *still* a slightly different project from (PF,LF)->[0,1] (which I'll call the Enumerator). The Conversion Functions at least encourage us to consider the procedural paths to output sets. Characterizing the Enumerator gives us greater freedom to fall into ranges of explanations that are merely descriptively adequate, rather than explanatorily adequate.

An analogy might be the algorithms that characterise membership in NP. For a problem to belong in NP, there must exist:

1. An algorithm that verifies (problem-instance,solution) pairs in at most polynomial time. (analogous to descriptive adequacy)

2. An algorithm that converts a problem instance into a solution in at most exponential time. (analogous to explanatory adequacy)

The Enumerator gets us something like (1). So if we are given a PF, then we can potentially write an Enumerator that tests every possible LF until we get a (PF, LF) pair that fits.

But there's no guarantee in my mind that (1) is anything like (2), just as there is no guarantee in the criteria for NP-membership as far as I know.
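(To make the analogy concrete with a stock NP example, subset sum; the sketch is purely illustrative and nothing about it is language-specific.)

```python
from collections import Counter
from itertools import chain, combinations

def verify(numbers, target, candidate):
    """Polynomial-time check of a (problem-instance, solution) pair -- the analogue of (1)."""
    drawn_from_instance = not (Counter(candidate) - Counter(numbers))
    return drawn_from_instance and sum(candidate) == target

def solve(numbers, target):
    """Brute-force search that just plugs the verifier into an enumerator of
    candidates -- the analogue of (2); exponential time in the worst case."""
    candidates = chain.from_iterable(
        combinations(numbers, r) for r in range(len(numbers) + 1))
    for candidate in candidates:
        if verify(numbers, target, candidate):
            return candidate
    return None

# e.g. solve([3, 7, 1, 8], 11) finds a subset summing to 11, here (3, 8).
```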

Tim #1 suggested D (the discourse context) as the Secret Special Sauce that gets us from Enumerator to Conversion Function, as I understand it. But we can factor D out by allowing the Conversion Functions to provide every possible conversion.

So we're left with no more Secret Special Sauce? Consequently, I'm even more dubious about the value of the Enumerator characterization.

Asad Sayeed said...

Sadly I'm using firefox 1 on a UMIACS administered machine, and it doesn't show up as anything like pi or lambda.

Asad Sayeed said...

...but that's OK because I can see them perfectly well on Firefox 2 on my home machine/notebook.

Tim Hunter said...

Tim writes: "Why wouldn’t the current temperature, position of the sun, or any other seemingly extralinguistic information be in the discourse context. Even if it is solely used as a disambiguator (which I imagine it wouldn’t be), there are probably examples of sentences that would require the state of the environment as context for disambiguation."

Totally true. I agree that pretty much anything can affect the choice of interpretations. The distinction I had in mind, but which I did a rather spectacularly poor job of expressing and finding examples of, was just the distinction between (1) things which might conceivably be "automatically" considered as useful information to take note of, because they're part of the speech act or something, and (2) things which are useful if the content of the utterance being understood happens to be related to them. A better example of (1) might be the tone of the speaker's voice, whereas something like the weather probably falls into category (2). But I can't think of anything that's *never* useful information of type (2), so absolutely anything the listener is aware of might be relevant, as you say.

And on that note, Alex writes: "Actual language production and comprehension involves virtually the entire psychology of the speaker (at least potentially), so it's not obvious that any theory of it can be constructed at all."

I think this is a really good point. I sometimes wonder whether there's any reason to believe that the study of how we process language needs to be more tightly tied to the study of our declarative knowledge of language, than it does to the study of how we process other things. Suppose we had perfect knowledge of how the brain actually carried out everything except language processing (mental arithmetic, motor functions, vision processing, etc.), right down to Marr's level three, but we still hadn't figured out what the constraints on sound-meaning compatibility imposed by the language faculty were. Then, one day, suddenly a bright syntactician/semanticist has a great idea which clears everything up, and we can write down a complete and detailed grammar for deriving exactly the right sound-meaning pairs. How sure are we that it wouldn't be completely obvious then, how the brain actually processes a sentence? Probably no one has any idea whether this is true or not at the moment, but I think it's at least possible that there wouldn't be much left to say about "language processing" at all, other than "well, it's the processing of language".

Of course, we're never going to get to that level of understanding of anything the brain does without asking level-three questions about neural processing somewhere, sometime, somehow. But I almost wonder, why beat our heads against the wall looking at the neural processing of something which we don't even understand at Marr's level one, when we have things like arithmetic as alternatives?

So-One said...

I had the same reaction as Asad after the last class. He says: “So why aren't we instead characterizing the function as two (potentially inverse) functions: PF->LF (parsing) and LF->PF (generation), rather than attempting to characterize grammaticality judgments on (PF, LF) pairs?”

When Philip begged the audience for an explicit INPUT and OUTPUT of language so that we can have an explicit (not abstract, not philosophical, not hand-waving, etc.) discussion about computations, I offered the one with numerations and the (PF, LF) pairs – which I once combed the Minimalist Program to find.

Chomsky says, “One component of the language faculty is a generative procedure (an I-language, henceforth language) that generates a structural description (SD), each a complex of properties, including those commonly called ‘semantic’ and ‘phonetic’. These SDs are the expressions of the language. The theory of a particular language is its grammar. The theory of language and the expressions they generate is Universal Grammar... With regard to the computational system, then, we assume that the initial state is constituted of invariant principles with options restricted to functional elements and general properties of the lexicon. A selection sigma among these options determines a language. A language, in turn, determines an infinite set of linguistic expressions (SDs), each a pair (pi, lambda) drawn from the interface levels (PF, LF), respectively.”

Language is not depicted as an evaluator that determines {0,1} Boolean values; it generates the SDs with sound and meaning properties. Language consists of the lexicon, the computational system that combines them together, and then the output of this interaction. When we had the discussion of the set theory definition of functions (a special kind of relation where each member of the domain is paired with exactly one member of the range with the exhaustivity requirement) two weeks ago, I asked whether ambiguity in language would prevent us from characterizing language as a function. We got around this problem by considering Boolean values as the output. However, if we had adequate ways of characterizing the PF and LF pairs, we may find that there really is no true ambiguity in language (think about Janet Fodor’s work with prosody). Word order is not the only information given to us in PF. It’s definitely a good place to start – and a good way of doing science – but it shouldn’t be the only thing we pick out of the data.

If language can serve as an evaluator, it can only do so by generating. Being able to evaluate the well-formedness of a sentence comes from trying to see if the grammar could have generated such a string with a certain interpretation. Thus, studying language as the evaluator (just for the purposes of being able to define language as a function) should not excuse us from having to look at the more interesting problem of understanding how SDs are generated.

Other Chomsky references suggest that narrow syntax doesn’t really have anything to do with sound either and that the object of study is the computation that takes the numeration to a structure that maps with the CI interface. So even if we take (numeration, LF) to be the domain, range pair of the function of language, we should be looking at a generative computation.

It just so happens that the numeration has to match with PF information in comprehension. As Norbert once suggested, perhaps a fairy whispers the numeration to us in other cases. Syntax doesn't have to care where the numeration came from or how, only that its requirements (features) are met through proper operations.

An important conceptual transition from GB to minimalism was from a filter view of constraints to one that limits generation. Our reaction to ungrammatical sentences should no longer be depicted as "Oh, that was a filter violation; blah-blah filter didn't apply once this string was generated" but "Such a string is not generable by the computational system I have." I'm not sure we'll get very far in our discussion about how the computation of language works unless we frame it around the generative property.

Tim Hunter said...

Asad writes: "I am not demanding a one to one mapping between PFs and LFs. I am perfectly happy with ambiguity. So F:PF->P(LF) and F':LF->P(PF) don't bother me. If that's what it takes to get rid of the discourse context as disambiguator."

That will certainly give us well-defined functions to talk about, with a PF as input and a single object, namely a set of LFs, as output (and vice-versa). But if it's the function which maps a sound to the set of all possible meanings that that sound can take on, then I don't think it's necessarily any better suited to characterising sentence processing of the everyday sort than the function from (sound,meaning) pairs to {0,1} is. Both of them are ways to "get from the relation to a function", without "adding anything extra", but you can't do this and consider the function to be characteristic of a particular mental computation without committing to the particular task that the function represents. So if you use the function from sounds to sets of meanings as the level-one description of your computation, and then look lower, then you're necessarily going to be looking at the process of finding all the meanings which can be associated with a particular sound; and if you use the function from (sound,meaning) pairs to {0,1} as the level-one description, then you're necessarily going to be looking at the process of giving a grammaticality judgement. Not that any of these processes are "the right one(s)" or "the wrong one(s)" to be studying, because they're all things we can do, to some degree of success. (Philip and I discussed this a bit last week; see our comments of 9-10 September.) But a lot of the recent discussion seems to have been focussed on describing language processing of the "natural", everyday sort which happens subconsciously, and it seems to me that the function that best characterises that process is, on the comprehension side, the one from (sound,context) pairs to meanings, and on the production side, maybe one from (meaning,context) pairs to sounds, although this isn't as clear.
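(Schematically, with a toy relation of the kind sketched earlier; the data are placeholders, and the point is only that each function read off the same relation fixes a different task.)

```python
# A toy relation R; the (sound, meaning) pairs are placeholders.
R = {
    ("John saw Mary", "see(John, Mary)"),
    ("I saw a man with binoculars", "see-using-binoculars(I, man)"),
    ("I saw a man with binoculars", "see(I, man-with-binoculars)"),
}

def meanings_of(pf):
    """PF -> P(LF): the set of all meanings R pairs with this sound."""
    return {lf for (p, lf) in R if p == pf}

def judge(pf, lf):
    """(PF, LF) -> {0, 1}: the grammaticality-judgement function."""
    return 1 if (pf, lf) in R else 0
```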

If your idea wasn't necessarily to get at natural, everyday sentence processing, but just to get at some task involving PFs and LFs and nothing else, then I agree that we can look at how these functions are computed, but it seems to have the same "meta-linguistic" character as the task of assigning 0 or 1 to (PF,LF) pairs.

Asad Sayeed said...
This comment has been removed by the author.
Asad Sayeed said...

Tim#1: Well, yes and no. It's not merely the relation, but the algorithm that computes the relation. The relations represented by the two, well, representations, may be the same. But that's just because they propose to describe the same output strings. We do not want just any algorithm that describes the output strings, we want one that *better* satisfies the external criteria such as they may be.

The algorithm that computes the relation (PF,LF)->[0,1], i.e., takes in a (PF,LF) pair and spits out a result, is merely a verifier that you can plug into an enumerator, to enumerate alphabetically all (PF,LF) pairs.

That's not what you want. Well, at least it's not what I want. It doesn't tell us that the relation is computable in polynomial time. It doesn't force us to commit to a polynomial-time algorithm.

It could be that there is no such thing. But that would mean we have to look for the Secret Special Sauce; exponential time seems unreasonable.

It could also be that I am a dreadful pedant and am seeing distinctions that don't exist.

Both these things are possible, and perhaps the latter is even likely. But still.

Alex Drummond said...

Regarding So-One's comment, the term "generate" as used by Chomsky is completely neutral with regard to the question of inputs and outputs. In Aspects he says (something like) "To say that a grammar generates a particular sentence is to say nothing more than that the grammar assigns a structural description to that sentence." You can just as well define a generative grammar by a recognition procedure (e.g. by specifying a particular automaton) as you can by defining a procedure for enumerating grammatical sentences. At the level of abstraction of a generative grammar, there's no real difference.

brian d said...

Tim Hunter says: "I sometimes wonder whether there's any reason to believe that the study of how we process language needs to be more tightly tied to the study of our declarative knowledge of language, than it does to the study of how we process other things."

Unless I'm misunderstanding what you're saying (entirely possible, please correct me if so!), I think it's safe to say that a good many folks who work on language processing do hold this as the null hypothesis: that there is very little Language in language processing, and that a more thorough understanding of the cognitive systems in which it's embedded would eventually provide an explanation for almost all language processing phenomena. I am a little curious as to what the 'Eureka grammar' from the brilliant syntactician you allude to could look like, though. Is it the kind of thing where this spot-on grammar just ends up describing the constraints on the (pf,lf) pairs as the net result of being passed along the processing stream? Or does the concept of a constraint on (pf,lf) pairs retain a privileged status as cognitive objects that are invoked at some point?

Depending on the answer to that question, I don't think it is fair to say it is a fruitless task to ask neural questions (not necessarily language-related) without a full level-one understanding of the system. It's not at all unrealistic to imagine pursuing analyses at both levels, using information about level-three analyses to 'prune away' bits of the level-one analysis that turn out to be epiphenomena of lower level systems.

So-One said...

In response to Alex, we then need to think about what it means to say that language is infinitely creative when all it is is a recognition procedure. Is a procedure infinitely creative when it can recognize products of a finite system that can generate those infinitely creative products? And although Chomsky wrote that in Aspects, when looking at models of grammar from PSRs to now, aren't we looking at productive rules? I see a disparity between the claim that grammar could be a recognition procedure and how it is actually modeled.