Wednesday, August 29, 2007

August 29 class -- reactions after class

Well, I was really pleased with how things went! People had very interesting things to say, and I felt as if the mix (computational, linguistic, etc.) was a good one.

Here are the pointers to the papers from today:

Marr, David. Vision. W.H. Freeman, 1982.
http://web.archive.org/web/20051227154554/http://www.psych.upenn.edu/backuslab/psyc111/Readings/Marr_Chapter1.pdf

Kosslyn, S. M., and Maljkovic, V. (1990). Marr's metatheory revisited. Concepts in Neuroscience, 1, 239-251.
http://www.wjh.harvard.edu/~kwn/Kosslyn_pdfs/1990Kosslyn_ConceptsInNeurosci1_MarrMetatheory.pdf

Folks should comment on this posting in order to create their after-class reaction pieces.

For next class (and the one after), these readings are worth looking at:

"Turing Machine", Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/turing-machine/

"Big-O Notation", Wikipedia
http://en.wikipedia.org/wiki/Big_O_notation

Optionally,

"Church-Turing Thesis", Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/church-turing/

"Computational Theory of Mind", Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/computational-mind/

There will soon be a posting of a pointer to a PDF file for Howard's readings.

27 comments:

sarah a. goodman said...

I can't say how much I enjoyed the measures used to keep the class discussion on-track. I've been to courses where discussions simply got out-of-hand and the lesson plan, one probably in place for a good reason, was barely touched. Discussion isn't a bad thing, oh no, but when it's too tangential, fewer people probably benefit.

sarah a. goodman said...

Another thing: it was mentioned today in class that the function of language is to be learned. Do I have that right? If so, we can apply the old biological adage that 'function follows form' (my BIO-118 prof. would be proud), and ask if there are parts of grammatical theory that are extraneous to successful acquisition. One thing that comes to my mind is the so-called phases of current syntactic theory. Whether or not you have multiple spell-out has nothing to do with language acquisition, as far as I can see. The P&P model hummed along nicely with the simple 'Y' diagram. But, of course, this just raises the question of whether or not anyone can actually disprove something like phases, which I see as capricious more than anything. Syntactic theory, or even most of linguistic theory, is esoteric and self-sustaining. Ideas are proposed almost too easily, and the theoretical architecture can be rewritten seemingly in a snap. This troubles me. I suppose notions of set theory and mathematics can be brought to bear on linguistic theory, serving as a check on any proposal too outlandish, but math has nothing to say about phases, does it? Now, something like copies or maybe movement I think I could buy, if only because of that evidence from some South African dialect. But I'm sure there are non-movement theories out there (HPSG maybe?) that could account for the same data.

Tim Hunter said...

The main thought I had after Wednesday's class was that, as Howard mentioned towards the end, I'm not sure how much the "why" question really comes into play when considering Marr's level 1. I usually think of level 1 as specifying purely a function: a set of (input,output) pairs, independent of any particular encoding, and certainly independent of purpose.

The question of "why" language evolved is, of course, a big one, but I usually think of it as a separate one.

An interesting thing to note about the "set of (input,output) pairs" conception, though, is that I don't think the language faculty can really be described as computing a function; it just specifies a (many-to-many) relation between sounds and meanings. At least, that's the case if we're interested in characterising the grammar; the processor might be more suitably described as a function, but the inputs to this function will include all sorts of extralinguistic stuff.
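To make that contrast concrete, here's a minimal sketch (the strings and "meanings" are made-up toy items, not serious analyses) of a sound-meaning pairing that is a relation rather than a function:

    # Minimal sketch (toy, hypothetical data): a grammar as a set of
    # (sound, meaning) pairs. It is a relation, not a function, because
    # one sound can pair with several meanings.

    GRAMMAR = {
        ("flying planes can be dangerous", "planes that fly are dangerous"),
        ("flying planes can be dangerous", "to fly planes is dangerous"),
        ("the bank was closed", "the financial institution was closed"),
        ("the bank was closed", "the riverside was closed off"),
    }

    def meanings_for(sound):
        """All meanings the relation pairs with a given sound."""
        return {m for (s, m) in GRAMMAR if s == sound}

    def sounds_for(meaning):
        """All sounds the relation pairs with a given meaning."""
        return {s for (s, m) in GRAMMAR if m == meaning}

    # A function would return exactly one output per input; here we get two:
    print(meanings_for("flying planes can be dangerous"))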

Tim Hawes said...

Adding to what Tim said, I thought it was slightly odd after reading Marr that the "why" was a main point for the "computational theory". I'd say it was helpful in thinking clearly about the problem of the adding machine/cash register, but it's not a critical question for forming a "computational theory"; adding machines would presumably still add even if we couldn't come up with a reason for it. I say this is "odd" because as we start moving up into higher-level systems, the "why" becomes too obscure or too complex to be a useful tool for clear thinking.

I think this is related to the Kosslyn and Maljkovic point that “there is no clear distinction between Marr’s level of the computation and level of the algorithm”. So, you end up with a moving target, as whats/whys become hows, that is harder and harder to hit the further away you get from primitive operations.

Also, as Tim suggests, the what question is not a trivial one. I do worry, though, that saying it's a sound-meaning mapping could run into trouble. I know we are probably just dealing with a handy term (sound) that isn't quite meant literally, but it's worth mentioning that the fact that sound is used for language is probably just a result of how convenient it is, and that this isn't necessarily the level of input it's actually operating on. I think it's safe to say that language is more likely a mapping from (probably multiple) subsystem output representations to its "meaning" output representation.

brian d said...

I guess I don't really see the force of some of the arguments by Kosslyn and Maljkovic. In particular, I don't see how their argument against the independence of the levels really goes through. Presumably this problem (the fact that 'computational' level bits show up in algorithmic bits, and so on) is all over in the description of any complex system. Neurons can be described computationally and algorithmically just as the visual system can, but this shouldn't necessarily affect their viability as implementational primitives in higher order systems. I took Marr's levels to be tools for analyzing a single system, and if this system builds upon the computational properties of other systems in its algorithmic level, well, then that could possibly be interesting. But it doesn't change the fact that you can examine the levels independently within a single system. I also fail to see how this distinction falls apart in systems that conflate representation and process (aren't they both part of the algorithmic description?). None of these arguments impact the fact that one can in principle characterize the 'what' that a system performs without reference to 'how' or 'with what.' The point that in practice, information particular to each level may directly impact the others, is well taken; but I don't think this dismantles either the notion that these levels form a hierarchy, or the notion that each constitutes a useful construct in its own right.

Lastly, the Tims' commentary brings a question to mind: if it is not the case that the faculty of language is a function in the sense of specifying (input,output) pairs, does it still make sense to talk about it in the context of Marr's levels? Does it make sense to talk about the computational 'what' or the algorithmic 'how' of a relation? My hunch is that the answer is no... I suspect that if the answer were yes, there wouldn't be a particularly relevant difference between functions and relations. Thoughts?

Tim Hunter said...

Errr, is anyone else having trouble accessing the Marr chapter using the web archive URL?

So-One said...

It was an important step when those in vision research realized that the problem of understanding vision – and extending that to engineering machine vision - is difficult. The structure of a camera seems simple enough; light (electromagnetic waves, pure energy) enters through a hole (aperture), becomes refracted through a lens (a process that turns the image upside down aside from focusing the light), and then changes the chemical properties of some substance, and these patterns of the chemical change match the pattern of light that was reflected from the object itself. What happens at the level of our eye is very similar; instead of film, however, we have the retina, which is made up of various nerve cells (rods and cones). We know that our visual information isn’t garbage (i.e., unreal) because it corresponds to other types of information, such as somatosensory input. If the camera is manipulated properly, what we see of the object matches what we see of the picture of the object. However, understanding the structure of this cool toy and the physics behind what makes Kodak moments possible will not explain to us how we recognize the final step – that what we see of the object matches what we see of the picture of the object. This recognition happens at the mental level.

Is the physical phenomenon behind vision any more straightforward than what happens when we process language (a comment that was raised in class)? Sound and light are both forms of energy that move through space as waves. The invention of a tape recorder that records speech appears to be at the same level of analogy as the camera is to vision. We can probably assume that animals that have similar eyes and angles between the eyes see the way we do; they don’t bump into objects or fall over cliffs. However, we know that animals that have ears don’t hear language the way we do.

If the problem of vision is understanding the mapping between light and the representations of our visual system, the problem of language is taken to be understanding the mapping between sound and the representations of a conceptual system (meaning) (in linguistics, "LF" or logical form). Some computation takes the input and spits out the output. This computation provides the relation between the two.

What also happens in language is that sometimes the output is the input and the input is the output (comprehension vs. production). I wonder if it is naive to assume that this correlate does not exist in vision. I don’t know enough about mental imagery without the stimulus of external light to understand what visual production vs. perception involves.

When the input is sound (abstracting to the level of syntax – the word order) and the output is LF (a hierarchical structure), the type of operation required is one that can expand something carrying only precedence relations into something of a higher dimension that also carries dominance relations. When the input is LF and the output is sound, the operation is just the reverse, one that compresses down dimensions. I know that functions for such operations must be distinct in linear algebra, and I wonder whether this means that these mappings are different at the computational level, or the same at the computational level but different at the algorithmic level. Without having a firm understanding of the difference between these two levels, this question is difficult for me.
Once this becomes clearer to me, I'd like to know: are computations, in theory, supposed to be neutral to different algorithms and different modes of implementation? In reality, do interface constraints bias certain algorithms over others? Can the computational level be abstracted away from these constraints? In the "optimality" section of Kosslyn & Maljkovic, algorithm and computation seemed to be used interchangeably.
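One way to picture the precedence/dominance asymmetry, as a minimal sketch (the trees are hypothetical toy structures, nothing theoretically committed): going from a tree to its terminal string is a function, while going back is one-to-many.

    # Minimal sketch (toy structures): dominance vs. precedence.
    # A tree is a nested tuple; its "yield" (the leaves in order) carries
    # only precedence relations, while the tree itself also carries dominance.

    def leaves(tree):
        """Flatten a tree to its terminal string: dominance -> precedence."""
        if isinstance(tree, str):
            return [tree]
        return [leaf for child in tree for leaf in leaves(child)]

    # Two distinct structures (different dominance facts) ...
    tree_a = ("saw", ("the", ("man", ("with", ("the", "telescope")))))
    tree_b = (("saw", ("the", "man")), ("with", ("the", "telescope")))

    # ... that compress down to the same linear string (same precedence facts).
    assert leaves(tree_a) == leaves(tree_b)

    # So tree -> string is a well-defined function, while string -> tree is
    # one-to-many: recovering dominance from precedence needs extra information.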

sarah a. goodman said...

I actually enjoyed the Marr reading; I thought it was well-written and carried sagacious introspections. I also agree with previous posts here -- I believe the Kosslyn and Maljkovic paper amounted to trivial criticism at various points, e.g. the digressions regarding how optimality is relative and perhaps resistant to uniform metrics, and here I side with Marr. He used 'optimality' to equate with 'functioning correctly' in a fashion I thought transcended their 4 nitpicky points. But maybe I just feel bad for the guy for succumbing to a terminal illness at such a young age.

I'm also wondering if Marr really intended the three levels of information processing to stand as independently as Kosslyn and Maljkovic insisted. After all, as K and M point out, in the space of about 3 pages (20-23), Marr claims that representation affects our computational theory as well as the algorithmic process, e.g. the woes of multiplication with Roman numerals. Surely he must have been aware of this...?

Finally, I thought the conclusion of K. and M. provided some interesting quotables. I particularly liked, "to the extent that such reasoning [of the nature of the brain] rests on assumptions about optimality, reason alone is not sufficient" (250). Isn't that what linguistic theorizing is in the habit of doing, claiming that such and such a process would be elegant in its optimality, or optimal in its elegance, just to use two buzzwords that are tossed about a good deal? And, furthermore, doesn't theoretical linguistics work by purely so-called insufficient reasoning alone? How often do you think about neural hardware in a semantics class? But maybe this is all irrelevant if we simply claim that semantics tells us absolutely nothing about the brain.

Here's another nice quote, from the same paragraph as the last: "computational analyses are always embedded in many assumptions about the underlying mechanism and representational systems, so these analyses cannot be taken to prove anything...[they] must be put to empirical test." Is all of linguistic theorizing like this, operating at this first, most abstract level? If so, is it just fruitless mental gymnastics, as K. and M. would seem to claim? Is probing starred sentences enough of an empirical test to validate such theorizing? Or would anything lending credence or confirmation to this pursuit have to involve some understanding of neural hardware and processing heuristics? I have my guesses at the answers, but I'd like to see what the general consensus is.

Chris said...

I'm having trouble accessing the Marr as well -- can someone who has downloaded it post a copy to the PDF locker or email it to me/the group?

Tim Hawes said...

I have a downloaded copy of the Marr Paper. Anyone else aside from Tim or Chris need it?

Chris said...

Thanks Tim--I've posted the Marr paper to the COL folder.

DesiLinguist said...

To be honest, I am having trouble with the "why" part of Marr's level of computational theory. I think that it's much less confounding to think of the "why" question as a constraint satisfaction problem. I concede that this perhaps stems from the bias accumulating from all those years of numerical analysis and optimization classes but I find it much less elegant to think of the "why" question as one that must be answered subjectively. To me, it suffices to think of the computational theory as describing the problem the information processing task is trying to solve SUBJECT to such and such constraints. If I convince myself to think along these lines, I actually find Marr's theorized demarcation quite nice: here's the nature of the problem that we need to solve, here's a step-by-step description of how to solve it, and here's how you would build a physical system to do the same thing. In a sense, this is similar to TimH and Howard's claim that the "why" question seems to be orthogonal.

As for K&M, I want to talk about a slightly different topic, which I found interesting, rather than their take on Marr, which everyone else has already analyzed quite well. This is related to the later part of the second paragraph on page 241. That's the part where they say that it's important for a theory to be accepted for reasons other than just the fact that it seems to account for a large range of phenomena. Unless I am mistaken in the way I am interpreting this, this viewpoint pretty much contrasts with the generalizability stance taken in machine learning and even NLP. For many, if not all, of those tasks, the available data is split into training data and test data, and an algorithm instance ("theory") is trained ("formulated") using the training data. It is then evaluated on ("asked to account for") the phenomena in the test data, and if it does so for a large majority of such phenomena, then the algorithm is considered reasonably sound. Of course, if there is overlap between the training and test data ("the finding is similar to the one used to formulate the theory in the first place"), then such a claim cannot be made. However, I am not aware of people actively rejecting hypotheses and theories just because there are no "other reasons" to accept them besides generalizability. I am not saying that it's a bad idea to do that - on the contrary, I am just saying that it doesn't seem to be done with that much vigor in the community.
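For concreteness, here's a minimal sketch of that train/test protocol (the task and data are made up, and the "theory" is deliberately trivial; nothing hangs on them):

    # Minimal sketch (toy, hypothetical data): a "theory" is formulated on
    # training items and then asked to account for held-out items it never saw.

    import random

    # Toy task: predict whether an English noun pluralizes regularly.
    data = [("cat", True), ("dog", True), ("ox", False), ("book", True),
            ("child", False), ("table", True), ("mouse", False), ("cup", True)]

    random.seed(0)
    random.shuffle(data)
    train, test = data[:6], data[6:]          # no overlap between the two sets

    # "Formulate the theory": here, just the majority class seen in training.
    theory = max((True, False), key=lambda c: sum(label == c for _, label in train))

    # "Ask it to account for" the unseen phenomena.
    accuracy = sum((theory == label) for _, label in test) / len(test)
    print(f"held-out accuracy: {accuracy:.2f}")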

me said...

This isn't at all related to the other comments, but it grabbed my attention. I was interested in the point made by K&M (p. 243-44) about optimality in the brain. It seems that in theoretical linguistics we are interested in finding the most economical or elegant explanation of, for example, a derivation or a phonemic representation. I think K&M make a good point in that the brain is not evolving towards some optimal state and has instead been slowly adapting based on what it is needed for. This makes me wonder where the assumption comes from that an ideal solution or explanation should be economical or elegant. I can see several reasons why the elegant/economical way of thinking is attractive: for one thing it gives us a straightforward way of deciding between two solutions (choose the more economical one), and secondly it guides us in spelling out theories (make them elegant), but I am not sure that we can assume or even suspect that grammars are actually built or represented in this way. We may want them to be neat systems that can be concisely described and are mathematical in their precision; in this way they may be easier to conceptualize and (maybe) understand, but based on the way they may have evolved, there doesn't seem to be any a priori reason to expect that they work this way. Additionally, there seems to be evidence that, at least in some ways, languages do not work this way. Morphological systems don't appear to be either completely productive systems of combining individual morphemes (which would be economical in terms of what might be stored in the lexicon, and elegant in the small number of rules needed to combine them) or systems where all words are stored (uneconomical in that so much must be stored, and inelegant in that the various patterns are ignored), but instead some hybrid system where both morphemes and words (at least frequent ones) are stored (not very economical if they are decomposable). With evidence that the brain is not geared towards optimality, and this sort of linguistic evidence, why do we still look for economy and elegance in linguistic theory?

Tim Hunter said...
This comment has been removed by the author.
Tim Hunter said...

Brian writes: "if it is not the case that the faculty of language is a function in the sense of specifying (input,output) pairs, does it still make sense to talk about it in the context of Marr's levels?"

I also suspect that the answer is no. For Marr, vision is a process carried out by a particular device, and being a process carried out by a particular device is what makes it eligible for the three-level analysis. I can't really think of any process which I would call "language", so I don't think anything called "language" is eligible for the three-level analysis.

I think the object we're interested in is that thing which we humans have which other animals don't. Let's call it the language faculty. It probably takes the form of a bunch of facts about trees, transformations, traces and stuff, or whatever notational variants you prefer. Some processes which the brain carries out may use the language faculty (this pile of facts) to get their jobs done.

These processes will be characterised by functions (sets of input/output pairs), and thus are eligible for the three-level analysis. But the inputs will include plenty of non-linguistic stuff, so these processes won't be completely understood even if we know everything there is to know about the language faculty.

Tim Hunter said...

Annie writes: "[If] the brain is not evolving towards some optimal state and has instead been slowly adapting based on what it is needed for ... why do we still look for economy and elegance in linguistic theory?"

I think there are two ways in which a linguistic theory can be simple or elegant. One is for it to propose that the underlying difference between an animal which has language (a human) and an animal which doesn't is small. The search for this kind of elegance could conceivably be jeopardised by Annie's/K&M's point. (But maybe the separate question remains of why, if the differences between humans and all other animals are many, we find no animals "in between".) The other way a theory can be elegant is to propose that the underlying difference between a speaker of language A and a speaker of language B is small. I think the search for this kind of elegance is still completely justified by poverty-of-stimulus arguments.

Tim Hawes said...

So-one said: "The invention of a tape recorder that records speech appears to be at the same level of analogy as the camera is to vision."

Are you saying that a tape recorder is to language as a camera is to vision? If so, I agree and disagree. To disagree, I'd say that the tape recorder is to audition as a camera is to vision. But that's on a superficial level. There are presumably ways that humans see things that animals don't (I'd guess that animals probably won't see a dragon in the clouds or Elvis in a potato chip any time soon) just as animals "don't hear language the way we do". So, perhaps a tape recorder is to language much as a camera is to human vision.

In response to Tim and Annie:

I also think there is something to be said for boiling down to simple and elegant solutions as a tool for theorizing. (Just as Marr’s three levels are a tool for theorizing.) By starting with the simplest solution possible, you can escape the potential trap of creating something at the foundation that you will eventually need to get rid of. It’s often far easier to add something that proves to be necessary than it is to remove something that proves to be unnecessary. So, even if your elegant solution is wrong, it could help you get to the correct less elegant solution.

Finally, very briefly in response to the CTM reading. Though this was not the first time I've heard of Turing Machines, it was the first I'd heard of Searle's "Chinese Room". This was an appealing argument, since I think intuitively we don't want to say the person in the "Chinese Room" understands Chinese. I was also a bit surprised that the replies all seem to place understanding as beyond the understander (the system argument in the obvious way, and the robot argument by suggesting that environmental interaction is required). Another response that comes to mind, that I didn't see, was that if the rule book simply maps input sentences to output sentences, and the Chinese Room is intended to pass a Turing Test, the rule book needs to be infinite. I'd also ask whether this idea, that seeming fluency in language isn't necessarily understanding of language, is one that we will be adopting. Put another way, is simply being able to do language (in the opinion of an outside observer) language, or is the understanding (meaning) something more integral?

brian d said...

Re: Tim [Re: Tim [Re: Annie]]:

I agree with Tim here: elegance is important within theory-building, and if as Annie (rightly, I think) suggests, it doesn't map so neatly onto the brain's state of affairs, I don't think we'll have theorized in vain.

To take the morphology example, Annie rightly points out that in practice it seems to be a hybrid of sorts that is a bit rough around the edges. But since the composition of morphologically complex words happens at least some of the time, it's worth building a theory of just that; it's at least plausible that there could prove to be interesting underlying principles that aren't exhibited elsewhere in cognition. And it's worth making any account of this elegant (read: simple) because, as Tim and Tim point out respectively, 1) there are a number of non-linguistic inputs to our hypothesized morphology function (e.g. we might expect the brain to love to store things left and right, a fact which will impact how and when morphological combinatorics are deployed) and 2) it's likely more profitable to start from a small nugget of theory and see how much of the remaining behavior can be apportioned out to these other inputs to your function, rather than spend your time paring down overly elaborate theories.

me said...

Re: Tim, Tim, Brian

I agree with Tim Hawes and Brian that a notion of elegance/simplicity is useful in building a theory, at least as giving you a place to start, but what I think I have a problem with is the idea that language is underlyingly elegant and we just need to find a way to scrape away the messy parts to reveal the simple system underneath. While there is certainly a great deal of systematicity in language, and many parts of language do seem to be accounted for by very simple mechanisms, I am not convinced that this is all we are looking at and that the end state we are working towards in building a theory of language will necessarily be elegant. I think that as linguists we should be careful that we don't mistakenly expect to (always) find a simple/elegant solution.

In response to Tim Hunter's question about why we find no animals 'in between' (having language and not having language), I don't think it's immediately clear that there aren't any. Presumably whatever gives humans language is the possession of a combination of cognitive processes - some domain-general ones that are probably shared with other animals, and some domain-specific ones that may not be - as well as the ability to use these processes in parallel. So there may be animals that are in between having language and not having it, in that they have some subset of the cognitive processes necessary for language, but this subset of processes wouldn't necessarily be evident (manifesting itself as some half language or anything) - because the animals wouldn't have the full set and/or wouldn't be able to put all of the processes together. So there may not be just a small difference between humans (with language) and animals (without). As for the argument that an elegant theory can account for the similarities between speakers of two different languages, I think that this reflects some elegance/simplicity in language but (once again) doesn't necessarily mean that the entire theory should be assumed to be simple. (I know that this last point needs more development - I haven't fully thought it through yet)

Chris said...

I was also fascinated by K&M's usage of "optimality" as a tool to critique Marr. First, it struck me that he actually makes no claims about expecting to find optimality in nature (cf. p. 19, col. 2), but only that given a rigorous formal theory of a computation, one can analyze its algorithmic realization for correctness and optimality. If, as cognitive scientists, we are seeking to understand the algorithms underlying some behavior from a high level, the level of a theory of computation [i.e., as linguists usually do, but neuroscientists usually don't], then with a concept of algorithmic optimality, we have an a priori reason to expect a certain implementation. If we find that the algorithms used are in fact optimal according to some computational cost metric (but which one?), then we have learned something. Without Marr's distinction, it is not even clear to me how algorithmic optimality could be formulated, and this was the essential point that K&M missed/misrepresented. As a side note--K&M emphasize their misunderstanding of Marr by claiming that optimality is complex. Fitness functions are quite complex; optimality is not.

To return to the morphology example, I'll try to make this misunderstanding somewhat clear. It seems that there are in fact several abstract computations related to the morphological system. Considered individually, they predict systems that are clearly different from what exists instantiated in the human head, but by considering them together under an assumption of a general tendency to find an optimal solution, one arrives at a prediction for the behavior of the lexicon that is perhaps similar to what we see. Morphology describes the structure of the lexicon, and we know it must at least fulfill the following two functions: retrieval of syntactic/semantic information given a phonological key, and the reverse lookup. In the implementation of retrieval systems such as these, there is a general tradeoff between space and computational cost. Using compression (morphological decomposition is a form of compression), one can save space but one pays with higher retrieval costs (these are due to increased operations associated with morpho-phonological concatenation/parsing on one side and lexical semantic operations on the other). By storing the elements with less compression (ie, as full forms), retrieval costs go down but the space demands increase. When studying morphology in isolation (at the level of a theory of its computation), we can imagine lexica that are optimized only for space or for speed, and by looking at behavioral data, we can start to determine how the system is actually organized. This system may not be optimal in any pure sense, but the formalization at least allows us to understand these tradeoffs.
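A toy sketch of that tradeoff (the entries and glosses are hypothetical; the point is just to make the space/retrieval-cost contrast visible):

    # Minimal sketch (toy lexicon): a full-form store looks words up in one
    # step but lists every form; a decomposed store lists only stems and
    # affixes but pays a parsing cost at retrieval time.

    FULL_FORMS = {"walk": "WALK", "walks": "WALK+3SG", "walked": "WALK+PAST",
                  "talk": "TALK", "talks": "TALK+3SG", "talked": "TALK+PAST"}

    STEMS = {"walk": "WALK", "talk": "TALK"}
    AFFIXES = {"s": "+3SG", "ed": "+PAST", "": ""}

    def lookup_full(word):
        """One hash lookup; space grows with the number of word forms."""
        return FULL_FORMS.get(word)

    def lookup_decomposed(word):
        """Less storage, but extra work: try every stem+affix segmentation."""
        for affix, gloss in AFFIXES.items():
            stem = word[:len(word) - len(affix)] if affix else word
            if stem in STEMS and word == stem + affix:
                return STEMS[stem] + gloss
        return None

    assert lookup_full("walked") == lookup_decomposed("walked") == "WALK+PAST"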

Rebecca said...

Re: Chris, Annie
Chris says: "Using compression (morphological decomposition is a form of compression), one can save space but one pays with higher retrieval costs (these are due to increased operations associated with morpho-phonological concatenation/parsing on one side and lexical semantic operations on the other). By storing the elements with less compression (ie, as full forms), retrieval costs go down but the space demands increase."

In that case it makes sense that irregular (presumably uncompressed) forms are usually among the most frequent. You can save time on a few frequent words (where you gain a lot of time for relatively little space) and save space with the rest.
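A tiny addendum to the sketch above (it reuses lookup_decomposed from that sketch, and the frequency cache is hypothetical): one way to cash out this bargain is to store the most frequent forms whole and decompose everything else.

    # Minimal sketch (hypothetical cache): frequent forms stored whole
    # (fast, costly in space); everything else falls back on decomposition.

    FREQUENT_FORMS = {"went": "GO+PAST", "walked": "WALK+PAST"}  # toy cache

    def lookup_hybrid(word):
        return FREQUENT_FORMS.get(word) or lookup_decomposed(word)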

Philip said...

This is a great discussion -- thank you, everybody! I don't know about you, but I'm totally jazzed about the blogging-comments format.

Let me add my take on things to the mix. Well, actually, just the beginnings of my take; this was written before most of the commenting took place, and there's a whole lot more to react to in the discussions.

It seemed to me that we might have gotten a little bit distracted by the fact that "why" can mean a lot of things, including the notion of "purpose". Setting aside that particular word, let's focus on what Marr's saying about analysis at the computational level. In his introduction, he emphasizes that the brain is not just a computational device, but "a computer which is in the habit of performing some rather particular computations", and emphasizes that these "particular computations" are part of an information processing task. So if you're going to analyze a system at the computational level, you have to ask: what is the information processing task?

It's clear from Marr -- and, as you'll see, from the theory of abstract computational devices -- that when computationalists ask this question we're talking about inputs, outputs, and transformations from the former to the latter. So if we're going to characterize a phenomenon like language, vision, or Walmart purchases as an abstract computational system, we have to decide what the information processing task is; that is, we need to decide what aspects of the real-world phenomena to treat as input, output, and the relevant transformation.

In the cash register example, Marr shows how the computational-level characterization of a (simplistic) cash register relates to real-world events in the system he's trying to characterize. There are two kinds of events, buying and returning, and each one involves inputs (the prices of the items in the event) and output (the cost). [Side note: I believe that the distinction between functions and relations is a red herring, since any relation R(x1...xN) is equivalent to an indicator function whose value is 1 iff x1...xN is an element of R. E.g. as a function, + transforms input <2,7> to output 9. As a relation, + transforms input <2,7,9> to output 1 and <2,7,Y> to 0 for all Y not equal to 9. We should make sure to discuss this when Howard talks about set theory this week.] Not only does the computational characterization relate to the real-world events, it is constrained by them: if you were to observe transactions at this simplified cash register, and characterize the observed system just in terms of its input and output behavior, your observations of (event-type,price(s),total), e.g. (purchase,{$1,$4},$5) or (return,($5,$4),$1) would lead you to his constraints 1-4 and addition would turn out to be the uniquely suitable abstract operation whose information processing behavior corresponds to observations of the real-world system. [Note to mathematical purists: I'd be using angle brackets for the above tuples, but the blogging site won't let me and I don't feel like typing in SGML entities.]
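That side note in a couple of lines of code (just a restatement of the point above, nothing new):

    # Minimal sketch: any relation can be recoded as an indicator function
    # that answers 1 or 0.

    def plus_function(x, y):
        """Addition as a function: one output per input pair."""
        return x + y

    def plus_relation(x, y, z):
        """Addition as a relation, via its indicator function."""
        return 1 if x + y == z else 0

    assert plus_function(2, 7) == 9
    assert plus_relation(2, 7, 9) == 1
    assert plus_relation(2, 7, 10) == 0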

So if you're going to think about "language" as a system from the point of view of Marr's computational level of description, one of the first things you have to do is decide what the relevant information processing task is, in relation to the real world phenomenon you're trying to characterize, which means deciding what inputs, outputs, and transformation/relation you're trying to characterize. One possibility would be to take Ilhan's angle (as I interpreted it, anyway) and say that the information processing task takes a string of words as input and returns output 1 if and only if the sentence is a member of some set of strings. Or, you can adopt Jeff's variant (as I interpreted it), which says that the input is pairs of (string-of-words,meaning) and the output is 1 if and only if the string of words can have the given meaning. [If this strikes you as an odd way to express things, then Howard's review on Wednesday will be especially relevant!]
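Here are those two candidate characterizations written out as indicator functions (the sets are toy stand-ins I made up, not claims about English or about anyone's actual proposal):

    # Minimal sketch (toy data): the two candidate information processing
    # tasks above, both expressed as indicator functions.

    GOOD_STRINGS = {"the dog barked", "dogs bark"}      # toy stand-in for the set of strings
    SOUND_MEANING = {("dogs bark", "BARK(dogs)")}       # toy stand-in for licensed pairings

    def task_membership(string):
        """First angle: input a string of words, output 1 iff it is in the set."""
        return 1 if string in GOOD_STRINGS else 0

    def task_sound_meaning(string, meaning):
        """Second angle: input a (string, meaning) pair, output 1 iff the string can have that meaning."""
        return 1 if (string, meaning) in SOUND_MEANING else 0

    assert task_membership("the dog barked") == 1
    assert task_sound_meaning("dogs bark", "BARK(dogs)") == 1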

As we will see, though, there's another layer of information processing that's absent from either of these characterizations. Chomsky suggests that in the space of possible solutions to those information processing tasks, whatever abstract computational device we propose must be consistent with the real-world fact that children learn language. A lot of specifics hinge on how you interpret learn and language, of course, but the point is that this additional real-world fact places additional constraints on whatever computation-level answers we would give based on any of the information processing tasks above. A computational-level characterization of language, even if it could be argued to be a perfect solution to the above information processing tasks, could not be a completely adequate characterization of language if it were inconsistent with the facts of human language learning.

To the extent that I haven't misspoken (and if I have, please correct me!), I believe all of this is just a computationally-angled expression of basic Chomskian principles circa Aspects. That said, here are a couple of questions that come to my mind.

- First, why should the real-world facts of human language learning have a privileged role in constraining the computational theory, as compared to the real-world facts of human language use?

- Second, and related, when we choose the information-processing inputs, outputs, and transformations/relations to focus on, what are the real-world events (analogous to the cash register transactions) in which we are choosing to ground our theory? (Without such a grounding in real-world events, I would argue that we are engaged in some kind of mathematics, not science.)

Don't let the already fascinating conversation get sidetracked by these questions; they're just something I have going on in the background as the discussion continues.

So-One said...

I have a comment about the theorizing (since it’s called a metatheory after all) ... From Marr’s outline, I take it that he was a top-down thinker. Such a hierarchy was implied in the three levels - and explicit in K&M. The bottom-up thinkers show us that there are other ways to do science. Is there something about studying higher cognitive functions that could benefit more from this top-down thinking, however? Or, if we take Annie’s comments, would this cause us to miss important – albeit messy – details?

This discussion is about the task of information processing. It's very difficult to understand the task to be performed when we are uncertain about the information to be processed. Audition is not the only medium for language input, as we know from sign languages. Vision is not the only medium for spatial perception, as tactile and auditory information can also provide us important cues (as evidenced by the blind). What's so fascinating is that some higher-level process somehow gives you similar if not the same representations regardless of the form of the input.

sarah a. goodman said...

Wow...I've been busy LaTeXing and commuting to Georgetown, so I haven't checked in since Monday night! A few comments....

I'm interested in Morphology, but less so in theories related to its cognitive implementation; I guess that's why I don't consider myself a cognitive scientist. But if we do want to discuss formats of lexical storage, I can say that I believe I can agree with Rebecca -- there has to be some hybridization involved, simply because some true roots aren't actually real words, which is what we might otherwise expect if roots proper were stored in the so-called Lexicon. Take 'anxi', which can be well said to derive both 'anxiety' (where 'eity' is probably an allomorph of 'ity', seen on the ends of such words as 'reality') and 'anxious' (where we have the suffix 'ous', seen in complex goodies like 'monstrous' or 'disastrous').

On the other hand, I've written a morphological inducer and root detector, with the intent to aid automatic dictionary referencing, and it's clear that real lexicons also take hybridization into account. We generally find items demonstrating derivational morphology as head words (things like 'anxious' and 'anxiety') (but not something like 'talker'), but not similar terms demonstrating inflectional filigree, as in 'talks' or 'talking'. This could be due to the slight change in meaning offered by derivational affixation. Whether or not the human brain applies the same organizational adages, I can't say. If we want to answer this question by invoking something like 'elegance' or 'efficiency', I'd agree with Annie as well as Kosslyn and Maljkovic and claim that we're simply out of luck. Appeals of that sort have been made for what seems like years, and they've always, to me at least, smacked of sheer selfishness, allowing the presumption that we've proved some artifact of behavior without anything in the way of supporting proof, appealing instead to a whim of "oh, well it just makes a beautiful sort of sense". Maybe this is why no one linguistic theory (that I've been exposed to) (except maybe Skinner's) can really be proven or disproven yet. They all seem to simply co-exist.

sarah a. goodman said...

So I've just read the Turing Machine article. I've got a question about the halting machine defined near the end of the article. Take this passage:

This composed machine, call it M, halts if the machine with the input code n does not halt on an initial tape containing n (because if machine n does not halt on n, the halting machine will leave TRUE on the tape, and M will then go into its infinite sequence.) and vice versa.

Is there a typo here? It seems to say first that M halts if n does not halt, and then in the parenthetical say that M will not halt if n halts. I didn't think this was the ad absurdum part of the reductio ad absurdum; I thought that part was reserved for the following paragraph about running M's code on M rather than n's code.

Any thoughts?

sarah a. goodman said...

So now I've read the Church-Turing article, and I found reference to something that's been nagging me about so-called computational modeling of human cognition -- the mind existed long before the modern computer. The computer as we know it could have taken different forms, organized along different architectures, languages, operations, and so on. So why do we try to delineate parallels to familiar technology in an effort to replicate what the brain does, like the so-called Phonological Learner I once read about? We could be disastrously wrong in that, and any such enterprise, to me, just seems misdirected. Technology may be able to approximate human cognition using operations completely distinct from neural analogues, but replicating cognition is a completely different story.

Maybe I've misinterpreted what some people are attempting, I'm not sure. Maybe neuroscience should lead the way in this respect, and technology could simply catch up at some later date.

sarah a. goodman said...

Oops. I just read a typo of my own. In my Turing machine question, what I think the paragraph cited actually says is 'M will halt if n does not halt' and 'M will not halt if n does not halt'. Is that what the cited paragraph should actually be saying?
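For reference, here is the standard construction the passage is describing, sketched in Python with a hypothetical halts() oracle (purely illustrative; the whole point of the argument is that no such oracle can exist):

    # Minimal sketch (illustrative only): assume a hypothetical oracle
    # halts(program, data) that returns True iff the program halts on that input.

    def halts(program, data):
        """Hypothetical oracle; no such total, correct function can exist."""
        raise NotImplementedError

    def M(n):
        """The 'composed machine': loop forever iff machine n halts on input n."""
        if halts(n, n):        # oracle says: n halts on n
            while True:        # ... so M goes into its infinite sequence
                pass
        return                 # otherwise M halts

    # Running M on its own code is the contradictory step: M(M) halts
    # iff halts(M, M) is False, i.e. iff M(M) does not halt.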